CRSP Missing Codes

A note on the missing codes in CRSP.
| Variable Name | C/Fortran Value | SAS Missing Code | Description |
|---|---|---|---|
| RET | -44.0 | .E | No valid comparison for an excess return |
| | -55.0 | .D | No listing information |
| | -66.0 | .C | No valid previous price |
| | -77.0 | .B | Off-exchange |
| | -88.0 | .A | Out of Range |
| | -99.0 | . | No valid price |
| DLRET | -55.0 | .S | CRSP has no source to establish a value after delisting |
| | -66.0 | .T | More than 10 trading periods between a security's last price and its first available price on a new exchange |
| | -88.0 | .A | Security is still active |
| | -99.0 | .P | Security trades on a new exchange after delisting, but CRSP currently has no sources to gather price information |
| DLRETX | -55.0 | .S | CRSP has no source to establish a value after delisting |
| | -66.0 | .T | More than 10 trading periods between a security's last price and its first available price on a new exchange |
| | -88.0 | .A | Security is still active |
| | -99.0 | .P | Security trades on a new exchange after delisting, but CRSP currently has no sources to gather price information |
| DCLRDT | 0 | . | Declaration date cannot be found |
| TRTSCD | 0 | . | Unknown trading status of the issue |
| NMSIND | 0 | . | Unknown whether or not the issue is a member of the Nasdaq National Market |
| VOL | -99.0 | . | |
PRC: A positive amount is an actual close for the trading date, while a negative amount denotes the average of BIDLO and ASKHI.

Units:

- SHROUT is in thousands.
- VOL is the sum of the trading volumes during the month, reported in units of 100, and is not adjusted for splits during the month.

https://wrds-www.wharton.upenn.edu/demo/crsp/form/
Since Stata 15, we can search, browse and import almost a million U.S. and international economic and financial time series made available by the St. Louis Federal Reserve's Federal Reserve Economic Data (FRED). This post briefly explains this great feature.
Can we estimate the coefficient of gender while controlling for individual fixed effects? This sounds impossible as an individual's gender typically does not vary and hence would be absorbed by individual fixed effects. However, Correlated Random Effects (CRE) may actually help.
At last year's FMA Annual Meeting, I learned this CRE estimation technique when discussing a paper titled "Gender Gap in Returns to Publications" by Piotr Spiewanowski, Ivan Stetsyuk and Oleksandr Talavera. Let me recollect my memory and summarize the technique in this post.
The Merton (1974) Distance to Default (DD) model is useful in forecasting defaults. This post documents a few ways to empirically estimate the Merton DD (and default probability) as in Bharath and Shumway (2008 RFS).
An uninitialized variable in C can be anything (most of the time). I find that, in some cases, we can know the value of an uninitialized variable and thus maybe exploit it.
Question
Given a centrifuge with \(n\) holes, can we balance it with \(k\) (\(1\le k \le n\)) identical test tubes?
In a traditional principal-agent model, firm output is a function of the agent's effort, and the principal observes only the output, not the agent's effort. The principal carefully designs the agent's compensation package, especially the sensitivity of the agent's pay to firm output, to maximize the firm value. Now, what if we add another factor to the relationship between firm output and the agent's effort? How would the optimal pay sensitivity change?
As in Eisfeldt and Papanikolaou (2013), we obtain firm-year accounting data from Compustat and compute the stock of organization capital (OC) for firms using the perpetual inventory method, which recursively calculates the stock of OC by accumulating the deflated value of SG&A expenses.
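The perpetual inventory recursion can be sketched as follows; the depreciation rate, the growth rate used to seed the initial stock, and the input series are illustrative assumptions rather than the paper's exact calibration:

```python
# Perpetual inventory method for organization capital (OC):
#   OC_t = (1 - delta) * OC_{t-1} + SG&A_t / CPI_t
# delta (depreciation) and g (growth rate used to seed the initial
# stock) are assumed parameters here, not the paper's calibration.
delta, g = 0.15, 0.10
sga = [120.0, 130.0, 125.0, 140.0]  # hypothetical SG&A by year
cpi = [1.00, 1.02, 1.05, 1.08]      # hypothetical deflator

oc = [sga[0] / (g + delta)]         # initial stock
for s, c in zip(sga[1:], cpi[1:]):
    oc.append((1 - delta) * oc[-1] + s / c)
```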
Thomson One Banker SDC Platinum database provides comprehensive M&A transaction data from the early 1980s, and is perhaps the most widely used M&A database in the world.

This post documents the steps of downloading M&A deals from the SDC Platinum database. Specifically, I show how to download the complete M&A data where:
More often than not, empirical researchers need to argue that their chosen model specification reigns. If not, they need to run a battery of tests on alternative specifications and report them. The problem is, researchers can fit a few tables each with a few models in the paper at best, and it's extremely hard for readers to know whether the reported results are being cherry-picked.
> So, why not run all possible model specifications and find a concise way to report them all?
Never underestimate what programmers can do.
In the Compustat database, a firm's headquarters state (and other identification) is in fact the current record stored in comp.company. This means once a firm relocates (or updates its incorporation state, address, etc.), all historical observations will be updated and no longer record the historical state information.

To resolve this issue, an effective way is to use the firm's historical SEC filings. You can follow my previous post Textual Analysis on SEC Filings to extract the header information, which includes a wide range of metadata. Alternatively, the University of Notre Dame's Software Repository for Accounting and Finance provides an augmented 10-X header dataset.

2023 March Update

In this update I use 1,491,368 8-K filings of U.S. firms from 2004 to Dec 2022 and extract their HQ state and zipcode. hist_state_zipcode_from_8k_2004_2022.csv.zip
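Extracting the HQ state and zip code from a filing header can be as simple as a couple of regular expressions. A sketch on a made-up fragment written in the style of an EDGAR header (the address contents are illustrative):

```python
import re

# A fragment in the style of an EDGAR filing header; the address
# details below are made up for illustration.
header = """
BUSINESS ADDRESS:
    STREET 1: 123 EXAMPLE AVE
    CITY: NEW YORK
    STATE: NY
    ZIP: 10001
"""

# Pull the HQ state and zip code from the business-address block.
state = re.search(r"STATE:\s*([A-Z]{2})", header).group(1)
zipcode = re.search(r"ZIP:\s*(\d{5})", header).group(1)
```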
In certain scenarios, we want to estimate a model's parameters on the sample for each observation with itself excluded. This can be achieved by estimating the model repeatedly on the leave-one-out samples but is very inefficient. If we estimate the model on the full sample, however, the coefficient estimates will certainly be biased. Thankfully, we have the Jackknife method to correct for the bias, which produces the Jackknifed coefficient estimates for each observation.
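A small simulation sketch of the idea, using the textbook jackknife pseudo-value n*beta_full - (n-1)*beta_(-i); the data-generating process and sample size are made up for illustration:

```python
import numpy as np

# Simulated data: y = 1 + 2x + noise.
rng = np.random.default_rng(42)
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

# Full-sample OLS estimate.
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]

# Jackknifed estimate for observation i:
#   n * beta_full - (n - 1) * beta_(-i),
# where beta_(-i) is estimated with observation i left out.
beta_jack = np.empty((n, 2))
for i in range(n):
    keep = np.arange(n) != i
    beta_loo = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    beta_jack[i] = n * beta_full - (n - 1) * beta_loo
```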
Python 3.8 introduced a new module multiprocessing.shared_memory that provides shared memory for direct access across processes. My test shows that it significantly reduces the memory usage, which also speeds up the program by reducing the costs of copying and moving things around.
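A minimal stdlib-only sketch of the API; here the second handle is attached within the same process purely to show the zero-copy view, whereas in practice it would live in a worker process:

```python
from multiprocessing import shared_memory

# Create a named block of shared memory and write into it.
shm = shared_memory.SharedMemory(create=True, size=16)
shm.buf[:5] = b"hello"

# Another process would attach by name via
# SharedMemory(name=shm.name) and see the same bytes without copying.
other = shared_memory.SharedMemory(name=shm.name)
data = bytes(other.buf[:5])

other.close()
shm.close()
shm.unlink()  # free the block once every process is done with it
```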
Nowadays top journals favour more granular studies. Sometimes it's useful to dig into the raw SEC filings and perform textual analysis. This note documents how I download all historical SEC filings via EDGAR and conduct some textual analyses.
A measure of market impact cost from Kyle (1985), which can be interpreted as the cost of demanding a certain amount of liquidity over a given time period.
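A common textbook way to estimate it is as the slope from regressing price changes on signed order flow; the numbers below are made up for illustration:

```python
import statistics

# Hypothetical per-interval price changes and signed volume
# (buys positive, sells negative); made-up numbers.
dp = [0.02, -0.01, 0.03, 0.00, -0.02]
q = [1500.0, -800.0, 2200.0, 100.0, -1600.0]

# OLS slope of dp on q: lambda = Cov(dp, q) / Var(q).
mean_dp, mean_q = statistics.fmean(dp), statistics.fmean(q)
cov = statistics.fmean([(a - mean_dp) * (b - mean_q) for a, b in zip(dp, q)])
kyle_lambda = cov / statistics.pvariance(q)
```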
A simple test for the random walk hypothesis of prices and market efficiency.
This note briefly explains what the minimum variance hedge ratio is and how to derive it in a cross hedge, where the asset to be hedged is not the same as the underlying asset.
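The resulting ratio is h* = rho * sigma_S / sigma_F, equivalently Cov(dS, dF) / Var(dF). A quick numeric check with made-up spot and futures price changes:

```python
import statistics

# Hypothetical changes in the spot price to be hedged and in the
# futures price used as the hedging instrument (made-up numbers).
spot = [0.021, -0.013, 0.008, 0.015, -0.004]
futures = [0.019, -0.011, 0.009, 0.013, -0.005]

mean_s, mean_f = statistics.fmean(spot), statistics.fmean(futures)
cov = statistics.fmean(
    [(s - mean_s) * (f - mean_f) for s, f in zip(spot, futures)]
)
sigma_s, sigma_f = statistics.pstdev(spot), statistics.pstdev(futures)
rho = cov / (sigma_s * sigma_f)

# Minimum variance hedge ratio: both forms agree.
h_star = rho * sigma_s / sigma_f
```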
These are two versions of winsorization in SAS, of which I recommend the first one.
Suppose today the stock price is \(S\) and in one year's time, the stock price could be either \(S_1\) or \(S_2\). You hold a European call option on this stock with an exercise price of \(X=S\), where \(S_1<X<S_2\) for simplicity. So you'll exercise the call when the stock price turns out to be \(S_2\) and leave it unexercised if it is \(S_1\).
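With made-up numbers satisfying \(S_1 < X < S_2\), the state-contingent payoffs and the one-period binomial hedge ratio look like:

```python
# Illustrative numbers with S1 < X < S2, as in the setup above.
S, S1, S2 = 100.0, 80.0, 125.0
X = S

# Call payoff in each state: exercised only above the exercise price.
payoff_up = max(S2 - X, 0.0)
payoff_down = max(S1 - X, 0.0)

# Hedge ratio (delta) of the one-period binomial replication:
# the number of shares needed to replicate the option's payoff spread.
delta = (payoff_up - payoff_down) / (S2 - S1)
```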
A SAS macro used to extract BHC data.
Using the CRSP/Compustat Merged Database (CCM) to extract data is one of the fundamental steps in most finance studies. Here I document several SAS programs for annual, quarterly and monthly data, inspired by and adapted from several examples from WRDS.
This note is just to show that the different variants of Black-Scholes formula in textbook and tutorial solutions are in fact the same.
Herfindahl–Hirschman (HHI) Index is a well-known market concentration measure determined by two factors: the number of firms in the market and the dispersion of their sizes.

Intuitively, having a hundred similar-sized gas stations in town means a far less concentrated environment than just one or two available, and when the number of firms is constant, their size distribution or variance determines the magnitude of market concentration.

Since these two properties jointly determine the HHI measure of concentration, naturally we want a decomposition of HHI that can reflect these two dimensions respectively. This is particularly useful when two distinct markets have the same level of HHI, but the concentration may result from different sources. Note that these two markets do not necessarily have to be industry A versus industry B; they can be the same industry niche in two geographical areas, for example.

Thus, we can think of HHI as the sum of the actual market state's deviation from 1) all firms having the same size, and 2) a fully competitive environment with an infinite number of firms in the market. Some simple math can solve our problem.
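One standard way to formalize this is HHI = N * Var(s) + 1/N, where Var is the population variance of market shares; the note's exact decomposition may be stated differently. A quick numeric check with made-up shares:

```python
import statistics

# Market shares in a hypothetical market (must sum to one).
shares = [0.4, 0.3, 0.2, 0.1]
n = len(shares)

hhi = sum(s * s for s in shares)

# HHI = n * Var(s) + 1/n: the first term captures size dispersion,
# the second the (inverse) number of firms.
decomposed = n * statistics.pvariance(shares) + 1 / n
```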
Bloomberg is developing a new function in the Terminal, called BQuant (BQNT).
Computing the weekly returns from the CRSP daily stock data is a common task but may be tricky sometimes. Let's discuss a few different ways to get it done incorrectly and correctly.

TL;DR: Take me to the final solution!
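One correct approach is to compound the daily returns within each week; a pandas sketch with made-up daily data (CRSP dsf.ret in practice):

```python
import pandas as pd

# Hypothetical daily returns for one stock; made-up numbers.
daily = pd.DataFrame({
    "date": pd.to_datetime(
        ["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-08", "2024-01-09"]
    ),
    "ret": [0.01, -0.02, 0.005, 0.03, 0.01],
})

# Compound daily returns within each week ending on Friday:
# weekly = (1 + r_1)(1 + r_2)...(1 + r_k) - 1,
# so partial weeks are handled correctly.
weekly = (
    daily.set_index("date")["ret"]
    .add(1.0)
    .resample("W-FRI")
    .prod()
    .sub(1.0)
)
```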
Converting between numeric and character variables is one of the most frequently encountered issues when processing datasets. This article explains how to do this conversion correctly and efficiently.
This is a note for setting up a Docker, Nginx and Let's Encrypt environment on Ubuntu 20.04 LTS.
This Stata program creates the Fama-French industry classification from SIC code.
Many research papers on Chinese firms include a control variable that indicates if the firm is a state-owned enterprise (SOE). This is important as SOEs and non-SOEs differ in many aspects and may have structural differences. This post documents the way to construct this indicator variable from the CSMAR databases.
Mingze Gao, aka Adrian, is a Postdoctoral Research Fellow at the University of Sydney Business School. With a focus on banking and corporate finance, his work has been published in journals including Journal of Banking & Finance and Finance Research Letters, and presented at conferences such as WFA, EFA (scheduled), FMA, FIRN and AFBC.
Mingze has a strong background in programming and received First Prize in the 2010 National Olympiad in Informatics in Provinces (NOIP). His PhD thesis involves large-scale textual analysis and novel machine learning applications, leading to a $500,000 grant from the Australian Research Council (ARC) Discovery Project financing his postdoctoral fellowship. He also holds a Grad.Cert. in computing from UNSW with High Distinction, covering, e.g., databases, crypto and distributed ledger technology. Some of his open-source works include frds, specurve, mktstructure and edgar-analyzer.
CV, Google Scholar, Faculty Profile and SSRN Profile.
"},{"location":"#education","title":"Education","text":"Made in 1994. Married in 2021. Husband to Sherry. Father of four cats.
I came to Sydney in 2013 and have since been with the University of Sydney. Starting with a commerce degree majoring in econometrics and finance, I have enjoyed the study and life here very much and successfully completed my research degrees in finance afterwards. A fan of computer science too, I completed a degree in computing at the University of New South Wales during my postdoctoral research fellowship.
My PhD work is summarised by three papers. The first theoretically extends the principal-agent model and empirically shows that a firm's accumulated knowledge substitutes for costly executive performance incentives. The second involves textual analysis on millions of firm 8K filings and documents a positive effect of corporate real estate holdings on M&A performance. The third applies machine learning on high-dimensional bank loan data and proposes an effective early-warning predictor for bank risks, which forms the backbone of a successful $500,000 grant from the Australian Research Council (ARC) Discovery Project financing my postdoctoral fellowship.
I'm blessed to have had my wonderful supervisors over the past many years, Henry Leung, Buhui Qiu, Joakim Westerholm and Eliza Wu,1 and the invaluable mentoring from Iftekhar Hasan. I strive to produce more high-quality research outputs, both myself and together with my awesome coauthors. To the best I can, I also like to provide as much as possible to all researchers so that we can thrive together, which motivates me to write the research notes, apps and more. My favourite quote:
Work until you no longer have to introduce yourself.
Apart from work, I workout regularly. I cycle to/from work and train at home. I also do running training and aim to complete a marathon one day soon.
Alphabetically ordered by last name. ↩
👨‍💻 The apps, programs and other tools I developed.
"},{"location":"apps/#research","title":"Research","text":"frds
- a Python framework to compute a collection of academic measures used in the finance literature.specurve
- a Stata command used to perform Specification Curve Analysis and generate the Specification Curve plot - listed in Harvard Business School Research Computing Services blog.edgar-analyzer
- a Python command-line tool to download SEC filings and perform textual analyses.mktstructure
- a Python command-line tool to download Refinitiv Tick History data and compute some market microstructure measures.phd.io
- a website for PhDs to build their personal web pages.PaperManager
- a simple tag-based paper manager with a fast PDF viewer in pure Python.LeGao
- a web application used to make LEGO mosaics.specurve
.edgar-analyzer
.Organization Capital and Executive Performance Incentives, with Henry Leung and Buhui Qiu, Journal of Banking & Finance, 2021.
A firm's organization capital has a significant substitution effect on its executive pay-for-performance sensitivity. SSRN
Consumer Behaviour and Credit Supply: Evidence from an Australian FinTech Lender, with Henry Leung, Linhui Liu and Buhui Qiu, Finance Research Letters, 2023.
Consumer behaviour affects FinTech lending decisions. SSRN
Other

Closer than ever: Growing business-level connections between Australia and Europe, with Boris Choy, Teresa Davis, Hanyun Ding, Massimo Garbuio, Catherine Hardy, Henry Leung, Thanh Son Luong, Greg Patmore, Sandra Peter, Buhui Qiu, Kai Riemer, John Shields, Catherine Sutton-Brady, Carlos Vazquez-Hernandez, and Eliza Wu, European Management Journal, 2023.
"},{"location":"research/#working-papers","title":"\ud83d\udcdd Working Papers","text":""},{"location":"research/#in-circulation","title":"In circulation","text":"\"Lone (Loan) Wolf Pack Risk\", with Iftekhar Hasan, Buhui Qiu and Eliza Wu.
\"Anomalous Lending and Bank Risk\", with Iftekhar Hasan, Buhui Qiu, Eliza Wu and Yan Yu.
\"Borrower Technology Similarity and Bank Loan Contracting\", with Yunying Huang, Steven Ongena and Eliza Wu.
\"Corporate Real Estate Holdings and M&As\", with Thanh Son Luong and Buhui Qiu.
\"Catering to Environmental Premium in Green Venture Financing: Evidence from a Bert-Based Deep Learning Approach\", with Henry Leung, Tse-Chun Lin and Tracy Thi Vu.
"},{"location":"research/#in-progress","title":"In progress","text":"FINC50 is one half of your Finance 101. (1)
The objectives are
Note
This is a proof-of-concept and always a work-in-progress.
It could take a relatively long time for me to "complete".
"},{"location":"finc50/#course-notes","title":"Course notes","text":"An interactive chart and calculator of bond cashflows, present values and prices.
{ \"$schema\": \"https://vega.github.io/schema/vega/v5.json\", \"description\": \"A chart of bond's cashflows, present value and price, made by Mingze Gao\", \"width\": 700, \"height\": 300, \"title\": { \"text\": \"Cashflows, PV and Price of a $10,000 Bond\", \"fontSize\": 18, \"anchor\": \"middle\" }, \"data\": [ { \"name\": \"table\", \"transform\": [ { \"type\": \"sequence\", \"as\": \"year\", \"start\": 0, \"step\": 0.5, \"stop\": 31 }, { \"type\": \"formula\", \"as\": \"i\", \"expr\": \"(0.5*(couponFrequency=='semiannual')+(couponFrequency=='annual'))\" }, { \"type\": \"formula\", \"as\": \"i2\", \"expr\": \"(2*(couponFrequency=='semiannual')+(couponFrequency=='annual'))\" }, { \"type\": \"formula\", \"as\": \"cashflow\", \"expr\": \"10000*couponRate*(datum.i)*(datum.year>0)+10000*(datum.year==maturityInYears)\" }, { \"type\": \"formula\", \"as\": \"r\", \"expr\": \"couponFrequency=='annual'? discountRate : pow(1+discountRate,0.5)-1\" }, { \"type\": \"formula\", \"as\": \"pv\", \"expr\": \"datum.cashflow / pow(1+datum.r, datum.year)\" }, { \"type\": \"formula\", \"as\": \"price\", \"expr\": \"datum.r>0 ? (10000*couponRate*(datum.i)*(1-pow(1+datum.r,-maturityInYears*datum.i2))/datum.r+10000*pow(1+datum.r,-maturityInYears*datum.i2)) : 10000*(1+couponRate*(datum.i)*maturityInYears*datum.i2)\" }, { \"type\": \"filter\", \"expr\": \"datum.year<=maturityInYears\" }, { \"type\": \"filter\", \"expr\": \"couponFrequency=='annual'? 
(datum.year==round(datum.year)) : 1 \" } ] }, { \"name\": \"scaledata\", \"source\": \"table\", \"transform\": [ { \"type\": \"aggregate\", \"fields\": [\"cashflow\", \"price\"], \"ops\": [\"max\", \"max\"], \"as\": [\"maxCashflow\", \"mP\"] }, { \"type\": \"formula\", \"as\": \"maxV\", \"expr\": \"max(datum.maxCashflow, datum.mP*1.1)\" } ] } ], \"signals\": [ { \"name\": \"maturityInYears\", \"value\": 10, \"bind\": { \"input\": \"range\", \"min\": 1, \"max\": 30, \"step\": 1 } }, { \"name\": \"discountRate\", \"value\": 0.08, \"bind\": { \"input\": \"range\", \"min\": 0, \"max\": 0.2, \"step\": 0.0001 } }, { \"name\": \"couponRate\", \"value\": 0.05, \"bind\": { \"input\": \"range\", \"min\": 0, \"max\": 0.2, \"step\": 0.0001 } }, { \"name\": \"couponFrequency\", \"value\": \"annual\", \"bind\": { \"input\": \"radio\", \"options\": [\"annual\", \"semiannual\"] } } ], \"scales\": [ { \"name\": \"x\", \"type\": \"band\", \"domain\": { \"data\": \"table\", \"field\": \"year\", \"sort\": true }, \"range\": \"width\", \"padding\": 0.7 }, { \"name\": \"y\", \"type\": \"linear\", \"domain\": { \"data\": \"scaledata\", \"field\": \"maxV\" }, \"range\": \"height\" } ], \"axes\": [ { \"orient\": \"bottom\", \"scale\": \"x\", \"title\": \"Year\" }, { \"orient\": \"left\", \"scale\": \"y\", \"title\": \"Cash Flows, PV and Bond Price\" } ], \"marks\": [ { \"type\": \"rect\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"fill\": { \"value\": \"steelblue\" }, \"x\": { \"scale\": \"x\", \"field\": \"year\" }, \"width\": { \"scale\": \"x\", \"band\": 1 }, \"y\": { \"scale\": \"y\", \"field\": \"cashflow\" }, \"y2\": { \"scale\": \"y\", \"value\": 0 }, \"tooltip\": { \"signal\": \"{ 'Cashflow': format(datum.cashflow, '$,.2f') }\" } } } }, { \"type\": \"rect\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"fill\": { \"value\": \"#d6001c\" }, \"x\": { \"scale\": \"x\", \"field\": \"year\" }, \"width\": { \"scale\": \"x\", \"band\": 1 }, \"y\": { 
\"scale\": \"y\", \"field\": \"pv\" }, \"y2\": { \"scale\": \"y\", \"value\": 0 }, \"tooltip\": { \"signal\": \"{ 'PV': format(datum.pv, '$,.2f') }\" } } } }, { \"type\": \"rect\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"fill\": { \"value\": \"darkgray\" }, \"x\": { \"scale\": \"x\", \"value\": 0 }, \"width\": { \"scale\": \"x\", \"band\": 1 }, \"y\": { \"scale\": \"y\", \"field\": \"price\" }, \"y2\": { \"scale\": \"y\", \"value\": 0 }, \"tooltip\": { \"signal\": \"{ 'Bond Price': format(datum.price, '$,.2f') }\" } } } }, { \"type\": \"text\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", \"value\": 0 }, \"y\": { \"scale\": \"y\", \"field\": \"price\", \"offset\": -5 }, \"text\": { \"signal\": \"format(datum.price, '$,.2f')\" }, \"fontSize\": { \"value\": 12 }, \"align\": { \"value\": \"left\" }, \"baseline\": { \"value\": \"bottom\" }, \"fill\": { \"value\": \"black\" } } } }, { \"type\": \"text\", \"encode\": { \"enter\": { \"align\": { \"value\": \"right\" }, \"baseline\": { \"value\": \"bottom\" }, \"fill\": { \"value\": \"rgba(0, 0, 0, 0.2)\" }, \"fontSize\": { \"value\": 14 }, \"x\": { \"value\": 0, \"offset\": \"width*0.85\" }, \"y\": { \"value\": 0, \"offset\": \"height*1.2\" }, \"text\": { \"value\": \"Assume coupons paid in arrears and effective annual discount rate (conversion based on coupon frequency).\" } } } } ] }
"},{"location":"finc50/#bond-price-and-yield","title":"Bond price and yield","text":"An interactive chart of bond price and yield.
{ \"$schema\": \"https://vega.github.io/schema/vega/v5.json\", \"description\": \"A chart of bond's price and yield, made by Mingze Gao\", \"width\": 700, \"height\": 300, \"title\": { \"text\": \"Bond Price and Yield\", \"fontSize\": 18, \"anchor\": \"middle\" }, \"data\": [ { \"name\": \"table\", \"transform\": [ { \"type\": \"sequence\", \"as\": \"yield\", \"start\": 0.0, \"step\": 0.5, \"stop\": 20.5 }, { \"type\": \"formula\", \"as\": \"price\", \"expr\": \"datum.yield>0 ? (10000*couponRate*(1-pow(1+datum.yield/100,-maturityInYears))/(datum.yield/100)+10000*pow(1+datum.yield/100,-maturityInYears)) : 10000*(1+couponRate*maturityInYears)\" }, { \"type\": \"formula\", \"as\": \"price5\", \"expr\": \"datum.yield>0 ? (10000*0.05*(1-pow(1+datum.yield/100,-maturityInYears))/(datum.yield/100)+10000*pow(1+datum.yield/100,-maturityInYears)) : 10000*(1+0.05*maturityInYears)\" } ] }, { \"name\": \"scaledata\", \"source\": \"table\", \"transform\": [ { \"type\": \"formula\", \"as\": \"maxV\", \"expr\": \"max(datum.price, datum.price5*1.2)\" } ] } ], \"signals\": [ { \"name\": \"maturityInYears\", \"value\": 10, \"bind\": { \"input\": \"range\", \"min\": 1, \"max\": 30, \"step\": 1 } }, { \"name\": \"couponRate\", \"value\": 0.05, \"bind\": { \"input\": \"range\", \"min\": 0, \"max\": 0.1, \"step\": 0.0001 } } ], \"scales\": [ { \"name\": \"x\", \"type\": \"linear\", \"domain\": { \"data\": \"table\", \"field\": \"yield\", \"sort\": true }, \"range\": \"width\" }, { \"name\": \"y\", \"type\": \"linear\", \"domain\": { \"data\": \"scaledata\", \"field\": \"maxV\" }, \"range\": \"height\" } ], \"axes\": [ { \"orient\": \"bottom\", \"scale\": \"x\", \"title\": \"Yield (%)\", \"ticks\": false }, { \"orient\": \"left\", \"scale\": \"y\", \"title\": \"Bond Price\" } ], \"marks\": [ { \"type\": \"rule\", \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", \"value\": 0 }, \"y\": { \"scale\": \"y\", \"value\": 10000 }, \"x2\": { \"scale\": \"x\", \"value\": 5 }, \"y2\": { 
\"scale\": \"y\", \"value\": 10000 }, \"strokeWidth\": { \"value\": 1 }, \"strokeDash\": { \"value\": [8, 3] }, \"strokeCap\": { \"value\": \"round\" }, \"opacity\": { \"value\": 1 } } } }, { \"type\": \"rule\", \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", \"value\": 5 }, \"y\": { \"scale\": \"y\", \"value\": 0 }, \"x2\": { \"scale\": \"x\", \"value\": 5 }, \"y2\": { \"scale\": \"y\", \"value\": 10000 }, \"strokeWidth\": { \"value\": 1 }, \"strokeDash\": { \"value\": [8, 3] }, \"strokeCap\": { \"value\": \"round\" }, \"opacity\": { \"value\": 1 } } } }, { \"type\": \"line\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", \"field\": \"yield\" }, \"width\": { \"scale\": \"x\", \"band\": 1 }, \"y\": { \"scale\": \"y\", \"field\": \"price\" }, \"tooltip\": { \"signal\": \"{ 'Bond Price': format(datum.price, '$,.2f') }\" } } } }, { \"type\": \"line\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", \"field\": \"yield\" }, \"y\": { \"scale\": \"y\", \"field\": \"price5\" }, \"stroke\": { \"value\": \"#d6001c\" }, \"tooltip\": { \"signal\": \"{ 'Bond Price': format(datum.price5, '$,.2f') }\" } } } }, { \"type\": \"text\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", \"value\": 20 }, \"y\": { \"scale\": \"y\", \"field\": \"price\", \"offset\": -5 }, \"text\": { \"signal\": \"format(datum.price, '$,.0f')+'@'+format(datum.yield,'.1f')+'%'\" }, \"fontSize\": { \"value\": 12 }, \"align\": { \"value\": \"left\" }, \"baseline\": { \"value\": \"bottom\" }, \"fill\": { \"value\": \"black\" } } } }, { \"type\": \"text\", \"encode\": { \"enter\": { \"align\": { \"value\": \"right\" }, \"baseline\": { \"value\": \"bottom\" }, \"fill\": { \"value\": \"rgba(0, 0, 0, 0.2)\" }, \"fontSize\": { \"value\": 14 }, \"x\": { \"value\": 0, \"offset\": \"width\" }, \"y\": { \"value\": 0, \"offset\": \"height*1.2\" }, \"text\": { \"value\": \"Assume $10,000 
bond, annual coupons paid in arrears and effective annual discount rate.\" } } } } ] }
"},{"location":"finc50/#risk-and-return","title":"Risk and return","text":"A graph showing volatility and return of S&P500 constituents in 2022.(1) Try to pan, zoom, select and click.
import yfinance as yf\nimport pandas as pd\nimport numpy as np\nlink = \"https://en.wikipedia.org/wiki/List_of_S%26P_500_companies#S&P_500_component_stocks\"\ndf = pd.read_html(link, header=0)[0]\ndf = yf.download(tickers=df['Symbol'].tolist(), start=\"2022-01-01\", end=\"2022-12-31\", progress=False, rounding=True)\ndf = df[['Adj Close']]\ndf.columns = df.columns.droplevel(0)\nret = ((df.pct_change()+1).cumprod()-1).iloc[-1]\nstd = df.pct_change().std() * np.sqrt(252)\ndf = pd.DataFrame({'return': ret.values, \"std\": std.values, \"ticker\": ret.index}).round(3).dropna()\ndf.to_json(\"./spy_risk_return.json\", orient=\"records\") \n
{ \"$schema\": \"https://vega.github.io/schema/vega/v5.json\", \"title\": { \"text\": \"Return and Volatility of S&P500 Stocks in 2022\", \"fontSize\": 18, \"anchor\": \"middle\" }, \"description\": \"An interactive scatter plot example supporting pan and zoom.\", \"width\": 700, \"height\": 300, \"padding\": { \"top\": 30, \"left\": 40, \"bottom\": 20, \"right\": 10 }, \"autosize\": \"none\", \"config\": { \"axis\": { \"domain\": false, \"tickSize\": 1, \"tickColor\": \"#888\", \"labelFont\": \"Monaco, Courier New\" } }, \"signals\": [ { \"name\": \"margin\", \"value\": 20 }, { \"name\": \"hover\", \"on\": [ { \"events\": \"*:mouseover\", \"encode\": \"hover\" }, { \"events\": \"*:mouseout\", \"encode\": \"leave\" }, { \"events\": \"*:mousedown\", \"encode\": \"select\" }, { \"events\": \"*:mouseup\", \"encode\": \"release\" } ] }, { \"name\": \"xoffset\", \"update\": \"-(height + padding.bottom)\" }, { \"name\": \"yoffset\", \"update\": \"-(width + padding.left)\" }, { \"name\": \"xrange\", \"update\": \"[0, width]\" }, { \"name\": \"yrange\", \"update\": \"[height, 0]\" }, { \"name\": \"down\", \"value\": null, \"on\": [ { \"events\": \"touchend\", \"update\": \"null\" }, { \"events\": \"mousedown, touchstart\", \"update\": \"xy()\" } ] }, { \"name\": \"xcur\", \"value\": null, \"on\": [ { \"events\": \"mousedown, touchstart, touchend\", \"update\": \"slice(xdom)\" } ] }, { \"name\": \"ycur\", \"value\": null, \"on\": [ { \"events\": \"mousedown, touchstart, touchend\", \"update\": \"slice(ydom)\" } ] }, { \"name\": \"delta\", \"value\": [0, 0], \"on\": [ { \"events\": [ { \"source\": \"window\", \"type\": \"mousemove\", \"consume\": true, \"between\": [ { \"type\": \"mousedown\" }, { \"source\": \"window\", \"type\": \"mouseup\" } ] }, { \"type\": \"touchmove\", \"consume\": true, \"filter\": \"event.touches.length === 1\" } ], \"update\": \"down ? 
[down[0]-x(), y()-down[1]] : [0,0]\" } ] }, { \"name\": \"anchor\", \"value\": [0, 0], \"on\": [ { \"events\": \"wheel\", \"update\": \"[invert('xscale', x()), invert('yscale', y())]\" }, { \"events\": { \"type\": \"touchstart\", \"filter\": \"event.touches.length===2\" }, \"update\": \"[(xdom[0] + xdom[1]) / 2, (ydom[0] + ydom[1]) / 2]\" } ] }, { \"name\": \"zoom\", \"value\": 1, \"on\": [ { \"events\": \"wheel!\", \"force\": true, \"update\": \"pow(1.001, event.deltaY * pow(16, event.deltaMode))\" }, { \"events\": { \"signal\": \"dist2\" }, \"force\": true, \"update\": \"dist1 / dist2\" } ] }, { \"name\": \"dist1\", \"value\": 0, \"on\": [ { \"events\": { \"type\": \"touchstart\", \"filter\": \"event.touches.length===2\" }, \"update\": \"pinchDistance(event)\" }, { \"events\": { \"signal\": \"dist2\" }, \"update\": \"dist2\" } ] }, { \"name\": \"dist2\", \"value\": 0, \"on\": [ { \"events\": { \"type\": \"touchmove\", \"consume\": true, \"filter\": \"event.touches.length===2\" }, \"update\": \"pinchDistance(event)\" } ] }, { \"name\": \"xdom\", \"update\": \"slice(xext)\", \"on\": [ { \"events\": { \"signal\": \"delta\" }, \"update\": \"[xcur[0] + span(xcur) * delta[0] / width, xcur[1] + span(xcur) * delta[0] / width]\" }, { \"events\": { \"signal\": \"zoom\" }, \"update\": \"[anchor[0] + (xdom[0] - anchor[0]) * zoom, anchor[0] + (xdom[1] - anchor[0]) * zoom]\" } ] }, { \"name\": \"ydom\", \"update\": \"slice(yext)\", \"on\": [ { \"events\": { \"signal\": \"delta\" }, \"update\": \"[ycur[0] + span(ycur) * delta[1] / height, ycur[1] + span(ycur) * delta[1] / height]\" }, { \"events\": { \"signal\": \"zoom\" }, \"update\": \"[anchor[1] + (ydom[0] - anchor[1]) * zoom, anchor[1] + (ydom[1] - anchor[1]) * zoom]\" } ] }, { \"name\": \"size\", \"update\": \"clamp(20 / span(xdom), 1, 1000)\" } ], \"data\": [ { \"name\": \"points\", \"url\": \"./demo/spy_risk_return.json\", \"transform\": [ { \"type\": \"extent\", \"field\": \"std\", \"signal\": \"xext\" }, { \"type\": 
\"extent\", \"field\": \"return\", \"signal\": \"yext\" }, { \"type\": \"formula\", \"as\": \"url\", \"expr\": \"'https://www.google.com/search?q=ticker:'+datum.ticker\", \"initonly\": true }, { \"type\": \"formula\", \"as\": \"tip\", \"expr\": \"'Ticker:'+datum.ticker\", \"initonly\": true } ] } ], \"scales\": [ { \"name\": \"xscale\", \"zero\": false, \"domain\": { \"signal\": \"xdom\" }, \"range\": { \"signal\": \"xrange\" } }, { \"name\": \"yscale\", \"zero\": false, \"domain\": { \"signal\": \"ydom\" }, \"range\": { \"signal\": \"yrange\" } } ], \"axes\": [ { \"scale\": \"xscale\", \"orient\": \"top\", \"offset\": { \"signal\": \"xoffset\" }, \"title\": \"Volatility\", \"titlePadding\": 15 }, { \"scale\": \"yscale\", \"orient\": \"right\", \"offset\": { \"signal\": \"yoffset\" }, \"title\": \"Return\", \"titleAngle\": -90, \"titlePadding\": 20 } ], \"marks\": [ { \"type\": \"symbol\", \"from\": { \"data\": \"points\" }, \"clip\": true, \"encode\": { \"enter\": { \"fillOpacity\": { \"value\": 0.6 }, \"fill\": { \"value\": \"#a6192e\" } }, \"update\": { \"x\": { \"scale\": \"xscale\", \"field\": \"std\" }, \"y\": { \"scale\": \"yscale\", \"field\": \"return\" }, \"size\": { \"signal\": \"size\" } }, \"hover\": { \"fill\": { \"value\": \"firebrick\" }, \"tooltip\": { \"field\": \"tip\", \"type\": \"nominal\" }, \"size\": { \"signal\": \"size\", \"mult\": 5 } }, \"leave\": { \"fill\": { \"value\": \"#a6192e\" } }, \"select\": { \"size\": { \"signal\": \"size\", \"mult\": 5 }, \"href\": { \"field\": \"url\", \"type\": \"nominal\" } }, \"release\": { \"size\": { \"signal\": \"size\" } } } } ] }
"},{"location":"finc50/fixed-income/","title":"Fixed Income Securities","text":"In the last post we introduce features of a bond. Now, let's look at how to price a plain vanilla bond and examine the relation between bond price and yield.
"},{"location":"finc50/fixed-income/bond-prices-and-yields/#price-of-a-bond","title":"Price of a bond","text":"First, what's the fair price of a bond?
For the investor, a bond represents a series of cashflows to receive in the future. So its price is naturally the total present value of all cashflows from the bond, including coupon payments (if any) and the repayment of principal (bond face value).
Therefore, the following equation holds universally at all times \\(t\\):
\\[ \\text{Bond Price}_{t} = \\text{PV}_t(\\text{future coupons}) + \\text{PV}_t(\\text{face value}) \\]A bond's price at time \\(t\\) is the present value as at time \\(t\\) of all coupons to be received in the future, plus the present value as at time \\(t\\) of the bond face value. Personally, I'd call this fundamental.
Before we move on
Remember that all the complications we will see later are just the results of uncertainties about the PVs, which ultimately are determined by cashflows and discount rates. Whenever you're lost, pause and think about how these are affected, and then reason about how the bond price may be affected.
I like examples. Suppose we are to issue a 10-year bond with a $10,000 face value that pays a 5% annual coupon in arrears(1), with an 8% discount rate(2). What would be the price today, \\(t=0\\), at which we can sell the bond to investors?
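Before looking at the chart, we can compute the price directly by summing the present values of all cashflows. A minimal Python sketch under the example's assumptions ($10,000 face, 5% annual coupon in arrears, 8% effective annual discount rate, 10 years):

```python
# Price the worked example from first principles: sum the PVs of all cashflows.
face, coupon_rate, r, n = 10_000, 0.05, 0.08, 10
coupon = face * coupon_rate  # $500 paid at the end of each year

# PV of each coupon, plus PV of the face value repaid at maturity
price = sum(coupon / (1 + r) ** t for t in range(1, n + 1)) + face / (1 + r) ** n
print(round(price, 2))  # 7986.98
```

This matches the gray "price" bar in the chart at the default parameter values.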
The chart below shows the result. Specifically, the blue bars indicate the bond's cashflows, and the overlaid red bars indicate their present values as at time \\(t=0\\). The gray bar at \\(t=0\\) is the sum of all the red bars, i.e., the sum of the present values, which is the price of the bond today.
{ \"$schema\": \"https://vega.github.io/schema/vega/v5.json\", \"description\": \"A chart of bond's cashflows, present value and price, made by Mingze Gao\", \"width\": 700, \"height\": 300, \"title\": { \"text\": \"Cashflows, PV and Price of a $10,000 Bond\", \"fontSize\": 18, \"anchor\": \"middle\" }, \"data\": [ { \"name\": \"table\", \"transform\": [ { \"type\": \"sequence\", \"as\": \"year\", \"start\": 0, \"step\": 0.5, \"stop\": 31 }, { \"type\": \"formula\", \"as\": \"i\", \"expr\": \"(0.5*(couponFrequency=='semiannual')+(couponFrequency=='annual'))\" }, { \"type\": \"formula\", \"as\": \"i2\", \"expr\": \"(2*(couponFrequency=='semiannual')+(couponFrequency=='annual'))\" }, { \"type\": \"formula\", \"as\": \"cashflow\", \"expr\": \"10000*couponRate*(datum.i)*(datum.year>0)+10000*(datum.year==maturityInYears)\" }, { \"type\": \"formula\", \"as\": \"r\", \"expr\": \"couponFrequency=='annual'? discountRate : pow(1+discountRate,0.5)-1\" }, { \"type\": \"formula\", \"as\": \"pv\", \"expr\": \"datum.cashflow / pow(1+datum.r, datum.year)\" }, { \"type\": \"formula\", \"as\": \"price\", \"expr\": \"datum.r>0 ? (10000*couponRate*(datum.i)*(1-pow(1+datum.r,-maturityInYears*datum.i2))/datum.r+10000*pow(1+datum.r,-maturityInYears*datum.i2)) : 10000*(1+couponRate*(datum.i)*maturityInYears*datum.i2)\" }, { \"type\": \"filter\", \"expr\": \"datum.year<=maturityInYears\" }, { \"type\": \"filter\", \"expr\": \"couponFrequency=='annual'? 
(datum.year==round(datum.year)) : 1 \" } ] }, { \"name\": \"scaledata\", \"source\": \"table\", \"transform\": [ { \"type\": \"aggregate\", \"fields\": [\"cashflow\", \"price\"], \"ops\": [\"max\", \"max\"], \"as\": [\"maxCashflow\", \"mP\"] }, { \"type\": \"formula\", \"as\": \"maxV\", \"expr\": \"max(datum.maxCashflow, datum.mP*1.1)\" } ] } ], \"signals\": [ { \"name\": \"maturityInYears\", \"value\": 10, \"bind\": { \"input\": \"range\", \"min\": 1, \"max\": 30, \"step\": 1 } }, { \"name\": \"discountRate\", \"value\": 0.08, \"bind\": { \"input\": \"range\", \"min\": 0, \"max\": 0.2, \"step\": 0.0001 } }, { \"name\": \"couponRate\", \"value\": 0.05, \"bind\": { \"input\": \"range\", \"min\": 0, \"max\": 0.2, \"step\": 0.0001 } }, { \"name\": \"couponFrequency\", \"value\": \"annual\", \"bind\": { \"input\": \"radio\", \"options\": [\"annual\", \"semiannual\"] } } ], \"scales\": [ { \"name\": \"x\", \"type\": \"band\", \"domain\": { \"data\": \"table\", \"field\": \"year\", \"sort\": true }, \"range\": \"width\", \"padding\": 0.7 }, { \"name\": \"y\", \"type\": \"linear\", \"domain\": { \"data\": \"scaledata\", \"field\": \"maxV\" }, \"range\": \"height\" } ], \"axes\": [ { \"orient\": \"bottom\", \"scale\": \"x\", \"title\": \"Year\" }, { \"orient\": \"left\", \"scale\": \"y\", \"title\": \"Cash Flows, PV and Bond Price\" } ], \"marks\": [ { \"type\": \"rect\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"fill\": { \"value\": \"steelblue\" }, \"x\": { \"scale\": \"x\", \"field\": \"year\" }, \"width\": { \"scale\": \"x\", \"band\": 1 }, \"y\": { \"scale\": \"y\", \"field\": \"cashflow\" }, \"y2\": { \"scale\": \"y\", \"value\": 0 }, \"tooltip\": { \"signal\": \"{ 'Cashflow': format(datum.cashflow, '$,.2f') }\" } } } }, { \"type\": \"rect\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"fill\": { \"value\": \"#d6001c\" }, \"x\": { \"scale\": \"x\", \"field\": \"year\" }, \"width\": { \"scale\": \"x\", \"band\": 1 }, \"y\": { 
\"scale\": \"y\", \"field\": \"pv\" }, \"y2\": { \"scale\": \"y\", \"value\": 0 }, \"tooltip\": { \"signal\": \"{ 'PV': format(datum.pv, '$,.2f') }\" } } } }, { \"type\": \"rect\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"fill\": { \"value\": \"darkgray\" }, \"x\": { \"scale\": \"x\", \"value\": 0 }, \"width\": { \"scale\": \"x\", \"band\": 1 }, \"y\": { \"scale\": \"y\", \"field\": \"price\" }, \"y2\": { \"scale\": \"y\", \"value\": 0 }, \"tooltip\": { \"signal\": \"{ 'Bond Price': format(datum.price, '$,.2f') }\" } } } }, { \"type\": \"text\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", \"value\": 0 }, \"y\": { \"scale\": \"y\", \"field\": \"price\", \"offset\": -5 }, \"text\": { \"signal\": \"format(datum.price, '$,.2f')\" }, \"fontSize\": { \"value\": 12 }, \"align\": { \"value\": \"left\" }, \"baseline\": { \"value\": \"bottom\" }, \"fill\": { \"value\": \"black\" } } } }, { \"type\": \"text\", \"encode\": { \"enter\": { \"align\": { \"value\": \"right\" }, \"baseline\": { \"value\": \"bottom\" }, \"fill\": { \"value\": \"rgba(0, 0, 0, 0.2)\" }, \"fontSize\": { \"value\": 14 }, \"x\": { \"value\": 0, \"offset\": \"width*0.85\" }, \"y\": { \"value\": 0, \"offset\": \"height*1.2\" }, \"text\": { \"value\": \"Assume coupons paid in arrears and effective annual discount rate (conversion based on coupon frequency).\" } } } } ] }
Try it out
Change the parameters, see what happens to the bond price!
Now, let's have fun with the interactive chart above.
The initial price \\(P_{t=0}\\) of a plain vanilla \\(N\\)-year bond with face value \\(F\\), annual coupon \\(C\\), at a constant discount rate \\(r\\)(1), is given by
\\[ P_{t=0} = \\sum_{\\tau=1}^{N} \\frac{C}{(1+r)^{\\tau}} + \\frac{F}{(1+r)^N} \\]
Note that this is only the initial price when the bond is issued at time \\(t=0\\).
"},{"location":"finc50/fixed-income/bond-prices-and-yields/#price-over-time","title":"Price over time","text":"Question
So, other things equal, how does the bond price change over time as we approach the maturity date?
We need a better formula that lets \\(t\\) take values other than 0. Recall the rationale that the price is nothing but the sum of the PVs of all future payments.
"},{"location":"finc50/fixed-income/bond-prices-and-yields/#a-slightly-improved-formula","title":"A slightly improved formula","text":"At time \\(t\\), which is exactly \\(n\\) years till maturity, the price, \\(P_{t}\\), of a plain vanilla bond with face value \\(F\\), annual coupon \\(C\\), at a constant discount rate \\(r\\), is given by
\\[ P_{t} = \\underbrace{\\sum_{\\tau=1}^{n} \\frac{C}{(1+r)^{\\tau}}}_{\\text{sum of coupons' PVs}} + \\underbrace{\\frac{F}{(1+r)^n}}_{\\text{face value's PV}} \\]Going from only \\(P_{t=0}\\) to the whole path \\(\\{P_{t}\\}\\) is a major improvement!(1)
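A quick Python sketch of this formula (using the same illustrative $10,000, 5%-coupon, 8%-rate bond) shows the price at several remaining maturities \\(n\\), and hints at how the price drifts toward the face value:

```python
def price_with_n_years_left(face, coupon, r, n):
    """Plain vanilla bond price with n full years to maturity (annual coupons)."""
    if r == 0:
        return face + coupon * n
    annuity = (1 - (1 + r) ** -n) / r          # PV factor for the coupon stream
    return coupon * annuity + face * (1 + r) ** -n

# A discount bond (8% rate > 5% coupon): price rises toward face as maturity nears.
path = [round(price_with_n_years_left(10_000, 500, 0.08, n), 2) for n in (10, 5, 1, 0)]
print(path)  # [7986.98, 8802.19, 9722.22, 10000.0]
```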
Let me show you another graph. Note that in this graph, each bar represents the bond price as at a point in time.(1)
{ \"$schema\": \"https://vega.github.io/schema/vega/v5.json\", \"description\": \"A chart of bond's price over time till maturity, made by Mingze Gao\", \"width\": 700, \"height\": 300, \"title\": { \"text\": \"Bond Price Over Time From Issue To Maturity\", \"fontSize\": 18, \"anchor\": \"middle\" }, \"data\": [ { \"name\": \"table\", \"transform\": [ { \"type\": \"sequence\", \"as\": \"year\", \"start\": 0, \"step\": 0.5, \"stop\": 31 }, { \"type\": \"formula\", \"as\": \"i\", \"expr\": \"(0.5*(couponFrequency=='semiannual')+(couponFrequency=='annual'))\" }, { \"type\": \"formula\", \"as\": \"i2\", \"expr\": \"(2*(couponFrequency=='semiannual')+(couponFrequency=='annual'))\" }, { \"type\": \"formula\", \"as\": \"cashflow\", \"expr\": \"10000*couponRate*(datum.i)*(datum.year>0)+10000*(datum.year==maturityInYears)\" }, { \"type\": \"formula\", \"as\": \"r\", \"expr\": \"couponFrequency=='annual'? discountRate : pow(1+discountRate,0.5)-1\" }, { \"type\": \"formula\", \"as\": \"pv\", \"expr\": \"datum.cashflow / pow(1+datum.r, datum.year)\" }, { \"type\": \"formula\", \"as\": \"price\", \"expr\": \"datum.r>0 ? (10000*couponRate*(datum.i)*(1-pow(1+datum.r,-(maturityInYears-datum.year)*datum.i2))/datum.r+10000*pow(1+datum.r,-(maturityInYears-datum.year)*datum.i2)) : 10000*(1+couponRate*(datum.i)*(maturityInYears-datum.year)*datum.i2)\" }, { \"type\": \"filter\", \"expr\": \"datum.year<=maturityInYears\" }, { \"type\": \"filter\", \"expr\": \"couponFrequency=='annual'? 
(datum.year==round(datum.year)) : 1 \" } ] } ], \"signals\": [ { \"name\": \"maturityInYears\", \"value\": 30, \"bind\": { \"input\": \"range\", \"min\": 1, \"max\": 30, \"step\": 1 } }, { \"name\": \"discountRate\", \"value\": 0.08, \"bind\": { \"input\": \"range\", \"min\": 0, \"max\": 0.2, \"step\": 0.0001 } }, { \"name\": \"couponRate\", \"value\": 0.05, \"bind\": { \"input\": \"range\", \"min\": 0, \"max\": 0.2, \"step\": 0.0001 } }, { \"name\": \"couponFrequency\", \"value\": \"annual\", \"bind\": { \"input\": \"radio\", \"options\": [\"annual\", \"semiannual\"] } } ], \"scales\": [ { \"name\": \"x\", \"type\": \"band\", \"domain\": { \"data\": \"table\", \"field\": \"year\", \"sort\": true }, \"range\": \"width\", \"padding\": 0.7 }, { \"name\": \"y\", \"type\": \"linear\", \"domain\": { \"data\": \"table\", \"field\": \"price\" }, \"range\": \"height\" } ], \"axes\": [ { \"orient\": \"bottom\", \"scale\": \"x\", \"title\": \"Year\" }, { \"orient\": \"left\", \"scale\": \"y\", \"title\": \"Bond Price\" } ], \"marks\": [ { \"type\": \"rule\", \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", \"value\": 0 }, \"y\": { \"scale\": \"y\", \"value\": 10000 }, \"x2\": { \"scale\": \"x\", \"signal\": \"maturityInYears\" }, \"y2\": { \"scale\": \"y\", \"value\": 10000 }, \"strokeWidth\": { \"value\": 2 }, \"strokeDash\": { \"value\": [8, 3] }, \"strokeCap\": { \"value\": \"round\" }, \"opacity\": { \"value\": 1 } } } }, { \"type\": \"rect\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"fill\": { \"value\": \"steelblue\" }, \"x\": { \"scale\": \"x\", \"field\": \"year\" }, \"width\": { \"scale\": \"x\", \"band\": 1 }, \"y\": { \"scale\": \"y\", \"field\": \"price\" }, \"y2\": { \"scale\": \"y\", \"value\": 0 }, \"tooltip\": { \"signal\": \"{ 'Bond Price': format(datum.price, '$,.2f') }\" } } } }, { \"type\": \"text\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", \"signal\": \"maturityInYears\" }, 
\"y\": { \"scale\": \"y\", \"value\": 10000, \"offset\": -5 }, \"text\": { \"value\": \"Bond Face Value\" }, \"fontSize\": { \"value\": 12 }, \"align\": { \"value\": \"left\" }, \"baseline\": { \"value\": \"bottom\" }, \"fill\": { \"value\": \"black\" } } } }, { \"type\": \"text\", \"encode\": { \"enter\": { \"align\": { \"value\": \"right\" }, \"baseline\": { \"value\": \"bottom\" }, \"fill\": { \"value\": \"rgba(0, 0, 0, 0.2)\" }, \"fontSize\": { \"value\": 14 }, \"x\": { \"value\": 0, \"offset\": \"width*0.85\" }, \"y\": { \"value\": 0, \"offset\": \"height*1.2\" }, \"text\": { \"value\": \"Assume coupons paid in arrears and effective annual discount rate (conversion based on coupon frequency).\" } } } } ] }
Try it out
Change the discount rate to be lower than coupon rate, what do you find?
It's not difficult to see that, as the bond approaches maturity, its price approaches the face value, regardless of whether the bond trades at a premium or a discount.(1)
Recall that earlier we said the longer the maturity, the lower the bond price. That is true because we were talking about the initial price at issue. For example, other things equal, the initial price of a 30-year bond is lower than that of a 10-year bond.
Here, instead, time is changing. There is a bond of a given maturity (e.g., 30 years), and we study how its price changes over time as we approach the 30-year mark.
"},{"location":"finc50/fixed-income/bond-prices-and-yields/#a-more-improved-formula","title":"A more improved formula","text":"But we can still do better!
Question
So far we have been assuming that the next coupon is exactly one period (e.g., a year) in the future, or in other words, that the last coupon has just been paid. What if this is not the case? What if the next annual coupon is due in 2 months, not 12 months, from now?
When the coupon payment date does not align with the time at which we compute the bond price, only a simple adjustment is required.
The basic idea is that, since the next coupon and all subsequent payments are closer than the computation assumes, we have over-discounted the bond value. So, we can correct it by \"growing\" the undervalued price for the time since the last coupon payment.
\\[ P_{t} = \\underbrace{\\left[\\sum_{\\tau=1}^{n} \\frac{C}{(1+r)^{\\tau}} + \\frac{F}{(1+r)^n}\\right]}_{\\text{bond price right after last coupon}} \\times (1+r)^{\\frac{\\text{days since last coupon}}{\\text{days between coupons}}} \\]As such, we can now derive a continuous path for the bond price from issue to maturity, other things equal. This is shown in the next chart as a blue line.(1)
Attention!
Now imagine you are to buy a bond immediately before it matures. What would be the price according to the formula and the chart above?
No matter how much coupon the bond pays, the price (indicated by the last bar) is the bond's face value, $10,000. After the purchase, however, you will immediately receive a total payment of the face value plus the last coupon, which is surely greater than $10,000.
Clearly, you need to pay the seller more than the price described by the formula.
In fact, a bondholder starts to accumulate accrued interest the moment they own the bond. They may sell the bond right before a coupon payment, but given that they have held it for almost the entire period since the previous coupon, they should be compensated for not receiving the next coupon, which will be paid to the buyer.
Further, we generalize this idea to bond transactions at any time between coupon payments -- the buyer should additionally compensate the seller with a fraction of the coupon proportional to the time the seller has held the bond since the last coupon payment, relative to the time between two coupon payments.
{ \"$schema\": \"https://vega.github.io/schema/vega/v5.json\", \"description\": \"A chart of bond's dirty and clean prices over time till maturity, made by Mingze Gao\", \"width\": 700, \"height\": 300, \"title\": { \"text\": \"Bond Price, Dirty & Clean, Over Time From Issue To Maturity\", \"fontSize\": 18, \"anchor\": \"middle\" }, \"data\": [ { \"name\": \"table\", \"transform\": [ { \"type\": \"sequence\", \"as\": \"year\", \"start\": 0, \"step\": 0.01, \"stop\": 31 }, { \"type\": \"formula\", \"as\": \"price\", \"expr\": \"discountRate>0 ? (10000*couponRate*(1-pow(1+discountRate,-(maturityInYears-datum.year)))/discountRate+10000*pow(1+discountRate,-(maturityInYears-datum.year))) : 10000*(1+couponRate*(maturityInYears-datum.year))\" }, { \"type\": \"formula\", \"as\": \"dirtyprice\", \"expr\": \"datum.price+10000*couponRate*(datum.year-floor(datum.year))\" }, { \"type\": \"filter\", \"expr\": \"datum.year<=maturityInYears\" } ] } ], \"signals\": [ { \"name\": \"maturityInYears\", \"value\": 30, \"bind\": { \"input\": \"range\", \"min\": 1, \"max\": 30, \"step\": 1 } }, { \"name\": \"discountRate\", \"value\": 0.08, \"bind\": { \"input\": \"range\", \"min\": 0, \"max\": 0.2, \"step\": 0.0001 } }, { \"name\": \"couponRate\", \"value\": 0.05, \"bind\": { \"input\": \"range\", \"min\": 0, \"max\": 0.2, \"step\": 0.0001 } }, { \"name\": \"showDirtyPrice\", \"value\": \"true\", \"bind\": { \"input\": \"radio\", \"options\": [\"true\", \"false\"] } } ], \"scales\": [ { \"name\": \"x\", \"type\": \"linear\", \"domain\": { \"data\": \"table\", \"field\": \"year\", \"sort\": true }, \"range\": \"width\" }, { \"name\": \"y\", \"type\": \"linear\", \"domain\": { \"data\": \"table\", \"field\": \"price\" }, \"range\": \"height\" } ], \"axes\": [ { \"orient\": \"bottom\", \"scale\": \"x\", \"title\": \"Year\" }, { \"orient\": \"left\", \"scale\": \"y\", \"title\": \"Bond Price\" } ], \"marks\": [ { \"type\": \"rule\", \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", 
\"value\": 0 }, \"y\": { \"scale\": \"y\", \"value\": 10000 }, \"x2\": { \"scale\": \"x\", \"signal\": \"maturityInYears\" }, \"y2\": { \"scale\": \"y\", \"value\": 10000 }, \"strokeWidth\": { \"value\": 2 }, \"strokeDash\": { \"value\": [8, 3] }, \"strokeCap\": { \"value\": \"round\" }, \"opacity\": { \"value\": 1 } } } }, { \"type\": \"line\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", \"field\": \"year\" }, \"y\": { \"scale\": \"y\", \"field\": \"price\" } } } }, { \"type\": \"line\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", \"field\": \"year\" }, \"y\": { \"scale\": \"y\", \"field\": \"dirtyprice\" }, \"strokeWidth\": { \"signal\": \"showDirtyPrice=='true'? 1: 0\" }, \"strokeDash\": { \"value\": [2, 2] }, \"strokeCap\": { \"value\": \"round\" }, \"opacity\": { \"value\": 1 }, \"stroke\": { \"value\": \"#d6001c\" } } } }, { \"type\": \"text\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", \"signal\": \"maturityInYears\" }, \"y\": { \"scale\": \"y\", \"value\": 10000, \"offset\": -5 }, \"text\": { \"value\": \"Bond Face Value\" }, \"fontSize\": { \"value\": 12 }, \"align\": { \"value\": \"left\" }, \"baseline\": { \"value\": \"bottom\" }, \"fill\": { \"value\": \"black\" } } } }, { \"type\": \"text\", \"encode\": { \"enter\": { \"align\": { \"value\": \"right\" }, \"baseline\": { \"value\": \"bottom\" }, \"fill\": { \"value\": \"rgba(0, 0, 0, 0.2)\" }, \"fontSize\": { \"value\": 14 }, \"x\": { \"value\": 0, \"offset\": \"width*0.6\" }, \"y\": { \"value\": 0, \"offset\": \"height*1.2\" }, \"text\": { \"value\": \"Assume annual coupons paid in arrears and effective annual discount rate.\" } } } } ] }
The chart above shows the continuous path of the bond price described by the formula in blue, and the price including the additional compensation, i.e., the accrued interest. We name these two prices the \"clean price\" and the \"dirty price\", respectively.
And the accrued interest is given by
\\[ \\text{coupon} \\times \\frac{\\text{time since last coupon}}{\\text{time between coupons}} \\]which periodically increases and resets.
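These relations are easy to sketch in Python. The numbers below are illustrative, reusing the running $10,000, 5%-coupon, 8%-rate example, with the day count simplified to a fraction of the coupon period:

```python
# Dirty price between coupon dates: grow the post-coupon price for the elapsed
# fraction of the period; subtract accrued interest to get the clean price.
# Illustrative assumptions: $10,000 face, $500 annual coupon, 8% annual rate.
face, coupon, r = 10_000, 500, 0.08

def price_after_coupon(n):
    """Price right after a coupon payment, with n full years left to maturity."""
    return coupon * (1 - (1 + r) ** -n) / r + face * (1 + r) ** -n

frac = 0.25                        # fraction of the coupon period elapsed
dirty = price_after_coupon(10) * (1 + r) ** frac
accrued = coupon * frac            # coupon x (time since last coupon / period)
clean = dirty - accrued
```

The accrued interest grows linearly between coupon dates and resets to zero at each payment, which is exactly the sawtooth gap between the dirty and clean lines in the chart.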
Day count convention
So far, we have not spent a single word on a bond's \"yield\", but have instead been using \"discount rate\". What's the difference?
The answer is straightforward. A bond consists of a series of future cashflows, each of which may be discounted at a different rate. For example, the coupon due in 1 year may be discounted at a 5% rate, the coupon due in 2 years at 6%, and so on.
It turns out that, at any time \\(t\\), while a bond can have multiple future payments each discounted at different rates \\(\\{r_{\\tau}\\}\\), we can always find a single discount rate \\(\\color{red}y\\) which, when applied to all future payments, leads to the same bond price at the time:
\\[ P_{t} = \\underbrace{\\sum_{\\tau=1}^{n} \\frac{C}{(1+r_\\tau)^{\\tau}} + \\frac{F}{(1+r_n)^n}}_{\\text{each discounted at varying rates}} = \\underbrace{\\sum_{\\tau=1}^{n} \\frac{C}{(1+{\\color{red}y})^{\\tau}} + \\frac{F}{(1+{\\color{red}y})^n}}_{\\text{all discounted at the same rate}} \\]This single discount rate \\(\\color{red}y\\) is called the yield to maturity, or yield, of the bond at time \\(t\\).
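Going the other way -- from an observed price back to \\(y\\) -- has no closed form, so the yield is found numerically. A bisection sketch (the bond parameters are the running illustrative example, not a prescribed method):

```python
def yield_to_maturity(price, face, coupon, n, lo=0.0, hi=1.0):
    """Solve for the single rate y that reprices the bond, by bisection.
    PV is strictly decreasing in y, so we can bracket the root and halve."""
    def pv(y):
        return sum(coupon / (1 + y) ** t for t in range(1, n + 1)) + face / (1 + y) ** n
    for _ in range(100):
        mid = (lo + hi) / 2
        if pv(mid) > price:    # still too expensive -> need a higher rate
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# A $10,000, 5%-coupon, 10-year bond priced at 7,986.98 should yield about 8%.
y = yield_to_maturity(7_986.98, 10_000, 500, 10)
print(round(y, 4))  # 0.08
```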
If we plot bond prices against yields at a given time, it's easy to see that they have a one-to-one mapping and an inverse, non-linear relationship.
{ \"$schema\": \"https://vega.github.io/schema/vega/v5.json\", \"description\": \"A chart of bond's price and yield, made by Mingze Gao\", \"width\": 700, \"height\": 300, \"title\": { \"text\": \"Bond Price and Yield\", \"fontSize\": 18, \"anchor\": \"middle\" }, \"data\": [ { \"name\": \"table\", \"transform\": [ { \"type\": \"sequence\", \"as\": \"yield\", \"start\": 0.0, \"step\": 0.5, \"stop\": 20.5 }, { \"type\": \"formula\", \"as\": \"price\", \"expr\": \"datum.yield>0 ? (10000*couponRate*(1-pow(1+datum.yield/100,-maturityInYears))/(datum.yield/100)+10000*pow(1+datum.yield/100,-maturityInYears)) : 10000*(1+couponRate*maturityInYears)\" }, { \"type\": \"formula\", \"as\": \"price5\", \"expr\": \"datum.yield>0 ? (10000*0.05*(1-pow(1+datum.yield/100,-maturityInYears))/(datum.yield/100)+10000*pow(1+datum.yield/100,-maturityInYears)) : 10000*(1+0.05*maturityInYears)\" } ] }, { \"name\": \"scaledata\", \"source\": \"table\", \"transform\": [ { \"type\": \"formula\", \"as\": \"maxV\", \"expr\": \"max(datum.price, datum.price5*1.2)\" } ] } ], \"signals\": [ { \"name\": \"maturityInYears\", \"value\": 10, \"bind\": { \"input\": \"range\", \"min\": 1, \"max\": 30, \"step\": 1 } }, { \"name\": \"couponRate\", \"value\": 0.05, \"bind\": { \"input\": \"range\", \"min\": 0, \"max\": 0.1, \"step\": 0.0001 } } ], \"scales\": [ { \"name\": \"x\", \"type\": \"linear\", \"domain\": { \"data\": \"table\", \"field\": \"yield\", \"sort\": true }, \"range\": \"width\" }, { \"name\": \"y\", \"type\": \"linear\", \"domain\": { \"data\": \"scaledata\", \"field\": \"maxV\" }, \"range\": \"height\" } ], \"axes\": [ { \"orient\": \"bottom\", \"scale\": \"x\", \"title\": \"Yield (%)\", \"ticks\": false }, { \"orient\": \"left\", \"scale\": \"y\", \"title\": \"Bond Price\" } ], \"marks\": [ { \"type\": \"rule\", \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", \"value\": 0 }, \"y\": { \"scale\": \"y\", \"value\": 10000 }, \"x2\": { \"scale\": \"x\", \"value\": 5 }, \"y2\": { 
\"scale\": \"y\", \"value\": 10000 }, \"strokeWidth\": { \"value\": 1 }, \"strokeDash\": { \"value\": [8, 3] }, \"strokeCap\": { \"value\": \"round\" }, \"opacity\": { \"value\": 1 } } } }, { \"type\": \"rule\", \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", \"value\": 5 }, \"y\": { \"scale\": \"y\", \"value\": 0 }, \"x2\": { \"scale\": \"x\", \"value\": 5 }, \"y2\": { \"scale\": \"y\", \"value\": 10000 }, \"strokeWidth\": { \"value\": 1 }, \"strokeDash\": { \"value\": [8, 3] }, \"strokeCap\": { \"value\": \"round\" }, \"opacity\": { \"value\": 1 } } } }, { \"type\": \"line\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", \"field\": \"yield\" }, \"width\": { \"scale\": \"x\", \"band\": 1 }, \"y\": { \"scale\": \"y\", \"field\": \"price\" }, \"tooltip\": { \"signal\": \"{ 'Bond Price': format(datum.price, '$,.2f') }\" } } } }, { \"type\": \"line\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", \"field\": \"yield\" }, \"y\": { \"scale\": \"y\", \"field\": \"price5\" }, \"stroke\": { \"value\": \"#d6001c\" }, \"tooltip\": { \"signal\": \"{ 'Bond Price': format(datum.price5, '$,.2f') }\" } } } }, { \"type\": \"text\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", \"value\": 20 }, \"y\": { \"scale\": \"y\", \"field\": \"price\", \"offset\": -5 }, \"text\": { \"signal\": \"format(datum.price, '$,.0f')+'@'+format(datum.yield,'.1f')+'%'\" }, \"fontSize\": { \"value\": 12 }, \"align\": { \"value\": \"left\" }, \"baseline\": { \"value\": \"bottom\" }, \"fill\": { \"value\": \"black\" } } } }, { \"type\": \"text\", \"encode\": { \"enter\": { \"align\": { \"value\": \"right\" }, \"baseline\": { \"value\": \"bottom\" }, \"fill\": { \"value\": \"rgba(0, 0, 0, 0.2)\" }, \"fontSize\": { \"value\": 14 }, \"x\": { \"value\": 0, \"offset\": \"width\" }, \"y\": { \"value\": 0, \"offset\": \"height*1.2\" }, \"text\": { \"value\": \"Assume $10,000 
bond, annual coupons paid in arrears and effective annual discount rate.\" } } } } ] }
There are many interesting features of the bond price-yield relationship.
The one-to-one mapping between bond price and yield implies that we can always compute the other when given either of them. So, knowing either price or yield is sufficient when dealing with bonds.(1)
The inverse relationship suggests that the higher the yield, the lower the bond price.
The non-linearity suggests that the sensitivity of bond price to yield is not static. In fact, we can tell from the graph that the curve is convex. This convexity will be of great importance in later studies of bond risk.
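A quick numerical illustration of the convexity, again using the illustrative 10-year, 5%-coupon, $10,000 bond: a one-percentage-point fall in yield gains the holder more than an equal rise loses.

```python
def price_at_yield(y, face=10_000, coupon=500, n=10):
    """Plain vanilla bond price at a single yield y (annual coupons)."""
    return sum(coupon / (1 + y) ** t for t in range(1, n + 1)) + face / (1 + y) ** n

base = price_at_yield(0.08)
gain = price_at_yield(0.07) - base   # price gain if the yield falls by 1%
loss = base - price_at_yield(0.09)   # price loss if the yield rises by 1%
print(gain > loss)  # True: the price-yield curve is convex
```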
"},{"location":"finc50/fixed-income/bond-prices-and-yields/#other-yields","title":"Other yields","text":"When we talk about a bond's yield, we usually refer to the yield to maturity. But there can be some other yield measures:
Portfolio of bonds
The yield of a portfolio of bonds is NOT a weighted average of individual bonds' yields because the bonds are not homogeneous.
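A quick numerical check of this claim, using two hypothetical zero-coupon bonds (a 1-year and a 10-year, both with face value 100) and a simple bisection solver; all numbers are made up for illustration.

```python
# Check (with hypothetical bonds) that a portfolio's yield is generally
# NOT the value-weighted average of the individual bonds' yields.

def price(cashflows, y):
    """PV of cash flows at annual rate y; cashflows[t] is paid at year t+1."""
    return sum(cf / (1 + y) ** (t + 1) for t, cf in enumerate(cashflows))

def ytm(cashflows, market_price, lo=0.0, hi=1.0):
    """Solve price(cashflows, y) == market_price by bisection
    (price is decreasing in y)."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if price(cashflows, mid) > market_price:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Bond A: 1-year zero yielding 2%; Bond B: 10-year zero yielding 8%.
cf_a = [100] + [0] * 9
cf_b = [0] * 9 + [100]
p_a, p_b = price(cf_a, 0.02), price(cf_b, 0.08)

# Portfolio: hold one of each bond.
cf_port = [a + b for a, b in zip(cf_a, cf_b)]
y_port = ytm(cf_port, p_a + p_b)

weighted_avg = (p_a * 0.02 + p_b * 0.08) / (p_a + p_b)
print(round(y_port, 4), round(weighted_avg, 4))  # the two clearly differ
```

The portfolio yield is pulled toward the long bond's yield because far more of the portfolio's cash flow arrives in year 10, whereas the weighted average only reflects today's market values.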
"},{"location":"finc50/fixed-income/bond-prices-and-yields/#trouble-maker-floating-rate-bonds","title":"Trouble maker: floating-rate bonds","text":"We cannot easily compute the yield of a floater, simply because the future values of the reference rate are unknown. Instead, we can use spread measures that describe the yield in excess of the reference rate. The most popular measure of yield spread for a floating-rate bond is the discount margin.
As its name suggests, discount margin basically captures the \"discount rate in excess of reference rate\".
Suppose the bond's market price is \\(P_t\\) and the reference rate is assumed constant at \\(R\\); the discount margin \\(DM\\) is the value that solves the equation below:
\\[ P_{t} = \\sum_{\\tau=1}^{n} \\frac{C}{(1+R+{\\color{red}DM})^{\\tau}} + \\frac{F}{(1+R+{\\color{red}DM})^n} \\]Note that here the coupon payment \\(C\\) is determined by the reference rate \\(R\\) and the quoted margin.
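A minimal sketch of backing out the discount margin by bisection, under the same simplifying assumption that the reference rate R is constant. The face value, rates, and maturity below are hypothetical.

```python
# Solve for the discount margin (DM) of a floater by bisection, assuming
# a constant reference rate R. Coupon C = F * (R + quoted margin) per
# period; all numbers are hypothetical.

def floater_price(face, R, quoted_margin, dm, n_periods):
    """Price of the floater discounted at R + DM per period."""
    coupon = face * (R + quoted_margin)
    pv = sum(coupon / (1 + R + dm) ** t for t in range(1, n_periods + 1))
    return pv + face / (1 + R + dm) ** n_periods

def discount_margin(market_price, face, R, quoted_margin, n_periods,
                    lo=-0.05, hi=0.20):
    """Find DM such that the model price equals the market price."""
    for _ in range(200):
        mid = (lo + hi) / 2
        # Price is decreasing in DM: a price above market means DM is too low.
        if floater_price(face, R, quoted_margin, mid, n_periods) > market_price:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Consistency check: a floater trading at par should have DM equal to its
# quoted margin.
dm = discount_margin(market_price=100.0, face=100.0, R=0.04,
                     quoted_margin=0.01, n_periods=8)
print(round(dm, 6))  # close to 0.01
```

When the bond trades at par, discounting the coupons of R plus the quoted margin at a rate of R plus that same margin returns face value, so the solver recovers the quoted margin.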
"},{"location":"finc50/fixed-income/bond-yields-and-returns/","title":"Bond Yields and Returns","text":"Info
This post is still under construction.
"},{"location":"finc50/fixed-income/introduction/","title":"Introduction to Fixed Income Securities","text":""},{"location":"finc50/fixed-income/introduction/#what-are-fixed-income-securities","title":"What are fixed income securities?","text":"Fixed income securities are financial instruments that provide a fixed, or predictable, stream of income to investors. These securities typically take the form of bonds, but also include other investment types like certificates of deposit and preferred shares.
Fixed income securities are essentially loans made by an investor to an issuer. In exchange for the loan, the issuer agrees to pay the investor a specified rate of interest during the life of the bond and to repay the principal when it \"matures,\" or comes due.
Types of fixed income securities
By the type of issuer, there are
Government bonds; Corporate bonds; Municipal bonds. Risk-free rate
When we talk about \"risk-free rate\", we largely refer to the yield of such government bonds.
Governments issue bonds to borrow money. These bonds are considered among the safest investments, as they are backed by the taxing power of the government.
Companies also issue bonds to finance their operations or projects. Corporate bonds are considered higher risk than government bonds, but they also typically pay a higher rate of interest.
Municipal bonds are issued by cities, states, or other local entities for various public purposes. These bonds often have tax advantages, making them attractive to certain investors. (1)
In terms of underlying asset, there are
Asset-backed securities (ABS); Mortgage-backed securities (MBS). These are bonds backed by loan receivables other than real estate, such as credit card debt, auto loans, student loans, or even royalties from music. In the case of ABS, a pool of these non-mortgage assets is packaged and sold to investors as securities. The principal and interest payments made by the borrowers on these underlying loans are then passed through to the investors.
These are investment products backed by home and commercial mortgage loans. These loans are packaged into securities and sold to investors. Similar to ABS, the principal and interest payments made by the borrowers are passed through to the investors. However, MBS are directly tied to the mortgage industry and are susceptible to the performance of the housing market.
"},{"location":"finc50/fixed-income/introduction/#why-invest-in-fixed-income-securities","title":"Why invest in fixed income securities?","text":"Investors choose fixed income securities for several reasons:
Income: Fixed income securities provide regular interest payments, which can be an attractive source of income.
Preservation of capital: When the bond matures, the full principal amount is returned to the investor. This makes bonds appealing for those looking to preserve their capital.
Diversification: Including fixed income securities in a portfolio can help diversify investments and reduce risk.
Investors in fixed income securities come from a broad spectrum and include both individuals and institutions.
Individuals; Institutions. Individual investors, particularly those in or nearing retirement, often invest in fixed-income securities as a way to preserve capital and generate a steady stream of income.
Pension Funds: Pension funds invest heavily in fixed-income securities as they provide predictable returns which can be matched against their future payout obligations.
Insurance Companies: Like pension funds, insurance companies have long-term, predictable liabilities and thus invest significantly in fixed-income securities to match these liabilities.
Mutual Funds: There are many mutual funds, known as bond funds, that specialize in investing in fixed-income securities.
Banks and Financial Institutions: Banks and other financial institutions often invest in fixed-income securities as a way to generate a return on their excess capital and to help manage their interest rate risk.
Endowments and Foundations: These entities often include fixed-income securities in their portfolios for diversification and income generation.
Central Banks: Central banks often hold domestic and foreign fixed-income securities as a part of their reserves and as a tool for implementing monetary policy.
Each type of investor may have different investment objectives and constraints, and therefore might focus on different types of fixed-income securities (e.g., government bonds, corporate bonds, municipal bonds, etc.) based on their risk tolerance, income requirements, tax situation, and other factors.
Market size of fixed income securities. Source of figure: SIFMA
\"Although they usually attract less attention than equity markets, fixed-income markets are more than three times the size of global equity markets\", CFA Institute.
"},{"location":"finc50/fixed-income/introduction/#features-of-a-bond","title":"Features of a bond","text":""},{"location":"finc50/fixed-income/introduction/#the-basics","title":"The basics","text":"In its simplest form, a bond may be specified by the following characteristics:
Example
On March 17, 2021, Microsoft (1) \"issued a $6,250,000,000 (2) aggregate principal amount of its 2.921% (3) Notes (5) due 2052 (4) (the \u201c2052 Notes\u201d)\".
Source: the firm's SEC filing.
The bond's indenture (1) also specifies that
The 2052 Notes will bear interest (computed on the basis of a 360-day year consisting of twelve 30-day months) from March 17, 2021 at the rate of 2.921% per annum, payable semi-annually in arrears.
... (1)
A sequence of cash flows between the bond issuer and investor is described below
sequenceDiagram\n autonumber\n Issuer (bond seller)-->>Investor (bond buyer): Bond\n Investor (bond buyer)->>Issuer (bond seller): Price of bond\n note right of Investor (bond buyer): We will figure out the price later\n\n loop every 6 months until 2052\n Issuer (bond seller)->>Investor (bond buyer): Coupon (1/2 of 2.921% of face value)\n end\n\n Issuer (bond seller)->>Investor (bond buyer): Return principal ($6,250,000,000)\n Investor (bond buyer)-->>Issuer (bond seller): Redeem bond
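As a back-of-the-envelope check of the coupon leg above, the semi-annual payment on the full issue can be computed directly (treating a single investor as holding the entire issue is of course a simplification for illustration).

```python
# Semi-annual coupon on the Microsoft 2052 Notes described above:
# $6,250,000,000 principal at 2.921% per annum, paid semi-annually in
# arrears. On the 30/360 basis each semi-annual period accrues exactly
# 180/360 of the annual rate, i.e. half.

principal = 6_250_000_000
annual_rate = 0.02921

semiannual_coupon = principal * annual_rate / 2
print(f"${semiannual_coupon:,.2f}")  # $91,281,250.00 every six months
```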
"},{"location":"finc50/fixed-income/introduction/#additional-things","title":"Additional things","text":"A bond can be secured in that the issuer can pledge certain assets (1) to \"secure\" the payments to investors. In case of defaults \ud83d\ude14, bondholders have a direct claim on the pledged assets.
An unsecured bond (or \"debenture\"), on the other hand, relies solely on the issuer's creditworthiness and ability to generate cash flow to repay bondholders. In case of defaults \ud83d\ude14, bondholders of unsecured bonds only have a claim on the issuer's general assets.
A bond also has a seniority. In case of defaults \ud83d\ude14, investors of more senior bonds can claim before investors of less senior bonds. (1)
A bond needs to specify the currency as well, along with many other things...
"},{"location":"finc50/fixed-income/introduction/#embedded-options","title":"Embedded options","text":"A bond is plain vanilla when it has no embedded options, which can add a lot of \"flavour\".
Note
Embedded options in bonds refer to features or provisions that give either the bond issuer or the bondholder the right to take certain actions under specific circumstances.
Embedded options provide added flexibility to the bond's terms and can impact the bond's cash flows and overall value. The three main types of embedded options found in bonds are:
Call Option (Callable Bonds)Put Option (Puttable Bonds)Conversion Option (Convertible Bonds)sequenceDiagram\nautonumber\n participant Issuer\n participant Underwriter\n participant Investors\n\n rect rgba(0, 0, 255, .1)\n note left of Issuer: Primary Market\n Issuer->>Underwriter: Request Bond Underwriting\n Underwriter->>Issuer: Analyze Issuer's Creditworthiness\n Underwriter->>Investors: Offer Bonds for Sale\n Investors->>Underwriter: Express Interest in Buying Bonds\n Underwriter->>Issuer: Finalize Bond Terms and Pricing\n Investors->>Underwriter: Place Orders for Bonds\n Underwriter->>Investors: Issue Bonds to Investors\n end\n rect rgba(0, 0, 255, .1)\n note right of Investors: Secondary Market\n Investors->>Investors: Buy and Sell Bonds\n end
"},{"location":"finc50/stata/","title":"Stata Workshop","text":"Welcome
This series of introductory notes is prepared for the BUSS7902 Quantitative Business Research Methods at the University of Sydney Business School.
First, let's get familiar with Stata and see what we can do with it.
Take a side trip to see some amazing features of Stata.
Note
This is just to showcase one of the many amazing features of Stata.
Since Stata 15, we can search, browse and import almost a million U.S. and international economic and financial time series made available by the St. Louis Fed's Federal Reserve Economic Data (FRED). This post briefly explains this great feature.
","tags":["Stata"]},{"location":"finc50/stata/fred/#prerequisite","title":"Prerequisite","text":"Before you start, you will need an API Key from FRED. Register one here
Then in Stata, you can store this key permanently so you don't need to provide it again.(1)
_key_
with your actual API Key obtained.set fredkey _key_, permanently\n
","tags":["Stata"]},{"location":"finc50/stata/fred/#gui-is-always-a-good-start","title":"GUI is always a good start","text":"Alternatively, clicking the menu File>Import>Federal Reserve Economic Data (FRED)
will bring up the dialog as shown below.
Enter API Key and you'll be free to explore all the data series available on FRED.
For example, let's see the CPI of Australia...
Describing the data series, we can find much useful meta information.
Vintage
Note that the \"vintage\" section lists a number of dates, with each vintage referring to a particular version of the data series at that point in time.
It may sound strange but an economic data series may be revised multiple times after it has been published. Potential reasons may be that later people collect more accurate information, or that there is a change of estimation method, etc.
For example, the CPI from 2005 to 2010 retrieved by a researcher in 2011 may differ from the same series retrieved in 2023. Without specifying the data vintage, replicating prior work can be hard.
Another tricky part is that ignoring vintages introduces look-ahead bias in analysis.
For example, a trading strategy using the revised GDP accessed today, instead of the vintage GDP, implicitly uses hindsight, as the GDP series may have been revised to accommodate more accurate data obtained after release.
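A toy illustration of this look-ahead bias, with purely hypothetical growth figures: the decision a trader could actually make in real time (on the vintage figure) differs from the decision a backtest on revised data would assume.

```python
# Hypothetical GDP growth for one quarter, in percent. The numbers are
# made up solely to illustrate vintage vs. revised data.
vintage_growth = -0.1   # first release: what a trader saw in real time
revised_growth = 0.4    # the figure after subsequent revisions

def signal(growth):
    """Naive strategy: go long equities only if growth is positive."""
    return "long" if growth > 0 else "stay out"

print(signal(vintage_growth))  # stay out  (the real-time decision)
print(signal(revised_growth))  # long      (what a backtest on revised data assumes)
```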
Let's close the description, double-click on the series and click Import. Another dialog will be shown to confirm some final details.
The outputs will be like the following:
. import fred AUSCPIALLQINMEI, daterange(2010-01-01 2023-08-08) aggregate(quarterly,avg)\n\nSummary\n----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\nSeries ID Nobs Date range Frequency\n----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\nAUSCPIALLQINMEI 53 2010-01-01 to 2023-01-01 Quarterly\n----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\n# of series imported: 1\n highest frequency: Quarterly\n lowest frequency: Quarterly\n
","tags":["Stata"]},{"location":"finc50/stata/fred/#programmatical-is-recipe-to-reproducibility","title":"Programmatic is the recipe for reproducibility","text":"We don't need to go through the GUI process every time. In fact, Stata already told us what the corresponding command is:
import fred AUSCPIALLQINMEI, daterange(2010-01-01 2023-08-08) aggregate(quarterly,avg)\n
We can simply put this line of code into our program.
For example, the code below generates a time-series chart for Australia's CPI.
// Import\nimport fred AUSCPIALLQINMEI, daterange(2010-01-01 2023-03-31) vintage(2023-05-10) aggregate(quarterly,avg) clear\nrename AUSCPIALLQINMEI_20230510 cpi_australia\n// Time format\ngen yrqtr = yq(year(daten),quarter(daten))\nformat yrqtr %tq\ntsset yrqtr\n// Set start of the period to 100\ngen cpi_ret = cpi_australia/L.cpi_australia - 1\nreplace cpi_australia = 100 if _n==1\nreplace cpi_australia = L.cpi_australia * (1+cpi_ret) if _n>1\n// Plotting\ntwoway (tsline cpi_australia), title(\"Quarterly CPI of Australia 2010Q1-2023Q1\") ytitle(\"\") ttitle(\"\") note(\"Index 2010Q1=100. Source: FRED, 2023-05-10 vintage.\")\n
Note
The code snippet above specifies the data vintage. Therefore, even if someone runs it 30 years from now, they will still get exactly the same data and plot as I do in 2023.
","tags":["Stata"]},{"location":"finc50/stata/introduction/","title":"Stata - Introduction","text":"Stata is powerful statistical analysis software that we often use in empirical research. This series of posts aims to provide some basic knowledge for junior researchers to get started with Stata, as well as some personal tips on using Stata more efficiently in research projects.1
","tags":["Stata"]},{"location":"finc50/stata/introduction/#stata-gui","title":"Stata GUI","text":"The Graphical User Interface (GUI) of Stata looks like this:
This is the default layout of Stata 16. Some preset preferences can be found via menu option Edit>Preferences>Load preference set
. You can also save your personal preferences (color theme, layout, etc.) in the Edit>Preference
menu.
Of the many windows of the GUI, we are mostly interested in the Results window (Ctrl+2
) where outputs are displayed. While all other windows can be hidden or closed, the Results window always remains at the center of the GUI.
The Command window (Ctrl+1
) is where we mostly interact with Stata by entering Stata commands. Since usually we need a group of commands to complete a task, it is a good idea to place them together in a do file (with the extension .do
), which is the file type native to Stata, just like .py
file to Python. Later, we will introduce the Stata's Do Editor (Ctrl+9
) as a nice editor for .do
files.
Tip
As a researcher, keeping a good record of the programs and code used is a merit. For oneself, it boosts productivity as more code is accumulated. Beyond that, it ensures all results can be replicated even years later. Nowadays, more and more top journals also require submission of the code used in the paper.
","tags":["Stata"]},{"location":"finc50/stata/introduction/#basic-demonstration","title":"Basic demonstration","text":"Now let's start typing our first Stata command in the Command window.
","tags":["Stata"]},{"location":"finc50/stata/introduction/#working-directory","title":"Working directory","text":"First, let's type pwd
(and hit Enter
), which is a command to display the current working directory. Knowing the current working directory, for example, allows you to use relative paths correctly.
Throughout this series, I'll follow the tradition and prefix all Stata commands with .
, hence . pwd
meaning \"enter the pwd
command in the Command window\":
. pwd\n
From the Results window, we can see a line of text like \"C:\\Users\\mgao\\Documents\", which is the output of executing the pwd
command, i.e., the current working directory of Stata on my PC.
We can change the current working directory to another directory on the computer via the command cd
, for example:
. cd \"C:\\Users\\mgao\\Dropbox (Sydney Uni)\\BUSS7902 Stata\"\n
Then we can verify that it's indeed changed by pwd
again.
The above two examples (pwd
and cd
) already showcase the basic syntax structure of Stata commands. With a few exceptions, a Stata command is like:
. _command_ _parameter1_ _parameter2_ ... , _options_\n
or technically,
cmd [varlist | namelist | anything] [if] [in] [using filename] [= exp] [weight] [, options]
where cmd
is the name of a command and everything in [ ]
is optional.
display
prints a message to the Results window:
. display \"hello world!\"\n
cls
clears the Results window:
. cls\n
clear
clears memory, removing the dataset loaded, if any:
. clear\n
log
echoes a copy of the session to a file:
Create a log file named stata101.log
in the current working directory. The replace
option asks Stata to replace the log file if it already exists. Until . log close
, everything displayed in Results will be saved in the log file.
. log using \"stata101.log\", replace\n. display \"hello from Mingze\"\n. log close\n
cmdlog
is similar to log
but records only the commands, not the results.
Of course, Stata is famous for its superior statistical analysis. Let's see how regressions can be easily done in Stata.
","tags":["Stata"]},{"location":"finc50/stata/introduction/#load-dataset","title":"Load dataset","text":"We start by loading an example dataset that comes with the Stata installation. This can be done via the sysuse
command. We use the dataset named \"auto\":
. sysuse auto\n
Tip
Stata comes with several builtin datasets. Use . sysuse dir
to have a look.
More generally, we can use
our own datasets. We'll see more on this later.
We can now ask Stata to describe
the meta information of the dataset and summarize
the variables in terms of number of observations, mean, standard deviation, etc.
The screenshot below shows the output:
Enlarge the output of describe
:
. describe\n\nContains data from C:\\Program Files\\Stata16\\ado\\base/a/auto.dta\n obs: 74 1978 Automobile Data\n vars: 12 13 Apr 2018 17:45\n (_dta has notes)\n--------------------------------------------------------------------------------\n storage display value\nvariable name type format label variable label\n--------------------------------------------------------------------------------\nmake str18 %-18s Make and Model\nprice int %8.0gc Price\nmpg int %8.0g Mileage (mpg)\nrep78 int %8.0g Repair Record 1978\nheadroom float %6.1f Headroom (in.)\ntrunk int %8.0g Trunk space (cu. ft.)\nweight int %8.0gc Weight (lbs.)\nlength int %8.0g Length (in.)\nturn int %8.0g Turn Circle (ft.)\ndisplacement int %8.0g Displacement (cu. in.)\ngear_ratio float %6.2f Gear Ratio\nforeign byte %8.0g origin Car type\n--------------------------------------------------------------------------------\nSorted by: foreign\n
","tags":["Stata"]},{"location":"finc50/stata/introduction/#run-regression","title":"Run regression","text":"Suppose we'd like to estimate a simple linear regression to study the relation between car price and mileage, headroom and weight:
\\[ price = \\alpha + \\beta_1 mpg + \\beta_2 headroom + \\beta_3 weight + \\varepsilon \\]All we need to do is a simple line of code,
. regress price mpg headroom weight\n
which would generate the following estimation results:
","tags":["Stata"]},{"location":"finc50/stata/introduction/#save-results","title":"Save results","text":"One of the coolest things Stata can do is to export the tabulated regression results to Microsoft Word, PDF, LaTeX and more.
","tags":["Stata"]},{"location":"finc50/stata/introduction/#save-to-word","title":"Save to Word","text":"For example, we can save the previous results (as shown in the screenshot above) to a Word document named \"table1\" (table1.docx
) easily with the following three lines of codes.
. putdocx begin\n. putdocx table mytable = etable\n. putdocx save table1.docx, replace\n
Behind the scene, Stata creates a .docx
to work with. putdocx table
command creates a new table (mytable) in the .docx
file containing estimation results (etable
tells it to tabulates the coefficients from previous estimation). Lastly, Stata saves the .docx
file as \"table1.docx\" in the current working directory.
If working with LaTeX, you can export results as TeX files conveniently too.
For this purpose, though, an additional Stata package estout
is required. Personally I'd say this is gold. You can install estout
package via a single command in Stata:
. ssc install estout, replace\n
Now, you can use the following two lines of code:
. eststo: regress price mpg headroom weight\n. esttab using \"table1.tex\", tex replace label star(* 0.10 ** 0.05 *** 0.01) nogaps compress\n
to produce a TeX file (\"table1.tex\") with the following content:
{\n\\def\\sym#1{\\ifmmode^{#1}\\else\\(^{#1}\\)\\fi}\n\\begin{tabular}{l*{1}{c}}\n\\hline\\hline\n&\\multicolumn{1}{c}{(1)}\\\\\n&\\multicolumn{1}{c}{Price}\\\\\n\\hline\nMileage (mpg) & -56.19 \\\\\n& (-0.66) \\\\\nHeadroom (in.) & -675.6\\sym{*} \\\\\n& (-1.72) \\\\\nWeight (lbs.) & 2.062\\sym{***}\\\\\n& (3.13) \\\\\nConstant & 3158.3 \\\\\n& (0.87) \\\\\n\\hline\nObservations & 74 \\\\\n\\hline\\hline\n\\multicolumn{2}{l}{\\footnotesize \\textit{t} statistics in parentheses}\\\\\n\\multicolumn{2}{l}{\\footnotesize \\sym{*} \\(p<0.10\\), \\sym{**} \\(p<0.05\\), \\sym{***} \\(p<0.01\\)}\\\\\n\\end{tabular}\n}\n
You can check the PDF compiled from the above TeX code at this Overleaf link.
Note
We will revisit and elaborate on these topics later. I deliberately make them oversimplified only to show you what Stata can do in making our lives much easier.
I prepare this series of introductory course notes for the BUSS7902 Quantitative Business Research Methods for PhD students at the University of Sydney Business School in Semester 1, 2023.\u00a0\u21a9
If you've ever used Python, you may know that it's famous for its simplicity and the many packages available for use. The good news is, Stata is no different. We can install a wide range of Stata packages easily and then use them to achieve a ton of things.
This post briefly explains where to find, how to install and update Stata packages.
","tags":["Stata"]},{"location":"finc50/stata/packages/#stata-packages_1","title":"Stata packages","text":"In a nutshell, installing and using Stata packages is as simple as the following two lines of code (and a line of output):
. ssc install nicewords\n. nicewords\nAbsolutely excellent!\n
Specifically, we use the builtin command ssc
to install a package named nicewords
in the first line, and then execute the command nicewords
in the second line, which randomly prints some nice words.
Generally, a package can provide one or more Stata commands to use, depending on the complexity of the task it solves.
","tags":["Stata"]},{"location":"finc50/stata/packages/#where-to-find-stata-packages","title":"Where to find Stata packages","text":"","tags":["Stata"]},{"location":"finc50/stata/packages/#ssc-statistical-software-components-archive","title":"ssc
- Statistical Software Components Archive","text":"Stata packages are hosted at the Statistical Software Components (SSC) Archive, which is often called the Boston College Archive and provided by http://repec.org. This explains the example above where we used the command ssc
to manage (install) packages.
We can find recently added packages with . ssc new
, and the top 10 most popular packages on SSC with . ssc hot
. In fact, the top 10 for December 2022 are:
net
- e.g., GitHub","text":"Apart from SSC, some packages are available on other websites like GitHub. A growing trend is that package authors publish their code repositories on GitHub, which contain the development version of the packages.
","tags":["Stata"]},{"location":"finc50/stata/packages/#how-to-install-and-update-packages","title":"How to install and update packages","text":"ssc install
is pretty much all we need. For example, to install the package reghdfe
:
. ssc install reghdfe\n
For packages outside SSC, we can install them using net
. As an example, I have a package specurve
on GitHub, which can be installed by:
. net install specurve, from(\"https://raw.githubusercontent.com/mgao6767/specurve/master\")\n
To update an existing package, we can add the option replace
to the above command:
. ssc install reghdfe, replace\n. net install specurve, replace from(\"https://raw.githubusercontent.com/mgao6767/specurve/master\")\n
Alternatively, we can use ado update
:
. ado update, update // for community-contributed packages\n. ado update, update ssconly // for SSC only\n
","tags":["Stata"]},{"location":"finc50/stata/packages/#some-packages-of-my-choice","title":"Some packages of my choice","text":"","tags":["Stata"]},{"location":"finc50/stata/packages/#reghdfe-and-ivreghdfe","title":"reghdfe
and ivreghdfe
","text":"reghdfe
is among the top 10 Stata packages as we've seen above. It allows for multiple fixed effects in linear regressions, while the builtin xtreg
allows only one fixed effect. It's gold!
ivreghdfe
is essentially reghdfe
plus ivreg2
, which allows us to include multiple fixed effects in instrumental variable regressions.
estout
and outreg2
","text":"estout
is also a top 10 Stata package that provides tools to make regression tables. We've seen an example from the previous post. I highly recommend, too!
outreg2
does a similar job in a simpler way. Yet if we want finer controls estout
is perhaps better, in my humble opinion.
winsor
and winsor2
","text":"Data is often noisy with extreme values or impossible values recorded by mistake. In some fields of research, we try to mitigate such concern by winsorization. Note that they may yield different results due to their different approaches in determining percentile values.
","tags":["Stata"]},{"location":"finc50/stata/working-with-datasets/","title":"Stata - Working with datasets","text":"Recap
We can use the command sysuse
to use builtin datasets, and use
to load other external datasets.
In the introduction, we briefly mentioned how to load Stata datasets to use. Now, let's take a more in-depth look at how we work with datasets in Stata.
","tags":["Stata"]},{"location":"finc50/stata/working-with-datasets/#datasets-here-and-there","title":"Datasets, here and there","text":"Datasets are stored at different places, locally on our computer's hard disk or remotely on a server. For Stata to use them, we need to load them into Stata, or putting them into memory.
Because Stata commands (e.g., summarize
, describe
, count
, etc.) operate on the current dataset in memory, working simultaneously on multiple datasets was painful -- one needs to save current, load the other dataset, perform tasks and save/load again. But since Stata 16, a feature called frame
is introduced, where different datasets can be loaded into memory at the same time, but in different \"frames\". The chart below gives a simple illustration.
flowchart LR\n subgraph Stata\n direction TB\n Engine\n Engine -->|frame change| default & frame2 & frame3 & ...\n subgraph default\n end\n subgraph frame2\n end\n subgraph ...\n end\n subgraph frame3\n end\n end\n\n\n\n default -->|sysuse| auto.dta\n frame2 -->|use| /User/mgao/Desktop/anotherDataset.dta\n frame3 -->|use| http://www.stata-press.com/data/r13/nlswork.dta\n\n subgraph network\n http://www.stata-press.com/data/r13/nlswork.dta\n end\n subgraph local\n /User/mgao/Desktop/anotherDataset.dta\n end\n subgraph builtin\n auto.dta\n end
Although we still can only operate Stata commands on a single frame/dataset at a given time, we no longer need to save/load datasets as they all reside in memory frames.
Tip
We as beginners can be agnostic about frame
, especially when dealing with only one dataset throughout. Technically, we are loading data into the default
frame (1), and work in the default frame.
default
frame is just a frame named \"default\".dta
","text":"The dta
is Stata's proprietary binary data file format, and is the default file format used by Stata.
What I like very much about the dta
data format include:
dta
files can store different types of variables, including numeric variables (e.g., integers, floats) and string variables (text). It can represent missing values too.dta
files can store metadata, such as variable labels (descriptive names for variables), value labels (labels for specific variable values), and variable formats (e.g., date formats).dta
files created in one version of Stata can generally be read by other versions of Stata, ensuring cross-platform compatibility.Also, dta
files are typically compressed to reduce file size and optimize storage.
More importantly, Stata provides commands (use
, save
, etc.) to read data from dta
files into memory and save data from memory to dta
files. This makes it extremely easy to work with.
For example, you can easily load a Stata dataset online to your Stata via use
:
use \"http://www.stata-press.com/data/r13/nlswork.dta\", clear\n
Note that , clear
option tells Stata to clear the memory (1) in case there is already some other dataset in it.
Alternatively, you can download the nlswork.dta
dataset to your computer, and load it from your local computer:
use \"/Users/mgao/Downloads/nlswork.dta\", clear\n
use \"C:\\Users\\mgao\\Downloads\\nlswork.dta\", clear\n
After some work on the dataset, say, keeping only observations where year
is 88,
keep if year==88\n
we can save the modified dataset either to its original place, overwriting the original dataset, or to a different place, creating a new dataset:
Mac / Linux Windowssave \"/Users/mgao/Downloads/nlswork.dta\", replace\n
save \"C:\\Users\\mgao\\Downloads\\nlswork.dta\", replace\n
","tags":["Stata"]},{"location":"finc50/stata/working-with-datasets/#mighty-csv","title":"Mighty csv
","text":"We all love csv
or \"comma-separated-values\" files. They are simple and readable without requiring any special software.(1) Many datasets are also published online in csv
format.
csv
files, in case you didn't know...What if we want to save a csv
version of the dataset? Easy, we use export
command:
export delimited using \"/Users/mgao/Downloads/nlswork.csv\", replace\n
export delimited using \"C:\\Users\\mgao\\Downloads\\nlswork.csv\", replace\n
Of course, we can import the csv
file back to Stata using import
command:
import delimited using \"/Users/mgao/Downloads/nlswork.csv\", clear \n
import delimited using \"C:\\Users\\mgao\\Downloads\\nlswork.csv\", clear\n
In some rare cases where the text file is not delimited/separated by comma, we can manually specify the delimiter. For example, some datasets use \"tab-separated-values\" or tsv
format:
import delimited \"path/to/datafile.tsv\", delimiter(tab)\n
","tags":["Stata"]},{"location":"finc50/stata/working-with-datasets/#did-someone-say-excel","title":"Did someone say \"Excel\"?","text":"Stata gets you covered. import excel
is all you need.
For example, we next are to use an Excel spreadsheet named \"BUSS7902 Chapter 4A Lecture (Data).xlsx\".
Before everything, we can ask Stata to describe the file:
. import excel "~/Downloads/BUSS7902 Chapter 4A Lecture (Data).xlsx", describe

 Sheet            | Range
------------------+-----------
 Magic Box        | A1:C101
 Assembly         | A1:H76
 Distance         | A1:L42
 Insurance+Survey | A1:H1501
As shown, the spreadsheet contains four sheets with different names, "Magic Box" and so on. Let's say we are interested in the data in the "Magic Box" sheet.(1) We can instruct Stata to load data from that sheet and optionally specify the data range within it.(2)
firstrow
option to tell Stata to treat the first row as variable names, not values of an observation.
. import excel "~/Downloads/BUSS7902 Chapter 4A Lecture (Data).xlsx", firstrow sheet("Magic Box") cellrange(A1:A101) clear
(1 var, 100 obs)
And that's it! Stata will take care of the variable types and so on, and it is pretty good at this most of the time.
","tags":["Stata"]},{"location":"finc50/stata/working-with-datasets/#so-you-want-more-frames","title":"So you want moreframe
s?","text":"Tip
This is for the tech-savvy. You almost surely don't need frame
.
Okay. So you noticed that every time we import/use a new dataset, we set the clear
option to clear the memory, discarding whatever dataset we are currently working on to make room for the new one. This is troublesome. What if we don't want to give up our intermediate results while taking a peek at a different dataset?
We make a new frame
and load the new dataset into the new frame.(1)
Let's have a look first at what frames are currently there:
. frame list
* default  100 x 1
Note: Frames marked with * contain unsaved data.
","tags":["Stata"]},{"location":"finc50/stata/working-with-datasets/#create-a-new-frame","title":"Create a new frame","text":"We create a new frame
to which a new dataset can be loaded without clearing the existing dataset in the default frame. We can name it whatever we like, say, assembly:
frame create assembly
Now, checking the frames again, we can see it is indeed there.
. frame list
  assembly    0 x 0
* default   100 x 1
Note: Frames marked with * contain unsaved data.
","tags":["Stata"]},{"location":"finc50/stata/working-with-datasets/#change-to-the-new-frame","title":"Change to the new frame","text":"Let's now switch to the newly created \"assembly\" frame, leaving the \"Magic Box\" data untouched in the default frame.
frame change assembly
Tip
Forgot which frame you are in?
. frame
(current frame is assembly)
You may notice that the Variables window is now blank, showing that there is no variable in this frame. Rest assured that the x
variable we imported earlier from the "Magic Box" sheet stays in memory, in the default frame.
We can now load data into this empty frame using the same methods as discussed above.
For example, I can now load the data in the \"Assembly\" sheet into the frame.
. import excel "~/Downloads/BUSS7902 Chapter 4A Lecture (Data).xlsx", firstrow sheet("Assembly") cellrange(A1:A76) clear case(lower)
(1 var, 75 obs)
If we check the frames, we can see that now both datasets exist in memory, albeit in two frames.
. frame list
* assembly   75 x 1
* default   100 x 1
Note: Frames marked with * contain unsaved data.
To go back to the original frame (named \"default\"), use frame change default
.
This post is just another piece of my serious nonsense. All of a sudden, I wanted to know how many Bitcoins I could have mined since 2012. This is because I've known about Bitcoin since its inception in 2009, but have never really put any effort into mining. Instead, I was fascinated by the idea of using distributed (volunteer) computing to solve scientific problems. For example, BOINC and related projects like World Community Grid use computing power donated from around the world to find effective treatments for cancer and HIV/AIDS, low-cost water filtration systems, new materials for capturing solar energy efficiently, etc. I was one of the many volunteers for a long time, even before the genesis block of Bitcoin.
An interesting question is: what if I hadn't donated my computers to volunteer computing, but used them for Bitcoin instead? How many Bitcoins could I have mined? To answer this question, I started by looking at my contribution history on the World Community Grid (it's awesome that the full history is available).
According to WCG's website, 7 WCG Points equal 1 BOINC credit, which represents 1/100 day of CPU time on a reference computer that does 1,000 MFLOPS based on the Whetstone benchmark.
However, the definition of a BOINC credit was changed to 1/200 day of CPU time in 2010, though WCG's website still says that the total WCG Points divided by 700 gives the number of GigaFLOPs. I'm going to stick with WCG's website for now.
Suppose I've got one WCG Point today. It means my computer has spent 1/700 day of CPU time, i.e., about 123 seconds, at a computing rate of 1 GigaFLOP per second. So, if I can convert GigaFLOPs to Bitcoin hashrate, the problem will be quite easy.
However, FLOPs cannot be converted to hashrate in a simple manner, as Bitcoin hashing is about integer math, totally different from floating-point operations. I'm just going to use a very rough estimate that 1 hash corresponds to 12.7k FLOPs (source: BitcoinTalk thread, CoinDesk), so that
1 WCG Point implies mining at a speed of 78.7 kH/s for 123 seconds (a very rough estimate).
Then, if I received 1k Points in a day, it might be safe to say I'd been mining for about 123k seconds at a speed of 78.7 kH/s, which translates to an average daily hashrate of 112 kH/s, or 0.112 MH/s.
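For concreteness, the arithmetic above can be sketched in Python. The 1/700-day-per-point rule and the 12.7k FLOPs-per-hash figure are the rough assumptions stated in the text, not exact constants:

```python
# Back-of-the-envelope conversion from WCG Points to an implied Bitcoin
# hashrate. Assumptions from the text: 1 WCG Point = 1/700 day of CPU time
# on a 1 GFLOP/s reference machine, and ~12.7k FLOPs per hash (very rough).
SECONDS_PER_DAY = 86_400
REFERENCE_GFLOPS = 1.0    # reference machine speed, GFLOP/s
FLOPS_PER_HASH = 12_700   # rough estimate, not an exact figure

def seconds_per_point():
    """CPU-seconds on the reference machine implied by one WCG Point."""
    return SECONDS_PER_DAY / 700

def implied_hashrate():
    """Hashrate (H/s) of a 1 GFLOP/s machine under the FLOPs-per-hash estimate."""
    return REFERENCE_GFLOPS * 1e9 / FLOPS_PER_HASH

def avg_daily_hashrate(points_per_day):
    """Average hashrate (H/s) over one day implied by a daily points total."""
    total_hashes = points_per_day * seconds_per_point() * implied_hashrate()
    return total_hashes / SECONDS_PER_DAY

print(round(seconds_per_point()))              # ~123 seconds per point
print(round(implied_hashrate() / 1e3, 1))      # ~78.7 kH/s
print(round(avg_daily_hashrate(1_000) / 1e3))  # ~112 kH/s for 1k points/day
```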
I did some math and found that in June 2012 my hashrate was as high as 0.006% of the whole network, though one year later it was effectively 0%. lol.
Next step will be calculating how many Bitcoins I could have mined based on the hashrate history.
Taking into account the average block time and the controlled supply of Bitcoin (table below), I plot the daily average number of blocks and Bitcoins generated in this period.
Date reached | Block  | Reward Era | BTC/block | End BTC % of Limit
2009-01-03   | 0      | 1          | 50.00     | 12.500%
2010-04-22   | 52500  | 1          | 50.00     | 25.000%
2011-01-28   | 105000 | 1          | 50.00     | 37.500%
2011-12-14   | 157500 | 1          | 50.00     | 50.000%
2012-11-28   | 210000 | 2          | 25.00     | 56.250%
2013-10-09   | 262500 | 2          | 25.00     | 62.500%
2014-08-11   | 315000 | 2          | 25.00     | 68.750%
2015-07-29   | 367500 | 2          | 25.00     | 75.000%
2016-07-09   | 420000 | 3          | 12.50     | 78.125%
2017-06-23   | 472500 | 3          | 12.50     | 81.250%
2018-05-29   | 525000 | 3          | 12.50     | 84.375%

Based on my average hashrate and the historical network hashrate, the plot below shows how many Bitcoins I could have mined if I hadn't donated my computers' computing power to the World Community Grid but to Bitcoin mining instead: 14.8 Bitcoins!
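As a side note, the BTC/block column in the table follows Bitcoin's halving rule: the block subsidy starts at 50 BTC and halves every 210,000 blocks. A minimal sketch:

```python
def block_subsidy(height):
    """Bitcoin block subsidy in BTC: starts at 50, halves every 210,000 blocks."""
    halvings = height // 210_000
    return 50.0 / (2 ** halvings)

print(block_subsidy(0))        # 50.0 (reward era 1)
print(block_subsidy(210_000))  # 25.0 (reward era 2)
print(block_subsidy(420_000))  # 12.5 (reward era 3)
```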
Okay, problem solved.
If I'd really mined these 14.8 Bitcoins, then I'd probably have a shot at becoming a millionaire, if again I could hold them and time the market perfectly. At Bitcoin's highest historical price in Australian dollars, 14.8 Bitcoins are worth roughly 380,505 dollars. Even if I follow the redefined BOINC credit, I still could have mined half of the 14.8 Bitcoins and potentially pocketed 190k dollars.
I've also participated in more than just World Community Grid, including some famous ones like SETI@Home and Einstein@Home. Below are two certificates of contributed computing power.
So together I've put in about 2.28 quintillion, or 2.28E18, FLOPs into these two projects.
The funny thing is that I've put in only 348 PetaFLOPs into World Community Grid during this entire period, or 0.348 quintillion FLOPs in total.
Hence, if my donations of computing power to SETI@Home and Einstein@Home happened at a similar time as those to WCG, then potentially I could have mined at least 6 times more Bitcoins. Well, I can't imagine what my life would be like if I'd mined 100 Bitcoins, which might be worth $2.5 million.
","tags":["Bitcoin"]},{"location":"posts/accumulator-option-pricing/","title":"Accumulator Option Pricing","text":"An accumulator is a financial derivative that is sometimes known as "I kill you later". This post attempts to explain how it is structured and how to price it via Monte Carlo simulations in Python.
","tags":["Option","Simulation","Python"]},{"location":"posts/accumulator-option-pricing/#1-overview-of-accumulator","title":"1. Overview of Accumulator","text":"Like all derivatives, an accumulator involves two parties, the buyer and the seller, who agree on a strike price that is usually at a discount to the prevailing market price of the underlying security at the time of contract origination.
The accumulator is settled periodically throughout its term. At each settlement:
Let's make up an example so as to illustrate how it works.
","tags":["Option","Simulation","Python"]},{"location":"posts/accumulator-option-pricing/#21-month-0","title":"2.1. Month 0","text":"Suppose that I bought an accumulator from Sherry the seller today, where the underlying security is TSC (hypothetical ticker), currently trading at $100. The strike price is $90 and the knock-out price is $105. The amount of stocks that I can buy is 1,000 in each settlement. The accumulator lasts for 6 months and settles monthly.
","tags":["Option","Simulation","Python"]},{"location":"posts/accumulator-option-pricing/#22-month-1","title":"2.2. Month 1","text":"At the end of month 1, the market price of TSC is $102, which is between the strike price ($90) and the knock-out price ($105). I can buy 1,000 shares from Sherry at the strike price of $90 each and make a profit of \\((\\$102-\\$90)\\times 1000=\\$12,000\\).
","tags":["Option","Simulation","Python"]},{"location":"posts/accumulator-option-pricing/#23-month-2","title":"2.3. Month 2","text":"At the end of month 2, the market price of TSC is $95, which is between the strike price ($90) and the knock-out price ($105). I can buy 1,000 shares from Sherry at the strike price of $90 each and make a profit of \\((\\$95-\\$90)\\times1000=\\$5,000\\).
","tags":["Option","Simulation","Python"]},{"location":"posts/accumulator-option-pricing/#24-month-3","title":"2.4. Month 3","text":"At the end of month 3, the market price of TSC is $85, which is below the strike price ($90). I have to buy 2,000 shares from Sherry at the strike price of $90, making a loss of \\((\\$90-\\$85)\\times2000=\\$10,000\\).
","tags":["Option","Simulation","Python"]},{"location":"posts/accumulator-option-pricing/#25-month-4","title":"2.5. Month 4","text":"At the end of month 4, the market price of TSC is $88, which is below the strike price ($90). I have to buy 2,000 shares from Sherry at the strike price of $90, making a loss of \\((\\$90-\\$88)\\times2000=\\$4,000\\).
","tags":["Option","Simulation","Python"]},{"location":"posts/accumulator-option-pricing/#26-month-5","title":"2.6. Month 5","text":"At the end of month 5, the market price of TSC is $106, which is above the knock-out price, so the contract is terminated immediately. I cannot make any profit from Sherry any longer.
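The five settlements above can be replayed with a short Python sketch. The settle helper and its doubling rule simply encode this made-up example's terms (strike 90, knock-out 105, 1,000 shares per settlement); it is an illustration, not a real contract specification:

```python
def settle(price, strike=90.0, knock_out=105.0, shares=1_000):
    """Buyer's payoff at one settlement; None signals a knock-out."""
    if price > knock_out:
        return None  # contract terminates immediately
    qty = shares if price >= strike else 2 * shares  # doubled below strike
    return qty * (price - strike)

payoffs = []
for price in [102, 95, 85, 88, 106]:  # months 1 to 5
    p = settle(price)
    if p is None:                      # knocked out in month 5
        break
    payoffs.append(p)

print(payoffs)       # [12000.0, 5000.0, -10000.0, -4000.0]
print(sum(payoffs))  # net payoff of 3000.0 over the contract's life
```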
","tags":["Option","Simulation","Python"]},{"location":"posts/accumulator-option-pricing/#3-some-observations","title":"3. Some Observations","text":"In the example above:
Taking all of these together, we can see that the buyer has:
But this is not the full story. Another hidden feature is that while the accumulator is terminated when the share price rises above the knock-out price, the contract does not terminate when the buyer is at a loss; it runs until maturity. So, even though the maximum losses of both the buyer and the seller are bounded, they differ significantly and disproportionately.
If so, why would anyone be interested in buying the contract? Potentially it's because the strike is set below the market price, so at the beginning the buyers always feel like they are getting a bargain. They may also think that once the price rises above the knock-out level, which might be set slightly above the market price, the contract is terminated, so they are free of any loss.
However, the buyers often underestimate the probability of a price decline and how big an impact it will have on them. The "I kill you later" earns its name for a reason.
","tags":["Option","Simulation","Python"]},{"location":"posts/accumulator-option-pricing/#4-some-math","title":"4. Some Math ...","text":"Let's make some notations.
So at each settlement, the payoff matrix, conditional on the contract not having been terminated at a previous settlement, is:
Share Price         | Buyer's Payoff    | Seller's Payoff
\(S_t>K^+\)         | 0                 | 0
\(K\le S_t\le K^+\) | \(A(S_t-K)\ge0\)  | \(-A(S_t-K)\le0\)
\(S_t<K\)           | \(cA(S_t-K)<0\)   | \(-cA(S_t-K)>0\)

However, deriving a closed-form analytical solution is not easy, since there are many settlements in the contract and the total payoff is path-dependent (due to the knock-out). There is a 2009 conference paper discussing the issue, and the PDF version is available here.
","tags":["Option","Simulation","Python"]},{"location":"posts/accumulator-option-pricing/#5-a-simulation-approach","title":"5. ... A Simulation Approach","text":"I am going to use Monte Carlo simulations to find the distribution of the buyer's payoff.
","tags":["Option","Simulation","Python"]},{"location":"posts/accumulator-option-pricing/#51-assumptions","title":"5.1. Assumptions","text":"For simplicity, I'm going to make the following assumptions:
Then there are only two variables, \(k\) and \(\sigma\), that I will need to vary!
","tags":["Option","Simulation","Python"]},{"location":"posts/accumulator-option-pricing/#52-core-code","title":"5.2. Core Code","text":"The simulation code I write below leverages Numba to speed up the calculation.
For 1 million simulations per pair of \\((k, \\sigma)\\), it takes about 2 seconds on my laptop with JIT and almost 1 minute without it.
import numpy as np
from collections import OrderedDict
from numba import int32, float32
from numba.experimental import jitclass


@jitclass(OrderedDict({
    'times': int32,
    'strike_price': float32,
    'knock_out_price': float32,
    'volatility': float32
}))
class FastSimulation:

    def __init__(self, times, strike_price, knock_out_price, volatility):
        self.times = times
        self.strike_price = strike_price
        self.knock_out_price = knock_out_price
        self.volatility = volatility

    def run(self):
        np.random.seed(1)
        buyer_payoffs = []
        for i in range(self.times):
            # generate 12 monthly returns from a normal distribution
            # written this way as the size parameter is not supported by numba
            returns = [np.random.normal(loc=0, scale=self.volatility) / 100 + 1
                       for _ in range(12)]
            # convert returns to a price array
            prices = np.asarray(returns).cumprod() * 100
            payoff = 0
            for price in prices:
                # the accumulator is terminated immediately
                if price > self.knock_out_price:
                    break
                payoff += self.buyer_payoff(price)
            buyer_payoffs.append(payoff)
        return buyer_payoffs

    def buyer_payoff(self, share_price):
        "Buyer payoff conditional on the accumulator not terminated"
        if share_price > self.knock_out_price:
            return 0
        payoff = 1000 * (share_price - self.strike_price)
        if self.strike_price <= share_price <= self.knock_out_price:
            return payoff
        else:
            return payoff * 2
","tags":["Option","Simulation","Python"]},{"location":"posts/accumulator-option-pricing/#53-results","title":"5.3. Results","text":"Numbers are boring. So here I put two plots showing the distribution of the buyer's payoffs. The Python code to generate the plots is as below (1000 simulations).
import plotly.figure_factory as ff
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot

init_notebook_mode(connected=True)

hist_data, group_labels = [], []
for v in range(1, 6):
    hist_data.append(FastSimulation(times=1_000, strike_price=95,
                                    knock_out_price=105, volatility=v).run())
    group_labels.append(f'Volatility = {v}%')

colors = ['#75b0ec', '#338be3', '#34669c', '#344054', '#161c25']

# Create distplot with curve_type set to 'kde'
fig = ff.create_distplot(hist_data, group_labels, show_hist=False,
                         colors=colors, curve_type="kde")

# Add title
fig['layout'].update(
    title='Accumulator With Strike Price of 95 and Knock-Out Price of 105 | MingzeGao',
    xaxis=dict(title="Buyer's Payoff", range=[-700e3, 300e3]),
    yaxis=dict(title='Probability'))

# Plot!
iplot(fig)
","tags":["Option","Simulation","Python"]},{"location":"posts/accumulator-option-pricing/#531-k5-and-vin-15","title":"5.3.1. \\(k=5\\) and \\(v\\in [1..5]\\)","text":"","tags":["Option","Simulation","Python"]},{"location":"posts/accumulator-option-pricing/#532-k10-and-vin-15","title":"5.3.2. \\(k=10\\) and \\(v\\in [1..5]\\)","text":"","tags":["Option","Simulation","Python"]},{"location":"posts/accumulator-option-pricing/#6-discussion","title":"6. Discussion","text":"Apparently, the accumulator is a very interesting and sometimes evil derivative. From the plots above we can notice several things:
Hence, as a buyer of an accumulator, you win small with low volatility but lose big with high volatility. I don't think any rational investor would like to take the long position. However, we do find exceptions, like CITIC Limited, which lost HK$15 billion on accumulators back in 2008.
","tags":["Option","Simulation","Python"]},{"location":"posts/adding-another-factor-to-principal-agent-model/","title":"Adding Another Factor to Principal-Agent Model","text":"In a traditional principal-agent model, firm output is a function of the agent's effort, and the principal observes only the output, not the agent's effort. The principal carefully designs the agent's compensation package, especially the sensitivity of the agent's pay to firm output, to maximize firm value. Now, what if we add another factor to the relationship between firm output and the agent's effort? How would the optimal pay sensitivity change?
My earlier paper studied this issue by assuming such a factor, organization capital, that substitutes for the agent's effort in improving firm output. I find that if firm output is a function of two substituting factors (one of which is the agent's effort), the optimal sensitivity of the agent's pay to firm output can be either higher or lower, depending on the principal's choice.
To yield this two-way prediction, let's look at a simple extension of the standard principal-agent model following Holmstrom and Milgrom (1987), where the principal hires an agent (CEO) to run the firm. We add organization capital (OC) as an additional determinant of firm outcomes, but in fact we can assume any factor, e.g., intellectual property, IT infrastructure, etc., that either strengthens or weakens the relation between firm output and executive effort.
The production function is given by \(V(a,o)=f(a,o)+\varepsilon\), where \(a\) is the effort by the agent, \(o\) is the firm's organization capital, and \(\varepsilon\) is random noise.
The agent is paid a wage \\(c(V)\\) and has reservation utility of \\(\\underline U\\). His objective function is given by \\(E\\left[U\\right]=E\\left[u\\left(v\\left(c\\right)-g\\left(a\\right)\\right)\\right]\\).
The function \\(u\\) represents his utility function and \\(v\\) represents his felicity function (i.e., his utility from cash), both increasing and weakly concave.
The functions \\(g\\), \\(u\\) and \\(v\\) are all twice continuously differentiable.
The risk-neutral principal chooses the effort level \\(a\\) and contract \\(c\\) to maximize the expected firm value minus the wage paid to the agent,
\\[ \\max_{c(\\cdot),a} E\\left[V\\left(a,o\\right)-c\\left(V\\left(a,o\\right)\\right)\\right] \\]subject to the individual rationality or participation constraint (IR) and incentive compatibility constraint (IC) as follows:
\\[ E\\left[u\\left(v\\left(c\\left(V\\left(a,o\\right)\\right)\\right)-g\\left(a\\right)\\right)\\right] \\ge \\underline{U} \\\\ a \\in \\arg\\max_{\\hat{a}}E\\left[u\\left(v\\left(c\\left(V\\left(\\hat{a},o\\right)\\right)\\right)-g\\left(\\hat{a}\\right)\\right)\\right] \\]We first consider the case where the optimal effort is determined endogenously. Under the Holmstrom and Milgrom (1987) framework, the following assumptions are made:
Further, Holmstrom and Milgrom (1987) show that the problem is equivalent to a single-period static problem under these assumptions. For simplicity, we also assume a quadratic cost of effort, \(g(a)=\frac{1}{2}ga^2\), so that the principal's optimization problem becomes:
\\[ \\max_{\\phi,\\theta,a^*} E\\left[V-c\\right] \\]subject to
\\[ E\\left[-e^{-\\eta\\left(c-\\frac{1}{2}ga^{*2}\\right)}\\right] \\ge \\underline{U} \\\\ a^* \\in \\arg\\max_{\\hat{a}} E\\left[-e^{-\\eta\\left(c-\\frac{1}{2}g\\hat{a}^2\\right)}\\right] \\]Substituting in \\(c=\\phi+\\theta V\\) and \\(V(a,o)=f(a,o)+\\varepsilon\\), maximizing the agent\u2019s (negative exponential) utility function is equivalent to maximizing \\(\\phi+\\theta f(a,o)-\\frac{1}{2}ga^2-\\frac{1}{2}\\eta \\theta^2 \\sigma^2\\).
Since \\(f(a,o)=a+(1-a)o\\), the first-order condition of the agent\u2019s objective function with respect to a is given by \\(a^*=\\theta(1-o)\u2044g\\), which implies his effort choice is decreasing in the cost of effort \\(g\\), decreasing in the firm\u2019s organization capital \\(o\\), and increasing in the pay-for-performance sensitivity \\(\\theta\\).
Moreover, his chosen effort is independent of the fixed wage \(\phi\), so that the principal can adjust the fixed pay to satisfy his participation constraint without affecting the incentives. Substituting \(a^*=\theta(1-o)/g\) into the principal's objective function and setting the participation constraint to bind, the optimal level of pay-for-performance sensitivity is given by:
\\[ \\theta = \\frac{1}{1+\\eta g \\frac{\\sigma^2}{(1-o)^2}} \\]Note
This optimal level of pay-for-performance sensitivity is derived as follows. Substituting \\(c=\\phi + \\theta V\\) and \\(V(a,o)=f(a,o)+\\varepsilon\\) into the agent's objective function of \\(E\\left[U\\right]=E\\left[u\\left(v(c)-g(a)\\right)\\right]\\), where \\(u(x)=-e^{-\\eta x}\\), \\(v(c)=c\\), and \\(\\varepsilon \\sim N(0,\\sigma^2)\\), we obtain:
\\[ E\\left[U\\right] = E\\left[e^{-\\eta \\left(\\phi+\\theta f(a,o)+\\theta\\varepsilon-\\frac{1}{2}ga^2\\right)}\\right] \\\\ = -E\\left[e^{-\\eta \\left(\\phi+\\theta f(a,o)-\\frac{1}{2}ga^2\\right)}\\right] \\times E\\left[e^{-\\eta \\theta \\varepsilon}\\right]\\\\ = -e^{-\\eta \\left(\\phi+\\theta f(a,o)-\\frac{1}{2}ga^2\\right)} \\times e^{\\frac{\\eta^2 \\theta^2 \\sigma^2}{2}} \\\\ = -e^{-\\eta \\left(\\phi+\\theta f(a,o)-\\frac{1}{2}ga^2-\\frac{1}{2}\\eta \\theta^2 \\sigma^2\\right)} \\]The first-order condition (FOC) of the agent with respect to \\(a\\) is given by
\\[ \\frac{\\partial}{\\partial a}\\left(\\phi+\\theta f(a^*,o)-\\frac{1}{2}ga^{*2}-\\frac{1}{2}\\eta\\theta^2\\sigma^2\\right)=0 \\]Since we assume \\(f(a,o)=a+(1-a)o\\), this yields the agent's FOC:
\\[ a^*=\\theta(1-o)/g \\]Setting the participation constraint to bind, we have
\\[ E\\left[U\\right] = -e^{-\\eta\\left(\\phi+\\theta\\left(a^*+(1-a^*)o\\right)-\\frac{1}{2}ga^{*2}-\\frac{1}{2}\\eta\\theta^2\\sigma^2\\right)} = \\underline{U} \\]The above equation implies:
\\[ \\phi + \\theta\\left(a^*+\\left(1-a^*\\right)o\\right)-\\frac{1}{2}ga^{*2}-\\frac{1}{2}\\eta\\theta^2\\sigma^2=w \\]where \\(w\\equiv -\\ln(-\\underline{U})/\\eta\\) is a constant determined by the agent's reservation utility and his coefficient of constant absolute risk aversion. Substituting in \\(a^*=\\theta(1-o)/g\\), we yield
\\[ E\\left[c\\right] = \\phi + \\theta E\\left[V\\right] \\\\ = w+\\frac{\\theta^2(1-o)^2}{2g} +\\frac{1}{2}\\eta\\theta^2\\sigma^2 \\]Thus, by substituting \\(a^*=\\theta(1-o)/g\\) into the principal\u2019s objective function \\(E\\left[V-c\\right]\\), we yield
\\[ a^*+(1-a^*)o-\\left[w+\\frac{\\theta^2(1-o)^2}{2g}+\\frac{1}{2}\\eta\\theta^2\\sigma^2\\right] \\\\ =\\frac{\\theta}{g}(1-o)^2+o-\\left[w+\\frac{\\theta^2(1-o)^2}{2g}+\\frac{1}{2}\\eta\\theta^2\\sigma^2\\right] \\]The principal's FOC with respect to \\(\\theta\\) yields:
\\[ \\theta = \\frac{1}{1+\\eta g \\frac{\\sigma^2}{(1-o)^2}} \\]Other things equal, we can see that the optimal pay-for-performance sensitivity \\(\\theta\\) is decreasing in the firm\u2019s organization capital \\(o\\). Specifically, this substitution effect is from the fact that OC reduces the marginal effect of executive effort on firm outcomes and thus reduces the optimal effort level endogenously.
On the other hand, fixing \(a^*\), the required pay-for-performance sensitivity is \(\theta=(a^* g)/(1-o)\), which is increasing in organization capital \(o\). Thus, to elicit any given level of effort, the incentive compensation must be more high-powered (a "fixed target action" as in Edmans and Gabaix (2016)).
The relation between OC and executive pay-for-performance sensitivity depends critically on the optimal level of effort the principal wants to implement:
Therefore, the model offers two empirical predictions. On the one hand, high OC firms may offer higher pay-for-performance sensitivity to induce executive effort. On the other hand, pay-for-performance sensitivity may be reduced in high OC firms as a result of efficiency gains from the substitution of OC for executive effort.
Now, coming back to the question at the beginning: adding another factor to the principal-agent model may cause the optimal pay structure to change in either direction, even if such a factor has a directional impact on the relation between firm output and the agent's effort. In our case, the factor reduces the marginal effect of the agent's effort on firm output. But one can easily find many other factors that may increase the marginal effect of the agent's effort and yield similar predictions.
Perhaps what's also interesting is that, if we know the directional effect of a factor while observing both pay-for-performance sensitivity and the level of that factor, we may be able to infer whether the principal elicits full executive effort at all costs. Paired with firm performance, could this be an indicator of governance or board ability? These seem like future research questions.
This post is adapted from the online appendix of my JBF paper "Organization Capital and Executive Performance Incentives".
","tags":["Organization Capital","Principal-Agent model"]},{"location":"posts/beta-unlevered-and-levered/","title":"Beta - Unlevered and Levered","text":"Beta is a measure of market risk. This post tries to explain the unlevered and levered betas.
","tags":["Beta"]},{"location":"posts/beta-unlevered-and-levered/#unlevered-firm-u","title":"Unlevered Firm u","text":"If a firm has no debt, it's all equity-financed and thus its equity's beta \(\beta_{E}\) equals its asset's beta \(\beta_{A}\). This beta is also the unlevered beta, \(\beta_{\text{unlevered}}\), since it's unaffected by leverage. The unlevered beta measures the market risk exposure of the firm's shareholders. Let's call this firm \(u\). Hence, we have:
\\[\\begin{equation} \\beta_{\\text{unlevered}}=\\beta_E^u=\\beta_A^u \\end{equation}\\]This equality says that in an unlevered firm, the unlevered beta equals its equity beta and its asset beta.
","tags":["Beta"]},{"location":"posts/beta-unlevered-and-levered/#levered-firm-l","title":"Levered Firm l","text":"If the same firm is partly financed by debt, let's call it firm \\(l\\). The asset of the levered firm \\(l\\) is financed by both equity and debt, and hence the asset's market risk is from both equity and debt. The asset's beta is a weighted average of its equity beta and debt beta.
\\[\\begin{equation} \\beta_A^l = \\frac{E}{E+D(1-t)} \\beta_E^l + \\frac{D(1-t)}{E+D(1-t)} \\beta_D^l \\end{equation}\\]\\(\\beta_A^l\\) measures the change in the return on a portfolio of all firm \\(l\\)'s securities (debt and equity) for each additional one percent change in the market return.
This part is not very hard to understand. The beta of a portfolio is the weighted average beta of its constituents. If you believe that debt beta is zero since the value of debt may not be affected by the equity market, then \\(\\beta_D^l=0\\) and the equation (2) can be simplified to:
\\[ \\begin{align} \\beta_A^l &= \\frac{E}{E+D(1-t)} \\beta_E^l \\newline &= \\frac{1}{1+\\frac{D}{E}(1-t)} \\beta_E^l \\end{align} \\]However, this firm's shareholders are now more exposed to the market risk than before, because leverage increases the variation in the payoff to shareholders. This means the equity's beta of this levered firm is higher than the equity's beta of the unlevered firm, i.e. \\(\\beta_E^l>\\beta_E^u\\).
Note that the levered beta \(\beta_{\text{levered}}\) that we talk about refers to \(\beta_E^l\), which is the equity beta of the levered firm \(l\).
","tags":["Beta"]},{"location":"posts/beta-unlevered-and-levered/#unlevered-vs-levered","title":"Unlevered vs Levered","text":"On the other hand, firm \(u\) and firm \(l\) differ only in capital structure whilst both have the same asset. Let's say we have a portfolio of firm \(u\)'s assets and another portfolio of firm \(l\)'s assets; then these two portfolios should have the same expected return and market risk exposure.2 This means the two portfolios have the same beta, implying:
\\[\\begin{equation}\\beta_A^u = \\beta_A^l \\end{equation}\\]If we substitue in the definition of unlevered and levered beta (equation (1) and (4)):
\\[ \\begin{equation} \\beta_{\\text{unlevered}} = \\frac{1}{1+\\frac{D}{E}(1-t)} \\beta_{\\text{levered}} \\end{equation} \\]or
\\[ \\begin{equation} \\beta_{\\text{levered}} = \\left( 1+\\frac{D}{E}(1-t) \\right) \\beta_{\\text{unlevered}} \\end{equation} \\]This is the formula that we use to lever and unlever beta.1
","tags":["Beta"]},{"location":"posts/beta-unlevered-and-levered/#further-clarification","title":"Further Clarification","text":"The equity beta of a firm with debts is levered. To remove the impact of leverage on shareholders' market risk exposure, we need to unlever this beta in order to get the unlevered beta. This unlevered beta is also called the asset beta.
Note that the asset beta is a synonym for unlevered beta. It is not, however, the asset's beta \(\beta_A^l\) when the firm is leveraged as in equations (2) to (4). This convention is indeed confusing, so throughout this post, I'm using asset's beta to refer to the beta of a portfolio of all securities (debt and equity) of the levered firm.
","tags":["Beta"]},{"location":"posts/beta-unlevered-and-levered/#notations","title":"Notations","text":"This eq.(7) is also named Hamada Equation, where we assumed a zero debt beta. It draws on the Modigliani-Miller theorem on capital structure, and appeared in Prof. Robert Hamada's paper \"The Effect of the Firm's Capital Structure on the Systematic Risk of Common Stocks\" in the Journal of Finance in 1972.\u00a0\u21a9
Modigliani-Miller theorem states that the capital structure should not affect a firm's value.\u00a0\u21a9
Never underestimate what programmers can do.
The code below shows a fully-functioning Bitcoin address generator in obfuscated Python (2.5-2.7), which I saw in an interesting article posted in 2013.
_ =r\"\"\"A(W/2,*M(3*G\n *G*V(2*J%P),G,J,G)+((M((J-T\n )*V((G-S)%P),S,T,G)if(S@(G,J))if(\n W%2@(S,T)))if(W@(S,T);H=2**256;import&h\n ashlib&as&h,os,re,bi nascii&as&k;J$:int(\n k.b2a_hex(W),16);C$:C (W/ 58)+[W%58]if(W@\n [];X=h.new(\"rip em d160\");Y$:h.sha25\n 6(W).digest();I$ d=32:I(W/256,d-1)+\n chr(W%256)if(d>0@\"\"; U$:J(k.a2b_base\n 64(W));f=J(os.urando m(64)) %(H-U(\"AUVRIxl\nQt1/EQC2hcy/JvsA=\"))+ 1;M$Q,R,G :((W*W-Q-G)%P,\n(W*(G+2*Q-W*W)-R)%P) ;P=H-2** 32-977;V$Q=P,L=\n1,O=0:V(Q%W,W,O-Q/W* L,L)if(W@O%P;S,\nT=A(f,U(\"eb5mfvncu6 xVoGKVzocLBwKb/Nst\nzijZWfKBWxb4F5g=\"), U(\"SDra dyajxGVdpPv8DhEI\nqP0XtEimhVQZnEfQj/ sQ1Lg=\"), 0,0);F$:\"1\"+F(W\n [1:])if(W[:1 ]==\"\\0\"@\"\" .join(map(B,C(\n J(W))));K$: F(W +Y(Y(W))[:4]);\n X.update(Y(\"\\4\"+ I(S)+I(T)));B$\n :re.sub(\"[0OIl _]| [^\\\\w]\",\"\",\"\".jo\n in(map(chr,ra nge (123))))[W];print\"Addre\n ss:\",K(\"\\0\"+X.dig est())+\"\\nPrivkey:\",K(\n \"\\x80\"+I(f))\"\"\";exec(reduce(lambda W,X:\nW.replace(*X),zip(\" \\n&$@\",[\"\",\"\",\n\" \",\"=lambda W,\",\")else \"])\n,\"A$G,J,S,T:\"+_))\n
I\u2019ve tested it on Python 2.7 on Ubuntu. Working like a charm.
Warning
Don't use this address! The private key is not private!
","tags":["Bitcoin","Python"]},{"location":"posts/bloomberg-bquant/","title":"Bloomberg BQuant (BQNT)","text":"Bloomberg is developing a new function in the Terminal, called BQuant, BQNT, under the Bloomberg Anywhere license. I happen to be able to test it thanks to a fund manager and find it could be a future way of using Bloomberg Terminal.","tags":["Bloomberg","BQNT","Python","Quant"]},{"location":"posts/bloomberg-bquant/#background","title":"Background","text":"
Bloomberg recently made JupyterLab available inside the Terminal and invited partners to test it out. This function is named BQuant, or BQNT<GO>. It is still under heavy development, but the idea is just great: Jupyter notebooks inside the Bloomberg Terminal! Just before this news, I was helping a fund manager write some alert programs that analyze the equity market and then send email notifications. That didn't go well, because first, it is very easy to breach the data limit using the Bloomberg API (blpapi), and second, I wasn't very comfortable with the presentation of the analysis results: I was using poor HTML code in emails and didn't find a convenient way to insert plots and figures. Besides, I was also writing some backtesting code to evaluate potential trading strategies. And there's a remaining concern: I won't be working there full time and they probably won't have a permanent programmer, so if they ever want to alter parameters a little, it'll be a problem.
But things change: with BQNT, or more specifically the Jupyter notebook, I can make an interactive UI-based application without worrying about the data-limit issue, because Bloomberg also provides a new data retrieval interface, BQL (Bloomberg Query Language). In the past, pulling data through blpapi basically meant retrieving data from the Terminal. BQL, something like SQL, instead submits the query request to Bloomberg's server and gets the data directly from the server; it also supports basic calculations on the server side, further reducing the size of the data being pulled. BQNT also comes with a pre-installed bqplot and wrappers of libraries like ipywidgets, which make visualization much easier and more interactive. And since BQNT is a customized JupyterLab, output cells can be maximized and code hidden. The result is just like a single-page application.
The tearsheet above shows some basic features of BQNT, and of course there are more. There\u2019s a gallery in the Terminal with several demos showing what BQNT can make, including portfolio performance report, security filtering, trading strategy back test, etc., quite inspiring.
With a quick play, I was able to write a multi-security backtest of a Williams %R based strategy with a trailing stop. All input parameters can be varied using sliders, dropdowns, calendars, etc. There is also an autocomplete security-selection widget to assist you in defining the universe. Plots and tables can be aligned nicely using HBox and VBox\u2026 So, I'm impressed, really.
I can foresee that in the future, users of the Bloomberg Terminal will have BQNT-powered applications tailored to their needs. For example, say I want to see a stock's volatility and price plot together with some commodity futures order-book info: BQNT may give you exactly that app. But of course, this is only a rough guess, and there could be many possibles and impossibles ahead of BQNT. I'm a big fan, though.
","tags":["Bloomberg","BQNT","Python","Quant"]},{"location":"posts/bloomberg-bquant/#my-work","title":"My Work","text":"","tags":["Bloomberg","BQNT","Python","Quant"]},{"location":"posts/bloomberg-bquant/#bql-for-data-retrieval","title":"BQL for Data Retrieval","text":"We know there\u2019s a blpapi available already. Using this API one can pull data from a Terminal to Excel, Python, etc. But there is a limit on the frequency or total queries allowed in a certain period, which however isn\u2019t clear. As Bloomberg doesn\u2019t allow local storage of its data, if we need to retrieve a sizeable data too many times, there will be an issue.
The good thing about BQNT is that it comes with a new query system, the so-called BQL. It allows simple calculations to be done on the server side so as to reduce the size of the data transferred. And, people at Bloomberg said, by using BQL we are not very likely to face any data-limit issue again. I haven't done many stress tests, so I can't tell whether there is still a limit or not.
","tags":["Bloomberg","BQNT","Python","Quant"]},{"location":"posts/bloomberg-bquant/#some-quick-examples","title":"Some Quick Examples","text":"Get all component stocks of an index:
import bql\nbq = bql.Service()\nsecurities = bq.univ.members('AS31 Index')\n
Get OHLC data of all component stocks:
from bql.util import get_time_series\nstart_date = '2017-01-01'\nend_date = '2018-01-01'\ndata = get_time_series(securities, ['PX_LAST', 'PX_OPEN', 'PX_HIGH', 'PX_LOW'], start_date, end_date)\n
If I want to know the industry sector of these stocks, all I need is:
req = bql.Request(securities, bq.data.industry_sector())\ndata_industry = bql.combined_df(bq.execute(req))\n
The returned data is a pandas.DataFrame
, which is just awesome!
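Once the result is a pandas.DataFrame, all the usual pandas tooling applies downstream. As a sketch (the column names and layout here are hypothetical, not the exact shape a BQL response takes):

```python
import pandas as pd

# Hypothetical long-format result: one row per (security, date)
data = pd.DataFrame({
    "ID": ["AAA AU Equity", "AAA AU Equity", "BBB AU Equity", "BBB AU Equity"],
    "DATE": pd.to_datetime(["2017-01-31", "2017-02-28"] * 2),
    "PX_LAST": [10.0, 11.0, 20.0, 19.0],
})

# Monthly simple return per security, computed with plain pandas
data["ret"] = data.groupby("ID")["PX_LAST"].pct_change(fill_method=None)
```

From here, pivoting, merging with other fields, or feeding a backtest is standard DataFrame work.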
Jupyter Notebook has always been a favourite environment in data science; no need to say much. A JupyterLab inside the Bloomberg Terminal together with BQL (basically the core idea of BQNT) is no doubt fantastic. For quants who need to do a lot of testing of trading ideas, filtering of securities, etc., this integrated environment is absolutely a good place to sort everything out. Moreover, files in BQNT are synced under a BBA license, so you can easily pick up your work from any Terminal. In our meeting today, the size of this free cloud storage was said to be about 250MB, but it may be upgraded.
For fund managers or traders who want only a ready-to-use application, they can have a programmer make one for them. The BQNT team kindly demonstrated a beta feature where a \u2018consumer view\u2019 can be shared with others; it hides all the Jupyter-Notebook-related parts and presents the final output alone \u2014 just like the Calculator on Windows.
","tags":["Bloomberg","BQNT","Python","Quant"]},{"location":"posts/bloomberg-bquant/#the-r-backtesting-app","title":"The %R Backtesting App","text":"This App I wrote replicates BT<GO> in its back testing outputs, but comes with more flexibility such as trailing stop loss, which isn\u2019t available in BT<GO>. It serves as a demo of BQNT powered application, validating current beta.
The objectives of the app are:
The main UI provides a short description of the trading strategy under backtest, followed by a control panel where we can specify the benchmark, underlying, time range, %R parameters, as well as the trailing-stop-loss percentage. I also put a progress bar and a status bar below for more immediate feedback.
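For reference, the Williams %R indicator the strategy is built on is straightforward to compute. Below is a minimal pandas sketch (the function name and the 14-bar default are my choices, not code from the app):

```python
import pandas as pd

def williams_r(high: pd.Series, low: pd.Series, close: pd.Series, n: int = 14) -> pd.Series:
    """Williams %R: where the close sits in the last n bars' range, scaled to [-100, 0]."""
    highest_high = high.rolling(n).max()
    lowest_low = low.rolling(n).min()
    return -100 * (highest_high - close) / (highest_high - lowest_low)
```

Readings near 0 mean the close is at the top of the lookback range; near -100, the bottom. Typical overbought/oversold cutoffs are -20 and -80.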
","tags":["Bloomberg","BQNT","Python","Quant"]},{"location":"posts/bloomberg-bquant/#outputs","title":"Outputs","text":"If the underlying selected is a single security, e.g. CBA AU Equity, the simple back test output is something like below. An InteractiveLinePlot
linked with a subplot to show equity evolution in selection; a LinePlot
for the price series of the security with markers for enters and exits; and a LinePlot
for the %R indicator.
If the underlying selected is an index, e.g. AS31 Index, the back test is performed on each individual component of the index and results are presented below. A KDEPlot
shows the distribution of total return, max return and min return, followed by a ToggleButtons
to show All, Positive only and Negative only. Equity Return by industry sector and the benchmark return are sorted and plotted below.
Then there is the detailed DataGrid
for all calculated metrics of all securities and of each industry sector, just like the output in BT<GO>. Results can be exported to a spreadsheet, which will be conveniently stored in the BQNT platform, i.e. the \u2018cloud\u2019 of 250MB in total. A qualitative summary of this particular backtest is provided at the end.
This app is by no means a finished work; I basically tried to mix in as many different things as possible. The end product should be one that provides a condensed and conclusive opinion after each run, considering that its users may be fund managers who do not want to get their hands dirty.
","tags":["Bloomberg","BQNT","Python","Quant"]},{"location":"posts/bloomberg-bquant/#other-thoughts","title":"Other Thoughts","text":"In my chat with Bloomberg BQNT team, I visioned BQNT powered apps may be the future way of using Bloomberg. For one, with more internal integration worked out, like the current one with PORT<GO>, surely users can use these UI-based apps to get jobs done. The good thing is that it can put everything you need together in one place, and only those you need. Once consumer view is rolled out, this will be more evident. They also are developing a scheduling module which will run Notebooks automatically, although at an additional cost.
Another thing I suggested is a marketplace for those BQNT-powered apps. Say I've developed a market-analysis application on BQNT; maybe I can put it up for sale on the marketplace so someone else won't need to reinvent the wheel. It could also foster a community around BQNT. The only downside is that BQNT is accessible only under a BBA licence, which isn't cheap. Individual programmers / quants may not be able to afford it, and those in big institutions may not have the time or the right to build and sell apps on it. This kinda sucks.
I can see the huge potential of BQNT, which, if it operates well, can become the new way of using the Bloomberg Terminal \u2014 the learning curve of the Terminal is really too steep for many current and potential users, and they don't get very much out of it. If there were many ready-to-use UI-based applications for their customised needs, things would definitely be better. Unfortunately, BQNT is not open-source, and access to it is very limited (BBA licence), so I don't believe there will be an active community, and hence a marketplace with a variety of apps.
","tags":["Bloomberg","BQNT","Python","Quant"]},{"location":"posts/call-option-value-from-two-approaches/","title":"Call Option Value from Two Approaches","text":"Suppose today the stock price is \\(S\\) and in one year time, the stock price could be either \\(S_1\\) or \\(S_2\\). You hold an European call option on this stock with an exercise price of \\(X=S\\), where \\(S_1<X<S_2\\) for simplicity. So you'll exercise the call when the stock price turns out to be \\(S_2\\) and leave it unexercised if \\(S_1\\).
","tags":["Option"]},{"location":"posts/call-option-value-from-two-approaches/#1-replicating-portfolio-approach","title":"1. Replicating Portfolio Approach","text":"Case 1 Case 2 Stock Price \\(S_1\\) \\(S_2\\) Option: 1 Call of cost \\(c\\) Exercise? No Yes Payoff (to replicate) 0 \\(S_2-X\\) Stock: \\(\\delta\\) shares of cost \\(\\delta S\\) Payoff \\(\\delta S_1\\) \\(\\delta S_2\\) Borrowing PV(K) Repay K KSo we have:
\\[ \\begin{equation} \\delta S_1-K=0 \\end{equation} \\] \\[ \\begin{equation} \\delta S_2 -K = S_2-X \\end{equation} \\]Therefore, the call option value is given by the difference between the cost of \\(\\delta\\) units of shares and the amount of borrowing:
\\[ \\begin{align} c_{REP} &= \\delta S - PV(K) \\newline &= \\delta S - Ke^{-r_f} \\newline &= \\delta S - \\delta S_1e^{-r_f} \\end{align} \\]When \\(\\delta\\) is defined as \\(\\frac{(S_2-X)-0}{S_2-S_1}\\) as in the textbook (at introductory level),
\\[ \\begin{equation} c_{REP}= \\frac{S_2-X}{S_2-S_1}(S - S_1e^{-r_f}) \\end{equation} \\]","tags":["Option"]},{"location":"posts/call-option-value-from-two-approaches/#2-risk-neutral-approach","title":"2. Risk Neutral Approach","text":"Without too much trouble, we can derive the call value using risk neutral approach as
\\[ \\begin{align} c_{RN} &= \\frac{p(S_2-X)+(1-p)\\times0}{e^{r_f}}\\newline &= \\frac{p(S_2-X)+0}{e^{r_f}}\\newline &= p(S_2-X) e^{-r_f} \\end{align} \\]We know that
\\[ \\begin{equation} p\\times \\frac{S_2}{S} + (1-p)\\frac{S_1}{S} = e^{r_f} \\end{equation} \\]so
\\[ \\begin{align} p &= \\frac{e^{r_f}-\\frac{S_1}{S}}{\\frac{S_2}{S}-\\frac{S_1}{S}}\\newline &=\\frac{Se^{r_f}-S_1}{S_2-S_1} \\end{align} \\]Therefore,
\[ \begin{align} c_{RN} &= p(S_2-X) e^{-r_f}\newline &=\frac{Se^{r_f}-S_1}{S_2-S_1}(S_2-X) e^{-r_f}\newline &=\frac{S-S_1e^{-r_f}}{S_2-S_1}(S_2-X) \end{align} \]","tags":["Option"]},{"location":"posts/call-option-value-from-two-approaches/#identical-result-from-the-two-methods","title":"Identical Result from the Two Methods","text":"It's easy to find that
\\[ c_{RN} = c_{REP} \\]Hence, the call option value from replicating portfolio is the same as from risk neutral approach.
","tags":["Option"]},{"location":"posts/compute-jackknife-coefficient-estimates-in-sas/","title":"Compute Jackknife Coefficient Estimates in SAS","text":"In certain scenarios, we want to estimate a model's parameters on the sample for each observation with itself excluded. This can be achieved by estimating the model repeatedly on the leave-one-out samples but is very inefficient. If we estimate the model on the full sample, however, the coefficient estimates will certainly be biased. Thankfully, we have the Jackknife method to correct for the bias, which produces the Jackknifed coefficient estimates for each observation.
","tags":["SAS","Code","Discretionary Accruals","Jackknife"]},{"location":"posts/compute-jackknife-coefficient-estimates-in-sas/#variable-definition","title":"Variable Definition","text":"Let's start with some variable definitions to help with the explanation.
| Variable | Definition |
| --- | --- |
| \(b(i)\) | the parameter estimates after deleting the \(i\)th observation |
| \(s^2(i)\) | the variance estimate after deleting the \(i\)th observation |
| \(X(i)\) | the \(X\) matrix without the \(i\)th observation |
| \(\hat{y}(i)\) | the \(i\)th value predicted without using the \(i\)th observation |
| \(r_i = y_i - \hat{y}_i\) | the \(i\)th residual |
| \(h_i = x_i(X'X)^{-1}x_i'\) | the \(i\)th diagonal of the projection matrix for the predictor space, also called the hat matrix |
| \(RStudent =\frac{r_i}{s(i) \sqrt{1-h_i}}\) | studentized residual |
| \((X'X)_{jj}\) | the \((j,j)\)th element of \((X'X)^{-1}\) |
| \(DFBeta_j = \frac{b_{j} - b_{(i)j}}{s(i)\sqrt{(X'X)_{jj}}}\) | the scaled measure of the change in the \(j\)th parameter estimate calculated by deleting the \(i\)th observation |

","tags":["SAS","Code","Discretionary Accruals","Jackknife"]},{"location":"posts/compute-jackknife-coefficient-estimates-in-sas/#objective","title":"Objective","text":"Compute the coefficient estimates with the \(i\)th observation excluded from the sample, i.e. \(b(i)\), or the Jackknifed coefficient estimate.
","tags":["SAS","Code","Discretionary Accruals","Jackknife"]},{"location":"posts/compute-jackknife-coefficient-estimates-in-sas/#formula","title":"Formula","text":"From the table above, we can get that the \\(j\\)th Jackknifed coefficient estimate \\(b_{(i)j}\\) without using the \\(i\\)th observation is:
\\[b_{(i)j} = b_j - DFBeta_j \\times s(i) \\sqrt{(X'X)_{jj}} \\]Hence,
\\[b_{(i)j} = b_j - DFBeta_j \\times \\frac{r_i}{RStudent\\times \\sqrt{1-h_i}} \\sqrt{(X'X)_{jj}}\\]The good thing is that PROC REG
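The leave-one-out algebra can be sanity-checked numerically. The numpy sketch below uses simulated data (not the SAS route) and the equivalent rank-one update identity \(b(i) = b - (X'X)^{-1}x_i' r_i/(1-h_i)\) to obtain every leave-one-out coefficient vector from a single full-sample fit, then verifies one of them against an explicit refit:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                          # full-sample OLS estimates
resid = y - X @ b                              # residuals r_i
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)    # leverages h_i (hat-matrix diagonal)

# All leave-one-out coefficient vectors at once, without refitting:
# b(i) = b - (X'X)^{-1} x_i' r_i / (1 - h_i)
b_loo = b - (XtX_inv @ X.T * (resid / (1 - h))).T   # shape (n, p)

# Verify one row against an explicit refit with observation 7 deleted
mask = np.arange(n) != 7
b_refit = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
```

This is exactly why the Jackknife route is efficient: one fit plus cheap per-observation updates instead of \(n\) separate regressions.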
produces the coefficient estimate \\(b_j\\) for \\(j=1,2,...K\\), where \\(K\\) is the number of coefficients, and the INFLUENCE
and I
options produce the remaining statistics just enough to compute \\(b(i)\\):
PROC REG
or MODEL
statement Name in the output dataset \\(b_j\\) Outest=
option in PROC REG
<jthVariable>
\\(r_i\\) OutputStatistics=
from INFLUENCE
option in MODEL
statement Residual
\\(RStudent\\) OutputStatistics=
from INFLUENCE
option in MODEL
statement RStudent
\\(h_i\\) OutputStatistics=
from INFLUENCE
option in MODEL
statement HatDiagnol
\\(DFBeta_j\\) OutputStatistics=
from INFLUENCE
option in MODEL
statement DFB_<jthVariable>
\\((X'X)_{jj}\\) InvXPX=
from I
option in MODEL
statement <jthVariable>
","tags":["SAS","Code","Discretionary Accruals","Jackknife"]},{"location":"posts/compute-jackknife-coefficient-estimates-in-sas/#example","title":"Example","text":"","tags":["SAS","Code","Discretionary Accruals","Jackknife"]},{"location":"posts/compute-jackknife-coefficient-estimates-in-sas/#discretionary-accruals","title":"Discretionary accruals","text":"Suppose we want to calculate the firm-level discretionary accruals for each year using the Jones (1991) model and Kothari et al (2005) model. For a firm \\(i\\), we need to first estimate the model for the industry-year excluding firm \\(i\\), then use the coefficient estimates to generate predicted accruals for firm \\(i\\). The firm's discretionary accruals is the actual accruals minus the predicted accruals.
Below is an example PROC REG
that produces three datasets named work.params
, work.outstats
and work.xpxinv
, which contain sufficient statistics to compute the Jackknifed estimates and thus the predicted accruals.
ods listing close; \nproc reg data=work.funda edf outest=work.params;\n /* industry-year regression */\nby fyear sic2;\n /* id is necessary for later matching Jackknifed coefficients to firm-year */\n id key;\n /* Jones Model */\n Jones: model tac = inv_at_l drev ppe / noint influence i;\n /* Kothari Model with ROA */\n Kothari: model tac = inv_at_l drevadj ppe roa / noint influence i;\n ods output OutputStatistics=work.outstats InvXPX=work.xpxinv;\nrun;\nods listing;\n
Full SAS program for estimating 5 different measures of discretionary accruals:
","tags":["SAS","Code","Discretionary Accruals","Jackknife"]},{"location":"posts/compute-weekly-return-from-daily-crsp-data/","title":"Compute Weekly Return from Daily CRSP Data","text":"Computing the weekly returns from the CRSP daily stock data is a common task but may be tricky sometimes. Let's discuss a few different ways to get it done incorrectly and correctly.
TL;DR Take me to the final solution!
Surely -> The solution
","tags":["CRSP","SAS","Code"]},{"location":"posts/compute-weekly-return-from-daily-crsp-data/#incorrect-ways","title":"INCORRECT ways","text":"Let me start with a few incorrect ways, which may seem perfectly okay at first glance. This part is important because it shows you how a small mistake can lead to hard-to-discover bugs.
","tags":["CRSP","SAS","Code"]},{"location":"posts/compute-weekly-return-from-daily-crsp-data/#weekly-index-return-from-daily-data","title":"Weekly index return from daily data","text":"Date as the Friday of the weekDate as the last trading day of the weekUsing intnx()
, we can derive the Friday of the week given a date, as shown below.
proc sql;\n/* Compute weekly marekt return from daily data */\ncreate table mktret_weekly as \nselect distinct date, \n year(date) as Year,\n week(date) as Week,\n case when weekday(date)=6 then date\n else intnx(\"week.6\",date,1) end as FridayOfWeek format=date9.,\n (exp(sum(log(1+sprtrn)))-1)*100 as mktret label=\"Weekly SP500 Index Return (%)\"\nfrom crsp.dsi \nwhere \nyear(date) between &startyear. and &endyear.\ngroup by year(date), week(date) order by date;\nquit;\n
Note that intnx(\"weekday.6\", date, 0)
will give the last Friday, which is not what we want. We want the next Friday of the week for a given date, so we use intnx(\"weekday.6\", date, 1)
. The case...when...
statement ensures that if the given date is already a Friday, we don't go for the next one. Below is a sample output of the mktret_weekly
table generated.
mktret_weekly
Obs Date Year Week FridayOfWeek mktret 1 19860102 1986 0 03JAN1986 -0.1893222 2 19860103 1986 0 03JAN1986 -0.1893222 3 19860106 1986 1 10JAN1986 -2.333080418 4 19860107 1986 1 10JAN1986 -2.333080418 5 19860108 1986 1 10JAN1986 -2.333080418 6 19860109 1986 1 10JAN1986 -2.333080418 7 19860110 1986 1 10JAN1986 -2.333080418 8 19860113 1986 2 17JAN1986 1.1992620931 9 19860114 1986 2 17JAN1986 1.1992620931 We can verify that the FridayOfWeek
indeed gives the Friday of the week. Therefore, the final weekly dataset using Friday as the date identifier just need to keep FridayOfWeek
and mktret
.
proc sql;\n/* Compute weekly marekt return from daily data */\ncreate table mktret_weekly as \nselect distinct\n case when weekday(date)=6 then date else intnx(\"week.6\",date,1) end \nas date format=date9. label=\"Friday of the Week\",\n (exp(sum(log(1+sprtrn)))-1)*100 \nas mktret label=\"Weekly SP500 Index Return (%)\"\nfrom crsp.dsi \nwhere \nyear(date) between &startyear. and &endyear.\ngroup by year(date), week(date) order by date;\nquit;\n
Example output of mktret_weekly
Obs date mktret 1 03JAN1986 -0.1893222 2 10JAN1986 -2.333080418 3 17JAN1986 1.1992620931 4 24JAN1986 -0.959555101 5 31JAN1986 2.5916781551 6 07FEB1986 1.3126828796 %let startyear=1986;\n%let endyear=2019;\nproc sql;\n/* Compute weekly marekt return from daily data */\ncreate table mktret_weekly as \nselect distinct date, \n (exp(sum(log(1+sprtrn)))-1)*100 as mktret label=\"Weekly SP500 Index Return (%)\"\nfrom crsp.dsi where year(date) between &startyear. and &endyear. \ngroup by year(date), week(date) \nhaving date=max(date) \norder by date;\nquit;\n
Note here that it's tempting to use having weekday(date)=6
to make sure the dates are all Friday. However, if Friday in a week is not the last trading day, then the weekly return will be missing. This is why here I use date=max(date)
to ensure non-missing weekly returns. The date is the last trading day in any given week, consistent with the CRSP's daily stock file.
The caveat here is that since the dates are the weekly last trading days, when merged with other weekly datasets, you should be very careful about whether the other dataset is using Friday or the last trading day per week as its date variable.
","tags":["CRSP","SAS","Code"]},{"location":"posts/compute-weekly-return-from-daily-crsp-data/#weekly-stock-return-from-daily-data","title":"Weekly stock return from daily data","text":"Following the same logic, we can calculate the weekly stock returns from daily CRSP data, where dates are aligned to the Friday of the week.
proc sql;\n/* Stocks (ordinary shares only) in the financial sector */\ncreate table stocks as select distinct permno from crsp.stocknames\nwhere shrcd in (10, 11) and floor(siccd/100) between 60 and 67;\n\ncreate table stockrets_weekly as \nselect distinct permno,\n case when weekday(date)=6 then date else intnx(\"week.6\",date,1) end \nas date format=date9. label=\"Friday of the Week\",\n (exp(sum(log(1+ret)))-1)*100 as ret label=\"Weekly Return (%)\"\nfrom crsp.dsf \nwhere \nyear(date) between &startyear. and &endyear.\nand permno in (select * from stocks) \n and prc>0 and not missing(ret)\ngroup by year(date), week(date), permno order by permno, date;\nquit;\n
","tags":["CRSP","SAS","Code"]},{"location":"posts/compute-weekly-return-from-daily-crsp-data/#whats-wrong","title":"What's wrong?","text":"The code above seems okay. We know that CRSP daily stock file contains many observations where the daily trading volume is 0, in which case the price is recorded as the negative bid-ask midpoint. Therefore, we restrict to only those with positive stock prices. So what's the problem?
The problem is that a week can span two calendar years.
For example, check out the last week of 2019:
| Mon | Tue | Wed | Thu | Fri | Sat | Sun |
| --- | --- | --- | --- | --- | --- | --- |
| 30 | 31 | 1 | 2 | 3 | 4 | 5 |
function in SAS. Another consequence is that when there're many years of data, there will be a lot of duplicates.
Now let's explore two ways that avoid this mistake. Although both generate the same result (there can be a few differences, see the caveat), the second one is much faster.
","tags":["CRSP","SAS","Code"]},{"location":"posts/compute-weekly-return-from-daily-crsp-data/#1-start-with-a-list-of-dates-slow-version","title":"1. Start with a list of dates (slow version)","text":"Now we can write some correct code to compute the weekly returns. We'll generate a series of Fridays first, then we merge based on the past 5 calendar days. This will ensure all trading days with non-missing data will be included in the weekly return calculation, and correct the mistake mentioned above.
%let start_date = 01Jan1986;\n%let end_date = 31Dec2019;\n\n/* Generate a series of Fridays */\ndata fridays;\ndate=\"&start_date\"d;\ndo while (date<=\"&end_date\"d);\n if weekday(date)=6 then output;\n date=intnx('day', date, 1, 's');\nend;\nformat date date9.;\nrun;\n
Weekly index return from daily data (as at Friday)Weekly stock return from daily data (as at Friday) proc sql;\n/* Compute weekly index return from daily data */\ncreate table mktret_weekly as \nselect distinct a.date,\n (exp(sum(log(1+sprtrn)))-1)*100 \nas mktret label=\"Weekly SP500 Index Return (%)\"\nfrom fridays as a left join crsp.dsi as dsi\non dsi.date between intnx('day', a.date, -4) and a.date\ngroup by a.date\norder by a.date;\nquit;\n
Note that this version is inefficient and takes a long time to run.
proc sql;\n/* Stocks (ordinary shares) in the financial sector (2-digit SIC=60-67) */\ncreate table stocks as select distinct permno from crsp.stocknames\nwhere shrcd in (10, 11) and floor(siccd/100) between 60 and 67;\n\n/* Compute weekly stock return from daily data */\ncreate table stockrets_weekly as \nselect distinct a.date, dsf.permno, dsf.hsiccd,\n (exp(sum(log(1+ret)))-1)*100 as ret label=\"Weekly Return (%)\"\nfrom fridays as a left join crsp.dsf as dsf\non dsf.date between intnx('day', a.date, -4) and a.date\n and dsf.permno in (select * from stocks) \n and dsf.prc>0 and not missing(dsf.ret)\ngroup by dsf.permno, a.date\norder by dsf.permno, a.date;\nquit;\n
","tags":["CRSP","SAS","Code"]},{"location":"posts/compute-weekly-return-from-daily-crsp-data/#2-group-using-aligned-dates-fast-version-with-caveat","title":"2. Group using aligned dates (fast version with caveat)","text":"This version uses a similar logic from the previous incorrect one, but it groups based on the aligned dates instead of year(date)
and week(date)
.
proc sql;\n/* Compute weekly stock return from daily data */\ncreate table stockrets_weekly2 as \nselect distinct permno, hsiccd,\n case when weekday(date)=6 then date else intnx(\"week.6\",date,1) end \nas date format=date9. label=\"Friday of the Week\",\n (exp(sum(log(1+ret)))-1)*100 as ret label=\"Weekly Return (%)\"\nfrom crsp.dsf (keep=permno date ret prc shrout hsiccd)\nwhere \n date between \"01Jan1986\"d and \"31Dec2019\"d\n and permno in (select * from stocks) \n and prc>0 and not missing(ret)\ngroup by permno, calculated date order by permno, date;\nquit;\n
Caveat
If the beginning and ending dates, \"01Jan1986\"d and \"31Dec2019\"d
in the example, are not Fridays, then the first and last weekly returns for all stocks will be incorrect, because they are not using all the daily data in those weeks.
To fix this minor issue, simply extend the beginning and ending dates beyond your sample period by a few weeks.
","tags":["CRSP","SAS","Code"]},{"location":"posts/convert-between-numeric-and-character-variables/","title":"Convert Between Numeric and Character Variables","text":"Converting between numeric and character variables is one of the most frequently encountered issues when processing datasets. This article explains how to do this conversion correctly and efficiently.
","tags":["SAS","Stata","Code"]},{"location":"posts/convert-between-numeric-and-character-variables/#numeric-to-character","title":"Numeric to Character","text":"Assume there's an imported dataset named filings
, where cik
is stored as a numeric variable as shown below:
Because cik
is of different digits, to convert the numeric cik
into a character variable, the natural procedure is to pad it with leading zeros. For example, cik
(Central Index Key) itself is a 10-digit number used by SEC.
In SAS, convert numeric variable to string with leading zeros (assuming 10-digit fixed length) is done via PUT()
function:
data filings(drop=cik); set filings;\n cik_char = put(cik, z10.); \nrun;\n
Tip
PUT()
function also works in PROC SQL
.
The generated cik_char
variable is of format and informat $10.
, and the dataset becomes:
In STATA, convert numeric variable to string with leading zeros (assuming 6-digit fixed length) can be achieved via the string()
function.
gen char_var = string(num_var,\"%06.0f\")\n
","tags":["SAS","Stata","Code"]},{"location":"posts/convert-between-numeric-and-character-variables/#character-to-numeric","title":"Character to Numeric","text":"In SAS, converting a character variable to a numeric one uses the INPUT()
function:
var_numeric = input(var_char, best12.);\n
In STATA, this conversion be can be done via either real()
function or destring
command.
gen num_var = real(char_var);\n
The real()
function works on a single variable. destring
command can convert all character variables into numeric in one go.
destring, repalce\n
Warning
If a character variable has non-numeric characters in it, then it will not be converted. In such a case, you may choose to use the encode
command, although it in fact is generating categories.
A more detailed explanation with examples is available at stats.idre.ucla.edu
","tags":["SAS","Stata","Code"]},{"location":"posts/correlated-random-effects/","title":"Correlated Random Effects","text":"Can we estimate the coefficient of gender while controlling for individual fixed effects? This sounds impossible as an individual's gender typically does not vary and hence would be absorbed by individual fixed effects. However, Correlated Random Effects (CRE) may actually help.
At last year's FMA Annual Meeting, I learned this CRE estimation technique when discussing a paper titled \"Gender Gap in Returns to Publications\" by Piotr Spiewanowski, Ivan Stetsyuk and Oleksandr Talavera. Let me recollect my memory and summarize the technique in this post.
","tags":["Econometrics"]},{"location":"posts/correlated-random-effects/#random-intercept-effect-model","title":"Random Intercept (Effect) Model","text":"Consider a random intercept model for a firm-year regression, e.g., to examine the relationship between firm performance, R&D expense, and whether the firm is VC-backed,
\\[ \\begin{equation} y_{it} = \\beta_0 + \\beta_1 x_{it} + \\beta_2 c_i + \\mu_i + \\varepsilon_{it} \\end{equation} \\]where,
We can estimate \\(\\beta_0\\), \\(\\beta_1\\), \\(\\beta_2\\) and \\(\\mu_i\\). Assuming that we've properly controlled for observable firm characteristics, \\(\\beta_1\\) tells the relationship between R&D expenditure and firm performance. \\(\\beta_2\\) tells the difference in firm performance between VC-backed and non-VC-backed firms.
However, we cannot rely on \\(\\beta_2\\) to assert whether VC-back firms have better or worse performance. The drawback here is that we are unable to exhaustively control for all other time-invariant firm attributes that correlate with both firm performance and VC investment, thereby leading to biased \\(\\beta_2\\) estimate due to omitted variables.
Similarly, our estimate of \\(\\beta_1\\) may also be biased if some omitted firm-specific and time-invariant attributes correlate with both R&D expenditure and firm performance.
","tags":["Econometrics"]},{"location":"posts/correlated-random-effects/#fixed-effect-model","title":"Fixed Effect Model","text":"If we subtract the \"between\" model
\\[ \\begin{equation} \\bar{y}_{i} = \\beta_0 + \\beta_1 \\bar{x}_{it} + \\beta_2 c_i + \\mu_i + \\bar{\\varepsilon}_{i} \\end{equation} \\]from Equation (1), we have the fixed effect model in the demeaned form:
\\[ \\begin{equation} (y_{it} - \\bar{y}_i) = \\beta_1 (x_{it}-\\bar{x}_i) + (\\varepsilon_{it} - \\bar{\\varepsilon}_{i}) \\end{equation} \\]The fixed effect model above removes the firm-level error \\(\\mu_i\\) so that the within effect (or fixed effect) estimate of \\(\\beta_1\\) is unbiased even if \\(E(\\mu_i|x_{it}) \\ne 0\\). This helps a lot and is why most of the time we control for firm fixed effects when estimating firm-year regressions.
However, the firm-level variable \\(c_i\\) is also removed. It is now impossible to estimate \\(\\beta_2\\) as in the random intercept model. In fact, we can no longer estimate the effect of any firm-level time-invariant attributes after controlling for firm fixed effects.
","tags":["Econometrics"]},{"location":"posts/correlated-random-effects/#hybrid-model","title":"Hybrid Model","text":"So, how can we estimate \(\beta_2\) when firm fixed effects are controlled for?
The same question, if paraphrased differently, is how to estimate the within effect in a random intercept model.
Interestingly, we can decompose the firm-year level variable \(x_{it}\) into two components, a between component \(\bar{x}_i\) and a within (cluster-demeaned) component \((x_{it}-\bar{x}_i)\), so that
\\[ \\begin{equation} y_{it} = \\beta_0 + \\beta_1 (x_{it}-\\bar{x}_i) + \\beta_2 c_i + \\beta_3 \\bar{x}_i + \\mu_i + \\varepsilon_{it} \\end{equation} \\]It is apparent that the \\(\\beta_1\\) estimate gives the within effect as in the fixed effect model, identical to \\(\\beta_1\\) in Equation (3).
Moreover, the firm-level variable \(c_i\) is kept in the model and we can estimate \(\beta_2\). The inclusion of the cluster mean \(\bar{x}_i\) corrects the estimate of \(\beta_2\) for between-cluster differences in \(x_{it}\). Note, however, that for the \(\beta_2\) estimate to be unbiased, we still require \(E(\mu_i|x_{it},c_i)=0\) and \(\mu_i|x_{it},c_i \sim N(0,\sigma^2_\mu)\).
","tags":["Econometrics"]},{"location":"posts/correlated-random-effects/#correlated-random-effect-model","title":"Correlated Random Effect Model","text":"A related model is the correlated random effects model (Wooldridge 2010), which relaxes the assumption of zero correlation between the firm-level error \(\mu_i\) and the firm-year variable \(x_{it}\). Specifically, it assumes that \(\mu_i=\pi\bar{x}_i + v_i\), so Equation (1) becomes
\\[ \\begin{align} y_{it} &= \\beta_0 + \\beta_1 x_{it} + \\beta_2 c_i + \\mu_i + \\varepsilon_{it} \\\\ &= \\beta_0 + \\beta_1 x_{it} + \\beta_2 c_i + \\pi\\bar{x}_i + v_i + \\varepsilon_{it} \\end{align} \\]By including the cluster mean \\(\\bar{x}_i\\), we can account for the correlation between the random effects \\(\\mu_i\\) and the independent variable \\(x_{it}\\) and obtain consistent estimates of the coefficients. The inclusion of \\(\\bar{x}_i\\) in the random intercept (effect) model makes the estimate for \\(\\beta_1\\) the same within effect (fixed-effect) estimate as in Equation (4).
Of course, as the time-invariant firm-specific attribute \\(c_i\\) remains in the model, we can estimate \\(\\beta_2\\) as in the hybrid model.
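These equivalences are easy to check by simulation. Below is a minimal sketch (the data-generating process and all names are mine, not from the post): pooled OLS is biased when the firm effect correlates with the regressor, while the within (fixed effect) estimator and a Mundlak-style CRE regression that adds the firm mean of x both recover the true coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)
n_firms, n_years = 500, 10
beta1 = 1.0

# Firm effect mu_i is built into x_it so that E(mu_i | x_it) != 0
mu = rng.normal(size=n_firms)
x = mu[:, None] + rng.normal(size=(n_firms, n_years))
y = beta1 * x + mu[:, None] + rng.normal(size=(n_firms, n_years))

def ols(X, z):
    # Least-squares coefficients of z on the columns of X
    return np.linalg.lstsq(X, z, rcond=None)[0]

# Pooled OLS: biased upward because mu_i loads on x_it
b_pooled = ols(np.column_stack([np.ones(x.size), x.ravel()]), y.ravel())[1]

# Within (fixed effect) estimator: demean x and y by firm
xd = (x - x.mean(axis=1, keepdims=True)).ravel()
yd = (y - y.mean(axis=1, keepdims=True)).ravel()
b_within = ols(xd[:, None], yd)[0]

# CRE / Mundlak: keep x in levels but add the firm mean of x as a regressor
xbar = np.repeat(x.mean(axis=1), n_years)
b_cre = ols(np.column_stack([np.ones(x.size), x.ravel(), xbar]), y.ravel())[1]
```

In a balanced panel the CRE slope on x equals the within estimate exactly (Mundlak's result), while the pooled estimate is pushed away from the truth by the omitted firm effect.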
","tags":["Econometrics"]},{"location":"posts/correlated-random-effects/#estimation","title":"Estimation","text":"Note that there are many caveats for estimating CRE.
To be discussed.
","tags":["Econometrics"]},{"location":"posts/correlated-random-effects/#further-readings","title":"Further Readings","text":"This post is based on Within and between Estimates in Random-Effects Models: Advantages and Drawbacks of Correlated Random Effects and Hybrid Models.
Some other suggested readings include:
The Herfindahl–Hirschman (HHI) Index is a well-known market concentration measure determined by two factors: 1) the size distribution (variance) of firms, and 2) the number of firms in the market.
Intuitively, having a hundred similar-sized gas stations in town means a far less concentrated environment than having just one or two; and when the number of firms is constant, their size distribution (variance) determines the magnitude of market concentration.
Since these two properties jointly determine the HHI measure of concentration, naturally we want a decomposition of HHI that reflects these two dimensions respectively. This is particularly useful when two distinct markets have the same level of HHI but their concentration results from different sources. Note that these two markets do not have to be industry A versus industry B; they can be the same industry niche in two geographical areas, for example.
Thus, we can think of HHI as the sum of 1) the actual market state's deviation from the state where all firms have the same size, and 2) its deviation from a fully competitive environment with an infinite number of firms. Some simple math can solve our problem.
","tags":["HHI"]},{"location":"posts/decomposing-hhi-index/#some-math","title":"Some math","text":"Let's say in a market there are \(n\) firms sized \(x_1, x_2, ... x_n\), so we can describe the market as a vector in \(\mathbb R_+^n\):
\\[ \\mathbf{x}=(x_1, x_2, ... x_n) \\]In the first scenario where all firms' sizes are equal, we can describe it with:
\\[ \\mathbf{\\bar{x}}=(\\bar{x}, \\bar{x}, ... \\bar{x}) \\]where \\(\\bar{x}=\\frac{1}{n} \\sum_{i=1}^{n}x_i\\) is the average firm size.
The Euclidean distance between the point \\(\\mathbf{x}\\) and \\(\\mathbf{\\bar{x}}\\), denoted as \\(d(\\mathbf{x}, \\mathbf{\\bar{x}})\\), is thus
\\[ d(\\mathbf{x}, \\mathbf{\\bar{x}})=\\sqrt{ \\sum_{i=1}^{n} x_{i}^2 - n \\bar{x}^2 } \\]For the ease of discussion, let's consider the other spectrum of the second scenario where there's only one firm in the market instead of infinite firms, assuming its size is the sum of all firms in the first scenario (i.e. its size is \\(n\\bar{x}\\)), we know that this market is the most concentrated state, \\(\\mathbf{x^*}\\). In other words, its distance to the market state in scenario one is the largest.
\[ \max_{x} d(\mathbf{x}, \mathbf{\bar{x}})=d(\mathbf{x^*}, \mathbf{\bar{x}}) = ... = \sqrt{ (n-1)n \bar{x}^2 } \]Hence, the distance of any market state \(\mathbf{x}\) to the first scenario, the equidistribution point \(\mathbf{\bar{x}}\), should lie between \(0\) and \(d(\mathbf{x^*}, \mathbf{\bar{x}})\).
Thus we can derive a relative index of concentration (when \\(n>1\\)) as \\(\\tau\\):
\\[ \\tau=\\frac{ d(\\mathbf{x}, \\mathbf{\\bar{x}}) }{ d(\\mathbf{x^*}, \\mathbf{\\bar{x}}) } \\in [0, 1] \\]Now, given the definition of Herfindahl-Hirschman Index \\(H\\) that
\\[ H=\\sum_{i=1}^{n} (\\frac{x_i}{n\\bar{x}})^2 \\]we can get:
\\[ \\tau=\\sqrt{\\frac{n}{n-1}(H-\\frac{1}{n})} = \\sqrt{\\frac{nH-1}{n-1}} \\]Here comes the important implications. Recall that \\(\\tau\\) represents the ratio of the distance between a market state and the equidistribution point to the maximum possible distance given a total market size of \\(n\\bar{x}\\).
When we observe a market state \(\mathbf{x}=(x_1, x_2, ... x_n)\) at a given time, the total market size is fixed, and thus \(\tau\) varies only with the distance between the observed actual market state and the equidistribution state where all firms have the same size. This implies that \(\tau\) could be a measure of the first determinant of market concentration, i.e. the size distribution (variance) of firms.
Further, \\(\\tau\\) represents a sequence of functions whose limit is \\(\\sqrt{H}\\) as \\(n \\to +\\infty\\), when the market is in a fully competitive environment. Thus, given a \\(H'\\) from the knowledge of \\(n'\\) and \\(\\mathbf{x'}\\), we know there is one and only one matching \\(\\tau'\\) and its limit of \\(\\sqrt{H'}\\) in the fully competitive environment.
The graph below shows that \\(H\\) can therefore be decomposed into two components, that is
\\[ H = E_i + E_n \\]where \\(E_i = \\tau^2\\), and \\(E_n = H-\\tau^2\\).
We mentioned before that \(\tau\) can be a measure of the market concentration resulting from the size distribution (variance) of firms, and \(E_i=\tau^2\) serves this purpose even better. Since \(E_i\) is smaller than \(H\), the remainder enables us to measure the concentration contributed by the number of firms, measured by \(E_n\).
This decomposition is also appealing in that \(E_n\), from the graph above, is effectively the horizontal difference between the two curves, i.e. the 'distance' between the actual market state and the fully competitive market with an infinite number of firms (scenario two).
Thus, it's safe to say this decomposition produces two components explaining the observed market concentration, 1) \\(E_i\\), the inequality of firm sizes effect, and 2) \\(E_n\\), the number of firms effect.
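The decomposition can be computed directly from a vector of firm sizes. A minimal sketch (the function name is mine); note that E_i coincides with the normalized HHI, (H - 1/n)/(1 - 1/n):

```python
import numpy as np

def hhi_decomposition(sizes):
    """Split a market's HHI into E_i (size inequality) and E_n (number of firms)."""
    x = np.asarray(sizes, dtype=float)
    n = x.size
    shares = x / x.sum()
    H = float(np.sum(shares**2))
    E_i = (n * H - 1) / (n - 1)   # tau^2, equal to the normalized HHI
    E_n = H - E_i                 # remainder, equal to (1 - H)/(n - 1)
    return H, E_i, E_n
```

For four equal-sized firms, H = 0.25 comes entirely from E_n; for two firms with a 75/25 split, H = 0.625 splits into E_i = 0.25 and E_n = 0.375.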
Another finding from the graph is that as market concentration measured by \(H\) increases, the relative importance of the two components changes.
When \\(H\\) is small, most of the concentration is resulted from \\(E_n\\) as highlighted below, which means the number of firms has a greater impact on market concentration.
When \\(H\\) is larger, on the other hand, \\(E_i\\) contributes more to \\(H\\), which means the firm size inequality plays a bigger role in market concentration.
A potential implication for regulators who are concerned about market concentration, I think, is to 1) focus more on reducing the entry barrier if the current concentration level is moderate, and to 2) focus more on antitrust if the concentration level is already high.
Another implication for researchers is that even though \(H \in [\frac{1}{n}, 1]\) is affected by the number of firms in a market, we should not attempt to use the \(\text{normalized HHI}=\frac{H-1/n}{1-1/n} \in [0,1]\). The reason is now very simple and clear: the normalized HHI is nothing but \(E_i=\tau^2\), which reflects only the market concentration due to the inequality of firm sizes. When we compare across markets or the same market over time, a market with 1,000 firms apparently has a different competitive landscape than a market with only 2 firms.
","tags":["HHI"]},{"location":"posts/decomposing-hhi-index/#acknowledgement","title":"Acknowledgement","text":"This post is largely a replication of the paper \"A Decomposition of the Herfindahl Index of Concentration\" by Giacomo de Gioia in 2017.
","tags":["HHI"]},{"location":"posts/docker-nginx-letsencrypt/","title":"Setup Docker/Nginx and Let's Encrypt on Ubuntu","text":"This is a note for setting up a Docker, Nginx and Let's Encrypt environment on Ubuntu 20.04 LTS.
","tags":["Docker","Nginx","WordPress"]},{"location":"posts/docker-nginx-letsencrypt/#create-a-ubuntu-2004-lts-instance","title":"Create a Ubuntu 20.04 LTS instance","text":"","tags":["Docker","Nginx","WordPress"]},{"location":"posts/docker-nginx-letsencrypt/#install-docker-using-the-convenience-script","title":"Install Docker using the convenience script","text":"$ curl -fsSL https://get.docker.com -o get-docker.sh\n$ sudo sh get-docker.sh\n
","tags":["Docker","Nginx","WordPress"]},{"location":"posts/docker-nginx-letsencrypt/#manage-docker-as-a-non-root-user","title":"Manage Docker as a non-root user","text":"If you don't want to preface the docker
command with sudo
, create a Unix group called docker
and add users to it. When the Docker daemon starts, it creates a Unix socket accessible by members of the docker
group.
To create the docker
group and add your user:
docker
group.$ sudo groupadd docker\n
docker
group.$ sudo usermod -aG docker $USER\n
Log out and log back in so that your group membership is re-evaluated.
On Linux, you can also run the following command to activate the changes to groups:
$ newgrp docker
","tags":["Docker","Nginx","WordPress"]},{"location":"posts/docker-nginx-letsencrypt/#configure-docker-to-start-on-boot","title":"Configure Docker to start on boot","text":"$ sudo systemctl enable docker\n
To disable this behavior, use disable
instead.
$ sudo systemctl disable docker\n
","tags":["Docker","Nginx","WordPress"]},{"location":"posts/docker-nginx-letsencrypt/#install-docker-compose","title":"Install Docker Compose","text":"On Linux, you can download the Docker Compose binary from the Compose repository release page on GitHub. Follow the instructions from the link, which involve running the curl
command in your terminal to download the binaries. These step-by-step instructions are also included below.
$ sudo curl -L \"https://github.com/docker/compose/releases/download/1.25.5/docker-compose-$(uname -s)-$(uname -m)\" -o /usr/local/bin/docker-compose\n
Note
To install a different version of Compose, substitute 1.25.5
with the version of Compose you want to use.
$ sudo chmod +x /usr/local/bin/docker-compose\n
Note
If the command docker-compose
fails after installation, check your path. You can also create a symbolic link to /usr/bin
or any other directory in your path. For example:
$ sudo ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose\n
","tags":["Docker","Nginx","WordPress"]},{"location":"posts/docker-nginx-letsencrypt/#set-up-nginx-proxy","title":"Set up Nginx-Proxy","text":"Create a unique network for nginx-proxy and other Docker containers to communicate through.
$ docker network create nginx-proxy\n
Create a directory nginx-proxy
for the compose file.
$ mkdir nginx-proxy && cd nginx-proxy\n
In the nginx-proxy directory, create a new file named docker-compose.yml
and paste in the following text:
docker-compose.yml
for nginx-proxy version: '3'\nservices:\nnginx:\nimage: nginx\nrestart: always\ncontainer_name: nginx-proxy\nports:\n- \"80:80\"\n- \"443:443\"\nvolumes:\n- conf:/etc/nginx/conf.d\n- vhost:/etc/nginx/vhost.d\n- html:/usr/share/nginx/html\n- certs:/etc/nginx/certs\nlabels:\n- \"com.github.jrcs.letsencrypt_nginx_proxy_companion.nginx_proxy=true\"\ndockergen:\nimage: jwilder/docker-gen\nrestart: always\ncontainer_name: nginx-proxy-gen\ndepends_on:\n- nginx\ncommand: -notify-sighup nginx-proxy -watch -wait 5s:30s /etc/docker-gen/templates/nginx.tmpl /etc/nginx/conf.d/default.conf\nvolumes:\n- conf:/etc/nginx/conf.d\n- vhost:/etc/nginx/vhost.d\n- html:/usr/share/nginx/html\n- certs:/etc/nginx/certs\n- /var/run/docker.sock:/tmp/docker.sock:ro\n- ./nginx.tmpl:/etc/docker-gen/templates/nginx.tmpl:ro\nletsencrypt:\nimage: jrcs/letsencrypt-nginx-proxy-companion\nrestart: always\ncontainer_name: nginx-proxy-le\ndepends_on:\n- nginx\n- dockergen\nenvironment:\nNGINX_PROXY_CONTAINER: nginx-proxy\nNGINX_DOCKER_GEN_CONTAINER: nginx-proxy-gen\nvolumes:\n- conf:/etc/nginx/conf.d\n- vhost:/etc/nginx/vhost.d\n- html:/usr/share/nginx/html\n- certs:/etc/nginx/certs\n- /var/run/docker.sock:/var/run/docker.sock:ro\nvolumes:\nconf:\nvhost:\nhtml:\ncerts:\nnetworks:\ndefault:\nexternal:\nname: nginx-proxy\n
Inside of the nginx-proxy
directory, use the following curl
command to copy the developer\u2019s sample nginx.tmpl
file to your VPS.
$ curl https://raw.githubusercontent.com/jwilder/nginx-proxy/master/nginx.tmpl > nginx.tmpl\n
Increase upload file size
To increase the maximum upload size, for example, add client_max_body_size 100M;
to the server{}
section in the nginx.tmpl
template file. For WordPress,
Running nginx-proxy
.
$ docker-compose up -d\n
","tags":["Docker","Nginx","WordPress"]},{"location":"posts/docker-nginx-letsencrypt/#add-a-wordpress-container","title":"Add a WordPress container","text":"Create a directory for the docker-compose.yml
with:
docker-compose.yml
for WordPress container version: \"3\"\nservices:\ndb_node_domain:\nimage: mysql:5.7\nvolumes:\n- db_data:/var/lib/mysql\nrestart: always\nenvironment:\nMYSQL_ROOT_PASSWORD: somewordpress\nMYSQL_DATABASE: wordpress\nMYSQL_USER: wordpress\nMYSQL_PASSWORD: wordpress\ncontainer_name: wp_test_db\nwordpress:\ndepends_on:\n- db_node_domain\nimage: wordpress:latest\nexpose:\n- 80\nrestart: always\nenvironment:\nVIRTUAL_HOST: blog.example.com\nLETSENCRYPT_HOST: blog.example.com\nLETSENCRYPT_EMAIL: foo@example.com\nWORDPRESS_DB_HOST: db_node_domain:3306\nWORDPRESS_DB_USER: wordpress\nWORDPRESS_DB_PASSWORD: wordpress\ncontainer_name: wp_test\nvolumes:\ndb_data:\nnetworks:\ndefault:\nexternal:\nname: nginx-proxy\n
To create a second WordPress container, add MYSQL_TCP_PORT
environment variable and set it to a different port.
Enter the bash of the WordPress container.
$ docker exec -t wordpress_container_name bash\n
Move inside your /var/www/html directory (already there if you\u2019re using the standard Docker Compose image). Run the following command to insert the values.
$ sed -i '/^# END WordPress.*/i php_value upload_max_filesize 256M\\nphp_value post_max_size 256M' .htaccess\n
Note
To restore the values, run $ sed -i \"11,12d\" .htaccess
The Wharton Research Data Services (WRDS) allows one to submit and execute SAS programs to the cloud. WRDS has an instruction on accessing WRDS data from SAS on our own PCs. Generally, you should use:
%let wrds=wrds-cloud.wharton.upenn.edu 4016;\noptions comamid=TCP remote=WRDS;\nsignon username=_prompt_;\n\nrsubmit;\n\n/* Code for remote execution goes here. */\nendrsubmit;\nsignoff;\n
However, if you want to save the effort of entering username and password every time, you'll need to encode your password. Concluding the two articles, basically you just need to follow the steps below.
","tags":["SAS"]},{"location":"posts/encode-password-for-sas-remote-submission/#simple-steps","title":"Simple Steps","text":"First, open your SAS program locally on your PC, run the following command and replace 1234567890
with your WRDS password:
proc pwencode in=\"1234567890\"; run;\n
The output {SAS002}23AA9C2811439227077603C8365060A44800CA1F
is the encoded password (which is 1234567890
in this example).
Do NOT share your SAS program with encoded password!
Encoded password functions the same as your plain-text password. You should never make public your password in any way.
Next, put the following statements at the beginning of your SAS program and replace my_username
with your WRDS username:
%let wrds=wrds-cloud.wharton.upenn.edu 4016;\noptions comamid=TCP remote=WRDS;\nsignon username=my_username password=\"{SAS002}23AA9C2811439227077603C8365060A44800CA1F\";\n
After these statements, you'll be able to submit your SAS program remotely to and execute on the WRDS server by enclosing your statements with rsubmit
and endrsubmit
. An example would be:
rsubmit;\nproc download data=comp.funda out=funda; run;\nendrsubmit;\n
As you can guess, this statement actually downloads the whole Compustat Fundamentals Annual to the local work directory, with the downloaded dataset also named funda
.
Lastly, after everything, you should run signoff
to close the connection with WRDS.
Full code is as below.
%let wrds=wrds-cloud.wharton.upenn.edu 4016;\noptions comamid=TCP remote=WRDS;\nsignon username=my_username password=\"{SAS002}23AA9C2811439227077603C8365060A44800CA1F\";\n\nrsubmit;\nproc download data=comp.funda out=funda; run;\nendrsubmit;\nsignoff;\n
Replace my_username
and the encoded password with your actual WRDS username and encoded password, paste it in the SAS program editor and press F3
. You'll be downloading comp.funda
in a few seconds!
I made a short video introduction as well, available on my YouTube channel.
","tags":["SAS"]},{"location":"posts/estimate-organization-capital/","title":"Estimate Organization Capital","text":"As in Eisfeldt and Papanikolaou (2013), we obtain firm-year accounting data from the Compustat and compute the stock of organization capital for firms using the perpetual inventory method that recursively calculates the stock of OC by accumulating the deflated value of SG&A expenses.
","tags":["Code","SAS"]},{"location":"posts/estimate-organization-capital/#organization-capital","title":"Organization Capital","text":"\\[ OC_{i,t} = (1-\\delta_{OC})OC_{i,t-1} + \\frac{SGA_{i,t}}{CPI_t} \\]where \\(SGA_{i,t}\\) is firm \\(i\\)'s SG&A expenses in year \\(t\\), \\(CPI_t\\) is the consumer price index, and \\(\\delta_{OC}\\) is the depreciation rate of OC stock, which is set to be 15% as used by the U.S. Bureau of Economic Analysis (BEA). The initial value of OC stock is set to:
\\[ OC_{i,0} = \\frac{SGA_{i,1}}{g+\\delta_{OC}} \\]where \\(g\\) is the average real growth rate of firm-level SG&A expenses, which is 10% in Eisfeldt and Papanikolaou (2013) or specific for an industry-decade in Li, Qiu and Shen (2018).
","tags":["Code","SAS"]},{"location":"posts/estimate-organization-capital/#code","title":"Code","text":"This code estimates the organization capital for all Compustat firm-years.
Note that it requires an external dataset of CPI. You need to name it cpiaucsl
and store it in your WRDS home directory.
%let wrds=wrds-cloud.wharton.upenn.edu 4016;\noptions comamid=TCP remote=WRDS;\nsignon username=_prompt_;\n\nrsubmit;\n\n/* ==============================================================================================\n * This SAS program calcualtes the firm-year Organization Capital, measured by the capitalized \n * SG&A expenses using perpetual inventory method.\n * See e.g. Eisfeldt and Papanikolaou (2013), Li, Qiu and Shen (2018), Gao, Leung and Qiu (2021).\n *\n * Input: Compustat from WRDS.\n * Output:\n * sgastock: capitalized SG&A expenses\n * oc: capitalized SG&A expenses scaled by CPI-adjusted total assets\n * indadj_oc: industry median adjusted oc\n * rank_oc: annual decile rank of oc\n * rank_indadj_oc: annual decile rank of indadj_oc\n *\n * Note:\n * This program requires an external dataset of CPI named `cpiaucsl` in your home directory.\n * I use the Consumer Price Index for All Urban Consumers: All Items (CPIAUCSL)\n * sourced from Federal Reserve Bank of St.Louis,\n * available at https://fred.stlouisfed.org/series/CPIAUCSL/\n * Also, the industry-adjustment is based on sich from compustat only.\n * This code may contain error. 
Please check before use.\n *\n * Author: Mingze (Adrian) Gao\n * mingze.gao@sydney.edu.au\n *\n * Last Modifed: 24 Feb 2019\n * ============================================================================================== */\nlibname home \"~/\";\ndata funda(keep=gvkey cusip cik fyear datadate at xsga xrd xad sic2);\n /* Variables from Compustat:\n * AT: Assets Total;\n * XSGA: Selling, General and Administrative Expense;\n * XRD: Research and Development Expense;\n * XAD: Advertising Expense; */\nset comp.funda;\n if cmiss(of fyear datadate)=0;\n if indfmt = 'INDL' and datafmt='STD' and popsrc='D' and consol='C';\n sic2 = int(sich/100);\nrun;\nproc sql;\n/* Keep only obs from the first year with non-missing XSGA */\ncreate table funda_nonmissing_xsga as\nselect distinct a.*\n from funda as a left join \n /* This subquery selects the first year of appearance \n with non-missing XSGA */\n (select gvkey, fyear as firstfyear from funda \n where xsga is not missing \ngroup by gvkey having fyear = min(fyear)) as b\n on a.gvkey=b.gvkey\n where a.fyear>=b.firstfyear; \n\n/* CPIAUCSL: Consumer Price Index for All Urban Consumers: All Items\n Source: https://fred.stlouisfed.org/series/CPIAUCSL/ */\ncreate table funda_cpi as\nselect distinct a.*, b.cpiaucsl as cpi\n from funda_nonmissing_xsga as a left join home.cpiaucsl as b\n on year(a.datadate) = year(b.date) and month(a.datadate) = month(b.date)\n order by gvkey, fyear;\nquit;\n/* Sanity Check -- No Duplicates */\nproc sort nodupkey data=funda_cpi; \n by gvkey fyear;\nrun;\ndata funda_adj;\n set funda_cpi;\n by gvkey fyear;\n /* Replace missing XSGA, XRD and XAD with 0 */\nif xsga=. then xsga=0;\n if xrd=. then xrd=0;\n if xad=. 
then xad=0;\n /* Total assets adjusted for CPI */\n adjat = at / cpi;\n /* Two alternative SG&A measures */\n adjxsga1 = xsga / cpi;\n adjxsga2 = sum(xsga, -xrd, -xad) / cpi;\nrun;\ndata sgastock(drop=cnt adjxsga1 adjxsga2 lag:);\n set funda_adj(keep=gvkey cik cusip datadate fyear sic2 adj:);\n by gvkey;\n if first.gvkey then call missing(of cnt lag:);\n cnt+1;\n array adjxsga adjxsga1-adjxsga2;\n array sgastock sgastock1-sgastock2;\n array sgastock_r sgastock_r1-sgastock_r2;\n array lag_sgastock lag_sgastock1-lag_sgastock2;\n select (cnt);\n when (1) do;\n /* Under Perpetual Inventory Method, \n * the initial value of capitalized SG&A at time 0, O(0), is:\n * O(0)=O(1)/(g+delta)\n * where g is average SGA growth rate (10%) and delta is depreciation rate (15%).\n * So that,\n * O(0)=SGA(1)/(0.15+0.1)=SGA(1)/0.25=SGA(1)*4\n * This is why `adjxsga*4` is used below, specifically,\n * O(1)=O(0)*0.85+SGA(1)\n * =SGA(0)*4*0.85+SGA(1) */\ndo over sgastock;\n sgastock = (adjxsga * 4)* 0.85 + adjxsga; end;\n end;\n otherwise do;\n /* When t>1,\n * the capitalized SG&A at time t, O(t), is:\n * O(t)=O(t-1)*(1-delta)+SGA(t)\n * where g is average SGA growth rate (10%) and delta is depreciation rate (15%).\n * Note that here SG&A is adjusted for CPI. */\ndo over sgastock;\n sgastock = lag_sgastock * 0.85 + adjxsga; end;\n end;\n end;\n do over sgastock;\n lag_sgastock = sgastock;\n /* `sgastock_r` is sgastock scaled by adjusted total assets. */\n sgastock_r = sgastock / adjat;\n if adjat=. 
then sgastock_r=0;\n end;\n output;\n retain lag:;\nrun;\n/* industry-adjusted OC and rank-based OC measures */\nproc sql;\ncreate table tmp as\nselect gvkey, cik, cusip, datadate, fyear, sic2,\n sgastock_r1 as oc1,\n sgastock_r2 as oc2,\n sgastock_r1 - median(sgastock_r1) as indadj_oc1,\n sgastock_r2 - median(sgastock_r2) as indadj_oc2\n from sgastock\n group by fyear, sic2\n order by gvkey, fyear;\nquit;\nproc sort data=tmp; by fyear; run;\nproc rank data=tmp out=result groups=10;\n by fyear;\n var oc1 oc2 indadj_oc1 indadj_oc2;\n ranks rank_oc1 rank_oc2 rank_indadj_oc1 rank_indadj_oc2;\nrun;\ndata download(compress=yes); set work.result; run;\nproc download data=work.download out=sgastock; run;\nendrsubmit;\nsignoff;\n
Lastly, if you use this code above, please consider citing the following article for which it was written.
Gao, M. Leung, H. and Qiu, B. (2021). Organization Capital and Executive Performance Incentives, Journal of Banking & Finance, 123, 106017.
","tags":["Code","SAS"]},{"location":"posts/firm-historical-headquarter-state-from-10k/","title":"Firm Historical Headquarter State from SEC 10K/Q Filings","text":"","tags":["SEC","Textual Analysis","Compustat","Data","Code","8-K"]},{"location":"posts/firm-historical-headquarter-state-from-10k/#why-the-need-to-use-sec-filings","title":"Why the need to use SEC filings?","text":"In the Compustat database, a firm's headquarter state (and other identification) is in fact the current record stored in comp.company
. This means once a firm relocates (or updates its incorporate state, address, etc.), all historical observations will be updated and not recording historical state information anymore.
To resolve this issue, an effective way is to use the firm's historical SEC filings. You can follow my previous post Textual Analysis on SEC filings to extract the header information, which includes a wide range of meta data. Alternatively, the University of Notre Dame's Software Repository for Accounting and Finance provides an augmented 10-X header dataset.
2023 March Update
In this update I use 1,491,368 8-K filings of U.S. firms from 2004 to Dec 2022 and extract their HQ state and zipcode. hist_state_zipcode_from_8k_2004_2022.csv.zip
","tags":["SEC","Textual Analysis","Compustat","Data","Code","8-K"]},{"location":"posts/firm-historical-headquarter-state-from-10k/#do-i-have-to-use-sec-filings","title":"Do I have to use SEC filings?","text":"I'll skip the parsing procedure for now. The most important point is that using the historical SEC filings, you can ensure that you truly are using the historical headquarter state in your empirical estimation. Based on the augmented 10-X header dataset, I find that around 2-3% of Compustat firms changed their headquarter state (as indicated by their business address) each year.
Year Firms Changed State Total Firms % Firms Changed State 1995 22 4205 0.52 1996 69 7939 0.87 1997 199 8101 2.46 1998 206 8126 2.54 1999 202 8199 2.46 2000 202 8252 2.45 2001 204 7802 2.61 2002 167 7421 2.25 2003 214 6930 3.09 2004 175 6742 2.6 2005 154 6478 2.38 2006 156 6267 2.49 2007 144 6091 2.36 2008 125 5797 2.16 2009 127 5523 2.3 2010 128 5479 2.34 2011 152 5445 2.79 2012 160 5494 2.91 2013 171 5491 3.11 2014 195 5455 3.57 2015 147 5322 2.76 2016 117 5092 2.3 2017 129 4914 2.63 2018 107 4847 2.21Moreover, 2,947 out of the 17,221 firms, or about 17% firms changed their headquarter state in the merged sample. This is by no means a small number that can be ignored. So, whenever possible, you should try to use the historical information from past SEC filings' metadata.
","tags":["SEC","Textual Analysis","Compustat","Data","Code","8-K"]},{"location":"posts/firm-historical-headquarter-state-from-10k/#how-to-get-the-actual-historical-firm-hq-state-using-sec-filings","title":"How to get the actual historical firm HQ state using SEC filings?","text":"","tags":["SEC","Textual Analysis","Compustat","Data","Code","8-K"]},{"location":"posts/firm-historical-headquarter-state-from-10k/#1969-2003","title":"1969 - 2003","text":"I start with the firm historical HQ state provided by Bai, Fairhurst and Serfling (2020 RFS). This dataset contains the historical HQ locations from 1969 to 2003, which is based on the SEC filings post 1994 and hand-collected by the authors from the Moody\u2019s Manuals (later Mergent Manuals) and Dun & Bradstreet\u2019s Million Dollar Directory (later bought by Mergent).1
","tags":["SEC","Textual Analysis","Compustat","Data","Code","8-K"]},{"location":"posts/firm-historical-headquarter-state-from-10k/#1994-2018","title":"1994 - 2018","text":"To extend the dataset, I download the augmented 10-X header dataset and use the following Python script to extract the business address (state) filed.
import pandas as pd\nfilepath = \"~/Downloads/LM_EDGAR_10X_Header_1994_2018.csv\"\nif __name__ == \"__main__\":\ndf = pd.read_csv(\nfilepath,\nusecols=[\"cik\", \"file_date\", \"ba_state\"],\ndtype={\"cik\": str},\nparse_dates=[\"file_date\"],\n)\n# Some `ba_stata` codes are lowercase\ndf[\"ba_state\"] = df[\"ba_state\"].str.upper()\n# Some `ba_state` codes are not valid US states\ndf = df[df[\"ba_state\"].str.isalpha() & ~pd.isnull(df[\"ba_state\"])]\ndf.drop_duplicates().to_stata(\n\"~/Downloads/historical_state_1994_2018.dta\",\nwrite_index=False,\nconvert_dates={\"file_date\": \"td\"},\n)\n
The result is a historical_state.dta
Stata file like this:
Finally, to merge the two datasets together, I imported them into WRDS Cloud and run the following SAS script:
libname hs \"~/historical_state\";\n\n/* Historical HQ state (1994 to 2018) from augmented 10-X header dataset */\nproc import datafile=\"~/historical_state/historical_state_1994_2018.dta\"\nout=historical_state_1994_2018 dbms=stata replace;\n/* Historical HQ state (1969 to 2003) from Bai, Fairhurst and Serfling (2020 RFS) */\nproc import datafile=\"~/historical_state/hist_headquarters_Bai_et_al.dta\"\nout=hist_headquarters_Bai_et_al dbms=stata replace;\n\n/* Build the post-1994 dataset using SEC filings */\nproc sql;\ncreate table funda as \nselect gvkey, cik, datadate, fyear from comp.funda\nwhere indfmt= 'INDL' and datafmt='STD' and popsrc='D' and consol='C'\nand year(datadate) between 1994 and 2018\n/* \"For firms that change fiscal year within a calendar year, \n we take the last reported date when extracting financial data. \n This leaves us with one set of observations for each firm (gvkey) in each year.\" \n -- Pelueger, Siriwardane and Sunderam (2020 QJE) */\ngroup by gvkey, fyear having datadate=max(datadate);\n\ncreate table firm_historical_state as \nselect a.*, b.ba_state as state_sec label=\"State from SEC filings\"\nfrom funda as a left join historical_state as b \non a.cik=b.cik and year(a.datadate)=year(b.file_date) and b.file_date<=a.datadate\ngroup by a.gvkey, a.datadate\n/* use the SEC filing closet to and before the Compustat datadate */\nhaving b.file_date=max(b.file_date);\n\ncreate table historical_state_1994_2018 as\nselect a.*, b.state as state_comp label=\"State from Compustat\"\nfrom firm_historical_state as a left join comp.company as b \non a.gvkey=b.gvkey\norder by a.gvkey, a.datadate;\nquit;\n/* Sanity check: no duplicated gvkey-fyear */\nproc sort data=historical_state_1994_2018 nodupkey; by gvkey datadate; run;\nproc sql;\ncreate table hist_headquarters_Bai_et_al as \nselect put(gvkeyn, z6.) 
as gvkey, fyear, state \nfrom hist_headquarters_Bai_et_al;\nquit;\n/* Stack together the two datasets */\ndata states; \nset hist_headquarters_Bai_et_al \n historical_state_1994_2018(where=(fyear>2003) keep=gvkey fyear state:);\nrun;\nproc sql;\ncreate table hs.corrected_hist_state_1969_2018 as \nselect *, coalesce(state, state_sec, state_comp) as corrected_state\nfrom states where not missing(calculated corrected_state)\norder by gvkey, fyear;\nquit;\n/* Sanity check: no duplicated gvkey-fyear */\nproc sort data=hs.corrected_hist_state_1969_2018 nodupkey; by gvkey fyear; run;\n
","tags":["SEC","Textual Analysis","Compustat","Data","Code","8-K"]},{"location":"posts/firm-historical-headquarter-state-from-10k/#data-available-for-download","title":"Data available for download","text":"You can download the data I compiled here: corrected_hist_state_1969_2018.dta.zip (1MB).
","tags":["SEC","Textual Analysis","Compustat","Data","Code","8-K"]},{"location":"posts/firm-historical-headquarter-state-from-10k/#2023-update","title":"2023 Update","text":"In this update, I use 1,491,368 8-K filings of U.S. firms from 2004 to December 2022 and extract their HQ state and ZIP code.
Data: hist_state_zipcode_from_8k_2004_2022.csv.zip
Specifically, I download all 8-K filings from EDGAR and run a script to extract the business address from each filing's header into a database. If a firm reported different states in its filings in a given year, I keep the last one. The code is available here on GitHub.
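The extraction itself boils down to a regular-expression pass over each filing's header. A minimal sketch (not the actual script linked above; the sample header is made up) of pulling STATE and ZIP out of the BUSINESS ADDRESS block:

```python
import re

header = """BUSINESS ADDRESS:
    STREET 1: ONE APPLE PARK WAY
    CITY: CUPERTINO
    STATE: CA
    ZIP: 95014
"""

def extract_state_zip(text):
    # Look only inside the BUSINESS ADDRESS block (indented lines),
    # then grab the STATE: and ZIP: tags from it
    block = re.search(r"BUSINESS ADDRESS:(.*?)(?:\n\S|\Z)", text, re.S)
    if not block:
        return None, None
    state = re.search(r"STATE:\s*(\S+)", block.group(1))
    zipc = re.search(r"ZIP:\s*(\S+)", block.group(1))
    return (state.group(1) if state else None,
            zipc.group(1) if zipc else None)

print(extract_state_zip(header))  # ('CA', '95014')
```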
","tags":["SEC","Textual Analysis","Compustat","Data","Code","8-K"]},{"location":"posts/firm-historical-headquarter-state-from-10k/#suggested-citation","title":"Suggested citation","text":"Lastly, if you use the code/data above, please consider citing the following article for which it was written/constructed.
Gao, M., Leung, H., and Qiu, B. (2021). Organization Capital and Executive Performance Incentives, Journal of Banking & Finance, 123, 106017.
The authors note that \"for our final sample of 115,432 firm-year observations, we find that over the 1969 to 2003 period, 9,847 (87.50%) never relocate, 1,211 (10.76%) relocate once, 178 (1.58%) relocate twice, and 18 (0.16%) relocate three times.\"\u00a0\u21a9
Since Stata 15, we can search, browse and import almost a million U.S. and international economic and financial time series made available by the St. Louis Federal Reserve's Federal Reserve Economic Data (FRED). This post briefly explains this great feature.
","tags":["Stata"]},{"location":"posts/fred-federal-reserve-economic-data/#prerequisite","title":"Prerequisite","text":"Before you start, you will need an API Key from FRED. Register one here.
Then in Stata, you can store this key permanently so you don't need to provide it again.(1) Replace
_key_
with your actual API Key.set fredkey _key_, permanently\n
","tags":["Stata"]},{"location":"posts/fred-federal-reserve-economic-data/#gui-is-always-a-good-start","title":"GUI is always a good start","text":"Alternatively, clicking on the menu File>Import>Federal Reserve Economic Data (FRED)
will bring up the dialog shown below.
Enter the API Key and you'll be free to explore all the data series available on FRED.
For example, let's see the CPI of Australia...
Describing the data series reveals a lot of useful metadata.
Vintage
Note that the \"vintage\" section lists a number of dates, with each vintage referring to a particular version of the data series at that point in time.
It may sound strange, but an economic data series may be revised multiple times after it has been published, whether because more accurate information is collected later or because the estimation method changes.
For example, the CPI from 2005 to 2010 retrieved by a researcher as of 2011 may differ from the same series retrieved as of 2023. Without specifying the data vintage, replicating prior work can be hard.
Another tricky part is that ignoring vintages introduces look-ahead bias in analysis.
For example, a trading strategy using the revised GDP accessed today, instead of the vintage GDP, implicitly uses hindsight, as the GDP series may have been revised to incorporate more accurate data obtained after release.
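The point is easy to demonstrate with a toy vintage table. In the sketch below (release dates and figures are illustrative, not actual FRED data), an analysis run "as of" some date must use the latest vintage released on or before that date, not today's revised value:

```python
import datetime as dt

# release_date -> reported value of a 2020Q1 growth figure (%)
vintages = {
    dt.date(2020, 4, 29): -4.8,   # advance estimate
    dt.date(2020, 5, 28): -5.0,   # second estimate
    dt.date(2021, 7, 29): -5.1,   # later annual revision
}

def as_of(series, when):
    """Latest vintage released on or before `when`; None if nothing was out yet."""
    usable = [d for d in series if d <= when]
    return series[max(usable)] if usable else None

print(as_of(vintages, dt.date(2020, 6, 30)))  # -5.0 (what was knowable then)
print(as_of(vintages, dt.date(2023, 1, 1)))   # -5.1 (today's revised figure)
```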
Now close the description, double-click on the series, and click on Import. Another dialog will appear to confirm some final details.
The output will look like the following:
. import fred AUSCPIALLQINMEI, daterange(2010-01-01 2023-08-08) aggregate(quarterly,avg)\n\nSummary\n----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\nSeries ID Nobs Date range Frequency\n----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\nAUSCPIALLQINMEI 53 2010-01-01 to 2023-01-01 Quarterly\n----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\n# of series imported: 1\n highest frequency: Quarterly\n lowest frequency: Quarterly\n
","tags":["Stata"]},{"location":"posts/fred-federal-reserve-economic-data/#programmatical-is-recipe-to-reproducibility","title":"Programmatic access is the recipe for reproducibility","text":"We don't need to go through the GUI process every time. In fact, Stata has already told us what the corresponding command is:
import fred AUSCPIALLQINMEI, daterange(2010-01-01 2023-08-08) aggregate(quarterly,avg)\n
We can simply put this line of code into our program.
For example, the code below generates a time-series chart for Australia's CPI.
// Import\nimport fred AUSCPIALLQINMEI, daterange(2010-01-01 2023-03-31) vintage(2023-05-10) aggregate(quarterly,avg) clear\nrename AUSCPIALLQINMEI_20230510 cpi_australia\n// Time format\ngen yrqtr = yq(year(daten),quarter(daten))\nformat yrqtr %tq\ntsset yrqtr\n// Set start of the period to 100\ngen cpi_ret = cpi_australia/L.cpi_australia - 1\nreplace cpi_australia = 100 if _n==1\nreplace cpi_australia = L.cpi_australia * (1+cpi_ret) if _n>1\n// Plotting\ntwoway (tsline cpi_australia), title(\"Quarterly CPI of Australia 2010Q1-2023Q1\") ytitle(\"\") ttitle(\"\") note(\"Index 2010Q1=100. Source: FRED, 2023-05-10 vintage.\")\n
Note
The code snippet above specifies the data vintage. Therefore, even if someone runs it 30 years from now, they will still get exactly the same data and plot as I do in 2023.
","tags":["Stata"]},{"location":"posts/generate-fama-french-industry-classification-from-sic/","title":"Generate Fama-French Industry Classification From SIC","text":"This Stata program creates the Fama-French industry classification from SIC code.
","tags":["Stata","Code"]},{"location":"posts/generate-fama-french-industry-classification-from-sic/#basic-usage","title":"Basic usage","text":"ffind sic, generate(FF48) type(48)\n
where sic is the SIC code variable, FF48 is the name of the generated industry variable, and type(48) selects the 48-industry classification. Alternatively, one can choose 5, 10, 12, 17, 30, 38 or 49 industries.
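Under the hood, each Fama-French industry is just a set of SIC ranges, and classification is a range lookup. A tiny illustrative slice in Python (only four FF12 industries, with ranges taken from the ado file's FF12 branch; it is not a substitute for ffind, since any SIC outside these four industries falls through to "Other" here even when the full mapping would classify it):

```python
# Partial FF12 mapping: industry code -> list of (low, high) SIC ranges
FF12_SLICE = {
    4:  [(1200, 1399), (2900, 2999)],   # Oil, Gas, and Coal
    7:  [(4800, 4899)],                 # Telephone and Television Transmission
    8:  [(4900, 4949)],                 # Utilities
    11: [(6000, 6999)],                 # Finance
}

def ff12(sic):
    """Classify a SIC code using the partial table; unmatched -> 12 (Other)."""
    for code, ranges in FF12_SLICE.items():
        if any(lo <= sic <= hi for lo, hi in ranges):
            return code
    return 12  # "Other" catch-all, as in the ado file

print(ff12(2911), ff12(6020), ff12(4911))  # 4 11 8
```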
","tags":["Stata","Code"]},{"location":"posts/generate-fama-french-industry-classification-from-sic/#full-stata-code","title":"Full Stata code","text":"/****************************************\n* ffind.ado\n* Creates variable containing Fama-French\n* industry classification.\n*\n* Author: Judson Caskey, UCLA\n* December 9, 2007\n*\n* Revised by Malcolm Wardlaw, University of Texas at Dallas (November 1, 2011)\n****************************************/\ncapture program drop ffind\n\nprogram define ffind\n version 9.2\n syntax varlist(min=1 max=1 numeric) [if] [in], Generate(string) Type(numlist max=1 min=1)\n\n tempvar ftyp\n tokenize \"`type'\"\n local `ftyp'=`1'\n * Check if generate is valid variable name\n capture confirm new variable `generate'\n if _rc != 0 {\n di as error \"Variable `generate' is invalid\"\n exit 111\n }\n\n * Check type\n if ~inlist(``ftyp'',5,10,12,17,30,38,48,49) {\n di as error \"Type must be 5, 10, 12, 17, 30, 38, 48 or 49\"\n exit 111\n }\n\n * Set industries\n tempvar ffind\n tokenize \"`varlist'\"\n local `ffind' \"`1'\"\n qui gen `generate'=.\n label variable `generate' \"Fama-French industry code (``ftyp'' industries)\"\n capture label drop `generate'\n if ``ftyp''==5 {\n label define `generate' 1 \"Consumer Durables, NonDurables, Wholesale, Retail, and Some Services (Laundries, Repair Shops)\" 2 \"Manufacturing, Energy, and Utilities\" 3 \"Business Equipment, Telephone and Television Transmission\" 4 \"Healthcare, Medical Equipment, and Drugs\" 5 \"Other -- Mines, Constr, BldMt, Trans, Hotels, Bus Serv, Entertainment, Finance\"\n label values `generate' `generate'\n qui replace `generate'=1 if inrange(``ffind'',100,999) | inrange(``ffind'',2000,2399) | inrange(``ffind'',2700,2749) | inrange(``ffind'',2770,2799) | inrange(``ffind'',3100,3199) | inrange(``ffind'',3940,3989) | inrange(``ffind'',2500,2519) | inrange(``ffind'',2590,2599) | inrange(``ffind'',3630,3659) | inrange(``ffind'',3710,3711) | inrange(``ffind'',3714,3714) | 
inrange(``ffind'',3716,3716) | inrange(``ffind'',3750,3751) | inrange(``ffind'',3792,3792) | inrange(``ffind'',3900,3939) | inrange(``ffind'',3990,3999) | inrange(``ffind'',5000,5999) | inrange(``ffind'',7200,7299) | inrange(``ffind'',7600,7699)\n qui replace `generate'=2 if inrange(``ffind'',2520,2589) | inrange(``ffind'',2600,2699) | inrange(``ffind'',2750,2769) | inrange(``ffind'',2800,2829) | inrange(``ffind'',2840,2899) | inrange(``ffind'',3000,3099) | inrange(``ffind'',3200,3569) | inrange(``ffind'',3580,3629) | inrange(``ffind'',3700,3709) | inrange(``ffind'',3712,3713) | inrange(``ffind'',3715,3715) | inrange(``ffind'',3717,3749) | inrange(``ffind'',3752,3791) | inrange(``ffind'',3793,3799) | inrange(``ffind'',3830,3839) | inrange(``ffind'',3860,3899) | inrange(``ffind'',1200,1399) | inrange(``ffind'',2900,2999) | inrange(``ffind'',4900,4949)\n qui replace `generate'=3 if inrange(``ffind'',3570,3579) | inrange(``ffind'',3622,3622) | inrange(``ffind'',3660,3692) | inrange(``ffind'',3694,3699) | inrange(``ffind'',3810,3839) | inrange(``ffind'',7370,7372) | inrange(``ffind'',7373,7373) | inrange(``ffind'',7374,7374) | inrange(``ffind'',7375,7375) | inrange(``ffind'',7376,7376) | inrange(``ffind'',7377,7377) | inrange(``ffind'',7378,7378) | inrange(``ffind'',7379,7379) | inrange(``ffind'',7391,7391) | inrange(``ffind'',8730,8734) | inrange(``ffind'',4800,4899)\n qui replace `generate'=4 if inrange(``ffind'',2830,2839) | inrange(``ffind'',3693,3693) | inrange(``ffind'',3840,3859) | inrange(``ffind'',8000,8099)\n qui replace `generate'=5 if missing(`generate') & ~missing(``ffind'')\n }\n else if ``ftyp''==10 {\n label define `generate' 1 \"Consumer NonDurables -- Food, Tobacco, Textiles, Apparel, Leather, Toys\" 2 \"Consumer Durables -- Cars, TV's, Furniture, Household Appliances\" 3 \"Manufacturing -- Machinery, Trucks, Planes, Chemicals, Off Furn, Paper, Com Printing\" 4 \"Oil, Gas, and Coal Extraction and Products\" 5 \"Business Equipment -- Computers, 
Software, and Electronic Equipment\" 6 \"Telephone and Television Transmission\" 7 \"Wholesale, Retail, and Some Services (Laundries, Repair Shops)\" 8 \"Healthcare, Medical Equipment, and Drugs\" 9 \"Utilities\" 10 \"Other -- Mines, Constr, BldMt, Trans, Hotels, Bus Serv, Entertainment, Finance\"\n label values `generate' `generate'\n qui replace `generate'=1 if inrange(``ffind'',100,999) | inrange(``ffind'',2000,2399) | inrange(``ffind'',2700,2749) | inrange(``ffind'',2770,2799) | inrange(``ffind'',3100,3199) | inrange(``ffind'',3940,3989)\n qui replace `generate'=2 if inrange(``ffind'',2500,2519) | inrange(``ffind'',2590,2599) | inrange(``ffind'',3630,3659) | inrange(``ffind'',3710,3711) | inrange(``ffind'',3714,3714) | inrange(``ffind'',3716,3716) | inrange(``ffind'',3750,3751) | inrange(``ffind'',3792,3792) | inrange(``ffind'',3900,3939) | inrange(``ffind'',3990,3999)\n qui replace `generate'=3 if inrange(``ffind'',2520,2589) | inrange(``ffind'',2600,2699) | inrange(``ffind'',2750,2769) | inrange(``ffind'',2800,2829) | inrange(``ffind'',2840,2899) | inrange(``ffind'',3000,3099) | inrange(``ffind'',3200,3569) | inrange(``ffind'',3580,3629) | inrange(``ffind'',3700,3709) | inrange(``ffind'',3712,3713) | inrange(``ffind'',3715,3715) | inrange(``ffind'',3717,3749) | inrange(``ffind'',3752,3791) | inrange(``ffind'',3793,3799) | inrange(``ffind'',3830,3839) | inrange(``ffind'',3860,3899)\n qui replace `generate'=4 if inrange(``ffind'',1200,1399) | inrange(``ffind'',2900,2999)\n qui replace `generate'=5 if inrange(``ffind'',3570,3579) | inrange(``ffind'',3622,3622) | inrange(``ffind'',3660,3692) | inrange(``ffind'',3694,3699) | inrange(``ffind'',3810,3839) | inrange(``ffind'',7370,7372) | inrange(``ffind'',7373,7373) | inrange(``ffind'',7374,7374) | inrange(``ffind'',7375,7375) | inrange(``ffind'',7376,7376) | inrange(``ffind'',7377,7377) | inrange(``ffind'',7378,7378) | inrange(``ffind'',7379,7379) | inrange(``ffind'',7391,7391) | inrange(``ffind'',8730,8734)\n qui 
replace `generate'=6 if inrange(``ffind'',4800,4899)\n qui replace `generate'=7 if inrange(``ffind'',5000,5999) | inrange(``ffind'',7200,7299) | inrange(``ffind'',7600,7699)\n qui replace `generate'=8 if inrange(``ffind'',2830,2839) | inrange(``ffind'',3693,3693) | inrange(``ffind'',3840,3859) | inrange(``ffind'',8000,8099)\n qui replace `generate'=9 if inrange(``ffind'',4900,4949)\n qui replace `generate'=10 if missing(`generate') & ~missing(``ffind'')\n }\n else if ``ftyp''==12 {\n label define `generate' 1 \"Consumer NonDurables -- Food, Tobacco, Textiles, Apparel, Leather, Toys\" 2 \"Consumer Durables -- Cars, TV's, Furniture, Household Appliances\" 3 \"Manufacturing -- Machinery, Trucks, Planes, Off Furn, Paper, Com Printing\" 4 \"Oil, Gas, and Coal Extraction and Products\" 5 \"Chemicals and Allied Products\" 6 \"Business Equipment -- Computers, Software, and Electronic Equipment\" 7 \"Telephone and Television Transmission\" 8 \"Utilities\" 9 \"Wholesale, Retail, and Some Services (Laundries, Repair Shops)\" 10 \"Healthcare, Medical Equipment, and Drugs\" 11 \"Finance\" 12 \"Other -- Mines, Constr, BldMt, Trans, Hotels, Bus Serv, Entertainment\"\n label values `generate' `generate'\n qui replace `generate'=1 if inrange(``ffind'',100,999) | inrange(``ffind'',2000,2399) | inrange(``ffind'',2700,2749) | inrange(``ffind'',2770,2799) | inrange(``ffind'',3100,3199) | inrange(``ffind'',3940,3989)\n qui replace `generate'=2 if inrange(``ffind'',2500,2519) | inrange(``ffind'',2590,2599) | inrange(``ffind'',3630,3659) | inrange(``ffind'',3710,3711) | inrange(``ffind'',3714,3714) | inrange(``ffind'',3716,3716) | inrange(``ffind'',3750,3751) | inrange(``ffind'',3792,3792) | inrange(``ffind'',3900,3939) | inrange(``ffind'',3990,3999)\n qui replace `generate'=3 if inrange(``ffind'',2520,2589) | inrange(``ffind'',2600,2699) | inrange(``ffind'',2750,2769) | inrange(``ffind'',3000,3099) | inrange(``ffind'',3200,3569) | inrange(``ffind'',3580,3629) | 
inrange(``ffind'',3700,3709) | inrange(``ffind'',3712,3713) | inrange(``ffind'',3715,3715) | inrange(``ffind'',3717,3749) | inrange(``ffind'',3752,3791) | inrange(``ffind'',3793,3799) | inrange(``ffind'',3830,3839) | inrange(``ffind'',3860,3899)\n qui replace `generate'=4 if inrange(``ffind'',1200,1399) | inrange(``ffind'',2900,2999)\n qui replace `generate'=5 if inrange(``ffind'',2800,2829) | inrange(``ffind'',2840,2899)\n qui replace `generate'=6 if inrange(``ffind'',3570,3579) | inrange(``ffind'',3660,3692) | inrange(``ffind'',3694,3699) | inrange(``ffind'',3810,3829) | inrange(``ffind'',7370,7379)\n qui replace `generate'=7 if inrange(``ffind'',4800,4899)\n qui replace `generate'=8 if inrange(``ffind'',4900,4949)\n qui replace `generate'=9 if inrange(``ffind'',5000,5999) | inrange(``ffind'',7200,7299) | inrange(``ffind'',7600,7699)\n qui replace `generate'=10 if inrange(``ffind'',2830,2839) | inrange(``ffind'',3693,3693) | inrange(``ffind'',3840,3859) | inrange(``ffind'',8000,8099)\n qui replace `generate'=11 if inrange(``ffind'',6000,6999)\n qui replace `generate'=12 if missing(`generate') & ~missing(``ffind'')\n }\n\n else if ``ftyp''==17 {\n label define `generate' 1 \"Food\" 2 \"Mining and Minerals\" 3 \"Oil and Petroleum Products\" 4 \"Textiles, Apparel & Footware\" 5 \"Consumer Durables\" 6 \"Chemicals\" 7 \"Drugs, Soap, Prfums, Tobacco\" 8 \"Construction and Construction Materials\" 9 \"Steel Works Etc\" 10 \"Fabricated Products\" 11 \"Machinery and Business Equipment\" 12 \"Automobiles\" 13 \"Transportation\" 14 \"Utilities\" 15 \"Retail Stores\" 16 \"Banks, Insurance Companies, and Other Financials\" 17 \"Other\"\n label values `generate' `generate'\n qui replace `generate'=1 if inrange(``ffind'',100,199) | inrange(``ffind'',200,299) | inrange(``ffind'',700,799) | inrange(``ffind'',900,999) | inrange(``ffind'',2000,2009) | inrange(``ffind'',2010,2019) | inrange(``ffind'',2020,2029) | inrange(``ffind'',2030,2039) | inrange(``ffind'',2040,2046) | 
inrange(``ffind'',2047,2047) | inrange(``ffind'',2048,2048) | inrange(``ffind'',2050,2059) | inrange(``ffind'',2060,2063) | inrange(``ffind'',2064,2068) | inrange(``ffind'',2070,2079) | inrange(``ffind'',2080,2080) | inrange(``ffind'',2082,2082) | inrange(``ffind'',2083,2083) | inrange(``ffind'',2084,2084) | inrange(``ffind'',2085,2085) | inrange(``ffind'',2086,2086) | inrange(``ffind'',2087,2087) | inrange(``ffind'',2090,2092) | inrange(``ffind'',2095,2095) | inrange(``ffind'',2096,2096) | inrange(``ffind'',2097,2097) | inrange(``ffind'',2098,2099) | inrange(``ffind'',5140,5149) | inrange(``ffind'',5150,5159) | inrange(``ffind'',5180,5182) | inrange(``ffind'',5191,5191)\n qui replace `generate'=2 if inrange(``ffind'',1000,1009) | inrange(``ffind'',1010,1019) | inrange(``ffind'',1020,1029) | inrange(``ffind'',1030,1039) | inrange(``ffind'',1040,1049) | inrange(``ffind'',1060,1069) | inrange(``ffind'',1080,1089) | inrange(``ffind'',1090,1099) | inrange(``ffind'',1200,1299) | inrange(``ffind'',1400,1499) | inrange(``ffind'',5050,5052)\n qui replace `generate'=3 if inrange(``ffind'',1300,1300) | inrange(``ffind'',1310,1319) | inrange(``ffind'',1320,1329) | inrange(``ffind'',1380,1380) | inrange(``ffind'',1381,1381) | inrange(``ffind'',1382,1382) | inrange(``ffind'',1389,1389) | inrange(``ffind'',2900,2912) | inrange(``ffind'',5170,5172)\n qui replace `generate'=4 if inrange(``ffind'',2200,2269) | inrange(``ffind'',2270,2279) | inrange(``ffind'',2280,2284) | inrange(``ffind'',2290,2295) | inrange(``ffind'',2296,2296) | inrange(``ffind'',2297,2297) | inrange(``ffind'',2298,2298) | inrange(``ffind'',2299,2299) | inrange(``ffind'',2300,2390) | inrange(``ffind'',2391,2392) | inrange(``ffind'',2393,2395) | inrange(``ffind'',2396,2396) | inrange(``ffind'',2397,2399) | inrange(``ffind'',3020,3021) | inrange(``ffind'',3100,3111) | inrange(``ffind'',3130,3131) | inrange(``ffind'',3140,3149) | inrange(``ffind'',3150,3151) | inrange(``ffind'',3963,3965) | 
inrange(``ffind'',5130,5139)\n qui replace `generate'=5 if inrange(``ffind'',2510,2519) | inrange(``ffind'',2590,2599) | inrange(``ffind'',3060,3069) | inrange(``ffind'',3070,3079) | inrange(``ffind'',3080,3089) | inrange(``ffind'',3090,3099) | inrange(``ffind'',3630,3639) | inrange(``ffind'',3650,3651) | inrange(``ffind'',3652,3652) | inrange(``ffind'',3860,3861) | inrange(``ffind'',3870,3873) | inrange(``ffind'',3910,3911) | inrange(``ffind'',3914,3914) | inrange(``ffind'',3915,3915) | inrange(``ffind'',3930,3931) | inrange(``ffind'',3940,3949) | inrange(``ffind'',3960,3962) | inrange(``ffind'',5020,5023) | inrange(``ffind'',5064,5064) | inrange(``ffind'',5094,5094) | inrange(``ffind'',5099,5099)\n qui replace `generate'=6 if inrange(``ffind'',2800,2809) | inrange(``ffind'',2810,2819) | inrange(``ffind'',2820,2829) | inrange(``ffind'',2860,2869) | inrange(``ffind'',2870,2879) | inrange(``ffind'',2890,2899) | inrange(``ffind'',5160,5169)\n qui replace `generate'=7 if inrange(``ffind'',2100,2199) | inrange(``ffind'',2830,2830) | inrange(``ffind'',2831,2831) | inrange(``ffind'',2833,2833) | inrange(``ffind'',2834,2834) | inrange(``ffind'',2840,2843) | inrange(``ffind'',2844,2844) | inrange(``ffind'',5120,5122) | inrange(``ffind'',5194,5194)\n qui replace `generate'=8 if inrange(``ffind'',800,899) | inrange(``ffind'',1500,1511) | inrange(``ffind'',1520,1529) | inrange(``ffind'',1530,1539) | inrange(``ffind'',1540,1549) | inrange(``ffind'',1600,1699) | inrange(``ffind'',1700,1799) | inrange(``ffind'',2400,2439) | inrange(``ffind'',2440,2449) | inrange(``ffind'',2450,2459) | inrange(``ffind'',2490,2499) | inrange(``ffind'',2850,2859) | inrange(``ffind'',2950,2952) | inrange(``ffind'',3200,3200) | inrange(``ffind'',3210,3211) | inrange(``ffind'',3240,3241) | inrange(``ffind'',3250,3259) | inrange(``ffind'',3261,3261) | inrange(``ffind'',3264,3264) | inrange(``ffind'',3270,3275) | inrange(``ffind'',3280,3281) | inrange(``ffind'',3290,3293) | inrange(``ffind'',3420,3429) 
| inrange(``ffind'',3430,3433) | inrange(``ffind'',3440,3441) | inrange(``ffind'',3442,3442) | inrange(``ffind'',3446,3446) | inrange(``ffind'',3448,3448) | inrange(``ffind'',3449,3449) | inrange(``ffind'',3450,3451) | inrange(``ffind'',3452,3452) | inrange(``ffind'',5030,5039) | inrange(``ffind'',5070,5078) | inrange(``ffind'',5198,5198) | inrange(``ffind'',5210,5211) | inrange(``ffind'',5230,5231) | inrange(``ffind'',5250,5251)\n qui replace `generate'=9 if inrange(``ffind'',3300,3300) | inrange(``ffind'',3310,3317) | inrange(``ffind'',3320,3325) | inrange(``ffind'',3330,3339) | inrange(``ffind'',3340,3341) | inrange(``ffind'',3350,3357) | inrange(``ffind'',3360,3369) | inrange(``ffind'',3390,3399)\n qui replace `generate'=10 if inrange(``ffind'',3410,3412) | inrange(``ffind'',3443,3443) | inrange(``ffind'',3444,3444) | inrange(``ffind'',3460,3469) | inrange(``ffind'',3470,3479) | inrange(``ffind'',3480,3489) | inrange(``ffind'',3490,3499)\n qui replace `generate'=11 if inrange(``ffind'',3510,3519) | inrange(``ffind'',3520,3529) | inrange(``ffind'',3530,3530) | inrange(``ffind'',3531,3531) | inrange(``ffind'',3532,3532) | inrange(``ffind'',3533,3533) | inrange(``ffind'',3534,3534) | inrange(``ffind'',3535,3535) | inrange(``ffind'',3536,3536) | inrange(``ffind'',3540,3549) | inrange(``ffind'',3550,3559) | inrange(``ffind'',3560,3569) | inrange(``ffind'',3570,3579) | inrange(``ffind'',3580,3580) | inrange(``ffind'',3581,3581) | inrange(``ffind'',3582,3582) | inrange(``ffind'',3585,3585) | inrange(``ffind'',3586,3586) | inrange(``ffind'',3589,3589) | inrange(``ffind'',3590,3599) | inrange(``ffind'',3600,3600) | inrange(``ffind'',3610,3613) | inrange(``ffind'',3620,3621) | inrange(``ffind'',3622,3622) | inrange(``ffind'',3623,3629) | inrange(``ffind'',3670,3679) | inrange(``ffind'',3680,3680) | inrange(``ffind'',3681,3681) | inrange(``ffind'',3682,3682) | inrange(``ffind'',3683,3683) | inrange(``ffind'',3684,3684) | inrange(``ffind'',3685,3685) | 
inrange(``ffind'',3686,3686) | inrange(``ffind'',3687,3687) | inrange(``ffind'',3688,3688) | inrange(``ffind'',3689,3689) | inrange(``ffind'',3690,3690) | inrange(``ffind'',3691,3692) | inrange(``ffind'',3693,3693) | inrange(``ffind'',3694,3694) | inrange(``ffind'',3695,3695) | inrange(``ffind'',3699,3699) | inrange(``ffind'',3810,3810) | inrange(``ffind'',3811,3811) | inrange(``ffind'',3812,3812) | inrange(``ffind'',3820,3820) | inrange(``ffind'',3821,3821) | inrange(``ffind'',3822,3822) | inrange(``ffind'',3823,3823) | inrange(``ffind'',3824,3824) | inrange(``ffind'',3825,3825) | inrange(``ffind'',3826,3826) | inrange(``ffind'',3827,3827) | inrange(``ffind'',3829,3829) | inrange(``ffind'',3830,3839) | inrange(``ffind'',3950,3955) | inrange(``ffind'',5060,5060) | inrange(``ffind'',5063,5063) | inrange(``ffind'',5065,5065) | inrange(``ffind'',5080,5080) | inrange(``ffind'',5081,5081)\n qui replace `generate'=12 if inrange(``ffind'',3710,3710) | inrange(``ffind'',3711,3711) | inrange(``ffind'',3714,3714) | inrange(``ffind'',3716,3716) | inrange(``ffind'',3750,3751) | inrange(``ffind'',3792,3792) | inrange(``ffind'',5010,5015) | inrange(``ffind'',5510,5521) | inrange(``ffind'',5530,5531) | inrange(``ffind'',5560,5561) | inrange(``ffind'',5570,5571) | inrange(``ffind'',5590,5599)\n qui replace `generate'=13 if inrange(``ffind'',3713,3713) | inrange(``ffind'',3715,3715) | inrange(``ffind'',3720,3720) | inrange(``ffind'',3721,3721) | inrange(``ffind'',3724,3724) | inrange(``ffind'',3725,3725) | inrange(``ffind'',3728,3728) | inrange(``ffind'',3730,3731) | inrange(``ffind'',3732,3732) | inrange(``ffind'',3740,3743) | inrange(``ffind'',3760,3769) | inrange(``ffind'',3790,3790) | inrange(``ffind'',3795,3795) | inrange(``ffind'',3799,3799) | inrange(``ffind'',4000,4013) | inrange(``ffind'',4100,4100) | inrange(``ffind'',4110,4119) | inrange(``ffind'',4120,4121) | inrange(``ffind'',4130,4131) | inrange(``ffind'',4140,4142) | inrange(``ffind'',4150,4151) | 
inrange(``ffind'',4170,4173) | inrange(``ffind'',4190,4199) | inrange(``ffind'',4200,4200) | inrange(``ffind'',4210,4219) | inrange(``ffind'',4220,4229) | inrange(``ffind'',4230,4231) | inrange(``ffind'',4400,4499) | inrange(``ffind'',4500,4599) | inrange(``ffind'',4600,4699) | inrange(``ffind'',4700,4700) | inrange(``ffind'',4710,4712) | inrange(``ffind'',4720,4729) | inrange(``ffind'',4730,4739) | inrange(``ffind'',4740,4742) | inrange(``ffind'',4780,4780) | inrange(``ffind'',4783,4783) | inrange(``ffind'',4785,4785) | inrange(``ffind'',4789,4789)\n qui replace `generate'=14 if inrange(``ffind'',4900,4900) | inrange(``ffind'',4910,4911) | inrange(``ffind'',4920,4922) | inrange(``ffind'',4923,4923) | inrange(``ffind'',4924,4925) | inrange(``ffind'',4930,4931) | inrange(``ffind'',4932,4932) | inrange(``ffind'',4939,4939) | inrange(``ffind'',4940,4942)\n qui replace `generate'=15 if inrange(``ffind'',5260,5261) | inrange(``ffind'',5270,5271) | inrange(``ffind'',5300,5300) | inrange(``ffind'',5310,5311) | inrange(``ffind'',5320,5320) | inrange(``ffind'',5330,5331) | inrange(``ffind'',5334,5334) | inrange(``ffind'',5390,5399) | inrange(``ffind'',5400,5400) | inrange(``ffind'',5410,5411) | inrange(``ffind'',5412,5412) | inrange(``ffind'',5420,5421) | inrange(``ffind'',5430,5431) | inrange(``ffind'',5440,5441) | inrange(``ffind'',5450,5451) | inrange(``ffind'',5460,5461) | inrange(``ffind'',5490,5499) | inrange(``ffind'',5540,5541) | inrange(``ffind'',5550,5551) | inrange(``ffind'',5600,5699) | inrange(``ffind'',5700,5700) | inrange(``ffind'',5710,5719) | inrange(``ffind'',5720,5722) | inrange(``ffind'',5730,5733) | inrange(``ffind'',5734,5734) | inrange(``ffind'',5735,5735) | inrange(``ffind'',5736,5736) | inrange(``ffind'',5750,5750) | inrange(``ffind'',5800,5813) | inrange(``ffind'',5890,5890) | inrange(``ffind'',5900,5900) | inrange(``ffind'',5910,5912) | inrange(``ffind'',5920,5921) | inrange(``ffind'',5930,5932) | inrange(``ffind'',5940,5940) | 
inrange(``ffind'',5941,5941) | inrange(``ffind'',5942,5942) | inrange(``ffind'',5943,5943) | inrange(``ffind'',5944,5944) | inrange(``ffind'',5945,5945) | inrange(``ffind'',5946,5946) | inrange(``ffind'',5947,5947) | inrange(``ffind'',5948,5948) | inrange(``ffind'',5949,5949) | inrange(``ffind'',5960,5963) | inrange(``ffind'',5980,5989) | inrange(``ffind'',5990,5990) | inrange(``ffind'',5992,5992) | inrange(``ffind'',5993,5993) | inrange(``ffind'',5994,5994) | inrange(``ffind'',5995,5995) | inrange(``ffind'',5999,5999)\n qui replace `generate'=16 if inrange(``ffind'',6010,6019) | inrange(``ffind'',6020,6020) | inrange(``ffind'',6021,6021) | inrange(``ffind'',6022,6022) | inrange(``ffind'',6023,6023) | inrange(``ffind'',6025,6025) | inrange(``ffind'',6026,6026) | inrange(``ffind'',6028,6029) | inrange(``ffind'',6030,6036) | inrange(``ffind'',6040,6049) | inrange(``ffind'',6050,6059) | inrange(``ffind'',6060,6062) | inrange(``ffind'',6080,6082) | inrange(``ffind'',6090,6099) | inrange(``ffind'',6100,6100) | inrange(``ffind'',6110,6111) | inrange(``ffind'',6112,6112) | inrange(``ffind'',6120,6129) | inrange(``ffind'',6140,6149) | inrange(``ffind'',6150,6159) | inrange(``ffind'',6160,6163) | inrange(``ffind'',6172,6172) | inrange(``ffind'',6199,6199) | inrange(``ffind'',6200,6299) | inrange(``ffind'',6300,6300) | inrange(``ffind'',6310,6312) | inrange(``ffind'',6320,6324) | inrange(``ffind'',6330,6331) | inrange(``ffind'',6350,6351) | inrange(``ffind'',6360,6361) | inrange(``ffind'',6370,6371) | inrange(``ffind'',6390,6399) | inrange(``ffind'',6400,6411) | inrange(``ffind'',6500,6500) | inrange(``ffind'',6510,6510) | inrange(``ffind'',6512,6512) | inrange(``ffind'',6513,6513) | inrange(``ffind'',6514,6514) | inrange(``ffind'',6515,6515) | inrange(``ffind'',6517,6519) | inrange(``ffind'',6530,6531) | inrange(``ffind'',6532,6532) | inrange(``ffind'',6540,6541) | inrange(``ffind'',6550,6553) | inrange(``ffind'',6611,6611) | inrange(``ffind'',6700,6700) | 
inrange(``ffind'',6710,6719) | inrange(``ffind'',6720,6722) | inrange(``ffind'',6723,6723) | inrange(``ffind'',6724,6724) | inrange(``ffind'',6725,6725) | inrange(``ffind'',6726,6726) | inrange(``ffind'',6730,6733) | inrange(``ffind'',6790,6790) | inrange(``ffind'',6792,6792) | inrange(``ffind'',6794,6794) | inrange(``ffind'',6795,6795) | inrange(``ffind'',6798,6798) | inrange(``ffind'',6799,6799)\n qui replace `generate'=17 if missing(`generate') & ~missing(``ffind'')\n\n }\n\n else if ``ftyp''==30 {\n label define `generate' 1 \"Food Products\" 2 \"Beer & Liquor\" 3 \"Tobacco Products\" 4 \"Recreation\" 5 \"Printing and Publishing\" 6 \"Consumer Goods\" 7 \"Apparel\" 8 \"Healthcare, Medical Equipment, Pharmaceutical Products\" 9 \"Chemicals\" 10 \"Textiles\" 11 \"Construction and Construction Materials\" 12 \"Steel Works Etc\" 13 \"Fabricated Products and Machinery\" 14 \"Electrical Equipment\" 15 \"Automobiles and Trucks\" 16 \"Aircraft, ships, and railroad equipment\" 17 \"Precious Metals, Non-Metallic, and Industrial Metal Mining\" 18 \"Coal\" 19 \"Petroleum and Natural Gas\" 20 \"Utilities\" 21 \"Communication\" 22 \"Personal and Business Services\" 23 \"Business Equipment\" 24 \"Business Supplies and Shipping Containers\" 25 \"Transportation\" 26 \"Wholesale\" 27 \"Retail\" 28 \"Restaraunts, Hotels, Motels\" 29 \"Banking, Insurance, Real Estate, Trading\" 30 \"Everything Else\"\n label values `generate' `generate'\n qui replace `generate'=1 if inrange(``ffind'',100,199) | inrange(``ffind'',200,299) | inrange(``ffind'',700,799) | inrange(``ffind'',910,919) | inrange(``ffind'',2000,2009) | inrange(``ffind'',2010,2019) | inrange(``ffind'',2020,2029) | inrange(``ffind'',2030,2039) | inrange(``ffind'',2040,2046) | inrange(``ffind'',2048,2048) | inrange(``ffind'',2050,2059) | inrange(``ffind'',2060,2063) | inrange(``ffind'',2064,2068) | inrange(``ffind'',2070,2079) | inrange(``ffind'',2086,2086) | inrange(``ffind'',2087,2087) | inrange(``ffind'',2090,2092) | 
inrange(``ffind'',2095,2095) | inrange(``ffind'',2096,2096) | inrange(``ffind'',2097,2097) | inrange(``ffind'',2098,2099)\n qui replace `generate'=2 if inrange(``ffind'',2080,2080) | inrange(``ffind'',2082,2082) | inrange(``ffind'',2083,2083) | inrange(``ffind'',2084,2084) | inrange(``ffind'',2085,2085)\n qui replace `generate'=3 if inrange(``ffind'',2100,2199)\n qui replace `generate'=4 if inrange(``ffind'',920,999) | inrange(``ffind'',3650,3651) | inrange(``ffind'',3652,3652) | inrange(``ffind'',3732,3732) | inrange(``ffind'',3930,3931) | inrange(``ffind'',3940,3949) | inrange(``ffind'',7800,7829) | inrange(``ffind'',7830,7833) | inrange(``ffind'',7840,7841) | inrange(``ffind'',7900,7900) | inrange(``ffind'',7910,7911) | inrange(``ffind'',7920,7929) | inrange(``ffind'',7930,7933) | inrange(``ffind'',7940,7949) | inrange(``ffind'',7980,7980) | inrange(``ffind'',7990,7999)\n qui replace `generate'=5 if inrange(``ffind'',2700,2709) | inrange(``ffind'',2710,2719) | inrange(``ffind'',2720,2729) | inrange(``ffind'',2730,2739) | inrange(``ffind'',2740,2749) | inrange(``ffind'',2750,2759) | inrange(``ffind'',2770,2771) | inrange(``ffind'',2780,2789) | inrange(``ffind'',2790,2799) | inrange(``ffind'',3993,3993)\n qui replace `generate'=6 if inrange(``ffind'',2047,2047) | inrange(``ffind'',2391,2392) | inrange(``ffind'',2510,2519) | inrange(``ffind'',2590,2599) | inrange(``ffind'',2840,2843) | inrange(``ffind'',2844,2844) | inrange(``ffind'',3160,3161) | inrange(``ffind'',3170,3171) | inrange(``ffind'',3172,3172) | inrange(``ffind'',3190,3199) | inrange(``ffind'',3229,3229) | inrange(``ffind'',3260,3260) | inrange(``ffind'',3262,3263) | inrange(``ffind'',3269,3269) | inrange(``ffind'',3230,3231) | inrange(``ffind'',3630,3639) | inrange(``ffind'',3750,3751) | inrange(``ffind'',3800,3800) | inrange(``ffind'',3860,3861) | inrange(``ffind'',3870,3873) | inrange(``ffind'',3910,3911) | inrange(``ffind'',3914,3914) | inrange(``ffind'',3915,3915) | inrange(``ffind'',3960,3962) | 
inrange(``ffind'',3991,3991) | inrange(``ffind'',3995,3995)\n qui replace `generate'=7 if inrange(``ffind'',2300,2390) | inrange(``ffind'',3020,3021) | inrange(``ffind'',3100,3111) | inrange(``ffind'',3130,3131) | inrange(``ffind'',3140,3149) | inrange(``ffind'',3150,3151) | inrange(``ffind'',3963,3965)\n qui replace `generate'=8 if inrange(``ffind'',2830,2830) | inrange(``ffind'',2831,2831) | inrange(``ffind'',2833,2833) | inrange(``ffind'',2834,2834) | inrange(``ffind'',2835,2835) | inrange(``ffind'',2836,2836) | inrange(``ffind'',3693,3693) | inrange(``ffind'',3840,3849) | inrange(``ffind'',3850,3851) | inrange(``ffind'',8000,8099)\n qui replace `generate'=9 if inrange(``ffind'',2800,2809) | inrange(``ffind'',2810,2819) | inrange(``ffind'',2820,2829) | inrange(``ffind'',2850,2859) | inrange(``ffind'',2860,2869) | inrange(``ffind'',2870,2879) | inrange(``ffind'',2890,2899)\n qui replace `generate'=10 if inrange(``ffind'',2200,2269) | inrange(``ffind'',2270,2279) | inrange(``ffind'',2280,2284) | inrange(``ffind'',2290,2295) | inrange(``ffind'',2297,2297) | inrange(``ffind'',2298,2298) | inrange(``ffind'',2299,2299) | inrange(``ffind'',2393,2395) | inrange(``ffind'',2397,2399)\n qui replace `generate'=11 if inrange(``ffind'',800,899) | inrange(``ffind'',1500,1511) | inrange(``ffind'',1520,1529) | inrange(``ffind'',1530,1539) | inrange(``ffind'',1540,1549) | inrange(``ffind'',1600,1699) | inrange(``ffind'',1700,1799) | inrange(``ffind'',2400,2439) | inrange(``ffind'',2450,2459) | inrange(``ffind'',2490,2499) | inrange(``ffind'',2660,2661) | inrange(``ffind'',2950,2952) | inrange(``ffind'',3200,3200) | inrange(``ffind'',3210,3211) | inrange(``ffind'',3240,3241) | inrange(``ffind'',3250,3259) | inrange(``ffind'',3261,3261) | inrange(``ffind'',3264,3264) | inrange(``ffind'',3270,3275) | inrange(``ffind'',3280,3281) | inrange(``ffind'',3290,3293) | inrange(``ffind'',3295,3299) | inrange(``ffind'',3420,3429) | inrange(``ffind'',3430,3433) | inrange(``ffind'',3440,3441) | 
inrange(``ffind'',3442,3442) | inrange(``ffind'',3446,3446) | inrange(``ffind'',3448,3448) | inrange(``ffind'',3449,3449) | inrange(``ffind'',3450,3451) | inrange(``ffind'',3452,3452) | inrange(``ffind'',3490,3499) | inrange(``ffind'',3996,3996)\n qui replace `generate'=12 if inrange(``ffind'',3300,3300) | inrange(``ffind'',3310,3317) | inrange(``ffind'',3320,3325) | inrange(``ffind'',3330,3339) | inrange(``ffind'',3340,3341) | inrange(``ffind'',3350,3357) | inrange(``ffind'',3360,3369) | inrange(``ffind'',3370,3379) | inrange(``ffind'',3390,3399)\n qui replace `generate'=13 if inrange(``ffind'',3400,3400) | inrange(``ffind'',3443,3443) | inrange(``ffind'',3444,3444) | inrange(``ffind'',3460,3469) | inrange(``ffind'',3470,3479) | inrange(``ffind'',3510,3519) | inrange(``ffind'',3520,3529) | inrange(``ffind'',3530,3530) | inrange(``ffind'',3531,3531) | inrange(``ffind'',3532,3532) | inrange(``ffind'',3533,3533) | inrange(``ffind'',3534,3534) | inrange(``ffind'',3535,3535) | inrange(``ffind'',3536,3536) | inrange(``ffind'',3538,3538) | inrange(``ffind'',3540,3549) | inrange(``ffind'',3550,3559) | inrange(``ffind'',3560,3569) | inrange(``ffind'',3580,3580) | inrange(``ffind'',3581,3581) | inrange(``ffind'',3582,3582) | inrange(``ffind'',3585,3585) | inrange(``ffind'',3586,3586) | inrange(``ffind'',3589,3589) | inrange(``ffind'',3590,3599)\n qui replace `generate'=14 if inrange(``ffind'',3600,3600) | inrange(``ffind'',3610,3613) | inrange(``ffind'',3620,3621) | inrange(``ffind'',3623,3629) | inrange(``ffind'',3640,3644) | inrange(``ffind'',3645,3645) | inrange(``ffind'',3646,3646) | inrange(``ffind'',3648,3649) | inrange(``ffind'',3660,3660) | inrange(``ffind'',3690,3690) | inrange(``ffind'',3691,3692) | inrange(``ffind'',3699,3699)\n qui replace `generate'=15 if inrange(``ffind'',2296,2296) | inrange(``ffind'',2396,2396) | inrange(``ffind'',3010,3011) | inrange(``ffind'',3537,3537) | inrange(``ffind'',3647,3647) | inrange(``ffind'',3694,3694) | 
inrange(``ffind'',3700,3700) | inrange(``ffind'',3710,3710) | inrange(``ffind'',3711,3711) | inrange(``ffind'',3713,3713) | inrange(``ffind'',3714,3714) | inrange(``ffind'',3715,3715) | inrange(``ffind'',3716,3716) | inrange(``ffind'',3792,3792) | inrange(``ffind'',3790,3791) | inrange(``ffind'',3799,3799)\n qui replace `generate'=16 if inrange(``ffind'',3720,3720) | inrange(``ffind'',3721,3721) | inrange(``ffind'',3723,3724) | inrange(``ffind'',3725,3725) | inrange(``ffind'',3728,3729) | inrange(``ffind'',3730,3731) | inrange(``ffind'',3740,3743)\n qui replace `generate'=17 if inrange(``ffind'',1000,1009) | inrange(``ffind'',1010,1019) | inrange(``ffind'',1020,1029) | inrange(``ffind'',1030,1039) | inrange(``ffind'',1040,1049) | inrange(``ffind'',1050,1059) | inrange(``ffind'',1060,1069) | inrange(``ffind'',1070,1079) | inrange(``ffind'',1080,1089) | inrange(``ffind'',1090,1099) | inrange(``ffind'',1100,1119) | inrange(``ffind'',1400,1499)\n qui replace `generate'=18 if inrange(``ffind'',1200,1299)\n qui replace `generate'=19 if inrange(``ffind'',1300,1300) | inrange(``ffind'',1310,1319) | inrange(``ffind'',1320,1329) | inrange(``ffind'',1330,1339) | inrange(``ffind'',1370,1379) | inrange(``ffind'',1380,1380) | inrange(``ffind'',1381,1381) | inrange(``ffind'',1382,1382) | inrange(``ffind'',1389,1389) | inrange(``ffind'',2900,2912) | inrange(``ffind'',2990,2999)\n qui replace `generate'=20 if inrange(``ffind'',4900,4900) | inrange(``ffind'',4910,4911) | inrange(``ffind'',4920,4922) | inrange(``ffind'',4923,4923) | inrange(``ffind'',4924,4925) | inrange(``ffind'',4930,4931) | inrange(``ffind'',4932,4932) | inrange(``ffind'',4939,4939) | inrange(``ffind'',4940,4942)\n qui replace `generate'=21 if inrange(``ffind'',4800,4800) | inrange(``ffind'',4810,4813) | inrange(``ffind'',4820,4822) | inrange(``ffind'',4830,4839) | inrange(``ffind'',4840,4841) | inrange(``ffind'',4880,4889) | inrange(``ffind'',4890,4890) | inrange(``ffind'',4891,4891) | 
inrange(``ffind'',4892,4892) | inrange(``ffind'',4899,4899)\n qui replace `generate'=22 if inrange(``ffind'',7020,7021) | inrange(``ffind'',7030,7033) | inrange(``ffind'',7200,7200) | inrange(``ffind'',7210,7212) | inrange(``ffind'',7214,7214) | inrange(``ffind'',7215,7216) | inrange(``ffind'',7217,7217) | inrange(``ffind'',7218,7218) | inrange(``ffind'',7219,7219) | inrange(``ffind'',7220,7221) | inrange(``ffind'',7230,7231) | inrange(``ffind'',7240,7241) | inrange(``ffind'',7250,7251) | inrange(``ffind'',7260,7269) | inrange(``ffind'',7270,7290) | inrange(``ffind'',7291,7291) | inrange(``ffind'',7292,7299) | inrange(``ffind'',7300,7300) | inrange(``ffind'',7310,7319) | inrange(``ffind'',7320,7329) | inrange(``ffind'',7330,7339) | inrange(``ffind'',7340,7342) | inrange(``ffind'',7349,7349) | inrange(``ffind'',7350,7351) | inrange(``ffind'',7352,7352) | inrange(``ffind'',7353,7353) | inrange(``ffind'',7359,7359) | inrange(``ffind'',7360,7369) | inrange(``ffind'',7370,7372) | inrange(``ffind'',7374,7374) | inrange(``ffind'',7375,7375) | inrange(``ffind'',7376,7376) | inrange(``ffind'',7377,7377) | inrange(``ffind'',7378,7378) | inrange(``ffind'',7379,7379) | inrange(``ffind'',7380,7380) | inrange(``ffind'',7381,7382) | inrange(``ffind'',7383,7383) | inrange(``ffind'',7384,7384) | inrange(``ffind'',7385,7385) | inrange(``ffind'',7389,7390) | inrange(``ffind'',7391,7391) | inrange(``ffind'',7392,7392) | inrange(``ffind'',7393,7393) | inrange(``ffind'',7394,7394) | inrange(``ffind'',7395,7395) | inrange(``ffind'',7396,7396) | inrange(``ffind'',7397,7397) | inrange(``ffind'',7399,7399) | inrange(``ffind'',7500,7500) | inrange(``ffind'',7510,7519) | inrange(``ffind'',7520,7529) | inrange(``ffind'',7530,7539) | inrange(``ffind'',7540,7549) | inrange(``ffind'',7600,7600) | inrange(``ffind'',7620,7620) | inrange(``ffind'',7622,7622) | inrange(``ffind'',7623,7623) | inrange(``ffind'',7629,7629) | inrange(``ffind'',7630,7631) | inrange(``ffind'',7640,7641) | 
inrange(``ffind'',7690,7699) | inrange(``ffind'',8100,8199) | inrange(``ffind'',8200,8299) | inrange(``ffind'',8300,8399) | inrange(``ffind'',8400,8499) | inrange(``ffind'',8600,8699) | inrange(``ffind'',8700,8700) | inrange(``ffind'',8710,8713) | inrange(``ffind'',8720,8721) | inrange(``ffind'',8730,8734) | inrange(``ffind'',8740,8748) | inrange(``ffind'',8800,8899) | inrange(``ffind'',8900,8910) | inrange(``ffind'',8911,8911) | inrange(``ffind'',8920,8999)\n qui replace `generate'=23 if inrange(``ffind'',3570,3579) | inrange(``ffind'',3622,3622) | inrange(``ffind'',3661,3661) | inrange(``ffind'',3662,3662) | inrange(``ffind'',3663,3663) | inrange(``ffind'',3664,3664) | inrange(``ffind'',3665,3665) | inrange(``ffind'',3666,3666) | inrange(``ffind'',3669,3669) | inrange(``ffind'',3670,3679) | inrange(``ffind'',3680,3680) | inrange(``ffind'',3681,3681) | inrange(``ffind'',3682,3682) | inrange(``ffind'',3683,3683) | inrange(``ffind'',3684,3684) | inrange(``ffind'',3685,3685) | inrange(``ffind'',3686,3686) | inrange(``ffind'',3687,3687) | inrange(``ffind'',3688,3688) | inrange(``ffind'',3689,3689) | inrange(``ffind'',3695,3695) | inrange(``ffind'',3810,3810) | inrange(``ffind'',3811,3811) | inrange(``ffind'',3812,3812) | inrange(``ffind'',3820,3820) | inrange(``ffind'',3821,3821) | inrange(``ffind'',3822,3822) | inrange(``ffind'',3823,3823) | inrange(``ffind'',3824,3824) | inrange(``ffind'',3825,3825) | inrange(``ffind'',3826,3826) | inrange(``ffind'',3827,3827) | inrange(``ffind'',3829,3829) | inrange(``ffind'',3830,3839) | inrange(``ffind'',7373,7373)\n qui replace `generate'=24 if inrange(``ffind'',2440,2449) | inrange(``ffind'',2520,2549) | inrange(``ffind'',2600,2639) | inrange(``ffind'',2640,2659) | inrange(``ffind'',2670,2699) | inrange(``ffind'',2760,2761) | inrange(``ffind'',3220,3221) | inrange(``ffind'',3410,3412) | inrange(``ffind'',3950,3955)\n qui replace `generate'=25 if inrange(``ffind'',4000,4013) | inrange(``ffind'',4040,4049) | 
inrange(``ffind'',4100,4100) | inrange(``ffind'',4110,4119) | inrange(``ffind'',4120,4121) | inrange(``ffind'',4130,4131) | inrange(``ffind'',4140,4142) | inrange(``ffind'',4150,4151) | inrange(``ffind'',4170,4173) | inrange(``ffind'',4190,4199) | inrange(``ffind'',4200,4200) | inrange(``ffind'',4210,4219) | inrange(``ffind'',4220,4229) | inrange(``ffind'',4230,4231) | inrange(``ffind'',4240,4249) | inrange(``ffind'',4400,4499) | inrange(``ffind'',4500,4599) | inrange(``ffind'',4600,4699) | inrange(``ffind'',4700,4700) | inrange(``ffind'',4710,4712) | inrange(``ffind'',4720,4729) | inrange(``ffind'',4730,4739) | inrange(``ffind'',4740,4749) | inrange(``ffind'',4780,4780) | inrange(``ffind'',4782,4782) | inrange(``ffind'',4783,4783) | inrange(``ffind'',4784,4784) | inrange(``ffind'',4785,4785) | inrange(``ffind'',4789,4789)\n qui replace `generate'=26 if inrange(``ffind'',5000,5000) | inrange(``ffind'',5010,5015) | inrange(``ffind'',5020,5023) | inrange(``ffind'',5030,5039) | inrange(``ffind'',5040,5042) | inrange(``ffind'',5043,5043) | inrange(``ffind'',5044,5044) | inrange(``ffind'',5045,5045) | inrange(``ffind'',5046,5046) | inrange(``ffind'',5047,5047) | inrange(``ffind'',5048,5048) | inrange(``ffind'',5049,5049) | inrange(``ffind'',5050,5059) | inrange(``ffind'',5060,5060) | inrange(``ffind'',5063,5063) | inrange(``ffind'',5064,5064) | inrange(``ffind'',5065,5065) | inrange(``ffind'',5070,5078) | inrange(``ffind'',5080,5080) | inrange(``ffind'',5081,5081) | inrange(``ffind'',5082,5082) | inrange(``ffind'',5083,5083) | inrange(``ffind'',5084,5084) | inrange(``ffind'',5085,5085) | inrange(``ffind'',5086,5087) | inrange(``ffind'',5088,5088) | inrange(``ffind'',5090,5090) | inrange(``ffind'',5091,5092) | inrange(``ffind'',5093,5093) | inrange(``ffind'',5094,5094) | inrange(``ffind'',5099,5099) | inrange(``ffind'',5100,5100) | inrange(``ffind'',5110,5113) | inrange(``ffind'',5120,5122) | inrange(``ffind'',5130,5139) | inrange(``ffind'',5140,5149) | 
inrange(``ffind'',5150,5159) | inrange(``ffind'',5160,5169) | inrange(``ffind'',5170,5172) | inrange(``ffind'',5180,5182) | inrange(``ffind'',5190,5199)\n qui replace `generate'=27 if inrange(``ffind'',5200,5200) | inrange(``ffind'',5210,5219) | inrange(``ffind'',5220,5229) | inrange(``ffind'',5230,5231) | inrange(``ffind'',5250,5251) | inrange(``ffind'',5260,5261) | inrange(``ffind'',5270,5271) | inrange(``ffind'',5300,5300) | inrange(``ffind'',5310,5311) | inrange(``ffind'',5320,5320) | inrange(``ffind'',5330,5331) | inrange(``ffind'',5334,5334) | inrange(``ffind'',5340,5349) | inrange(``ffind'',5390,5399) | inrange(``ffind'',5400,5400) | inrange(``ffind'',5410,5411) | inrange(``ffind'',5412,5412) | inrange(``ffind'',5420,5429) | inrange(``ffind'',5430,5439) | inrange(``ffind'',5440,5449) | inrange(``ffind'',5450,5459) | inrange(``ffind'',5460,5469) | inrange(``ffind'',5490,5499) | inrange(``ffind'',5500,5500) | inrange(``ffind'',5510,5529) | inrange(``ffind'',5530,5539) | inrange(``ffind'',5540,5549) | inrange(``ffind'',5550,5559) | inrange(``ffind'',5560,5569) | inrange(``ffind'',5570,5579) | inrange(``ffind'',5590,5599) | inrange(``ffind'',5600,5699) | inrange(``ffind'',5700,5700) | inrange(``ffind'',5710,5719) | inrange(``ffind'',5720,5722) | inrange(``ffind'',5730,5733) | inrange(``ffind'',5734,5734) | inrange(``ffind'',5735,5735) | inrange(``ffind'',5736,5736) | inrange(``ffind'',5750,5799) | inrange(``ffind'',5900,5900) | inrange(``ffind'',5910,5912) | inrange(``ffind'',5920,5929) | inrange(``ffind'',5930,5932) | inrange(``ffind'',5940,5940) | inrange(``ffind'',5941,5941) | inrange(``ffind'',5942,5942) | inrange(``ffind'',5943,5943) | inrange(``ffind'',5944,5944) | inrange(``ffind'',5945,5945) | inrange(``ffind'',5946,5946) | inrange(``ffind'',5947,5947) | inrange(``ffind'',5948,5948) | inrange(``ffind'',5949,5949) | inrange(``ffind'',5950,5959) | inrange(``ffind'',5960,5969) | inrange(``ffind'',5970,5979) | inrange(``ffind'',5980,5989) | 
inrange(``ffind'',5990,5990) | inrange(``ffind'',5992,5992) | inrange(``ffind'',5993,5993) | inrange(``ffind'',5994,5994) | inrange(``ffind'',5995,5995) | inrange(``ffind'',5999,5999)\n qui replace `generate'=28 if inrange(``ffind'',5800,5819) | inrange(``ffind'',5820,5829) | inrange(``ffind'',5890,5899) | inrange(``ffind'',7000,7000) | inrange(``ffind'',7010,7019) | inrange(``ffind'',7040,7049) | inrange(``ffind'',7213,7213)\n qui replace `generate'=29 if inrange(``ffind'',6000,6000) | inrange(``ffind'',6010,6019) | inrange(``ffind'',6020,6020) | inrange(``ffind'',6021,6021) | inrange(``ffind'',6022,6022) | inrange(``ffind'',6023,6024) | inrange(``ffind'',6025,6025) | inrange(``ffind'',6026,6026) | inrange(``ffind'',6027,6027) | inrange(``ffind'',6028,6029) | inrange(``ffind'',6030,6036) | inrange(``ffind'',6040,6059) | inrange(``ffind'',6060,6062) | inrange(``ffind'',6080,6082) | inrange(``ffind'',6090,6099) | inrange(``ffind'',6100,6100) | inrange(``ffind'',6110,6111) | inrange(``ffind'',6112,6113) | inrange(``ffind'',6120,6129) | inrange(``ffind'',6130,6139) | inrange(``ffind'',6140,6149) | inrange(``ffind'',6150,6159) | inrange(``ffind'',6160,6169) | inrange(``ffind'',6170,6179) | inrange(``ffind'',6190,6199) | inrange(``ffind'',6200,6299) | inrange(``ffind'',6300,6300) | inrange(``ffind'',6310,6319) | inrange(``ffind'',6320,6329) | inrange(``ffind'',6330,6331) | inrange(``ffind'',6350,6351) | inrange(``ffind'',6360,6361) | inrange(``ffind'',6370,6379) | inrange(``ffind'',6390,6399) | inrange(``ffind'',6400,6411) | inrange(``ffind'',6500,6500) | inrange(``ffind'',6510,6510) | inrange(``ffind'',6512,6512) | inrange(``ffind'',6513,6513) | inrange(``ffind'',6514,6514) | inrange(``ffind'',6515,6515) | inrange(``ffind'',6517,6519) | inrange(``ffind'',6520,6529) | inrange(``ffind'',6530,6531) | inrange(``ffind'',6532,6532) | inrange(``ffind'',6540,6541) | inrange(``ffind'',6550,6553) | inrange(``ffind'',6590,6599) | inrange(``ffind'',6610,6611) | 
inrange(``ffind'',6700,6700) | inrange(``ffind'',6710,6719) | inrange(``ffind'',6720,6722) | inrange(``ffind'',6723,6723) | inrange(``ffind'',6724,6724) | inrange(``ffind'',6725,6725) | inrange(``ffind'',6726,6726) | inrange(``ffind'',6730,6733) | inrange(``ffind'',6740,6779) | inrange(``ffind'',6790,6791) | inrange(``ffind'',6792,6792) | inrange(``ffind'',6793,6793) | inrange(``ffind'',6794,6794) | inrange(``ffind'',6795,6795) | inrange(``ffind'',6798,6798) | inrange(``ffind'',6799,6799)\n qui replace `generate'=30 if missing(`generate') & ~missing(``ffind'')\n }\n\n else if ``ftyp''==38 {\n label define `generate' 1 \"Agriculture, forestry, and fishing\" 2 \"Mining\" 3 \"Oil and Gas Extraction\" 4 \"Nonmetallic Minerals Except Fuels\" 5 \"Construction\" 6 \"Food and Kindred Products\" 7 \"Tobacco Products\" 8 \"Textile Mill Products\" 9 \"Apparel and other Textile Products\" 10 \"Lumber and Wood Products\" 11 \"Furniture and Fixtures\" 12 \"Paper and Allied Products\" 13 \"Printing and Publishing\" 14 \"Chemicals and Allied Products\" 15 \"Petroleum and Coal Products\" 16 \"Rubber and Miscellaneous Plastics Products\" 17 \"Leather and Leather Products\" 18 \"Stone, Clay and Glass Products\" 19 \"Primary Metal Industries\" 20 \"Fabricated Metal Products\" 21 \"Machinery, Except Electrical\" 22 \"Electrical and Electronic Equipment\" 23 \"Transportation Equipment\" 24 \"Instruments and Related Products\" 25 \"Miscellaneous Manufacturing Industries\" 26 \"Transportation\" 27 \"Telephone and Telegraph Communication\" 28 \"Radio and Television Broadcasting\" 29 \"Electric, Gas, and Water Supply\" 30 \"Sanitary Services\" 31 \"Steam Supply\" 32 \"Irrigation Systems\" 33 \"Wholesale\" 34 \"Retail Stores\" 35 \"Finance, Insurance, and Real Estate\" 36 \"Services\" 37 \"Public Administration\" 38 \"Almost Nothing\"\n label values `generate' `generate'\n qui replace `generate'=1 if inrange(``ffind'',100,999)\n qui replace `generate'=2 if inrange(``ffind'',1000,1299)\n qui 
replace `generate'=3 if inrange(``ffind'',1300,1399)\n qui replace `generate'=4 if inrange(``ffind'',1400,1499)\n qui replace `generate'=5 if inrange(``ffind'',1500,1799)\n qui replace `generate'=6 if inrange(``ffind'',2000,2099)\n qui replace `generate'=7 if inrange(``ffind'',2100,2199)\n qui replace `generate'=8 if inrange(``ffind'',2200,2299)\n qui replace `generate'=9 if inrange(``ffind'',2300,2399)\n qui replace `generate'=10 if inrange(``ffind'',2400,2499)\n qui replace `generate'=11 if inrange(``ffind'',2500,2599)\n qui replace `generate'=12 if inrange(``ffind'',2600,2661)\n qui replace `generate'=13 if inrange(``ffind'',2700,2799)\n qui replace `generate'=14 if inrange(``ffind'',2800,2899)\n qui replace `generate'=15 if inrange(``ffind'',2900,2999)\n qui replace `generate'=16 if inrange(``ffind'',3000,3099)\n qui replace `generate'=17 if inrange(``ffind'',3100,3199)\n qui replace `generate'=18 if inrange(``ffind'',3200,3299)\n qui replace `generate'=19 if inrange(``ffind'',3300,3399)\n qui replace `generate'=20 if inrange(``ffind'',3400,3499)\n qui replace `generate'=21 if inrange(``ffind'',3500,3599)\n qui replace `generate'=22 if inrange(``ffind'',3600,3699)\n qui replace `generate'=23 if inrange(``ffind'',3700,3799)\n qui replace `generate'=24 if inrange(``ffind'',3800,3879)\n qui replace `generate'=25 if inrange(``ffind'',3900,3999)\n qui replace `generate'=26 if inrange(``ffind'',4000,4799)\n qui replace `generate'=27 if inrange(``ffind'',4800,4829)\n qui replace `generate'=28 if inrange(``ffind'',4830,4899)\n qui replace `generate'=29 if inrange(``ffind'',4900,4949)\n qui replace `generate'=30 if inrange(``ffind'',4950,4959)\n qui replace `generate'=31 if inrange(``ffind'',4960,4969)\n qui replace `generate'=32 if inrange(``ffind'',4970,4979)\n qui replace `generate'=33 if inrange(``ffind'',5000,5199)\n qui replace `generate'=34 if inrange(``ffind'',5200,5999)\n qui replace `generate'=35 if inrange(``ffind'',6000,6999)\n qui replace `generate'=36 if 
inrange(``ffind'',7000,8999)\n qui replace `generate'=37 if inrange(``ffind'',9000,9999)\n qui replace `generate'=38 if missing(`generate') & ~missing(``ffind'')\n\n }\n else if ``ftyp''==48 {\n label define `generate' 1 \"Agriculture\" 2 \"Food Products\" 3 \"Candy & Soda\" 4 \"Beer & Liquor\" 5 \"Tobacco Products\" 6 \"Recreation\" 7 \"Entertainment\" 8 \"Printing and Publishing\" 9 \"Consumer Goods\" 10 \"Apparel\" 11 \"Healthcare\" 12 \"Medical Equipment\" 13 \"Pharmaceutical Products\" 14 \"Chemicals\" 15 \"Rubber and Plastic Products\" 16 \"Textiles\" 17 \"Construction Materials\" 18 \"Construction\" 19 \"Steel Works Etc\" 20 \"Fabricated Products\" 21 \"Machinery\" 22 \"Electrical Equipment\" 23 \"Automobiles and Trucks\" 24 \"Aircraft\" 25 \"Shipbuilding, Railroad Equipment\" 26 \"Defense\" 27 \"Precious Metals\" 28 \"Non-Metallic and Industrial Metal Mining\" 29 \"Coal\" 30 \"Petroleum and Natural Gas\" 31 \"Utilities\" 32 \"Communication\" 33 \"Personal Services\" 34 \"Business Services\" 35 \"Computers\" 36 \"Electronic Equipment\" 37 \"Measuring and Control Equipment\" 38 \"Business Supplies\" 39 \"Shipping Containers\" 40 \"Transportation\" 41 \"Wholesale\" 42 \"Retail\" 43 \"Restaurants, Hotels, Motels\" 44 \"Banking\" 45 \"Insurance\" 46 \"Real Estate\" 47 \"Trading\" 48 \"Almost Nothing\"\n label values `generate' `generate'\n qui replace `generate'=1 if inrange(``ffind'',100,199) | inrange(``ffind'',200,299) | inrange(``ffind'',700,799) | inrange(``ffind'',910,919) | inrange(``ffind'',2048,2048)\n qui replace `generate'=2 if inrange(``ffind'',2000,2009) | inrange(``ffind'',2010,2019) | inrange(``ffind'',2020,2029) | inrange(``ffind'',2030,2039) | inrange(``ffind'',2040,2046) | inrange(``ffind'',2050,2059) | inrange(``ffind'',2060,2063) | inrange(``ffind'',2070,2079) | inrange(``ffind'',2090,2092) | inrange(``ffind'',2095,2095) | inrange(``ffind'',2098,2099)\n qui replace `generate'=3 if inrange(``ffind'',2064,2068) | inrange(``ffind'',2086,2086) | 
inrange(``ffind'',2087,2087) | inrange(``ffind'',2096,2096) | inrange(``ffind'',2097,2097)\n qui replace `generate'=4 if inrange(``ffind'',2080,2080) | inrange(``ffind'',2082,2082) | inrange(``ffind'',2083,2083) | inrange(``ffind'',2084,2084) | inrange(``ffind'',2085,2085)\n qui replace `generate'=5 if inrange(``ffind'',2100,2199)\n qui replace `generate'=6 if inrange(``ffind'',920,999) | inrange(``ffind'',3650,3651) | inrange(``ffind'',3652,3652) | inrange(``ffind'',3732,3732) | inrange(``ffind'',3930,3931) | inrange(``ffind'',3940,3949)\n qui replace `generate'=7 if inrange(``ffind'',7800,7829) | inrange(``ffind'',7830,7833) | inrange(``ffind'',7840,7841) | inrange(``ffind'',7900,7900) | inrange(``ffind'',7910,7911) | inrange(``ffind'',7920,7929) | inrange(``ffind'',7930,7933) | inrange(``ffind'',7940,7949) | inrange(``ffind'',7980,7980) | inrange(``ffind'',7990,7999)\n qui replace `generate'=8 if inrange(``ffind'',2700,2709) | inrange(``ffind'',2710,2719) | inrange(``ffind'',2720,2729) | inrange(``ffind'',2730,2739) | inrange(``ffind'',2740,2749) | inrange(``ffind'',2770,2771) | inrange(``ffind'',2780,2789) | inrange(``ffind'',2790,2799)\n qui replace `generate'=9 if inrange(``ffind'',2047,2047) | inrange(``ffind'',2391,2392) | inrange(``ffind'',2510,2519) | inrange(``ffind'',2590,2599) | inrange(``ffind'',2840,2843) | inrange(``ffind'',2844,2844) | inrange(``ffind'',3160,3161) | inrange(``ffind'',3170,3171) | inrange(``ffind'',3172,3172) | inrange(``ffind'',3190,3199) | inrange(``ffind'',3229,3229) | inrange(``ffind'',3260,3260) | inrange(``ffind'',3262,3263) | inrange(``ffind'',3269,3269) | inrange(``ffind'',3230,3231) | inrange(``ffind'',3630,3639) | inrange(``ffind'',3750,3751) | inrange(``ffind'',3800,3800) | inrange(``ffind'',3860,3861) | inrange(``ffind'',3870,3873) | inrange(``ffind'',3910,3911) | inrange(``ffind'',3914,3914) | inrange(``ffind'',3915,3915) | inrange(``ffind'',3960,3962) | inrange(``ffind'',3991,3991) | inrange(``ffind'',3995,3995)\n qui 
replace `generate'=10 if inrange(``ffind'',2300,2390) | inrange(``ffind'',3020,3021) | inrange(``ffind'',3100,3111) | inrange(``ffind'',3130,3131) | inrange(``ffind'',3140,3149) | inrange(``ffind'',3150,3151) | inrange(``ffind'',3963,3965)\n qui replace `generate'=11 if inrange(``ffind'',8000,8099)\n qui replace `generate'=12 if inrange(``ffind'',3693,3693) | inrange(``ffind'',3840,3849) | inrange(``ffind'',3850,3851)\n qui replace `generate'=13 if inrange(``ffind'',2830,2830) | inrange(``ffind'',2831,2831) | inrange(``ffind'',2833,2833) | inrange(``ffind'',2834,2834) | inrange(``ffind'',2835,2835) | inrange(``ffind'',2836,2836)\n qui replace `generate'=14 if inrange(``ffind'',2800,2809) | inrange(``ffind'',2810,2819) | inrange(``ffind'',2820,2829) | inrange(``ffind'',2850,2859) | inrange(``ffind'',2860,2869) | inrange(``ffind'',2870,2879) | inrange(``ffind'',2890,2899)\n qui replace `generate'=15 if inrange(``ffind'',3031,3031) | inrange(``ffind'',3041,3041) | inrange(``ffind'',3050,3053) | inrange(``ffind'',3060,3069) | inrange(``ffind'',3070,3079) | inrange(``ffind'',3080,3089) | inrange(``ffind'',3090,3099)\n qui replace `generate'=16 if inrange(``ffind'',2200,2269) | inrange(``ffind'',2270,2279) | inrange(``ffind'',2280,2284) | inrange(``ffind'',2290,2295) | inrange(``ffind'',2297,2297) | inrange(``ffind'',2298,2298) | inrange(``ffind'',2299,2299) | inrange(``ffind'',2393,2395) | inrange(``ffind'',2397,2399)\n qui replace `generate'=17 if inrange(``ffind'',800,899) | inrange(``ffind'',2400,2439) | inrange(``ffind'',2450,2459) | inrange(``ffind'',2490,2499) | inrange(``ffind'',2660,2661) | inrange(``ffind'',2950,2952) | inrange(``ffind'',3200,3200) | inrange(``ffind'',3210,3211) | inrange(``ffind'',3240,3241) | inrange(``ffind'',3250,3259) | inrange(``ffind'',3261,3261) | inrange(``ffind'',3264,3264) | inrange(``ffind'',3270,3275) | inrange(``ffind'',3280,3281) | inrange(``ffind'',3290,3293) | inrange(``ffind'',3295,3299) | inrange(``ffind'',3420,3429) | 
inrange(``ffind'',3430,3433) | inrange(``ffind'',3440,3441) | inrange(``ffind'',3442,3442) | inrange(``ffind'',3446,3446) | inrange(``ffind'',3448,3448) | inrange(``ffind'',3449,3449) | inrange(``ffind'',3450,3451) | inrange(``ffind'',3452,3452) | inrange(``ffind'',3490,3499) | inrange(``ffind'',3996,3996)\n qui replace `generate'=18 if inrange(``ffind'',1500,1511) | inrange(``ffind'',1520,1529) | inrange(``ffind'',1530,1539) | inrange(``ffind'',1540,1549) | inrange(``ffind'',1600,1699) | inrange(``ffind'',1700,1799)\n qui replace `generate'=19 if inrange(``ffind'',3300,3300) | inrange(``ffind'',3310,3317) | inrange(``ffind'',3320,3325) | inrange(``ffind'',3330,3339) | inrange(``ffind'',3340,3341) | inrange(``ffind'',3350,3357) | inrange(``ffind'',3360,3369) | inrange(``ffind'',3370,3379) | inrange(``ffind'',3390,3399)\n qui replace `generate'=20 if inrange(``ffind'',3400,3400) | inrange(``ffind'',3443,3443) | inrange(``ffind'',3444,3444) | inrange(``ffind'',3460,3469) | inrange(``ffind'',3470,3479)\n qui replace `generate'=21 if inrange(``ffind'',3510,3519) | inrange(``ffind'',3520,3529) | inrange(``ffind'',3530,3530) | inrange(``ffind'',3531,3531) | inrange(``ffind'',3532,3532) | inrange(``ffind'',3533,3533) | inrange(``ffind'',3534,3534) | inrange(``ffind'',3535,3535) | inrange(``ffind'',3536,3536) | inrange(``ffind'',3538,3538) | inrange(``ffind'',3540,3549) | inrange(``ffind'',3550,3559) | inrange(``ffind'',3560,3569) | inrange(``ffind'',3580,3580) | inrange(``ffind'',3581,3581) | inrange(``ffind'',3582,3582) | inrange(``ffind'',3585,3585) | inrange(``ffind'',3586,3586) | inrange(``ffind'',3589,3589) | inrange(``ffind'',3590,3599)\n qui replace `generate'=22 if inrange(``ffind'',3600,3600) | inrange(``ffind'',3610,3613) | inrange(``ffind'',3620,3621) | inrange(``ffind'',3623,3629) | inrange(``ffind'',3640,3644) | inrange(``ffind'',3645,3645) | inrange(``ffind'',3646,3646) | inrange(``ffind'',3648,3649) | inrange(``ffind'',3660,3660) | 
inrange(``ffind'',3690,3690) | inrange(``ffind'',3691,3692) | inrange(``ffind'',3699,3699)\n qui replace `generate'=23 if inrange(``ffind'',2296,2296) | inrange(``ffind'',2396,2396) | inrange(``ffind'',3010,3011) | inrange(``ffind'',3537,3537) | inrange(``ffind'',3647,3647) | inrange(``ffind'',3694,3694) | inrange(``ffind'',3700,3700) | inrange(``ffind'',3710,3710) | inrange(``ffind'',3711,3711) | inrange(``ffind'',3713,3713) | inrange(``ffind'',3714,3714) | inrange(``ffind'',3715,3715) | inrange(``ffind'',3716,3716) | inrange(``ffind'',3792,3792) | inrange(``ffind'',3790,3791) | inrange(``ffind'',3799,3799)\n qui replace `generate'=24 if inrange(``ffind'',3720,3720) | inrange(``ffind'',3721,3721) | inrange(``ffind'',3723,3724) | inrange(``ffind'',3725,3725) | inrange(``ffind'',3728,3729)\n qui replace `generate'=25 if inrange(``ffind'',3730,3731) | inrange(``ffind'',3740,3743)\n qui replace `generate'=26 if inrange(``ffind'',3760,3769) | inrange(``ffind'',3795,3795) | inrange(``ffind'',3480,3489)\n qui replace `generate'=27 if inrange(``ffind'',1040,1049)\n qui replace `generate'=28 if inrange(``ffind'',1000,1009) | inrange(``ffind'',1010,1019) | inrange(``ffind'',1020,1029) | inrange(``ffind'',1030,1039) | inrange(``ffind'',1050,1059) | inrange(``ffind'',1060,1069) | inrange(``ffind'',1070,1079) | inrange(``ffind'',1080,1089) | inrange(``ffind'',1090,1099) | inrange(``ffind'',1100,1119) | inrange(``ffind'',1400,1499)\n qui replace `generate'=29 if inrange(``ffind'',1200,1299)\n qui replace `generate'=30 if inrange(``ffind'',1300,1300) | inrange(``ffind'',1310,1319) | inrange(``ffind'',1320,1329) | inrange(``ffind'',1330,1339) | inrange(``ffind'',1370,1379) | inrange(``ffind'',1380,1380) | inrange(``ffind'',1381,1381) | inrange(``ffind'',1382,1382) | inrange(``ffind'',1389,1389) | inrange(``ffind'',2900,2912) | inrange(``ffind'',2990,2999)\n qui replace `generate'=31 if inrange(``ffind'',4900,4900) | inrange(``ffind'',4910,4911) | inrange(``ffind'',4920,4922) | 
inrange(``ffind'',4923,4923) | inrange(``ffind'',4924,4925) | inrange(``ffind'',4930,4931) | inrange(``ffind'',4932,4932) | inrange(``ffind'',4939,4939) | inrange(``ffind'',4940,4942)\n qui replace `generate'=32 if inrange(``ffind'',4800,4800) | inrange(``ffind'',4810,4813) | inrange(``ffind'',4820,4822) | inrange(``ffind'',4830,4839) | inrange(``ffind'',4840,4841) | inrange(``ffind'',4880,4889) | inrange(``ffind'',4890,4890) | inrange(``ffind'',4891,4891) | inrange(``ffind'',4892,4892) | inrange(``ffind'',4899,4899)\n qui replace `generate'=33 if inrange(``ffind'',7020,7021) | inrange(``ffind'',7030,7033) | inrange(``ffind'',7200,7200) | inrange(``ffind'',7210,7212) | inrange(``ffind'',7214,7214) | inrange(``ffind'',7215,7216) | inrange(``ffind'',7217,7217) | inrange(``ffind'',7219,7219) | inrange(``ffind'',7220,7221) | inrange(``ffind'',7230,7231) | inrange(``ffind'',7240,7241) | inrange(``ffind'',7250,7251) | inrange(``ffind'',7260,7269) | inrange(``ffind'',7270,7290) | inrange(``ffind'',7291,7291) | inrange(``ffind'',7292,7299) | inrange(``ffind'',7395,7395) | inrange(``ffind'',7500,7500) | inrange(``ffind'',7520,7529) | inrange(``ffind'',7530,7539) | inrange(``ffind'',7540,7549) | inrange(``ffind'',7600,7600) | inrange(``ffind'',7620,7620) | inrange(``ffind'',7622,7622) | inrange(``ffind'',7623,7623) | inrange(``ffind'',7629,7629) | inrange(``ffind'',7630,7631) | inrange(``ffind'',7640,7641) | inrange(``ffind'',7690,7699) | inrange(``ffind'',8100,8199) | inrange(``ffind'',8200,8299) | inrange(``ffind'',8300,8399) | inrange(``ffind'',8400,8499) | inrange(``ffind'',8600,8699) | inrange(``ffind'',8800,8899) | inrange(``ffind'',7510,7515)\n qui replace `generate'=34 if inrange(``ffind'',2750,2759) | inrange(``ffind'',3993,3993) | inrange(``ffind'',7218,7218) | inrange(``ffind'',7300,7300) | inrange(``ffind'',7310,7319) | inrange(``ffind'',7320,7329) | inrange(``ffind'',7330,7339) | inrange(``ffind'',7340,7342) | inrange(``ffind'',7349,7349) | 
inrange(``ffind'',7350,7351) | inrange(``ffind'',7352,7352) | inrange(``ffind'',7353,7353) | inrange(``ffind'',7359,7359) | inrange(``ffind'',7360,7369) | inrange(``ffind'',7370,7372) | inrange(``ffind'',7374,7374) | inrange(``ffind'',7375,7375) | inrange(``ffind'',7376,7376) | inrange(``ffind'',7377,7377) | inrange(``ffind'',7378,7378) | inrange(``ffind'',7379,7379) | inrange(``ffind'',7380,7380) | inrange(``ffind'',7381,7382) | inrange(``ffind'',7383,7383) | inrange(``ffind'',7384,7384) | inrange(``ffind'',7385,7385) | inrange(``ffind'',7389,7390) | inrange(``ffind'',7391,7391) | inrange(``ffind'',7392,7392) | inrange(``ffind'',7393,7393) | inrange(``ffind'',7394,7394) | inrange(``ffind'',7396,7396) | inrange(``ffind'',7397,7397) | inrange(``ffind'',7399,7399) | inrange(``ffind'',7519,7519) | inrange(``ffind'',8700,8700) | inrange(``ffind'',8710,8713) | inrange(``ffind'',8720,8721) | inrange(``ffind'',8730,8734) | inrange(``ffind'',8740,8748) | inrange(``ffind'',8900,8910) | inrange(``ffind'',8911,8911) | inrange(``ffind'',8920,8999) | inrange(``ffind'',4220,4229)\n qui replace `generate'=35 if inrange(``ffind'',3570,3579) | inrange(``ffind'',3680,3680) | inrange(``ffind'',3681,3681) | inrange(``ffind'',3682,3682) | inrange(``ffind'',3683,3683) | inrange(``ffind'',3684,3684) | inrange(``ffind'',3685,3685) | inrange(``ffind'',3686,3686) | inrange(``ffind'',3687,3687) | inrange(``ffind'',3688,3688) | inrange(``ffind'',3689,3689) | inrange(``ffind'',3695,3695) | inrange(``ffind'',7373,7373)\n qui replace `generate'=36 if inrange(``ffind'',3622,3622) | inrange(``ffind'',3661,3661) | inrange(``ffind'',3662,3662) | inrange(``ffind'',3663,3663) | inrange(``ffind'',3664,3664) | inrange(``ffind'',3665,3665) | inrange(``ffind'',3666,3666) | inrange(``ffind'',3669,3669) | inrange(``ffind'',3670,3679) | inrange(``ffind'',3810,3810) | inrange(``ffind'',3812,3812)\n qui replace `generate'=37 if inrange(``ffind'',3811,3811) | inrange(``ffind'',3820,3820) | 
inrange(``ffind'',3821,3821) | inrange(``ffind'',3822,3822) | inrange(``ffind'',3823,3823) | inrange(``ffind'',3824,3824) | inrange(``ffind'',3825,3825) | inrange(``ffind'',3826,3826) | inrange(``ffind'',3827,3827) | inrange(``ffind'',3829,3829) | inrange(``ffind'',3830,3839)\n qui replace `generate'=38 if inrange(``ffind'',2520,2549) | inrange(``ffind'',2600,2639) | inrange(``ffind'',2670,2699) | inrange(``ffind'',2760,2761) | inrange(``ffind'',3950,3955)\n qui replace `generate'=39 if inrange(``ffind'',2440,2449) | inrange(``ffind'',2640,2659) | inrange(``ffind'',3220,3221) | inrange(``ffind'',3410,3412)\n qui replace `generate'=40 if inrange(``ffind'',4000,4013) | inrange(``ffind'',4040,4049) | inrange(``ffind'',4100,4100) | inrange(``ffind'',4110,4119) | inrange(``ffind'',4120,4121) | inrange(``ffind'',4130,4131) | inrange(``ffind'',4140,4142) | inrange(``ffind'',4150,4151) | inrange(``ffind'',4170,4173) | inrange(``ffind'',4190,4199) | inrange(``ffind'',4200,4200) | inrange(``ffind'',4210,4219) | inrange(``ffind'',4230,4231) | inrange(``ffind'',4240,4249) | inrange(``ffind'',4400,4499) | inrange(``ffind'',4500,4599) | inrange(``ffind'',4600,4699) | inrange(``ffind'',4700,4700) | inrange(``ffind'',4710,4712) | inrange(``ffind'',4720,4729) | inrange(``ffind'',4730,4739) | inrange(``ffind'',4740,4749) | inrange(``ffind'',4780,4780) | inrange(``ffind'',4782,4782) | inrange(``ffind'',4783,4783) | inrange(``ffind'',4784,4784) | inrange(``ffind'',4785,4785) | inrange(``ffind'',4789,4789)\n qui replace `generate'=41 if inrange(``ffind'',5000,5000) | inrange(``ffind'',5010,5015) | inrange(``ffind'',5020,5023) | inrange(``ffind'',5030,5039) | inrange(``ffind'',5040,5042) | inrange(``ffind'',5043,5043) | inrange(``ffind'',5044,5044) | inrange(``ffind'',5045,5045) | inrange(``ffind'',5046,5046) | inrange(``ffind'',5047,5047) | inrange(``ffind'',5048,5048) | inrange(``ffind'',5049,5049) | inrange(``ffind'',5050,5059) | inrange(``ffind'',5060,5060) | 
inrange(``ffind'',5063,5063) | inrange(``ffind'',5064,5064) | inrange(``ffind'',5065,5065) | inrange(``ffind'',5070,5078) | inrange(``ffind'',5080,5080) | inrange(``ffind'',5081,5081) | inrange(``ffind'',5082,5082) | inrange(``ffind'',5083,5083) | inrange(``ffind'',5084,5084) | inrange(``ffind'',5085,5085) | inrange(``ffind'',5086,5087) | inrange(``ffind'',5088,5088) | inrange(``ffind'',5090,5090) | inrange(``ffind'',5091,5092) | inrange(``ffind'',5093,5093) | inrange(``ffind'',5094,5094) | inrange(``ffind'',5099,5099) | inrange(``ffind'',5100,5100) | inrange(``ffind'',5110,5113) | inrange(``ffind'',5120,5122) | inrange(``ffind'',5130,5139) | inrange(``ffind'',5140,5149) | inrange(``ffind'',5150,5159) | inrange(``ffind'',5160,5169) | inrange(``ffind'',5170,5172) | inrange(``ffind'',5180,5182) | inrange(``ffind'',5190,5199)\n qui replace `generate'=42 if inrange(``ffind'',5200,5200) | inrange(``ffind'',5210,5219) | inrange(``ffind'',5220,5229) | inrange(``ffind'',5230,5231) | inrange(``ffind'',5250,5251) | inrange(``ffind'',5260,5261) | inrange(``ffind'',5270,5271) | inrange(``ffind'',5300,5300) | inrange(``ffind'',5310,5311) | inrange(``ffind'',5320,5320) | inrange(``ffind'',5330,5331) | inrange(``ffind'',5334,5334) | inrange(``ffind'',5340,5349) | inrange(``ffind'',5390,5399) | inrange(``ffind'',5400,5400) | inrange(``ffind'',5410,5411) | inrange(``ffind'',5412,5412) | inrange(``ffind'',5420,5429) | inrange(``ffind'',5430,5439) | inrange(``ffind'',5440,5449) | inrange(``ffind'',5450,5459) | inrange(``ffind'',5460,5469) | inrange(``ffind'',5490,5499) | inrange(``ffind'',5500,5500) | inrange(``ffind'',5510,5529) | inrange(``ffind'',5530,5539) | inrange(``ffind'',5540,5549) | inrange(``ffind'',5550,5559) | inrange(``ffind'',5560,5569) | inrange(``ffind'',5570,5579) | inrange(``ffind'',5590,5599) | inrange(``ffind'',5600,5699) | inrange(``ffind'',5700,5700) | inrange(``ffind'',5710,5719) | inrange(``ffind'',5720,5722) | inrange(``ffind'',5730,5733) | 
inrange(``ffind'',5734,5734) | inrange(``ffind'',5735,5735) | inrange(``ffind'',5736,5736) | inrange(``ffind'',5750,5799) | inrange(``ffind'',5900,5900) | inrange(``ffind'',5910,5912) | inrange(``ffind'',5920,5929) | inrange(``ffind'',5930,5932) | inrange(``ffind'',5940,5940) | inrange(``ffind'',5941,5941) | inrange(``ffind'',5942,5942) | inrange(``ffind'',5943,5943) | inrange(``ffind'',5944,5944) | inrange(``ffind'',5945,5945) | inrange(``ffind'',5946,5946) | inrange(``ffind'',5947,5947) | inrange(``ffind'',5948,5948) | inrange(``ffind'',5949,5949) | inrange(``ffind'',5950,5959) | inrange(``ffind'',5960,5969) | inrange(``ffind'',5970,5979) | inrange(``ffind'',5980,5989) | inrange(``ffind'',5990,5990) | inrange(``ffind'',5992,5992) | inrange(``ffind'',5993,5993) | inrange(``ffind'',5994,5994) | inrange(``ffind'',5995,5995) | inrange(``ffind'',5999,5999)\n qui replace `generate'=43 if inrange(``ffind'',5800,5819) | inrange(``ffind'',5820,5829) | inrange(``ffind'',5890,5899) | inrange(``ffind'',7000,7000) | inrange(``ffind'',7010,7019) | inrange(``ffind'',7040,7049) | inrange(``ffind'',7213,7213)\n qui replace `generate'=44 if inrange(``ffind'',6000,6000) | inrange(``ffind'',6010,6019) | inrange(``ffind'',6020,6020) | inrange(``ffind'',6021,6021) | inrange(``ffind'',6022,6022) | inrange(``ffind'',6023,6024) | inrange(``ffind'',6025,6025) | inrange(``ffind'',6026,6026) | inrange(``ffind'',6027,6027) | inrange(``ffind'',6028,6029) | inrange(``ffind'',6030,6036) | inrange(``ffind'',6040,6059) | inrange(``ffind'',6060,6062) | inrange(``ffind'',6080,6082) | inrange(``ffind'',6090,6099) | inrange(``ffind'',6100,6100) | inrange(``ffind'',6110,6111) | inrange(``ffind'',6112,6113) | inrange(``ffind'',6120,6129) | inrange(``ffind'',6130,6139) | inrange(``ffind'',6140,6149) | inrange(``ffind'',6150,6159) | inrange(``ffind'',6160,6169) | inrange(``ffind'',6170,6179) | inrange(``ffind'',6190,6199)\n qui replace `generate'=45 if inrange(``ffind'',6300,6300) | 
inrange(``ffind'',6310,6319) | inrange(``ffind'',6320,6329) | inrange(``ffind'',6330,6331) | inrange(``ffind'',6350,6351) | inrange(``ffind'',6360,6361) | inrange(``ffind'',6370,6379) | inrange(``ffind'',6390,6399) | inrange(``ffind'',6400,6411)\n qui replace `generate'=46 if inrange(``ffind'',6500,6500) | inrange(``ffind'',6510,6510) | inrange(``ffind'',6512,6512) | inrange(``ffind'',6513,6513) | inrange(``ffind'',6514,6514) | inrange(``ffind'',6515,6515) | inrange(``ffind'',6517,6519) | inrange(``ffind'',6520,6529) | inrange(``ffind'',6530,6531) | inrange(``ffind'',6532,6532) | inrange(``ffind'',6540,6541) | inrange(``ffind'',6550,6553) | inrange(``ffind'',6590,6599) | inrange(``ffind'',6610,6611)\n qui replace `generate'=47 if inrange(``ffind'',6200,6299) | inrange(``ffind'',6700,6700) | inrange(``ffind'',6710,6719) | inrange(``ffind'',6720,6722) | inrange(``ffind'',6723,6723) | inrange(``ffind'',6724,6724) | inrange(``ffind'',6725,6725) | inrange(``ffind'',6726,6726) | inrange(``ffind'',6730,6733) | inrange(``ffind'',6740,6779) | inrange(``ffind'',6790,6791) | inrange(``ffind'',6792,6792) | inrange(``ffind'',6793,6793) | inrange(``ffind'',6794,6794) | inrange(``ffind'',6795,6795) | inrange(``ffind'',6798,6798) | inrange(``ffind'',6799,6799)\n qui replace `generate'=48 if missing(`generate') & ~missing(``ffind'')\n\n }\n else if ``ftyp''==49 {\n label define `generate' 1 \"Agriculture\" 2 \"Food Products\" 3 \"Candy & Soda\" 4 \"Beer & Liquor\" 5 \"Tobacco Products\" 6 \"Recreation\" 7 \"Entertainment\" 8 \"Printing and Publishing\" 9 \"Consumer Goods\" 10 \"Apparel\" 11 \"Healthcare\" 12 \"Medical Equipment\" 13 \"Pharmaceutical Products\" 14 \"Chemicals\" 15 \"Rubber and Plastic Products\" 16 \"Textiles\" 17 \"Construction Materials\" 18 \"Construction\" 19 \"Steel Works Etc\" 20 \"Fabricated Products\" 21 \"Machinery\" 22 \"Electrical Equipment\" 23 \"Automobiles and Trucks\" 24 \"Aircraft\" 25 \"Shipbuilding, Railroad Equipment\" 26 \"Defense\" 27 \"Precious 
Metals\" 28 \"Non-Metallic and Industrial Metal Mining\" 29 \"Coal\" 30 \"Petroleum and Natural Gas\" 31 \"Utilities\" 32 \"Communication\" 33 \"Personal Services\" 34 \"Business Services\" 35 \"Computer Hardware\" 36 \"Computer Software\" 37 \"Electronic Equipment\" 38 \"Measuring and Control Equipment\" 39 \"Business Supplies\" 40 \"Shipping Containers\" 41 \"Transportation\" 42 \"Wholesale\" 43 \"Retail\" 44 \"Restaraunts, Hotels, Motels\" 45 \"Banking\" 46 \"Insurance\" 47 \"Real Estate\" 48 \"Trading\" 49 \"Almost Nothing\"\n label values `generate' `generate'\n qui replace `generate'=1 if inrange(``ffind'',100,199) | inrange(``ffind'',200,299) | inrange(``ffind'',700,799) | inrange(``ffind'',910,919) | inrange(``ffind'',2048,2048)\n qui replace `generate'=2 if inrange(``ffind'',2000,2009) | inrange(``ffind'',2010,2019) | inrange(``ffind'',2020,2029) | inrange(``ffind'',2030,2039) | inrange(``ffind'',2040,2046) | inrange(``ffind'',2050,2059) | inrange(``ffind'',2060,2063) | inrange(``ffind'',2070,2079) | inrange(``ffind'',2090,2092) | inrange(``ffind'',2095,2095) | inrange(``ffind'',2098,2099)\n qui replace `generate'=3 if inrange(``ffind'',2064,2068) | inrange(``ffind'',2086,2086) | inrange(``ffind'',2087,2087) | inrange(``ffind'',2096,2096) | inrange(``ffind'',2097,2097)\n qui replace `generate'=4 if inrange(``ffind'',2080,2080) | inrange(``ffind'',2082,2082) | inrange(``ffind'',2083,2083) | inrange(``ffind'',2084,2084) | inrange(``ffind'',2085,2085)\n qui replace `generate'=5 if inrange(``ffind'',2100,2199)\n qui replace `generate'=6 if inrange(``ffind'',920,999) | inrange(``ffind'',3650,3651) | inrange(``ffind'',3652,3652) | inrange(``ffind'',3732,3732) | inrange(``ffind'',3930,3931) | inrange(``ffind'',3940,3949)\n qui replace `generate'=7 if inrange(``ffind'',7800,7829) | inrange(``ffind'',7830,7833) | inrange(``ffind'',7840,7841) | inrange(``ffind'',7900,7900) | inrange(``ffind'',7910,7911) | inrange(``ffind'',7920,7929) | inrange(``ffind'',7930,7933) | 
inrange(``ffind'',7940,7949) | inrange(``ffind'',7980,7980) | inrange(``ffind'',7990,7999)\n qui replace `generate'=8 if inrange(``ffind'',2700,2709) | inrange(``ffind'',2710,2719) | inrange(``ffind'',2720,2729) | inrange(``ffind'',2730,2739) | inrange(``ffind'',2740,2749) | inrange(``ffind'',2770,2771) | inrange(``ffind'',2780,2789) | inrange(``ffind'',2790,2799)\n qui replace `generate'=9 if inrange(``ffind'',2047,2047) | inrange(``ffind'',2391,2392) | inrange(``ffind'',2510,2519) | inrange(``ffind'',2590,2599) | inrange(``ffind'',2840,2843) | inrange(``ffind'',2844,2844) | inrange(``ffind'',3160,3161) | inrange(``ffind'',3170,3171) | inrange(``ffind'',3172,3172) | inrange(``ffind'',3190,3199) | inrange(``ffind'',3229,3229) | inrange(``ffind'',3260,3260) | inrange(``ffind'',3262,3263) | inrange(``ffind'',3269,3269) | inrange(``ffind'',3230,3231) | inrange(``ffind'',3630,3639) | inrange(``ffind'',3750,3751) | inrange(``ffind'',3800,3800) | inrange(``ffind'',3860,3861) | inrange(``ffind'',3870,3873) | inrange(``ffind'',3910,3911) | inrange(``ffind'',3914,3914) | inrange(``ffind'',3915,3915) | inrange(``ffind'',3960,3962) | inrange(``ffind'',3991,3991) | inrange(``ffind'',3995,3995)\n qui replace `generate'=10 if inrange(``ffind'',2300,2390) | inrange(``ffind'',3020,3021) | inrange(``ffind'',3100,3111) | inrange(``ffind'',3130,3131) | inrange(``ffind'',3140,3149) | inrange(``ffind'',3150,3151) | inrange(``ffind'',3963,3965)\n qui replace `generate'=11 if inrange(``ffind'',8000,8099)\n qui replace `generate'=12 if inrange(``ffind'',3693,3693) | inrange(``ffind'',3840,3849) | inrange(``ffind'',3850,3851)\n qui replace `generate'=13 if inrange(``ffind'',2830,2830) | inrange(``ffind'',2831,2831) | inrange(``ffind'',2833,2833) | inrange(``ffind'',2834,2834) | inrange(``ffind'',2835,2835) | inrange(``ffind'',2836,2836)\n qui replace `generate'=14 if inrange(``ffind'',2800,2809) | inrange(``ffind'',2810,2819) | inrange(``ffind'',2820,2829) | inrange(``ffind'',2850,2859) | 
inrange(``ffind'',2860,2869) | inrange(``ffind'',2870,2879) | inrange(``ffind'',2890,2899)\n qui replace `generate'=15 if inrange(``ffind'',3031,3031) | inrange(``ffind'',3041,3041) | inrange(``ffind'',3050,3053) | inrange(``ffind'',3060,3069) | inrange(``ffind'',3070,3079) | inrange(``ffind'',3080,3089) | inrange(``ffind'',3090,3099)\n qui replace `generate'=16 if inrange(``ffind'',2200,2269) | inrange(``ffind'',2270,2279) | inrange(``ffind'',2280,2284) | inrange(``ffind'',2290,2295) | inrange(``ffind'',2297,2297) | inrange(``ffind'',2298,2298) | inrange(``ffind'',2299,2299) | inrange(``ffind'',2393,2395) | inrange(``ffind'',2397,2399)\n qui replace `generate'=17 if inrange(``ffind'',800,899) | inrange(``ffind'',2400,2439) | inrange(``ffind'',2450,2459) | inrange(``ffind'',2490,2499) | inrange(``ffind'',2660,2661) | inrange(``ffind'',2950,2952) | inrange(``ffind'',3200,3200) | inrange(``ffind'',3210,3211) | inrange(``ffind'',3240,3241) | inrange(``ffind'',3250,3259) | inrange(``ffind'',3261,3261) | inrange(``ffind'',3264,3264) | inrange(``ffind'',3270,3275) | inrange(``ffind'',3280,3281) | inrange(``ffind'',3290,3293) | inrange(``ffind'',3295,3299) | inrange(``ffind'',3420,3429) | inrange(``ffind'',3430,3433) | inrange(``ffind'',3440,3441) | inrange(``ffind'',3442,3442) | inrange(``ffind'',3446,3446) | inrange(``ffind'',3448,3448) | inrange(``ffind'',3449,3449) | inrange(``ffind'',3450,3451) | inrange(``ffind'',3452,3452) | inrange(``ffind'',3490,3499) | inrange(``ffind'',3996,3996)\n qui replace `generate'=18 if inrange(``ffind'',1500,1511) | inrange(``ffind'',1520,1529) | inrange(``ffind'',1530,1539) | inrange(``ffind'',1540,1549) | inrange(``ffind'',1600,1699) | inrange(``ffind'',1700,1799)\n qui replace `generate'=19 if inrange(``ffind'',3300,3300) | inrange(``ffind'',3310,3317) | inrange(``ffind'',3320,3325) | inrange(``ffind'',3330,3339) | inrange(``ffind'',3340,3341) | inrange(``ffind'',3350,3357) | inrange(``ffind'',3360,3369) | 
inrange(``ffind'',3370,3379) | inrange(``ffind'',3390,3399)\n qui replace `generate'=20 if inrange(``ffind'',3400,3400) | inrange(``ffind'',3443,3443) | inrange(``ffind'',3444,3444) | inrange(``ffind'',3460,3469) | inrange(``ffind'',3470,3479)\n qui replace `generate'=21 if inrange(``ffind'',3510,3519) | inrange(``ffind'',3520,3529) | inrange(``ffind'',3530,3530) | inrange(``ffind'',3531,3531) | inrange(``ffind'',3532,3532) | inrange(``ffind'',3533,3533) | inrange(``ffind'',3534,3534) | inrange(``ffind'',3535,3535) | inrange(``ffind'',3536,3536) | inrange(``ffind'',3538,3538) | inrange(``ffind'',3540,3549) | inrange(``ffind'',3550,3559) | inrange(``ffind'',3560,3569) | inrange(``ffind'',3580,3580) | inrange(``ffind'',3581,3581) | inrange(``ffind'',3582,3582) | inrange(``ffind'',3585,3585) | inrange(``ffind'',3586,3586) | inrange(``ffind'',3589,3589) | inrange(``ffind'',3590,3599)\n qui replace `generate'=22 if inrange(``ffind'',3600,3600) | inrange(``ffind'',3610,3613) | inrange(``ffind'',3620,3621) | inrange(``ffind'',3623,3629) | inrange(``ffind'',3640,3644) | inrange(``ffind'',3645,3645) | inrange(``ffind'',3646,3646) | inrange(``ffind'',3648,3649) | inrange(``ffind'',3660,3660) | inrange(``ffind'',3690,3690) | inrange(``ffind'',3691,3692) | inrange(``ffind'',3699,3699)\n qui replace `generate'=23 if inrange(``ffind'',2296,2296) | inrange(``ffind'',2396,2396) | inrange(``ffind'',3010,3011) | inrange(``ffind'',3537,3537) | inrange(``ffind'',3647,3647) | inrange(``ffind'',3694,3694) | inrange(``ffind'',3700,3700) | inrange(``ffind'',3710,3710) | inrange(``ffind'',3711,3711) | inrange(``ffind'',3713,3713) | inrange(``ffind'',3714,3714) | inrange(``ffind'',3715,3715) | inrange(``ffind'',3716,3716) | inrange(``ffind'',3792,3792) | inrange(``ffind'',3790,3791) | inrange(``ffind'',3799,3799)\n qui replace `generate'=24 if inrange(``ffind'',3720,3720) | inrange(``ffind'',3721,3721) | inrange(``ffind'',3723,3724) | inrange(``ffind'',3725,3725) | 
inrange(``ffind'',3728,3729)\n qui replace `generate'=25 if inrange(``ffind'',3730,3731) | inrange(``ffind'',3740,3743)\n qui replace `generate'=26 if inrange(``ffind'',3760,3769) | inrange(``ffind'',3795,3795) | inrange(``ffind'',3480,3489)\n qui replace `generate'=27 if inrange(``ffind'',1040,1049)\n qui replace `generate'=28 if inrange(``ffind'',1000,1009) | inrange(``ffind'',1010,1019) | inrange(``ffind'',1020,1029) | inrange(``ffind'',1030,1039) | inrange(``ffind'',1050,1059) | inrange(``ffind'',1060,1069) | inrange(``ffind'',1070,1079) | inrange(``ffind'',1080,1089) | inrange(``ffind'',1090,1099) | inrange(``ffind'',1100,1119) | inrange(``ffind'',1400,1499)\n qui replace `generate'=29 if inrange(``ffind'',1200,1299)\n qui replace `generate'=30 if inrange(``ffind'',1300,1300) | inrange(``ffind'',1310,1319) | inrange(``ffind'',1320,1329) | inrange(``ffind'',1330,1339) | inrange(``ffind'',1370,1379) | inrange(``ffind'',1380,1380) | inrange(``ffind'',1381,1381) | inrange(``ffind'',1382,1382) | inrange(``ffind'',1389,1389) | inrange(``ffind'',2900,2912) | inrange(``ffind'',2990,2999)\n qui replace `generate'=31 if inrange(``ffind'',4900,4900) | inrange(``ffind'',4910,4911) | inrange(``ffind'',4920,4922) | inrange(``ffind'',4923,4923) | inrange(``ffind'',4924,4925) | inrange(``ffind'',4930,4931) | inrange(``ffind'',4932,4932) | inrange(``ffind'',4939,4939) | inrange(``ffind'',4940,4942)\n qui replace `generate'=32 if inrange(``ffind'',4800,4800) | inrange(``ffind'',4810,4813) | inrange(``ffind'',4820,4822) | inrange(``ffind'',4830,4839) | inrange(``ffind'',4840,4841) | inrange(``ffind'',4880,4889) | inrange(``ffind'',4890,4890) | inrange(``ffind'',4891,4891) | inrange(``ffind'',4892,4892) | inrange(``ffind'',4899,4899)\n qui replace `generate'=33 if inrange(``ffind'',7020,7021) | inrange(``ffind'',7030,7033) | inrange(``ffind'',7200,7200) | inrange(``ffind'',7210,7212) | inrange(``ffind'',7214,7214) | inrange(``ffind'',7215,7216) | inrange(``ffind'',7217,7217) | 
inrange(``ffind'',7219,7219) | inrange(``ffind'',7220,7221) | inrange(``ffind'',7230,7231) | inrange(``ffind'',7240,7241) | inrange(``ffind'',7250,7251) | inrange(``ffind'',7260,7269) | inrange(``ffind'',7270,7290) | inrange(``ffind'',7291,7291) | inrange(``ffind'',7292,7299) | inrange(``ffind'',7395,7395) | inrange(``ffind'',7500,7500) | inrange(``ffind'',7520,7529) | inrange(``ffind'',7530,7539) | inrange(``ffind'',7540,7549) | inrange(``ffind'',7600,7600) | inrange(``ffind'',7620,7620) | inrange(``ffind'',7622,7622) | inrange(``ffind'',7623,7623) | inrange(``ffind'',7629,7629) | inrange(``ffind'',7630,7631) | inrange(``ffind'',7640,7641) | inrange(``ffind'',7690,7699) | inrange(``ffind'',8100,8199) | inrange(``ffind'',8200,8299) | inrange(``ffind'',8300,8399) | inrange(``ffind'',8400,8499) | inrange(``ffind'',8600,8699) | inrange(``ffind'',8800,8899) | inrange(``ffind'',7510,7515)\n qui replace `generate'=34 if inrange(``ffind'',2750,2759) | inrange(``ffind'',3993,3993) | inrange(``ffind'',7218,7218) | inrange(``ffind'',7300,7300) | inrange(``ffind'',7310,7319) | inrange(``ffind'',7320,7329) | inrange(``ffind'',7330,7339) | inrange(``ffind'',7340,7342) | inrange(``ffind'',7349,7349) | inrange(``ffind'',7350,7351) | inrange(``ffind'',7352,7352) | inrange(``ffind'',7353,7353) | inrange(``ffind'',7359,7359) | inrange(``ffind'',7360,7369) | inrange(``ffind'',7374,7374) | inrange(``ffind'',7376,7376) | inrange(``ffind'',7377,7377) | inrange(``ffind'',7378,7378) | inrange(``ffind'',7379,7379) | inrange(``ffind'',7380,7380) | inrange(``ffind'',7381,7382) | inrange(``ffind'',7383,7383) | inrange(``ffind'',7384,7384) | inrange(``ffind'',7385,7385) | inrange(``ffind'',7389,7390) | inrange(``ffind'',7391,7391) | inrange(``ffind'',7392,7392) | inrange(``ffind'',7393,7393) | inrange(``ffind'',7394,7394) | inrange(``ffind'',7396,7396) | inrange(``ffind'',7397,7397) | inrange(``ffind'',7399,7399) | inrange(``ffind'',7519,7519) | inrange(``ffind'',8700,8700) | 
inrange(``ffind'',8710,8713) | inrange(``ffind'',8720,8721) | inrange(``ffind'',8730,8734) | inrange(``ffind'',8740,8748) | inrange(``ffind'',8900,8910) | inrange(``ffind'',8911,8911) | inrange(``ffind'',8920,8999) | inrange(``ffind'',4220,4229)\n qui replace `generate'=35 if inrange(``ffind'',3570,3579) | inrange(``ffind'',3680,3680) | inrange(``ffind'',3681,3681) | inrange(``ffind'',3682,3682) | inrange(``ffind'',3683,3683) | inrange(``ffind'',3684,3684) | inrange(``ffind'',3685,3685) | inrange(``ffind'',3686,3686) | inrange(``ffind'',3687,3687) | inrange(``ffind'',3688,3688) | inrange(``ffind'',3689,3689) | inrange(``ffind'',3695,3695)\n qui replace `generate'=36 if inrange(``ffind'',7370,7372) | inrange(``ffind'',7375,7375) | inrange(``ffind'',7373,7373)\n qui replace `generate'=37 if inrange(``ffind'',3622,3622) | inrange(``ffind'',3661,3661) | inrange(``ffind'',3662,3662) | inrange(``ffind'',3663,3663) | inrange(``ffind'',3664,3664) | inrange(``ffind'',3665,3665) | inrange(``ffind'',3666,3666) | inrange(``ffind'',3669,3669) | inrange(``ffind'',3670,3679) | inrange(``ffind'',3810,3810) | inrange(``ffind'',3812,3812)\n qui replace `generate'=38 if inrange(``ffind'',3811,3811) | inrange(``ffind'',3820,3820) | inrange(``ffind'',3821,3821) | inrange(``ffind'',3822,3822) | inrange(``ffind'',3823,3823) | inrange(``ffind'',3824,3824) | inrange(``ffind'',3825,3825) | inrange(``ffind'',3826,3826) | inrange(``ffind'',3827,3827) | inrange(``ffind'',3829,3829) | inrange(``ffind'',3830,3839)\n qui replace `generate'=39 if inrange(``ffind'',2520,2549) | inrange(``ffind'',2600,2639) | inrange(``ffind'',2670,2699) | inrange(``ffind'',2760,2761) | inrange(``ffind'',3950,3955)\n qui replace `generate'=40 if inrange(``ffind'',2440,2449) | inrange(``ffind'',2640,2659) | inrange(``ffind'',3220,3221) | inrange(``ffind'',3410,3412)\n qui replace `generate'=41 if inrange(``ffind'',4000,4013) | inrange(``ffind'',4040,4049) | inrange(``ffind'',4100,4100) | inrange(``ffind'',4110,4119) 
| inrange(``ffind'',4120,4121) | inrange(``ffind'',4130,4131) | inrange(``ffind'',4140,4142) | inrange(``ffind'',4150,4151) | inrange(``ffind'',4170,4173) | inrange(``ffind'',4190,4199) | inrange(``ffind'',4200,4200) | inrange(``ffind'',4210,4219) | inrange(``ffind'',4230,4231) | inrange(``ffind'',4240,4249) | inrange(``ffind'',4400,4499) | inrange(``ffind'',4500,4599) | inrange(``ffind'',4600,4699) | inrange(``ffind'',4700,4700) | inrange(``ffind'',4710,4712) | inrange(``ffind'',4720,4729) | inrange(``ffind'',4730,4739) | inrange(``ffind'',4740,4749) | inrange(``ffind'',4780,4780) | inrange(``ffind'',4782,4782) | inrange(``ffind'',4783,4783) | inrange(``ffind'',4784,4784) | inrange(``ffind'',4785,4785) | inrange(``ffind'',4789,4789)\n qui replace `generate'=42 if inrange(``ffind'',5000,5000) | inrange(``ffind'',5010,5015) | inrange(``ffind'',5020,5023) | inrange(``ffind'',5030,5039) | inrange(``ffind'',5040,5042) | inrange(``ffind'',5043,5043) | inrange(``ffind'',5044,5044) | inrange(``ffind'',5045,5045) | inrange(``ffind'',5046,5046) | inrange(``ffind'',5047,5047) | inrange(``ffind'',5048,5048) | inrange(``ffind'',5049,5049) | inrange(``ffind'',5050,5059) | inrange(``ffind'',5060,5060) | inrange(``ffind'',5063,5063) | inrange(``ffind'',5064,5064) | inrange(``ffind'',5065,5065) | inrange(``ffind'',5070,5078) | inrange(``ffind'',5080,5080) | inrange(``ffind'',5081,5081) | inrange(``ffind'',5082,5082) | inrange(``ffind'',5083,5083) | inrange(``ffind'',5084,5084) | inrange(``ffind'',5085,5085) | inrange(``ffind'',5086,5087) | inrange(``ffind'',5088,5088) | inrange(``ffind'',5090,5090) | inrange(``ffind'',5091,5092) | inrange(``ffind'',5093,5093) | inrange(``ffind'',5094,5094) | inrange(``ffind'',5099,5099) | inrange(``ffind'',5100,5100) | inrange(``ffind'',5110,5113) | inrange(``ffind'',5120,5122) | inrange(``ffind'',5130,5139) | inrange(``ffind'',5140,5149) | inrange(``ffind'',5150,5159) | inrange(``ffind'',5160,5169) | inrange(``ffind'',5170,5172) | 
inrange(``ffind'',5180,5182) | inrange(``ffind'',5190,5199)\n qui replace `generate'=43 if inrange(``ffind'',5200,5200) | inrange(``ffind'',5210,5219) | inrange(``ffind'',5220,5229) | inrange(``ffind'',5230,5231) | inrange(``ffind'',5250,5251) | inrange(``ffind'',5260,5261) | inrange(``ffind'',5270,5271) | inrange(``ffind'',5300,5300) | inrange(``ffind'',5310,5311) | inrange(``ffind'',5320,5320) | inrange(``ffind'',5330,5331) | inrange(``ffind'',5334,5334) | inrange(``ffind'',5340,5349) | inrange(``ffind'',5390,5399) | inrange(``ffind'',5400,5400) | inrange(``ffind'',5410,5411) | inrange(``ffind'',5412,5412) | inrange(``ffind'',5420,5429) | inrange(``ffind'',5430,5439) | inrange(``ffind'',5440,5449) | inrange(``ffind'',5450,5459) | inrange(``ffind'',5460,5469) | inrange(``ffind'',5490,5499) | inrange(``ffind'',5500,5500) | inrange(``ffind'',5510,5529) | inrange(``ffind'',5530,5539) | inrange(``ffind'',5540,5549) | inrange(``ffind'',5550,5559) | inrange(``ffind'',5560,5569) | inrange(``ffind'',5570,5579) | inrange(``ffind'',5590,5599) | inrange(``ffind'',5600,5699) | inrange(``ffind'',5700,5700) | inrange(``ffind'',5710,5719) | inrange(``ffind'',5720,5722) | inrange(``ffind'',5730,5733) | inrange(``ffind'',5734,5734) | inrange(``ffind'',5735,5735) | inrange(``ffind'',5736,5736) | inrange(``ffind'',5750,5799) | inrange(``ffind'',5900,5900) | inrange(``ffind'',5910,5912) | inrange(``ffind'',5920,5929) | inrange(``ffind'',5930,5932) | inrange(``ffind'',5940,5940) | inrange(``ffind'',5941,5941) | inrange(``ffind'',5942,5942) | inrange(``ffind'',5943,5943) | inrange(``ffind'',5944,5944) | inrange(``ffind'',5945,5945) | inrange(``ffind'',5946,5946) | inrange(``ffind'',5947,5947) | inrange(``ffind'',5948,5948) | inrange(``ffind'',5949,5949) | inrange(``ffind'',5950,5959) | inrange(``ffind'',5960,5969) | inrange(``ffind'',5970,5979) | inrange(``ffind'',5980,5989) | inrange(``ffind'',5990,5990) | inrange(``ffind'',5992,5992) | inrange(``ffind'',5993,5993) | 
inrange(``ffind'',5994,5994) | inrange(``ffind'',5995,5995) | inrange(``ffind'',5999,5999)\n qui replace `generate'=44 if inrange(``ffind'',5800,5819) | inrange(``ffind'',5820,5829) | inrange(``ffind'',5890,5899) | inrange(``ffind'',7000,7000) | inrange(``ffind'',7010,7019) | inrange(``ffind'',7040,7049) | inrange(``ffind'',7213,7213)\n qui replace `generate'=45 if inrange(``ffind'',6000,6000) | inrange(``ffind'',6010,6019) | inrange(``ffind'',6020,6020) | inrange(``ffind'',6021,6021) | inrange(``ffind'',6022,6022) | inrange(``ffind'',6023,6024) | inrange(``ffind'',6025,6025) | inrange(``ffind'',6026,6026) | inrange(``ffind'',6027,6027) | inrange(``ffind'',6028,6029) | inrange(``ffind'',6030,6036) | inrange(``ffind'',6040,6059) | inrange(``ffind'',6060,6062) | inrange(``ffind'',6080,6082) | inrange(``ffind'',6090,6099) | inrange(``ffind'',6100,6100) | inrange(``ffind'',6110,6111) | inrange(``ffind'',6112,6113) | inrange(``ffind'',6120,6129) | inrange(``ffind'',6130,6139) | inrange(``ffind'',6140,6149) | inrange(``ffind'',6150,6159) | inrange(``ffind'',6160,6169) | inrange(``ffind'',6170,6179) | inrange(``ffind'',6190,6199)\n qui replace `generate'=46 if inrange(``ffind'',6300,6300) | inrange(``ffind'',6310,6319) | inrange(``ffind'',6320,6329) | inrange(``ffind'',6330,6331) | inrange(``ffind'',6350,6351) | inrange(``ffind'',6360,6361) | inrange(``ffind'',6370,6379) | inrange(``ffind'',6390,6399) | inrange(``ffind'',6400,6411)\n qui replace `generate'=47 if inrange(``ffind'',6500,6500) | inrange(``ffind'',6510,6510) | inrange(``ffind'',6512,6512) | inrange(``ffind'',6513,6513) | inrange(``ffind'',6514,6514) | inrange(``ffind'',6515,6515) | inrange(``ffind'',6517,6519) | inrange(``ffind'',6520,6529) | inrange(``ffind'',6530,6531) | inrange(``ffind'',6532,6532) | inrange(``ffind'',6540,6541) | inrange(``ffind'',6550,6553) | inrange(``ffind'',6590,6599) | inrange(``ffind'',6610,6611)\n qui replace `generate'=48 if inrange(``ffind'',6200,6299) | 
inrange(``ffind'',6700,6700) | inrange(``ffind'',6710,6719) | inrange(``ffind'',6720,6722) | inrange(``ffind'',6723,6723) | inrange(``ffind'',6724,6724) | inrange(``ffind'',6725,6725) | inrange(``ffind'',6726,6726) | inrange(``ffind'',6730,6733) | inrange(``ffind'',6740,6779) | inrange(``ffind'',6790,6791) | inrange(``ffind'',6792,6792) | inrange(``ffind'',6793,6793) | inrange(``ffind'',6794,6794) | inrange(``ffind'',6795,6795) | inrange(``ffind'',6798,6798) | inrange(``ffind'',6799,6799)\n qui replace `generate'=49 if missing(`generate') & ~missing(``ffind'')\n\n }\n else {\n di as error \"Type must be 5, 10, 12, 17, 30, 38, 48 or 49\"\n exit 111\n }\n\nend\n
","tags":["Stata","Code"]},{"location":"posts/get-bank-holding-company-financials/","title":"Bank Holding Company Financials from FR Y-9C","text":"A SAS macro used to extract BHC data.
","tags":["SAS","Code"]},{"location":"posts/get-bank-holding-company-financials/#extract-bhc-balance-sheet-data","title":"Extract BHC balance sheet data","text":"This is the SAS macro I wrote to consolidate and extract BHC balance sheet data from the WRDS Bank Regulatory database. It creates a bhcf
dataset in the work directory.
%macro bhc_financials(loopdatestart,loopdateend);\n /* Specify the variables to extract */\n%let vars=rssd9999 rssd9001 rssd9007 rssd9008 bhck2170 bhck3210;\n %let loopdatestart=%sysfunc(inputn(&loopdatestart,anydtdte9.));\n %let loopdateend=%sysfunc(inputn(&loopdateend,anydtdte9.));\n %let dif=%sysfunc(intck(month,&loopdatestart,&loopdateend));\n %let dats=;\n %do i=0 %to &dif;\n %let date=%sysfunc(intnx(month,&loopdatestart,&i,e));\n %let month=%sysfunc(month(&date),z2.);\n %let year=%sysfunc(year(&date));\n %if &month=3 or &month=6 or &month=9 or &month=12 %then %do;\n %let dats=&dats bank.bhcf&year&month;\n %end;\n %end;\n %put &dats;\n data bhcf(keep=&vars); set &dats; \n rssd9999 = input(put(rssd9999, 8.), yymmdd10.);/* reporting date */\n rssd9007 = input(put(rssd9007, 8.), yymmdd10.);/* date start */\n rssd9008 = input(put(rssd9008, 8.), yymmdd10.);/* date end */\nformat rssd9999 date9.;\n format rssd9007 date9.;\n format rssd9008 date9.;\n where rssd9999 between rssd9007 and rssd9008;\n run;\n%mend bhc_financials;\n\n%bhc_financials(01jan1990,01dec2000);\n
Warning
RSSD dates are not always available, in which case lines 18-24 should be removed.
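The quarter-end enumeration performed by the macro's `%do` loop (building the list of `bank.bhcfYYYYMM` datasets) can be sketched in Python. This is a hypothetical helper that only mirrors the macro's date logic, not part of the SAS code:

```python
from datetime import date

def quarterly_bhcf_names(start, end):
    """Enumerate quarter-end bhcfYYYYMM dataset names between two dates,
    mirroring the %do loop in the SAS macro (hypothetical helper)."""
    names = []
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        if m in (3, 6, 9, 12):  # FR Y-9C is filed quarterly
            names.append(f"bank.bhcf{y}{m:02d}")
        m += 1
        if m > 12:
            y, m = y + 1, 1
    return names

print(quarterly_bhcf_names(date(1990, 1, 1), date(1990, 12, 1)))
# → ['bank.bhcf199003', 'bank.bhcf199006', 'bank.bhcf199009', 'bank.bhcf199012']
```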
","tags":["SAS","Code"]},{"location":"posts/get-bank-holding-company-financials/#merge-with-compustatcrsp","title":"Merge with Compustat/CRSP","text":"The firm identifier in the Y-9C data is RSSD9001
. To merge the BHC's balance sheet data with Compustat/CRSP, I use the PERMCO-RSSD
link table by the Federal Reserve Bank of New York.1 I saved the most recent copy on my server and formatted it so that it can be used directly. It is available at https://mingze-gao.com/data/download/crsp_20181231.csv.
%let beg_yr = 1986;\n%let end_yr = 2018;\nproc sql;\ncreate table lnk as\nselect *\nfrom crsp.ccmxpf_lnkhist\nwhere\n linktype in (\"LU\", \"LC\") and\n (&end_yr+1 >= year(linkdt) or linkdt = .B) and \n (&beg_yr-1 <= year(linkenddt) or linkenddt = .E)\norder by \n gvkey, linkdt;\nquit;\n/* PERMCO-RSSD link table by New York FED */\nfilename csv url \"https://mingze-gao.com/data/download/crsp_20181231.csv\";\nproc import datafile=csv out=work.crsp_20181231 dbms=csv replace; run;\nproc sql;\ncreate table gvkey_permno_permco_rssd as \nselect *\nfrom lnk join crsp_20181231 as fed\non lnk.lpermco=fed.permco;\nquit;\n
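The `where` clause above keeps only CCM link rows whose validity window overlaps the sample period. The logic can be sketched as follows, with `None` standing in for the SAS special missing values `.B` (no link start) and `.E` (no link end); the function name is hypothetical:

```python
def link_overlaps(beg_yr, end_yr, linkdt_year=None, linkenddt_year=None):
    """Keep a CCM link row if its [linkdt, linkenddt] window overlaps the
    sample years; None means an open-ended link (.B start or .E end).
    Hypothetical helper mirroring the SQL WHERE clause."""
    starts_in_time = linkdt_year is None or linkdt_year <= end_yr + 1
    ends_in_time = linkenddt_year is None or linkenddt_year >= beg_yr - 1
    return starts_in_time and ends_in_time

print(link_overlaps(1986, 2018, 1990, 1995))  # → True (inside the window)
print(link_overlaps(1986, 2018, 2020, None))  # → False (starts after 2019)
```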
Note
Please run these programs on the WRDS cloud. You'll need to modify them in order to run locally with SAS/Connect.
https://www.newyorkfed.org/research/banking_research/datasets.html
Many research papers on Chinese firms include a control variable that indicates if the firm is a state-owned enterprise (SOE). This is important as SOEs and non-SOEs differ in many aspects and may have structural differences. This post documents the way to construct this indicator variable from the CSMAR databases.
Specifically, we need the CSMAR China Listed Firms Shareholders - Controlling Shareholders dataset. On WRDS, this dataset is named hld_contrshr
, located at /wrds/csmar/sasdata/hld
.
Inside this dataset there are a few variables identifying the ultimate controlling shareholder.
s0701b
: ultimate controlling shareholder.s0702b
: nature of ultimate controlling shareholder.According to the CSMAR documentation, s0702b
can be one of the following. Apparently, s0702b=1100
means the firm is an SOE.
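For instance, with the dataset loaded into pandas, the SOE dummy is a one-liner. The rows and the stkcd column below are hypothetical; only s0702b=1100 comes from the CSMAR documentation above.

```python
import pandas as pd

# Hypothetical sample of hld_contrshr records; in practice, load the
# dataset from WRDS (column names other than s0702b may differ).
df = pd.DataFrame({
    "stkcd": ["000001", "000002", "000003"],  # hypothetical firm identifier
    "s0702b": [1100, 2100, 1100],             # nature of ultimate controlling shareholder
})

# SOE indicator: 1 if the ultimate controlling shareholder is
# state-owned (s0702b == 1100), 0 otherwise.
df["soe"] = (df["s0702b"] == 1100).astype(int)
print(df["soe"].tolist())  # [1, 0, 1]
```

The indicator can then be merged into the main panel by firm and year.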
Princeton University Library has another guide on other ways to identify Chinese SOEs.
","tags":["CSMAR"]},{"location":"posts/kyleslambda/","title":"Kyle's Lambda","text":"A measure of market impact cost from Kyle (1985), which can be interpreted as the cost of demanding a certain amount of liquidity over a given time period.
","tags":["Python","Code","Liquidity"]},{"location":"posts/kyleslambda/#definition","title":"Definition","text":"Following Hasbrouck (2009) and Goyenko, Holden, Trzcinka (2009), Kyle's Lambda for a given stock \\(i\\) and day \\(t\\), is calculated as the slope coefficient \\(\\lambda_{i,t}\\) in the regression:
\\[ ret_{i,t,n}= \\delta_{i,t} + \\lambda_{i,t} S_{i,t,n}+\\epsilon_{i,t,n} \\]where for the \\(n\\)th five-minute period on date \\(t\\) and stock \\(i\\), \\(ret_{i,t,n}\\) is the stock return and \\(S_{i,t,n}\\) is the sum of the signed square-root dollar volume, that is,
\\[ S_{i,t,n}=\\sum_k{\\text{sign}}(dvol_{i,t,n,k}) \\sqrt{dvol_{i,t,n,k}} \\]","tags":["Python","Code","Liquidity"]},{"location":"posts/kyleslambda/#source-code","title":"Source Code","text":"This example Python code is not optimized for speed and serves only demonstration purpose. It may contain errors.
It returns \\(\\lambda \\times 10^6\\)
# KylesLambda.py\nimport numpy as np\nname = 'KylesLambda'\ndescription = \"\"\"\nA measure of market impact cost from Kyle (1985), \nwhich can be interpreted as the cost of demanding a certain amount of liquidity over a given time period.\nResult is Lambda*1E6.\n\"\"\"\nvars_needed = ['Price', 'Volume', 'Direction']\ndef estimate(data):\n    price = data['Price'].to_numpy()\n    volume = data['Volume'].to_numpy()\n    direction = data['Direction'].to_numpy()\n    sqrt_dollar_volume = np.sqrt(np.multiply(price, volume))\n    # Keep the sign from the trade direction (taking np.abs here would discard it).\n    signed_sqrt_dollar_volume = np.multiply(direction, sqrt_dollar_volume)\n    # Find the total signed sqrt dollar volume and return per 5 min.\n    timestamps = np.array(data.index, dtype='datetime64')\n    last_ts, last_price = timestamps[0], price[0]\n    bracket_ssdv = 0\n    bracket = last_ts + np.timedelta64(5, 'm')\n    rets, ssdvs = [], []\n    for idx, ts in enumerate(timestamps):\n        if ts <= bracket:\n            bracket_ssdv += signed_sqrt_dollar_volume[idx]\n        else:\n            ret = np.log(price[idx-1]/last_price)\n            if not np.isnan(ret) and not np.isnan(bracket_ssdv):\n                rets.append(ret)\n                ssdvs.append(bracket_ssdv)\n            # Reset bracket\n            bracket = ts + np.timedelta64(5, 'm')\n            last_price = price[idx]\n            bracket_ssdv = signed_sqrt_dollar_volume[idx]\n    # Perform regression.\n    x = np.vstack([np.ones(len(ssdvs)), np.array(ssdvs)]).T\n    try:\n        coef, _, _, _ = np.linalg.lstsq(x, np.array(rets), rcond=None)\n    except np.linalg.LinAlgError:\n        return None\n    else:\n        return None if np.isnan(coef[1]) else coef[1]*1E6\n
","tags":["Python","Code","Liquidity"]},{"location":"posts/legao-to-make-your-own-lego-mosaics/","title":"LeGao to Make Your Own LEGO Mosaics","text":"I made an online app that turns a picture into a LEGO mosaic: mingze-gao.com/legao.
A few weeks ago, I went to the new LEGO flagship store at Bondi with my fianc\u00e9e, Sherry, and we were impressed by the LEGO Mosaics -- Sydney Harbour Bridge and Opera House in sunset, designed by Ryan McNaught (photo credit: jaysbrickblog.com).
This mosaic is made of 62,300 bricks and took 282 hours to build. Every single pixel is a 1x1 LEGO brick! We love it so much that I'm thinking of making one myself and using it to decorate a wall in our apartment in the future.
To begin this endeavour, I'll need a handy tool to convert pictures to LEGO mosaics so that I can have a preview and the data to assemble from later. It turns out that there's already an open-source library named legofy for this job. So I borrowed it and wrote a small Flask app on my server to do the magic.
I wrote the frontend using React and Ant Design, and picked up the React Hooks along the way. It was great fun. I named it using a combination of LEGO and my surname Gao, so, LeGao.
Now, LeGao is served at mingze-gao.com/legao. A preview is as below:
Users can upload an image (<5MB) and decide which palette to use and how many 1x1 bricks the output image should have on its longest axis. This is useful when we need to make a LEGO mosaic in the real world, as a 1x1 brick's dimension is about 8mm x 8mm.
The output image can be downloaded, no problem. All images are deleted from my server 5 minutes after upload/creation, both for privacy concerns and because my server doesn't have much storage.
LeGao also tells you how many bricks you'll need to assemble the mosaic, if you really want to. Then you can easily order the bricks online or visit a store to purchase them all~
","tags":["Apps"]},{"location":"posts/lomackinlay1988/","title":"Variance Ratio Test - Lo and MacKinlay (1988)","text":"A simple test for the random walk hypothesis of prices and efficient market.
","tags":["Python","Code"]},{"location":"posts/lomackinlay1988/#definition","title":"Definition","text":"Let's assume:
The variance ratio of \\(k\\)-period return is defined as:
\\[ \\begin{equation} \\textit{V}(k)=\\frac{\\textit{Var}(x_t+x_{t-1}+...+x_{t-k+1})/k}{\\textit{Var}(x_t)} \\end{equation} \\]The estimator of \\(\\textit{V}(k)\\) proposed in Lo and MacKinlay (1988) is
\\[ \\begin{equation} \\textit{VR}(k)=\\frac{\\hat\\sigma^2(k)}{\\hat\\sigma^2(1)} \\end{equation} \\]where \\(\\hat\\sigma^2(1)\\) is the unbiased estimator of the one-period return variance, using the one-period returns \\(\\{x_t\\}\\), and is defined as
\\[ \\begin{equation} \\hat\\sigma^2(1)=\\frac{1}{T-1} \\sum_{t-1}^T (x_t - \\hat\\mu)^2 \\end{equation} \\]and \\(\\hat\\sigma^2(k)\\) is the estimator of \\(k\\)-period return variance using \\(k\\)-period returns. Lo and MacKinlay (1988) defined it, due to limited sample size and the desire to improve the power of the test, as
\\[ \\begin{equation} \\hat\\sigma^2(k)=\\frac{1}{m} \\sum_{t-1}^T \\left(\\ln\\frac{P_t}{P_{t-k}} - k\\hat\\mu \\right)^2 \\end{equation} \\]where \\(m=k(T-k+1)(1-k/T)\\) is chosen such that \\(\\hat\\sigma^2(k)\\) is an unbiased estimator of the \\(k\\)-period return variance when \\(\\sigma^2_t\\) is constant over time.
","tags":["Python","Code"]},{"location":"posts/lomackinlay1988/#variance-ratio-test-statistics","title":"Variance Ratio Test Statistics","text":"Lo and MacKinlay (1988) proposed that under the null hypothesis of \\(V(k)=1\\), the test statistic is given by
\\[ \\begin{equation} Z(k)=\\frac{\\textit{VR}(k)-1}{\\sqrt{\\phi(k)}} \\end{equation} \\]which follows the standard normal distribution asymptotically.
","tags":["Python","Code"]},{"location":"posts/lomackinlay1988/#homoscedasticity","title":"Homoscedasticity","text":"Under the assumption of homoscedasticity, the asymptotic variance \\(\\phi\\) is given by
\\[ \\begin{equation} \\phi(k)=\\frac{2(2k-1)(k-1)}{3kT} \\end{equation} \\]","tags":["Python","Code"]},{"location":"posts/lomackinlay1988/#heteroscedasticity","title":"Heteroscedasticity","text":"Under the assumption of heteroscedasticity, the asymptotic variance \\(\\phi\\) is given by
\\[ \\begin{equation} \\phi(k)=\\sum_{j=1}^{k-1} \\left[\\frac{2(k-j)}{k} \\right]^2\\delta(j) \\end{equation} \\] \\[ \\begin{equation} \\delta(j)=\\frac{\\sum_{t=j+1}^T (x_t - \\hat\\mu)^2(x_{t-j} - \\hat\\mu)^2}{\\left[\\sum_{t=1}^T (x_t - \\hat\\mu)^2\\right]^2} \\end{equation} \\]Erratum
Note that there's a missing \\(T\\) in the numerator of \\(\\delta(j)\\) of Equation (8). It is actually missing the 1988 RFS paper and the 1998 JE'mtric paper, but has been corrected in the 1990 RFS Issue 1: https://doi.org/10.1093/rfs/3.1.ii. The corrected version reads:
\\[ \\begin{equation} \\delta(j)=\\frac{T\\sum_{t=j+1}^T (x_t - \\hat\\mu)^2(x_{t-j} - \\hat\\mu)^2}{\\left[\\sum_{t=1}^T (x_t - \\hat\\mu)^2\\right]^2} \\end{equation} \\]To correct it in the example code below, change the line 51 below to:
delta_arr = T * b_arr / np.square(np.sum(sqr_demeaned_x))\n
I thank Simon Jurkatis for letting me know about the erratum.
","tags":["Python","Code"]},{"location":"posts/lomackinlay1988/#source-code","title":"Source Code","text":"This example Python code has been optimized for speed but serves only demonstration purpose. It may contain errors.
# LoMacKinlay.py\nimport numpy as np\nfrom numba import jit\nname = 'LoMacKinlay1988'\ndescription = 'Variance ratio and test statistics as in Lo and MacKinlay (1988)'\nvars_needed = ['Price']\n@jit(nopython=True, nogil=True, cache=True)\ndef _estimate(log_prices, k, const_arr):\n# Log returns = [x2, x3, x4, ..., xT], where x(i)=ln[p(i)/p(i-1)]\nrets = np.diff(log_prices)\n# T is the length of return series\nT = len(rets)\n# mu is the mean log return\nmu = np.mean(rets)\n# sqr_demeaned_x is the array of squared demeaned log returns\nsqr_demeaned_x = np.square(rets - mu)\n# Var(1)\n# Didn't use np.var(rets, ddof=1) because\n# sqr_demeaned_x is calculated already and will be used many times.\nvar_1 = np.sum(sqr_demeaned_x) / (T-1)\n# Var(k)\n# Variance of log returns where x(i) = ln[p(i)/p(i-k)]\n# Before np.roll() - array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])\n# After np.roll(,shift=2) - array([8, 9, 0, 1, 2, 3, 4, 5, 6, 7])\n# Discard the first k elements.\nrets_k = (log_prices - np.roll(log_prices, k))[k:]\nm = k * (T - k + 1) * (1 - k / T)\nvar_k = 1/m * np.sum(np.square(rets_k - k * mu))\n# Variance Ratio\nvr = var_k / var_1\n# a_arr is an array of { (2*(k-j)/k)^2 } for j=1,2,...,k-1, fixed for a given k:\n# When k=5, a_arr = array([2.56, 1.44, 0.64, 0.16]).\n# When k=8, a_arr = array([3.0625, 2.25, 1.5625, 1., 0.5625, 0.25, 0.0625])\n# Without JIT it's defined as:\n# a_arr = np.square(np.arange(k-1, 0, step=-1, dtype=np.int) * 2 / k)\n# But np.array creation is not allowed in nopython mode.\n# So const_arr=np.arange(k-1, 0, step=-1, dtype=np.int) is created outside.\na_arr = np.square(const_arr * 2 / k)\n# b_arr is part of the delta_arr.\nb_arr = np.empty(k-1, dtype=np.float64)\nfor j in range(1, k):\nb_arr[j-1] = np.sum((sqr_demeaned_x *\nnp.roll(sqr_demeaned_x, j))[j+1:])\ndelta_arr = b_arr / np.square(np.sum(sqr_demeaned_x))\n# Both arrarys are of length (k-1)\nassert len(delta_arr) == len(a_arr) == k-1\nphi1 = 2 * (2*k - 1) * (k-1) / (3*k*T)\nphi2 = np.sum(a_arr 
* delta_arr)\n# VR test statistics under two assumptions\nvr_stat_homoscedasticity = (vr - 1) / np.sqrt(phi1)\nvr_stat_heteroscedasticity = (vr - 1) / np.sqrt(phi2)\nreturn vr, vr_stat_homoscedasticity, vr_stat_heteroscedasticity\ndef estimate(data):\n\"A fast estimation of Variance Ratio test statistics as in Lo and MacKinlay (1988)\"\n# Prices array = [p1, p2, p3, p4, ..., pT]\nprices = data['Price'].to_numpy(dtype=np.float64)\nresult = []\n# Estimate many lags.\nfor k in [2, 4, 6, 8, 10, 15, 20, 30, 40, 50, 100, 200, 500, 1000]:\n# Compute a constant array as np.array creation is not allowed in nopython mode.\nconst_arr = np.arange(k-1, 0, step=-1, dtype=np.int)\nvr, stat1, stat2 = _estimate(np.log(prices), k, const_arr)\nresult.append({\nf'Variance Ratio (k={k})': vr,\nf'Variance Ratio Test Statistic (k={k}) Homoscedasticity Assumption': stat1,\nf'Variance Ratio Test Statistic (k={k}) Heteroscedasticity Assumption': stat2\n})\nreturn result\n
As an example, let's create 1 million prices from random walk and estimate the variance ratio and two test statistics at various lags.
if __name__ == \"__main__\":\nimport pandas as pd\nfrom pprint import pprint\nnp.random.seed(1)\n# Generate random steps with mean=0 and standard deviation=1\nsteps = np.random.normal(0, 1, size=1000000)\n# Set first element to 0 so that the first price will be the starting stock price\nsteps[0] = 0\n# Simulate stock prices, P with a large starting price\nP = 10000 + np.cumsum(steps)\n# Test\ndata = pd.DataFrame(P, columns=['Price'])\nresult = estimate(data)\npprint(result)\n
In just a few seconds, the output is:
[{'Variance Ratio (k=2)': 1.0003293867428105,\n'Variance Ratio Test Statistic (k=2) Heteroscedasticity Assumption': 0.3290463403922243,\n'Variance Ratio Test Statistic (k=2) Homoscedasticity Assumption': 0.32938657811705435},\n{'Variance Ratio (k=4)': 1.0007984480057006,\n'Variance Ratio Test Statistic (k=4) Heteroscedasticity Assumption': 0.4259533413884602,\n'Variance Ratio Test Statistic (k=4) Homoscedasticity Assumption': 0.4267881978178301},\n{'Variance Ratio (k=6)': 0.9999130202975425,\n'Variance Ratio Test Statistic (k=6) Heteroscedasticity Assumption': -0.035117568315004344,\n'Variance Ratio Test Statistic (k=6) Homoscedasticity Assumption': -0.03518500446785826},\n{'Variance Ratio (k=8)': 1.0001094011344318,\n'Variance Ratio Test Statistic (k=8) Heteroscedasticity Assumption': 0.036922688136577515,\n'Variance Ratio Test Statistic (k=8) Homoscedasticity Assumption': 0.03698431520269611},\n{'Variance Ratio (k=10)': 1.000702410129927,\n'Variance Ratio Test Statistic (k=10) Heteroscedasticity Assumption': 0.20772743120012313,\n'Variance Ratio Test Statistic (k=10) Homoscedasticity Assumption': 0.20803582207641647},\n{'Variance Ratio (k=15)': 1.0022173139633856,\n'Variance Ratio Test Statistic (k=15) Heteroscedasticity Assumption': 0.5213067838911684,\n'Variance Ratio Test Statistic (k=15) Homoscedasticity Assumption': 0.5219816274021579},\n{'Variance Ratio (k=20)': 1.0038048661705044,\n'Variance Ratio Test Statistic (k=20) Heteroscedasticity Assumption': 0.7646395131154204,\n'Variance Ratio Test Statistic (k=20) Homoscedasticity Assumption': 0.7655801985571125},\n{'Variance Ratio (k=30)': 1.0054447472916035,\n'Variance Ratio Test Statistic (k=30) Heteroscedasticity Assumption': 0.8819250061384853,\n'Variance Ratio Test Statistic (k=30) Homoscedasticity Assumption': 0.8829960534692654},\n{'Variance Ratio (k=40)': 1.0073830253022766,\n'Variance Ratio Test Statistic (k=40) Heteroscedasticity Assumption': 1.0290213306735625,\n'Variance Ratio Test Statistic (k=40) 
Homoscedasticity Assumption': 1.0303005120740392},\n{'Variance Ratio (k=50)': 1.0086502431826903,\n'Variance Ratio Test Statistic (k=50) Heteroscedasticity Assumption': 1.0741837462564026,\n'Variance Ratio Test Statistic (k=50) Homoscedasticity Assumption': 1.0755809312730416},\n{'Variance Ratio (k=100)': 1.0153961901671604,\n'Variance Ratio Test Statistic (k=100) Heteroscedasticity Assumption': 1.3415119471043384,\n'Variance Ratio Test Statistic (k=100) Homoscedasticity Assumption': 1.3434284573260773},\n{'Variance Ratio (k=200)': 1.0157046541161026,\n'Variance Ratio Test Statistic (k=200) Heteroscedasticity Assumption': 0.9639233626580027,\n'Variance Ratio Test Statistic (k=200) Homoscedasticity Assumption': 0.9653299929052963},\n{'Variance Ratio (k=500)': 1.0182166207668526,\n'Variance Ratio Test Statistic (k=500) Heteroscedasticity Assumption': 0.7055681216511915,\n'Variance Ratio Test Statistic (k=500) Homoscedasticity Assumption': 0.7065863036900429},\n{'Variance Ratio (k=1000)': 1.0187822241562863,\n'Variance Ratio Test Statistic (k=1000) Heteroscedasticity Assumption': 0.5140698821944161,\n'Variance Ratio Test Statistic (k=1000) Homoscedasticity Assumption': 0.5147582201029065}]\n
It's easy to see that at all lags tested, we cannot reject the null hypothesis that this price series follows a random walk.
For comparison purpose, below is an implementation in pure Python. It is more readable but is significantly slower.
def estimate_python(data, k=5):\n\"A slow pure python implementation\"\nprices = data['Price'].to_numpy(dtype=np.float64)\nlog_prices = np.log(prices)\nrets = np.diff(log_prices)\nT = len(rets)\nmu = np.mean(rets)\nvar_1 = np.var(rets, ddof=1, dtype=np.float64)\nrets_k = (log_prices - np.roll(log_prices, k))[k:]\nm = k * (T - k + 1) * (1 - k / T)\nvar_k = 1/m * np.sum(np.square(rets_k - k * mu))\n# Variance Ratio\nvr = var_k / var_1\n# Phi1\nphi1 = 2 * (2*k - 1) * (k-1) / (3*k*T)\n# Phi2\ndef delta(j):\nres = 0\nfor t in range(j+1, T+1):\nt -= 1 # array index is t-1 for t-th element\nres += np.square((rets[t]-mu)*(rets[t-j]-mu))\nreturn res / ((T-1) * var_1)**2\nphi2 = 0\nfor j in range(1, k):\nphi2 += (2*(k-j)/k)**2 * delta(j)\nreturn vr, (vr - 1) / np.sqrt(phi1), (vr - 1) / np.sqrt(phi2)\n
","tags":["Python","Code"]},{"location":"posts/merge-compustat-and-crsp/","title":"Merge Compustat and CRSP","text":"Using the CRSP/Compustat Merged Database (CCM) to extract data is one of the fundamental steps in most finance studies. Here I document several SAS programs for annual, quarterly and monthly data, inspired by and adapted from several examples from the WRDS.1
","tags":["CRSP","Compustat","Code","SAS","WRDS"]},{"location":"posts/merge-compustat-and-crsp/#gvkey-permno-link-table","title":"GVKEY-PERMNO
link table","text":"First, we need to create a GVKEY-PERMNO
link table.
%let beg_yr = 2000;\n%let end_yr = 2003;\nproc sql;\ncreate table lnk as\nselect *\nfrom crsp.ccmxpf_lnkhist\nwhere\n/* See below for a description of the link types */\n linktype in (\"LU\", \"LC\") and\n/* Extend the period to deal with fiscal year issues */\n/* Note that the \".B\" and \".E\" missing value codes represent the */\n/* earliest possible beginning date and latest possible end date */\n/* of the Link Date range, respectively. */\n (&end_yr+1 >= year(linkdt) or linkdt = .B) and \n (&beg_yr-1 <= year(linkenddt) or linkenddt = .E)\n /* primary link assigned by Compustat or CRSP */\nand linkprim in (\"P\", \"C\") \norder by \n gvkey, linkdt;\nquit;\n
Link Type Description LC Link research complete (after extensive research by CRSP). Standard connection between databases. LU Link is unresearched by CRSP. It is established by comparing the Compustat and historical CRSP CUSIPs. LU represents the most popular link type. LS Link valid for this security only.2 LX Link to a security that trades on foreign exchange not included in CRSP data. LD Duplicate link to a security. Two GVKEYs map to a single PERMNO
(PERMCO
) during the same period, and this link should not be used. Almost all of these cases happened before 1990. LN Primary link exists but Compustat does not have prices.3 NR No link available; confirmed by research. NU No link available; not yet confirmed. According to WRDS's support page:
LC
, LU
and LS
) account for 41% of the links in CCM.LX
, LD
and LN
) account for only 2%. NR
and NU
) account for the rest 57%, which is expected because of the different coverage of the two databases.Generally, using LC
and LU
should be sufficient.
Example ccmfunda.sas
.
proc sql;\ncreate table mydata as \nselect *\nfrom lnk, comp.funda (keep=gvkey fyear datadate indfmt datafmt popsrc consol sale) as cst\nwhere indfmt= 'INDL' \nand datafmt='STD' \nand popsrc='D' \nand consol='C' \nand lnk.gvkey = cst.gvkey\nand (&beg_yr <= fyear <= &end_yr) \nand (linkdt <= cst.datadate or linkdt = .B) \nand (cst.datadate <= linkenddt or linkenddt = .E);\nquit;\n
","tags":["CRSP","Compustat","Code","SAS","WRDS"]},{"location":"posts/merge-compustat-and-crsp/#compustat-quarterly-and-crsp","title":"Compustat Quarterly and CRSP","text":"Example ccmfundq.sas
.
proc sql;\ncreate table mydata as \nselect *\nfrom lnk, comp.fundq (keep=gvkey fyearq datadate indfmt datafmt popsrc consol saley saleq) as cst\nwhere indfmt= 'INDL' \nand datafmt='STD' \nand popsrc='D' \nand consol='C' \nand lnk.gvkey = cst.gvkey\nand (&beg_yr <= fyearq <= &end_yr) \nand (linkdt <= cst.datadate or linkdt = .B) \nand (cst.datadate <= linkenddt or linkenddt = .E);\nquit;\n
","tags":["CRSP","Compustat","Code","SAS","WRDS"]},{"location":"posts/merge-compustat-and-crsp/#compustat-monthly-and-crsp","title":"Compustat Monthly and CRSP","text":"To be done.
WRDS Overview of CRSP/COMPUSTAT Merged: https://wrds-www.wharton.upenn.edu/pages/support/manuals-and-overviews/crsp/crspcompustat-merged-ccm/wrds-overview-crspcompustat-merged-ccm/ Use CRSP-Compustat Merged Table to Add Permno to Compustat Data: https://wrds-www.wharton.upenn.edu/pages/support/research-wrds/macros/wrds-macro-ccm/ Merging CRSP and Compustat Data: https://wrds-www.wharton.upenn.edu/pages/support/applications/linking-databases/linking-crsp-and-compustat/\u00a0\u21a9
Other CRSP PERMNOs
with the same PERMCO
will link to other GVKEYs
. LS
links mainly relate to ETFs where a single CRSP PERMCO
links to multiple Compustat GVKEYs
. In Compustat, even though they may belong to the same investment company (e.g. ISHARES), ETFs are presented with different GVKEYs
and CRSP flags this situation.\u00a0\u21a9
Prices are used to check the accuracy of the link. For linktype LN there is no price information available even on a quarterly or annual basis. The user will have to decide whether or not to include these links.\u00a0\u21a9
Thomson One Banker SDC Platinum database provides comprehensive M&A transaction data from early 1980s, and is perhaps the most widely used M&A database in the world.
This post documents the steps of downloading M&A deals from the SDC Platinum database. Specifically, I show how to download the complete M&A data where:
The screenshot below is the interface we'll see on launch of SDC Platinum. Click on Login
and we'll be asked to enter our initials and a project name for billing purpose.
Click on Login
and you'll be asked to enter your initials and a project name for billing purpose.
Since we're interested in M&A deals, select the Mergers & Acquisitions
tab and check US Targets
, so that we'll be searching in the domestic mergers database.
Then select the sample period, e.g. for the entire 2020 calendar year.
","tags":["M&A","SDC"]},{"location":"posts/download-ma-deals-from-sdc-platinum/#apply-filters-on-ma-deals","title":"Apply Filters on M&A Deals","text":"Now we can apply various filters on the M&A deals we want to download.
We can quickly add some filters on the target's and acquiror's nation, and make sure we check the Action to be Select
not Exclude
. Under the Deal
tab, we set the deal value to be at least $1m.
In case we couldn't find the desired filtering variable, we can head to All Items
tab and search manually. We add restrictions on acquiror and target public status here.
Lastly, for the Form of the Deal, we restrict to A
Acquisition (Stock), M
Merger (Stock or Assets) and AM
Acquisition of Majority Interest (Stock). We do not want to include deals that are acquisition of partial interest, recapitalization or repurchases in this case.
Our search requests should now look like below. Strongly recommended saving this session for later reuse.
","tags":["M&A","SDC"]},{"location":"posts/download-ma-deals-from-sdc-platinum/#specify-deal-variables-to-download","title":"Specify Deal Variables to Download","text":"Our effort so far is only shortlisting the M&A deals that we're interested in. We now need to specify the relevant deal variables to download by creating a new custom report.
As before, we can check those variables in the Basics
tab or search under All Items
tab. Once done, we format the report like below by arranging the order of the variables. This order is preserved when exported to spreadsheet. One note here is that each page has a maximum width of 160, so we need to insert page at proper places. It does not affect the layout of output spreadsheet. It also recommended to save the custom report for later reuse.
Finally, it's time to execute the requests and download the M&A deal data.
","tags":["M&A","SDC"]},{"location":"posts/download-ma-deals-from-sdc-platinum/#final-note","title":"Final Note","text":"As a final remark, the downloaded spreadsheet can be imported into SAS and matched with CRSP/Compustat using CUSIP and Ticker (SDC doesn't have permno
or gvkey
). First, merge the SDC CUSIP with the first 6-digit CUSIP in CRSP or Compustat; if no match, then use SDC Primary Ticker Symbol to match with the ticker symbol in CRSP or Compustat.
Merton (1974) Distance to Default (DD) model is useful in forecasting defaults. This post documents a few ways to empirically estimate Merton DD (and default probability) as in Bharath and Shumway (2008 RFS).
","tags":["SAS","Code","Merton","Default Probability"]},{"location":"posts/merton-dd/#the-merton-model","title":"The Merton Model","text":"The total value of a firm follows geometric Brownian motion,
\\[ \\begin{equation} dV = \\mu Vdt+\\sigma_V VdW \\end{equation} \\]where,
Assuming the firm has one discount bond maturing in \\(T\\) periods, the equity of the firm can be viewed as a call option on the underlying value of the firm with a strike price equal to the face value of the firm's debt and a time-to-maturity of \\(T\\).
The equity value of the firm is hence a function of the firm's value (Black-Scholes-Merton model):
\\[ \\begin{equation} E=V\\mathcal{N}(d_1)-e^{-rT}F\\mathcal{N}(d_2) \\end{equation} \\]where,
and,
\\[ \\begin{equation} d_1 = \\frac{\\ln(V/F)+(r+0.5\\sigma_V^2)T}{\\sigma_V \\sqrt{T}} \\end{equation} \\]with \\(d_2 = d_1-\\sigma_V \\sqrt{T}\\).
Moreover, the volatility of the firm's equity is related to the volatility of the firm's value, which follows from Ito's lemma,
\\[ \\begin{equation} \\sigma_E = \\left(\\frac{V}{E}\\right)\\frac{\\partial E}{\\partial V}\\sigma_V \\end{equation} \\]In the Black-Scholes-Merton model, \\(\\frac{\\partial E}{\\partial V}=\\mathcal{N}(d_1)\\), so that
\\[ \\begin{equation} \\sigma_E = \\left(\\frac{V}{E}\\right)\\mathcal{N}(d_1)\\sigma_V \\end{equation} \\]We observe from the market:
We then infer and solve for:
Once we have \\(V\\) and \\(\\sigma_V\\), the distance to default (DD) can be calculated as
\\[ \\begin{equation} DD=\\frac{\\ln(V/F)+ (\\mu-0.5\\sigma_V^2)T}{\\sigma_V\\sqrt{T}} \\end{equation} \\]where,
The implied probability of default, or expected default frequency (EDF, registered trademark of Moody's KMV), is
\\[ \\begin{equation} \\pi_{Merton} = \\mathcal{N}\\left(-DD\\right) \\end{equation} \\]","tags":["SAS","Code","Merton","Default Probability"]},{"location":"posts/merton-dd/#estimation","title":"Estimation","text":"","tags":["SAS","Code","Merton","Default Probability"]},{"location":"posts/merton-dd/#an-iterative-approach","title":"An iterative approach","text":"To estimate \\(\\pi_{Merton}\\), an iterative procedure can be applied instead of solving equations (2) and (5) simultaneously (see Crosbie and Bohn (2003), Vassalou and Xing (2004), Bharath and Shumway (2008), etc.).
A na\u00efve approach by Bharath and Shumway (2008) that does not solve equations (2) and (5) is constructed as below.
The na\u00efve distance to default is then
\\[ \\begin{equation} \\text{na\u00efve } DD=\\frac{\\ln[(E+F)/F]+ (r_{it-1}-0.5\\sigma_V^2)T}{\\sigma_V\\sqrt{T}} \\end{equation} \\]and the na\u00efve default probability is
\\[ \\begin{equation} \\pi_{\\text{na\u00efve}} = \\mathcal{N}(-\\text{na\u00efve } DD) \\end{equation} \\]","tags":["SAS","Code","Merton","Default Probability"]},{"location":"posts/merton-dd/#code","title":"Code","text":"The na\u00efve method is too simple and skipped for now.
Here I discuss the iterative approach.
","tags":["SAS","Code","Merton","Default Probability"]},{"location":"posts/merton-dd/#original-sas-code-in-bharath-and-shumway-2008-rfs","title":"Original SAS code in Bharath and Shumway (2008 RFS)","text":"The original code is enclosed in the SSRN version of Bharath and Shumway (2008), and was available on Shumway's website.
However, there are two issues in this version of code:
cdt=100*year(date)+month(date)
accidentally restricts the \"past year\" daily stock returns to the \"past month\" later. Note that at line 42-43 it merges by permno
and cdt
, where cdt
refers to a certain year-month. We can pause the program after this data step to confirm that indeed there is only a month of data for each permno
.cdt=100*&yyy.+&mmm.;
.Other issues are minor and harmless.
A copy of this version can be found here on GitHub.
","tags":["SAS","Code","Merton","Default Probability"]},{"location":"posts/merton-dd/#my-code","title":"My code","text":"Based on the original SAS code in Bharath and Shumway (2008), I made some edits and below is a fully self-contained SAS code that executes smoothly. Note that I've corrected the above issues.
","tags":["SAS","Code","Merton","Default Probability"]},{"location":"posts/minimum-variance-hedge-ratio/","title":"Minimum Variance Hedge Ratio","text":"This note briefly explains what's the minimum variance hedge ratio and how to derive it in a cross hedge, where the asset to be hedged is not the same as underlying asset.
","tags":["Hedge"]},{"location":"posts/minimum-variance-hedge-ratio/#the-hedge-ratio-h","title":"The Hedge Ratio \\(h\\)","text":"The hedge ratio \\(h\\) is the ratio of the size of the hedging position to the exposure of the asset to be hedged:
Apparently, if we vary \\(h\\), the variance (risk) of the combined hedged position will also change.
","tags":["Hedge"]},{"location":"posts/minimum-variance-hedge-ratio/#the-optimal-minimum-variance-hedge-ratio-h","title":"The (Optimal) Minimum-Variance Hedge Ratio \\(h^*\\)","text":"Our objective in hedging is to manage the variance (risk) of our position, making it as low as possible by setting the hedge ratio \\(h\\) to be the optimal hedge ratio \\(h^*\\) that minimises the variance of the combined hedged position.
","tags":["Hedge"]},{"location":"posts/minimum-variance-hedge-ratio/#hedge-where-aa","title":"Hedge where \\(A'=A\\)","text":"It's relatively easy when the underlying asset of the futures (\\(A'\\)) is the same as the asset to be hedged (\\(A\\)), as they have a perfect correlation and the same variance. Thus, as long as the hedge ratio \\(h=1\\), where the size of hedging position equals the exposure of the asset held, the perfect correlation and same variance ensure the value changes in the hedging position offset the changes in the value of asset to be hedged, so that the variance of the hedged position is minimum at zero (ignoring other basis risks). This means, the optimal minimum-variance hedge ratio \\(h^*=1\\).
","tags":["Hedge"]},{"location":"posts/minimum-variance-hedge-ratio/#cross-hedge-where-a-neq-a","title":"Cross Hedge where \\(A' \\neq A\\)","text":"When the underlying asset of the futures (\\(A'\\)) differ from asset to be hedged (\\(A\\)), the optimal hedge ratio \\(h^*\\) that minimises the portfolio variance is not necessarily 1 anymore.
Let's now derive \\(h^*\\).
Let's consider a short hedge, where we long \\(S_t\\) and short \\(h\\times F_t\\), hence\uff1a
The optimal hedge ratio \\(h^*\\) is the hedge ratio that minimises the variance of \\(\\Delta C\\).
\\[ h^* =\\underset{h}{\\operatorname{argmin}} \\text{Var}(\\Delta C) =\\underset{h}{\\operatorname{argmin}} \\text{Var}(\\Delta S_t-h\\times \\Delta F_t) \\]We also know that
\\[ \\text{Var}(\\Delta S_t-h\\times \\Delta F_t) = \\sigma^2_S + h^2\\sigma^2_F - 2h(\\rho \\sigma_S \\sigma_F) \\]To minimise the variance, the first-order condition (FOC) is that
\\[ \\frac{\\partial \\text{Var}(\\Delta C)}{\\partial h}=2h\\sigma^2_F-2(\\rho \\sigma_S \\sigma_F)=0 \\]The optimal hedge ratio \\(h^*\\) is the \\(h\\) that solves the FOC above. Therefore,
\\[ h^* = \\rho \\frac{\\sigma_S}{\\sigma_F} \\]","tags":["Hedge"]},{"location":"posts/minimum-variance-hedge-ratio/#intuition","title":"Intuition","text":"The optimal hedge ratio \\(h^*\\) describes the optimal \\(N_F/N_A\\), so that the optimal size of the hedging position:
\\[ N_F^* = h^* \\times N_A \\]If \\(\\rho=1\\) and \\(\\sigma_F=\\sigma_S\\), then \\(h^*=1\\):
If \\(\\rho=1\\) and \\(\\sigma_F=2\\sigma_S\\), then \\(h^*=0.5\\):
If \\(\\rho<1\\), then \\(h^*\\) depends on \\(\\rho\\) and \\({\\sigma_S}/{\\sigma_F}\\):
Among many reasons why people find it hard to use cryptocurrency there's a simple one -- memorising the private key is too hard. So, people invented brain wallet, which turns a string of words into a private key and thus wallet.
It's genius in that now a user needs only to memorise whatever he or she used to create the wallet. You can turn your name, phone number, DoB, favourite quote, lover's home address, ..., literally anything into a cryptocurrency wallet. However, this also means that if someone else successfully guessed the passphrase you used, they can sweep all the coins you have!
","tags":["Bitcoin","Python","Code"]},{"location":"posts/never-use-a-brain-wallet/#python-brain-wallet-for-bitcoin","title":"Python brain wallet for Bitcoin","text":"After a little bit of research, I've put together a simple brain wallet Python script that turns any input string into a valid Bitcoin private key and its address.
import codecs\nimport hashlib\nimport ecdsa\nclass BrainWallet:\n@staticmethod\ndef generate_address_from_passphrase(passphrase):\nprivate_key = str(hashlib.sha256(\npassphrase.encode('utf-8')).hexdigest())\naddress = BrainWallet.generate_address_from_private_key(private_key)\nreturn private_key, address\n@staticmethod\ndef generate_address_from_private_key(private_key):\npublic_key = BrainWallet.__private_to_public(private_key)\naddress = BrainWallet.__public_to_address(public_key)\nreturn address\n@staticmethod\ndef __private_to_public(private_key):\nprivate_key_bytes = codecs.decode(private_key, 'hex')\n# Get ECDSA public key\nkey = ecdsa.SigningKey.from_string(\nprivate_key_bytes, curve=ecdsa.SECP256k1).verifying_key\nkey_bytes = key.to_string()\nkey_hex = codecs.encode(key_bytes, 'hex')\n# Add bitcoin byte\nbitcoin_byte = b'04'\npublic_key = bitcoin_byte + key_hex\nreturn public_key\n@staticmethod\ndef __public_to_address(public_key):\npublic_key_bytes = codecs.decode(public_key, 'hex')\n# Run SHA256 for the public key\nsha256_bpk = hashlib.sha256(public_key_bytes)\nsha256_bpk_digest = sha256_bpk.digest()\n# Run ripemd160 for the SHA256\nripemd160_bpk = hashlib.new('ripemd160')\nripemd160_bpk.update(sha256_bpk_digest)\nripemd160_bpk_digest = ripemd160_bpk.digest()\nripemd160_bpk_hex = codecs.encode(ripemd160_bpk_digest, 'hex')\n# Add network byte\nnetwork_byte = b'00'\nnetwork_bitcoin_public_key = network_byte + ripemd160_bpk_hex\nnetwork_bitcoin_public_key_bytes = codecs.decode(\nnetwork_bitcoin_public_key, 'hex')\n# Double SHA256 to get checksum\nsha256_nbpk = hashlib.sha256(network_bitcoin_public_key_bytes)\nsha256_nbpk_digest = sha256_nbpk.digest()\nsha256_2_nbpk = hashlib.sha256(sha256_nbpk_digest)\nsha256_2_nbpk_digest = sha256_2_nbpk.digest()\nsha256_2_hex = codecs.encode(sha256_2_nbpk_digest, 'hex')\nchecksum = sha256_2_hex[:8]\n# Concatenate public key and checksum to get the address\naddress_hex = (network_bitcoin_public_key + 
checksum).decode('utf-8')\nwallet = BrainWallet.base58(address_hex)\nreturn wallet\n@staticmethod\ndef base58(address_hex):\nalphabet = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'\nb58_string = ''\n# Get the number of leading zeros and convert hex to decimal\nleading_zeros = len(address_hex) - len(address_hex.lstrip('0'))\n# Convert hex to decimal\naddress_int = int(address_hex, 16)\n# Append digits to the start of string\nwhile address_int > 0:\ndigit = address_int % 58\ndigit_char = alphabet[digit]\nb58_string = digit_char + b58_string\naddress_int //= 58\n# Add '1' for each 2 leading zeros\nones = leading_zeros // 2\nfor one in range(ones):\nb58_string = '1' + b58_string\nreturn b58_string\n
","tags":["Bitcoin","Python","Code"]},{"location":"posts/never-use-a-brain-wallet/#easily-cracking-a-wallet","title":"Easily \"cracking\" a wallet","text":"Let me show you some really easy-to-guess passphrases and their associated private keys and addresses. As an example, the code below uses \"password\" as the input passphrase and derives the private key and address from it.
passphrase = 'password'\nwallet = BrainWallet()\nprivate_key, address = wallet.generate_address_from_passphrase(passphrase)\nprint(f'passphrase: {passphrase}')\nprint(f'private key: {private_key}')\nprint(f'address: {address}')\n
The output is:
passphrase: password\nprivate key: 5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8\naddress: 16ga2uqnF1NqpAuQeeg7sTCAdtDUwDyJav\n
As at May 22, 2019, this address has 45,014 transactions with a total of 0.3563 BTC (of course the balance is zero)! You can check its current balance at blockchain.com. Also, congratulations, you are now one of the many owners of this address/wallet. So next time you observe some coins transferred to it, you'll be able to use it as well (though I don't suggest you do so)!
","tags":["Bitcoin","Python","Code"]},{"location":"posts/never-use-a-brain-wallet/#some-other-cracked-wallets","title":"Some other \"cracked\" wallets","text":"I explored a little bit more and it's surprising to find out how easy it is to crack a wallet this way. Below is a table of some passphrases and their associated keys and addresses.
| Passphrase | Private Key | Address | Used |
| --- | --- | --- | --- |
| satoshi | da2876b3eb31edb4436fa4650673fc6f01f90de2f1793c4ec332b2387b09726f | 1ADJqstUMBB5zFquWg19UqZ7Zc6ePCpzLE | True |
| bitcoin | 6b88c087247aa2f07ee1c5956b8e1a9f4c7f892a70e324f1bb3d161e05ca107b | 1E984zyYbNmeuumzEdqT8VSL8QGJi3byAD | True |
| hello world | b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9 | 1CS8g7nwaxPPprb4vqcTVdLCuCRirsbsMb | True |
| testing | cf80cd8aed482d5d1527d7dc72fceff84e6326592848447d2dc0b0e87dfc9a90 | 1JdDsbYYRSpsTnBVgenruULVeUjt5z6WnR | True |
| god | 5723360ef11043a879520412e9ad897e0ebcb99cc820ec363bfecc9d751a1a99 | 1KxmSmcMTmPvU1qSLYpJLrqnSzBoQ53NXN | True |
| terminator | aa802f654e3ae7aaa1b73f8724056a05e2691accea8dd90057916080f84d7e93 | 18kvt3D6K1CG8MxGP6ke7q6vLU5NGpLZdR | True |
| abc | ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad | 1NEwmNSC7w9nZeASngHCd43Bc5eC2FmXpn | True |

And a lot of swear words are used as well, but I'm just going to skip them.
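Each private key above is simply the SHA-256 digest of its passphrase, so "cracking" amounts to hashing a word list. A minimal sketch (the address derivation is omitted here; see the full class above):

```python
import hashlib

# Any candidate passphrase maps deterministically to a private key,
# so a word list is all an attacker needs.
candidates = ['password', 'hello world', 'abc']
for phrase in candidates:
    private_key = hashlib.sha256(phrase.encode('utf-8')).hexdigest()
    print(phrase, '->', private_key)
```

Running this reproduces the keys in the table, e.g. "abc" gives ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad.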
Apart from single words and short phrases, some people do use famous quotes. As an example, see this one from A Tale of Two Cities:
it was the best of times it was the worst of times
Its private key is af8da705bfd95621983e5cf4232ac1ca0c79b47122e3defd8a98fa9a4387d985
and its address is 17WenQJaYvqCNumebQU54TsixWtQ1GQ4ND. It has received 1 BTC in total but again the balance is zero, lol.
Never use a brain wallet, because if you can think of it, someone else might also be able to come up with the same passphrase. But if you are absolutely sure that your passphrase is secret, feel free to use the script above and make yourself a wallet. \ud83d\ude0f
","tags":["Bitcoin","Python","Code"]},{"location":"posts/probability-of-informed-trading/","title":"Probability of Informed Trading (PIN)","text":"Note
This post was originally published on my old blog in March 2018 in Chinese. Translation is provided by ChatGPT-4.
The above figure is based on data I collected in 2019 of trades on Binance.
In the market microstructure literature, Easley et al. (1996) proposed a trading model that can decompose the bid-ask spread. The most commendable aspect of this model is the introduction of the \"Probability of Informed Trading,\" or PIN, which serves as a means of measuring the informational component in the spread. As the name suggests, under ideal conditions, PIN can reflect the probability of informed trading in a market with a market maker. In this article, I attempt to comb through the modeling process in the Easley et al. (1996) paper and discuss how to handle the objective function in maximum likelihood estimation to avoid overflow errors during computation.
","tags":["PIN","Informed Trading"]},{"location":"posts/probability-of-informed-trading/#model","title":"Model","text":"Assume that the buy and sell orders of informed and uninformed traders follow independent Poisson processes, and the following tree diagram describes the entire trading process:
Next, assume that the market maker is a Bayesian, that is, he will update his understanding of the overall market status, especially whether there is new information on that day, by observing trades and trading rates. Suppose each trading day is independent, \\(P(t)=(P_n(t), P_b(t), P_g(t))\\) is the market maker's prior probability perception, where \\(n\\) represents no new information, \\(b\\) represents bearish bad news, and \\(g\\) represents bullish good news, so \\(P(t)=(1-\\alpha, \\alpha\\delta, \\alpha(1-\\delta))\\).
Let \\(S_t\\) be the event of a sell order arriving at time \\(t\\), and \\(B_t\\) be the event of a buy order arriving at time \\(t\\). Also, let \\(P(t|S_t)\\) be the updated probability perception of the market maker after observing a sell order arriving at time \\(t\\) based on the existing information. Then, according to Bayes' theorem, if there is no new information at time \\(t\\) and the market maker observes a sell order, the posterior probability \\(P_n(t|S_t)\\) should be:
\\[ \\begin{equation} P_n(t|S_t)=\\frac{P_n(t)\\varepsilon}{\\varepsilon+P_b(t)\\mu}\\end{equation} \\]Similarly, if there is bearish information and the market maker observes a sell order at time \\(t\\), the posterior probability \\(P_b(t|S_t)\\) should be:
\\[ \\begin{equation} P_b(t|S_t)=\\frac{P_b(t)(\\varepsilon+\\mu)}{\\varepsilon+P_b(t)\\mu}\\end{equation} \\]If there is bullish information and the market maker observes a sell order at time \\(t\\), the posterior probability \\(P_g(t|S_t)\\) should be:
\\[ \\begin{equation} P_g(t|S_t)=\\frac{P_g(t)\\varepsilon}{\\varepsilon+P_b(t)\\mu} \\end{equation} \\]Thus, the expected zero-profit bid price at time \\(t\\) on day \\(i\\) should be the conditional expectation of the asset value based on historical information and observing sell order at this time, that is,
\\[ \\begin{equation} b(t)=\\frac{P_n(t)\\varepsilon V^*_i+P_b(t)(\\varepsilon+\\mu)\\underline{V}_i+P_g(t)\\varepsilon\\overline{V}_i}{\\varepsilon+P_b(t)\\mu} \\end{equation} \\]Here, \\(V_i\\) is the value of the asset at the end of day \\(i\\), and let the asset value be \\(\\overline{V}_i\\) when there is positive news, \\(\\underline{V}_i\\) when there is negative news, and \\(V^*_i\\) when there is no news, with \\(\\underline{V}_i < V^*_i < \\overline{V}_i\\).
At this point, the ask price should be:
\\[ \\begin{equation} a(t)=\\frac{P_n(t)\\varepsilon V^*_i+P_b(t)\\varepsilon\\underline{V}_i+P_g(t)(\\varepsilon+\\mu)\\overline{V}_i}{\\varepsilon+P_g(t)\\mu}\\end{equation} \\]Let's associate these bid and ask prices with the expected asset value at time \\(t\\). Considering that the conditional expectation of the asset value at this time is:
\\[ \\begin{equation} E[V_i|t]=P_n(t)V^*_i+P_b(t)\\underline{V}_i+P_g(t)\\overline{V}_i\\end{equation} \\]we can write the above \\(b(t)\\) and \\(a(t)\\) as:
\\[ \\begin{equation} b(t)=E[V_i|t]-\\frac{\\mu P_b(t)}{\\varepsilon+\\mu P_b(t)}(E[V_i|t]-\\underline{V}_i)\\end{equation} \\] \\[ \\begin{equation} a(t)=E[V_i|t]+\\frac{\\mu P_g(t)}{\\varepsilon+\\mu P_g(t)}(\\overline{V}_i-E[V_i|t])\\end{equation} \\]Thus, the bid-ask spread is \\(a(t)-b(t)\\), which is:
\\[ \\begin{equation} a(t)-b(t)=\\frac{\\mu P_g(t)}{\\varepsilon+\\mu P_g(t)}(\\overline{V}_i-E[V_i|t])+\\frac{\\mu P_b(t)}{\\varepsilon+\\mu P_b(t)}(E[V_i|t]-\\underline{V}_i)\\end{equation} \\]This indicates that the bid-ask spread at time \\(t\\) is actually:
The probability of a buy order being an informed trade \\(\\times\\) the expected loss due to the informed buyer + the probability of a sell order being an informed trade \\(\\times\\) the expected loss due to the informed seller
Therefore, the probability that any trade at time \\(t\\) is based on asymmetric information from informed traders is the sum of these two probabilities:
\\[ \\begin{equation} PIN(t)=\\frac{\\mu P_g(t)}{\\varepsilon+\\mu P_g(t)}+\\frac{\\mu P_b(t)}{\\varepsilon+\\mu P_b(t)}=\\frac{\\mu(1-P_n(t))}{\\mu(1-P_n(t))+2\\varepsilon}\\end{equation} \\]If no information event occurs (\\(P_n(t)=1\\)) or there are no informed trades (\\(\\mu=0\\)), both \\(PIN\\) and the bid-ask spread should be zero. If the probabilities of positive and negative news are equal, i.e., \\(\\delta=1-\\delta\\), the bid-ask spread can be simplified to:
\\[ \\begin{equation} a(t)-b(t)=\\frac{\\alpha\\mu}{\\alpha\\mu+2\\varepsilon}[\\overline{V}_i-\\underline{V}_i]\\end{equation} \\]And our \\(PIN\\) measure is simplified to:
\\[ \\begin{equation} PIN(t)=\\frac{\\alpha\\mu}{\\alpha\\mu+2\\varepsilon}\\end{equation} \\]","tags":["PIN","Informed Trading"]},{"location":"posts/probability-of-informed-trading/#model-estimation","title":"Model Estimation","text":"After the model is established, let's talk about the parameter estimation of this model. The parameters we need to estimate, \\(\\theta=(\\alpha, \\delta, \\varepsilon, \\mu)\\), are actually very difficult to estimate. This is because we cannot directly observe them, and can only observe the arrival of buy and sell orders. In this model, the daily buy and sell orders are assumed to follow one of the three Poisson processes. Although we don't know which process it is specifically, the overall idea is: more buy orders imply potential good news, more sell orders imply potential bad news, and overall buying and selling will decrease when there is no new information. With this idea in mind, we can try to estimate \\(\\theta\\) using the maximum likelihood estimation method.
First, according to the trading model shown in the diagram, assume that there is bad news on a certain day, then the arrival rate of sell orders is \\((\\mu+\\varepsilon)\\), which means both informed and uninformed traders participate in selling. The arrival rate of buy orders is \\(\\varepsilon\\), that is, only uninformed traders will continue to buy. Therefore, the probability of observing a sequence of trades with \\(B\\) buy orders and \\(S\\) sell orders in a period of time is:
\\[ \\begin{equation} e^{-\\varepsilon} \\frac{\\varepsilon^B}{B!} e^{-(\\mu+\\varepsilon)} \\frac{(\\mu+\\varepsilon)^S}{S!} \\end{equation} \\]If there is good news on a certain day, the probability of observing a sequence of trades with \\(B\\) buy orders and \\(S\\) sell orders in a period of time is:
\\[ \\begin{equation} e^{-\\varepsilon} \\frac{\\varepsilon^B}{B!} e^{-\\varepsilon} \\frac{\\varepsilon^S}{S!} \\end{equation} \\]If there is no new information on a certain day, the probability of observing a sequence of trades with \\(B\\) buy orders and \\(S\\) sell orders in a period of time is:
\\[ \\begin{equation} e^{-(\\mu+\\varepsilon)} \\frac{(\\mu+\\varepsilon)^B}{B!} e^{-\\varepsilon} \\frac{\\varepsilon^S}{S!} \\end{equation} \\]So, the probability of observing a total of \\(B\\) buy orders and \\(S\\) sell orders on a trading day should be the weighted average of the above three possibilities, and the weights here are the probabilities of each possibility. Therefore, we can write out the likelihood function:
\\[ \\begin{align} L((B, S)| \\theta)=\u00a0 &(1-\\alpha)e^{-\\varepsilon} \\frac{\\varepsilon^B}{B!} e^{-\\varepsilon} \\frac{\\varepsilon^S}{S!} \\\\ &+ \\alpha\\delta\u00a0 e^{-\\varepsilon} \\frac{\\varepsilon^B}{B!} e^{-(\\mu+\\varepsilon)} \\frac{(\\mu+\\varepsilon)^S}{S!} \\\\ &+ \\alpha(1-\\delta) e^{-(\\mu+\\varepsilon)} \\frac{(\\mu+\\varepsilon)^B}{B!} e^{-\\varepsilon} \\frac{\\varepsilon^S}{S!} \\end{align} \\]Hence, the objective function of the maximum likelihood function is:
\\[ \\begin{equation} L(D|\\theta)=\\prod_{i=1}^{N}L(\\theta|(B_i, S_i)) \\end{equation} \\]","tags":["PIN","Informed Trading"]},{"location":"posts/probability-of-informed-trading/#bottomline","title":"Bottomline","text":"The problem seems to end here. With the objective function, it seems to be all set as long as you program it and pay attention to the parameter boundaries. However, the real challenge comes next, because if you really write the objective function like this and run it, you will inevitably encounter an overflow error. After all, this function is filled with powers and factorials. Even if the time element is chosen very small, some highly liquid assets will still have hundreds of transactions within a few seconds. Therefore, both \\(B!\\), \\(S!\\), and \\((\\mu+\\varepsilon)^B\\) can beautifully crash your program. So, further processing of the objective function here is extremely important.
By observing equation (16), the three terms in the likelihood function can actually extract a common factor \\(e^{-2\\varepsilon}(\\mu+\\varepsilon)^{B+S}/(B!S!)\\)! After extracting this common factor, you can also substitute \\(x\\equiv \\frac{\\varepsilon}{\\mu+\\varepsilon}\\in [0, 1]\\) into it. The transformed likelihood function, after taking the logarithm, will be in the form:
\\[ \\begin{align} l((B, S)| \\theta)=&\\ln(L((B, S)| \\theta))\\\\ =&-2\\varepsilon+(B+S)\\ln(\\mu+\\varepsilon) \\\\ &+\\ln((1-\\alpha)x^{B+S}+\\alpha\\delta e^{-\\mu}x^B + \\alpha(1-\\delta)e^{-\\mu}x^S) \\\\ &-\\ln(B!S!) \\end{align} \\]Now, since the last term \\(\\ln(B!S!)\\) does not affect the parameter estimation at all, it can be safely excluded. The remaining part can perfectly avoid overflow. Personally, I think the brilliant move here is the introduction of \\(x\\equiv \\frac{\\varepsilon}{\\mu+\\varepsilon}\\in [0, 1]\\), which prevents the overflow error caused by \\((\\mu+\\varepsilon)>1\\).
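The transformed log-likelihood above can be sketched in a few lines. This is a minimal illustration with hypothetical parameter values and daily buy/sell counts, using the same symbols as in the equations:

```python
import numpy as np

def pin_loglik(theta, B, S):
    """Overflow-safe per-day log-likelihood of the PIN model,
    using the substitution x = eps / (mu + eps)."""
    alpha, delta, eps, mu = theta
    x = eps / (mu + eps)
    # The common factor e^{-2 eps}(mu+eps)^{B+S}/(B!S!) has been factored
    # out; the constant -ln(B!S!) is dropped as it does not affect argmax.
    ll = (-2 * eps + (B + S) * np.log(mu + eps)
          + np.log((1 - alpha) * x ** (B + S)
                   + alpha * delta * np.exp(-mu) * x ** B
                   + alpha * (1 - delta) * np.exp(-mu) * x ** S))
    return np.sum(ll)

# Hypothetical parameters theta = (alpha, delta, eps, mu) and one day's counts
theta = (0.3, 0.5, 50.0, 100.0)
B, S = np.array([120]), np.array([60])
ll = pin_loglik(theta, B, S)

# The simplified PIN measure from the equation above
alpha, delta, eps, mu = theta
pin = alpha * mu / (alpha * mu + 2 * eps)
print(ll, pin)
```

In practice, this function would be maximised over \(\theta\) (e.g. with `scipy.optimize.minimize` on its negative) across many trading days.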
","tags":["PIN","Informed Trading"]},{"location":"posts/python-shared-memory-in-multiprocessing/","title":"Python Shared Memory in Multiprocessing","text":"Python 3.8 introduced a new module multiprocessing.shared_memory
that provides shared memory for direct access across processes. My test shows that it significantly reduces the memory usage, which also speeds up the program by reducing the costs of copying and moving things around.1
In this test, I generated a 240MB numpy.recarray
from a pandas.DataFrame
with datetime
, int
and str
typed columns. I used numpy.recarray
because it can preserve the dtype
of each column, so that later I can reconstruct the same array from the buffer of shared memory.
I performed a simple numpy.nansum
on the numeric column of the data using two methods. The first method uses multiprocessing.shared_memory
where the 4 spawned processes directly access the data in the shared memory. The second method passes the data to the spawned processes, which effectively means each process will have a separate copy of the data.
A quick run of the test code below shows that the first method based on shared_memory
uses minimal memory (peak usage is 0.33MB) and is much faster (2.09s) than the second one where the entire data is copied and passed into each process (peak memory usage of 1.8G and takes 216s). More importantly, the memory usage under the second method is consistently high.
from multiprocessing.shared_memory import SharedMemory\nfrom multiprocessing.managers import SharedMemoryManager\nfrom concurrent.futures import ProcessPoolExecutor, as_completed\nfrom multiprocessing import current_process, cpu_count, Process\nfrom datetime import datetime\nimport numpy as np\nimport pandas as pd\nimport tracemalloc\nimport time\ndef work_with_shared_memory(shm_name, shape, dtype):\nprint(f'With SharedMemory: {current_process()=}')\n# Locate the shared memory by its name\nshm = SharedMemory(shm_name)\n# Create the np.recarray from the buffer of the shared memory\nnp_array = np.recarray(shape=shape, dtype=dtype, buf=shm.buf)\nreturn np.nansum(np_array.val)\ndef work_no_shared_memory(np_array: np.recarray):\nprint(f'No SharedMemory: {current_process()=}')\n# Without shared memory, the np_array is copied into the child process\nreturn np.nansum(np_array.val)\nif __name__ == \"__main__\":\n# Make a large data frame with date, float and character columns\na = [\n(datetime.today(), 1, 'string'),\n(datetime.today(), np.nan, 'abc'),\n] * 5000000\ndf = pd.DataFrame(a, columns=['date', 'val', 'character_col'])\n# Convert into numpy recarray to preserve the dtypes (1)\nnp_array = df.to_records(index=False, column_dtypes={'character_col': 'S6'})\ndel df\nshape, dtype = np_array.shape, np_array.dtype\nprint(f\"np_array's size={np_array.nbytes/1e6}MB\")\n# With shared memory\n# Start tracking memory usage\ntracemalloc.start()\nstart_time = time.time()\nwith SharedMemoryManager() as smm:\n# Create a shared memory of size np_arry.nbytes\nshm = smm.SharedMemory(np_array.nbytes)\n# Create a np.recarray using the buffer of shm\nshm_np_array = np.recarray(shape=shape, dtype=dtype, buf=shm.buf)\n# Copy the data into the shared memory\nnp.copyto(shm_np_array, np_array)\n# Spawn some processes to do some work\nwith ProcessPoolExecutor(cpu_count()) as exe:\nfs = [exe.submit(work_with_shared_memory, shm.name, shape, dtype)\nfor _ in range(cpu_count())]\nfor _ in 
as_completed(fs):\npass\n# Check memory usage\ncurrent, peak = tracemalloc.get_traced_memory()\nprint(f\"Current memory usage {current/1e6}MB; Peak: {peak/1e6}MB\")\nprint(f'Time elapsed: {time.time()-start_time:.2f}s')\ntracemalloc.stop()\n# Without shared memory\ntracemalloc.start()\nstart_time = time.time()\nwith ProcessPoolExecutor(cpu_count()) as exe:\nfs = [exe.submit(work_no_shared_memory, np_array)\nfor _ in range(cpu_count())]\nfor _ in as_completed(fs):\npass\n# Check memory usage\ncurrent, peak = tracemalloc.get_traced_memory()\nprint(f\"Current memory usage {current/1e6}MB; Peak: {peak/1e6}MB\")\nprint(f'Time elapsed: {time.time()-start_time:.2f}s')\ntracemalloc.stop()\n
Warning
A very important note about using multiprocessing.shared_memory
, as at June 2020, is that the numpy.ndarray
cannot have a dtype=dtype('O')
. That is, the dtype
cannot be dtype(object)
. If it is, there will be a segmentation fault when child processes try to access the shared memory and dereference it. It happens when the column contains strings.
To solve this problem, you need to specify the dtype
in df.to_records()
. For example:
np_array = df.to_records(index=False, column_dtypes={'character_col': 'S6'})\n
Here, we specify that character_col
contains strings of length 6. If it contains Unicode, we can use 'U6'
instead. Longer strings will then be truncated at the specified length. As such, there won't be any more segfaults.
This test is performed on a 2017 12-inch MacBook with 1.3 GHz Dual-Core Intel Core i5 and 8 GB 1867 MHz LPDDR3 RAM.\u00a0\u21a9
This note is just to show that the different variants of the Black-Scholes formula in the textbook and tutorial solutions are in fact the same.
This is the one shown in our formula sheet, and is also the traditional presentation of the Black-Scholes model.
\\[ \\begin{equation} C=SN(d_1)-N(d_2)Ke^{-r_f t} \\end{equation} \\] \\[ \\begin{equation} d_1=\\frac{ln(\\frac{S}{K})+(r_f+\\frac{\\sigma^2}{2})t}{\\sigma \\sqrt{t}} \\end{equation} \\] \\[ \\begin{equation} d_2=d_1 - \\sigma \\sqrt{t} \\end{equation} \\]","tags":["Option"]},{"location":"posts/reconciliation-of-black-scholes-variants/#variant-2","title":"Variant 2","text":"This one comes from the textbook, and looks slightly different in that \\(PV(K)\\) replaces \\(K\\) in the natural logarithm.
\\[ \\begin{equation} C=SN(d_1)-N(d_2)PV(K) \\end{equation}\\] \\[ \\begin{equation} d_1=\\frac{ln(\\frac{S}{PV(K)})}{\\sigma \\sqrt{t}}+\\frac{\\sigma \\sqrt{t}}{2} \\end{equation} \\] \\[ \\begin{equation} d_2=d_1 - \\sigma \\sqrt{t} \\end{equation} \\]However, it's in fact easy to show that \\(d_1\\) in eq. (5) is the same as in eq. (2): Under continuous compounding, \\(PV(K)=Ke^{-r_f t}\\):
\\[ \\begin{align} d_1 &=\\frac{ln(\\frac{S}{PV(K)})}{\\sigma \\sqrt{t}}+\\frac{\\sigma \\sqrt{t}}{2}\\newline &=\\frac{ln(\\frac{S}{Ke^{-r_f t}})}{\\sigma \\sqrt{t}} +\\frac{\\frac{\\sigma^2}{2}t}{\\sigma \\sqrt{t}}\\newline &=\\frac{ln(\\frac{S}{Ke^{-r_f t}})+\\frac{\\sigma^2}{2}t}{\\sigma \\sqrt{t}}\\newline &=\\frac{ln(\\frac{S}{K})+r_f t+\\frac{\\sigma^2}{2}t}{\\sigma \\sqrt{t}}\\newline &=\\frac{ln(\\frac{S}{K})+(r_f+\\frac{\\sigma^2}{2})t}{\\sigma \\sqrt{t}}=eq. (2) \\end{align} \\]Therefore, the two variants are effectively the same under continuous compounding. \u00a0
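A quick numerical check of this reconciliation: the sketch below prices a call under both variants with hypothetical inputs (S=100, K=95, r_f=5%, sigma=20%, t=1 year), using the error function for the standard normal CDF.

```python
from math import log, sqrt, exp, erf

def N(d):
    # Standard normal CDF via the error function
    return 0.5 * (1 + erf(d / sqrt(2)))

# Hypothetical inputs: spot, strike, risk-free rate, volatility, maturity
S, K, rf, sigma, t = 100.0, 95.0, 0.05, 0.2, 1.0

# Variant 1
d1 = (log(S / K) + (rf + sigma**2 / 2) * t) / (sigma * sqrt(t))
d2 = d1 - sigma * sqrt(t)
C1 = S * N(d1) - N(d2) * K * exp(-rf * t)

# Variant 2, with PV(K) = K e^{-rf t} under continuous compounding
PVK = K * exp(-rf * t)
d1b = log(S / PVK) / (sigma * sqrt(t)) + sigma * sqrt(t) / 2
C2 = S * N(d1b) - N(d1b - sigma * sqrt(t)) * PVK

print(C1, C2)
```

The two prices agree to floating-point precision, confirming the algebra above.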
","tags":["Option"]},{"location":"posts/specification-curve-analysis/","title":"Specification Curve Analysis","text":"","tags":["Stata","Code","Specification Curve","Apps"]},{"location":"posts/specification-curve-analysis/#motivation","title":"Motivation","text":"More often than not, empirical researchers need to argue that their chosen model specification reigns. If not, they need to run a battery of tests on alternative specifications and report them. The problem is, researchers can fit a few tables each with a few models in the paper at best, and it's extremely hard for readers to know whether the reported results are being cherry-picked.
So, why not run all possible model specifications and find a concise way to report them all?
","tags":["Stata","Code","Specification Curve","Apps"]},{"location":"posts/specification-curve-analysis/#the-specification-curve","title":"The Specification Curve","text":"The idea of the specification curve, provided by Simonsohn, Simmons and Nelson (2020), is a direct answer to this question.1 2
To intuitively explain this concept, below is Figure 2 from my paper Organization Capital and Executive Performance Incentives in the Journal of Banking & Finance,3 which is used to show the robustness of a substitution effect of organization capital on executive pay-for-performance sensitivity. Therefore, the estimated coefficients for the variable of interest OC are expected to be negative across different model specifications.
The plot is made up of two parts. The upper panel plots the coefficient estimates of OC in various model specifications, in descending order, and the associated 95% confidence intervals. Sample sizes of each model are plotted as bars at the bottom of the upper panel. For simplicity, we annotate only the maximum and minimum coefficient estimates, as well as the threshold of zero. The lower panel reports the exact specification for each model, where colored dots indicate the choices from various specification alternatives. Both panels share the same x-axis of model number.
To interpret this specification curve, for example, OC has an estimated coefficient of \u22120.11 in the first model, which uses the natural logarithm of DELTA_MGMT (a measure of executive pay-for-performance sensitivity) as the dependent variable, and control variables as in the baseline model, including industry fixed effects and year fixed effects, clustering standard errors at the firm level, and is estimated on the full sample.
Further, the ordered nature of the curve implies that this is the minimum estimated impact of OC on ln(DELTA_MGMT), whereas the maximum estimated coefficient doubles in magnitude to \u22120.22 when the industry fixed effects are replaced with the more conservative firm fixed effects and estimated on the sample excluding the global financial crisis period. More importantly, in all specifications, we find the coefficient estimates of OC to be statistically significant. Using an alternative measure of executive pay-for-performance sensitivity as the dependent variable, again, has minimal impact on the documented substitution effect of OC.
This specification curve reports a total of 2*2*4*1*2=32 specifications:
Beyond reporting all estimates from hundreds and thousands of models, the more appealing point of specification curve is that we can identify the most impactful factors in specifying the model. As the models are sorted by the coefficient estimates, the distribution of dots in the lower panel can reveal whether certain specification choices drive the results.
Of course, even 32 models cannot exhaust all possible specifications. Nevertheless, by addressing the most critical ones, we are able to use one specification curve plot to convince readers that our findings are robust.
","tags":["Stata","Code","Specification Curve","Apps"]},{"location":"posts/specification-curve-analysis/#specurve-stata-command-for-specification-curve-analysis","title":"specurve
- Stata command for specification curve analysis","text":"I developed a Stata command specurve
for specification curve analysis. It is written in Stata Mata and has no external dependencies.4 The source code is available at GitHub.
Run the following command in Stata:
net install specurve, from(\"https://raw.githubusercontent.com/mgao6767/specurve/master\") replace\n
","tags":["Stata","Code","Specification Curve","Apps"]},{"location":"posts/specification-curve-analysis/#example-usage-output","title":"Example usage & output","text":"","tags":["Stata","Code","Specification Curve","Apps"]},{"location":"posts/specification-curve-analysis/#regressions-with-reghdfe","title":"Regressions with reghdfe
","text":". use \"http://www.stata-press.com/data/r13/nlswork.dta\", clear\n(National Longitudinal Survey. Young Women 14-26 years of age in 1968)\n\n. copy \"https://mingze-gao.com/specurve/example_config_nlswork_reghdfe.yml\" ., replace\n. specurve using example_config_nlswork_reghdfe.yml, saving(specurve_demo)\n
","tags":["Stata","Code","Specification Curve","Apps"]},{"location":"posts/specification-curve-analysis/#iv-regressions-with-ivreghdfe","title":"IV regressions with ivreghdfe
","text":". copy \"https://mingze-gao.com/specurve/example_config_nlswork_ivreghdfe.yml\" ., replace\n. specurve using example_config_nlswork_ivreghdfe.yml, cmd(ivreghdfe) rounding(0.01) title(\"IV regression with ivreghdfe\")\n
Check help specurve
in Stata for a step-by-step guide.
Estimation results are saved in the frame named \"specurve\".
Use frame change specurve
to check the results.
Use frame change default
to switch back to the original dataset.
Simonsohn, Uri and Simmons, Joseph P. and Nelson, Leif D., 2020, Specification Curve Analysis, Nature Human Behaviour.\u00a0\u21a9
Special thanks to Rawley Heimer from Boston College who visited our discipline in 2019 and introduced the Specification Curve Analysis to us in the seminar on research methods.\u00a0\u21a9
This plot was made using a previous version of specurve
.\u00a0\u21a9
Previous versions depend on Stata 16's Python integration.\u00a0\u21a9
Nowadays top journals favour more granular studies. Sometimes it's useful to dig into the raw SEC filings and perform textual analysis. This note documents how I download all historical SEC filings via EDGAR and conduct some textual analyses.
Tip
If you don't require a very customized textual analysis, you should try for example SeekEdgar.com.
","tags":["Textual Analysis","SEC","Python","Code"]},{"location":"posts/textual-analysis-on-sec-filings/#1-build-a-master-index-of-sec-filings","title":"1. Build a master index of SEC filings","text":"I use the python-edgar
to download quarterly zipped index files to ./edgar-idx
.
$ mkdir ~/edgar && cd ~/edgar\n$ git clone https://github.com/edouardswiac/python-edgar.git\n$ python ./python-edgar/run.py -d ./edgar-idx\n
Then merge the downloaded tsv files into a master file using cat
.
$ cat ./edgar-idx/*.tsv > ./edgar-idx/master.tsv\n$ du -h ./edgar-idx/master.tsv\n
The resulting master.tsv
is about 2.6G as at Feb 2020. I then use the following Python script to build a SQLite database for more efficient queries.
# Load index files in `edgar-idx` to a sqlite database.\nimport sqlite3\nEDGAR_BASE = \"https://www.sec.gov/Archives/\"\ndef parse(line):\n# each line: \"cik|firm_name|file_type|date|url_txt|url_html\"\n# an example:\n# \"99780|TRINITY INDUSTRIES INC|8-K|2020-01-15|edgar/data/99780/0000099780-\\\n# 20-000008.txt|edgar/data/99780/0000099780-20-000008-index.html\"\nline = tuple(line.split('|')[:5])\nl = list(line)\nl[-1] = EDGAR_BASE + l[-1]\nreturn tuple(l)\nif __name__ == '__main__':\nconn = sqlite3.connect(r\"edgar-idx.sqlite3\")\nc = conn.cursor()\nc.execute('''CREATE TABLE IF NOT EXISTS edgar_idx \n (cik TEXT, firm_name TEXT, file_type TEXT, date DATE, url TEXT,\n PRIMARY KEY(cik, file_type, date));''')\nfilename = './edgar-idx/master.tsv'\nwith open(filename, 'r') as f:\nlines = f.readlines()\ndata = [parse(line) for line in lines]\nc.executemany('INSERT OR IGNORE INTO edgar_idx \\\n (cik, firm_name, file_type, date, url) VALUES (?,?,?,?,?)', data)\nconn.commit()\nconn.close()\n
","tags":["Textual Analysis","SEC","Python","Code"]},{"location":"posts/textual-analysis-on-sec-filings/#2-download-filings-from-edgar","title":"2. Download filings from EDGAR","text":"I write the following script to download filings from EDGAR. Note that this script is only a skeleton. The full implementation has proper logging, speed control and detailed error handling. For example, you'll need to keep track of failures and re-download them later.
Warning
As per the SEC's policy, you should limit concurrent requests to fewer than 10 per second. Hence, there is no need for a proxy pool such as Scylla
.
This example script downloads all 8-K filings to ./data/{cik}/{file_type}/{date}.txt.gz
.
Compression is highly recommended unless you have TBs of free disk space!
# Download all 8-K filings.\nimport os\nimport sqlite3\nimport requests\nimport concurrent.futures\nimport gzip\nimport tqdm\ndef download(job):\ncik, _, file_type, date, url = job\ntry:\nres = requests.get(url)\nfilename = f'./data/{cik}/{file_type}/{date}.txt.gz'\nif res.status_code == 200:\nwith gzip.open(filename, 'wb') as f:\nf.write(res.content)\nexcept Exception:\npass\nif __name__ == \"__main__\":\n# select what to download\nconn = sqlite3.connect(r\"edgar-idx.sqlite3\")\nc = conn.cursor()\nc.execute('SELECT * FROM edgar_idx WHERE file_type=\"8-K\";')\njobs = c.fetchall()\nconn.close()\n# start downloading\nprogress = tqdm.tqdm(total=len(jobs))\nfutures = []\nwith concurrent.futures.ThreadPoolExecutor(max_workers=16) as exe:\nfor job in jobs:\ncik, _, file_type, date, url = job\nfilename = f'./data/{cik}/{file_type}/{date}.txt.gz'\nos.makedirs(os.path.dirname(filename), exist_ok=True)\nif os.path.exists(filename):\nprogress.update()\nelse:\nf = exe.submit(download, job)\nf.add_done_callback(progress.update)\nfutures.append(f)\nfor f in concurrent.futures.as_completed(futures):\npass\n
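The failure bookkeeping mentioned above can be sketched as follows. This is a minimal illustration, not part of the original script; the function name and the idea of passing in a `fetch` callable are my own assumptions.

```python
import time

# Hypothetical helper: retry a download a few times before giving up.
# `fetch` is any callable that returns bytes on success (e.g. a thin
# wrapper around requests.get) and returns None or raises on failure.
def download_with_retry(fetch, url, max_retries=3, backoff=0.0):
    for attempt in range(max_retries):
        try:
            content = fetch(url)
            if content is not None:
                return content
        except Exception:
            pass  # a real implementation would log the error here
        time.sleep(backoff * (attempt + 1))  # pause before the next attempt
    return None  # caller records this job in a failure list for a later pass
```

Jobs for which this returns `None` would be appended to a failure list and re-submitted in a later pass, as the skeleton above suggests.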
","tags":["Textual Analysis","SEC","Python","Code"]},{"location":"posts/textual-analysis-on-sec-filings/#3-example-textual-analyses","title":"3. Example textual analyses","text":"The downloaded txt files are the text versions of the filings' HTML pages, which are generally well structured. Specifically, each filing is structured as:
<SEC-DOCUMENT>\n<SEC-HEADER></SEC-HEADER>\n<DOCUMENT>\n<TYPE>\n<SEQUENCE>\n<FILENAME>\n<DESCRIPTION>\n<TEXT>\n</TEXT>\n</DESCRIPTION>\n</FILENAME>\n</SEQUENCE>\n</TYPE>\n</DOCUMENT>\n<DOCUMENT></DOCUMENT>\n<DOCUMENT></DOCUMENT>\n ...\n</SEC-DOCUMENT>\n
Example <SEC-DOCUMENT>\n<SEC-HEADER></SEC-HEADER>\n<DOCUMENT>\n<TYPE>8-K\n <SEQUENCE>1\n <FILENAME>f13478e8vk.htm\n <DESCRIPTION>FORM 8-K\n <TEXT>\n ...\n </TEXT>\n</DESCRIPTION>\n</FILENAME>\n</SEQUENCE>\n</TYPE>\n</DOCUMENT>\n<DOCUMENT>\n<TYPE>EX-99.1\n <SEQUENCE>2\n <FILENAME>f13478exv99w1.htm\n <DESCRIPTION>EXHIBIT 99.1\n <TEXT>\n ...\n </TEXT>\n</DESCRIPTION>\n</FILENAME>\n</SEQUENCE>\n</TYPE>\n</DOCUMENT>\n<DOCUMENT></DOCUMENT>\n ...\n</SEC-DOCUMENT>\n
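Given this structure, the individual <DOCUMENT> sections of a filing can be pulled apart with a short regex. The sketch below is my own illustration (the pattern and function name are not from the original code):

```python
import re

# Each filing contains one or more <DOCUMENT>...</DOCUMENT> sections.
DOC_PAT = re.compile(r"<DOCUMENT>(.*?)</DOCUMENT>", re.DOTALL)
# The document type, e.g. "8-K" or "EX-99.1", follows the <TYPE> tag.
TYPE_PAT = re.compile(r"<TYPE>([^\s<]+)")

def split_documents(filing_text):
    """Return a list of (type, body) pairs, one per <DOCUMENT> section."""
    docs = []
    for body in DOC_PAT.findall(filing_text):
        m = TYPE_PAT.search(body)
        docs.append((m.group(1) if m else None, body))
    return docs
```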
","tags":["Textual Analysis","SEC","Python","Code"]},{"location":"posts/textual-analysis-on-sec-filings/#31-extract-all-items-reported-in-8-k-filings-since-2004","title":"3.1 Extract all items reported in 8-K filings since 2004","text":"Since 2004, the SEC has required companies to file an 8-K within 4 business days of many types of events. For a short description, see the SEC's fast answer to Form 8-K. The detailed instructions (PDF) are available here.
There are several ways to extract all items reported in each filing since 2004. First, I can use a regular expression to extract every \"Item X.XX\"
from the 8-K <DOCUMENT>
. Or, I can take advantage of the information in <SEC-HEADER>
. Below is an example <SEC-HEADER>
1, in which the lines of ITEM INFORMATION
actually describe the items reported in the filing.
<SEC-HEADER>0000079732-02-000036.hdr.sgml : 20020802\n<ACCEPTANCE-DATETIME>20020802082752\nACCESSION NUMBER: 0000079732-02-000036\nCONFORMED SUBMISSION TYPE: 8-K\nPUBLIC DOCUMENT COUNT: 4\nCONFORMED PERIOD OF REPORT: 20020801\nITEM INFORMATION: Changes in control of registrant\nITEM INFORMATION: Financial statements and exhibits\nFILED AS OF DATE: 20020802\n\nFILER:\n\n COMPANY DATA: \n COMPANY CONFORMED NAME: ATLANTIC CITY ELECTRIC CO\n CENTRAL INDEX KEY: 0000008192\n STANDARD INDUSTRIAL CLASSIFICATION: ELECTRIC SERVICES [4911]\n IRS NUMBER: 210398280\n STATE OF INCORPORATION: NJ\n FISCAL YEAR END: 1231\n\n FILING VALUES:\n FORM TYPE: 8-K\n SEC ACT: 1934 Act\n SEC FILE NUMBER: 001-03559\n FILM NUMBER: 02717802\n\n BUSINESS ADDRESS: \n STREET 1: 800 KING STREET\n STREET 2: PO BOX 231\n CITY: WILMINGTON\n STATE: DE\n ZIP: 19899\n BUSINESS PHONE: 6096454100\n\n MAIL ADDRESS: \n STREET 1: 800 KING STREET\n STREET 2: PO BOX 231\n CITY: WILMINGTON\n STATE: DE\n ZIP: 19899\n</SEC-HEADER>\n
Following this strategy, I wrote the code below to extract all items reported in 8-K filings since 2004. I didn't use regex for this task because the text portion of a filing is actually dirty: for instance, you'll need to remove all html tags and be careful about the \"non-breaking space\",
, etc. My experience is that using <SEC-HEADER>
for this task works best.
# Extract all items reported in 8-K filings since 2004.\nimport os\nimport gzip\nimport tqdm\nimport sqlite3\nimport concurrent.futures\nBASE_DIR = './data'\nFILE_TYPE = '8-K'\nDB = \"result.sqlite3\"\ndef walk_dirpath(cik, file_type):\n\"\"\" Yield paths of all files for a given cik and file type \"\"\"\nfor root, _, files in os.walk(os.path.join(BASE_DIR, cik, file_type)):\nfor filename in files:\nyield os.path.join(root, filename)\ndef regsearch(cik):\nmatches = []\nfor filepath in walk_dirpath(cik, FILE_TYPE):\ndate = os.path.split(filepath)[1].strip('.txt.gz')\nif int(date.split('-')[0]) < 2004:\ncontinue\nwith gzip.open(filepath, 'rb') as f:\ndata = f.readlines()\nls = [l for l in data if l.startswith(b'ITEM INFORMATION')]\nfor l in ls:\nitem = l.decode().replace('\\t','').replace('ITEM INFORMATION:', '')\nif len(item.strip()):\nmatches.append((cik, FILE_TYPE, date, item.strip()))\nreturn matches\nif __name__ == \"__main__\":\nconn = sqlite3.connect(DB)\nc = conn.cursor()\nc.execute('''CREATE TABLE IF NOT EXISTS files_all_items\n (cik TEXT, file_type TEXT, date DATE, item TEXT,\n PRIMARY KEY(cik, file_type, date, item));''')\nconn.commit()\n_, ciks, _ = next(os.walk(BASE_DIR))\nprogress = tqdm.tqdm(total=len(ciks))\nwith concurrent.futures.ProcessPoolExecutor(max_workers=16) as exe:\nfutures = [exe.submit(regsearch, cik) for cik in ciks]\nfor f in concurrent.futures.as_completed(futures):\nres = f.result()\nc.executemany(\n\"INSERT OR IGNORE INTO files_all_items \\\n (cik, file_type, date, item) VALUES (?,?,?,?)\", res)\nconn.commit()\nprogress.update()\nconn.close()\n
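For comparison, the regex-based alternative mentioned earlier could look like the sketch below. The pattern is illustrative only; as noted, the raw text first needs html tags and non-breaking spaces stripped out.

```python
import re

# Illustrative pattern for "Item X.XX" codes in 8-K text (case-insensitive).
ITEM_PAT = re.compile(r"Item\s+(\d+\.\d{2})", re.IGNORECASE)

def extract_item_codes(text):
    """Return the sorted list of distinct item codes mentioned in the text."""
    return sorted(set(ITEM_PAT.findall(text)))
```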
","tags":["Textual Analysis","SEC","Python","Code"]},{"location":"posts/textual-analysis-on-sec-filings/#32-find-all-8-k-filings-with-item-101-andor-item-203","title":"3.2 Find all 8-K filings with Item 1.01 and/or Item 2.03","text":"To get those filings that report either Item 1.01 (entry into a material definitive agreement) or Item 2.03 (creation of a direct financial obligation), I run the following SQL query:
-- SQLite\nCREATE TABLE `files_with_items_101_or_203` AS\nSELECT DISTINCT cik, file_type, date\nFROM `files_all_items`\nWHERE\ninstr(lower(item), \"creation of a direct financial obligation\") > 0 OR\ninstr(lower(item), \"entry into a material definitive agreement\") > 0\nORDER BY cik, file_type, date;\n
To get those with both items, use the following query:
-- SQLite\nCREATE TABLE `files_with_items_101_and_203` AS\nSELECT cik, file_type, date\nFROM `files_all_items`\nWHERE\ninstr(lower(item), \"creation of a direct financial obligation\") > 0 OR\ninstr(lower(item), \"entry into a material definitive agreement\") > 0\nGROUP BY cik, file_type, date\nHAVING count(*) > 1\nORDER BY cik, file_type, date;\n
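The same query can also be issued from Python. Below is a sketch, assuming the result.sqlite3 database and files_all_items table built in section 3.1 (the function name is my own):

```python
import sqlite3

def filings_with_both_items(db_path="result.sqlite3"):
    """Return (cik, file_type, date) rows of 8-Ks reporting both items."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("""
            SELECT cik, file_type, date
            FROM files_all_items
            WHERE instr(lower(item), 'creation of a direct financial obligation') > 0
               OR instr(lower(item), 'entry into a material definitive agreement') > 0
            GROUP BY cik, file_type, date
            HAVING count(*) > 1
            ORDER BY cik, file_type, date;
        """).fetchall()
    finally:
        conn.close()
    return rows
```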
","tags":["Textual Analysis","SEC","Python","Code"]},{"location":"posts/textual-analysis-on-sec-filings/#33-nini-smith-and-sufi-2009","title":"3.3 Nini, Smith and Sufi (2009)","text":"This example code finds the appearance of any of the 10 search words used in \"Creditor control rights and firm investment policy\" by Nini, Smith and Sufi (JFE 2009), which are used to identify the loan contracts attached to SEC filings.
import re\nimport os\nimport sys\nimport gzip\nimport tqdm\nimport sqlite3\nimport logging\nimport concurrent.futures\nlogging.basicConfig(stream=sys.stdout, level=logging.WARN)\nBASE_DIR = './data'\nFILE_TYPE = '10-Q'\nDB = \"result.sqlite3\"\n# Regex pattern used to remove html tags\ncleanr = re.compile(b'<.*?>|&([a-z0-9]+|#[0-9]{1,6}|#x[0-9a-f]{1,6});')\n# Regex pattern used to find the appearance of any of the 10 search words used\n# in \"Creditor control rights and firm investment policy\"\n# by Nini, Smith and Sufi (JFE 2009)\n# pat_10_words = r\"CREDIT FACILITY|REVOLVING CREDIT|(CREDIT|LOAN|(LOAN (AND|&) \\\n# SECURITY)|(FINANCING (AND|&) SECURITY)|CREDIT (AND|&) GUARANTEE) AGREEMENT\"\nNSS_10_words = ['credit facility',\n'revolving credit',\n'credit agreement',\n'loan agreement',\n'loan and security agreement',\n'loan & security agreement',\n'credit and guarantee agreement',\n'credit & guarantee agreement',\n'financing and security agreement',\n'financing & security agreement']\nNSS_10_words_str = '|'.join([word.upper() for word in NSS_10_words])\npat_10_words = re.compile(NSS_10_words_str.encode())\n# Regex pattern used in this search\npattern = pat_10_words\ndef walk_dirpath(cik, file_type):\n\"\"\" Yield paths of all files for a given cik and file type \"\"\"\nfor root, _, files in os.walk(os.path.join(BASE_DIR, cik, file_type)):\nfor filename in files:\nyield os.path.join(root, filename)\ndef regsearch(cik):\nmatches = []\nfor filepath in walk_dirpath(cik, FILE_TYPE):\ndate = os.path.split(filepath)[1].strip('.txt.gz')\ntry:\nwith gzip.open(filepath, 'rb') as f:\ndata = b' '.join(f.read().splitlines())\ndata = re.sub(cleanr, b'', data)\nmatch = pattern.search(data)\nif match:\nmatches.append((cik, FILE_TYPE, date))\nlogging.info(f'{filepath}, {match.group()}')\nexcept Exception as e:\nlogging.error(f'failed at {filepath}, {e}')\nreturn matches\nif __name__ == \"__main__\":\nconn = sqlite3.connect(DB)\nc = conn.cursor()\n# create a table to store the 
indices\nc.execute('''CREATE TABLE IF NOT EXISTS files_with_10_words\n (cik TEXT, file_type TEXT, date DATE,\n PRIMARY KEY(cik, file_type, date));''')\nconn.commit()\n_, ciks, _ = next(os.walk(BASE_DIR))\nprogress = tqdm.tqdm(total=len(ciks))\nwith concurrent.futures.ProcessPoolExecutor(max_workers=16) as exe:\nfutures = [exe.submit(regsearch, cik) for cik in ciks]\nfor f in concurrent.futures.as_completed(futures):\nmatches = f.result()\nc.executemany(\n\"INSERT OR IGNORE INTO files_with_10_words \\\n (cik, file_type, date) VALUES (?,?,?)\", matches)\nconn.commit()\nprogress.update()\nconn.close()\nlogging.info('complete')\n
The original file is at https://www.sec.gov/Archives/edgar/data/0000008192/0000079732-02-000036.txt\u00a0\u21a9
An uninitialized variable in C can hold anything (most of the time). I find that, in some cases, we can know the value of an uninitialized variable and thus maybe exploit it.
The example code below, compiled with gcc
without optimization, exits successfully. Very interesting!
#include <assert.h>\n#include <limits.h>\n\nvoid f(int n) {\n    // Declare and initialize `a` with the value of n.\n    // This pushes n onto the stack memory.\n    int a = n;\n    return;\n    // f() returns, but `a` leaves a garbage value n on the stack.\n}\n\nvoid g(int n) {\n    // Declare but do not initialize `x`, so\n    // `x` could be anything...?\n    int x;\n    assert(x == n); // Should fail here if `x` is not n\n}\n\nint main() {\n    for (int i = INT_MIN; i < INT_MAX; i++) {\n        f(i); // However, if we call f() and g() sequentially...\n        g(i); // ...the local variable `x` in g() will always be i,\n              // which is the garbage value left by f() on the stack.\n    }\n    // This program will end peacefully\n    return 0;\n}\n
We can also try to \"contaminate\" the stack by filling it with a value, e.g.,
#include <assert.h>\n#include <stdint.h>\n#include <stdio.h>\n#include <string.h>\n\nvoid f(uint8_t n) {\n    // Try to \"contaminate\" the stack with value n\n    uint8_t arr[BUFSIZ];\n    memset(arr, n, BUFSIZ * sizeof(uint8_t));\n}\n\nvoid g(uint8_t n) {\n    uint8_t x;\n    assert(x == n);\n    printf(\"uninitialized x is %d\\n\", x);\n    uint8_t y;\n    assert(y == n); // uninitialized y is also n\n}\n\nint main() {\n    for (uint8_t i = 0; i < UINT8_MAX; i++) {\n        f(i);\n        g(i);\n    }\n    // This program will end peacefully!\n    return 0;\n}\n
As a result, the uninitialized local variables x
and y
both have the same value of n
because f(n)
writes many copies of n
onto the stack.
Studying C is real fun!
","tags":["C","Code"]},{"location":"posts/use-sas-macros-on-wrds/","title":"Use SAS Macros on WRDS","text":"The Wharton Research Data Services (WRDS) provides quite a few SAS macros that can be used directly. This article explains how to use those handy macros when you remotely submit your code to the WRDS cloud. Lastly, it explains how to load and use third-party SAS macros from a URL.
","tags":["SAS","Code","WRDS"]},{"location":"posts/use-sas-macros-on-wrds/#prerequisite","title":"Prerequisite","text":"Before everything, just make sure that this autoexec.sas
is located in your home folder on the WRDS cloud.
* The library name definitions below are used by SAS;\n* Assign default libref for WRDS (Wharton Research Data Services);\n%include '/wrds/lib/utility/wrdslib.sas';\noptions sasautos=('/wrds/wrdsmacros/', SASAUTOS) MAUTOSOURCE;\n
This code runs automatically when you've connected to the WRDS cloud. The first line assigns the default library references for you to use, e.g. comp
for Compustat. The second line makes available the macros. A list of these handy macros is available at the WRDS documentation.
If you don't have this SAS code in the home folder, simply create one there or you can choose to include these two lines of code in your remotely submitted code.
","tags":["SAS","Code","WRDS"]},{"location":"posts/use-sas-macros-on-wrds/#simple-usage","title":"Simple usage","text":"Let's say we want to winsorize a dataset by using the macro provided by WRDS (full code). Below is an example of winsorizing Total Assets AT
of Compustat sample by fiscal year from 1980 to 2018.
%let wrds=wrds-cloud.wharton.upenn.edu 4016;\noptions comamid=TCP remote=WRDS;\nsignon username=_prompt_;\n\nrsubmit;\n\n/* Create a dataset in the work directory */\ndata work.funda(keep=gvkey fyear at);\n set comp.funda;\n if 1980 <= fyear <= 2018;\n /* Generic filter */\nif indfmt='INDL' and datafmt='STD' and popsrc='D' and consol='C';\nrun;\n/* Invoke the macro */\n/* The documentation is available at:\n https://wrds-www.wharton.upenn.edu/pages/support/research-wrds/macros/wrds-macros-winsorize/ */\n%WINSORIZE(INSET=funda,OUTSET=funda_w,SORTVAR=fyear,VARS=at,PERC1=1,TRIM=0);\n\n/* Before the winsorization */\nproc means data=work.funda; by fyear; var at; \noutput out=funda_before_win min= mean= max= / autoname; run;\n/* After the winsorization */\nproc means data=work.funda_w; by fyear; var at;\noutput out=funda_after_win min= mean= max= / autoname; run;\nproc print data=funda_before_win;\nproc print data=funda_after_win; run;\nendrsubmit;\nsignoff;\n
Invoking the macro is as simple as a single line (line 18 above):
%WINSORIZE(INSET=funda,OUTSET=funda_w,SORTVAR=fyear,VARS=at,PERC1=1,TRIM=0);\n
However, one thing to note about this particular winsorization macro by WRDS is that a variable named a
is used in line 57 and 59. So if the INSET
has a variable named a
as well, there\u2019ll be possible data integrity issue. Hence, I prefer to use another version described in my other post Winsorization in SAS.
I tend to collect and store all useful macros on my personal server, hence I don't need to worry about a loss of or changes to the macros. To use these macros, simply include them before invoking.
filename winsor url \"https://mingze-gao.com/utils/winsor.sas\";\n%include winsor;\n
Then, I can simply call winsor
as below.
%let winsVars = tac inv_at_l drev drevadj ppe roa;\n%winsor(dsetin=work.funda, dsetout=work.funda_wins, byvar=fyear, vars=&winsVars, type=winsor, pctl=1 99);\n
","tags":["SAS","Code","WRDS"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/","title":"What it takes to be a CEO? A fun survey of literature","text":"Taking up the position of CEO means more than pressure from the board and investors. You\u2019ll also face heavy scrutiny from academia. Whether or not a firm\u2019s hiring and compensation committees use them as a reference, here are some of the findings that you may want to be aware of.
","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#upon-birth","title":"Upon birth","text":"There are many things determined when you\u2019re born. It\u2019ll be naive to think that they matter less than anything else. A starter example is the Journal of Financial Economics paper \"Are CEOs born leaders? Lessons from traits of a million individuals\" by Adams, Keloharju and Knupfer (2018).
","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#1-birthday-month","title":"1. Birthday (month)","text":"Birth month affects school entry, which affects whether you are relatively older in the class. If you are born after the cutoff month, you'll have to wait for another year for entry. But this extra year buys you some more time to develop, which makes you more confident than the younger peers. This increased confidence is linked to adult labor market outcomes. Bai, Ma, Mullally and Solomon (2019 JFE) find that in mutual fund industry, it's associated with better stock selection and fund performance. These relatively older fund managers also appear more confident in photographs and display more confident behaviours: making larger bets, window dressing their holdings less, and so on.
","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#1-birthday-month","title":"1. Birthday (month)","text":"Birth month affects school entry, which affects whether you are relatively older in the class. If you are born after the cutoff month, you'll have to wait another year for entry. But this extra year buys you some more time to develop, which makes you more confident than your younger peers. This increased confidence is linked to adult labor market outcomes. Bai, Ma, Mullally and Solomon (2019 JFE) find that in the mutual fund industry, it's associated with better stock selection and fund performance. These relatively older fund managers also appear more confident in photographs and display more confident behaviours: making larger bets, window dressing their holdings less, and so on.
","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#2-birth-order","title":"2. Birth order","text":"The birth order also matters: negative associations between birth order and intelligence level have been found in numerous studies. Put more plainly, first-born kids tend to have higher IQ scores. Kristensen and Bjerkedal (2007 Science) show that this depends more on social rank in the family, where first-borns receive more-favorable family interaction and stimulation. However, birth order is still the most prominent observable factor. Custodio and Siegel (2018) published a working paper where they find CEOs are more likely to be the first-born child of their family, and the results hold for both family and non-family firms, though thankfully the advantage of being first-born seems to decay over time.
","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#3-gender","title":"3. Gender","text":"Studies on CEO gender difference and its relation with firm risk-taking, capital allocation, accounting conservatism, corporate social responsibility, and so on are plentiful. Generally, it is shown that male executives are overconfident relative to female executives (Huang and Kisgen, 2013 JFE), and we know that overconfidence is not necessarily a good thing. Firms run by female CEOs use lower leverage and have less volatile earnings (Faccio, Marchica and Mura, 2016 JCF), and there are a lot more differences in terms of firm operational, financial, and M&A performances. Tate and Yang (2015 JFE) show that female leaders cultivate more female-friendly cultures inside their firms. Moreover, (having) a female director may bring a firm more access to external finance. Goldman Sachs announced on 23 January 2020 that they won't take companies public anymore unless they have at least one \"diverse\" board member.
","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#4-hometown","title":"4. Hometown","text":"Everyone has some sort of hometown biases as well as hometown advantages. For example, Jiang, Qian and Yonker (2019 JFQA) find that CEOs are over twice as likely to acquire targets located in the states of their childhood homes than similar targets elsewhere. Smaller such deals are on average destroying shareholder value but bigger ones tend to be value enhancing. They conclude that CEOs may seek private benefits when acquiring small targets in their hometown but can also avoid poor deals due to hometown advantages. In a Chinese study, Kong, Pan, Tian and Zhang (2020 JCF) show that CEO's hometown connections increase access to trade credit and such an effect is more pronounced for non-SOEs and firms in poor regions. In another Chinese study on commercial banks, Bian, Ji and Zhang (2019 JBF) find that a higher degree of dialect similarity between the chairman and the CEO is associated with a higher ROA, ROE and a lower cost-to-income ratio, but not with bank risks, CEO pay or lower pay-performance sensitivity. They conclude that speaking a similar dialect with the chairman doesn't undermine monitoring and reduces agency costs.
","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#5-cultural-heritage","title":"5. Cultural heritage","text":"The place where you're born has even more profound implications through cultural heritage. Nguyen, Hagendorff and Eshraghi (2018 RFS) show that following shocks to industry competition, firms led by CEOs who are second- or third-generation immigrants are associated with a 6.2% higher profitability compared with the average firm. Their analysis attributes this effect to various cultural values that prevail in a CEO\u2019s ancestral country of origin. Through an epidemiological approach, Liu (2016 JFE) shows that the corruption culture of corporate insiders' countries of ancestry is associated with a higher likelihood of earnings management, accounting fraud, option backdating and opportunistic insider trading.
","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#1-education","title":"1. Education","text":"No doubt education matters for everyone including CEO. Custodio and Metzger (2014 JFE) find that financial expert CEOs tend to be hired by more mature firms. Firms with financial expert CEOs hold less cash, more debt and engage in more share repurchases. They are able to raise external funds even when credit conditions are tight and their investments are less sensitive to cash flows. On the other hand, CEOs with an engineering (or scientific) education display higher investment-cash flow sensitivity (Malmendier and Tate (2005 JFE)). Similar findings appear in banking sector as shown by King, Srivastav and Williams (2016 JCF) focusing on CEO's MBA quality. Moreover, education offers more than just knowledge and skills. Although Khanna, Kim and Lu (2015 JF) do not find evidence that connections and network ties developed during education are associated with corporate fraud, such CEO connectedness certainly affect information sharing, investments and so on. Wang and Yin (2018 JCF) find that CEOs tend to initiate more, larger and better M&A deals where target firms are headquarted in those states where they received their undergraduate and graduate degrees.
","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#1-education","title":"1. Education","text":"No doubt education matters for everyone, including CEOs. Custodio and Metzger (2014 JFE) find that financial expert CEOs tend to be hired by more mature firms. Firms with financial expert CEOs hold less cash, more debt and engage in more share repurchases. They are able to raise external funds even when credit conditions are tight and their investments are less sensitive to cash flows. On the other hand, CEOs with an engineering (or scientific) education display higher investment-cash flow sensitivity (Malmendier and Tate, 2005 JFE). Similar findings appear in the banking sector as shown by King, Srivastav and Williams (2016 JCF) focusing on CEO's MBA quality. Moreover, education offers more than just knowledge and skills. Although Khanna, Kim and Lu (2015 JF) do not find evidence that connections and network ties developed during education are associated with corporate fraud, such CEO connectedness certainly affects information sharing, investments and so on. Wang and Yin (2018 JCF) find that CEOs tend to initiate more, larger and better M&A deals where target firms are headquartered in those states where they received their undergraduate and graduate degrees.
","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#2-disaster-experience","title":"2. Disaster experience","text":"People are shaped by their experiences and disasters are a major one. Several Chinese studies have shown that CEOs who have experienced famine are more risk-averse and hold more cash. They conduct fewer takeovers, but the M&A deals tend to perform better when they do according to Zhang (2017 PBFJ). Such risk aversion can sometimes be good as Hu, Li and Luo (2019 PBFJ) find that firms governed by CEOs who experienced the Great Famine have higher market value during crises. But generally speaking this effect is mitigated by a higher-education background and is weaker in SOEs, as well as for CEOs who also experienced economic reform, which is shown to increase CEO's risk tolerance by Hao, Wang, Chou and Ko (2018 IRF). American CEOs, for sure, are no exception. A famous Journal of Finance paper \"what doesn't kill you will only make you more risk-loving\" by Bernile, Bhagwat and Rau (2016) concludes like its title. But more importantly, CEOs who experienced fatal disasters without extremely negative consequences lead firms more aggressively, whereas CEOs who witness the extreme downside of disasters behave more conservatively.
","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#3-academic-military-and-other-experiences","title":"3. Academic, military and other experiences","text":"Apart from previous industry experience, researchers have also examined the role of many other executive experiences. Shen, Lan, Xiong, Lv and Jian (2019 Economic Modelling) find that the top management team's academic experience promotes corporate innovations and attribute the effect to improved internal control and reduced information asymmetry. Benmelech and Frydman (2015 JFE) find that military service could make CEOs pursue lower corporate investment, and military CEOs are less likely to be involved in corporate fraudulent activity, and perform better during industry downturns.
","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#personality-traits","title":"Personality traits","text":"","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#1-masculinity","title":"1. Masculinity","text":"Masculinity is a long-studied factor in many fields of research and there are also many interesting papers specifically on male CEOs. Since CEO testosterone levels cannot be tested directly, a common proxy in the literature is the facial width-to-height ratio (fWHR). Jia, Van Lent and Zeng (2014 JAR) find that a higher fWHR of a male CEO, representing more masculine faces, is associated with more misreporting, predicts his firm's likelihood of being subject to SEC enforcement action and incidence of insider trading and option backdating. They also find that executive's facial masculinity is associated with the likelihood of being named as a perpetrator by the SEC. In a forthcoming European Financial Management paper by Kamiya, Kim and Park (2018), male CEOs' facial masculinity is found to be related to higher stock return volatility, higher financial leverage and more M&A activities. A paper at the 2018 Academy of Management Annual Meeting by Joshi, Misangyi, Rizzi and Neely (2018), however, finds that masculinity does not have a direct effect on the firm's operational performance. The researchers also find that masculinity worked to the detriment of CEOs in female-dominated industries; less masculine CEOs also performed poorly in highly male-dominated environments.
Sunder, Sunder and Zhang (2017 JFE) look at pilot CEOs who fly airplanes as a hobby and find that they are significantly associated with better corporate innovation outcomes. They conclude that sensation seeking combines risk taking with a desire to pursue novel experiencecs and has been associated with creativity. Davidson, Dey and Smith (2015 JFE) even hired private investigators to collect data on executives' legal infractions and ownership of real estate, boats, luxury vehicles and motocycles. They find no direct evidence of a relation between executives' frugality and the propensity to perpetrate fraud. But there will be a relatively loose control environment characterized by relatively high and increasing probabilities of other insiders perpetrating fraud and unintentional material reporting errors during unfrugal CEOs' reigns.
","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#3-creativity-and-innovation","title":"3. Creativity and innovation","text":"One in five U.S. high-technology firms are led by CEOs with hands-on innovation experience as inventors. Islam and Zein (2020 JFE) show that firms led by \u201cInventor CEOs\u201d are associated with higher quality innovation, especially when the CEO is a high-impact inventor. During an inventor CEO's tenure, firms file a greater number of patents and more valuable patents in technology classes where the CEO's hands-on experience lies. It is possible that such inventor CEOs are more capable of evaluating, selecting and executing innovative investment projects.
","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#family-marriage-and-fidelity","title":"Family, marriage and fidelity","text":"","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#1-newborns-and-loss-of-family-members","title":"1. Newborns and loss of family members","text":"\"Corporate executives managing some of the largest public companies in the U.S. are shaped by their daughters\". Cronqvist and Yu (2017 JFE) find that when a firm\u2019s CEO has a daughter, the corporate social responsibility rating (CSR) is about 9.1% higher, compared to a median firm. This finding perhaps reveals another plausibly exogenous determinant of CEO's styles. On the other hand, a loss of important family member poses a significant negative shock. In the 2020 AFA Annual Meeting, I encountered a paper by Liu, Shu, Sulaeman and Yeung (2019) who find that after deaths in the family, bereaved managers take significantly less risk. Firms managed by bereaved CEOs exhibit lower capital expenditures, fewer acquisitions, lower debt issuance and lower CEO ownership after the bereavement events.
","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#1-newborns-and-loss-of-family-members","title":"1. Newborns and loss of family members","text":"\"Corporate executives managing some of the largest public companies in the U.S. are shaped by their daughters\". Cronqvist and Yu (2017 JFE) find that when a firm\u2019s CEO has a daughter, the corporate social responsibility rating (CSR) is about 9.1% higher, compared to a median firm. This finding perhaps reveals another plausibly exogenous determinant of CEO style. On the other hand, the loss of an important family member poses a significant negative shock. In the 2020 AFA Annual Meeting, I encountered a paper by Liu, Shu, Sulaeman and Yeung (2019) who find that after deaths in the family, bereaved managers take significantly less risk. Firms managed by bereaved CEOs exhibit lower capital expenditures, fewer acquisitions, lower debt issuance and lower CEO ownership after the bereavement events.
One final interesting paper I want to mention to conclude this post is \"the geography of financial misconduct\" by Parsons, Sulaeman and Titman (2018 JF). In 2015, the website Ashley Madison, whose target clients are married people seeking an extramarital affair, was hacked and there was a leak of 40 million user account data of name, address and billing information. The researchers use the data to measure the intensity of spousal infidelity of a local area and find that financial misconducts are strongly related to unfaithfulness in the city.
","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#final-note","title":"Final note","text":"This short survey of CEO literature is not meant to be comprehensive, but to list a few very interesting papers that I find fun to read. I guess the message is that being a CEO means a lot more than managing the firm and stakeholders, and shareholders also need to open their minds and eyes.
A funny example. Next time hiring a CEO, other things equal, maybe you'll want a female immigrant who is the first-born kid and born in August, attended certain schools in certain areas, experienced natural disasters, served in the military before, has a daughter and no sports cars, knows how to fly airplanes, is loyal to her spouse from certain countries, and whose family members are all alive and well...
","tags":["Literature","CEO"]},{"location":"posts/winsorization-in-sas/","title":"Winsorization in SAS","text":"These are two versions of winsorization in SAS, of which I recommend the first one.
","tags":["SAS","Code","WRDS"]},{"location":"posts/winsorization-in-sas/#version-1-unknown-author","title":"Version 1 (Unknown Author)","text":"/*****************************************\nAuthor unknown - that is a pity because this macro is the best since sliced bread! \nTrim or winsorize macro\n* byvar = none for no byvar;\n* type = delete/winsor (delete will trim, winsor will winsorize;\n*dsetin = dataset to winsorize/trim;\n*dsetout = dataset to output with winsorized/trimmed values;\n*byvar = subsetting variables to winsorize/trim on;\nSample usage:\n%winsor(dsetin=work.myDsetIn, byvar=fyear, \n dsetout=work.myDsOut, vars=btm roa roe, type=winsor, pctl=1 99);\n****************************************/\n%macro winsor(dsetin=, dsetout=, byvar=none, vars=, type=winsor, pctl=1 99);\n\n%if &dsetout = %then %let dsetout = &dsetin;\n\n%let varL=;\n%let varH=;\n%let xn=1;\n\n%do %until ( %scan(&vars,&xn)= );\n %let token = %scan(&vars,&xn);\n %let varL = &varL &token.L;\n %let varH = &varH &token.H;\n %let xn=%EVAL(&xn + 1);\n%end;\n\n%let xn=%eval(&xn-1);\ndata xtemp;\n set &dsetin;\n run;\n%if &byvar = none %then %do;\n data xtemp;\n set xtemp;\n xbyvar = 1;\n run;\n%let byvar = xbyvar;\n\n%end;\nproc sort data = xtemp;\n by &byvar;\n run;\nproc univariate data = xtemp noprint;\n by &byvar;\n var &vars;\n output out = xtemp_pctl PCTLPTS = &pctl PCTLPRE = &vars PCTLNAME = L H;\n run;\ndata &dsetout;\n merge xtemp xtemp_pctl;\n by &byvar;\n array trimvars{&xn} &vars;\n array trimvarl{&xn} &varL;\n array trimvarh{&xn} &varH;\n\ndo xi = 1 to dim(trimvars);\n\n%if &type = winsor %then %do;\n if not missing(trimvars{xi}) then do;\n if (trimvars{xi} < trimvarl{xi}) then trimvars{xi} = trimvarl{xi};\n if (trimvars{xi} > trimvarh{xi}) then trimvars{xi} = trimvarh{xi};\n end;\n %end;\n\n%else %do;\n if not missing(trimvars{xi}) then do;\n if (trimvars{xi} < trimvarl{xi}) then delete;\n if (trimvars{xi} > trimvarh{xi}) then delete;\n end;\n %end;\n\nend;\n drop &varL &varH 
xbyvar xi;\n run;\n%mend winsor;\n
","tags":["SAS","Code","WRDS"]},{"location":"posts/winsorization-in-sas/#version-2-wrds","title":"Version 2 (WRDS)","text":"A potential problem with this WRDS macro is that a variable named a
is used in lines 57 and 59 (highlighted below). So if the INSET
has a variable named a
as well, there\u2019ll be a possible data integrity issue. A simple fix is to rename the merge flag (the in=a dataset option and the matching if a; statement) to something less likely to collide.
WINSORIZE
macro /* ********************************************************************************* */\n/* ******************** W R D S R E S E A R C H M A C R O S ******************** */\n/* ********************************************************************************* */\n/* WRDS Macro: WINSORIZE */\n/* Summary : Winsorizes or Trims Outliers */\n/* Date : April 14, 2009 */\n/* Author : Rabih Moussawi, WRDS */\n/* Variables : - INSET and OUTSET are input and output datasets */\n/* - SORTVAR: sort variable used in ranking */\n/* - VARS: variables to trim and winsorize */\n/* - PERC1: trimming and winsorization percent, each tail (default=1%) */\n/* - TRIM: trimming=1/winsorization=0, default=0 */\n/* ********************************************************************************* */\n%MACRO WINSORIZE (INSET=,OUTSET=,SORTVAR=,VARS=,PERC1=1,TRIM=0);\n\n/* List of all variables */\n%let vars = %sysfunc(compbl(&vars));\n%let nvars = %nwords(&vars);\n\n/* Display Output */\n%put ### START.;\n/* Trimming / Winsorization Options */\n%if &trim=0 %then %put ### Winsorization; %else %put ### Trimming;\n%put ### Number of Variables: &nvars;\n%put ### List of Variables: &vars;\noptions nonotes;\n\n/* Ranking within &sortvar levels */\n%put ### Sorting... ;\nproc sort data=&inset; by &sortvar; run;\n/* 2-tail winsorization/trimming */\n%let perc2 = %eval(100-&perc1);\n\n%let var2 = %sysfunc(tranwrd(&vars,%str( ),%str(__ )))__;\n%let var_p1 = %sysfunc(tranwrd(&vars,%str( ),%str(__&perc1 )))__&perc1 ;\n%let var_p2 = %sysfunc(tranwrd(&vars,%str( ),%str(__&perc2 )))__&perc2 ;\n\n/* Calculate upper and lower percentiles */\nproc univariate data=&inset noprint;\nby &sortvar;\nvar &vars;\noutput out=_perc pctlpts=&perc1 &perc2 pctlpre=&var2;\nrun;\n%if &trim=1 %then\n%let condition = %str(if myvars(i)>=perct2(i) or myvars(i)<=perct1(i) then myvars(i)=. 
);\n%else %let condition = %str(myvars(i)=min(perct2(i),max(perct1(i),myvars(i))) );\n\n%if &trim=0 %then %put ### Winsorizing at &perc1.%... ;\n%else %put ### Trimming at &perc1.%... ;\n\n/* Save output with trimmed/winsorized variables */\ndata &outset;\nmerge &inset (in=a) _perc;\nby &sortvar;\nif a;\narray myvars {&nvars} &vars;\narray perct1 {&nvars} &var_p1;\narray perct2 {&nvars} &var_p2;\ndo i = 1 to &nvars;\n if not missing(myvars(i)) then\ndo;\n &condition;\n end;\nend;\ndrop i &var_p1 &var_p2;\nrun;\n/* House Cleaning */\nproc sql; drop table _perc; quit;\noptions notes;\n\n%put ### DONE . ; %put ;\n%MEND WINSORIZE;\n\n/* ********************************************************************************* */\n/* ************* Material Copyright Wharton Research Data Services *************** */\n/* ****************************** All Rights Reserved ****************************** */\n/* ********************************************************************************* */\n
","tags":["SAS","Code","WRDS"]},{"location":"posts/working-remotely-on-a-windows-machine-wsl-from-vscode-on-a-mac/","title":"Working Remotely on a Windows Machine from VSCode on a Mac","text":"Now I only need a MacBook (1.3 GHz dual-core i5) to do all my work anywhere, thanks to a powerful workstation provided by the university. Yet the workstation is based on Windows 10 and sitting behind the university VPN. I don't want to use Remote Desktop every time I need to do some coding, so I decided to make it so I can code remotely on the workstation but from the lovely VSCode on my little MacBook.
"},{"location":"posts/working-remotely-on-a-windows-machine-wsl-from-vscode-on-a-mac/#1-set-up-the-windows-10-host-machine","title":"1. Set up the Windows 10 host machine","text":"The first step is to enable remote SSH login on the Windows machine. It is now super easy to do with the Windows Subsystem for Linux (WSL). I use the Ubuntu 18.04 LTS distro but other Linux distros should work just fine. This will be the remote environment that I work in. Then I follow the instruction in SSH on Windows Subsystem for Linux (WSL). The post is in great detail with step-by-step guidance. So I won't repeat it again.
"},{"location":"posts/working-remotely-on-a-windows-machine-wsl-from-vscode-on-a-mac/#2-set-up-the-vscode-on-mac","title":"2. Set up the VSCode on Mac","text":"The second step is to install the Remote-SSH extension on VSCode. Then simply ssh into the Ubuntu environment on Windows 10 host machine using the username and password created for the Ubuntu distro. In my case is ssh
myusername
@asgard.econ.usyd.edu.au
. A password prompt will of course kindly show up.
The annoying thing is that each time the window reloads or I start VSCode, I need to manually type in my lengthy password. A better way is to use an SSH key instead.
To do so, open up the Terminal on the Mac and run:
ssh-keygen\n
A public-private key pair will be generated as ~/.ssh/id_rsa.pub
and ~/.ssh/id_rsa
. Then we need to tell the host machine that this key can be used to identify myself, so I can skip entering the password next time:
ssh-copy-id myusername@asgard.econ.usyd.edu.au\n
It will ask for the password on the host machine to confirm I am who I am. But after this, starting VSCode will never ask my password again. What a relief!
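To avoid typing the full address every time, an entry in the Mac's ~/.ssh/config can bundle the host name, user and key in one alias (a sketch; the alias name workstation is my own, and the Port line applies only if the WSL sshd listens on a non-default port):

```
# ~/.ssh/config on the Mac
Host workstation
    HostName asgard.econ.usyd.edu.au
    User myusername
    IdentityFile ~/.ssh/id_rsa
    # Port 2222   # only if WSL's sshd uses a non-default port
```

With this in place, ssh workstation works in the Terminal, and workstation also appears as a host in VSCode's Remote-SSH host list.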
"},{"location":"posts/working-remotely-on-a-windows-machine-wsl-from-vscode-on-a-mac/#lastly","title":"Lastly...","text":"Because the host machine is inside the university network, I need to first connect to the university VPN, otherwise the host address asgard.econ.usyd.edu.au
will not resolve. Still, it's really great that I can code and run my programs remotely on the powerful 8-core 16-thread machine without the heat and noise, which turns out to be really important in the Australian summer...
Question
Given a centrifuge with \\(n\\) holes, can we balance it with \\(k\\) (\\(1\\le k \\le n\\)) identical test tubes?
This is a simple yet interesting problem, very well illustrated by Numberphile and discussed on Matt Baker's blog.
The now-proven solution is:
Note
You can balance \\(k\\) identical test tubes, \\(1\\le k\\le n\\), in an \\(n\\)-hole centrifuge if and only if both \\(k\\) and \\(n-k\\) can be expressed as a sum of prime divisors of \\(n\\).
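Before turning to the methods below, the criterion can be sanity-checked by brute force (this check is my own addition, not one of the methods discussed next): place \(k\) unit vectors at the hole positions on the unit circle and test whether some placement sums to the zero vector. It is feasible only for small \(n\), since it enumerates all \(\binom{n}{k}\) placements.

```python
import cmath
import itertools

def balanced_brute_force(n: int, k: int, tol: float = 1e-9) -> bool:
    """Brute force: does some placement of k tubes in n holes sum to the zero vector?"""
    holes = [cmath.exp(2j * cmath.pi * h / n) for h in range(n)]
    return any(abs(sum(c)) < tol for c in itertools.combinations(holes, k))

# A single tube can never balance; 5 = 2 + 3 tubes balance a 12-hole
# centrifuge (an opposite pair plus an equilateral triangle).
print(balanced_brute_force(6, 1))   # False
print(balanced_brute_force(12, 5))  # True
```

The results agree with the criterion: 1 is not a sum of prime divisors of 6, while both 5 = 2 + 3 and 12 - 5 = 7 = 2 + 2 + 3 are sums of prime divisors of 12.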
Examples: 18-hole centrifuge; 20-hole centrifuge. Below is my attempt to programmatically answer the centrifuge problem.
"},{"location":"posts/centrifuge-problem/#method-1-naive-dfs","title":"Method 1: Na\u00efve DFS","text":"The very first method literally follows the solution. For a given \\((n,k)\\) pair, check if \\(k\\) and \\(n-k\\) can be written as a linear combination of the prime divisors of \\(n\\) (with non-negative coefficients).
def is_linear_combination(x: int, prime_numbers: list) -> bool:\n\"\"\"Check if `x` can be written as a linear combination of prime numbers, i.e.,\n x = b1*p1 + b2*p2 + b3*p3 + ... + bn*pn\n where pi represents a prime number in `prime_numbers`, bi is a non-negative integer.\n \"\"\"\n# very naive and not optimized\nfor n in prime_numbers:\n# n divides x\nif x % n == 0:\nreturn True\n# n does not divide x, check if the difference between x and multiples of n can be\n# a linear combination of other remaining prime numbers\nfor i in range(x//n + 1):\nif is_linear_combination(x - i*n, [p for p in prime_numbers if p!=n]):\nreturn True\nreturn False \ndef centrifuge_naive(n: int, k: int) -> bool:\n\"\"\"Check if a `n`-hole centrifuge can be balanced with `k` identical test tubes.\n True if both `k` and `n-k` can be written as a linear combination of the prime divisors of `n`.\n \"\"\"\nprime_divisors = get_prime_divisors(n) # simple cached function, skipped\nreturn is_linear_combination(k, prime_divisors) and is_linear_combination(n-k, prime_divisors)\n
"},{"location":"posts/centrifuge-problem/#some-optimizations","title":"Some Optimizations","text":"The above method works just fine, but very slow if we want to compute the total number of solutions, instead of just checking whether a particular \\(k\\) works.
There can be a few optimizations, for example, we can compute only the lower half of \\(k\\)s:
from functools import lru_cache\n@lru_cache(maxsize=None)\ndef centrifuge_naive(n: int, k: int) -> bool:\nprime_divisors = get_prime_divisors(n) # cached\nif k > n//2:\nreturn centrifuge_naive(n, n-k)\nreturn is_linear_combination(k, prime_divisors) and is_linear_combination(n-k, prime_divisors)\n
Further, if \(n\) is a (large) prime number itself, we know that no \(1\le k\lt n\) will work. Similarly, if \(n\) is a power of a prime number, we can bypass many values of \(k\) too.
@lru_cache(maxsize=None)\ndef centrifuge_naive(n, k):\nprime_divisors = get_prime_divisors(n)\n# ...\n# special case when n is power of prime\nif len(prime_divisors) == 1:\np = prime_divisors[0]\nreturn (k % p == 0) and ((n - k) % p == 0)\n# ...\n
At a certain point, we will realize that it would be faster to simply compute all possible \(k\)s instead of checking one by one whether a certain \(k\) can balance the centrifuge. This leads us to the second approach, which I call \"bootstrap\".
"},{"location":"posts/centrifuge-problem/#method-2-bootstrap","title":"Method 2: Bootstrap","text":"The bootstrap method is a variant of DFS, which essentially generates all possible \\(k\\) for a given \\(n\\) by exhausting the values from linear combinations of \\(n\\)'s prime divisors. The generated values should be between 2 and \\(n\\). Then we can tell if \\(k'\\) can balance the \\(n\\)-hole centrifuge by checking whether \\(k'\\) and \\(n-k'\\) are in the generated values.
def bootstrap(x, n, numbers, result):\n\"\"\"Compute all linear combinations of the given numbers smaller than n\"\"\"\nfor p in numbers:\nif p+x > n:\nbreak\nfor i in range((n-x) // p):\np_ = x + p * i # p_ <= n\nif not result[p_]:\n# x + p*i has not been tested, and is a linear combination of given numbers \nresult[p_] = True\n# check whether we can add multiples of remaining numbers\nbootstrap(p_, n, [n2 for n2 in numbers if n2 != p], result)\ndef centrifuge_bootstrap(n: int, k: int) -> bool:\nprime_divisors = get_prime_divisors(n) # cached, `prime_divisors` is sorted\n# result[k] represents whether k is valid, k=0...n\nresult = [True] + [False] * (n-1) + [True]\nbootstrap(0, n, prime_divisors, result) # TODO: bootstrap only once for a given `n`\nreturn result[k] and result[n-k]\n
This method invests some time in pre-computing all possible linear combinations of the prime divisors of \(n\). If we are only interested in a particular \((n,k)\) pair, we can break out early once we have computed result[k]
and result[n-k]
in bootstrap()
.
The last method uses dynamic programming. We can use \\(f[k]\\)=True
to represent that \(k\) is a linear combination of \(n\)'s prime divisors. A value \(i\) is either itself a prime divisor of \(n\) (and thus a linear combination of the prime divisors), or the sum of one of \(n\)'s prime divisors \(p\) and \((i-p)\). In the latter case, if \((i-p)\) is a linear combination of \(n\)'s prime divisors, so is \(p+(i-p)=i\).
Hint
If \\((i-p)\\) is a linear combination of \\(n\\)'s prime divisors, i.e., \\(i-p=a_1p_1+a_2p_2+...+a_np_n\\), where \\(\\{p_i\\}\\) are the prime divisors of \\(n\\) and \\(\\{a_i\\}\\) are non-negative integers, then \\(i-p+p\\) is definitely a linear combination too: \\(p\\)'s coefficient becomes \\(a+1\\ge0\\).
Hence, for each prime divisor \(p\) of \(n\) and each \(p\le i\le n\), \(f[i] = f[i] \lor f[i-p]\).
The boundary condition is \\(f[0]\\) = True
, i.e., an empty centrifuge is balanced.
The whole function is extremely short:
def centrifuge_dp(n: int, k: int) -> bool:\nprime_divisors = get_prime_divisors(n) # cached, `prime_divisors` is sorted\nf = [True] + [False] * n\nfor p in prime_divisors: # TODO: DP only once for a given `n`\nfor i in range(p, n+1):\nf[i] = f[i] or f[i-p]\nreturn f[k] and f[n-k]\n
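For instance, running the DP for \(n=12\) recovers the well-known fact that every \(k\) except 1 and 11 balances a 12-hole centrifuge. The sketch below is self-contained: it includes a minimal, uncached stand-in for the get_prime_divisors helper whose implementation is omitted above.

```python
def get_prime_divisors(n: int) -> list:
    """Minimal stand-in for the cached helper: distinct prime divisors of n."""
    primes, p = [], 2
    while p * p <= n:
        if n % p == 0:
            primes.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        primes.append(n)
    return primes

def centrifuge_dp(n: int, k: int) -> bool:
    prime_divisors = get_prime_divisors(n)
    # f[i] is True iff i is a sum of prime divisors of n (f[0]: empty sum)
    f = [True] + [False] * n
    for p in prime_divisors:
        for i in range(p, n + 1):
            f[i] = f[i] or f[i - p]
    return f[k] and f[n - k]

print([k for k in range(1, 13) if centrifuge_dp(12, k)])
# [2, 3, 4, 5, 6, 7, 8, 9, 10, 12]
```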
"},{"location":"posts/centrifuge-problem/#performance-comparison","title":"Performance Comparison","text":"Obviously, the Method 2 and 3 are much faster than the na\u00efve Method 1. Method 3 does not even use recursion and is the fastest.
Note
A note there is that if we are to check all \\(1\\le k\\le n\\), e.g., [i for i in range(1, n+1) if centrifuge(n,i)]
, we need to make some adjustments to the functions above so as to bootstrap or perform DP only once for each \(n\). This is trivial.
Below are some plots of balanced centrifuges. Note that for a particular value of \\(k\\), there can be more than one way to balance the centrifuge. Here, I illustrate only one.
6-hole 10-hole 12-hole 18-hole 20-hole 24-hole 33-hole plot_centrifuge(6, \"6-hole-centrifuge.svg\")\n
plot_centrifuge(10, \"10-hole-centrifuge.svg\")\n
plot_centrifuge(12, \"12-hole-centrifuge.svg\")\n
plot_centrifuge(18, \"18-hole-centrifuge.svg\")\n
plot_centrifuge(20, \"20-hole-centrifuge.svg\")\n
plot_centrifuge(24, \"24-hole-centrifuge.svg\")\n
plot_centrifuge(33, \"33-hole-centrifuge.svg\")\n
"},{"location":"posts/centrifuge-problem/#python-code","title":"Python code","text":"The code to generate the plots above:
from functools import lru_cache\nimport numpy as np\nimport matplotlib.pyplot as plt\n@lru_cache(maxsize=None)\ndef prime_divisors(n):\n\"\"\"Return list of n's prime divisors\"\"\"\nprimes = []\np = 2\nwhile p**2 <= n:\nif n % p == 0:\nprimes.append(p)\nn //= p\nelse:\np += 1 if p % 2 == 0 else 2\nif n > 1:\nprimes.append(n)\nreturn primes\ndef centrifuge(n):\n\"\"\"Return a list of which the k-th element represents if k tubes can balance the n-hole centrifuge\"\"\"\nF = [True] + [False] * n\nfor p in prime_divisors(n):\nfor i in range(p, n + 1):\nF[i] = F[i] or F[i - p]\nreturn [F[k] and F[n - k] for k in range(n + 1)]\ndef factorize(k: int, nums: list) -> list:\n\"\"\"Given k, return the list of numbers from the given numbers which add up to k.\n The given numbers are guaranteed to be able to generate k via a linear combination.\n Examples:\n >>> factorize(5, [2, 3])\n [2, 3]\n >>> factorize(6, [2, 3])\n [2, 2, 2]\n >>> factorize(7, [2, 3])\n [2, 2, 3]\n \"\"\"\ndef _factorize(k, nums, res: list):\nfor p in nums:\nif k % p == 0:\nres.extend([p] * (k // p))\nreturn True\nelse:\nfor i in range(1, k // p):\nif _factorize(k - p * i, [n for n in nums if n != p], res):\nres.extend([p] * i)\nreturn True\nreturn False\nres = []\n_factorize(k, nums, res)\nreturn res\n@lru_cache(maxsize=None)\ndef centrifuge_k(n, k):\n\"\"\"Given (n, k) and that k balances a n-hole centrifuge, find the positions of k tubes\"\"\"\nif n == k:\nreturn [True] * n\nfactors = factorize(k, prime_divisors(n))\npos = [False] * n\ndef c(factors: list, pos: list) -> bool:\nif sum(pos) == k:\nreturn True\nif not factors:\nreturn False\np = factors.pop(0)\npos_wanted = [n // p * i for i in range(p)]\nfor offset in range(n):\npos_rotated = [(i + offset) % n for i in pos_wanted]\n# the intended positions of the p tubes are all available\nif not any(pos[i] for i in pos_rotated):\n# claim the positions\nfor i in pos_rotated:\npos[i] = True\nif not c(factors, pos):\n# unclaim the positions\nfor i in 
pos_rotated:\npos[i] = False\nelse:\nreturn True\n# all rotated positions failed, add p back to factors to place later\nfactors.append(p)\nc(factors, pos)\nreturn pos\ndef plot_centrifuge(n, figname=\"centrifuge.svg\"):\nncols = max(int(n**0.5), 1) # minimum 1 column\nnrows = n // ncols if n % ncols == 0 else n // ncols + 1\nheight = 3 if nrows == ncols else 2\nwidth = 2\nfig, axes = plt.subplots(nrows, ncols, figsize=(height * nrows, width * ncols))\nz = np.exp(2 * np.pi * 1j / n)\ntheta = np.linspace(0, 2 * np.pi, 20)\nradius = 1 / (ncols + nrows)\na = radius * np.cos(theta)\nb = radius * np.sin(theta)\ncent = centrifuge(n)\nfor nr in range(nrows):\nfor nc in range(ncols):\nk = nr * ncols + nc + 1\naxis = axes[nr, nc] if ncols > 1 else axes[nr]\nif k > n:\naxis.axis(\"off\")\ncontinue\n# draw the n-holes\nfor i in [z**i for i in range(n)]:\naxis.plot(a + i.real, b + i.imag, color=\"b\" if cent[k] else \"gray\")\n# draw the k tubes\nif cent[k]:\nif k > n // 2:\npos = [not b for b in centrifuge_k(n, n - k)]\nelse:\npos = centrifuge_k(n, k)\nfor i, ok in enumerate(pos):\ni = z**i\nif ok:\naxis.fill(a + i.real, b + i.imag, color=\"r\")\naxis.set_aspect(1)\naxis.set(xticklabels=[], yticklabels=[])\naxis.set(xlabel=None)\naxis.set_ylabel(f\"k={k}\", rotation=0, labelpad=10)\naxis.tick_params(bottom=False, left=False)\nfig.suptitle(f\"$k$ Test Tubes to Balance a {n}-Hole Centrifuge\")\nfig.text(0.1, 0.05, \"Red dot represents the position of test tubes.\")\nplt.savefig(figname)\nplt.close(fig)\nif __name__ == \"__main__\":\nfor n in range(6, 51):\nprint(f\"Balancing {n}-hole centrifuge...\")\nplot_centrifuge(n, f\"{n}-hole-centrifuge.png\")\n
"},{"location":"posts/centrifuge-problem/#download-plots-of-balanced-centrifuges","title":"Download plots of balanced centrifuges","text":"Success
You can download the Python code and all plots of balanced \\(n\\)-hole centrifuge, \\(6\\le n\\le50\\), which I calculated using the code above.
"},{"location":"tags/","title":"Tags","text":""},{"location":"tags/#8-k","title":"8-K","text":"Mingze Gao, aka Adrian, is a Postdoctoral Research Fellow at the University of Sydney Business School. With a focus on banking and corporate finance, his work has been published at journals including Journal of Banking & Finance, Finance Research Letters, and/or presented at conferences such as WFA, EFA (scheduled), FMA, FIRN, AFBC, etc.
Mingze has a strong background in programming and received First Prize in the 2010 National Olympiad in Informatics in Provinces (NOIP). His PhD thesis involves large-scale textual analysis and novel machine learning applications, leading to a $500,000 grant from the Australian Research Council (ARC) Discovery Project financing his postdoctoral fellowship. He also has a Grad.Cert. in computing from UNSW with High Distinction, covering, e.g., databases, cryptography and distributed ledger technology. Some of his open-source works include, for example, frds, specurve, mktstructure and edgar-analyzer.
CV, Google Scholar, Faculty Profile and SSRN Profile.
"},{"location":"#education","title":"Education","text":"Made in 1994. Married in 2021. Husband to Sherry. Father of four cats.
I came to Sydney in 2013 and have since been with the University of Sydney. Starting with a commerce degree majoring in econometrics and finance, I very much enjoyed the study and life here and successfully completed my research degrees in finance afterwards. A fan of computer science too, I've completed a degree in computing at the University of New South Wales during my postdoctoral research fellowship.
My PhD work is summarised by three papers. The first theoretically extends the principal-agent model and empirically shows that a firm's accumulated knowledge substitutes for costly executive performance incentives. The second involves textual analysis on millions of firms' 8-K filings and documents a positive effect of corporate real estate holdings on M&A performance. The third applies machine learning to high-dimensional bank loan data and proposes an effective early-warning predictor for bank risks, which forms the backbone of a successful $500,000 grant from the Australian Research Council (ARC) Discovery Project financing my postdoctoral fellowship.
I'm blessed to have my wonderful supervisors over the past many years, Henry Leung, Buhui Qiu, Joakim Westerholm and Eliza Wu,1 and the invaluable mentoring from Iftekhar Hasan. I strive to produce more high-quality research outputs myself and together with my awesome coauthors. To the best I can, I also like to provide as much as possible to all researchers so that we can thrive together, which motivates me writing the research notes, apps and more. My favourite quote:
Work until you no longer have to introduce yourself.
Apart from work, I workout regularly. I cycle to/from work and train at home. I also do running training and aim to complete a marathon one day soon.
Alphabetically ordered by last name.\u00a0\u21a9
\ud83d\udc68\u200d\ud83d\udcbb The apps, programs and other tools I developed.
"},{"location":"apps/#research","title":"Research","text":"frds
- a Python framework to compute a collection of academic measures used in the finance literature.specurve
- a Stata command used to perform Specification Curve Analysis and generate the Specification Curve plot - listed in Harvard Business School Research Computing Services blog.edgar-analyzer
- a Python command-line tool to download SEC filings and perform textual analyses.mktstructure
- a Python command-line tool to download Refinitiv Tick History data and compute some market microstructure measures.phd.io
- a website for PhDs to build their personal web pages.PaperManager
- a simple tag-based paper manager with a fast PDF viewer in pure Python.LeGao
- a web application used to make LEGO mosaics.specurve
.edgar-analyzer
.Organization Capital and Executive Performance Incentives, with Henry Leung and Buhui Qiu, Journal of Banking & Finance, 2021.
A firm's organization capital has a significant substitution effect on its executive pay-for-performance sensitivity. SSRN
Consumer Behaviour and Credit Supply: Evidence from an Australian FinTech Lender, with Henry Leung, Linhui Liu and Buhui Qiu, Finance Research Letters, 2023.
Consumer behaviour affects FinTech lending decisions. SSRN
OtherCloser than ever: Growing business-level connections between Australia and Europe, with Boris Choy, Teresa Davis, Hanyun Ding, Massimo Garbuio, Catherine Hardy, Henry Leung, Thanh Son Luong, Greg Patmore, Sandra Peter, Buhui Qiu, Kai Riemer, John Shields, Catherine Sutton-Brady, Carlos Vazquez-Hernandez, and Eliza Wu, European Management Journal, 2023.
"},{"location":"research/#working-papers","title":"\ud83d\udcdd Working Papers","text":""},{"location":"research/#in-circulation","title":"In circulation","text":"\"Lone (Loan) Wolf Pack Risk\", with Iftekhar Hasan, Buhui Qiu and Eliza Wu.
\"Anomalous Lending and Bank Risk\", with Iftekhar Hasan, Buhui Qiu, Eliza Wu and Yan Yu.
\"Borrower Technology Similarity and Bank Loan Contracting\", with Yunying Huang, Steven Ongena and Eliza Wu.
\"Corporate Real Estate Holdings and M&As\", with Thanh Son Luong and Buhui Qiu.
\"Catering to Environmental Premium in Green Venture Financing: Evidence from a Bert-Based Deep Learning Approach\", with Henry Leung, Tse-Chun Lin and Tracy Thi Vu.
"},{"location":"research/#in-progress","title":"In progress","text":"FINC50 is one half of your Finance 101. (1)
The objectives are
Note
This is a proof-of-concept and always a work-in-progress.
It could take a relatively long time for me to \"complete\".
"},{"location":"finc50/#course-notes","title":"Course notes","text":"An interactive chart and calculator of bond cashflows, present values and prices.
{ \"$schema\": \"https://vega.github.io/schema/vega/v5.json\", \"description\": \"A chart of bond's cashflows, present value and price, made by Mingze Gao\", \"width\": 700, \"height\": 300, \"title\": { \"text\": \"Cashflows, PV and Price of a $10,000 Bond\", \"fontSize\": 18, \"anchor\": \"middle\" }, \"data\": [ { \"name\": \"table\", \"transform\": [ { \"type\": \"sequence\", \"as\": \"year\", \"start\": 0, \"step\": 0.5, \"stop\": 31 }, { \"type\": \"formula\", \"as\": \"i\", \"expr\": \"(0.5*(couponFrequency=='semiannual')+(couponFrequency=='annual'))\" }, { \"type\": \"formula\", \"as\": \"i2\", \"expr\": \"(2*(couponFrequency=='semiannual')+(couponFrequency=='annual'))\" }, { \"type\": \"formula\", \"as\": \"cashflow\", \"expr\": \"10000*couponRate*(datum.i)*(datum.year>0)+10000*(datum.year==maturityInYears)\" }, { \"type\": \"formula\", \"as\": \"r\", \"expr\": \"couponFrequency=='annual'? discountRate : pow(1+discountRate,0.5)-1\" }, { \"type\": \"formula\", \"as\": \"pv\", \"expr\": \"datum.cashflow / pow(1+datum.r, datum.year)\" }, { \"type\": \"formula\", \"as\": \"price\", \"expr\": \"datum.r>0 ? (10000*couponRate*(datum.i)*(1-pow(1+datum.r,-maturityInYears*datum.i2))/datum.r+10000*pow(1+datum.r,-maturityInYears*datum.i2)) : 10000*(1+couponRate*(datum.i)*maturityInYears*datum.i2)\" }, { \"type\": \"filter\", \"expr\": \"datum.year<=maturityInYears\" }, { \"type\": \"filter\", \"expr\": \"couponFrequency=='annual'? 
(datum.year==round(datum.year)) : 1 \" } ] }, { \"name\": \"scaledata\", \"source\": \"table\", \"transform\": [ { \"type\": \"aggregate\", \"fields\": [\"cashflow\", \"price\"], \"ops\": [\"max\", \"max\"], \"as\": [\"maxCashflow\", \"mP\"] }, { \"type\": \"formula\", \"as\": \"maxV\", \"expr\": \"max(datum.maxCashflow, datum.mP*1.1)\" } ] } ], \"signals\": [ { \"name\": \"maturityInYears\", \"value\": 10, \"bind\": { \"input\": \"range\", \"min\": 1, \"max\": 30, \"step\": 1 } }, { \"name\": \"discountRate\", \"value\": 0.08, \"bind\": { \"input\": \"range\", \"min\": 0, \"max\": 0.2, \"step\": 0.0001 } }, { \"name\": \"couponRate\", \"value\": 0.05, \"bind\": { \"input\": \"range\", \"min\": 0, \"max\": 0.2, \"step\": 0.0001 } }, { \"name\": \"couponFrequency\", \"value\": \"annual\", \"bind\": { \"input\": \"radio\", \"options\": [\"annual\", \"semiannual\"] } } ], \"scales\": [ { \"name\": \"x\", \"type\": \"band\", \"domain\": { \"data\": \"table\", \"field\": \"year\", \"sort\": true }, \"range\": \"width\", \"padding\": 0.7 }, { \"name\": \"y\", \"type\": \"linear\", \"domain\": { \"data\": \"scaledata\", \"field\": \"maxV\" }, \"range\": \"height\" } ], \"axes\": [ { \"orient\": \"bottom\", \"scale\": \"x\", \"title\": \"Year\" }, { \"orient\": \"left\", \"scale\": \"y\", \"title\": \"Cash Flows, PV and Bond Price\" } ], \"marks\": [ { \"type\": \"rect\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"fill\": { \"value\": \"steelblue\" }, \"x\": { \"scale\": \"x\", \"field\": \"year\" }, \"width\": { \"scale\": \"x\", \"band\": 1 }, \"y\": { \"scale\": \"y\", \"field\": \"cashflow\" }, \"y2\": { \"scale\": \"y\", \"value\": 0 }, \"tooltip\": { \"signal\": \"{ 'Cashflow': format(datum.cashflow, '$,.2f') }\" } } } }, { \"type\": \"rect\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"fill\": { \"value\": \"#d6001c\" }, \"x\": { \"scale\": \"x\", \"field\": \"year\" }, \"width\": { \"scale\": \"x\", \"band\": 1 }, \"y\": { 
\"scale\": \"y\", \"field\": \"pv\" }, \"y2\": { \"scale\": \"y\", \"value\": 0 }, \"tooltip\": { \"signal\": \"{ 'PV': format(datum.pv, '$,.2f') }\" } } } }, { \"type\": \"rect\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"fill\": { \"value\": \"darkgray\" }, \"x\": { \"scale\": \"x\", \"value\": 0 }, \"width\": { \"scale\": \"x\", \"band\": 1 }, \"y\": { \"scale\": \"y\", \"field\": \"price\" }, \"y2\": { \"scale\": \"y\", \"value\": 0 }, \"tooltip\": { \"signal\": \"{ 'Bond Price': format(datum.price, '$,.2f') }\" } } } }, { \"type\": \"text\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", \"value\": 0 }, \"y\": { \"scale\": \"y\", \"field\": \"price\", \"offset\": -5 }, \"text\": { \"signal\": \"format(datum.price, '$,.2f')\" }, \"fontSize\": { \"value\": 12 }, \"align\": { \"value\": \"left\" }, \"baseline\": { \"value\": \"bottom\" }, \"fill\": { \"value\": \"black\" } } } }, { \"type\": \"text\", \"encode\": { \"enter\": { \"align\": { \"value\": \"right\" }, \"baseline\": { \"value\": \"bottom\" }, \"fill\": { \"value\": \"rgba(0, 0, 0, 0.2)\" }, \"fontSize\": { \"value\": 14 }, \"x\": { \"value\": 0, \"offset\": \"width*0.85\" }, \"y\": { \"value\": 0, \"offset\": \"height*1.2\" }, \"text\": { \"value\": \"Assume coupons paid in arrears and effective annual discount rate (conversion based on coupon frequency).\" } } } } ] }
"},{"location":"finc50/#bond-price-and-yield","title":"Bond price and yield","text":"An interactive chart of bond price and yield.
{ \"$schema\": \"https://vega.github.io/schema/vega/v5.json\", \"description\": \"A chart of bond's price and yield, made by Mingze Gao\", \"width\": 700, \"height\": 300, \"title\": { \"text\": \"Bond Price and Yield\", \"fontSize\": 18, \"anchor\": \"middle\" }, \"data\": [ { \"name\": \"table\", \"transform\": [ { \"type\": \"sequence\", \"as\": \"yield\", \"start\": 0.0, \"step\": 0.5, \"stop\": 20.5 }, { \"type\": \"formula\", \"as\": \"price\", \"expr\": \"datum.yield>0 ? (10000*couponRate*(1-pow(1+datum.yield/100,-maturityInYears))/(datum.yield/100)+10000*pow(1+datum.yield/100,-maturityInYears)) : 10000*(1+couponRate*maturityInYears)\" }, { \"type\": \"formula\", \"as\": \"price5\", \"expr\": \"datum.yield>0 ? (10000*0.05*(1-pow(1+datum.yield/100,-maturityInYears))/(datum.yield/100)+10000*pow(1+datum.yield/100,-maturityInYears)) : 10000*(1+0.05*maturityInYears)\" } ] }, { \"name\": \"scaledata\", \"source\": \"table\", \"transform\": [ { \"type\": \"formula\", \"as\": \"maxV\", \"expr\": \"max(datum.price, datum.price5*1.2)\" } ] } ], \"signals\": [ { \"name\": \"maturityInYears\", \"value\": 10, \"bind\": { \"input\": \"range\", \"min\": 1, \"max\": 30, \"step\": 1 } }, { \"name\": \"couponRate\", \"value\": 0.05, \"bind\": { \"input\": \"range\", \"min\": 0, \"max\": 0.1, \"step\": 0.0001 } } ], \"scales\": [ { \"name\": \"x\", \"type\": \"linear\", \"domain\": { \"data\": \"table\", \"field\": \"yield\", \"sort\": true }, \"range\": \"width\" }, { \"name\": \"y\", \"type\": \"linear\", \"domain\": { \"data\": \"scaledata\", \"field\": \"maxV\" }, \"range\": \"height\" } ], \"axes\": [ { \"orient\": \"bottom\", \"scale\": \"x\", \"title\": \"Yield (%)\", \"ticks\": false }, { \"orient\": \"left\", \"scale\": \"y\", \"title\": \"Bond Price\" } ], \"marks\": [ { \"type\": \"rule\", \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", \"value\": 0 }, \"y\": { \"scale\": \"y\", \"value\": 10000 }, \"x2\": { \"scale\": \"x\", \"value\": 5 }, \"y2\": { 
\"scale\": \"y\", \"value\": 10000 }, \"strokeWidth\": { \"value\": 1 }, \"strokeDash\": { \"value\": [8, 3] }, \"strokeCap\": { \"value\": \"round\" }, \"opacity\": { \"value\": 1 } } } }, { \"type\": \"rule\", \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", \"value\": 5 }, \"y\": { \"scale\": \"y\", \"value\": 0 }, \"x2\": { \"scale\": \"x\", \"value\": 5 }, \"y2\": { \"scale\": \"y\", \"value\": 10000 }, \"strokeWidth\": { \"value\": 1 }, \"strokeDash\": { \"value\": [8, 3] }, \"strokeCap\": { \"value\": \"round\" }, \"opacity\": { \"value\": 1 } } } }, { \"type\": \"line\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", \"field\": \"yield\" }, \"width\": { \"scale\": \"x\", \"band\": 1 }, \"y\": { \"scale\": \"y\", \"field\": \"price\" }, \"tooltip\": { \"signal\": \"{ 'Bond Price': format(datum.price, '$,.2f') }\" } } } }, { \"type\": \"line\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", \"field\": \"yield\" }, \"y\": { \"scale\": \"y\", \"field\": \"price5\" }, \"stroke\": { \"value\": \"#d6001c\" }, \"tooltip\": { \"signal\": \"{ 'Bond Price': format(datum.price5, '$,.2f') }\" } } } }, { \"type\": \"text\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", \"value\": 20 }, \"y\": { \"scale\": \"y\", \"field\": \"price\", \"offset\": -5 }, \"text\": { \"signal\": \"format(datum.price, '$,.0f')+'@'+format(datum.yield,'.1f')+'%'\" }, \"fontSize\": { \"value\": 12 }, \"align\": { \"value\": \"left\" }, \"baseline\": { \"value\": \"bottom\" }, \"fill\": { \"value\": \"black\" } } } }, { \"type\": \"text\", \"encode\": { \"enter\": { \"align\": { \"value\": \"right\" }, \"baseline\": { \"value\": \"bottom\" }, \"fill\": { \"value\": \"rgba(0, 0, 0, 0.2)\" }, \"fontSize\": { \"value\": 14 }, \"x\": { \"value\": 0, \"offset\": \"width\" }, \"y\": { \"value\": 0, \"offset\": \"height*1.2\" }, \"text\": { \"value\": \"Assume $10,000 
bond, annual coupons paid in arrears and effective annual discount rate.\" } } } } ] }
"},{"location":"finc50/#risk-and-return","title":"Risk and return","text":"A graph showing volatility and return of S&P500 constituents in 2022.(1) Try to pan, zoom, select and click.
import yfinance as yf\nimport pandas as pd\nimport numpy as np\nlink = \"https://en.wikipedia.org/wiki/List_of_S%26P_500_companies#S&P_500_component_stocks\"\ndf = pd.read_html(link, header=0)[0]\ndf = yf.download(tickers=df['Symbol'].tolist(), start=\"2022-01-01\", end=\"2022-12-31\", progress=False, rounding=True)\ndf = df[['Adj Close']]\ndf.columns = df.columns.droplevel(0)\nret = ((df.pct_change()+1).cumprod()-1).iloc[-1]\nstd = df.pct_change().std() * np.sqrt(252)\ndf = pd.DataFrame({'return': ret.values, \"std\": std.values, \"ticker\": ret.index}).round(3).dropna()\ndf.to_json(\"./spy_risk_return.json\", orient=\"records\") \n
{ \"$schema\": \"https://vega.github.io/schema/vega/v5.json\", \"title\": { \"text\": \"Return and Volatility of S&P500 Stocks in 2022\", \"fontSize\": 18, \"anchor\": \"middle\" }, \"description\": \"An interactive scatter plot example supporting pan and zoom.\", \"width\": 700, \"height\": 300, \"padding\": { \"top\": 30, \"left\": 40, \"bottom\": 20, \"right\": 10 }, \"autosize\": \"none\", \"config\": { \"axis\": { \"domain\": false, \"tickSize\": 1, \"tickColor\": \"#888\", \"labelFont\": \"Monaco, Courier New\" } }, \"signals\": [ { \"name\": \"margin\", \"value\": 20 }, { \"name\": \"hover\", \"on\": [ { \"events\": \"*:mouseover\", \"encode\": \"hover\" }, { \"events\": \"*:mouseout\", \"encode\": \"leave\" }, { \"events\": \"*:mousedown\", \"encode\": \"select\" }, { \"events\": \"*:mouseup\", \"encode\": \"release\" } ] }, { \"name\": \"xoffset\", \"update\": \"-(height + padding.bottom)\" }, { \"name\": \"yoffset\", \"update\": \"-(width + padding.left)\" }, { \"name\": \"xrange\", \"update\": \"[0, width]\" }, { \"name\": \"yrange\", \"update\": \"[height, 0]\" }, { \"name\": \"down\", \"value\": null, \"on\": [ { \"events\": \"touchend\", \"update\": \"null\" }, { \"events\": \"mousedown, touchstart\", \"update\": \"xy()\" } ] }, { \"name\": \"xcur\", \"value\": null, \"on\": [ { \"events\": \"mousedown, touchstart, touchend\", \"update\": \"slice(xdom)\" } ] }, { \"name\": \"ycur\", \"value\": null, \"on\": [ { \"events\": \"mousedown, touchstart, touchend\", \"update\": \"slice(ydom)\" } ] }, { \"name\": \"delta\", \"value\": [0, 0], \"on\": [ { \"events\": [ { \"source\": \"window\", \"type\": \"mousemove\", \"consume\": true, \"between\": [ { \"type\": \"mousedown\" }, { \"source\": \"window\", \"type\": \"mouseup\" } ] }, { \"type\": \"touchmove\", \"consume\": true, \"filter\": \"event.touches.length === 1\" } ], \"update\": \"down ? 
[down[0]-x(), y()-down[1]] : [0,0]\" } ] }, { \"name\": \"anchor\", \"value\": [0, 0], \"on\": [ { \"events\": \"wheel\", \"update\": \"[invert('xscale', x()), invert('yscale', y())]\" }, { \"events\": { \"type\": \"touchstart\", \"filter\": \"event.touches.length===2\" }, \"update\": \"[(xdom[0] + xdom[1]) / 2, (ydom[0] + ydom[1]) / 2]\" } ] }, { \"name\": \"zoom\", \"value\": 1, \"on\": [ { \"events\": \"wheel!\", \"force\": true, \"update\": \"pow(1.001, event.deltaY * pow(16, event.deltaMode))\" }, { \"events\": { \"signal\": \"dist2\" }, \"force\": true, \"update\": \"dist1 / dist2\" } ] }, { \"name\": \"dist1\", \"value\": 0, \"on\": [ { \"events\": { \"type\": \"touchstart\", \"filter\": \"event.touches.length===2\" }, \"update\": \"pinchDistance(event)\" }, { \"events\": { \"signal\": \"dist2\" }, \"update\": \"dist2\" } ] }, { \"name\": \"dist2\", \"value\": 0, \"on\": [ { \"events\": { \"type\": \"touchmove\", \"consume\": true, \"filter\": \"event.touches.length===2\" }, \"update\": \"pinchDistance(event)\" } ] }, { \"name\": \"xdom\", \"update\": \"slice(xext)\", \"on\": [ { \"events\": { \"signal\": \"delta\" }, \"update\": \"[xcur[0] + span(xcur) * delta[0] / width, xcur[1] + span(xcur) * delta[0] / width]\" }, { \"events\": { \"signal\": \"zoom\" }, \"update\": \"[anchor[0] + (xdom[0] - anchor[0]) * zoom, anchor[0] + (xdom[1] - anchor[0]) * zoom]\" } ] }, { \"name\": \"ydom\", \"update\": \"slice(yext)\", \"on\": [ { \"events\": { \"signal\": \"delta\" }, \"update\": \"[ycur[0] + span(ycur) * delta[1] / height, ycur[1] + span(ycur) * delta[1] / height]\" }, { \"events\": { \"signal\": \"zoom\" }, \"update\": \"[anchor[1] + (ydom[0] - anchor[1]) * zoom, anchor[1] + (ydom[1] - anchor[1]) * zoom]\" } ] }, { \"name\": \"size\", \"update\": \"clamp(20 / span(xdom), 1, 1000)\" } ], \"data\": [ { \"name\": \"points\", \"url\": \"./demo/spy_risk_return.json\", \"transform\": [ { \"type\": \"extent\", \"field\": \"std\", \"signal\": \"xext\" }, { \"type\": 
\"extent\", \"field\": \"return\", \"signal\": \"yext\" }, { \"type\": \"formula\", \"as\": \"url\", \"expr\": \"'https://www.google.com/search?q=ticker:'+datum.ticker\", \"initonly\": true }, { \"type\": \"formula\", \"as\": \"tip\", \"expr\": \"'Ticker:'+datum.ticker\", \"initonly\": true } ] } ], \"scales\": [ { \"name\": \"xscale\", \"zero\": false, \"domain\": { \"signal\": \"xdom\" }, \"range\": { \"signal\": \"xrange\" } }, { \"name\": \"yscale\", \"zero\": false, \"domain\": { \"signal\": \"ydom\" }, \"range\": { \"signal\": \"yrange\" } } ], \"axes\": [ { \"scale\": \"xscale\", \"orient\": \"top\", \"offset\": { \"signal\": \"xoffset\" }, \"title\": \"Volatility\", \"titlePadding\": 15 }, { \"scale\": \"yscale\", \"orient\": \"right\", \"offset\": { \"signal\": \"yoffset\" }, \"title\": \"Return\", \"titleAngle\": -90, \"titlePadding\": 20 } ], \"marks\": [ { \"type\": \"symbol\", \"from\": { \"data\": \"points\" }, \"clip\": true, \"encode\": { \"enter\": { \"fillOpacity\": { \"value\": 0.6 }, \"fill\": { \"value\": \"#a6192e\" } }, \"update\": { \"x\": { \"scale\": \"xscale\", \"field\": \"std\" }, \"y\": { \"scale\": \"yscale\", \"field\": \"return\" }, \"size\": { \"signal\": \"size\" } }, \"hover\": { \"fill\": { \"value\": \"firebrick\" }, \"tooltip\": { \"field\": \"tip\", \"type\": \"nominal\" }, \"size\": { \"signal\": \"size\", \"mult\": 5 } }, \"leave\": { \"fill\": { \"value\": \"#a6192e\" } }, \"select\": { \"size\": { \"signal\": \"size\", \"mult\": 5 }, \"href\": { \"field\": \"url\", \"type\": \"nominal\" } }, \"release\": { \"size\": { \"signal\": \"size\" } } } } ] }
"},{"location":"finc50/fixed-income/","title":"Fixed Income Securities","text":"In the last post we introduce features of a bond. Now, let's look at how to price a plain vanilla bond and examine the relation between bond price and yield.
"},{"location":"finc50/fixed-income/bond-prices-and-yields/#price-of-a-bond","title":"Price of a bond","text":"First, what's the fair price of a bond?
For the investor, a bond represents a series of cashflows to receive in the future. So its price is naturally the total present value of all cashflows from the bond, including coupon payments (if any) and the repayment of principal (bond face value).
Therefore, the following equation holds true universally at all times \\(t\\):
\\[ \\text{Bond Price}_{t} = \\text{PV}_t(\\text{future coupons}) + \\text{PV}_t(\\text{face value}) \\]A bond's price at time \\(t\\) is the present value as at time \\(t\\) of all coupons to receive in the future, plus the present value as at time \\(t\\) of the bond face value. Personally, I'd call this fundamental.
Before we move on
Remember that all the complications we will see later are just results of uncertainties about the PVs, which ultimately are determined by cashflows and discount rates. Whenever you're lost, pause and think about how they are affected, and then reason about how the bond price may be affected.
I like examples. Suppose we are to issue a 10-year bond with a $10,000 face value that pays a 5% annual coupon in arrears(1), at an 8% discount rate(2). What would be the price today, \\(t=0\\), at which we can sell the bond to investors?
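The answer can be checked with a few lines of Python. This is a minimal sketch using the figures of the example above; the variable names are my own, not from the post.

```python
# A minimal sketch pricing the example above: a 10-year, $10,000 bond
# paying a 5% annual coupon in arrears, discounted at 8% per year.
face = 10_000
coupon_rate = 0.05
r = 0.08
n = 10

coupon = face * coupon_rate  # $500 each year
# Price = sum of PVs of all coupons + PV of the face value
price = sum(coupon / (1 + r) ** t for t in range(1, n + 1)) + face / (1 + r) ** n
print(round(price, 2))  # prints 7986.98
```

The bond sells below par because its 5% coupon is lower than the 8% discount rate.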
The chart below shows the result. Specifically, the blue bars indicate the bond's cashflows, and the overlaid red bars indicate their present values as at time \\(t=0\\). The gray bar at \\(t=0\\) is the sum of all red bars, i.e., the present values, and represents the price of the bond today.
{ \"$schema\": \"https://vega.github.io/schema/vega/v5.json\", \"description\": \"A chart of bond's cashflows, present value and price, made by Mingze Gao\", \"width\": 700, \"height\": 300, \"title\": { \"text\": \"Cashflows, PV and Price of a $10,000 Bond\", \"fontSize\": 18, \"anchor\": \"middle\" }, \"data\": [ { \"name\": \"table\", \"transform\": [ { \"type\": \"sequence\", \"as\": \"year\", \"start\": 0, \"step\": 0.5, \"stop\": 31 }, { \"type\": \"formula\", \"as\": \"i\", \"expr\": \"(0.5*(couponFrequency=='semiannual')+(couponFrequency=='annual'))\" }, { \"type\": \"formula\", \"as\": \"i2\", \"expr\": \"(2*(couponFrequency=='semiannual')+(couponFrequency=='annual'))\" }, { \"type\": \"formula\", \"as\": \"cashflow\", \"expr\": \"10000*couponRate*(datum.i)*(datum.year>0)+10000*(datum.year==maturityInYears)\" }, { \"type\": \"formula\", \"as\": \"r\", \"expr\": \"couponFrequency=='annual'? discountRate : pow(1+discountRate,0.5)-1\" }, { \"type\": \"formula\", \"as\": \"pv\", \"expr\": \"datum.cashflow / pow(1+datum.r, datum.year)\" }, { \"type\": \"formula\", \"as\": \"price\", \"expr\": \"datum.r>0 ? (10000*couponRate*(datum.i)*(1-pow(1+datum.r,-maturityInYears*datum.i2))/datum.r+10000*pow(1+datum.r,-maturityInYears*datum.i2)) : 10000*(1+couponRate*(datum.i)*maturityInYears*datum.i2)\" }, { \"type\": \"filter\", \"expr\": \"datum.year<=maturityInYears\" }, { \"type\": \"filter\", \"expr\": \"couponFrequency=='annual'? 
(datum.year==round(datum.year)) : 1 \" } ] }, { \"name\": \"scaledata\", \"source\": \"table\", \"transform\": [ { \"type\": \"aggregate\", \"fields\": [\"cashflow\", \"price\"], \"ops\": [\"max\", \"max\"], \"as\": [\"maxCashflow\", \"mP\"] }, { \"type\": \"formula\", \"as\": \"maxV\", \"expr\": \"max(datum.maxCashflow, datum.mP*1.1)\" } ] } ], \"signals\": [ { \"name\": \"maturityInYears\", \"value\": 10, \"bind\": { \"input\": \"range\", \"min\": 1, \"max\": 30, \"step\": 1 } }, { \"name\": \"discountRate\", \"value\": 0.08, \"bind\": { \"input\": \"range\", \"min\": 0, \"max\": 0.2, \"step\": 0.0001 } }, { \"name\": \"couponRate\", \"value\": 0.05, \"bind\": { \"input\": \"range\", \"min\": 0, \"max\": 0.2, \"step\": 0.0001 } }, { \"name\": \"couponFrequency\", \"value\": \"annual\", \"bind\": { \"input\": \"radio\", \"options\": [\"annual\", \"semiannual\"] } } ], \"scales\": [ { \"name\": \"x\", \"type\": \"band\", \"domain\": { \"data\": \"table\", \"field\": \"year\", \"sort\": true }, \"range\": \"width\", \"padding\": 0.7 }, { \"name\": \"y\", \"type\": \"linear\", \"domain\": { \"data\": \"scaledata\", \"field\": \"maxV\" }, \"range\": \"height\" } ], \"axes\": [ { \"orient\": \"bottom\", \"scale\": \"x\", \"title\": \"Year\" }, { \"orient\": \"left\", \"scale\": \"y\", \"title\": \"Cash Flows, PV and Bond Price\" } ], \"marks\": [ { \"type\": \"rect\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"fill\": { \"value\": \"steelblue\" }, \"x\": { \"scale\": \"x\", \"field\": \"year\" }, \"width\": { \"scale\": \"x\", \"band\": 1 }, \"y\": { \"scale\": \"y\", \"field\": \"cashflow\" }, \"y2\": { \"scale\": \"y\", \"value\": 0 }, \"tooltip\": { \"signal\": \"{ 'Cashflow': format(datum.cashflow, '$,.2f') }\" } } } }, { \"type\": \"rect\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"fill\": { \"value\": \"#d6001c\" }, \"x\": { \"scale\": \"x\", \"field\": \"year\" }, \"width\": { \"scale\": \"x\", \"band\": 1 }, \"y\": { 
\"scale\": \"y\", \"field\": \"pv\" }, \"y2\": { \"scale\": \"y\", \"value\": 0 }, \"tooltip\": { \"signal\": \"{ 'PV': format(datum.pv, '$,.2f') }\" } } } }, { \"type\": \"rect\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"fill\": { \"value\": \"darkgray\" }, \"x\": { \"scale\": \"x\", \"value\": 0 }, \"width\": { \"scale\": \"x\", \"band\": 1 }, \"y\": { \"scale\": \"y\", \"field\": \"price\" }, \"y2\": { \"scale\": \"y\", \"value\": 0 }, \"tooltip\": { \"signal\": \"{ 'Bond Price': format(datum.price, '$,.2f') }\" } } } }, { \"type\": \"text\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", \"value\": 0 }, \"y\": { \"scale\": \"y\", \"field\": \"price\", \"offset\": -5 }, \"text\": { \"signal\": \"format(datum.price, '$,.2f')\" }, \"fontSize\": { \"value\": 12 }, \"align\": { \"value\": \"left\" }, \"baseline\": { \"value\": \"bottom\" }, \"fill\": { \"value\": \"black\" } } } }, { \"type\": \"text\", \"encode\": { \"enter\": { \"align\": { \"value\": \"right\" }, \"baseline\": { \"value\": \"bottom\" }, \"fill\": { \"value\": \"rgba(0, 0, 0, 0.2)\" }, \"fontSize\": { \"value\": 14 }, \"x\": { \"value\": 0, \"offset\": \"width*0.85\" }, \"y\": { \"value\": 0, \"offset\": \"height*1.2\" }, \"text\": { \"value\": \"Assume coupons paid in arrears and effective annual discount rate (conversion based on coupon frequency).\" } } } } ] }
Try it out
Change the parameters, see what happens to the bond price!
Now, let's have fun with the interactive chart above.
The initial price \\(P_{t=0}\\) of a plain vanilla \\(N\\)-year bond with face value \\(F\\), annual coupon \\(C\\), at a constant discount rate \\(r\\)(1), is given by
\\[ P_{t=0} = \\underbrace{\\sum_{\\tau=1}^{N} \\frac{C}{(1+r)^{\\tau}}}_{\\text{sum of coupons' PVs}} + \\underbrace{\\frac{F}{(1+r)^N}}_{\\text{face value's PV}} \\]Note that this is only the initial price when the bond is issued at time \\(t=0\\).
"},{"location":"finc50/fixed-income/bond-prices-and-yields/#price-over-time","title":"Price over time","text":"Question
So, other things equal, how does the bond price change over time as we approach the maturity date?
We need a better formula that lets \\(t\\) take values other than 0. Recall the rationale: the price is nothing but the sum of the PVs of all future payments.
"},{"location":"finc50/fixed-income/bond-prices-and-yields/#a-slightly-improved-formula","title":"A slightly improved formula","text":"At time \\(t\\), which is exactly \\(n\\) years till maturity, the price, \\(P_{t}\\), of a plain vanilla bond with face value \\(F\\), annual coupon \\(C\\), at a constant discount rate \\(r\\), is given by
\\[ P_{t} = \\underbrace{\\sum_{\\tau=1}^{n} \\frac{C}{(1+r)^{\\tau}}}_{\\text{sum of coupons' PVs}} + \\underbrace{\\frac{F}{(1+r)^n}}_{\\text{face value's PV}} \\]From only \\(P_{t=0}\\) to \\(\\{P_{t}\\}\\) is a major improvement!(1)
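The improved formula translates directly into code. Below is a sketch with a hypothetical `bond_price` helper (annual coupons assumed, `n` full years left to maturity):

```python
def bond_price(face, coupon_rate, r, n):
    """Price with n full years to maturity: PVs of remaining coupons + face value."""
    coupon = face * coupon_rate
    return sum(coupon / (1 + r) ** t for t in range(1, n + 1)) + face / (1 + r) ** n

# Other things equal, the price pulls toward par as maturity approaches.
for years_left in (10, 5, 1, 0):
    print(years_left, round(bond_price(10_000, 0.05, 0.08, years_left), 2))
```

At `years_left = 0` the coupon sum is empty and the price equals the face value.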
Let me show you another graph. Note that in this graph, each bar represents the bond price as at a point in time.(1)
{ \"$schema\": \"https://vega.github.io/schema/vega/v5.json\", \"description\": \"A chart of bond's price over time till maturity, made by Mingze Gao\", \"width\": 700, \"height\": 300, \"title\": { \"text\": \"Bond Price Over Time From Issue To Maturity\", \"fontSize\": 18, \"anchor\": \"middle\" }, \"data\": [ { \"name\": \"table\", \"transform\": [ { \"type\": \"sequence\", \"as\": \"year\", \"start\": 0, \"step\": 0.5, \"stop\": 31 }, { \"type\": \"formula\", \"as\": \"i\", \"expr\": \"(0.5*(couponFrequency=='semiannual')+(couponFrequency=='annual'))\" }, { \"type\": \"formula\", \"as\": \"i2\", \"expr\": \"(2*(couponFrequency=='semiannual')+(couponFrequency=='annual'))\" }, { \"type\": \"formula\", \"as\": \"cashflow\", \"expr\": \"10000*couponRate*(datum.i)*(datum.year>0)+10000*(datum.year==maturityInYears)\" }, { \"type\": \"formula\", \"as\": \"r\", \"expr\": \"couponFrequency=='annual'? discountRate : pow(1+discountRate,0.5)-1\" }, { \"type\": \"formula\", \"as\": \"pv\", \"expr\": \"datum.cashflow / pow(1+datum.r, datum.year)\" }, { \"type\": \"formula\", \"as\": \"price\", \"expr\": \"datum.r>0 ? (10000*couponRate*(datum.i)*(1-pow(1+datum.r,-(maturityInYears-datum.year)*datum.i2))/datum.r+10000*pow(1+datum.r,-(maturityInYears-datum.year)*datum.i2)) : 10000*(1+couponRate*(datum.i)*(maturityInYears-datum.year)*datum.i2)\" }, { \"type\": \"filter\", \"expr\": \"datum.year<=maturityInYears\" }, { \"type\": \"filter\", \"expr\": \"couponFrequency=='annual'? 
(datum.year==round(datum.year)) : 1 \" } ] } ], \"signals\": [ { \"name\": \"maturityInYears\", \"value\": 30, \"bind\": { \"input\": \"range\", \"min\": 1, \"max\": 30, \"step\": 1 } }, { \"name\": \"discountRate\", \"value\": 0.08, \"bind\": { \"input\": \"range\", \"min\": 0, \"max\": 0.2, \"step\": 0.0001 } }, { \"name\": \"couponRate\", \"value\": 0.05, \"bind\": { \"input\": \"range\", \"min\": 0, \"max\": 0.2, \"step\": 0.0001 } }, { \"name\": \"couponFrequency\", \"value\": \"annual\", \"bind\": { \"input\": \"radio\", \"options\": [\"annual\", \"semiannual\"] } } ], \"scales\": [ { \"name\": \"x\", \"type\": \"band\", \"domain\": { \"data\": \"table\", \"field\": \"year\", \"sort\": true }, \"range\": \"width\", \"padding\": 0.7 }, { \"name\": \"y\", \"type\": \"linear\", \"domain\": { \"data\": \"table\", \"field\": \"price\" }, \"range\": \"height\" } ], \"axes\": [ { \"orient\": \"bottom\", \"scale\": \"x\", \"title\": \"Year\" }, { \"orient\": \"left\", \"scale\": \"y\", \"title\": \"Bond Price\" } ], \"marks\": [ { \"type\": \"rule\", \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", \"value\": 0 }, \"y\": { \"scale\": \"y\", \"value\": 10000 }, \"x2\": { \"scale\": \"x\", \"signal\": \"maturityInYears\" }, \"y2\": { \"scale\": \"y\", \"value\": 10000 }, \"strokeWidth\": { \"value\": 2 }, \"strokeDash\": { \"value\": [8, 3] }, \"strokeCap\": { \"value\": \"round\" }, \"opacity\": { \"value\": 1 } } } }, { \"type\": \"rect\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"fill\": { \"value\": \"steelblue\" }, \"x\": { \"scale\": \"x\", \"field\": \"year\" }, \"width\": { \"scale\": \"x\", \"band\": 1 }, \"y\": { \"scale\": \"y\", \"field\": \"price\" }, \"y2\": { \"scale\": \"y\", \"value\": 0 }, \"tooltip\": { \"signal\": \"{ 'Bond Price': format(datum.price, '$,.2f') }\" } } } }, { \"type\": \"text\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", \"signal\": \"maturityInYears\" }, 
\"y\": { \"scale\": \"y\", \"value\": 10000, \"offset\": -5 }, \"text\": { \"value\": \"Bond Face Value\" }, \"fontSize\": { \"value\": 12 }, \"align\": { \"value\": \"left\" }, \"baseline\": { \"value\": \"bottom\" }, \"fill\": { \"value\": \"black\" } } } }, { \"type\": \"text\", \"encode\": { \"enter\": { \"align\": { \"value\": \"right\" }, \"baseline\": { \"value\": \"bottom\" }, \"fill\": { \"value\": \"rgba(0, 0, 0, 0.2)\" }, \"fontSize\": { \"value\": 14 }, \"x\": { \"value\": 0, \"offset\": \"width*0.85\" }, \"y\": { \"value\": 0, \"offset\": \"height*1.2\" }, \"text\": { \"value\": \"Assume coupons paid in arrears and effective annual discount rate (conversion based on coupon frequency).\" } } } } ] }
Try it out
Change the discount rate to be lower than coupon rate, what do you find?
It's not difficult to see that, as the bond approaches maturity, its price approaches the face value, regardless of whether the bond trades at a premium or discount.(1)
Recall that earlier we said the longer the maturity, the lower the bond price. That is true because there we were talking about the initial price at issue. For example, other things equal, the price of a 30-year bond is lower than the price of a 10-year bond.
Here, time is changing. There is a bond of a given maturity (e.g., 30 years), and we study how its price changes over time as we get close to the 30-year mark.
"},{"location":"finc50/fixed-income/bond-prices-and-yields/#a-more-improved-formula","title":"A more improved formula","text":"But we can still do better!
Question
So far we have been assuming the next coupon is exactly one period (e.g., year) in the future, or in other words, the last coupon has just been paid. What if this is not the case? What if the next annual coupon is in 2 months, not in 12 months, from now?
When the coupon payment date does not align with the time at which we compute the bond price, only a simple adjustment is required.
The basic idea is that, since the next coupon and all subsequent payments are closer than assumed in the computation, we have over-discounted the bond value. We can correct this by \"growing\" the undervalued price for the time elapsed since the last coupon payment.
\\[ P_{t} = \\underbrace{\\left[\\sum_{\\tau=1}^{n} \\frac{C}{(1+r)^{\\tau}} + \\frac{F}{(1+r)^n}\\right]}_{\\text{bond price right after last coupon}} \\times (1+r)^{\\frac{\\text{days since last coupon}}{\\text{days between coupons}}} \\]As such, we can now derive a continuous path for the bond price from issue to maturity, other things being equal. This is shown in the next chart as a blue line.(1)
Attention!
Now imagine you are to buy a bond immediately before it matures. What would be the price according to the formula and the chart above?
No matter how much coupon the bond pays, the price (indicated by the last bar) is the bond face value, $10,000. Immediately after the purchase, however, you will receive a total payment of the bond face value plus the last coupon, which is surely greater than $10,000.
Clearly, you need to pay the seller more than the price described by the formula.
In fact, a bondholder starts to accumulate accrued interest the moment they own the bond. Even if they sell the bond right before a coupon payment, having held it for almost the entire coupon period, they should be compensated for not receiving the next coupon, which will be paid to the buyer.
We then generalize this idea to bond transactions at any time between coupon payments -- the buyer should additionally compensate the seller a fraction of the coupon, proportional to the time the seller has held the bond since the last coupon payment relative to the time between two coupon payments.
{ \"$schema\": \"https://vega.github.io/schema/vega/v5.json\", \"description\": \"A chart of bond's dirty and clean prices over time till maturity, made by Mingze Gao\", \"width\": 700, \"height\": 300, \"title\": { \"text\": \"Bond Price, Dirty & Clean, Over Time From Issue To Maturity\", \"fontSize\": 18, \"anchor\": \"middle\" }, \"data\": [ { \"name\": \"table\", \"transform\": [ { \"type\": \"sequence\", \"as\": \"year\", \"start\": 0, \"step\": 0.01, \"stop\": 31 }, { \"type\": \"formula\", \"as\": \"price\", \"expr\": \"discountRate>0 ? (10000*couponRate*(1-pow(1+discountRate,-(maturityInYears-datum.year)))/discountRate+10000*pow(1+discountRate,-(maturityInYears-datum.year))) : 10000*(1+couponRate*(maturityInYears-datum.year))\" }, { \"type\": \"formula\", \"as\": \"dirtyprice\", \"expr\": \"datum.price+10000*couponRate*(datum.year-floor(datum.year))\" }, { \"type\": \"filter\", \"expr\": \"datum.year<=maturityInYears\" } ] } ], \"signals\": [ { \"name\": \"maturityInYears\", \"value\": 30, \"bind\": { \"input\": \"range\", \"min\": 1, \"max\": 30, \"step\": 1 } }, { \"name\": \"discountRate\", \"value\": 0.08, \"bind\": { \"input\": \"range\", \"min\": 0, \"max\": 0.2, \"step\": 0.0001 } }, { \"name\": \"couponRate\", \"value\": 0.05, \"bind\": { \"input\": \"range\", \"min\": 0, \"max\": 0.2, \"step\": 0.0001 } }, { \"name\": \"showDirtyPrice\", \"value\": \"true\", \"bind\": { \"input\": \"radio\", \"options\": [\"true\", \"false\"] } } ], \"scales\": [ { \"name\": \"x\", \"type\": \"linear\", \"domain\": { \"data\": \"table\", \"field\": \"year\", \"sort\": true }, \"range\": \"width\" }, { \"name\": \"y\", \"type\": \"linear\", \"domain\": { \"data\": \"table\", \"field\": \"price\" }, \"range\": \"height\" } ], \"axes\": [ { \"orient\": \"bottom\", \"scale\": \"x\", \"title\": \"Year\" }, { \"orient\": \"left\", \"scale\": \"y\", \"title\": \"Bond Price\" } ], \"marks\": [ { \"type\": \"rule\", \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", 
\"value\": 0 }, \"y\": { \"scale\": \"y\", \"value\": 10000 }, \"x2\": { \"scale\": \"x\", \"signal\": \"maturityInYears\" }, \"y2\": { \"scale\": \"y\", \"value\": 10000 }, \"strokeWidth\": { \"value\": 2 }, \"strokeDash\": { \"value\": [8, 3] }, \"strokeCap\": { \"value\": \"round\" }, \"opacity\": { \"value\": 1 } } } }, { \"type\": \"line\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", \"field\": \"year\" }, \"y\": { \"scale\": \"y\", \"field\": \"price\" } } } }, { \"type\": \"line\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", \"field\": \"year\" }, \"y\": { \"scale\": \"y\", \"field\": \"dirtyprice\" }, \"strokeWidth\": { \"signal\": \"showDirtyPrice=='true'? 1: 0\" }, \"strokeDash\": { \"value\": [2, 2] }, \"strokeCap\": { \"value\": \"round\" }, \"opacity\": { \"value\": 1 }, \"stroke\": { \"value\": \"#d6001c\" } } } }, { \"type\": \"text\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", \"signal\": \"maturityInYears\" }, \"y\": { \"scale\": \"y\", \"value\": 10000, \"offset\": -5 }, \"text\": { \"value\": \"Bond Face Value\" }, \"fontSize\": { \"value\": 12 }, \"align\": { \"value\": \"left\" }, \"baseline\": { \"value\": \"bottom\" }, \"fill\": { \"value\": \"black\" } } } }, { \"type\": \"text\", \"encode\": { \"enter\": { \"align\": { \"value\": \"right\" }, \"baseline\": { \"value\": \"bottom\" }, \"fill\": { \"value\": \"rgba(0, 0, 0, 0.2)\" }, \"fontSize\": { \"value\": 14 }, \"x\": { \"value\": 0, \"offset\": \"width*0.6\" }, \"y\": { \"value\": 0, \"offset\": \"height*1.2\" }, \"text\": { \"value\": \"Assume annual coupons paid in arrears and effective annual discount rate.\" } } } } ] }
The chart above shows, in blue, the continuous path of the bond price described by the formula, and, in red, the price including the additional compensation, i.e., the accrued interest. We name these two prices the \"clean price\" and the \"dirty price\", respectively.
And the accrued interest is given by
\\[ \\text{coupon} \\times \\frac{\\text{time since last coupon}}{\\text{time between coupons}} \\]which periodically increases and resets.
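The clean/dirty decomposition above can be sketched as follows. The helper names are hypothetical, and `frac` stands for days since last coupon over days between coupons:

```python
def price_after_coupon(face, coupon_rate, r, n):
    # Clean-formula price right after a coupon, with n annual coupons remaining
    coupon = face * coupon_rate
    return sum(coupon / (1 + r) ** t for t in range(1, n + 1)) + face / (1 + r) ** n

def dirty_price(face, coupon_rate, r, n, frac):
    # "Grow" the over-discounted price for the time since the last coupon
    return price_after_coupon(face, coupon_rate, r, n) * (1 + r) ** frac

def accrued_interest(face, coupon_rate, frac):
    # Interest accrues linearly between payment dates, then resets
    return face * coupon_rate * frac

frac = 2 / 12  # e.g. two months into an annual coupon period
dirty = dirty_price(10_000, 0.05, 0.08, 10, frac)
clean = dirty - accrued_interest(10_000, 0.05, frac)
```

Right after a coupon payment (`frac = 0`), the dirty and clean prices coincide.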
Day count convention
So far, we have not spent a single word on a bond's \"yield\", but have instead been using \"discount rate\". What's the difference?
The answer is straightforward. A bond consists of a series of future cashflows, each of which may be discounted at a different rate. For example, the coupon in 1 year may be discounted at a 5% discount rate, the coupon in 2 years at 6%, and so on.
It turns out that, at any time \\(t\\), while a bond can have multiple future payments each discounted at different rates \\(\\{r_{\\tau}\\}\\), we can always find a single discount rate \\(\\color{red}y\\) which, when applied to all future payments, leads to the same bond price at the time:
\\[ P_{t} = \\underbrace{\\sum_{\\tau=1}^{n} \\frac{C}{(1+r_\\tau)^{\\tau}} + \\frac{F}{(1+r_n)^n}}_{\\text{each discounted at varying rates}} = \\underbrace{\\sum_{\\tau=1}^{n} \\frac{C}{(1+{\\color{red}y})^{\\tau}} + \\frac{F}{(1+{\\color{red}y})^n}}_{\\text{all discounted at the same rate}} \\]This single discount rate \\(\\color{red}y\\) is called the yield to maturity, or yield, of the bond at time \\(t\\).
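Because the price is strictly decreasing in the yield, the yield to maturity can be recovered from an observed price by simple bisection. A sketch under the assumption of annual coupons, with hypothetical helper names:

```python
def bond_price(face, coupon_rate, y, n):
    coupon = face * coupon_rate
    return sum(coupon / (1 + y) ** t for t in range(1, n + 1)) + face / (1 + y) ** n

def yield_to_maturity(price, face, coupon_rate, n, lo=1e-9, hi=1.0):
    # Price is decreasing in yield, so bisect on the yield
    for _ in range(100):
        mid = (lo + hi) / 2
        if bond_price(face, coupon_rate, mid, n) > price:
            lo = mid  # model price too high -> true yield is higher
        else:
            hi = mid
    return (lo + hi) / 2

y = yield_to_maturity(7986.98, 10_000, 0.05, 10)
print(round(y, 4))  # prints 0.08
```

A bond trading at par has a yield equal to its coupon rate, which is a handy sanity check.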
If we plot bond prices against yields at a given time, it's easy to see that they have a one-to-one mapping and an inverse, non-linear relationship.
{ \"$schema\": \"https://vega.github.io/schema/vega/v5.json\", \"description\": \"A chart of bond's price and yield, made by Mingze Gao\", \"width\": 700, \"height\": 300, \"title\": { \"text\": \"Bond Price and Yield\", \"fontSize\": 18, \"anchor\": \"middle\" }, \"data\": [ { \"name\": \"table\", \"transform\": [ { \"type\": \"sequence\", \"as\": \"yield\", \"start\": 0.0, \"step\": 0.5, \"stop\": 20.5 }, { \"type\": \"formula\", \"as\": \"price\", \"expr\": \"datum.yield>0 ? (10000*couponRate*(1-pow(1+datum.yield/100,-maturityInYears))/(datum.yield/100)+10000*pow(1+datum.yield/100,-maturityInYears)) : 10000*(1+couponRate*maturityInYears)\" }, { \"type\": \"formula\", \"as\": \"price5\", \"expr\": \"datum.yield>0 ? (10000*0.05*(1-pow(1+datum.yield/100,-maturityInYears))/(datum.yield/100)+10000*pow(1+datum.yield/100,-maturityInYears)) : 10000*(1+0.05*maturityInYears)\" } ] }, { \"name\": \"scaledata\", \"source\": \"table\", \"transform\": [ { \"type\": \"formula\", \"as\": \"maxV\", \"expr\": \"max(datum.price, datum.price5*1.2)\" } ] } ], \"signals\": [ { \"name\": \"maturityInYears\", \"value\": 10, \"bind\": { \"input\": \"range\", \"min\": 1, \"max\": 30, \"step\": 1 } }, { \"name\": \"couponRate\", \"value\": 0.05, \"bind\": { \"input\": \"range\", \"min\": 0, \"max\": 0.1, \"step\": 0.0001 } } ], \"scales\": [ { \"name\": \"x\", \"type\": \"linear\", \"domain\": { \"data\": \"table\", \"field\": \"yield\", \"sort\": true }, \"range\": \"width\" }, { \"name\": \"y\", \"type\": \"linear\", \"domain\": { \"data\": \"scaledata\", \"field\": \"maxV\" }, \"range\": \"height\" } ], \"axes\": [ { \"orient\": \"bottom\", \"scale\": \"x\", \"title\": \"Yield (%)\", \"ticks\": false }, { \"orient\": \"left\", \"scale\": \"y\", \"title\": \"Bond Price\" } ], \"marks\": [ { \"type\": \"rule\", \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", \"value\": 0 }, \"y\": { \"scale\": \"y\", \"value\": 10000 }, \"x2\": { \"scale\": \"x\", \"value\": 5 }, \"y2\": { 
\"scale\": \"y\", \"value\": 10000 }, \"strokeWidth\": { \"value\": 1 }, \"strokeDash\": { \"value\": [8, 3] }, \"strokeCap\": { \"value\": \"round\" }, \"opacity\": { \"value\": 1 } } } }, { \"type\": \"rule\", \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", \"value\": 5 }, \"y\": { \"scale\": \"y\", \"value\": 0 }, \"x2\": { \"scale\": \"x\", \"value\": 5 }, \"y2\": { \"scale\": \"y\", \"value\": 10000 }, \"strokeWidth\": { \"value\": 1 }, \"strokeDash\": { \"value\": [8, 3] }, \"strokeCap\": { \"value\": \"round\" }, \"opacity\": { \"value\": 1 } } } }, { \"type\": \"line\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", \"field\": \"yield\" }, \"width\": { \"scale\": \"x\", \"band\": 1 }, \"y\": { \"scale\": \"y\", \"field\": \"price\" }, \"tooltip\": { \"signal\": \"{ 'Bond Price': format(datum.price, '$,.2f') }\" } } } }, { \"type\": \"line\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", \"field\": \"yield\" }, \"y\": { \"scale\": \"y\", \"field\": \"price5\" }, \"stroke\": { \"value\": \"#d6001c\" }, \"tooltip\": { \"signal\": \"{ 'Bond Price': format(datum.price5, '$,.2f') }\" } } } }, { \"type\": \"text\", \"from\": { \"data\": \"table\" }, \"encode\": { \"update\": { \"x\": { \"scale\": \"x\", \"value\": 20 }, \"y\": { \"scale\": \"y\", \"field\": \"price\", \"offset\": -5 }, \"text\": { \"signal\": \"format(datum.price, '$,.0f')+'@'+format(datum.yield,'.1f')+'%'\" }, \"fontSize\": { \"value\": 12 }, \"align\": { \"value\": \"left\" }, \"baseline\": { \"value\": \"bottom\" }, \"fill\": { \"value\": \"black\" } } } }, { \"type\": \"text\", \"encode\": { \"enter\": { \"align\": { \"value\": \"right\" }, \"baseline\": { \"value\": \"bottom\" }, \"fill\": { \"value\": \"rgba(0, 0, 0, 0.2)\" }, \"fontSize\": { \"value\": 14 }, \"x\": { \"value\": 0, \"offset\": \"width\" }, \"y\": { \"value\": 0, \"offset\": \"height*1.2\" }, \"text\": { \"value\": \"Assume $10,000 
bond, annual coupons paid in arrears and effective annual discount rate.\" } } } } ] }
There are many interesting features of the bond price-yield relationship.
The one-to-one mapping between bond price and yield implies that, given either one, we can always compute the other. So, knowing either price or yield is sufficient when dealing with bonds.(1)
The inverse relationship suggests that the higher the yield, the lower the bond price.
The non-linearity suggests that the sensitivity of bond price to yield is not static. In fact, we can tell from the graph that the curve is convex. This convexity will be of great importance in later studies of bond risk.
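The inverse and convex relationship can be verified numerically. Below is a minimal Python sketch (my own addition, separate from the interactive chart above); it borrows the chart's default assumptions of a $10,000 face value, a 5% annual coupon paid in arrears, a 10-year maturity, and an effective annual discount rate.

```python
# Price a plain-vanilla bond paying annual coupons in arrears,
# discounted at an effective annual yield y.
def bond_price(face, coupon_rate, years, y):
    coupon = face * coupon_rate
    pv_coupons = sum(coupon / (1 + y) ** t for t in range(1, years + 1))
    pv_face = face / (1 + y) ** years
    return pv_coupons + pv_face

yields = [0.01 * i for i in range(1, 21)]  # 1% .. 20%
prices = [bond_price(10_000, 0.05, 10, y) for y in yields]

# Inverse relation: price strictly decreases as yield rises.
assert all(p0 > p1 for p0, p1 in zip(prices, prices[1:]))

# Convexity: on an evenly spaced yield grid, second differences are positive.
second_diff = [prices[i + 1] - 2 * prices[i] + prices[i - 1]
               for i in range(1, len(prices) - 1)]
assert all(d > 0 for d in second_diff)

print(round(bond_price(10_000, 0.05, 10, 0.05), 2))  # a 5% bond at a 5% yield prices at par
```

The two assertions are exactly the inverse relation and the convexity discussed above; the positive second differences are the numerical counterpart of the curve bending upward in the chart.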
"},{"location":"finc50/fixed-income/bond-prices-and-yields/#other-yields","title":"Other yields","text":"When we talk about a bond's yield, we usually refer to the yield to maturity. But there can be some other yield measures:
Portfolio of bonds
The yield of a portfolio of bonds is NOT a weighted average of individual bonds' yields because the bonds are not homogeneous.
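A small numeric illustration (my own, with made-up bonds) makes the point: price two annual-coupon bonds at different yields, then solve for the single yield that discounts the combined cash flows back to the combined price. The resulting portfolio yield differs visibly from the value-weighted average of the two yields.

```python
def price(cashflows, y):
    # Present value of cash flows at times 1..n, effective annual yield y.
    return sum(cf / (1 + y) ** t for t, cf in enumerate(cashflows, start=1))

def solve_yield(cashflows, target_price, lo=0.0, hi=1.0, tol=1e-10):
    # Bisection: price is strictly decreasing in yield for positive cash flows.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if price(cashflows, mid) > target_price:
            lo = mid  # price too high -> yield must be higher
        else:
            hi = mid
    return (lo + hi) / 2

# Bond A: 2-year, 5% annual coupon, $100 face, yielding 4%
# Bond B: 10-year, 5% annual coupon, $100 face, yielding 8%
cf_a = [5, 105]
cf_b = [5] * 9 + [105]
pa, pb = price(cf_a, 0.04), price(cf_b, 0.08)

# Portfolio: one of each bond; pad A's cash flows out to 10 years.
cf_port = [a + b for a, b in zip(cf_a + [0] * 8, cf_b)]
y_port = solve_yield(cf_port, pa + pb)

# Value-weighted average of the two yields, for comparison.
y_wavg = (pa * 0.04 + pb * 0.08) / (pa + pb)
print(f"portfolio yield {y_port:.4%} vs weighted average {y_wavg:.4%}")
```

The gap arises because the portfolio yield weights cash flows by when they arrive, and the long bond dominates the later periods, so no simple value-weighted average of the two yields recovers it.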
"},{"location":"finc50/fixed-income/bond-prices-and-yields/#trouble-maker-floating-rate-bonds","title":"Trouble maker: floating-rate bonds","text":"We cannot easily compute the yield of a floater, simply because the future values of the reference rate are unknown. Instead, we can use spread measures that describe the yield in excess of the reference rate. The most popular yield spread measure for a floating-rate bond is the discount margin.
As its name suggests, the discount margin basically captures the \"discount rate in excess of the reference rate\".
Suppose the bond's market price is \\(P_t\\) and the reference rate is assumed constant at \\(R\\); the discount margin \\(DM\\) is the value that solves the equation below:
\\[ P_{t} = \\sum_{\\tau=1}^{n} \\frac{C}{(1+R+{\\color{red}DM})^{\\tau}} + \\frac{F}{(1+R+{\\color{red}DM})^n} \\]Note that here the coupon payment \\(C\\) is determined by the reference rate \\(R\\) and the quoted margin.
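To make this concrete, here is a small Python sketch (my own illustration; the face value, rates, and maturity are made up). With R held flat, the coupon is C = F × (R + quoted margin), and bisection finds the DM that equates the discounted cash flows to the market price.

```python
def floater_price(face, R, quoted_margin, dm, n):
    # Coupon set by reference rate plus quoted margin; discounted at R + DM.
    c = face * (R + quoted_margin)
    disc = lambda t: (1 + R + dm) ** t
    return sum(c / disc(t) for t in range(1, n + 1)) + face / disc(n)

def discount_margin(market_price, face, R, quoted_margin, n,
                    lo=-0.05, hi=0.20, tol=1e-12):
    # Bisection: the floater's price decreases as DM rises.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if floater_price(face, R, quoted_margin, mid, n) > market_price:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# If the floater trades at par, the DM equals the quoted margin.
dm_par = discount_margin(100.0, 100.0, R=0.03, quoted_margin=0.01, n=8)
print(f"{dm_par:.6f}")  # → 0.010000

# Priced below par, the DM exceeds the quoted margin.
dm_disc = discount_margin(98.0, 100.0, R=0.03, quoted_margin=0.01, n=8)
```

The par case is a useful sanity check: when the discount rate R + DM equals the coupon rate R + quoted margin, the bond must price at face value, so the solver should return exactly the quoted margin.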
"},{"location":"finc50/fixed-income/bond-yields-and-returns/","title":"Bond Yields and Returns","text":"Info
This post is still under construction.
"},{"location":"finc50/fixed-income/introduction/","title":"Introduction to Fixed Income Securities","text":""},{"location":"finc50/fixed-income/introduction/#what-are-fixed-income-securities","title":"What are fixed income securities?","text":"Fixed income securities are financial instruments that provide a fixed, or predictable, stream of income to investors. These securities typically take the form of bonds, but also include other investment types like certificates of deposit and preferred shares.
Fixed income securities are essentially loans made by an investor to an issuer. In exchange for the loan, the issuer agrees to pay the investor a specified rate of interest during the life of the bond and to repay the principal when it \"matures,\" or comes due.
Types of fixed income securities
By the type of issuer, there are
Government bonds, Corporate bonds, Municipal bonds. Risk-free rate
When we talk about \"risk-free rate\", we largely refer to the yield of such government bonds.
Governments issue bonds to borrow money. These bonds are considered among the safest investments, as they are backed by the taxing power of the government.
Companies also issue bonds to finance their operations or projects. Corporate bonds are considered higher risk than government bonds, but they also typically pay a higher rate of interest.
Municipal bonds are issued by cities, states, or other local entities for various public purposes. These bonds often have tax advantages, making them attractive to certain investors. (1)
In terms of underlying asset, there are
Asset-backed securities (ABS) and Mortgage-backed securities (MBS). ABS are bonds backed by loan receivables other than real estate, such as credit card debt, auto loans, student loans, or even royalties from music. In the case of ABS, a pool of these non-mortgage assets is packaged and sold to investors as securities. The principal and interest payments made by the borrowers on these underlying loans are then passed through to the investors.
These are investment products backed by home and commercial mortgage loans. These loans are packaged into securities and sold to investors. Similar to ABS, the principal and interest payments made by the borrowers are passed through to the investors. However, MBS are directly tied to the mortgage industry and are susceptible to the performance of the housing market.
"},{"location":"finc50/fixed-income/introduction/#why-invest-in-fixed-income-securities","title":"Why invest in fixed income securities?","text":"Investors choose fixed income securities for several reasons:
Income: Fixed income securities provide regular interest payments, which can be an attractive source of income.
Preservation of capital: When the bond matures, the full principal amount is returned to the investor. This makes bonds appealing for those looking to preserve their capital.
Diversification: Including fixed income securities in a portfolio can help diversify investments and reduce risk.
Investors in fixed income securities come from a broad spectrum and include both individuals and institutions.
Individuals and Institutions. Individual investors, particularly those in or nearing retirement, often invest in fixed-income securities as a way to preserve capital and generate a steady stream of income.
Pension Funds: Pension funds invest heavily in fixed-income securities as they provide predictable returns which can be matched against their future payout obligations.
Insurance Companies: Like pension funds, insurance companies have long-term, predictable liabilities and thus invest significantly in fixed-income securities to match these liabilities.
Mutual Funds: There are many mutual funds, known as bond funds, that specialize in investing in fixed-income securities.
Banks and Financial Institutions: Banks and other financial institutions often invest in fixed-income securities as a way to generate a return on their excess capital and to help manage their interest rate risk.
Endowments and Foundations: These entities often include fixed-income securities in their portfolios for diversification and income generation.
Central Banks: Central banks often hold domestic and foreign fixed-income securities as a part of their reserves and as a tool for implementing monetary policy.
Each type of investor may have different investment objectives and constraints, and therefore might focus on different types of fixed-income securities (e.g., government bonds, corporate bonds, municipal bonds, etc.) based on their risk tolerance, income requirements, tax situation, and other factors.
Market size of fixed income securities. Source of figure: SIFMA.
\"Although they usually attract less attention than equity markets, fixed-income markets are more than three times the size of global equity markets\", CFA Institute.
"},{"location":"finc50/fixed-income/introduction/#features-of-a-bond","title":"Features of a bond","text":""},{"location":"finc50/fixed-income/introduction/#the-basics","title":"The basics","text":"In its simplest form, a bond may be specified by the following characteristics:
Example
On March 17, 2021, Microsoft (1) \"issued a $6,250,000,000 (2) aggregate principal amount of its 2.921% (3) Notes (5) due 2052 (4) (the \u201c2052 Notes\u201d)\".
Source: the firm's SEC filing.
The bond's indenture (1) also specifies that
The 2052 Notes will bear interest (computed on the basis of a 360-day year consisting of twelve 30-day months) from March 17, 2021 at the rate of 2.921% per annum, payable semi-annually in arrears.
... (1)
A sequence of cash flows between the bond issuer and investor is described below
sequenceDiagram\n autonumber\n Issuer (bond seller)-->>Investor (bond buyer): Bond\n Investor (bond buyer)->>Issuer (bond seller): Price of bond\n note right of Investor (bond buyer): We will figure out the price later\n\n loop every 6 months until 2052\n Issuer (bond seller)->>Investor (bond buyer): Coupon (1/2 of 2.921% of face value)\n end\n\n Issuer (bond seller)->>Investor (bond buyer): Return principal ($6,250,000,000)\n Investor (bond buyer)-->>Issuer (bond seller): Redeem bond
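As a quick sanity check on the coupon leg above (arithmetic I added, using the figures from the filing), each semiannual coupon equals half a year's interest at 2.921% on the $6,250,000,000 principal; under the 30/360 day count, every semiannual period counts exactly 180/360 of a year.

```python
principal = 6_250_000_000
annual_rate = 0.02921

# 30/360 day count: each semiannual period is 180/360 = 0.5 years.
semiannual_coupon = principal * annual_rate * 180 / 360
print(f"${semiannual_coupon:,.0f}")  # → $91,281,250
```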
"},{"location":"finc50/fixed-income/introduction/#additional-things","title":"Additional things","text":"A bond can be secured in that the issuer can pledge certain assets (1) to \"secure\" the payments to investors. In case of defaults \ud83d\ude14, bondholders have a direct claim on the pledged assets.
An unsecured bond (or \"debenture\"), on the other hand, relies solely on the issuer's creditworthiness and ability to generate cash flow to repay bondholders. In case of defaults \ud83d\ude14, bondholders of unsecured bonds only have a claim on the issuer's general assets.
A bond also has a seniority. In case of defaults \ud83d\ude14, investors in more senior bonds can claim before investors in less senior bonds. (1)
A bond needs to specify the currency as well, along with many other things...
"},{"location":"finc50/fixed-income/introduction/#embedded-options","title":"Embedded options","text":"A bond is plain vanilla when it has no embedded options, which can add a lot of \"flavour\".
Note
Embedded options in bonds refer to features or provisions that give either the bond issuer or the bondholder the right to take certain actions under specific circumstances.
Embedded options provide added flexibility to the bond's terms and can impact the bond's cash flows and overall value. The three main types of embedded options found in bonds are:
Call Option (Callable Bonds)Put Option (Puttable Bonds)Conversion Option (Convertible Bonds)sequenceDiagram\nautonumber\n participant Issuer\n participant Underwriter\n participant Investors\n\n rect rgba(0, 0, 255, .1)\n note left of Issuer: Primary Market\n Issuer->>Underwriter: Request Bond Underwriting\n Underwriter->>Issuer: Analyze Issuer's Creditworthiness\n Underwriter->>Investors: Offer Bonds for Sale\n Investors->>Underwriter: Express Interest in Buying Bonds\n Underwriter->>Issuer: Finalize Bond Terms and Pricing\n Investors->>Underwriter: Place Orders for Bonds\n Underwriter->>Investors: Issue Bonds to Investors\n end\n rect rgba(0, 0, 255, .1)\n note right of Investors: Secondary Market\n Investors->>Investors: Buy and Sell Bonds\n end
"},{"location":"finc50/stata/","title":"Stata Workshop","text":"Welcome
This series of introductory notes is prepared for the BUSS7902 Quantitative Business Research Methods at the University of Sydney Business School.
First, let's get familiar with Stata and see what we can do with it.
Take a side trip to see some amazing features of Stata.
Note
This is just to showcase one of the many amazing features of Stata.
Since Stata 15, we can search, browse and import almost a million U.S. and international economic and financial time series made available by the Federal Reserve Bank of St. Louis through its Federal Reserve Economic Data (FRED) service. This post briefly explains this great feature.
","tags":["Stata"]},{"location":"finc50/stata/fred/#prerequisite","title":"Prerequisite","text":"Before you start, you will need an API Key from FRED. Register one here
Then in Stata, you can store this key permanently so you don't need to provide it again.(1)
_key_
with your actual API Key obtained.set fredkey _key_, permanently\n
","tags":["Stata"]},{"location":"finc50/stata/fred/#gui-is-always-a-good-start","title":"GUI is always a good start","text":"Alternatively, click on menu File>Import>Federal Reserve Economic Data (FRED)
will bring up the dialog as shown below.
Enter API Key and you'll be free to explore all the data series available on FRED.
For example, let's see the CPI of Australia...
Describing the data series, we can find a lot of useful meta information.
Vintage
Note that the \"vintage\" section lists a number of dates, with each vintage referring to a particular version of the data series at that point in time.
It may sound strange, but an economic data series may be revised multiple times after it has been published. Potential reasons include more accurate information collected later, a change in estimation method, and so on.
For example, the CPI from 2005 to 2010 retrieved by a researcher as of 2011 may be different from the one retrieved as of 2023. Without specifying the data vintage, replicating a prior work can be hard.
Another tricky part is that ignoring vintages introduces look-ahead bias in analysis.
For example, a trading strategy using the revised GDP accessed today, instead of the vintage GDP, implicitly uses hindsight, as the GDP series may have been revised to accommodate more accurate data obtained after release.
Let's close the description, double click on the series and click on import. Another dialog will be shown to confirm some final details.
The outputs will be like the following:
. import fred AUSCPIALLQINMEI, daterange(2010-01-01 2023-08-08) aggregate(quarterly,avg)\n\nSummary\n----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\nSeries ID Nobs Date range Frequency\n----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\nAUSCPIALLQINMEI 53 2010-01-01 to 2023-01-01 Quarterly\n----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\n# of series imported: 1\n highest frequency: Quarterly\n lowest frequency: Quarterly\n
","tags":["Stata"]},{"location":"finc50/stata/fred/#programmatical-is-recipe-to-reproducibility","title":"Programmatical is recipe to reproducibility","text":"We don't need to go through the GUI process every time. In fact, Stata already told us what the corresponding command is:
import fred AUSCPIALLQINMEI, daterange(2010-01-01 2023-08-08) aggregate(quarterly,avg)\n
We can simply put this line of code into our program.
For example, the code below generates a time-series chart for Australia's CPI.
// Import\nimport fred AUSCPIALLQINMEI, daterange(2010-01-01 2023-03-31) vintage(2023-05-10) aggregate(quarterly,avg) clear\nrename AUSCPIALLQINMEI_20230510 cpi_australia\n// Time format\ngen yrqtr = yq(year(daten),quarter(daten))\nformat yrqtr %tq\ntsset yrqtr\n// Set start of the period to 100\ngen cpi_ret = cpi_australia/L.cpi_australia - 1\nreplace cpi_australia = 100 if _n==1\nreplace cpi_australia = L.cpi_australia * (1+cpi_ret) if _n>1\n// Plotting\ntwoway (tsline cpi_australia), title(\"Quarterly CPI of Australia 2010Q1-2023Q1\") ytitle(\"\") ttitle(\"\") note(\"Index 2010Q1=100. Source: FRED, 2023-05-10 vintage.\")\n
Note
The code snippet above specifies the data vintage. Therefore, even if someone runs it 30 years from now, they will still get exactly the same data and plot as I do in 2023.
","tags":["Stata"]},{"location":"finc50/stata/introduction/","title":"Stata - Introduction","text":"Stata is a powerful statistical analysis software that we often use in empirical research. This series of posts aims to provide some basic knowledge for junior researchers to get started with Stata, as well as some personal tips on using Stata more efficiently in research projects.1
","tags":["Stata"]},{"location":"finc50/stata/introduction/#stata-gui","title":"Stata GUI","text":"The Graphical User Interface (GUI) of Stata looks like this:
This is the default layout of Stata 16. Some preset preferences can be found via menu option Edit>Preferences>Load preference set
. You can also save your personal preferences (color theme, layout, etc.) in the Edit>Preference
menu.
Of the many windows of the GUI, we are mostly interested in the Results window (Ctrl+2
) where outputs are displayed. While all other windows can be hidden or closed, the Results window always remains at the center of the GUI.
The Command window (Ctrl+1
) is where we mostly interact with Stata by entering Stata commands. Since we usually need a group of commands to complete a task, it is a good idea to place them together in a do file (with the extension .do
), which is the file type native to Stata, just like .py
files are to Python. Later, we will introduce Stata's Do Editor (Ctrl+9
) as a nice editor for .do
files.
Tip
As a researcher, keeping good records of the programs and code used is a virtue. For oneself, it boosts productivity as more code accumulates. Beyond that, it ensures all results can be replicated even years later. Nowadays, more and more top journals also require submission of the code used in the paper.
","tags":["Stata"]},{"location":"finc50/stata/introduction/#basic-demonstration","title":"Basic demonstration","text":"Now let's start typing our first Stata command in the Command window.
","tags":["Stata"]},{"location":"finc50/stata/introduction/#working-directory","title":"Working directory","text":"First, let's type pwd
(and hit Enter
), which is a command to display the current working directory. Knowing the current working directory, for example, allows you to use relative paths correctly.
Throughout this series, I'll follow the tradition and prefix all Stata commands with .
, hence . pwd
meaning \"enter the pwd
command in the Command window\":
. pwd\n
From the Results window, we can see a line of text like \"C:\\Users\\mgao\\Documents\", which is the output of executing the pwd
command, i.e., the current working directory of Stata on my PC.
We can change the current working directory to another directory on the computer via the command cd
, for example:
. cd \"C:\\Users\\mgao\\Dropbox (Sydney Uni)\\BUSS7902 Stata\"\n
Then we can verify that it's indeed changed by pwd
again.
The above two examples (pwd
and cd
) already showcase the basic syntax structure of Stata commands. With a few exceptions, a Stata command is like:
. _command_ _parameter1_ _parameter2_ ... , _options_\n
or technically,
cmd [varlist | namelist | anything] [if] [in] [using filename] [= exp] [weight] [, options]
where cmd
is the name of a command and everything in [ ]
is optional.
display
prints a message to the Results window:
. display \"hello world!\"\n
cls
clears the Results window:
. cls\n
clear
clears memory, removing the dataset loaded, if any:
. clear\n
log
echoes a copy of the session to a file:
Create a log file named stata101.log
in the current working directory. The replace
option asks Stata to replace the log file if it already exists. Until . log close
, everything displayed in Results will be saved in the log file.
. log using \"stata101.log\", replace\n. display \"hello from Mingze\"\n. log close\n
cmdlog
is similar to log
but records only the commands, not the results.
Of course, Stata is famous for its superior statistical analysis. Let's see how regressions can be easily done in Stata.
","tags":["Stata"]},{"location":"finc50/stata/introduction/#load-dataset","title":"Load dataset","text":"We start by loading an example dataset that comes with Stata installation. This can be done via sysuse
command. We use the dataset named \"auto\":
. sysuse auto\n
Tip
Stata comes with several builtin datasets. Use . sysuse dir
to have a look.
More generally, we can use
our own datasets. We'll see more on this later.
We can now ask Stata to describe
the meta information of the dataset and summarize
the variables in terms of number of observations, mean, standard deviation, etc.
The screenshot below shows the output:
Enlarge the output of describe
:
. describe\n\nContains data from C:\\Program Files\\Stata16\\ado\\base/a/auto.dta\n obs: 74 1978 Automobile Data\n vars: 12 13 Apr 2018 17:45\n (_dta has notes)\n--------------------------------------------------------------------------------\n storage display value\nvariable name type format label variable label\n--------------------------------------------------------------------------------\nmake str18 %-18s Make and Model\nprice int %8.0gc Price\nmpg int %8.0g Mileage (mpg)\nrep78 int %8.0g Repair Record 1978\nheadroom float %6.1f Headroom (in.)\ntrunk int %8.0g Trunk space (cu. ft.)\nweight int %8.0gc Weight (lbs.)\nlength int %8.0g Length (in.)\nturn int %8.0g Turn Circle (ft.)\ndisplacement int %8.0g Displacement (cu. in.)\ngear_ratio float %6.2f Gear Ratio\nforeign byte %8.0g origin Car type\n--------------------------------------------------------------------------------\nSorted by: foreign\n
","tags":["Stata"]},{"location":"finc50/stata/introduction/#run-regression","title":"Run regression","text":"Suppose we'd like to estimate a simple linear regression to study the relation between car price and mileage, headroom and weight:
\\[ price = \\alpha + \\beta_1 mpg + \\beta_2 headroom + \\beta_3 weight + \\varepsilon \\]All we need to do is a simple line of code,
. regress price mpg headroom weight\n
which would generate the following estimation results:
","tags":["Stata"]},{"location":"finc50/stata/introduction/#save-results","title":"Save results","text":"One of the coolest things Stata can do is to export the tabulated regression results to Microsoft Word, PDF, LaTeX and more.
","tags":["Stata"]},{"location":"finc50/stata/introduction/#save-to-word","title":"Save to Word","text":"For example, we can save the previous results (as shown in the screenshot above) to a Word document named \"table1\" (table1.docx
) easily with the following three lines of code.
. putdocx begin\n. putdocx table mytable = etable\n. putdocx save table1.docx, replace\n
Behind the scenes, Stata creates a .docx
to work with. putdocx table
command creates a new table (mytable) in the .docx
file containing estimation results (etable
tells it to tabulate the coefficients from the previous estimation). Lastly, Stata saves the .docx
file as \"table1.docx\" in the current working directory.
If working with LaTeX, you can export results as TeX files conveniently too.
For this purpose, though, an additional Stata package estout
is required. Personally I'd say this is gold. You can install estout
package via a single command in Stata:
. ssc install estout, replace\n
Now, you can use the following two lines of code:
. eststo: regress price mpg headroom weight\n. esttab using \"table1.tex\", tex replace label star(* 0.10 ** 0.05 *** 0.01) nogaps compress\n
to produce a TeX file (\"table1.tex\") with the following content:
{\n\\def\\sym#1{\\ifmmode^{#1}\\else\\(^{#1}\\)\\fi}\n\\begin{tabular}{l*{1}{c}}\n\\hline\\hline\n&\\multicolumn{1}{c}{(1)}\\\\\n&\\multicolumn{1}{c}{Price}\\\\\n\\hline\nMileage (mpg) & -56.19 \\\\\n& (-0.66) \\\\\nHeadroom (in.) & -675.6\\sym{*} \\\\\n& (-1.72) \\\\\nWeight (lbs.) & 2.062\\sym{***}\\\\\n& (3.13) \\\\\nConstant & 3158.3 \\\\\n& (0.87) \\\\\n\\hline\nObservations & 74 \\\\\n\\hline\\hline\n\\multicolumn{2}{l}{\\footnotesize \\textit{t} statistics in parentheses}\\\\\n\\multicolumn{2}{l}{\\footnotesize \\sym{*} \\(p<0.10\\), \\sym{**} \\(p<0.05\\), \\sym{***} \\(p<0.01\\)}\\\\\n\\end{tabular}\n}\n
You can check the PDF compiled from the above TeX code at this Overleaf link.
Note
We will revisit and elaborate on these topics later. I deliberately make them oversimplified only to show you what Stata can do in making our lives much easier.
I prepare this series of introductory course notes for the BUSS7902 Quantitative Business Research Methods for PhD students at the University of Sydney Business School in Semester 1, 2023.\u00a0\u21a9
If you've ever used Python, you may know that it's famous for its simplicity and the many packages available for use. Good news is, Stata is no different. We can install a wide range of Stata packages easily and then use them to achieve a ton of things.
This post briefly explains where to find, how to install and update Stata packages.
","tags":["Stata"]},{"location":"finc50/stata/packages/#stata-packages_1","title":"Stata packages","text":"In a nutshell, installing and using Stata packages is as simple as the following two lines of code (and a line of output):
. ssc install nicewords\n. nicewords\nAbsolutely excellent!\n
Specifically, we use the builtin command ssc
to install a package named nicewords
in the first line, and then execute the command nicewords
in the second line, which randomly prints some nice words.
Generally, a package can provide one or more Stata commands to use, depending on the complexity of the task it solves.
","tags":["Stata"]},{"location":"finc50/stata/packages/#where-to-find-stata-packages","title":"Where to find Stata packages","text":"","tags":["Stata"]},{"location":"finc50/stata/packages/#ssc-statistical-software-components-archive","title":"ssc
- Statistical Software Components Archive","text":"Stata packages are hosted at the Statistical Software Components (SSC) Archive, which is often called the Boston College Archive and provided by http://repec.org. This explains the example above where we used the command ssc
to manage (install) packages.
We can find recently added packages with . ssc new
, and the top 10 most popular packages on SSC with . ssc hot
. In fact, the top 10 for December 2022 are:
net
- e.g., GitHub","text":"Apart from SSC, some packages are available on other websites like GitHub. A growing trend is that package authors publish their code repositories on GitHub, which contain the development versions of the packages.
","tags":["Stata"]},{"location":"finc50/stata/packages/#how-to-install-and-update-packages","title":"How to install and update packages","text":"ssc install
is pretty much all we need. For example, to install the package reghdfe
:
. ssc install reghdfe\n
For packages outside SSC, we can install them using net
. As an example, I have a package specurve
on GitHub, which can be installed by:
. net install specurve, from(\"https://raw.githubusercontent.com/mgao6767/specurve/master\")\n
To update an existing package, we can add the option replace
to the above command:
. ssc install reghdfe, replace\n. net install specurve, replace from(\"https://raw.githubusercontent.com/mgao6767/specurve/master\")\n
Alternatively, we can use ado update
:
. ado update, update // for community-contributed packages\n. ado update, update ssconly // for SSC only\n
","tags":["Stata"]},{"location":"finc50/stata/packages/#some-packages-of-my-choice","title":"Some packages of my choice","text":"","tags":["Stata"]},{"location":"finc50/stata/packages/#reghdfe-and-ivreghdfe","title":"reghdfe
and ivreghdfe
","text":"reghdfe
is among the top 10 Stata packages as we've seen above. It allows for multiple fixed effects in linear regressions, while the builtin xtreg
allows only one fixed effect. It's gold!
ivreghdfe
is essentially reghdfe
plus ivreg2
, which allows us to include multiple fixed effects in instrumental variable regressions.
estout
and outreg2
","text":"estout
is also a top 10 Stata package that provides tools to make regression tables. We've seen an example from the previous post. I highly recommend, too!
outreg2
does a similar job in a simpler way. Yet if we want finer controls estout
is perhaps better, in my humble opinion.
winsor
and winsor2
","text":"Data is often noisy with extreme values or impossible values recorded by mistake. In some fields of research, we try to mitigate such concern by winsorization. Note that they may yield different results due to their different approaches in determining percentile values.
","tags":["Stata"]},{"location":"finc50/stata/working-with-datasets/","title":"Stata - Working with datasets","text":"Recap
We can use the command sysuse
to use builtin datasets, and use
to load other external datasets.
In the introduction, we briefly mentioned how to load Stata datasets to use. Now, let's take a more in-depth look at how we work with datasets in Stata.
","tags":["Stata"]},{"location":"finc50/stata/working-with-datasets/#datasets-here-and-there","title":"Datasets, here and there","text":"Datasets are stored at different places, locally on our computer's hard disk or remotely on a server. For Stata to use them, we need to load them into Stata, or putting them into memory.
Because Stata commands (e.g., summarize
, describe
, count
, etc.) operate on the current dataset in memory, working simultaneously on multiple datasets was painful -- one needs to save current, load the other dataset, perform tasks and save/load again. But since Stata 16, a feature called frame
is introduced, where different datasets can be loaded into memory at the same time, but in different \"frames\". The chart below gives a simple illustration.
flowchart LR\n subgraph Stata\n direction TB\n Engine\n Engine -->|frame change| default & frame2 & frame3 & ...\n subgraph default\n end\n subgraph frame2\n end\n subgraph ...\n end\n subgraph frame3\n end\n end\n\n\n\n default -->|sysuse| auto.dta\n frame2 -->|use| /User/mgao/Desktop/anotherDataset.dta\n frame3 -->|use| http://www.stata-press.com/data/r13/nlswork.dta\n\n subgraph network\n http://www.stata-press.com/data/r13/nlswork.dta\n end\n subgraph local\n /User/mgao/Desktop/anotherDataset.dta\n end\n subgraph builtin\n auto.dta\n end
Although we still can only operate Stata commands on a single frame/dataset at a given time, we no longer need to save/load datasets as they all reside in memory frames.
Tip
We as beginners can be agnostic about frame
, especially when dealing with only one dataset throughout. Technically, we are loading data into the default
frame (1), and work in the default frame.
default
frame is just a frame named \"default\".dta
","text":"The dta
is Stata's proprietary binary data file format, and is the default file format used by Stata.
What I like very much about the dta
data format include:
dta
files can store different types of variables, including numeric variables (e.g., integers, floats) and string variables (text). It can represent missing values too.dta
files can store metadata, such as variable labels (descriptive names for variables), value labels (labels for specific variable values), and variable formats (e.g., date formats).dta
files created in one version of Stata can generally be read by other versions of Stata, ensuring cross-platform compatibility.Also, dta
files are typically compressed to reduce file size and optimize storage.
More importantly, Stata provides commands (use
, save
, etc.) to read data from dta
files into memory and save data from memory to dta
files. This makes it extremely easy to work with.
For example, you can easily load a Stata dataset online to your Stata via use
:
use \"http://www.stata-press.com/data/r13/nlswork.dta\", clear\n
Note that , clear
option tells Stata to clear the memory (1) in case there is already some other dataset in it.
Alternatively, you can download the nlswork.dta
dataset to your computer, and load it from your local computer:
use \"/Users/mgao/Downloads/nlswork.dta\", clear\n
use \"C:\\Users\\mgao\\Downloads\\nlswork.dta\", clear\n
After some work on the dataset, say, keeping only observations where year
is 88,
keep if year==88\n
we can save the modified dataset either to its original place, overwriting the original dataset, or to a different place, creating a new dataset:
Mac / Linux Windowssave \"/Users/mgao/Downloads/nlswork.dta\", replace\n
save \"C:\\Users\\mgao\\Downloads\\nlswork.dta\", replace\n
","tags":["Stata"]},{"location":"finc50/stata/working-with-datasets/#mighty-csv","title":"Mighty csv
","text":"We all love csv
or \"comma-separated-values\" files. They are simple and readable without requiring any special software.(1) Many datasets are also published online in csv
format.
csv
files, in case you didn't know...What if we want to save a csv
version of the dataset? Easy, we use export
command:
export delimited using \"/Users/mgao/Downloads/nlswork.csv\", replace\n
export delimited using \"C:\\Users\\mgao\\Downloads\\nlswork.csv\", replace\n
Of course, we can import the csv file back into Stata using the import command:
import delimited using \"/Users/mgao/Downloads/nlswork.csv\", clear \n
import delimited using \"C:\\Users\\mgao\\Downloads\\nlswork.csv\", clear\n
In some rare cases where the text file is not delimited/separated by commas, we can manually specify the delimiter. For example, some datasets use \"tab-separated-values\" or tsv format:
import delimited \"path/to/datafile.tsv\", delimiter(tab)\n
","tags":["Stata"]},{"location":"finc50/stata/working-with-datasets/#did-someone-say-excel","title":"Did someone say \"Excel\"?","text":"Stata has you covered: import excel is all you need.
For example, we will next use an Excel spreadsheet named \"BUSS7902 Chapter 4A Lecture (Data).xlsx\".
Before everything, we can ask Stata to describe the file:
. import excel \"~/Downloads/BUSS7902 Chapter 4A Lecture (Data).xlsx\", describe\n Sheet | Range\n -----------------+-----------------\n Magic Box | A1:C101\n Assembly | A1:H76\n Distance | A1:L42\n Insurance+Survey | A1:H1501\n
As shown, the spreadsheet contains four sheets with different names, \"Magic Box\" and so on. Let's say we are interested in the data in the \"Magic Box\" sheet.(1) We can instruct Stata to load data from that sheet and optionally specify the data range within it.(2)
firstrow option to tell Stata to treat the first row as variable names, not values of an observation.
. import excel \"~/Downloads/BUSS7902 Chapter 4A Lecture (Data).xlsx\", firstrow sheet(\"Magic Box\") cellrange(A1:A101) clear
(1 var, 100 obs)
And that's it! Stata will take care of the variable types, etc., and is pretty good at it most of the time.
","tags":["Stata"]},{"location":"finc50/stata/working-with-datasets/#so-you-want-more-frames","title":"So you want moreframe
s?","text":"Tip
This is for the tech-savvy. You almost surely don't need frame.
Okay. So you noticed that every time we import/use a new dataset, we set the clear
option to clear the memory, discarding whatever dataset we are currently working on to make room for the new dataset. This is troublesome. What if we don't want to give up the intermediate results while taking a peek at a different dataset?
We make a new frame
and load the new dataset into the new frame.(1)
Let's have a look first at what frames are currently there:
. frame list\n* default 100 x 1\nNote: Frames marked with * contain unsaved data.\n
","tags":["Stata"]},{"location":"finc50/stata/working-with-datasets/#create-a-new-frame","title":"Create a new frame","text":"We create a new frame
to which a new dataset can be loaded without clearing the existing dataset in the default frame. We can name it whatever we like, say, \"assembly\":
frame create assembly\n
Now, checking the frames again, we can see it is indeed there.
. frame list\n assembly 0 x 0\n* default 100 x 1\nNote: Frames marked with * contain unsaved data.\n
","tags":["Stata"]},{"location":"finc50/stata/working-with-datasets/#change-to-the-new-frame","title":"Change to the new frame","text":"Let's now switch to the newly created \"assembly\" frame, leaving the \"Magic Box\" data untouched in the default frame.
frame change assembly\n
Tip
Forgot which frame you are in?
. frame\n(current frame is assembly)\n
You may notice that the Variables window is now blank, showing that there is no variable in this frame. Rest assured that the x
variable we imported earlier from \"Magic Box\" sheet stays in memory and in the default frame.
We can now load data into this empty frame using the same methods as discussed above.
For example, I can now load the data in the \"Assembly\" sheet into the frame.
. import excel \"~/Downloads/BUSS7902 Chapter 4A Lecture (Data).xlsx\", firstrow sheet(\"Assembly\") cellrange(A1:A76) clear case(lower)\n(1 var, 75 obs)\n
If we check the frames, we can see that now both datasets exist in memory, albeit in two frames.
. frame list\n* assembly 75 x 1\n* default 100 x 1\nNote: Frames marked with * contain unsaved data.\n
To go back to the original frame (named \"default\"), use frame change default
.
This post is just another piece of my serious nonsense. All of a sudden, I wanted to know how many Bitcoins I could have mined since 2012. This is because I've known Bitcoin since its inception in 2009, but have never really put any effort into mining. Instead, I was fascinated by the idea of using distributed (volunteer) computing to solve scientific problems. For example, BOINC and related projects like World Community Grid use computing power donated from around the world to find effective treatments for cancer and HIV/AIDS, low-cost water filtration systems, new materials for capturing solar energy efficiently, etc. I was one of the many volunteers for a long time, even before the genesis block of Bitcoin.
An interesting question is: what if I hadn't donated my computers to volunteer computing, but used them for Bitcoin mining instead? How many Bitcoins could I have mined? To answer this question, I started by looking at my contribution history on the World Community Grid (it's awesome that the full history is available).
According to WCG's website, 7 WCG Points are equal to 1 BOINC credit, which represents 1/100 day of CPU time on a reference computer that does 1,000 MFLOPS based on the Whetstone benchmark.
However, the definition of a BOINC credit was changed to 1/200 day of CPU time in 2010, though WCG's website still says that the total WCG Points divided by 700 gives the number of GigaFLOPs. I'm going to stick with WCG's website for now.
Suppose I've got one WCG Point today; it means my computer has spent 1/700 day of CPU time, i.e. about 123 seconds, at a computing rate of 1 GigaFLOP/second. So, if I can convert GigaFLOPs to Bitcoin hashrate, the problem becomes quite easy.
However, FLOPs cannot be converted to hashrate in a simple manner, as Bitcoin hashing is about integer math, totally different from floating-point operations. I'm just going to use a very rough estimate that 1 hash corresponds to 12.7k FLOPs (source: BitcoinTalk thread, CoinDesk), so that
1 WCG Point implies mining at a speed of 78.7kH/s for 123 seconds. -- a very rough estimate
Then, if I received 1k Points a day, it might be safe to say I\u2019ve been mining for about 123k seconds at a speed of 78.7kH/s, which translates to an average daily hashrate of 112kH/s or 0.112MH/s.
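The back-of-the-envelope conversion above can be sketched in a few lines of Python. The 12.7k FLOPs-per-hash figure is the very rough estimate cited above, and the function name is mine:

```python
SECONDS_PER_DAY = 86_400
FLOPS_PER_SECOND = 1e9   # 1 GFLOPS reference machine
FLOPS_PER_HASH = 12_700  # very rough FLOPs-to-hash estimate from above

def implied_avg_hashrate(points_per_day):
    # 7 WCG Points = 1 BOINC credit, so Points/700 gives days of reference CPU time
    cpu_seconds = points_per_day / 700 * SECONDS_PER_DAY
    hashes = cpu_seconds * FLOPS_PER_SECOND / FLOPS_PER_HASH
    return hashes / SECONDS_PER_DAY  # average over the whole day, in H/s

print(implied_avg_hashrate(1_000))  # ≈ 112,486 H/s, i.e. ~0.112 MH/s
```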
I did some math and found that in June 2012 my hashrate was as high as 0.006% of the whole network, though one year later it was effectively 0%. lol.
Next step will be calculating how many Bitcoins I could have mined based on the hashrate history.
Taking into account the average block time and the controlled supply of Bitcoin (table below), I plot the daily average number of blocks and Bitcoins generated in this period.
| Date reached | Block | Reward Era | BTC/block | End BTC % of Limit |
|---|---|---|---|---|
| 2009-01-03 | 0 | 1 | 50.00 | 12.500% |
| 2010-04-22 | 52500 | 1 | 50.00 | 25.000% |
| 2011-01-28 | 105000 | 1 | 50.00 | 37.500% |
| 2011-12-14 | 157500 | 1 | 50.00 | 50.000% |
| 2012-11-28 | 210000 | 2 | 25.00 | 56.250% |
| 2013-10-09 | 262500 | 2 | 25.00 | 62.500% |
| 2014-08-11 | 315000 | 2 | 25.00 | 68.750% |
| 2015-07-29 | 367500 | 2 | 25.00 | 75.000% |
| 2016-07-09 | 420000 | 3 | 12.50 | 78.125% |
| 2017-06-23 | 472500 | 3 | 12.50 | 81.250% |
| 2018-05-29 | 525000 | 3 | 12.50 | 84.375% |

Based on my average hashrate and the historical network hashrate, the plot below shows how many Bitcoins I could have mined if I didn't donate my computers' computing power to the World Community Grid but to Bitcoin mining: 14.8 Bitcoins!
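Given a hashrate share and the block schedule, the expected daily haul is just share × blocks per day × block reward. A minimal sketch with made-up numbers (the function and its inputs are hypothetical, not my actual calculation code):

```python
BLOCKS_PER_DAY = 144  # one block roughly every 10 minutes

def expected_daily_btc(my_hashrate, network_hashrate, btc_per_block):
    # expected coins = probability of winning each block * blocks per day * reward
    share = my_hashrate / network_hashrate
    return share * BLOCKS_PER_DAY * btc_per_block

# e.g. 0.006% of the network during the 50 BTC/block era
print(expected_daily_btc(0.00006, 1.0, 50.0))  # ≈ 0.43 BTC per day
```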
Okay, problem solved.
If I had really mined these 14.8 Bitcoins, then I'd probably have a shot at becoming a millionaire, if again I could hold them and time the market perfectly. At Bitcoin's highest historical price in Australian dollars, 14.8 Bitcoins are worth roughly 380,505 dollars. Even under the redefined BOINC credit, I still could have mined half of the 14.8 Bitcoins and potentially pocketed 190k dollars.
I\u2019ve also participated in more than just World Community Grid, including some famous ones like SETI@Home and Einstein@Home. Below are two certificates of contributed computing power.
So together I\u2019ve put in about 2.28 quintillion, or 2.28E18, FLOPs into these two projects.
The funny thing is that I put only 348 PetaFLOPs, or 0.348 quintillion FLOPs, into World Community Grid during this entire period.
Hence, if my donation of computing power to SETI@Home and Einstein@Home happened around the same time as my donation to WCG, then I could potentially have mined at least 6 times more Bitcoins. Well, I can't imagine what my life would be like if I had mined 100 Bitcoins, which might be worth $2.5 million.
","tags":["Bitcoin"]},{"location":"posts/accumulator-option-pricing/","title":"Accumulator Option Pricing","text":"An accumulator is a financial derivative that is sometimes known as \"I kill you later\". This post attempts to explain how it is structured and price it via Monte Carlo simulations in Python.
","tags":["Option","Simulation","Python"]},{"location":"posts/accumulator-option-pricing/#1-overview-of-accumulator","title":"1. Overview of Accumulator","text":"Like all derivatives, an accumulator involves two parties, the buyer and the seller, who agree on a strike price that is usually at a discount to the prevailing market price of the underlying security at the time of contract origination.
The accumulator is settled periodically throughout its term. At each settlement:
Let's make up an example to illustrate how it works.
","tags":["Option","Simulation","Python"]},{"location":"posts/accumulator-option-pricing/#21-month-0","title":"2.1. Month 0","text":"Suppose that I bought an accumulator from Sherry the seller today, where the underlying security is TSC (hypothetical ticker), currently trading at $100. The strike price is $90 and the knock-out price is $105. The amount of stocks that I can buy is 1,000 in each settlement. The accumulator lasts for 6 months and settles monthly.
","tags":["Option","Simulation","Python"]},{"location":"posts/accumulator-option-pricing/#22-month-1","title":"2.2. Month 1","text":"At the end of month 1, the market price of TSC is $102, which is between the strike price ($90) and the knock-out price ($105). I can buy 1,000 shares from Sherry at the strike price of $90 each and make a profit of \\((\\$102-\\$90)\\times 1000=\\$12,000\\).
","tags":["Option","Simulation","Python"]},{"location":"posts/accumulator-option-pricing/#23-month-2","title":"2.3. Month 2","text":"At the end of month 2, the market price of TSC is $95, which is between the strike price ($90) and the knock-out price ($105). I can buy 1,000 shares from Sherry at the strike price of $90 each and make a profit of \\((\\$95-\\$90)\\times1000=\\$5,000\\).
","tags":["Option","Simulation","Python"]},{"location":"posts/accumulator-option-pricing/#24-month-3","title":"2.4. Month 3","text":"At the end of month 3, the market price of TSC is $85, which is below the strike price ($90). I have to buy 2,000 shares from Sherry at the strike price of $90, making a loss of \\((\\$90-\\$85)\\times2000=\\$10,000\\).
","tags":["Option","Simulation","Python"]},{"location":"posts/accumulator-option-pricing/#25-month-4","title":"2.5. Month 4","text":"At the end of month 4, the market price of TSC is $88, which is below the strike price ($90). I have to buy 2,000 shares from Sherry at the strike price of $90, making a loss of \\((\\$90-\\$88)\\times2000=\\$4,000\\).
","tags":["Option","Simulation","Python"]},{"location":"posts/accumulator-option-pricing/#26-month-5","title":"2.6. Month 5","text":"At the end of month 5, the market price of TSC is $106, which is above the knock-out price, so the contract is terminated immediately. I cannot make any profit from Sherry any longer.
","tags":["Option","Simulation","Python"]},{"location":"posts/accumulator-option-pricing/#3-some-observations","title":"3. Some Observations","text":"In the example above:
Taking all of these together, we find that the buyer has:
But this is not the full story. Another hidden feature is that while the accumulator is terminated when the share price rises above the knock-out price, the contract does not terminate when the buyer is at a loss; it runs until maturity. So, although the maximum losses of both the buyer and the seller are fixed, they differ significantly and disproportionately.
If so, why would anyone be interested in buying the contract? Potentially it's because the strike is set below the market price, so at the beginning the buyers always feel like they are taking advantage. They may also think that once the price rises above the knock-out level, which might be set slightly higher than the market price, the contract is terminated and they are free of any loss.
However, buyers often underestimate the probability of a price decline and how big an impact it will have on them. The \"I kill you later\" earns its name for a reason.
","tags":["Option","Simulation","Python"]},{"location":"posts/accumulator-option-pricing/#4-some-math","title":"4. Some Math ...","text":"Let's set up some notation: \(S_t\) is the share price at settlement \(t\), \(K\) the strike price, \(K^+\) the knock-out price, \(A\) the number of shares bought at each settlement, and \(c\) the multiplier applied when the share price falls below the strike (2 in our example).
So at each settlement, the payoff matrix conditional on the contract not terminated in the previous settlement is:
| Share Price | Buyer's Payoff | Seller's Payoff |
|---|---|---|
| \(S_t>K^+\) | 0 | 0 |
| \(K\le S_t\le K^+\) | \(A(S_t-K)\ge0\) | \(-A(S_t-K)\le0\) |
| \(S_t<K\) | \(cA(S_t-K)<0\) | \(-cA(S_t-K)>0\) |

However, deriving a closed-form analytical solution is not easy, since there are many settlements in the contract and the total payoff is path-dependent (the knock-out). There is a 2009 conference paper discussing the issue, and the PDF version is available here.
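As a sanity check, the per-settlement payoff can be coded directly. The numbers below replay the worked example (strike 90, knock-out 105, 1,000 shares, doubled quantity below the strike); the function is my own sketch:

```python
def buyer_payoff(price, strike=90, knock_out=105, shares=1000, multiplier=2):
    # per-settlement buyer payoff, conditional on no earlier knock-out
    if price > knock_out:
        return 0  # knocked out: no further payoff
    payoff = shares * (price - strike)
    return payoff if price >= strike else payoff * multiplier

assert buyer_payoff(102) == 12_000   # month 1
assert buyer_payoff(95) == 5_000     # month 2
assert buyer_payoff(85) == -10_000   # month 3 (doubled quantity)
assert buyer_payoff(88) == -4_000    # month 4
assert buyer_payoff(106) == 0        # month 5: knocked out
```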
","tags":["Option","Simulation","Python"]},{"location":"posts/accumulator-option-pricing/#5-a-simulation-approach","title":"5. ... A Simulation Approach","text":"I am going to use Monte Carlo simulations to find the distribution of the buyer's payoff.
","tags":["Option","Simulation","Python"]},{"location":"posts/accumulator-option-pricing/#51-assumptions","title":"5.1. Assumptions","text":"For simplicity, I'm going to make the following assumptions:
Then there are only two variables, \(k\) and \(\sigma\), that I will need to vary!
","tags":["Option","Simulation","Python"]},{"location":"posts/accumulator-option-pricing/#52-core-code","title":"5.2. Core Code","text":"The simulation code I write below leverages Numba to speed up the calculation.
For 1 million simulations per pair of \\((k, \\sigma)\\), it takes about 2 seconds on my laptop with JIT and almost 1 minute without it.
import numpy as np
from collections import OrderedDict
from numba import int32, float32
from numba.experimental import jitclass


@jitclass(OrderedDict({
    'times': int32,
    'strike_price': float32,
    'knock_out_price': float32,
    'volatility': float32
}))
class FastSimulation:

    def __init__(self, times, strike_price, knock_out_price, volatility):
        self.times = times
        self.strike_price = strike_price
        self.knock_out_price = knock_out_price
        self.volatility = volatility

    def run(self):
        np.random.seed(1)
        buyer_payoffs = []
        for i in range(self.times):
            # generate 12 monthly returns from a normal distribution
            # written this way as the size parameter is not supported by numba
            returns = [np.random.normal(loc=0, scale=self.volatility) / 100 + 1
                       for _ in range(12)]
            # convert returns to a price array
            prices = np.asarray(returns).cumprod() * 100
            payoff = 0
            for price in prices:
                # the accumulator is terminated immediately
                if price > self.knock_out_price:
                    break
                payoff += self.buyer_payoff(price)
            buyer_payoffs.append(payoff)
        return buyer_payoffs

    def buyer_payoff(self, share_price):
        \"Buyer payoff conditional on the accumulator not terminated\"
        if share_price > self.knock_out_price:
            return 0
        payoff = 1000 * (share_price - self.strike_price)
        if self.strike_price <= share_price <= self.knock_out_price:
            return payoff
        else:
            return payoff * 2
","tags":["Option","Simulation","Python"]},{"location":"posts/accumulator-option-pricing/#53-results","title":"5.3. Results","text":"Numbers are boring. So here I put two plots showing the distribution of the buyer's payoffs. The Python code to generate the plots is as below (1000 simulations).
import plotly.figure_factory as ff
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot

init_notebook_mode(connected=True)

hist_data, group_labels = [], []
for v in range(1, 6):
    hist_data.append(FastSimulation(times=1_000, strike_price=95,
                                    knock_out_price=105, volatility=v).run())
    group_labels.append(f'Volatility = {v}%')

colors = ['#75b0ec', '#338be3', '#34669c', '#344054', '#161c25']
# Create distplot with curve_type set to 'kde'
fig = ff.create_distplot(hist_data, group_labels, show_hist=False,
                         colors=colors, curve_type=\"kde\")
# Add title
fig['layout'].update(
    title='Accumulator With Strike Price of 95 and Knock-Out Price of 105 | MingzeGao',
    xaxis=dict(title=\"Buyer's Payoff\", range=[-700e3, 300e3]),
    yaxis=dict(title='Probability'))
# Plot!
iplot(fig)
","tags":["Option","Simulation","Python"]},{"location":"posts/accumulator-option-pricing/#531-k5-and-vin-15","title":"5.3.1. \\(k=5\\) and \\(v\\in [1..5]\\)","text":"","tags":["Option","Simulation","Python"]},{"location":"posts/accumulator-option-pricing/#532-k10-and-vin-15","title":"5.3.2. \\(k=10\\) and \\(v\\in [1..5]\\)","text":"","tags":["Option","Simulation","Python"]},{"location":"posts/accumulator-option-pricing/#6-discussion","title":"6. Discussion","text":"Apparently, the accumulator is a very interesting and sometimes evil derivative. From the plots above we can notice several things:
Hence, as a buyer of an accumulator, you win small with low volatility but lose big with high volatility. I don't think any rational investor would want to take the long position. However, we do find exceptions, like CITIC Limited, which lost HK$15 billion in accumulators back in 2008.
","tags":["Option","Simulation","Python"]},{"location":"posts/adding-another-factor-to-principal-agent-model/","title":"Adding Another Factor to Principal-Agent Model","text":"In a traditional principal-agent model, firm output is a function of the agent's effort, and the principal observes only the output, not the agent's effort. The principal carefully designs the agent's compensation package, especially the sensitivity of the agent's pay to firm output, to maximize firm value. Now, what if we add another factor to the relationship between firm output and the agent's effort? How would the optimal pay sensitivity change?
My earlier paper studied this issue by assuming such a factor, organization capital, that substitutes for the agent's effort in improving firm output. I find that if firm output is a function of two substituting factors (one of which is the agent's effort), the optimal sensitivity of the agent's pay to firm output can be either higher or lower, depending on the principal's choice.
To yield this two-way prediction, let's see a simple extension to the standard principal-agent model following Holmstrom and Milgrom (1987), where the principal hires an agent (CEO) to run the firm. We added organization capital (OC) as an additional determinant of firm outcomes, but in fact we can assume any factor, e.g., intellectual property, IT infrastructure, etc., that either strengthens or weakens the relation between firm output and executive effort.
The production function is given by \\(V(a,o)=f(a,o)+\\varepsilon\\), where \\(a\\) is the effort by the agent, \\(o\\) is the firm\u2019s organization capital, and \\(\\varepsilon\\) is random noise.
The agent is paid a wage \\(c(V)\\) and has reservation utility of \\(\\underline U\\). His objective function is given by \\(E\\left[U\\right]=E\\left[u\\left(v\\left(c\\right)-g\\left(a\\right)\\right)\\right]\\).
The function \\(u\\) represents his utility function and \\(v\\) represents his felicity function (i.e., his utility from cash), both increasing and weakly concave.
The functions \\(g\\), \\(u\\) and \\(v\\) are all twice continuously differentiable.
The risk-neutral principal chooses the effort level \\(a\\) and contract \\(c\\) to maximize the expected firm value minus the wage paid to the agent,
\\[ \\max_{c(\\cdot),a} E\\left[V\\left(a,o\\right)-c\\left(V\\left(a,o\\right)\\right)\\right] \\]subject to the individual rationality or participation constraint (IR) and incentive compatibility constraint (IC) as follows:
\\[ E\\left[u\\left(v\\left(c\\left(V\\left(a,o\\right)\\right)\\right)-g\\left(a\\right)\\right)\\right] \\ge \\underline{U} \\\\ a \\in \\arg\\max_{\\hat{a}}E\\left[u\\left(v\\left(c\\left(V\\left(\\hat{a},o\\right)\\right)\\right)-g\\left(\\hat{a}\\right)\\right)\\right] \\]We first consider the case where the optimal effort is determined endogenously. Under the Holmstrom and Milgrom (1987) framework, the following assumptions are made:
Further, Holmstrom and Milgrom (1987) show that the problem is equivalent to a single-period static problem under these assumptions. For simplicity, we also assume a quadratic cost of effort, \\(g(a)=\\frac{1}{2}ga^2\\), so that the principal\u2019s optimization problem becomes:
\\[ \\max_{\\phi,\\theta,a^*} E\\left[V-c\\right] \\]subject to
\\[ E\\left[-e^{-\\eta\\left(c-\\frac{1}{2}ga^{*2}\\right)}\\right] \\ge \\underline{U} \\\\ a^* \\in \\arg\\max_{\\hat{a}} E\\left[-e^{-\\eta\\left(c-\\frac{1}{2}g\\hat{a}^2\\right)}\\right] \\]Substituting in \\(c=\\phi+\\theta V\\) and \\(V(a,o)=f(a,o)+\\varepsilon\\), maximizing the agent\u2019s (negative exponential) utility function is equivalent to maximizing \\(\\phi+\\theta f(a,o)-\\frac{1}{2}ga^2-\\frac{1}{2}\\eta \\theta^2 \\sigma^2\\).
Since \\(f(a,o)=a+(1-a)o\\), the first-order condition of the agent\u2019s objective function with respect to a is given by \\(a^*=\\theta(1-o)\u2044g\\), which implies his effort choice is decreasing in the cost of effort \\(g\\), decreasing in the firm\u2019s organization capital \\(o\\), and increasing in the pay-for-performance sensitivity \\(\\theta\\).
Moreover, his chosen effort is independent of the fixed wage \\(\\phi\\), so that the principal can adjust the fixed pay to satisfy his participation constraint without affecting the incentives. Substituting \\(a^*=\\theta(1-o)\u2044g\\) into the principal\u2019s objective function and setting the participation constraint to bind, the optimal level of pay-for-performance sensitivity is given by:
\\[ \\theta = \\frac{1}{1+\\eta g \\frac{\\sigma^2}{(1-o)^2}} \\]Note
This optimal level of pay-for-performance sensitivity is derived as follows. Substituting \\(c=\\phi + \\theta V\\) and \\(V(a,o)=f(a,o)+\\varepsilon\\) into the agent's objective function of \\(E\\left[U\\right]=E\\left[u\\left(v(c)-g(a)\\right)\\right]\\), where \\(u(x)=-e^{-\\eta x}\\), \\(v(c)=c\\), and \\(\\varepsilon \\sim N(0,\\sigma^2)\\), we obtain:
\\[ E\\left[U\\right] = E\\left[e^{-\\eta \\left(\\phi+\\theta f(a,o)+\\theta\\varepsilon-\\frac{1}{2}ga^2\\right)}\\right] \\\\ = -E\\left[e^{-\\eta \\left(\\phi+\\theta f(a,o)-\\frac{1}{2}ga^2\\right)}\\right] \\times E\\left[e^{-\\eta \\theta \\varepsilon}\\right]\\\\ = -e^{-\\eta \\left(\\phi+\\theta f(a,o)-\\frac{1}{2}ga^2\\right)} \\times e^{\\frac{\\eta^2 \\theta^2 \\sigma^2}{2}} \\\\ = -e^{-\\eta \\left(\\phi+\\theta f(a,o)-\\frac{1}{2}ga^2-\\frac{1}{2}\\eta \\theta^2 \\sigma^2\\right)} \\]The first-order condition (FOC) of the agent with respect to \\(a\\) is given by
\\[ \\frac{\\partial}{\\partial a}\\left(\\phi+\\theta f(a^*,o)-\\frac{1}{2}ga^{*2}-\\frac{1}{2}\\eta\\theta^2\\sigma^2\\right)=0 \\]Since we assume \\(f(a,o)=a+(1-a)o\\), this yields the agent's FOC:
\\[ a^*=\\theta(1-o)/g \\]Setting the participation constraint to bind, we have
\\[ E\\left[U\\right] = -e^{-\\eta\\left(\\phi+\\theta\\left(a^*+(1-a^*)o\\right)-\\frac{1}{2}ga^{*2}-\\frac{1}{2}\\eta\\theta^2\\sigma^2\\right)} = \\underline{U} \\]The above equation implies:
\\[ \\phi + \\theta\\left(a^*+\\left(1-a^*\\right)o\\right)-\\frac{1}{2}ga^{*2}-\\frac{1}{2}\\eta\\theta^2\\sigma^2=w \\]where \\(w\\equiv -\\ln(-\\underline{U})/\\eta\\) is a constant determined by the agent's reservation utility and his coefficient of constant absolute risk aversion. Substituting in \\(a^*=\\theta(1-o)/g\\), we yield
\\[ E\\left[c\\right] = \\phi + \\theta E\\left[V\\right] \\\\ = w+\\frac{\\theta^2(1-o)^2}{2g} +\\frac{1}{2}\\eta\\theta^2\\sigma^2 \\]Thus, by substituting \\(a^*=\\theta(1-o)/g\\) into the principal\u2019s objective function \\(E\\left[V-c\\right]\\), we yield
\\[ a^*+(1-a^*)o-\\left[w+\\frac{\\theta^2(1-o)^2}{2g}+\\frac{1}{2}\\eta\\theta^2\\sigma^2\\right] \\\\ =\\frac{\\theta}{g}(1-o)^2+o-\\left[w+\\frac{\\theta^2(1-o)^2}{2g}+\\frac{1}{2}\\eta\\theta^2\\sigma^2\\right] \\]The principal's FOC with respect to \\(\\theta\\) yields:
\\[ \\theta = \\frac{1}{1+\\eta g \\frac{\\sigma^2}{(1-o)^2}} \\]Other things equal, we can see that the optimal pay-for-performance sensitivity \\(\\theta\\) is decreasing in the firm\u2019s organization capital \\(o\\). Specifically, this substitution effect is from the fact that OC reduces the marginal effect of executive effort on firm outcomes and thus reduces the optimal effort level endogenously.
On the other hand, fixing \(a^*\), the required pay-for-performance sensitivity is \(\theta=(a^* g)/(1-o)\), which is increasing in organization capital \(o\). Thus, to elicit any given level of effort, the incentive compensation must be more high-powered (a \"fixed target action\" as in Edmans and Gabaix (2016)).
The relation between OC and executive pay-for-performance sensitivity depends critically on the optimal level of effort the principal wants to implement:
Therefore, the model offers two empirical predictions. On the one hand, high OC firms may offer higher pay-for-performance sensitivity to induce executive effort. On the other hand, pay-for-performance sensitivity may be reduced in high OC firms as a result of efficiency gains from the substitution of OC for executive effort.
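As a numeric sanity check of these two directions (not part of the original derivation, and all parameter values below are made up), we can plug the agent's best response into the principal's objective and confirm that the closed-form \(\theta\) maximizes it and falls with \(o\), while the \(\theta\) needed to hold \(a^*\) fixed rises with \(o\):

```python
def theta_star(eta, g, sigma, o):
    # endogenous-effort optimum: theta = 1 / (1 + eta*g*sigma^2 / (1-o)^2)
    return 1 / (1 + eta * g * sigma**2 / (1 - o)**2)

def required_theta(a_star, g, o):
    # theta needed to elicit a fixed effort a*: theta = a*g / (1-o)
    return a_star * g / (1 - o)

def principal_objective(theta, eta, g, sigma, o, w=0.0):
    a = theta * (1 - o) / g  # agent's best response (his FOC)
    return a + (1 - a) * o - (w + theta**2 * (1 - o)**2 / (2 * g)
                              + 0.5 * eta * theta**2 * sigma**2)

eta, g, sigma = 2.0, 1.5, 0.4  # hypothetical parameters
grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=lambda t: principal_objective(t, eta, g, sigma, 0.3))
assert abs(best - theta_star(eta, g, sigma, 0.3)) < 5e-3        # closed form is the argmax
assert theta_star(eta, g, sigma, 0.5) < theta_star(eta, g, sigma, 0.3)  # falls with o
assert required_theta(0.2, g, 0.5) > required_theta(0.2, g, 0.3)        # rises with o
```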
Now, coming back to the question at the beginning, adding another factor to the principal-agent model may cause the optimal pay structure to change in either direction, even if such a factor has a directional impact on the relation between firm output and the agent's effort. In our case, the factor reduces the marginal effect of the agent's effort on firm output. But one can easily find many other factors that may increase the marginal effect of the agent's effort and yield similar predictions.
Perhaps, what's also interesting is that, if we know the directional effect of a factor while observing both pay-for-performance sensitivity and the level of such factor, we may be able to infer whether the principal elicits full executive effort at all costs. Paired with firm performance, could this be some indicators of governance or board ability? Seems like some future research questions.
This post is adapted from the online appendix of my JBF paper \"Organization Capital and Executive Performance Incentives\".
","tags":["Organization Capital","Principal-Agent model"]},{"location":"posts/beta-unlevered-and-levered/","title":"Beta - Unlevered and Levered","text":"Beta is a measure of market risk. This post tries to explain the unlevered and levered betas.
","tags":["Beta"]},{"location":"posts/beta-unlevered-and-levered/#unlevered-firm-u","title":"Unlevered Firm u","text":"If a firm has no debt, it's all equity-financed and thus its equity's beta \(\beta_{E}\) equals its asset's beta \(\beta_{A}\). This beta is also the unlevered beta, \(\beta_{\text{unlevered}}\), since it's unaffected by leverage. The unlevered beta measures the market risk exposure of the firm's shareholders. Let's call this firm \(u\). Hence, we have:
\\[\\begin{equation} \\beta_{\\text{unlevered}}=\\beta_E^u=\\beta_A^u \\end{equation}\\]This equality says that in an unlevered firm, the unlevered beta equals its equity beta and its asset beta.
","tags":["Beta"]},{"location":"posts/beta-unlevered-and-levered/#levered-firm-l","title":"Levered Firm l","text":"If the same firm is partly financed by debt, let's call it firm \\(l\\). The asset of the levered firm \\(l\\) is financed by both equity and debt, and hence the asset's market risk is from both equity and debt. The asset's beta is a weighted average of its equity beta and debt beta.
\\[\\begin{equation} \\beta_A^l = \\frac{E}{E+D(1-t)} \\beta_E^l + \\frac{D(1-t)}{E+D(1-t)} \\beta_D^l \\end{equation}\\]\\(\\beta_A^l\\) measures the change in the return on a portfolio of all firm \\(l\\)'s securities (debt and equity) for each additional one percent change in the market return.
This part is not very hard to understand. The beta of a portfolio is the weighted average beta of its constituents. If you believe that debt beta is zero since the value of debt may not be affected by the equity market, then \\(\\beta_D^l=0\\) and the equation (2) can be simplified to:
\\[ \\begin{align} \\beta_A^l &= \\frac{E}{E+D(1-t)} \\beta_E^l \\newline &= \\frac{1}{1+\\frac{D}{E}(1-t)} \\beta_E^l \\end{align} \\]However, this firm's shareholders are now more exposed to the market risk than before, because leverage increases the variation in the payoff to shareholders. This means the equity's beta of this levered firm is higher than the equity's beta of the unlevered firm, i.e. \\(\\beta_E^l>\\beta_E^u\\).
Note that the levered beta \(\beta_{\text{levered}}\) that we talk about refers to \(\beta_E^l\), which is the equity beta of the levered firm \(l\).
","tags":["Beta"]},{"location":"posts/beta-unlevered-and-levered/#unlevered-vs-levered","title":"Unlevered vs Levered","text":"On the other hand, firm \\(u\\) and firm \\(l\\) differ only in capital structure whilst both have the same asset. Let's say we have a portfolio of firm \\(u\\)'s asset and the other portfolio of firm \\(l\\)'s asset, then these two portfolios should have the same expected return and market risk exposure.2 This means the two portfolios have the same beta, implying:
\\[\\begin{equation}\\beta_A^u = \\beta_A^l \\end{equation}\\]If we substitue in the definition of unlevered and levered beta (equation (1) and (4)):
\\[ \\begin{equation} \\beta_{\\text{unlevered}} = \\frac{1}{1+\\frac{D}{E}(1-t)} \\beta_{\\text{levered}} \\end{equation} \\]or
\\[ \\begin{equation} \\beta_{\\text{levered}} = \\left( 1+\\frac{D}{E}(1-t) \\right) \\beta_{\\text{unlevered}} \\end{equation} \\]This is the formula that we use to lever and unlever beta.1
","tags":["Beta"]},{"location":"posts/beta-unlevered-and-levered/#further-clarification","title":"Further Clarification","text":"The equity beta of a firm with debts is levered. To remove the impact of leverage on shareholders' market risk exposure, we need to unlever this beta in order to get the unlevered beta. This unlevered beta is also called the asset beta.
Note that the asset beta is a synonym for unlevered beta. It is not, however, the asset's beta \(\beta_A^l\) when the firm is levered as in equations (2) to (4). This convention is indeed confusing, so throughout this post, I'm using asset's beta to refer to the beta of a portfolio of all securities (debt and equity) of the levered firm.
","tags":["Beta"]},{"location":"posts/beta-unlevered-and-levered/#notations","title":"Notations","text":"This eq.(7) is also named Hamada Equation, where we assumed a zero debt beta. It draws on the Modigliani-Miller theorem on capital structure, and appeared in Prof. Robert Hamada's paper \"The Effect of the Firm's Capital Structure on the Systematic Risk of Common Stocks\" in the Journal of Finance in 1972.\u00a0\u21a9
Modigliani-Miller theorem states that the capital structure should not affect a firm's value.\u00a0\u21a9
Never underestimate what programmers can do.
The code below shows a fully-functioning Bitcoin address generator in obfuscated Python (2.5-2.7), which I saw in an interesting article posted in 2013.
_ =r\"\"\"A(W/2,*M(3*G\n *G*V(2*J%P),G,J,G)+((M((J-T\n )*V((G-S)%P),S,T,G)if(S@(G,J))if(\n W%2@(S,T)))if(W@(S,T);H=2**256;import&h\n ashlib&as&h,os,re,bi nascii&as&k;J$:int(\n k.b2a_hex(W),16);C$:C (W/ 58)+[W%58]if(W@\n [];X=h.new(\"rip em d160\");Y$:h.sha25\n 6(W).digest();I$ d=32:I(W/256,d-1)+\n chr(W%256)if(d>0@\"\"; U$:J(k.a2b_base\n 64(W));f=J(os.urando m(64)) %(H-U(\"AUVRIxl\nQt1/EQC2hcy/JvsA=\"))+ 1;M$Q,R,G :((W*W-Q-G)%P,\n(W*(G+2*Q-W*W)-R)%P) ;P=H-2** 32-977;V$Q=P,L=\n1,O=0:V(Q%W,W,O-Q/W* L,L)if(W@O%P;S,\nT=A(f,U(\"eb5mfvncu6 xVoGKVzocLBwKb/Nst\nzijZWfKBWxb4F5g=\"), U(\"SDra dyajxGVdpPv8DhEI\nqP0XtEimhVQZnEfQj/ sQ1Lg=\"), 0,0);F$:\"1\"+F(W\n [1:])if(W[:1 ]==\"\\0\"@\"\" .join(map(B,C(\n J(W))));K$: F(W +Y(Y(W))[:4]);\n X.update(Y(\"\\4\"+ I(S)+I(T)));B$\n :re.sub(\"[0OIl _]| [^\\\\w]\",\"\",\"\".jo\n in(map(chr,ra nge (123))))[W];print\"Addre\n ss:\",K(\"\\0\"+X.dig est())+\"\\nPrivkey:\",K(\n \"\\x80\"+I(f))\"\"\";exec(reduce(lambda W,X:\nW.replace(*X),zip(\" \\n&$@\",[\"\",\"\",\n\" \",\"=lambda W,\",\")else \"])\n,\"A$G,J,S,T:\"+_))\n
I\u2019ve tested it on Python 2.7 on Ubuntu. Working like a charm.
Warning
Don't use this address! The private key is not private!
","tags":["Bitcoin","Python"]},{"location":"posts/bloomberg-bquant/","title":"Bloomberg BQuant (BQNT)","text":"Bloomberg is developing a new function in the Terminal, called BQuant, BQNT, under the Bloomberg Anywhere license. I happen to be able to test it thanks to a fund manager and find it could be a future way of using Bloomberg Terminal.","tags":["Bloomberg","BQNT","Python","Quant"]},{"location":"posts/bloomberg-bquant/#background","title":"Background","text":"
Bloomberg recently made JupyterLab available inside the Terminal and invited partners to test it out. This function is named BQuant, or BQNT<GO>, which is still under heavy development, but the idea is just great: Jupyter notebooks inside Bloomberg Terminal! Just before this news, I was helping a fund manager write some alert programs that run analyses on the equity market and then send email notifications. It didn\u2019t go well, first because it is very easy to breach the data limit using the Bloomberg API (blpapi), and second because I wasn\u2019t very comfortable with the presentation of the analysis results. I was using poor HTML code in emails and didn\u2019t find a convenient way to insert plots and figures. Besides, I was also writing some back testing code to evaluate potential trading strategies. But there was still a concern: I won\u2019t be working there full time and they probably won\u2019t have a permanent programmer, so if they want to alter parameters a little bit it\u2019ll be a problem.
But things happen. With BQNT, or more specifically the Jupyter notebook, I can make an interactive UI-based application without worrying about the data limit issue, as they also provide a new data retrieval interface: BQL, the Bloomberg Query Language. In the past, pulling data through blpapi basically meant retrieving data from the Terminal. BQL, something like SQL, instead submits the query request to Bloomberg\u2019s server and gets the data directly from the server; it also supports basic calculations so as to further reduce the size of data being pulled out. Then, BQNT comes with pre-installed bqplot and some wrappers of libraries like ipywidgets, which makes visualization much easier and more interactive. As BQNT is a customized JupyterLab, output cells can be maximized and code hidden. The result is just like a single-page application.
The tearsheet above shows some basic features of BQNT, and of course there are more. There\u2019s a gallery in the Terminal with several demos showing what BQNT can make, including portfolio performance report, security filtering, trading strategy back test, etc., quite inspiring.
With a quick play, I was able to write a multi-security back test of a Williams %R based strategy with trailing stop. All input parameters can be varied using sliders, dropdowns, calendars, etc. There is also an autocomplete security selection widget to assist you in defining the universe. Plots and tables can be aligned nicely using HBox and VBox\u2026 So, I\u2019m impressed, really.
I can foresee that in the future, users of Bloomberg Terminal can have BQNT powered applications tailored to their needs. For example, I want to know the stock volatility and price plot together with some commodity futures orderbook info. BQNT may give you the app. But of course, I\u2019ve only a rough guess and there could be many possibles and impossibles ahead of BQNT. I\u2019m a big fan, though.
","tags":["Bloomberg","BQNT","Python","Quant"]},{"location":"posts/bloomberg-bquant/#my-work","title":"My Work","text":"","tags":["Bloomberg","BQNT","Python","Quant"]},{"location":"posts/bloomberg-bquant/#bql-for-data-retrieval","title":"BQL for Data Retrieval","text":"We know there\u2019s a blpapi available already. Using this API one can pull data from a Terminal to Excel, Python, etc. But there is a limit on the frequency or total queries allowed in a certain period, which however isn\u2019t clear. As Bloomberg doesn\u2019t allow local storage of its data, if we need to retrieve a sizeable data too many times, there will be an issue.
The good thing about BQNT is that it comes with a new query system \u2013 the so-called BQL. It allows simple calculations to be done on the server side so as to reduce the size of data transferred. And, people at Bloomberg said, by using BQL we are not very likely to face any data limit issue again. I haven\u2019t done many stress tests, so I can\u2019t tell whether there is still a limit or not.
","tags":["Bloomberg","BQNT","Python","Quant"]},{"location":"posts/bloomberg-bquant/#some-quick-examples","title":"Some Quick Examples","text":"Get all component stocks of an index:
import bql\nbq = bql.Service()\nsecurities = bq.univ.members('AS31 Index')\n
Get OHLC data of all component stocks:
from bql.util import get_time_series\nstart_date = '2017-01-01'\nend_date = '2018-01-01'\ndata = get_time_series(securities, ['PX_LAST', 'PX_OPEN', 'PX_HIGH', 'PX_LOW'], start_date, end_date)\n
If I want to know the industry sector of these stock, all I need is:
req = bql.Request(securities, bq.data.industry_sector())\ndata_industry = bql.combined_df(bq.execute(req))\n
The returned data is a pandas.DataFrame
, which is just awesome!
Jupyter Notebook has always been a favourite environment in data science. No need to say much. A JupyterLab inside Bloomberg Terminal together with BQL, basically the core idea of BQNT, is no doubt fantastic. For quants who need to do a lot of testings on trading ideas, filtering of securities, etc., this integrated environment is absolutely a good place to sort everything out. Moreover, files in BQNT are synced under a BBA license, you can easily pick up your work from any Terminal. In our meeting today, the size of this free cloud storage is said to be about 250MB but may be upgraded.
For fund managers or traders who want only a ready-to-use application, they can have some programmers make one for them. The BQNT team kindly demonstrated a beta feature, where a \u2018consumer view\u2019 can be shared with others, which hides all Jupyter Notebook related parts and is really the final output alone \u2014 just like the Calculator on Windows.
","tags":["Bloomberg","BQNT","Python","Quant"]},{"location":"posts/bloomberg-bquant/#the-r-backtesting-app","title":"The %R Backtesting App","text":"This App I wrote replicates BT<GO> in its back testing outputs, but comes with more flexibility such as trailing stop loss, which isn\u2019t available in BT<GO>. It serves as a demo of BQNT powered application, validating current beta.
The objectives of the app are:
The main UI provides a short description of the trading strategy under back test, followed by a control panel where we can specify benchmark, underlying, time range, % parameters as well as trailing stop loss percentage. I also put a progress bar and status bar below for more immediate feedback.
","tags":["Bloomberg","BQNT","Python","Quant"]},{"location":"posts/bloomberg-bquant/#outputs","title":"Outputs","text":"If the underlying selected is a single security, e.g. CBA AU Equity, the simple back test output is something like below. An InteractiveLinePlot
linked with a subplot to show equity evolution in selection; a LinePlot
for the price series of the security with markers for enters and exits; and a LinePlot
for the %R indicator.
If the underlying selected is an index, e.g. AS31 Index, the back test is performed on each individual component of the index and results are presented below. A KDEPlot
shows the distribution of total return, max return and min return, followed by a ToggleButtons
to show All, Positive only and Negative only. Equity Return by industry sector and the benchmark return are sorted and plotted below.
Then there is the detailed DataGrid
for all calculated metrics of all securities and of each industry sector, just like the output in BT<GO>. Results can be exported to a spreadsheet which will be conveniently stored in the BQNT platform, or the \u2018cloud\u2019 of size 250MB in total. A qualitative summary of this particular back test is provided at the end.
This App is by no means a finished work. I basically tried to mix in as many different things as possible. The end product should be one that provides a condensed and conclusive opinion after each run, considering that its users may be fund managers who do not want to get their hands dirty.
","tags":["Bloomberg","BQNT","Python","Quant"]},{"location":"posts/bloomberg-bquant/#other-thoughts","title":"Other Thoughts","text":"In my chat with Bloomberg BQNT team, I visioned BQNT powered apps may be the future way of using Bloomberg. For one, with more internal integration worked out, like the current one with PORT<GO>, surely users can use these UI-based apps to get jobs done. The good thing is that it can put everything you need together in one place, and only those you need. Once consumer view is rolled out, this will be more evident. They also are developing a scheduling module which will run Notebooks automatically, although at an additional cost.
Another thing I suggested is a marketplace for those BQNT powered apps. Say, I\u2019ve developed a market analysis application on BQNT, maybe I can put it for sale on the marketplace so someone else won\u2019t need to reinvent the wheel. It can also foster a community around BQNT, if any. The only downside is that BQNT is accessible only under BBA licence, which isn\u2019t cheap. Individual programmers / quants may not be able to afford it, and those in big institutions may not have the time and right to build and sell apps on it. This kinda sucks.
I can see the huge potential of BQNT, which, if it operates well, can become the new way of using Bloomberg Terminal \u2014 the learning curve of the Terminal is really too steep for many current and potential users, and they don\u2019t get very much out of it. But if there are many ready-to-use UI-based applications for their customised needs, things will definitely be better. Unfortunately, since BQNT is not open-source and access to it is very limited (BBA licence), I don\u2019t believe there will be an active community and hence a marketplace with a variety of apps.
","tags":["Bloomberg","BQNT","Python","Quant"]},{"location":"posts/call-option-value-from-two-approaches/","title":"Call Option Value from Two Approaches","text":"Suppose today the stock price is \\(S\\) and in one year time, the stock price could be either \\(S_1\\) or \\(S_2\\). You hold an European call option on this stock with an exercise price of \\(X=S\\), where \\(S_1<X<S_2\\) for simplicity. So you'll exercise the call when the stock price turns out to be \\(S_2\\) and leave it unexercised if \\(S_1\\).
","tags":["Option"]},{"location":"posts/call-option-value-from-two-approaches/#1-replicating-portfolio-approach","title":"1. Replicating Portfolio Approach","text":"Case 1 Case 2 Stock Price \\(S_1\\) \\(S_2\\) Option: 1 Call of cost \\(c\\) Exercise? No Yes Payoff (to replicate) 0 \\(S_2-X\\) Stock: \\(\\delta\\) shares of cost \\(\\delta S\\) Payoff \\(\\delta S_1\\) \\(\\delta S_2\\) Borrowing PV(K) Repay K KSo we have:
\\[ \\begin{equation} \\delta S_1-K=0 \\end{equation} \\] \\[ \\begin{equation} \\delta S_2 -K = S_2-X \\end{equation} \\]Therefore, the call option value is given by the difference between the cost of \\(\\delta\\) units of shares and the amount of borrowing:
\\[ \\begin{align} c_{REP} &= \\delta S - PV(K) \\newline &= \\delta S - Ke^{-r_f} \\newline &= \\delta S - \\delta S_1e^{-r_f} \\end{align} \\]When \\(\\delta\\) is defined as \\(\\frac{(S_2-X)-0}{S_2-S_1}\\) as in the textbook (at introductory level),
\\[ \\begin{equation} c_{REP}= \\frac{S_2-X}{S_2-S_1}(S - S_1e^{-r_f}) \\end{equation} \\]","tags":["Option"]},{"location":"posts/call-option-value-from-two-approaches/#2-risk-neutral-approach","title":"2. Risk Neutral Approach","text":"Without too much trouble, we can derive the call value using risk neutral approach as
\\[ \\begin{align} c_{RN} &= \\frac{p(S_2-X)+(1-p)\\times0}{e^{r_f}}\\newline &= \\frac{p(S_2-X)+0}{e^{r_f}}\\newline &= p(S_2-X) e^{-r_f} \\end{align} \\]We know that
\\[ \\begin{equation} p\\times \\frac{S_2}{S} + (1-p)\\frac{S_1}{S} = e^{r_f} \\end{equation} \\]so
\\[ \\begin{align} p &= \\frac{e^{r_f}-\\frac{S_1}{S}}{\\frac{S_2}{S}-\\frac{S_1}{S}}\\newline &=\\frac{Se^{r_f}-S_1}{S_2-S_1} \\end{align} \\]Therefore,
\[ \begin{align} c_{RN} &= p(S_2-X) e^{-r_f}\newline &=\frac{Se^{r_f}-S_1}{S_2-S_1}(S_2-X) e^{-r_f}\newline &=\frac{S-S_1e^{-r_f}}{S_2-S_1}(S_2-X) \end{align} \]","tags":["Option"]},{"location":"posts/call-option-value-from-two-approaches/#identical-result-from-the-two-methods","title":"Identical Result from the Two Methods","text":"It's easy to find that
\\[ c_{RN} = c_{REP} \\]Hence, the call option value from replicating portfolio is the same as from risk neutral approach.
","tags":["Option"]},{"location":"posts/compute-jackknife-coefficient-estimates-in-sas/","title":"Compute Jackknife Coefficient Estimates in SAS","text":"In certain scenarios, we want to estimate a model's parameters on the sample for each observation with itself excluded. This can be achieved by estimating the model repeatedly on the leave-one-out samples but is very inefficient. If we estimate the model on the full sample, however, the coefficient estimates will certainly be biased. Thankfully, we have the Jackknife method to correct for the bias, which produces the Jackknifed coefficient estimates for each observation.
","tags":["SAS","Code","Discretionary Accruals","Jackknife"]},{"location":"posts/compute-jackknife-coefficient-estimates-in-sas/#variable-definition","title":"Variable Definition","text":"Let's start with some variable definitions to help with the explanation.
Variable Definition \\(b(i)\\) the parameter estimates after deleting the \\(i\\)th observation \\(s^2(i)\\) the variance estimate after deleting the \\(i\\)th observation \\(X(i)\\) the \\(X\\) matrix without the \\(i\\)th observation \\(\\hat{y}(i)\\) the \\(i\\)th value predicted without using the \\(i\\)th observation \\(r_i = y_i - \\hat{y}_i\\) the \\(i\\)th residual \\(h_i = x_i(X'X)^{-1}x_i'\\) the \\(i\\)th diagonal of the projection matrix for the predictor space, also called the hat matrix \\(RStudent =\\frac{r_i}{s(i) \\sqrt{1-h_i}}\\) studentized residual \\((X'X)_{jj}\\) the \\((j,j)\\)th element of \\((X'X)^{-1}\\) \\(DFBeta_j = \\frac{b_{j} - b_{(i)j}}{s(i)\\sqrt{(X'X)_{jj}}}\\) the scaled measures of the change in the \\(j\\)th parameter estimate calculated by deleting the \\(i\\)th observation","tags":["SAS","Code","Discretionary Accruals","Jackknife"]},{"location":"posts/compute-jackknife-coefficient-estimates-in-sas/#objective","title":"Objective","text":"Compute the coefficient estiamtes with the \\(i\\)th observation excluded from the sample, i.e. \\(b(i)\\), or the Jackknifed coefficient estimate.
","tags":["SAS","Code","Discretionary Accruals","Jackknife"]},{"location":"posts/compute-jackknife-coefficient-estimates-in-sas/#formula","title":"Formula","text":"From the table above, we can get that the \\(j\\)th Jackknifed coefficient estimate \\(b_{(i)j}\\) without using the \\(i\\)th observation is:
\\[b_{(i)j} = b_j - DFBeta_j \\times s(i) \\sqrt{(X'X)_{jj}} \\]Hence,
\\[b_{(i)j} = b_j - DFBeta_j \\times \\frac{r_i}{RStudent\\times \\sqrt{1-h_i}} \\sqrt{(X'X)_{jj}}\\]The good thing is that PROC REG
produces the coefficient estimate \\(b_j\\) for \\(j=1,2,...K\\), where \\(K\\) is the number of coefficients, and the INFLUENCE
and I
options produce the remaining statistics just enough to compute \\(b(i)\\):
PROC REG
or MODEL
statement Name in the output dataset \\(b_j\\) Outest=
option in PROC REG
<jthVariable>
\\(r_i\\) OutputStatistics=
from INFLUENCE
option in MODEL
statement Residual
\\(RStudent\\) OutputStatistics=
from INFLUENCE
option in MODEL
statement RStudent
\\(h_i\\) OutputStatistics=
from INFLUENCE
option in MODEL
statement HatDiagnol
\\(DFBeta_j\\) OutputStatistics=
from INFLUENCE
option in MODEL
statement DFB_<jthVariable>
\\((X'X)_{jj}\\) InvXPX=
from I
option in MODEL
statement <jthVariable>
","tags":["SAS","Code","Discretionary Accruals","Jackknife"]},{"location":"posts/compute-jackknife-coefficient-estimates-in-sas/#example","title":"Example","text":"","tags":["SAS","Code","Discretionary Accruals","Jackknife"]},{"location":"posts/compute-jackknife-coefficient-estimates-in-sas/#discretionary-accruals","title":"Discretionary accruals","text":"Suppose we want to calculate the firm-level discretionary accruals for each year using the Jones (1991) model and Kothari et al (2005) model. For a firm \\(i\\), we need to first estimate the model for the industry-year excluding firm \\(i\\), then use the coefficient estimates to generate predicted accruals for firm \\(i\\). The firm's discretionary accruals is the actual accruals minus the predicted accruals.
Below is an example PROC REG
that produces three datasets named work.params
, work.outstats
and work.xpxinv
, which contain sufficient statistics to compute the Jackknifed estimates and thus the predicted accruals.
ods listing close; \nproc reg data=work.funda edf outest=work.params;\n /* industry-year regression */\nby fyear sic2;\n /* id is necessary for later matching Jackknifed coefficients to firm-year */\n id key;\n /* Jones Model */\n Jones: model tac = inv_at_l drev ppe / noint influence i;\n /* Kothari Model with ROA */\n Kothari: model tac = inv_at_l drevadj ppe roa / noint influence i;\n ods output OutputStatistics=work.outstats InvXPX=work.xpxinv;\nrun;\nods listing;\n
Full SAS program for estimating 5 different measures of discretionary accruals:
","tags":["SAS","Code","Discretionary Accruals","Jackknife"]},{"location":"posts/compute-weekly-return-from-daily-crsp-data/","title":"Compute Weekly Return from Daily CRSP Data","text":"Computing the weekly returns from the CRSP daily stock data is a common task but may be tricky sometimes. Let's discuss a few different ways to get it done incorrectly and correctly.
TL;DR Take me to the final solution!
Surely -> The solution
","tags":["CRSP","SAS","Code"]},{"location":"posts/compute-weekly-return-from-daily-crsp-data/#incorrect-ways","title":"INCORRECT ways","text":"Let me start with a few incorrect ways, which may seem perfectly okay at first glance. This part is important because it shows you how a small mistake can lead to hard-to-discover bugs.
","tags":["CRSP","SAS","Code"]},{"location":"posts/compute-weekly-return-from-daily-crsp-data/#weekly-index-return-from-daily-data","title":"Weekly index return from daily data","text":"Date as the Friday of the weekDate as the last trading day of the weekUsing intnx()
, we can derive the Friday of the week given a date, as shown below.
proc sql;\n/* Compute weekly marekt return from daily data */\ncreate table mktret_weekly as \nselect distinct date, \n year(date) as Year,\n week(date) as Week,\n case when weekday(date)=6 then date\n else intnx(\"week.6\",date,1) end as FridayOfWeek format=date9.,\n (exp(sum(log(1+sprtrn)))-1)*100 as mktret label=\"Weekly SP500 Index Return (%)\"\nfrom crsp.dsi \nwhere \nyear(date) between &startyear. and &endyear.\ngroup by year(date), week(date) order by date;\nquit;\n
Note that intnx(\"weekday.6\", date, 0)
will give the last Friday, which is not what we want. We want the next Friday of the week for a given date, so we use intnx(\"weekday.6\", date, 1)
. The case...when...
statement ensures that if the given date is already a Friday, we don't go for the next one. Below is a sample output of the mktret_weekly
table generated.
mktret_weekly
Obs Date Year Week FridayOfWeek mktret 1 19860102 1986 0 03JAN1986 -0.1893222 2 19860103 1986 0 03JAN1986 -0.1893222 3 19860106 1986 1 10JAN1986 -2.333080418 4 19860107 1986 1 10JAN1986 -2.333080418 5 19860108 1986 1 10JAN1986 -2.333080418 6 19860109 1986 1 10JAN1986 -2.333080418 7 19860110 1986 1 10JAN1986 -2.333080418 8 19860113 1986 2 17JAN1986 1.1992620931 9 19860114 1986 2 17JAN1986 1.1992620931 We can verify that the FridayOfWeek
indeed gives the Friday of the week. Therefore, the final weekly dataset using Friday as the date identifier just need to keep FridayOfWeek
and mktret
.
proc sql;\n/* Compute weekly marekt return from daily data */\ncreate table mktret_weekly as \nselect distinct\n case when weekday(date)=6 then date else intnx(\"week.6\",date,1) end \nas date format=date9. label=\"Friday of the Week\",\n (exp(sum(log(1+sprtrn)))-1)*100 \nas mktret label=\"Weekly SP500 Index Return (%)\"\nfrom crsp.dsi \nwhere \nyear(date) between &startyear. and &endyear.\ngroup by year(date), week(date) order by date;\nquit;\n
Example output of mktret_weekly
Obs date mktret 1 03JAN1986 -0.1893222 2 10JAN1986 -2.333080418 3 17JAN1986 1.1992620931 4 24JAN1986 -0.959555101 5 31JAN1986 2.5916781551 6 07FEB1986 1.3126828796 %let startyear=1986;\n%let endyear=2019;\nproc sql;\n/* Compute weekly marekt return from daily data */\ncreate table mktret_weekly as \nselect distinct date, \n (exp(sum(log(1+sprtrn)))-1)*100 as mktret label=\"Weekly SP500 Index Return (%)\"\nfrom crsp.dsi where year(date) between &startyear. and &endyear. \ngroup by year(date), week(date) \nhaving date=max(date) \norder by date;\nquit;\n
Note here that it's tempting to use having weekday(date)=6
to make sure the dates are all Friday. However, if Friday in a week is not the last trading day, then the weekly return will be missing. This is why here I use date=max(date)
to ensure non-missing weekly returns. The date is the last trading day in any given week, consistent with the CRSP's daily stock file.
The caveat here is that since the dates are the weekly last trading days, when merged with other weekly datasets, you should be very careful about whether the other dataset is using Friday or the last trading day per week as its date variable.
","tags":["CRSP","SAS","Code"]},{"location":"posts/compute-weekly-return-from-daily-crsp-data/#weekly-stock-return-from-daily-data","title":"Weekly stock return from daily data","text":"Following the same logic, we can calculate the weekly stock returns from daily CRSP data, where dates are aligned to the Friday of the week.
proc sql;\n/* Stocks (ordinary shares only) in the financial sector */\ncreate table stocks as select distinct permno from crsp.stocknames\nwhere shrcd in (10, 11) and floor(siccd/100) between 60 and 67;\n\ncreate table stockrets_weekly as \nselect distinct permno,\n case when weekday(date)=6 then date else intnx(\"week.6\",date,1) end \nas date format=date9. label=\"Friday of the Week\",\n (exp(sum(log(1+ret)))-1)*100 as ret label=\"Weekly Return (%)\"\nfrom crsp.dsf \nwhere \nyear(date) between &startyear. and &endyear.\nand permno in (select * from stocks) \n and prc>0 and not missing(ret)\ngroup by year(date), week(date), permno order by permno, date;\nquit;\n
","tags":["CRSP","SAS","Code"]},{"location":"posts/compute-weekly-return-from-daily-crsp-data/#whats-wrong","title":"What's wrong?","text":"The code above seems okay. We know that CRSP daily stock file contains many observations where the daily trading volume is 0, in which case the price is recorded as the negative bid-ask midpoint. Therefore, we restrict to only those with positive stock prices. So what's the problem?
The problem is that a week can span two calendar years.
For example, check out the last week of 2019:
Mon Tue Wed Thu Fri Sat Sun 30 31 1 2 3 4 5Now we have a mistake. A single week is broken into two because of the use of week()
function in SAS. Another consequence is that when there're many years of data, there will be a lot of duplicates.
Now let's explore two ways that avoid this mistake. Although both generate the same result (there can be a few differences, see the caveat), the second one is much faster.
","tags":["CRSP","SAS","Code"]},{"location":"posts/compute-weekly-return-from-daily-crsp-data/#1-start-with-a-list-of-dates-slow-version","title":"1. Start with a list of dates (slow version)","text":"Now we can write some correct code to compute the weekly returns. We'll generate a series of Fridays first, then we merge based on the past 5 calendar days. This will ensure all trading days with non-missing data will be included in the weekly return calculation, and correct the mistake mentioned above.
%let start_date = 01Jan1986;\n%let end_date = 31Dec2019;\n\n/* Generate a series of Fridays */\ndata fridays;\ndate=\"&start_date\"d;\ndo while (date<=\"&end_date\"d);\n if weekday(date)=6 then output;\n date=intnx('day', date, 1, 's');\nend;\nformat date date9.;\nrun;\n
Weekly index return from daily data (as at Friday)Weekly stock return from daily data (as at Friday) proc sql;\n/* Compute weekly index return from daily data */\ncreate table mktret_weekly as \nselect distinct a.date,\n (exp(sum(log(1+sprtrn)))-1)*100 \nas mktret label=\"Weekly SP500 Index Return (%)\"\nfrom fridays as a left join crsp.dsi as dsi\non dsi.date between intnx('day', a.date, -4) and a.date\ngroup by a.date\norder by a.date;\nquit;\n
Note that this version is inefficient and takes a long time to run.
proc sql;\n/* Stocks (ordinary shares) in the financial sector (2-digit SIC=60-67) */\ncreate table stocks as select distinct permno from crsp.stocknames\nwhere shrcd in (10, 11) and floor(siccd/100) between 60 and 67;\n\n/* Compute weekly stock return from daily data */\ncreate table stockrets_weekly as \nselect distinct a.date, dsf.permno, dsf.hsiccd,\n (exp(sum(log(1+ret)))-1)*100 as ret label=\"Weekly Return (%)\"\nfrom fridays as a left join crsp.dsf as dsf\non dsf.date between intnx('day', a.date, -4) and a.date\n and dsf.permno in (select * from stocks) \n and dsf.prc>0 and not missing(dsf.ret)\ngroup by dsf.permno, a.date\norder by dsf.permno, a.date;\nquit;\n
","tags":["CRSP","SAS","Code"]},{"location":"posts/compute-weekly-return-from-daily-crsp-data/#2-group-using-aligned-dates-fast-version-with-caveat","title":"2. Group using aligned dates (fast version with caveat)","text":"This version uses a similar logic from the previous incorrect one, but it groups based on the aligned dates instead of year(date)
and week(date)
.
proc sql;\n/* Compute weekly stock return from daily data */\ncreate table stockrets_weekly2 as \nselect distinct permno, hsiccd,\n case when weekday(date)=6 then date else intnx(\"week.6\",date,1) end \nas date format=date9. label=\"Friday of the Week\",\n (exp(sum(log(1+ret)))-1)*100 as ret label=\"Weekly Return (%)\"\nfrom crsp.dsf (keep=permno date ret prc shrout hsiccd)\nwhere \n date between \"01Jan1986\"d and \"31Dec2019\"d\n and permno in (select * from stocks) \n and prc>0 and not missing(ret)\ngroup by permno, calculated date order by permno, date;\nquit;\n
Caveat
If the beginning and ending dates, \"01Jan1986\"d and \"31Dec2019\"d
in the example, are not Fridays, then the first and last weekly returns for all stocks will be incorrect, because they are not using all the daily data in those weeks.
To fix this minor issue, simply extend the beginning and ending dates beyond your sample period by a few weeks.
","tags":["CRSP","SAS","Code"]},{"location":"posts/convert-between-numeric-and-character-variables/","title":"Convert Between Numeric and Character Variables","text":"Converting between numeric and character variables is one of the most frequently encountered issues when processing datasets. This article explains how to do this conversion correctly and efficiently.
","tags":["SAS","Stata","Code"]},{"location":"posts/convert-between-numeric-and-character-variables/#numeric-to-character","title":"Numeric to Character","text":"Assume there's an imported dataset named filings
, where cik
is stored as a numeric variable as shown below:
Because cik
is of different digits, to convert the numeric cik
into a character variable, the natural procedure is to pad it with leading zeros. For example, cik
(Central Index Key) itself is a 10-digit number used by SEC.
In SAS, convert numeric variable to string with leading zeros (assuming 10-digit fixed length) is done via PUT()
function:
data filings(drop=cik); set filings;\n cik_char = put(cik, z10.); \nrun;\n
Tip
PUT()
function also works in PROC SQL
.
The generated cik_char
variable is of format and informat $10.
, and the dataset becomes:
In STATA, convert numeric variable to string with leading zeros (assuming 6-digit fixed length) can be achieved via the string()
function.
gen char_var = string(num_var,\"%06.0f\")\n
","tags":["SAS","Stata","Code"]},{"location":"posts/convert-between-numeric-and-character-variables/#character-to-numeric","title":"Character to Numeric","text":"In SAS, converting a character variable to a numeric one uses the INPUT()
function:
var_numeric = input(var_char, best12.);\n
In STATA, this conversion be can be done via either real()
function or destring
command.
gen num_var = real(char_var);\n
The real()
function works on a single variable. destring
command can convert all character variables into numeric in one go.
destring, repalce\n
Warning
If a character variable has non-numeric characters in it, then it will not be converted. In such a case, you may choose to use the encode
command, although it in fact is generating categories.
A more detailed explanation with examples is available at stats.idre.ucla.edu
","tags":["SAS","Stata","Code"]},{"location":"posts/correlated-random-effects/","title":"Correlated Random Effects","text":"Can we estimate the coefficient of gender while controlling for individual fixed effects? This sounds impossible as an individual's gender typically does not vary and hence would be absorbed by individual fixed effects. However, Correlated Random Effects (CRE) may actually help.
At last year's FMA Annual Meeting, I learned this CRE estimation technique when discussing a paper titled \"Gender Gap in Returns to Publications\" by Piotr Spiewanowski, Ivan Stetsyuk and Oleksandr Talavera. Let me recollect my memory and summarize the technique in this post.
","tags":["Econometrics"]},{"location":"posts/correlated-random-effects/#random-intercept-effect-model","title":"Random Intercept (Effect) Model","text":"Consider a random intercept model for a firm-year regression, e.g., to examine the relationship between firm performance, R&D expense, and whether the firm is VC-backed,
\\[ \\begin{equation} y_{it} = \\beta_0 + \\beta_1 x_{it} + \\beta_2 c_i + \\mu_i + \\varepsilon_{it} \\end{equation} \\]where,
We can estimate \\(\\beta_0\\), \\(\\beta_1\\), \\(\\beta_2\\) and \\(\\mu_i\\). Assuming that we've properly controlled for observable firm characteristics, \\(\\beta_1\\) tells the relationship between R&D expenditure and firm performance. \\(\\beta_2\\) tells the difference in firm performance between VC-backed and non-VC-backed firms.
However, we cannot rely on \\(\\beta_2\\) to assert whether VC-back firms have better or worse performance. The drawback here is that we are unable to exhaustively control for all other time-invariant firm attributes that correlate with both firm performance and VC investment, thereby leading to biased \\(\\beta_2\\) estimate due to omitted variables.
Similarly, our estimate of \\(\\beta_1\\) may also be biased if some omitted firm-specific and time-invariant attributes correlate with both R&D expenditure and firm performance.
","tags":["Econometrics"]},{"location":"posts/correlated-random-effects/#fixed-effect-model","title":"Fixed Effect Model","text":"If we subtract the \"between\" model
\\[ \\begin{equation} \\bar{y}_{i} = \\beta_0 + \\beta_1 \\bar{x}_{i} + \\beta_2 c_i + \\mu_i + \\bar{\\varepsilon}_{i} \\end{equation} \\]from Equation (1), we have the fixed effect model in the demeaned form:
\\[ \\begin{equation} (y_{it} - \\bar{y}_i) = \\beta_1 (x_{it}-\\bar{x}_i) + (\\varepsilon_{it} - \\bar{\\varepsilon}_{i}) \\end{equation} \\]The fixed effect model above removes the firm-level error \\(\\mu_i\\) so that the within effect (or fixed effect) estimate of \\(\\beta_1\\) is unbiased even if \\(E(\\mu_i|x_{it}) \\ne 0\\). This helps a lot and is why most of the time we control for firm fixed effects when estimating firm-year regressions.
However, the firm-level variable \\(c_i\\) is also removed. It is now impossible to estimate \\(\\beta_2\\) as in the random intercept model. In fact, we can no longer estimate the effect of any firm-level time-invariant attributes after controlling for firm fixed effects.
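To see the within transformation at work, here is a small simulation sketch in Python (my own illustration, not part of the original post; all numbers are synthetic): when the firm effect \\(\\mu_i\\) correlates with \\(x_{it}\\), pooled OLS is biased, but the demeaned (within) estimator recovers \\(\\beta_1\\).

```python
import numpy as np

rng = np.random.default_rng(0)
n_firms, n_years = 500, 8
beta1 = 2.0

# Firm effect mu_i is deliberately correlated with x_it, so E(mu_i | x_it) != 0
mu = rng.normal(size=n_firms)
x = mu[:, None] + rng.normal(size=(n_firms, n_years))
y = beta1 * x + mu[:, None] + rng.normal(size=(n_firms, n_years))

# Pooled OLS is biased because x correlates with the omitted mu
b_pooled = np.polyfit(x.ravel(), y.ravel(), 1)[0]

# Within (fixed effect) estimator: demean x and y within each firm
xd = (x - x.mean(axis=1, keepdims=True)).ravel()
yd = (y - y.mean(axis=1, keepdims=True)).ravel()
b_within = (xd @ yd) / (xd @ xd)

print(b_pooled, b_within)  # within estimate is close to 2.0; pooled OLS is biased upward
```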
","tags":["Econometrics"]},{"location":"posts/correlated-random-effects/#hybrid-model","title":"Hybrid Model","text":"So, how to estimate both \\(\\beta_2\\) when firm fixed effects are controlled for?
The same question, paraphrased, is how to estimate the within effect in a random intercept model.
Interestingly, we can decompose the firm-year level variable \\(x_{it}\\) into two components, a between component \\(\\bar{x}_i\\) and a within (cluster-demeaned) component \\((x_{it}-\\bar{x}_i)\\), so that
\\[ \\begin{equation} y_{it} = \\beta_0 + \\beta_1 (x_{it}-\\bar{x}_i) + \\beta_2 c_i + \\beta_3 \\bar{x}_i + \\mu_i + \\varepsilon_{it} \\end{equation} \\]It is apparent that the \\(\\beta_1\\) estimate gives the within effect as in the fixed effect model, identical to \\(\\beta_1\\) in Equation (3).
Moreover, the firm-level variable \\(c_i\\) is kept in the model and we can estimate \\(\\beta_2\\). The inclusion of the cluster mean \\(\\bar{x}_i\\) corrects the estimate of \\(\\beta_2\\) for between-cluster differences in \\(x_{it}\\). Note, however, that for the \\(\\beta_2\\) estimate to be unbiased, we still require \\(E(\\mu_i|x_{it},c_i)=0\\) and \\(\\mu_i|x_{it},c_i \\sim N(0,\\sigma^2_\\mu)\\).
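A minimal Python sketch of Equation (4) on simulated data (my own illustration; the data-generating process and variable names are made up) shows that both the within effect \\(\\beta_1\\) and the firm-level effect \\(\\beta_2\\) are recovered:

```python
import numpy as np

rng = np.random.default_rng(1)
n_firms, n_years = 500, 8
beta1, beta2 = 2.0, 1.5

mu = rng.normal(size=n_firms)                  # firm effect, correlated with x below
c = (rng.random(n_firms) < 0.5).astype(float)  # time-invariant dummy, e.g. VC-backed
x = mu[:, None] + rng.normal(size=(n_firms, n_years))
y = beta1 * x + beta2 * c[:, None] + mu[:, None] + rng.normal(size=(n_firms, n_years))

xbar = x.mean(axis=1, keepdims=True)
# Hybrid regression: y on (x - xbar), c, xbar and a constant, as in Equation (4)
X = np.column_stack([
    (x - xbar).ravel(),
    np.repeat(c, n_years),
    np.broadcast_to(xbar, x.shape).ravel(),
    np.ones(x.size),
])
coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
b1_within, b2_hat = coef[0], coef[1]
print(b1_within, b2_hat)  # close to beta1 and beta2
```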
","tags":["Econometrics"]},{"location":"posts/correlated-random-effects/#correlated-random-effect-model","title":"Correlated Random Effect Model","text":"A related model is correlated random effect model (Wooldridge 2010) that relaxes the assumption of zero correlation between the firm-level error \\(\\mu_i\\) and firm-year variable \\(x_{it}\\). Specifically, it assumes that \\(\\mu_i=\\pi\\bar{x}_i + v_i\\), so Equation (1) becomes
\\[ \\begin{align} y_{it} &= \\beta_0 + \\beta_1 x_{it} + \\beta_2 c_i + \\mu_i + \\varepsilon_{it} \\\\ &= \\beta_0 + \\beta_1 x_{it} + \\beta_2 c_i + \\pi\\bar{x}_i + v_i + \\varepsilon_{it} \\end{align} \\]By including the cluster mean \\(\\bar{x}_i\\), we can account for the correlation between the random effects \\(\\mu_i\\) and the independent variable \\(x_{it}\\) and obtain consistent estimates of the coefficients. The inclusion of \\(\\bar{x}_i\\) in the random intercept (effect) model makes the estimate for \\(\\beta_1\\) the same within effect (fixed-effect) estimate as in Equation (4).
Of course, as the time-invariant firm-specific attribute \\(c_i\\) remains in the model, we can estimate \\(\\beta_2\\) as in the hybrid model.
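As a quick numerical check (again my own sketch, not from the post): in a balanced panel, regressing \\(y_{it}\\) on \\(x_{it}\\) in levels plus the cluster mean \\(\\bar{x}_i\\) yields exactly the within (fixed effect) estimate of \\(\\beta_1\\) — an algebraic identity, not just an approximation.

```python
import numpy as np

rng = np.random.default_rng(2)
n_firms, n_years = 300, 6
mu = rng.normal(size=n_firms)
x = mu[:, None] + rng.normal(size=(n_firms, n_years))
y = 2.0 * x + mu[:, None] + rng.normal(size=(n_firms, n_years))

# Within (fixed effect) estimate via demeaning
xd = (x - x.mean(axis=1, keepdims=True)).ravel()
yd = (y - y.mean(axis=1, keepdims=True)).ravel()
b_fe = (xd @ yd) / (xd @ xd)

# CRE (Mundlak) regression: y on x in levels plus the cluster mean of x
xbar = np.broadcast_to(x.mean(axis=1, keepdims=True), x.shape).ravel()
X = np.column_stack([x.ravel(), xbar, np.ones(x.size)])
coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
b_cre = coef[0]

print(b_fe, b_cre)  # identical up to floating-point error
```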
","tags":["Econometrics"]},{"location":"posts/correlated-random-effects/#estimation","title":"Estimation","text":"Note that there are many caveats for estimating CRE.
To be discussed.
","tags":["Econometrics"]},{"location":"posts/correlated-random-effects/#further-readings","title":"Further Readings","text":"This post is based on Within and between Estimates in Random-Effects Models: Advantages and Drawbacks of Correlated Random Effects and Hybrid Models.
Some other suggested readings include:
A note on the missing codes in CRSP.
","tags":["CRSP"]},{"location":"posts/crsp-missing-codes/#codes","title":"Codes","text":"Variable Name C/Fortran Value SAS Missing Code DescriptionRET
-44.0 .E No valid comparison for an excess return -55.0 .D No listing information -66.0 .C No valid previous price -77.0 .B Off-exchange -88.0 .A Out of Range -99.0 . No valid price DLRET
-55.0 .S CRSP has no source to establish a value after delisting -66.0 .T More than 10 trading periods between a security's last price and its first available price on a new exchange -88.0 .A Security is still active -99.0 .P Security trades on a new exchange after delisting, but CRSP currently h as no sources to gather price information DLRETX
-55.0 .S CRSP has no source to establish a value after delisting -66.0 .T More than 10 trading periods between a security's last price and its first available price on a new exchange -88.0 .A Security is still active -99.0 .P Security trades on a new exchange after delisting, but CRSP currently has no sources to gather price information DCLRDT
0 . Declaration date cannot be found TRTSCD
0 . Unknown trading status of the issue NMSIND
0 . Unknown whether or not the issue is a member of the Nasdaq National Market VOL
-99.0 .","tags":["CRSP"]},{"location":"posts/crsp-missing-codes/#note","title":"Note","text":"PRC
: A positive amount is an actual close for the trading date while a negative amount denotes the average between BIDLO
and ASKHI
.Units:
SHROUT
in thousands.VOL
: The sum of the trading volumes during the month, reported in units of 100, and are not adjusted for splits during the month.https://wrds-www.wharton.upenn.edu/demo/crsp/form/
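When working with raw CRSP extracts that still carry these C/Fortran sentinel values, one can recode them to proper missing values before computing statistics. A minimal pandas sketch (the column name RET and the toy data are my own assumptions); since a genuine return can never be below -100%, the sentinels are unambiguous:

```python
import numpy as np
import pandas as pd

# RET sentinel values per the table above
RET_MISSING_CODES = [-44.0, -55.0, -66.0, -77.0, -88.0, -99.0]

df = pd.DataFrame({"RET": [0.012, -99.0, -0.03, -66.0, 0.05]})
# Recode the sentinels to NaN so they don't contaminate averages
df["RET"] = df["RET"].where(~df["RET"].isin(RET_MISSING_CODES), np.nan)
print(df["RET"].mean())  # mean over the three genuine returns only
```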
","tags":["CRSP"]},{"location":"posts/decomposing-hhi-index/","title":"Decomposing Herfindahl\u2013Hirschman (HHI) Index","text":"Herfindahl\u2013Hirschman (HHI) Index is a well-known market concentration measure determined by two factors:
Intuitively, a hundred similar-sized gas stations in town make for a far less concentrated environment than just one or two; and when the number of firms is held constant, their size distribution (variance) determines the magnitude of market concentration.
Since these two properties jointly determine the HHI measure of concentration, naturally we want a decomposition of HHI that reflects these two dimensions respectively. This is particularly useful when two distinct markets have the same level of HHI, but the concentration results from different sources. Note that these two markets do not necessarily have to be industry A versus industry B; they can be the same industry niche in two geographical areas, for example.
Thus, we can think of HHI as the sum of the actual market state's deviation from 1) all firms having the same size, and the deviation from 2) a fully competitive environment with infinite number of firms in the market. Some simple math can solve our problem.
","tags":["HHI"]},{"location":"posts/decomposing-hhi-index/#some-math","title":"Some math","text":"Let's say in a market ther are \\(n\\) firms sized \\(x_1, x_2, ... x_n\\), thus we can describe the market using a \\(\\mathbb R_+^n\\) vector:
\\[ \\mathbf{x}=(x_1, x_2, ... x_n) \\]In the first scenario where all firms' sizes are equal, we can describe it with:
\\[ \\mathbf{\\bar{x}}=(\\bar{x}, \\bar{x}, ... \\bar{x}) \\]where \\(\\bar{x}=\\frac{1}{n} \\sum_{i=1}^{n}x_i\\) is the average firm size.
The Euclidean distance between the point \\(\\mathbf{x}\\) and \\(\\mathbf{\\bar{x}}\\), denoted as \\(d(\\mathbf{x}, \\mathbf{\\bar{x}})\\), is thus
\\[ d(\\mathbf{x}, \\mathbf{\\bar{x}})=\\sqrt{ \\sum_{i=1}^{n} x_{i}^2 - n \\bar{x}^2 } \\]For the ease of discussion, let's consider the other spectrum of the second scenario where there's only one firm in the market instead of infinite firms, assuming its size is the sum of all firms in the first scenario (i.e. its size is \\(n\\bar{x}\\)), we know that this market is the most concentrated state, \\(\\mathbf{x^*}\\). In other words, its distance to the market state in scenario one is the largest.
\\[ \\max_{x} d(\\mathbf{x}, \\mathbf{\\bar{x}})=d(\\mathbf{x^*}, \\mathbf{\\bar{x}}) = ... = \\sqrt{ (n-1)n \\bar{x}^2 } \\]Hence, the distance of any market state \\(\\mathbf{x}\\) to the first scenario, the equidistribution point \\(\\mathbf{\\bar{x}}\\), should lie between \\(0\\) to \\(d(\\mathbf{x^*}, \\mathbf{\\bar{x}})\\).
Thus we can derive a relative index of concentration (when \\(n>1\\)) as \\(\\tau\\):
\\[ \\tau=\\frac{ d(\\mathbf{x}, \\mathbf{\\bar{x}}) }{ d(\\mathbf{x^*}, \\mathbf{\\bar{x}}) } \\in [0, 1] \\]Now, given the definition of Herfindahl-Hirschman Index \\(H\\) that
\\[ H=\\sum_{i=1}^{n} (\\frac{x_i}{n\\bar{x}})^2 \\]we can get:
\\[ \\tau=\\sqrt{\\frac{n}{n-1}(H-\\frac{1}{n})} = \\sqrt{\\frac{nH-1}{n-1}} \\]Here comes the important implications. Recall that \\(\\tau\\) represents the ratio of the distance between a market state and the equidistribution point to the maximum possible distance given a total market size of \\(n\\bar{x}\\).
When we observe a market state \\(\\mathbf{x}=(x_1, x_2, ... x_n)\\) at a given time, the total market size is fixed and thus \\(\\tau\\) is only varying with the distance between the observed actual market state and the equidistribution state where all firms have the same size. This implies that \\(\\tau\\) could be a measure of the first determinant of market concentration, i.e. the size distribution (variance) of firms.
Further, \\(\\tau\\) represents a sequence of functions whose limit is \\(\\sqrt{H}\\) as \\(n \\to +\\infty\\), when the market is in a fully competitive environment. Thus, given a \\(H'\\) from the knowledge of \\(n'\\) and \\(\\mathbf{x'}\\), we know there is one and only one matching \\(\\tau'\\) and its limit of \\(\\sqrt{H'}\\) in the fully competitive environment.
The graph below shows that \\(H\\) can therefore be decomposed into two components, that is
\\[ H = E_i + E_n \\]where \\(E_i = \\tau^2\\), and \\(E_n = H-\\tau^2\\).
We mentioned before that \\(\\tau\\) can be measure of the market concentration resulted from the size distribution (variance) of firms, such that \\(E_i=\\tau^2\\) can be an even better one since it's smaller than \\(H\\), which enables us to measure the concentration contributed from the number of firms, measured by \\(E_n\\).
This decomposition is appealing also in that \\(E_n\\), from the graph above, effectively is the horizontal difference between the two curves, i.e. the 'distance' between the actual market state and the fully competitive market with infinite number of firms (scenario two).
Thus, it's safe to say this decomposition produces two components explaining the observed market concentration, 1) \\(E_i\\), the inequality of firm sizes effect, and 2) \\(E_n\\), the number of firms effect.
Another finding from the graph is that with higher market concentration measured by \\(H\\), the relative importance of the two components is changing.
When \\(H\\) is small, most of the concentration is resulted from \\(E_n\\) as highlighted below, which means the number of firms has a greater impact on market concentration.
When \\(H\\) is larger, on the other hand, \\(E_i\\) contributes more to \\(H\\), which means the firm size inequality plays a bigger role in market concentration.
A potential implication for regulators who are concerned about market concentration, I think, is to 1) focus more on reducing the entry barrier if the current concentration level is moderate, and to 2) focus more on antitrust if the concentration level is already high.
Another implication for researchers is that even though \\(H \\in [\\frac{1}{n}, 1]\\) is affected by the number of firms in a market, we should not attempt to use the \\(\\text{normalized HHI}=\\frac{H-1/n}{1-1/n} \\in [0,1]\\). The reason is now very simple and clear -- the normalized HHI is nothing but \\(E_i=\\tau^2\\), which reflects only the market concentration due to the inequality of firm sizes. When we compare across markets or the same market over time, apparently a market with 1,000 firms has a different competitive landscape than a market with only 2 firms.
","tags":["HHI"]},{"location":"posts/decomposing-hhi-index/#acknowledgement","title":"Acknowledgement","text":"This post is largely a replicate of the paper \"A Decomposition of the Herfindahl Index of Concentration\" by Giacomo de Gioia in 2017.
","tags":["HHI"]},{"location":"posts/docker-nginx-letsencrypt/","title":"Setup Docker/Ngnix and Let's Encrypt on Ubuntu","text":"This is a note for setting up a Docker, Nginx and Let's Encrypt environment on Ubuntu 20.04 LTS.
","tags":["Docker","Nginx","WordPress"]},{"location":"posts/docker-nginx-letsencrypt/#create-a-ubuntu-2004-lts-instance","title":"Create a Ubuntu 20.04 LTS instance","text":"","tags":["Docker","Nginx","WordPress"]},{"location":"posts/docker-nginx-letsencrypt/#install-docker-using-the-convenience-script","title":"Install Docker using the convenience script","text":"$ curl -fsSL https://get.docker.com -o get-docker.sh\n$ sudo sh get-docker.sh\n
","tags":["Docker","Nginx","WordPress"]},{"location":"posts/docker-nginx-letsencrypt/#manage-docker-as-a-non-root-user","title":"Manage Docker as a non-root user","text":"If you don't want to preface the docker
command with sudo
, create a Unix group called docker
and add users to it. When the Docker daemon starts, it creates a Unix socket accessible by members of the docker
group.
To create the docker
group and add your user:
docker
group.$ sudo groupadd docker\n
docker
group.$ sudo usermod -aG docker $USER\n
Log out and log back in so that your group membership is re-evaluated.
On Linux, you can also run the following command to activate the changes to groups:
$ newgrp docker
","tags":["Docker","Nginx","WordPress"]},{"location":"posts/docker-nginx-letsencrypt/#configure-docker-to-start-on-boot","title":"Configure Docker to start on boot","text":"$ sudo systemctl enable docker\n
To disable this behavior, use disable
instead.
$ sudo systemctl disable docker\n
","tags":["Docker","Nginx","WordPress"]},{"location":"posts/docker-nginx-letsencrypt/#install-docker-compose","title":"Install Docker Compose","text":"On Linux, you can download the Docker Compose binary from the Compose repository release page on GitHub. Follow the instructions from the link, which involve running the curl
command in your terminal to download the binaries. These step-by-step instructions are also included below.
$ sudo curl -L \"https://github.com/docker/compose/releases/download/1.25.5/docker-compose-$(uname -s)-$(uname -m)\" -o /usr/local/bin/docker-compose\n
Note
To install a different version of Compose, substitute 1.25.5
with the version of Compose you want to use.
$ sudo chmod +x /usr/local/bin/docker-compose\n
Note
If the command docker-compose
fails after installation, check your path. You can also create a symbolic link to /usr/bin
or any other directory in your path. For example:
$ sudo ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose\n
","tags":["Docker","Nginx","WordPress"]},{"location":"posts/docker-nginx-letsencrypt/#set-up-nginx-proxy","title":"Set up Nginx-Proxy","text":"Create a unique network for nginx-proxy and other Docker containers to communicate through.
$ docker network create nginx-proxy\n
Create a directory nginx-proxy
for the compose file.
$ mkdir nginx-proxy && cd nginx-proxy\n
In the nginx-proxy directory, create a new file named docker-compose.yml
and paste in the following text:
docker-compose.yml
for nginx-proxy version: '3'\nservices:\nnginx:\nimage: nginx\nrestart: always\ncontainer_name: nginx-proxy\nports:\n- \"80:80\"\n- \"443:443\"\nvolumes:\n- conf:/etc/nginx/conf.d\n- vhost:/etc/nginx/vhost.d\n- html:/usr/share/nginx/html\n- certs:/etc/nginx/certs\nlabels:\n- \"com.github.jrcs.letsencrypt_nginx_proxy_companion.nginx_proxy=true\"\ndockergen:\nimage: jwilder/docker-gen\nrestart: always\ncontainer_name: nginx-proxy-gen\ndepends_on:\n- nginx\ncommand: -notify-sighup nginx-proxy -watch -wait 5s:30s /etc/docker-gen/templates/nginx.tmpl /etc/nginx/conf.d/default.conf\nvolumes:\n- conf:/etc/nginx/conf.d\n- vhost:/etc/nginx/vhost.d\n- html:/usr/share/nginx/html\n- certs:/etc/nginx/certs\n- /var/run/docker.sock:/tmp/docker.sock:ro\n- ./nginx.tmpl:/etc/docker-gen/templates/nginx.tmpl:ro\nletsencrypt:\nimage: jrcs/letsencrypt-nginx-proxy-companion\nrestart: always\ncontainer_name: nginx-proxy-le\ndepends_on:\n- nginx\n- dockergen\nenvironment:\nNGINX_PROXY_CONTAINER: nginx-proxy\nNGINX_DOCKER_GEN_CONTAINER: nginx-proxy-gen\nvolumes:\n- conf:/etc/nginx/conf.d\n- vhost:/etc/nginx/vhost.d\n- html:/usr/share/nginx/html\n- certs:/etc/nginx/certs\n- /var/run/docker.sock:/var/run/docker.sock:ro\nvolumes:\nconf:\nvhost:\nhtml:\ncerts:\nnetworks:\ndefault:\nexternal:\nname: nginx-proxy\n
Inside of the nginx-proxy
directory, use the following curl
command to copy the developer\u2019s sample nginx.tmpl
file to your VPS.
$ curl https://raw.githubusercontent.com/jwilder/nginx-proxy/master/nginx.tmpl > nginx.tmpl\n
Increase upload file size
To increase the maximum upload size, for example, add client_max_body_size 100M;
to the server{}
section in the nginx.tmpl
template file. For WordPress,
Running nginx-proxy
.
$ docker-compose up -d\n
","tags":["Docker","Nginx","WordPress"]},{"location":"posts/docker-nginx-letsencrypt/#add-a-wordpress-container","title":"Add a WordPress container","text":"Create a directory for the docker-compose.yml
with:
docker-compose.yml
for WordPress container version: \"3\"\nservices:\ndb_node_domain:\nimage: mysql:5.7\nvolumes:\n- db_data:/var/lib/mysql\nrestart: always\nenvironment:\nMYSQL_ROOT_PASSWORD: somewordpress\nMYSQL_DATABASE: wordpress\nMYSQL_USER: wordpress\nMYSQL_PASSWORD: wordpress\ncontainer_name: wp_test_db\nwordpress:\ndepends_on:\n- db_node_domain\nimage: wordpress:latest\nexpose:\n- 80\nrestart: always\nenvironment:\nVIRTUAL_HOST: blog.example.com\nLETSENCRYPT_HOST: blog.example.com\nLETSENCRYPT_EMAIL: foo@example.com\nWORDPRESS_DB_HOST: db_node_domain:3306\nWORDPRESS_DB_USER: wordpress\nWORDPRESS_DB_PASSWORD: wordpress\ncontainer_name: wp_test\nvolumes:\ndb_data:\nnetworks:\ndefault:\nexternal:\nname: nginx-proxy\n
To create a second WordPress container, add MYSQL_TCP_PORT
environment variable and set it to a different port.
Enter the bash of the WordPress container.
$ docker exec -t wordpress_container_name bash\n
Move inside your /var/www/html directory (already there if you\u2019re using the standard Docker Compose image). Run the following command to insert the values.
$ sed -i '/^# END WordPress.*/i php_value upload_max_filesize 256M\\nphp_value post_max_size 256M' .htaccess\n
Note
To restore the values, run $ sed -i \"11,12d\" .htaccess
The Wharton Research Data Services (WRDS) allows one to submit and execute SAS programs to the cloud. WRDS has an instruction on accessing WRDS data from SAS on our own PCs. Generally, you should use:
%let wrds=wrds-cloud.wharton.upenn.edu 4016;\noptions comamid=TCP remote=WRDS;\nsignon username=_prompt_;\n\nrsubmit;\n\n/* Code for remote execution goes here. */\nendrsubmit;\nsignoff;\n
However, if you want to save the effort of entering username and password every time, you'll need to encode your password. Concluding the two articles, basically you just need to follow the steps below.
","tags":["SAS"]},{"location":"posts/encode-password-for-sas-remote-submission/#simple-steps","title":"Simple Steps","text":"First, open your SAS program locally on your PC, run the following command and replace 1234567890
with your WRDS password:
proc pwencode in=\"1234567890\"; run;\n
The output {SAS002}23AA9C2811439227077603C8365060A44800CA1F
is the encoded password (which is 1234567890
in this example).
Do NOT share your SAS program with encoded password!
Encoded password functions the same as your plain-text password. You should never make public your password in any way.
Next, put the following statements at the beginning of your SAS program and replace my_username
with your WRDS username:
%let wrds=wrds-cloud.wharton.upenn.edu 4016;\noptions comamid=TCP remote=WRDS;\nsignon username=my_username password=\"{SAS002}23AA9C2811439227077603C8365060A44800CA1F\";\n
After these statements, you'll be able to submit your SAS program remotely to and execute on the WRDS server by enclosing your statements with rsubmit
and endrsubmit
. An example would be:
rsubmit;\nproc download data=comp.funda out=funda; run;\nendrsubmit;\n
As you can guess, this statement actually downloads the whole Compustat Fundamentals Annual to the local work directory, with the downloaded dataset also named funda
.
Lastly, after everything, you should run signoff
to close the connection with WRDS.
Full code is as below.
%let wrds=wrds-cloud.wharton.upenn.edu 4016;\noptions comamid=TCP remote=WRDS;\nsignon username=my_username password=\"{SAS002}23AA9C2811439227077603C8365060A44800CA1F\";\n\nrsubmit;\nproc download data=comp.funda out=funda; run;\nendrsubmit;\nsignoff;\n
Replace my_username
and the encoded password with your actual WRDS username and encoded password, paste it in the SAS program editor and press F3
. You'll be downloading comp.funda
in a few seconds!
I made a short video introduction as well, available on my YouTube channel.
","tags":["SAS"]},{"location":"posts/estimate-organization-capital/","title":"Estimate Organization Capital","text":"As in Eisfeldt and Papanikolaou (2013), we obtain firm-year accounting data from the Compustat and compute the stock of organization capital for firms using the perpetual inventory method that recursively calculates the stock of OC by accumulating the deflated value of SG&A expenses.
","tags":["Code","SAS"]},{"location":"posts/estimate-organization-capital/#organization-capital","title":"Organization Capital","text":"\\[ OC_{i,t} = (1-\\delta_{OC})OC_{i,t-1} + \\frac{SGA_{i,t}}{CPI_t} \\]where \\(SGA_{i,t}\\) is firm \\(i\\)'s SG&A expenses in year \\(t\\), \\(CPI_t\\) is the consumer price index, and \\(\\delta_{OC}\\) is the depreciation rate of OC stock, which is set to be 15% as used by the U.S. Bureau of Economic Analysis (BEA). The initial value of OC stock is set to:
\\[ OC_{i,0} = \\frac{SGA_{i,1}}{g+\\delta_{OC}} \\]where \\(g\\) is the average real growth rate of firm-level SG&A expenses, which is 10% in Eisfeldt and Papanikolaou (2013) or specific for an industry-decade in Li, Qiu and Shen (2018).
","tags":["Code","SAS"]},{"location":"posts/estimate-organization-capital/#code","title":"Code","text":"This code estimates the organization capital for all Compustat firm-years.
Note that it requires an external dataset of CPI. You need to name it cpiaucsl
and store it in your WRDS home directory.
%let wrds=wrds-cloud.wharton.upenn.edu 4016;\noptions comamid=TCP remote=WRDS;\nsignon username=_prompt_;\n\nrsubmit;\n\n/* ==============================================================================================\n * This SAS program calcualtes the firm-year Organization Capital, measured by the capitalized \n * SG&A expenses using perpetual inventory method.\n * See e.g. Eisfeldt and Papanikolaou (2013), Li, Qiu and Shen (2018), Gao, Leung and Qiu (2021).\n *\n * Input: Compustat from WRDS.\n * Output:\n * sgastock: capitalized SG&A expenses\n * oc: capitalized SG&A expenses scaled by CPI-adjusted total assets\n * indadj_oc: industry median adjusted oc\n * rank_oc: annual decile rank of oc\n * rank_indadj_oc: annual decile rank of indadj_oc\n *\n * Note:\n * This program requires an external dataset of CPI named `cpiaucsl` in your home directory.\n * I use the Consumer Price Index for All Urban Consumers: All Items (CPIAUCSL)\n * sourced from Federal Reserve Bank of St.Louis,\n * available at https://fred.stlouisfed.org/series/CPIAUCSL/\n * Also, the industry-adjustment is based on sich from compustat only.\n * This code may contain error. 
Please check before use.\n *\n * Author: Mingze (Adrian) Gao\n * mingze.gao@sydney.edu.au\n *\n * Last Modifed: 24 Feb 2019\n * ============================================================================================== */\nlibname home \"~/\";\ndata funda(keep=gvkey cusip cik fyear datadate at xsga xrd xad sic2);\n /* Variables from Compustat:\n * AT: Assets Total;\n * XSGA: Selling, General and Administrative Expense;\n * XRD: Research and Development Expense;\n * XAD: Advertising Expense; */\nset comp.funda;\n if cmiss(of fyear datadate)=0;\n if indfmt = 'INDL' and datafmt='STD' and popsrc='D' and consol='C';\n sic2 = int(sich/100);\nrun;\nproc sql;\n/* Keep only obs from the first year with non-missing XSGA */\ncreate table funda_nonmissing_xsga as\nselect distinct a.*\n from funda as a left join \n /* This subquery selects the first year of appearance \n with non-missing XSGA */\n (select gvkey, fyear as firstfyear from funda \n where xsga is not missing \ngroup by gvkey having fyear = min(fyear)) as b\n on a.gvkey=b.gvkey\n where a.fyear>=b.firstfyear; \n\n/* CPIAUCSL: Consumer Price Index for All Urban Consumers: All Items\n Source: https://fred.stlouisfed.org/series/CPIAUCSL/ */\ncreate table funda_cpi as\nselect distinct a.*, b.cpiaucsl as cpi\n from funda_nonmissing_xsga as a left join home.cpiaucsl as b\n on year(a.datadate) = year(b.date) and month(a.datadate) = month(b.date)\n order by gvkey, fyear;\nquit;\n/* Sanity Check -- No Duplicates */\nproc sort nodupkey data=funda_cpi; \n by gvkey fyear;\nrun;\ndata funda_adj;\n set funda_cpi;\n by gvkey fyear;\n /* Replace missing XSGA, XRD and XAD with 0 */\nif xsga=. then xsga=0;\n if xrd=. then xrd=0;\n if xad=. 
then xad=0;\n /* Total assets adjusted for CPI */\n adjat = at / cpi;\n /* Two alternative SG&A measures */\n adjxsga1 = xsga / cpi;\n adjxsga2 = sum(xsga, -xrd, -xad) / cpi;\nrun;\ndata sgastock(drop=cnt adjxsga1 adjxsga2 lag:);\n set funda_adj(keep=gvkey cik cusip datadate fyear sic2 adj:);\n by gvkey;\n if first.gvkey then call missing(of cnt lag:);\n cnt+1;\n array adjxsga adjxsga1-adjxsga2;\n array sgastock sgastock1-sgastock2;\n array sgastock_r sgastock_r1-sgastock_r2;\n array lag_sgastock lag_sgastock1-lag_sgastock2;\n select (cnt);\n when (1) do;\n /* Under Perpetual Inventory Method, \n * the initial value of capitalized SG&A at time 0, O(0), is:\n * O(0)=O(1)/(g+delta)\n * where g is average SGA growth rate (10%) and delta is depreciation rate (15%).\n * So that,\n * O(0)=SGA(1)/(0.15+0.1)=SGA(1)/0.25=SGA(1)*4\n * This is why `adjxsga*4` is used below, specifically,\n * O(1)=O(0)*0.85+SGA(1)\n * =SGA(0)*4*0.85+SGA(1) */\ndo over sgastock;\n sgastock = (adjxsga * 4)* 0.85 + adjxsga; end;\n end;\n otherwise do;\n /* When t>1,\n * the capitalized SG&A at time t, O(t), is:\n * O(t)=O(t-1)*(1-delta)+SGA(t)\n * where g is average SGA growth rate (10%) and delta is depreciation rate (15%).\n * Note that here SG&A is adjusted for CPI. */\ndo over sgastock;\n sgastock = lag_sgastock * 0.85 + adjxsga; end;\n end;\n end;\n do over sgastock;\n lag_sgastock = sgastock;\n /* `sgastock_r` is sgastock scaled by adjusted total assets. */\n sgastock_r = sgastock / adjat;\n if adjat=. 
then sgastock_r=0;\n end;\n output;\n retain lag:;\nrun;\n/* industry-adjusted OC and rank-based OC measures */\nproc sql;\ncreate table tmp as\nselect gvkey, cik, cusip, datadate, fyear, sic2,\n sgastock_r1 as oc1,\n sgastock_r2 as oc2,\n sgastock_r1 - median(sgastock_r1) as indadj_oc1,\n sgastock_r2 - median(sgastock_r2) as indadj_oc2\n from sgastock\n group by fyear, sic2\n order by gvkey, fyear;\nquit;\nproc sort data=tmp; by fyear; run;\nproc rank data=tmp out=result groups=10;\n by fyear;\n var oc1 oc2 indadj_oc1 indadj_oc2;\n ranks rank_oc1 rank_oc2 rank_indadj_oc1 rank_indadj_oc2;\nrun;\ndata download(compress=yes); set work.result; run;\nproc download data=work.download out=sgastock; run;\nendrsubmit;\nsignoff;\n
Lastly, if you use this code above, please consider citing the following article for which it was written.
Gao, M. Leung, H. and Qiu, B. (2021). Organization Capital and Executive Performance Incentives, Journal of Banking & Finance, 123, 106017.
","tags":["Code","SAS"]},{"location":"posts/firm-historical-headquarter-state-from-10k/","title":"Firm Historical Headquarter State from SEC 10K/Q Filings","text":"","tags":["SEC","Textual Analysis","Compustat","Data","Code","8-K"]},{"location":"posts/firm-historical-headquarter-state-from-10k/#why-the-need-to-use-sec-filings","title":"Why the need to use SEC filings?","text":"In the Compustat database, a firm's headquarter state (and other identification) is in fact the current record stored in comp.company
. This means once a firm relocates (or updates its incorporate state, address, etc.), all historical observations will be updated and not recording historical state information anymore.
To resolve this issue, an effective way is to use the firm's historical SEC filings. You can follow my previous post Textual Analysis on SEC filings to extract the header information, which includes a wide range of meta data. Alternatively, the University of Notre Dame's Software Repository for Accounting and Finance provides an augmented 10-X header dataset.
2023 March Update
In this update I use 1,491,368 8-K filings of U.S. firms from 2004 to Dec 2022 and extract their HQ state and zipcode. hist_state_zipcode_from_8k_2004_2022.csv.zip
","tags":["SEC","Textual Analysis","Compustat","Data","Code","8-K"]},{"location":"posts/firm-historical-headquarter-state-from-10k/#do-i-have-to-use-sec-filings","title":"Do I have to use SEC filings?","text":"I'll skip the parsing procedure for now. The most important point is that using the historical SEC filings, you can ensure that you truly are using the historical headquarter state in your empirical estimation. Based on the augmented 10-X header dataset, I find that around 2-3% of Compustat firms changed their headquarter state (as indicated by their business address) each year.
Year Firms Changed State Total Firms % Firms Changed State 1995 22 4205 0.52 1996 69 7939 0.87 1997 199 8101 2.46 1998 206 8126 2.54 1999 202 8199 2.46 2000 202 8252 2.45 2001 204 7802 2.61 2002 167 7421 2.25 2003 214 6930 3.09 2004 175 6742 2.6 2005 154 6478 2.38 2006 156 6267 2.49 2007 144 6091 2.36 2008 125 5797 2.16 2009 127 5523 2.3 2010 128 5479 2.34 2011 152 5445 2.79 2012 160 5494 2.91 2013 171 5491 3.11 2014 195 5455 3.57 2015 147 5322 2.76 2016 117 5092 2.3 2017 129 4914 2.63 2018 107 4847 2.21Moreover, 2,947 out of the 17,221 firms, or about 17% firms changed their headquarter state in the merged sample. This is by no means a small number that can be ignored. So, whenever possible, you should try to use the historical information from past SEC filings' metadata.
","tags":["SEC","Textual Analysis","Compustat","Data","Code","8-K"]},{"location":"posts/firm-historical-headquarter-state-from-10k/#how-to-get-the-actual-historical-firm-hq-state-using-sec-filings","title":"How to get the actual historical firm HQ state using SEC filings?","text":"","tags":["SEC","Textual Analysis","Compustat","Data","Code","8-K"]},{"location":"posts/firm-historical-headquarter-state-from-10k/#1969-2003","title":"1969 - 2003","text":"I start with the firm historical HQ state provided by Bai, Fairhurst and Serfling (2020 RFS). This dataset contains the historical HQ locations from 1969 to 2003, which is based on the SEC filings post 1994 and hand-collected by the authors from the Moody\u2019s Manuals (later Mergent Manuals) and Dun & Bradstreet\u2019s Million Dollar Directory (later bought by Mergent).1
","tags":["SEC","Textual Analysis","Compustat","Data","Code","8-K"]},{"location":"posts/firm-historical-headquarter-state-from-10k/#1994-2018","title":"1994 - 2018","text":"To extend the dataset, I download the augmented 10-X header dataset and use the following Python script to extract the business address (state) filed.
import pandas as pd\n\nfilepath = \"~/Downloads/LM_EDGAR_10X_Header_1994_2018.csv\"\n\nif __name__ == \"__main__\":\n    df = pd.read_csv(\n        filepath,\n        usecols=[\"cik\", \"file_date\", \"ba_state\"],\n        dtype={\"cik\": str},\n        parse_dates=[\"file_date\"],\n    )\n    # Some `ba_state` codes are lowercase\n    df[\"ba_state\"] = df[\"ba_state\"].str.upper()\n    # Drop missing codes and codes that are not valid US states\n    df = df[df[\"ba_state\"].str.isalpha().fillna(False)]\n    df.drop_duplicates().to_stata(\n        \"~/Downloads/historical_state_1994_2018.dta\",\n        write_index=False,\n        convert_dates={\"file_date\": \"td\"},\n    )\n
The result is a historical_state_1994_2018.dta
Stata file like this:
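The `isalpha` filter above drops obvious junk, but one could go further and validate against a whitelist of state abbreviations. A minimal sketch of that idea (the `US_STATES` set below is abridged and hypothetical — a real check would enumerate all US state and territory codes):

```python
import pandas as pd

# Hypothetical, abridged whitelist; a real check would list every US state
# and territory abbreviation rather than relying on str.isalpha() alone.
US_STATES = {"AL", "AK", "AZ", "CA", "NY", "TX", "WA"}

def clean_states(df):
    """Uppercase ba_state and keep only rows with a recognized state code."""
    out = df.copy()
    out["ba_state"] = out["ba_state"].str.upper()
    return out[out["ba_state"].isin(US_STATES)]

df = pd.DataFrame({"cik": ["0001", "0002", "0003"],
                   "ba_state": ["ca", "ZZ", None]})
cleaned = clean_states(df)
print(cleaned)
```

Here `"ZZ"` and the missing value are both dropped; only the lowercased `"ca"` survives as `"CA"`.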
Finally, to merge the two datasets, I imported them into WRDS Cloud and ran the following SAS script:
libname hs \"~/historical_state\";\n\n/* Historical HQ state (1994 to 2018) from augmented 10-X header dataset */\nproc import datafile=\"~/historical_state/historical_state_1994_2018.dta\"\nout=historical_state_1994_2018 dbms=stata replace;\n/* Historical HQ state (1969 to 2003) from Bai, Fairhurst and Serfling (2020 RFS) */\nproc import datafile=\"~/historical_state/hist_headquarters_Bai_et_al.dta\"\nout=hist_headquarters_Bai_et_al dbms=stata replace;\n\n/* Build the post-1994 dataset using SEC filings */\nproc sql;\ncreate table funda as \nselect gvkey, cik, datadate, fyear from comp.funda\nwhere indfmt= 'INDL' and datafmt='STD' and popsrc='D' and consol='C'\nand year(datadate) between 1994 and 2018\n/* \"For firms that change fiscal year within a calendar year, \n we take the last reported date when extracting financial data. \n This leaves us with one set of observations for each firm (gvkey) in each year.\" \n -- Pflueger, Siriwardane and Sunderam (2020 QJE) */\ngroup by gvkey, fyear having datadate=max(datadate);\n\ncreate table firm_historical_state as \nselect a.*, b.ba_state as state_sec label=\"State from SEC filings\"\nfrom funda as a left join historical_state_1994_2018 as b \non a.cik=b.cik and year(a.datadate)=year(b.file_date) and b.file_date<=a.datadate\ngroup by a.gvkey, a.datadate\n/* use the SEC filing closest to and before the Compustat datadate */\nhaving b.file_date=max(b.file_date);\n\ncreate table historical_state_1994_2018 as\nselect a.*, b.state as state_comp label=\"State from Compustat\"\nfrom firm_historical_state as a left join comp.company as b \non a.gvkey=b.gvkey\norder by a.gvkey, a.datadate;\nquit;\n/* Sanity check: no duplicated gvkey-fyear */\nproc sort data=historical_state_1994_2018 nodupkey; by gvkey datadate; run;\nproc sql;\ncreate table hist_headquarters_Bai_et_al as \nselect put(gvkeyn, z6.) 
as gvkey, fyear, state \nfrom hist_headquarters_Bai_et_al;\nquit;\n/* Stack together the two datasets */\ndata states; \nset hist_headquarters_Bai_et_al \n historical_state_1994_2018(where=(fyear>2003) keep=gvkey fyear state:);\nrun;\nproc sql;\ncreate table hs.corrected_hist_state_1969_2018 as \nselect *, coalesce(state, state_sec, state_comp) as corrected_state\nfrom states where not missing(calculated corrected_state)\norder by gvkey, fyear;\nquit;\n/* Sanity check: no duplicated gvkey-fyear */\nproc sort data=hs.corrected_hist_state_1969_2018 nodupkey; by gvkey fyear; run;\n
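The key step in the second `create table` above — for each Compustat record, take the latest SEC filing on or before `datadate` — is an as-of merge. For readers who prefer Python, a minimal pandas sketch of the same idea (toy data; the SAS code's additional same-calendar-year condition is omitted for brevity):

```python
import pandas as pd

funda = pd.DataFrame({
    "cik": ["0001", "0001"],
    "datadate": pd.to_datetime(["1996-12-31", "1997-12-31"]),
})
filings = pd.DataFrame({
    "cik": ["0001", "0001"],
    "file_date": pd.to_datetime(["1996-03-30", "1997-03-28"]),
    "ba_state": ["CA", "NY"],
})

# merge_asof requires both frames sorted on the merge keys;
# direction="backward" picks the latest filing at or before datadate
merged = pd.merge_asof(
    funda.sort_values("datadate"),
    filings.sort_values("file_date"),
    left_on="datadate",
    right_on="file_date",
    by="cik",
    direction="backward",
)
print(merged[["cik", "datadate", "ba_state"]])
```

Each fiscal-year-end row picks up the state from the most recent filing before it, exactly as the SQL `having b.file_date=max(b.file_date)` does.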
","tags":["SEC","Textual Analysis","Compustat","Data","Code","8-K"]},{"location":"posts/firm-historical-headquarter-state-from-10k/#data-available-for-download","title":"Data available for download","text":"You can download the data I compiled here: corrected_hist_state_1969_2018.dta.zip (1MB).
","tags":["SEC","Textual Analysis","Compustat","Data","Code","8-K"]},{"location":"posts/firm-historical-headquarter-state-from-10k/#2023-update","title":"2023 Update","text":"In this update, I use 1,491,368 8-K filings of U.S. firms from 2004 to Dec 2022 and extract their HQ state and zipcode.
Data: hist_state_zipcode_from_8k_2004_2022.csv.zip
Specifically, I download all 8-K filings from EDGAR and run a script to extract business address from filing header into a database. If a firm reported different states in its filings in a given year, I keep the last one. The code is here at GitHub.
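The "keep the last reported state in a given year" rule amounts to sorting filings by date and letting later filings overwrite earlier ones. A toy sketch of that rule (hypothetical records, not the actual extraction code on GitHub):

```python
from datetime import date

# (cik, filing date, state) tuples, e.g. parsed from 8-K headers
filings = [
    ("0001", date(2010, 2, 1), "CA"),
    ("0001", date(2010, 11, 5), "NY"),   # later filing in the same year wins
    ("0001", date(2011, 3, 9), "NY"),
]

latest = {}
for cik, d, state in sorted(filings, key=lambda r: r[1]):
    # later dates overwrite earlier ones for the same (cik, year) key
    latest[(cik, d.year)] = state

print(latest)
```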
","tags":["SEC","Textual Analysis","Compustat","Data","Code","8-K"]},{"location":"posts/firm-historical-headquarter-state-from-10k/#suggested-citation","title":"Suggested citation","text":"Lastly, if you use the code/data above, please consider citing the following article for which it was written/constructed.
Gao, M. Leung, H. and Qiu, B. (2021). Organization Capital and Executive Performance Incentives, Journal of Banking & Finance, 123, 106017.
The authors note that \"for our final sample of 115,432 firm-year observations, we find that over the 1969 to 2003 period, 9,847 (87.50%) never relocate, 1,211 (10.76%) relocate once, 178 (1.58%) relocate twice, and 18 (0.16%) relocate three times.\"\u00a0\u21a9
Since Stata 15, we can search, browse and import almost a million U.S. and international economic and financial time series made available by the Federal Reserve Bank of St. Louis's Federal Reserve Economic Data (FRED). This post briefly explains this great feature.
","tags":["Stata"]},{"location":"posts/fred-federal-reserve-economic-data/#prerequisite","title":"Prerequisite","text":"Before you start, you will need an API Key from FRED. Register one here.
Then in Stata, you can store this key permanently so you don't need to provide it again.(1)
Replace _key_
with the API Key you obtained.set fredkey _key_, permanently\n
","tags":["Stata"]},{"location":"posts/fred-federal-reserve-economic-data/#gui-is-always-a-good-start","title":"GUI is always a good start","text":"Alternatively, clicking on the menu File>Import>Federal Reserve Economic Data (FRED)
will bring up the dialog as shown below.
Enter API Key and you'll be free to explore all the data series available on FRED.
For example, let's see the CPI of Australia...
If we describe the data series, we can find a lot of useful meta information.
Vintage
Note that the \"vintage\" section lists a number of dates, with each vintage referring to a particular version of the data series at that point in time.
It may sound strange, but an economic data series may be revised multiple times after it has been published, for example because more accurate information is collected later or the estimation method changes.
For example, the CPI series for 2005 to 2010 retrieved by a researcher in 2011 may differ from the same series retrieved in 2023. Without specifying the data vintage, replicating prior work can be hard.
Another tricky part is that ignoring vintages introduces look-ahead bias in analysis.
For example, a trading strategy using the revised GDP accessed today, instead of the vintage GDP, implicitly uses hindsight, as the GDP series may have been revised to incorporate more accurate data obtained after release.
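To make the point concrete, a toy sketch of an "as-of" lookup over vintages (the release dates and values below are made up for illustration, not real FRED data):

```python
from datetime import date

# Made-up vintages: release date -> GDP growth figure as published then
vintages = {
    date(2020, 1, 30): 2.1,  # advance estimate
    date(2020, 2, 27): 2.1,  # second estimate
    date(2021, 7, 29): 1.8,  # annual revision
}

def as_of(vintages, when):
    """Latest value released on or before `when`; None if nothing released."""
    released = [d for d in vintages if d <= when]
    return vintages[max(released)] if released else None

# A backtest dated mid-2020 must see 2.1, not the later revision
print(as_of(vintages, date(2020, 6, 1)))   # -> 2.1
print(as_of(vintages, date(2022, 1, 1)))   # -> 1.8
```

Using today's revised series everywhere is equivalent to always calling `as_of` with today's date — that is the look-ahead bias.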
Let's close the description, double-click on the series and click Import. Another dialog will be shown to confirm some final details.
The outputs will be like the following:
. import fred AUSCPIALLQINMEI, daterange(2010-01-01 2023-08-08) aggregate(quarterly,avg)\n\nSummary\n----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\nSeries ID Nobs Date range Frequency\n----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\nAUSCPIALLQINMEI 53 2010-01-01 to 2023-01-01 Quarterly\n----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------\n# of series imported: 1\n highest frequency: Quarterly\n lowest frequency: Quarterly\n
","tags":["Stata"]},{"location":"posts/fred-federal-reserve-economic-data/#programmatical-is-recipe-to-reproducibility","title":"Programmatic is the recipe for reproducibility","text":"We don't need to go through the GUI process every time. In fact, Stata already told us what the corresponding command is:
import fred AUSCPIALLQINMEI, daterange(2010-01-01 2023-08-08) aggregate(quarterly,avg)\n
We can simply put this line of code into our program.
For example, the code below generates a time-series chart for Australia's CPI.
// Import\nimport fred AUSCPIALLQINMEI, daterange(2010-01-01 2023-03-31) vintage(2023-05-10) aggregate(quarterly,avg) clear\nrename AUSCPIALLQINMEI_20230510 cpi_australia\n// Time format\ngen yrqtr = yq(year(daten),quarter(daten))\nformat yrqtr %tq\ntsset yrqtr\n// Set start of the period to 100\ngen cpi_ret = cpi_australia/L.cpi_australia - 1\nreplace cpi_australia = 100 if _n==1\nreplace cpi_australia = L.cpi_australia * (1+cpi_ret) if _n>1\n// Plotting\ntwoway (tsline cpi_australia), title(\"Quarterly CPI of Australia 2010Q1-2023Q1\") ytitle(\"\") ttitle(\"\") note(\"Index 2010Q1=100. Source: FRED, 2023-05-10 vintage.\")\n
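The rebasing step in the middle of the do-file — set the first observation to 100, then compound the growth rates forward — can be sanity-checked with a few lines of Python:

```python
def rebase(series, base=100.0):
    """Rebase an index series so the first observation equals `base`,
    preserving period-over-period growth rates."""
    out = [base]
    for prev, cur in zip(series, series[1:]):
        out.append(out[-1] * (cur / prev))
    return out

print([round(x, 2) for x in rebase([95.0, 96.9, 99.0])])  # -> [100.0, 102.0, 104.21]
```

The second value is 102.0 because 96.9/95.0 is a 2% rise, mirroring what the `replace cpi_australia = L.cpi_australia * (1+cpi_ret)` loop computes row by row.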
Note
The code snippet above specifies the data vintage. Therefore, even if someone runs it 30 years from now, they will still get exactly the same data and plot as I do in 2023.
","tags":["Stata"]},{"location":"posts/generate-fama-french-industry-classification-from-sic/","title":"Generate Fama-French Industry Classification From SIC","text":"This Stata program creates the Fama-French industry classification from SIC code.
","tags":["Stata","Code"]},{"location":"posts/generate-fama-french-industry-classification-from-sic/#basic-usage","title":"Basic usage","text":"ffind sic, generate(\u201cFF48\u201d) type(48)\n
where sic is SIC code, FF48 is the generated industry variable name, and we are using 48-industry classification. Alternatively, one can choose 5, 10, 12, 17, 30, 38 or 49 industries.
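Under the hood, the program is just a lookup over inclusive SIC ranges. A heavily abridged Python sketch of the FF12 case (only three industries' ranges are shown here; unmatched codes fall into the "Other" bucket, number 12, as in the full classification):

```python
# Abridged sketch of the range lookup ffind performs for the
# 12-industry classification (ranges taken from the full ado file).
FF12_RANGES = {
    1: [(100, 999), (2000, 2399)],    # Consumer NonDurables (partial)
    4: [(1200, 1399), (2900, 2999)],  # Oil, Gas, and Coal
    11: [(6000, 6999)],               # Finance
}

def ff12(sic):
    for industry, ranges in FF12_RANGES.items():
        if any(lo <= sic <= hi for lo, hi in ranges):
            return industry
    return 12  # "Other" catches everything not matched above

print(ff12(6020), ff12(1310), ff12(2050))  # -> 11 4 1
```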
","tags":["Stata","Code"]},{"location":"posts/generate-fama-french-industry-classification-from-sic/#full-stata-code","title":"Full Stata code","text":"/****************************************\n* ffind.ado\n* Creates variable containing Fama-French\n* industry classification.\n*\n* Author: Judson Caskey, UCLA\n* December 9, 2007\n*\n* Revised by Malcolm Wardlaw, Uiversity of Texas at Dallas (November 1, 2011)\n****************************************/\ncapture program drop ffind\n\nprogram define ffind\n version 9.2\n syntax varlist(min=1 max=1 numeric) [if] [in], Generate(string) Type(numlist max=1 min=1)\n\n tempvar ftyp\n tokenize \"`type'\"\n local `ftyp'=`1'\n * Check if generate is valid variable name\n capture confirm new variable `generate'\n if _rc != 0 {\n di as error \"Variable `generate' is invalid\"\n exit 111\n }\n\n * Check type\n if ~inlist(``ftyp'',5,10,12,17,30,38,48,49) {\n di as error \"Type must be 5, 10, 12, 17, 30, 38, 48 or 49\"\n exit 111\n }\n\n * Set industries\n tempvar ffind\n tokenize \"`varlist'\"\n local `ffind' \"`1'\"\n qui gen `generate'=.\n label variable `generate' \"Fama-French industry code (``ftyp'' industries)\"\n capture label drop `generate'\n if ``ftyp''==5 {\n label define `generate' 1 \"Consumer Durables, NonDurables, Wholesale, Retail, and Some Services (Laundries, Repair Shops)\" 2 \"Manufacturing, Energy, and Utilities\" 3 \"Business Equipment, Telephone and Television Transmission\" 4 \"Healthcare, Medical Equipment, and Drugs\" 5 \"Other -- Mines, Constr, BldMt, Trans, Hotels, Bus Serv, Entertainment, Finance\"\n label values `generate' `generate'\n qui replace `generate'=1 if inrange(``ffind'',100,999) | inrange(``ffind'',2000,2399) | inrange(``ffind'',2700,2749) | inrange(``ffind'',2770,2799) | inrange(``ffind'',3100,3199) | inrange(``ffind'',3940,3989) | inrange(``ffind'',2500,2519) | inrange(``ffind'',2590,2599) | inrange(``ffind'',3630,3659) | inrange(``ffind'',3710,3711) | inrange(``ffind'',3714,3714) | 
inrange(``ffind'',3716,3716) | inrange(``ffind'',3750,3751) | inrange(``ffind'',3792,3792) | inrange(``ffind'',3900,3939) | inrange(``ffind'',3990,3999) | inrange(``ffind'',5000,5999) | inrange(``ffind'',7200,7299) | inrange(``ffind'',7600,7699)\n qui replace `generate'=2 if inrange(``ffind'',2520,2589) | inrange(``ffind'',2600,2699) | inrange(``ffind'',2750,2769) | inrange(``ffind'',2800,2829) | inrange(``ffind'',2840,2899) | inrange(``ffind'',3000,3099) | inrange(``ffind'',3200,3569) | inrange(``ffind'',3580,3629) | inrange(``ffind'',3700,3709) | inrange(``ffind'',3712,3713) | inrange(``ffind'',3715,3715) | inrange(``ffind'',3717,3749) | inrange(``ffind'',3752,3791) | inrange(``ffind'',3793,3799) | inrange(``ffind'',3830,3839) | inrange(``ffind'',3860,3899) | inrange(``ffind'',1200,1399) | inrange(``ffind'',2900,2999) | inrange(``ffind'',4900,4949)\n qui replace `generate'=3 if inrange(``ffind'',3570,3579) | inrange(``ffind'',3622,3622) | inrange(``ffind'',3660,3692) | inrange(``ffind'',3694,3699) | inrange(``ffind'',3810,3839) | inrange(``ffind'',7370,7372) | inrange(``ffind'',7373,7373) | inrange(``ffind'',7374,7374) | inrange(``ffind'',7375,7375) | inrange(``ffind'',7376,7376) | inrange(``ffind'',7377,7377) | inrange(``ffind'',7378,7378) | inrange(``ffind'',7379,7379) | inrange(``ffind'',7391,7391) | inrange(``ffind'',8730,8734) | inrange(``ffind'',4800,4899)\n qui replace `generate'=4 if inrange(``ffind'',2830,2839) | inrange(``ffind'',3693,3693) | inrange(``ffind'',3840,3859) | inrange(``ffind'',8000,8099)\n qui replace `generate'=5 if missing(`generate') & ~missing(``ffind'')\n }\n else if ``ftyp''==10 {\n label define `generate' 1 \"Consumer NonDurables -- Food, Tobacco, Textiles, Apparel, Leather, Toys\" 2 \"Consumer Durables -- Cars, TV's, Furniture, Household Appliances\" 3 \"Manufacturing -- Machinery, Trucks, Planes, Chemicals, Off Furn, Paper, Com Printing\" 4 \"Oil, Gas, and Coal Extraction and Products\" 5 \"Business Equipment -- Computers, 
Software, and Electronic Equipment\" 6 \"Telephone and Television Transmission\" 7 \"Wholesale, Retail, and Some Services (Laundries, Repair Shops)\" 8 \"Healthcare, Medical Equipment, and Drugs\" 9 \"Utilities\" 10 \"Other -- Mines, Constr, BldMt, Trans, Hotels, Bus Serv, Entertainment, Finance\"\n label values `generate' `generate'\n qui replace `generate'=1 if inrange(``ffind'',100,999) | inrange(``ffind'',2000,2399) | inrange(``ffind'',2700,2749) | inrange(``ffind'',2770,2799) | inrange(``ffind'',3100,3199) | inrange(``ffind'',3940,3989)\n qui replace `generate'=2 if inrange(``ffind'',2500,2519) | inrange(``ffind'',2590,2599) | inrange(``ffind'',3630,3659) | inrange(``ffind'',3710,3711) | inrange(``ffind'',3714,3714) | inrange(``ffind'',3716,3716) | inrange(``ffind'',3750,3751) | inrange(``ffind'',3792,3792) | inrange(``ffind'',3900,3939) | inrange(``ffind'',3990,3999)\n qui replace `generate'=3 if inrange(``ffind'',2520,2589) | inrange(``ffind'',2600,2699) | inrange(``ffind'',2750,2769) | inrange(``ffind'',2800,2829) | inrange(``ffind'',2840,2899) | inrange(``ffind'',3000,3099) | inrange(``ffind'',3200,3569) | inrange(``ffind'',3580,3629) | inrange(``ffind'',3700,3709) | inrange(``ffind'',3712,3713) | inrange(``ffind'',3715,3715) | inrange(``ffind'',3717,3749) | inrange(``ffind'',3752,3791) | inrange(``ffind'',3793,3799) | inrange(``ffind'',3830,3839) | inrange(``ffind'',3860,3899)\n qui replace `generate'=4 if inrange(``ffind'',1200,1399) | inrange(``ffind'',2900,2999)\n qui replace `generate'=5 if inrange(``ffind'',3570,3579) | inrange(``ffind'',3622,3622) | inrange(``ffind'',3660,3692) | inrange(``ffind'',3694,3699) | inrange(``ffind'',3810,3839) | inrange(``ffind'',7370,7372) | inrange(``ffind'',7373,7373) | inrange(``ffind'',7374,7374) | inrange(``ffind'',7375,7375) | inrange(``ffind'',7376,7376) | inrange(``ffind'',7377,7377) | inrange(``ffind'',7378,7378) | inrange(``ffind'',7379,7379) | inrange(``ffind'',7391,7391) | inrange(``ffind'',8730,8734)\n qui 
replace `generate'=6 if inrange(``ffind'',4800,4899)\n qui replace `generate'=7 if inrange(``ffind'',5000,5999) | inrange(``ffind'',7200,7299) | inrange(``ffind'',7600,7699)\n qui replace `generate'=8 if inrange(``ffind'',2830,2839) | inrange(``ffind'',3693,3693) | inrange(``ffind'',3840,3859) | inrange(``ffind'',8000,8099)\n qui replace `generate'=9 if inrange(``ffind'',4900,4949)\n qui replace `generate'=10 if missing(`generate') & ~missing(``ffind'')\n }\n else if ``ftyp''==12 {\n label define `generate' 1 \"Consumer NonDurables -- Food, Tobacco, Textiles, Apparel, Leather, Toys\" 2 \"Consumer Durables -- Cars, TV's, Furniture, Household Appliances\" 3 \"Manufacturing -- Machinery, Trucks, Planes, Off Furn, Paper, Com Printing\" 4 \"Oil, Gas, and Coal Extraction and Products\" 5 \"Chemicals and Allied Products\" 6 \"Business Equipment -- Computers, Software, and Electronic Equipment\" 7 \"Telephone and Television Transmission\" 8 \"Utilities\" 9 \"Wholesale, Retail, and Some Services (Laundries, Repair Shops)\" 10 \"Healthcare, Medical Equipment, and Drugs\" 11 \"Finance\" 12 \"Other -- Mines, Constr, BldMt, Trans, Hotels, Bus Serv, Entertainment\"\n label values `generate' `generate'\n qui replace `generate'=1 if inrange(``ffind'',100,999) | inrange(``ffind'',2000,2399) | inrange(``ffind'',2700,2749) | inrange(``ffind'',2770,2799) | inrange(``ffind'',3100,3199) | inrange(``ffind'',3940,3989)\n qui replace `generate'=2 if inrange(``ffind'',2500,2519) | inrange(``ffind'',2590,2599) | inrange(``ffind'',3630,3659) | inrange(``ffind'',3710,3711) | inrange(``ffind'',3714,3714) | inrange(``ffind'',3716,3716) | inrange(``ffind'',3750,3751) | inrange(``ffind'',3792,3792) | inrange(``ffind'',3900,3939) | inrange(``ffind'',3990,3999)\n qui replace `generate'=3 if inrange(``ffind'',2520,2589) | inrange(``ffind'',2600,2699) | inrange(``ffind'',2750,2769) | inrange(``ffind'',3000,3099) | inrange(``ffind'',3200,3569) | inrange(``ffind'',3580,3629) | 
inrange(``ffind'',3700,3709) | inrange(``ffind'',3712,3713) | inrange(``ffind'',3715,3715) | inrange(``ffind'',3717,3749) | inrange(``ffind'',3752,3791) | inrange(``ffind'',3793,3799) | inrange(``ffind'',3830,3839) | inrange(``ffind'',3860,3899)\n qui replace `generate'=4 if inrange(``ffind'',1200,1399) | inrange(``ffind'',2900,2999)\n qui replace `generate'=5 if inrange(``ffind'',2800,2829) | inrange(``ffind'',2840,2899)\n qui replace `generate'=6 if inrange(``ffind'',3570,3579) | inrange(``ffind'',3660,3692) | inrange(``ffind'',3694,3699) | inrange(``ffind'',3810,3829) | inrange(``ffind'',7370,7379)\n qui replace `generate'=7 if inrange(``ffind'',4800,4899)\n qui replace `generate'=8 if inrange(``ffind'',4900,4949)\n qui replace `generate'=9 if inrange(``ffind'',5000,5999) | inrange(``ffind'',7200,7299) | inrange(``ffind'',7600,7699)\n qui replace `generate'=10 if inrange(``ffind'',2830,2839) | inrange(``ffind'',3693,3693) | inrange(``ffind'',3840,3859) | inrange(``ffind'',8000,8099)\n qui replace `generate'=11 if inrange(``ffind'',6000,6999)\n qui replace `generate'=12 if missing(`generate') & ~missing(``ffind'')\n }\n\n else if ``ftyp''==17 {\n label define `generate' 1 \"Food\" 2 \"Mining and Minerals\" 3 \"Oil and Petroleum Products\" 4 \"Textiles, Apparel & Footware\" 5 \"Consumer Durables\" 6 \"Chemicals\" 7 \"Drugs, Soap, Prfums, Tobacco\" 8 \"Construction and Construction Materials\" 9 \"Steel Works Etc\" 10 \"Fabricated Products\" 11 \"Machinery and Business Equipment\" 12 \"Automobiles\" 13 \"Transportation\" 14 \"Utilities\" 15 \"Retail Stores\" 16 \"Banks, Insurance Companies, and Other Financials\" 17 \"Other\"\n label values `generate' `generate'\n qui replace `generate'=1 if inrange(``ffind'',100,199) | inrange(``ffind'',200,299) | inrange(``ffind'',700,799) | inrange(``ffind'',900,999) | inrange(``ffind'',2000,2009) | inrange(``ffind'',2010,2019) | inrange(``ffind'',2020,2029) | inrange(``ffind'',2030,2039) | inrange(``ffind'',2040,2046) | 
inrange(``ffind'',2047,2047) | inrange(``ffind'',2048,2048) | inrange(``ffind'',2050,2059) | inrange(``ffind'',2060,2063) | inrange(``ffind'',2064,2068) | inrange(``ffind'',2070,2079) | inrange(``ffind'',2080,2080) | inrange(``ffind'',2082,2082) | inrange(``ffind'',2083,2083) | inrange(``ffind'',2084,2084) | inrange(``ffind'',2085,2085) | inrange(``ffind'',2086,2086) | inrange(``ffind'',2087,2087) | inrange(``ffind'',2090,2092) | inrange(``ffind'',2095,2095) | inrange(``ffind'',2096,2096) | inrange(``ffind'',2097,2097) | inrange(``ffind'',2098,2099) | inrange(``ffind'',5140,5149) | inrange(``ffind'',5150,5159) | inrange(``ffind'',5180,5182) | inrange(``ffind'',5191,5191)\n qui replace `generate'=2 if inrange(``ffind'',1000,1009) | inrange(``ffind'',1010,1019) | inrange(``ffind'',1020,1029) | inrange(``ffind'',1030,1039) | inrange(``ffind'',1040,1049) | inrange(``ffind'',1060,1069) | inrange(``ffind'',1080,1089) | inrange(``ffind'',1090,1099) | inrange(``ffind'',1200,1299) | inrange(``ffind'',1400,1499) | inrange(``ffind'',5050,5052)\n qui replace `generate'=3 if inrange(``ffind'',1300,1300) | inrange(``ffind'',1310,1319) | inrange(``ffind'',1320,1329) | inrange(``ffind'',1380,1380) | inrange(``ffind'',1381,1381) | inrange(``ffind'',1382,1382) | inrange(``ffind'',1389,1389) | inrange(``ffind'',2900,2912) | inrange(``ffind'',5170,5172)\n qui replace `generate'=4 if inrange(``ffind'',2200,2269) | inrange(``ffind'',2270,2279) | inrange(``ffind'',2280,2284) | inrange(``ffind'',2290,2295) | inrange(``ffind'',2296,2296) | inrange(``ffind'',2297,2297) | inrange(``ffind'',2298,2298) | inrange(``ffind'',2299,2299) | inrange(``ffind'',2300,2390) | inrange(``ffind'',2391,2392) | inrange(``ffind'',2393,2395) | inrange(``ffind'',2396,2396) | inrange(``ffind'',2397,2399) | inrange(``ffind'',3020,3021) | inrange(``ffind'',3100,3111) | inrange(``ffind'',3130,3131) | inrange(``ffind'',3140,3149) | inrange(``ffind'',3150,3151) | inrange(``ffind'',3963,3965) | 
inrange(``ffind'',5130,5139)\n qui replace `generate'=5 if inrange(``ffind'',2510,2519) | inrange(``ffind'',2590,2599) | inrange(``ffind'',3060,3069) | inrange(``ffind'',3070,3079) | inrange(``ffind'',3080,3089) | inrange(``ffind'',3090,3099) | inrange(``ffind'',3630,3639) | inrange(``ffind'',3650,3651) | inrange(``ffind'',3652,3652) | inrange(``ffind'',3860,3861) | inrange(``ffind'',3870,3873) | inrange(``ffind'',3910,3911) | inrange(``ffind'',3914,3914) | inrange(``ffind'',3915,3915) | inrange(``ffind'',3930,3931) | inrange(``ffind'',3940,3949) | inrange(``ffind'',3960,3962) | inrange(``ffind'',5020,5023) | inrange(``ffind'',5064,5064) | inrange(``ffind'',5094,5094) | inrange(``ffind'',5099,5099)\n qui replace `generate'=6 if inrange(``ffind'',2800,2809) | inrange(``ffind'',2810,2819) | inrange(``ffind'',2820,2829) | inrange(``ffind'',2860,2869) | inrange(``ffind'',2870,2879) | inrange(``ffind'',2890,2899) | inrange(``ffind'',5160,5169)\n qui replace `generate'=7 if inrange(``ffind'',2100,2199) | inrange(``ffind'',2830,2830) | inrange(``ffind'',2831,2831) | inrange(``ffind'',2833,2833) | inrange(``ffind'',2834,2834) | inrange(``ffind'',2840,2843) | inrange(``ffind'',2844,2844) | inrange(``ffind'',5120,5122) | inrange(``ffind'',5194,5194)\n qui replace `generate'=8 if inrange(``ffind'',800,899) | inrange(``ffind'',1500,1511) | inrange(``ffind'',1520,1529) | inrange(``ffind'',1530,1539) | inrange(``ffind'',1540,1549) | inrange(``ffind'',1600,1699) | inrange(``ffind'',1700,1799) | inrange(``ffind'',2400,2439) | inrange(``ffind'',2440,2449) | inrange(``ffind'',2450,2459) | inrange(``ffind'',2490,2499) | inrange(``ffind'',2850,2859) | inrange(``ffind'',2950,2952) | inrange(``ffind'',3200,3200) | inrange(``ffind'',3210,3211) | inrange(``ffind'',3240,3241) | inrange(``ffind'',3250,3259) | inrange(``ffind'',3261,3261) | inrange(``ffind'',3264,3264) | inrange(``ffind'',3270,3275) | inrange(``ffind'',3280,3281) | inrange(``ffind'',3290,3293) | inrange(``ffind'',3420,3429) 
| inrange(``ffind'',3430,3433) | inrange(``ffind'',3440,3441) | inrange(``ffind'',3442,3442) | inrange(``ffind'',3446,3446) | inrange(``ffind'',3448,3448) | inrange(``ffind'',3449,3449) | inrange(``ffind'',3450,3451) | inrange(``ffind'',3452,3452) | inrange(``ffind'',5030,5039) | inrange(``ffind'',5070,5078) | inrange(``ffind'',5198,5198) | inrange(``ffind'',5210,5211) | inrange(``ffind'',5230,5231) | inrange(``ffind'',5250,5251)\n qui replace `generate'=9 if inrange(``ffind'',3300,3300) | inrange(``ffind'',3310,3317) | inrange(``ffind'',3320,3325) | inrange(``ffind'',3330,3339) | inrange(``ffind'',3340,3341) | inrange(``ffind'',3350,3357) | inrange(``ffind'',3360,3369) | inrange(``ffind'',3390,3399)\n qui replace `generate'=10 if inrange(``ffind'',3410,3412) | inrange(``ffind'',3443,3443) | inrange(``ffind'',3444,3444) | inrange(``ffind'',3460,3469) | inrange(``ffind'',3470,3479) | inrange(``ffind'',3480,3489) | inrange(``ffind'',3490,3499)\n qui replace `generate'=11 if inrange(``ffind'',3510,3519) | inrange(``ffind'',3520,3529) | inrange(``ffind'',3530,3530) | inrange(``ffind'',3531,3531) | inrange(``ffind'',3532,3532) | inrange(``ffind'',3533,3533) | inrange(``ffind'',3534,3534) | inrange(``ffind'',3535,3535) | inrange(``ffind'',3536,3536) | inrange(``ffind'',3540,3549) | inrange(``ffind'',3550,3559) | inrange(``ffind'',3560,3569) | inrange(``ffind'',3570,3579) | inrange(``ffind'',3580,3580) | inrange(``ffind'',3581,3581) | inrange(``ffind'',3582,3582) | inrange(``ffind'',3585,3585) | inrange(``ffind'',3586,3586) | inrange(``ffind'',3589,3589) | inrange(``ffind'',3590,3599) | inrange(``ffind'',3600,3600) | inrange(``ffind'',3610,3613) | inrange(``ffind'',3620,3621) | inrange(``ffind'',3622,3622) | inrange(``ffind'',3623,3629) | inrange(``ffind'',3670,3679) | inrange(``ffind'',3680,3680) | inrange(``ffind'',3681,3681) | inrange(``ffind'',3682,3682) | inrange(``ffind'',3683,3683) | inrange(``ffind'',3684,3684) | inrange(``ffind'',3685,3685) | 
inrange(``ffind'',3686,3686) | inrange(``ffind'',3687,3687) | inrange(``ffind'',3688,3688) | inrange(``ffind'',3689,3689) | inrange(``ffind'',3690,3690) | inrange(``ffind'',3691,3692) | inrange(``ffind'',3693,3693) | inrange(``ffind'',3694,3694) | inrange(``ffind'',3695,3695) | inrange(``ffind'',3699,3699) | inrange(``ffind'',3810,3810) | inrange(``ffind'',3811,3811) | inrange(``ffind'',3812,3812) | inrange(``ffind'',3820,3820) | inrange(``ffind'',3821,3821) | inrange(``ffind'',3822,3822) | inrange(``ffind'',3823,3823) | inrange(``ffind'',3824,3824) | inrange(``ffind'',3825,3825) | inrange(``ffind'',3826,3826) | inrange(``ffind'',3827,3827) | inrange(``ffind'',3829,3829) | inrange(``ffind'',3830,3839) | inrange(``ffind'',3950,3955) | inrange(``ffind'',5060,5060) | inrange(``ffind'',5063,5063) | inrange(``ffind'',5065,5065) | inrange(``ffind'',5080,5080) | inrange(``ffind'',5081,5081)\n qui replace `generate'=12 if inrange(``ffind'',3710,3710) | inrange(``ffind'',3711,3711) | inrange(``ffind'',3714,3714) | inrange(``ffind'',3716,3716) | inrange(``ffind'',3750,3751) | inrange(``ffind'',3792,3792) | inrange(``ffind'',5010,5015) | inrange(``ffind'',5510,5521) | inrange(``ffind'',5530,5531) | inrange(``ffind'',5560,5561) | inrange(``ffind'',5570,5571) | inrange(``ffind'',5590,5599)\n qui replace `generate'=13 if inrange(``ffind'',3713,3713) | inrange(``ffind'',3715,3715) | inrange(``ffind'',3720,3720) | inrange(``ffind'',3721,3721) | inrange(``ffind'',3724,3724) | inrange(``ffind'',3725,3725) | inrange(``ffind'',3728,3728) | inrange(``ffind'',3730,3731) | inrange(``ffind'',3732,3732) | inrange(``ffind'',3740,3743) | inrange(``ffind'',3760,3769) | inrange(``ffind'',3790,3790) | inrange(``ffind'',3795,3795) | inrange(``ffind'',3799,3799) | inrange(``ffind'',4000,4013) | inrange(``ffind'',4100,4100) | inrange(``ffind'',4110,4119) | inrange(``ffind'',4120,4121) | inrange(``ffind'',4130,4131) | inrange(``ffind'',4140,4142) | inrange(``ffind'',4150,4151) | 
inrange(``ffind'',4170,4173) | inrange(``ffind'',4190,4199) | inrange(``ffind'',4200,4200) | inrange(``ffind'',4210,4219) | inrange(``ffind'',4220,4229) | inrange(``ffind'',4230,4231) | inrange(``ffind'',4400,4499) | inrange(``ffind'',4500,4599) | inrange(``ffind'',4600,4699) | inrange(``ffind'',4700,4700) | inrange(``ffind'',4710,4712) | inrange(``ffind'',4720,4729) | inrange(``ffind'',4730,4739) | inrange(``ffind'',4740,4742) | inrange(``ffind'',4780,4780) | inrange(``ffind'',4783,4783) | inrange(``ffind'',4785,4785) | inrange(``ffind'',4789,4789)\n qui replace `generate'=14 if inrange(``ffind'',4900,4900) | inrange(``ffind'',4910,4911) | inrange(``ffind'',4920,4922) | inrange(``ffind'',4923,4923) | inrange(``ffind'',4924,4925) | inrange(``ffind'',4930,4931) | inrange(``ffind'',4932,4932) | inrange(``ffind'',4939,4939) | inrange(``ffind'',4940,4942)\n qui replace `generate'=15 if inrange(``ffind'',5260,5261) | inrange(``ffind'',5270,5271) | inrange(``ffind'',5300,5300) | inrange(``ffind'',5310,5311) | inrange(``ffind'',5320,5320) | inrange(``ffind'',5330,5331) | inrange(``ffind'',5334,5334) | inrange(``ffind'',5390,5399) | inrange(``ffind'',5400,5400) | inrange(``ffind'',5410,5411) | inrange(``ffind'',5412,5412) | inrange(``ffind'',5420,5421) | inrange(``ffind'',5430,5431) | inrange(``ffind'',5440,5441) | inrange(``ffind'',5450,5451) | inrange(``ffind'',5460,5461) | inrange(``ffind'',5490,5499) | inrange(``ffind'',5540,5541) | inrange(``ffind'',5550,5551) | inrange(``ffind'',5600,5699) | inrange(``ffind'',5700,5700) | inrange(``ffind'',5710,5719) | inrange(``ffind'',5720,5722) | inrange(``ffind'',5730,5733) | inrange(``ffind'',5734,5734) | inrange(``ffind'',5735,5735) | inrange(``ffind'',5736,5736) | inrange(``ffind'',5750,5750) | inrange(``ffind'',5800,5813) | inrange(``ffind'',5890,5890) | inrange(``ffind'',5900,5900) | inrange(``ffind'',5910,5912) | inrange(``ffind'',5920,5921) | inrange(``ffind'',5930,5932) | inrange(``ffind'',5940,5940) | 
inrange(``ffind'',5941,5941) | inrange(``ffind'',5942,5942) | inrange(``ffind'',5943,5943) | inrange(``ffind'',5944,5944) | inrange(``ffind'',5945,5945) | inrange(``ffind'',5946,5946) | inrange(``ffind'',5947,5947) | inrange(``ffind'',5948,5948) | inrange(``ffind'',5949,5949) | inrange(``ffind'',5960,5963) | inrange(``ffind'',5980,5989) | inrange(``ffind'',5990,5990) | inrange(``ffind'',5992,5992) | inrange(``ffind'',5993,5993) | inrange(``ffind'',5994,5994) | inrange(``ffind'',5995,5995) | inrange(``ffind'',5999,5999)
 qui replace `generate'=16 if inrange(``ffind'',6010,6019) | inrange(``ffind'',6020,6020) | inrange(``ffind'',6021,6021) | inrange(``ffind'',6022,6022) | inrange(``ffind'',6023,6023) | inrange(``ffind'',6025,6025) | inrange(``ffind'',6026,6026) | inrange(``ffind'',6028,6029) | inrange(``ffind'',6030,6036) | inrange(``ffind'',6040,6049) | inrange(``ffind'',6050,6059) | inrange(``ffind'',6060,6062) | inrange(``ffind'',6080,6082) | inrange(``ffind'',6090,6099) | inrange(``ffind'',6100,6100) | inrange(``ffind'',6110,6111) | inrange(``ffind'',6112,6112) | inrange(``ffind'',6120,6129) | inrange(``ffind'',6140,6149) | inrange(``ffind'',6150,6159) | inrange(``ffind'',6160,6163) | inrange(``ffind'',6172,6172) | inrange(``ffind'',6199,6199) | inrange(``ffind'',6200,6299) | inrange(``ffind'',6300,6300) | inrange(``ffind'',6310,6312) | inrange(``ffind'',6320,6324) | inrange(``ffind'',6330,6331) | inrange(``ffind'',6350,6351) | inrange(``ffind'',6360,6361) | inrange(``ffind'',6370,6371) | inrange(``ffind'',6390,6399) | inrange(``ffind'',6400,6411) | inrange(``ffind'',6500,6500) | inrange(``ffind'',6510,6510) | inrange(``ffind'',6512,6512) | inrange(``ffind'',6513,6513) | inrange(``ffind'',6514,6514) | inrange(``ffind'',6515,6515) | inrange(``ffind'',6517,6519) | inrange(``ffind'',6530,6531) | inrange(``ffind'',6532,6532) | inrange(``ffind'',6540,6541) | inrange(``ffind'',6550,6553) | inrange(``ffind'',6611,6611) | inrange(``ffind'',6700,6700) | inrange(``ffind'',6710,6719) | inrange(``ffind'',6720,6722) | inrange(``ffind'',6723,6723) | inrange(``ffind'',6724,6724) | inrange(``ffind'',6725,6725) | inrange(``ffind'',6726,6726) | inrange(``ffind'',6730,6733) | inrange(``ffind'',6790,6790) | inrange(``ffind'',6792,6792) | inrange(``ffind'',6794,6794) | inrange(``ffind'',6795,6795) | inrange(``ffind'',6798,6798) | inrange(``ffind'',6799,6799)
 qui replace `generate'=17 if missing(`generate') & ~missing(``ffind'')

 }

 else if ``ftyp''==30 {
 label define `generate' 1 "Food Products" 2 "Beer & Liquor" 3 "Tobacco Products" 4 "Recreation" 5 "Printing and Publishing" 6 "Consumer Goods" 7 "Apparel" 8 "Healthcare, Medical Equipment, Pharmaceutical Products" 9 "Chemicals" 10 "Textiles" 11 "Construction and Construction Materials" 12 "Steel Works Etc" 13 "Fabricated Products and Machinery" 14 "Electrical Equipment" 15 "Automobiles and Trucks" 16 "Aircraft, ships, and railroad equipment" 17 "Precious Metals, Non-Metallic, and Industrial Metal Mining" 18 "Coal" 19 "Petroleum and Natural Gas" 20 "Utilities" 21 "Communication" 22 "Personal and Business Services" 23 "Business Equipment" 24 "Business Supplies and Shipping Containers" 25 "Transportation" 26 "Wholesale" 27 "Retail" 28 "Restaraunts, Hotels, Motels" 29 "Banking, Insurance, Real Estate, Trading" 30 "Everything Else"
 label values `generate' `generate'
 qui replace `generate'=1 if inrange(``ffind'',100,199) | inrange(``ffind'',200,299) | inrange(``ffind'',700,799) | inrange(``ffind'',910,919) | inrange(``ffind'',2000,2009) | inrange(``ffind'',2010,2019) | inrange(``ffind'',2020,2029) | inrange(``ffind'',2030,2039) | inrange(``ffind'',2040,2046) | inrange(``ffind'',2048,2048) | inrange(``ffind'',2050,2059) | inrange(``ffind'',2060,2063) | inrange(``ffind'',2064,2068) | inrange(``ffind'',2070,2079) | inrange(``ffind'',2086,2086) | inrange(``ffind'',2087,2087) | inrange(``ffind'',2090,2092) | inrange(``ffind'',2095,2095) | inrange(``ffind'',2096,2096) | inrange(``ffind'',2097,2097) | inrange(``ffind'',2098,2099)
 qui replace `generate'=2 if inrange(``ffind'',2080,2080) | inrange(``ffind'',2082,2082) | inrange(``ffind'',2083,2083) | inrange(``ffind'',2084,2084) | inrange(``ffind'',2085,2085)
 qui replace `generate'=3 if inrange(``ffind'',2100,2199)
 qui replace `generate'=4 if inrange(``ffind'',920,999) | inrange(``ffind'',3650,3651) | inrange(``ffind'',3652,3652) | inrange(``ffind'',3732,3732) | inrange(``ffind'',3930,3931) | inrange(``ffind'',3940,3949) | inrange(``ffind'',7800,7829) | inrange(``ffind'',7830,7833) | inrange(``ffind'',7840,7841) | inrange(``ffind'',7900,7900) | inrange(``ffind'',7910,7911) | inrange(``ffind'',7920,7929) | inrange(``ffind'',7930,7933) | inrange(``ffind'',7940,7949) | inrange(``ffind'',7980,7980) | inrange(``ffind'',7990,7999)
 qui replace `generate'=5 if inrange(``ffind'',2700,2709) | inrange(``ffind'',2710,2719) | inrange(``ffind'',2720,2729) | inrange(``ffind'',2730,2739) | inrange(``ffind'',2740,2749) | inrange(``ffind'',2750,2759) | inrange(``ffind'',2770,2771) | inrange(``ffind'',2780,2789) | inrange(``ffind'',2790,2799) | inrange(``ffind'',3993,3993)
 qui replace `generate'=6 if inrange(``ffind'',2047,2047) | inrange(``ffind'',2391,2392) | inrange(``ffind'',2510,2519) | inrange(``ffind'',2590,2599) | inrange(``ffind'',2840,2843) | inrange(``ffind'',2844,2844) | inrange(``ffind'',3160,3161) | inrange(``ffind'',3170,3171) | inrange(``ffind'',3172,3172) | inrange(``ffind'',3190,3199) | inrange(``ffind'',3229,3229) | inrange(``ffind'',3260,3260) | inrange(``ffind'',3262,3263) | inrange(``ffind'',3269,3269) | inrange(``ffind'',3230,3231) | inrange(``ffind'',3630,3639) | inrange(``ffind'',3750,3751) | inrange(``ffind'',3800,3800) | inrange(``ffind'',3860,3861) | inrange(``ffind'',3870,3873) | inrange(``ffind'',3910,3911) | inrange(``ffind'',3914,3914) | inrange(``ffind'',3915,3915) | inrange(``ffind'',3960,3962) | inrange(``ffind'',3991,3991) | inrange(``ffind'',3995,3995)
 qui replace `generate'=7 if inrange(``ffind'',2300,2390) | inrange(``ffind'',3020,3021) | inrange(``ffind'',3100,3111) | inrange(``ffind'',3130,3131) | inrange(``ffind'',3140,3149) | inrange(``ffind'',3150,3151) | inrange(``ffind'',3963,3965)
 qui replace `generate'=8 if inrange(``ffind'',2830,2830) | inrange(``ffind'',2831,2831) | inrange(``ffind'',2833,2833) | inrange(``ffind'',2834,2834) | inrange(``ffind'',2835,2835) | inrange(``ffind'',2836,2836) | inrange(``ffind'',3693,3693) | inrange(``ffind'',3840,3849) | inrange(``ffind'',3850,3851) | inrange(``ffind'',8000,8099)
 qui replace `generate'=9 if inrange(``ffind'',2800,2809) | inrange(``ffind'',2810,2819) | inrange(``ffind'',2820,2829) | inrange(``ffind'',2850,2859) | inrange(``ffind'',2860,2869) | inrange(``ffind'',2870,2879) | inrange(``ffind'',2890,2899)
 qui replace `generate'=10 if inrange(``ffind'',2200,2269) | inrange(``ffind'',2270,2279) | inrange(``ffind'',2280,2284) | inrange(``ffind'',2290,2295) | inrange(``ffind'',2297,2297) | inrange(``ffind'',2298,2298) | inrange(``ffind'',2299,2299) | inrange(``ffind'',2393,2395) | inrange(``ffind'',2397,2399)
 qui replace `generate'=11 if inrange(``ffind'',800,899) | inrange(``ffind'',1500,1511) | inrange(``ffind'',1520,1529) | inrange(``ffind'',1530,1539) | inrange(``ffind'',1540,1549) | inrange(``ffind'',1600,1699) | inrange(``ffind'',1700,1799) | inrange(``ffind'',2400,2439) | inrange(``ffind'',2450,2459) | inrange(``ffind'',2490,2499) | inrange(``ffind'',2660,2661) | inrange(``ffind'',2950,2952) | inrange(``ffind'',3200,3200) | inrange(``ffind'',3210,3211) | inrange(``ffind'',3240,3241) | inrange(``ffind'',3250,3259) | inrange(``ffind'',3261,3261) | inrange(``ffind'',3264,3264) | inrange(``ffind'',3270,3275) | inrange(``ffind'',3280,3281) | inrange(``ffind'',3290,3293) | inrange(``ffind'',3295,3299) | inrange(``ffind'',3420,3429) | inrange(``ffind'',3430,3433) | inrange(``ffind'',3440,3441) | inrange(``ffind'',3442,3442) | inrange(``ffind'',3446,3446) | inrange(``ffind'',3448,3448) | inrange(``ffind'',3449,3449) | inrange(``ffind'',3450,3451) | inrange(``ffind'',3452,3452) | inrange(``ffind'',3490,3499) | inrange(``ffind'',3996,3996)
 qui replace `generate'=12 if inrange(``ffind'',3300,3300) | inrange(``ffind'',3310,3317) | inrange(``ffind'',3320,3325) | inrange(``ffind'',3330,3339) | inrange(``ffind'',3340,3341) | inrange(``ffind'',3350,3357) | inrange(``ffind'',3360,3369) | inrange(``ffind'',3370,3379) | inrange(``ffind'',3390,3399)
 qui replace `generate'=13 if inrange(``ffind'',3400,3400) | inrange(``ffind'',3443,3443) | inrange(``ffind'',3444,3444) | inrange(``ffind'',3460,3469) | inrange(``ffind'',3470,3479) | inrange(``ffind'',3510,3519) | inrange(``ffind'',3520,3529) | inrange(``ffind'',3530,3530) | inrange(``ffind'',3531,3531) | inrange(``ffind'',3532,3532) | inrange(``ffind'',3533,3533) | inrange(``ffind'',3534,3534) | inrange(``ffind'',3535,3535) | inrange(``ffind'',3536,3536) | inrange(``ffind'',3538,3538) | inrange(``ffind'',3540,3549) | inrange(``ffind'',3550,3559) | inrange(``ffind'',3560,3569) | inrange(``ffind'',3580,3580) | inrange(``ffind'',3581,3581) | inrange(``ffind'',3582,3582) | inrange(``ffind'',3585,3585) | inrange(``ffind'',3586,3586) | inrange(``ffind'',3589,3589) | inrange(``ffind'',3590,3599)
 qui replace `generate'=14 if inrange(``ffind'',3600,3600) | inrange(``ffind'',3610,3613) | inrange(``ffind'',3620,3621) | inrange(``ffind'',3623,3629) | inrange(``ffind'',3640,3644) | inrange(``ffind'',3645,3645) | inrange(``ffind'',3646,3646) | inrange(``ffind'',3648,3649) | inrange(``ffind'',3660,3660) | inrange(``ffind'',3690,3690) | inrange(``ffind'',3691,3692) | inrange(``ffind'',3699,3699)
 qui replace `generate'=15 if inrange(``ffind'',2296,2296) | inrange(``ffind'',2396,2396) | inrange(``ffind'',3010,3011) | inrange(``ffind'',3537,3537) | inrange(``ffind'',3647,3647) | inrange(``ffind'',3694,3694) | inrange(``ffind'',3700,3700) | inrange(``ffind'',3710,3710) | inrange(``ffind'',3711,3711) | inrange(``ffind'',3713,3713) | inrange(``ffind'',3714,3714) | inrange(``ffind'',3715,3715) | inrange(``ffind'',3716,3716) | inrange(``ffind'',3792,3792) | inrange(``ffind'',3790,3791) | inrange(``ffind'',3799,3799)
 qui replace `generate'=16 if inrange(``ffind'',3720,3720) | inrange(``ffind'',3721,3721) | inrange(``ffind'',3723,3724) | inrange(``ffind'',3725,3725) | inrange(``ffind'',3728,3729) | inrange(``ffind'',3730,3731) | inrange(``ffind'',3740,3743)
 qui replace `generate'=17 if inrange(``ffind'',1000,1009) | inrange(``ffind'',1010,1019) | inrange(``ffind'',1020,1029) | inrange(``ffind'',1030,1039) | inrange(``ffind'',1040,1049) | inrange(``ffind'',1050,1059) | inrange(``ffind'',1060,1069) | inrange(``ffind'',1070,1079) | inrange(``ffind'',1080,1089) | inrange(``ffind'',1090,1099) | inrange(``ffind'',1100,1119) | inrange(``ffind'',1400,1499)
 qui replace `generate'=18 if inrange(``ffind'',1200,1299)
 qui replace `generate'=19 if inrange(``ffind'',1300,1300) | inrange(``ffind'',1310,1319) | inrange(``ffind'',1320,1329) | inrange(``ffind'',1330,1339) | inrange(``ffind'',1370,1379) | inrange(``ffind'',1380,1380) | inrange(``ffind'',1381,1381) | inrange(``ffind'',1382,1382) | inrange(``ffind'',1389,1389) | inrange(``ffind'',2900,2912) | inrange(``ffind'',2990,2999)
 qui replace `generate'=20 if inrange(``ffind'',4900,4900) | inrange(``ffind'',4910,4911) | inrange(``ffind'',4920,4922) | inrange(``ffind'',4923,4923) | inrange(``ffind'',4924,4925) | inrange(``ffind'',4930,4931) | inrange(``ffind'',4932,4932) | inrange(``ffind'',4939,4939) | inrange(``ffind'',4940,4942)
 qui replace `generate'=21 if inrange(``ffind'',4800,4800) | inrange(``ffind'',4810,4813) | inrange(``ffind'',4820,4822) | inrange(``ffind'',4830,4839) | inrange(``ffind'',4840,4841) | inrange(``ffind'',4880,4889) | inrange(``ffind'',4890,4890) | inrange(``ffind'',4891,4891) | inrange(``ffind'',4892,4892) | inrange(``ffind'',4899,4899)
 qui replace `generate'=22 if inrange(``ffind'',7020,7021) | inrange(``ffind'',7030,7033) | inrange(``ffind'',7200,7200) | inrange(``ffind'',7210,7212) | inrange(``ffind'',7214,7214) | inrange(``ffind'',7215,7216) | inrange(``ffind'',7217,7217) | inrange(``ffind'',7218,7218) | inrange(``ffind'',7219,7219) | inrange(``ffind'',7220,7221) | inrange(``ffind'',7230,7231) | inrange(``ffind'',7240,7241) | inrange(``ffind'',7250,7251) | inrange(``ffind'',7260,7269) | inrange(``ffind'',7270,7290) | inrange(``ffind'',7291,7291) | inrange(``ffind'',7292,7299) | inrange(``ffind'',7300,7300) | inrange(``ffind'',7310,7319) | inrange(``ffind'',7320,7329) | inrange(``ffind'',7330,7339) | inrange(``ffind'',7340,7342) | inrange(``ffind'',7349,7349) | inrange(``ffind'',7350,7351) | inrange(``ffind'',7352,7352) | inrange(``ffind'',7353,7353) | inrange(``ffind'',7359,7359) | inrange(``ffind'',7360,7369) | inrange(``ffind'',7370,7372) | inrange(``ffind'',7374,7374) | inrange(``ffind'',7375,7375) | inrange(``ffind'',7376,7376) | inrange(``ffind'',7377,7377) | inrange(``ffind'',7378,7378) | inrange(``ffind'',7379,7379) | inrange(``ffind'',7380,7380) | inrange(``ffind'',7381,7382) | inrange(``ffind'',7383,7383) | inrange(``ffind'',7384,7384) | inrange(``ffind'',7385,7385) | inrange(``ffind'',7389,7390) | inrange(``ffind'',7391,7391) | inrange(``ffind'',7392,7392) | inrange(``ffind'',7393,7393) | inrange(``ffind'',7394,7394) | inrange(``ffind'',7395,7395) | inrange(``ffind'',7396,7396) | inrange(``ffind'',7397,7397) | inrange(``ffind'',7399,7399) | inrange(``ffind'',7500,7500) | inrange(``ffind'',7510,7519) | inrange(``ffind'',7520,7529) | inrange(``ffind'',7530,7539) | inrange(``ffind'',7540,7549) | inrange(``ffind'',7600,7600) | inrange(``ffind'',7620,7620) | inrange(``ffind'',7622,7622) | inrange(``ffind'',7623,7623) | inrange(``ffind'',7629,7629) | inrange(``ffind'',7630,7631) | inrange(``ffind'',7640,7641) | inrange(``ffind'',7690,7699) | inrange(``ffind'',8100,8199) | inrange(``ffind'',8200,8299) | inrange(``ffind'',8300,8399) | inrange(``ffind'',8400,8499) | inrange(``ffind'',8600,8699) | inrange(``ffind'',8700,8700) | inrange(``ffind'',8710,8713) | inrange(``ffind'',8720,8721) | inrange(``ffind'',8730,8734) | inrange(``ffind'',8740,8748) | inrange(``ffind'',8800,8899) | inrange(``ffind'',8900,8910) | inrange(``ffind'',8911,8911) | inrange(``ffind'',8920,8999)
 qui replace `generate'=23 if inrange(``ffind'',3570,3579) | inrange(``ffind'',3622,3622) | inrange(``ffind'',3661,3661) | inrange(``ffind'',3662,3662) | inrange(``ffind'',3663,3663) | inrange(``ffind'',3664,3664) | inrange(``ffind'',3665,3665) | inrange(``ffind'',3666,3666) | inrange(``ffind'',3669,3669) | inrange(``ffind'',3670,3679) | inrange(``ffind'',3680,3680) | inrange(``ffind'',3681,3681) | inrange(``ffind'',3682,3682) | inrange(``ffind'',3683,3683) | inrange(``ffind'',3684,3684) | inrange(``ffind'',3685,3685) | inrange(``ffind'',3686,3686) | inrange(``ffind'',3687,3687) | inrange(``ffind'',3688,3688) | inrange(``ffind'',3689,3689) | inrange(``ffind'',3695,3695) | inrange(``ffind'',3810,3810) | inrange(``ffind'',3811,3811) | inrange(``ffind'',3812,3812) | inrange(``ffind'',3820,3820) | inrange(``ffind'',3821,3821) | inrange(``ffind'',3822,3822) | inrange(``ffind'',3823,3823) | inrange(``ffind'',3824,3824) | inrange(``ffind'',3825,3825) | inrange(``ffind'',3826,3826) | inrange(``ffind'',3827,3827) | inrange(``ffind'',3829,3829) | inrange(``ffind'',3830,3839) | inrange(``ffind'',7373,7373)
 qui replace `generate'=24 if inrange(``ffind'',2440,2449) | inrange(``ffind'',2520,2549) | inrange(``ffind'',2600,2639) | inrange(``ffind'',2640,2659) | inrange(``ffind'',2670,2699) | inrange(``ffind'',2760,2761) | inrange(``ffind'',3220,3221) | inrange(``ffind'',3410,3412) | inrange(``ffind'',3950,3955)
 qui replace `generate'=25 if inrange(``ffind'',4000,4013) | inrange(``ffind'',4040,4049) | inrange(``ffind'',4100,4100) | inrange(``ffind'',4110,4119) | inrange(``ffind'',4120,4121) | inrange(``ffind'',4130,4131) | inrange(``ffind'',4140,4142) | inrange(``ffind'',4150,4151) | inrange(``ffind'',4170,4173) | inrange(``ffind'',4190,4199) | inrange(``ffind'',4200,4200) | inrange(``ffind'',4210,4219) | inrange(``ffind'',4220,4229) | inrange(``ffind'',4230,4231) | inrange(``ffind'',4240,4249) | inrange(``ffind'',4400,4499) | inrange(``ffind'',4500,4599) | inrange(``ffind'',4600,4699) | inrange(``ffind'',4700,4700) | inrange(``ffind'',4710,4712) | inrange(``ffind'',4720,4729) | inrange(``ffind'',4730,4739) | inrange(``ffind'',4740,4749) | inrange(``ffind'',4780,4780) | inrange(``ffind'',4782,4782) | inrange(``ffind'',4783,4783) | inrange(``ffind'',4784,4784) | inrange(``ffind'',4785,4785) | inrange(``ffind'',4789,4789)
 qui replace `generate'=26 if inrange(``ffind'',5000,5000) | inrange(``ffind'',5010,5015) | inrange(``ffind'',5020,5023) | inrange(``ffind'',5030,5039) | inrange(``ffind'',5040,5042) | inrange(``ffind'',5043,5043) | inrange(``ffind'',5044,5044) | inrange(``ffind'',5045,5045) | inrange(``ffind'',5046,5046) | inrange(``ffind'',5047,5047) | inrange(``ffind'',5048,5048) | inrange(``ffind'',5049,5049) | inrange(``ffind'',5050,5059) | inrange(``ffind'',5060,5060) | inrange(``ffind'',5063,5063) | inrange(``ffind'',5064,5064) | inrange(``ffind'',5065,5065) | inrange(``ffind'',5070,5078) | inrange(``ffind'',5080,5080) | inrange(``ffind'',5081,5081) | inrange(``ffind'',5082,5082) | inrange(``ffind'',5083,5083) | inrange(``ffind'',5084,5084) | inrange(``ffind'',5085,5085) | inrange(``ffind'',5086,5087) | inrange(``ffind'',5088,5088) | inrange(``ffind'',5090,5090) | inrange(``ffind'',5091,5092) | inrange(``ffind'',5093,5093) | inrange(``ffind'',5094,5094) | inrange(``ffind'',5099,5099) | inrange(``ffind'',5100,5100) | inrange(``ffind'',5110,5113) | inrange(``ffind'',5120,5122) | inrange(``ffind'',5130,5139) | inrange(``ffind'',5140,5149) | inrange(``ffind'',5150,5159) | inrange(``ffind'',5160,5169) | inrange(``ffind'',5170,5172) | inrange(``ffind'',5180,5182) | inrange(``ffind'',5190,5199)
 qui replace `generate'=27 if inrange(``ffind'',5200,5200) | inrange(``ffind'',5210,5219) | inrange(``ffind'',5220,5229) | inrange(``ffind'',5230,5231) | inrange(``ffind'',5250,5251) | inrange(``ffind'',5260,5261) | inrange(``ffind'',5270,5271) | inrange(``ffind'',5300,5300) | inrange(``ffind'',5310,5311) | inrange(``ffind'',5320,5320) | inrange(``ffind'',5330,5331) | inrange(``ffind'',5334,5334) | inrange(``ffind'',5340,5349) | inrange(``ffind'',5390,5399) | inrange(``ffind'',5400,5400) | inrange(``ffind'',5410,5411) | inrange(``ffind'',5412,5412) | inrange(``ffind'',5420,5429) | inrange(``ffind'',5430,5439) | inrange(``ffind'',5440,5449) | inrange(``ffind'',5450,5459) | inrange(``ffind'',5460,5469) | inrange(``ffind'',5490,5499) | inrange(``ffind'',5500,5500) | inrange(``ffind'',5510,5529) | inrange(``ffind'',5530,5539) | inrange(``ffind'',5540,5549) | inrange(``ffind'',5550,5559) | inrange(``ffind'',5560,5569) | inrange(``ffind'',5570,5579) | inrange(``ffind'',5590,5599) | inrange(``ffind'',5600,5699) | inrange(``ffind'',5700,5700) | inrange(``ffind'',5710,5719) | inrange(``ffind'',5720,5722) | inrange(``ffind'',5730,5733) | inrange(``ffind'',5734,5734) | inrange(``ffind'',5735,5735) | inrange(``ffind'',5736,5736) | inrange(``ffind'',5750,5799) | inrange(``ffind'',5900,5900) | inrange(``ffind'',5910,5912) | inrange(``ffind'',5920,5929) | inrange(``ffind'',5930,5932) | inrange(``ffind'',5940,5940) | inrange(``ffind'',5941,5941) | inrange(``ffind'',5942,5942) | inrange(``ffind'',5943,5943) | inrange(``ffind'',5944,5944) | inrange(``ffind'',5945,5945) | inrange(``ffind'',5946,5946) | inrange(``ffind'',5947,5947) | inrange(``ffind'',5948,5948) | inrange(``ffind'',5949,5949) | inrange(``ffind'',5950,5959) | inrange(``ffind'',5960,5969) | inrange(``ffind'',5970,5979) | inrange(``ffind'',5980,5989) | inrange(``ffind'',5990,5990) | inrange(``ffind'',5992,5992) | inrange(``ffind'',5993,5993) | inrange(``ffind'',5994,5994) | inrange(``ffind'',5995,5995) | inrange(``ffind'',5999,5999)
 qui replace `generate'=28 if inrange(``ffind'',5800,5819) | inrange(``ffind'',5820,5829) | inrange(``ffind'',5890,5899) | inrange(``ffind'',7000,7000) | inrange(``ffind'',7010,7019) | inrange(``ffind'',7040,7049) | inrange(``ffind'',7213,7213)
 qui replace `generate'=29 if inrange(``ffind'',6000,6000) | inrange(``ffind'',6010,6019) | inrange(``ffind'',6020,6020) | inrange(``ffind'',6021,6021) | inrange(``ffind'',6022,6022) | inrange(``ffind'',6023,6024) | inrange(``ffind'',6025,6025) | inrange(``ffind'',6026,6026) | inrange(``ffind'',6027,6027) | inrange(``ffind'',6028,6029) | inrange(``ffind'',6030,6036) | inrange(``ffind'',6040,6059) | inrange(``ffind'',6060,6062) | inrange(``ffind'',6080,6082) | inrange(``ffind'',6090,6099) | inrange(``ffind'',6100,6100) | inrange(``ffind'',6110,6111) | inrange(``ffind'',6112,6113) | inrange(``ffind'',6120,6129) | inrange(``ffind'',6130,6139) | inrange(``ffind'',6140,6149) | inrange(``ffind'',6150,6159) | inrange(``ffind'',6160,6169) | inrange(``ffind'',6170,6179) | inrange(``ffind'',6190,6199) | inrange(``ffind'',6200,6299) | inrange(``ffind'',6300,6300) | inrange(``ffind'',6310,6319) | inrange(``ffind'',6320,6329) | inrange(``ffind'',6330,6331) | inrange(``ffind'',6350,6351) | inrange(``ffind'',6360,6361) | inrange(``ffind'',6370,6379) | inrange(``ffind'',6390,6399) | inrange(``ffind'',6400,6411) | inrange(``ffind'',6500,6500) | inrange(``ffind'',6510,6510) | inrange(``ffind'',6512,6512) | inrange(``ffind'',6513,6513) | inrange(``ffind'',6514,6514) | inrange(``ffind'',6515,6515) | inrange(``ffind'',6517,6519) | inrange(``ffind'',6520,6529) | inrange(``ffind'',6530,6531) | inrange(``ffind'',6532,6532) | inrange(``ffind'',6540,6541) | inrange(``ffind'',6550,6553) | inrange(``ffind'',6590,6599) | inrange(``ffind'',6610,6611) | inrange(``ffind'',6700,6700) | inrange(``ffind'',6710,6719) | inrange(``ffind'',6720,6722) | inrange(``ffind'',6723,6723) | inrange(``ffind'',6724,6724) | inrange(``ffind'',6725,6725) | inrange(``ffind'',6726,6726) | inrange(``ffind'',6730,6733) | inrange(``ffind'',6740,6779) | inrange(``ffind'',6790,6791) | inrange(``ffind'',6792,6792) | inrange(``ffind'',6793,6793) | inrange(``ffind'',6794,6794) | inrange(``ffind'',6795,6795) | inrange(``ffind'',6798,6798) | inrange(``ffind'',6799,6799)
 qui replace `generate'=30 if missing(`generate') & ~missing(``ffind'')
 }

 else if ``ftyp''==38 {
 label define `generate' 1 "Agriculture, forestry, and fishing" 2 "Mining" 3 "Oil and Gas Extraction" 4 "Nonmetalic Minerals Except Fuels" 5 "Construction" 6 "Food and Kindred Products" 7 "Tobacco Products" 8 "Textile Mill Products" 9 "Apparel and other Textile Products" 10 "Lumber and Wood Products" 11 "Furniture and Fixtures" 12 "Paper and Allied Products" 13 "Printing and Publishing" 14 "Chemicals and Allied Products" 15 "Petroleum and Coal Products" 16 "Rubber and Miscellaneous Plastics Products" 17 "Leather and Leather Products" 18 "Stone, Clay and Glass Products" 19 "Primary Metal Industries" 20 "Fabricated Metal Products" 21 "Machinery, Except Electrical" 22 "Electrical and Electronic Equipment" 23 "Transportation Equipment" 24 "Instruments and Related Products" 25 "Miscellaneous Manufacturing Industries" 26 "Transportation" 27 "Telephone and Telegraph Communication" 28 "Radio and Television Broadcasting" 29 "Electric, Gas, and Water Supply" 30 "Sanitary Services" 31 "Steam Supply" 32 "Irrigation Systems" 33 "Wholesale" 34 "Retail Stores" 35 "Finance, Insurance, and Real Estate" 36 "Services" 37 "Public Administration" 38 "Almost Nothing"
 label values `generate' `generate'
 qui replace `generate'=1 if inrange(``ffind'',100,999)
 qui replace `generate'=2 if inrange(``ffind'',1000,1299)
 qui replace `generate'=3 if inrange(``ffind'',1300,1399)
 qui replace `generate'=4 if inrange(``ffind'',1400,1499)
 qui replace `generate'=5 if inrange(``ffind'',1500,1799)
 qui replace `generate'=6 if inrange(``ffind'',2000,2099)
 qui replace `generate'=7 if inrange(``ffind'',2100,2199)
 qui replace `generate'=8 if inrange(``ffind'',2200,2299)
 qui replace `generate'=9 if inrange(``ffind'',2300,2399)
 qui replace `generate'=10 if inrange(``ffind'',2400,2499)
 qui replace `generate'=11 if inrange(``ffind'',2500,2599)
 qui replace `generate'=12 if inrange(``ffind'',2600,2661)
 qui replace `generate'=13 if inrange(``ffind'',2700,2799)
 qui replace `generate'=14 if inrange(``ffind'',2800,2899)
 qui replace `generate'=15 if inrange(``ffind'',2900,2999)
 qui replace `generate'=16 if inrange(``ffind'',3000,3099)
 qui replace `generate'=17 if inrange(``ffind'',3100,3199)
 qui replace `generate'=18 if inrange(``ffind'',3200,3299)
 qui replace `generate'=19 if inrange(``ffind'',3300,3399)
 qui replace `generate'=20 if inrange(``ffind'',3400,3499)
 qui replace `generate'=21 if inrange(``ffind'',3500,3599)
 qui replace `generate'=22 if inrange(``ffind'',3600,3699)
 qui replace `generate'=23 if inrange(``ffind'',3700,3799)
 qui replace `generate'=24 if inrange(``ffind'',3800,3879)
 qui replace `generate'=25 if inrange(``ffind'',3900,3999)
 qui replace `generate'=26 if inrange(``ffind'',4000,4799)
 qui replace `generate'=27 if inrange(``ffind'',4800,4829)
 qui replace `generate'=28 if inrange(``ffind'',4830,4899)
 qui replace `generate'=29 if inrange(``ffind'',4900,4949)
 qui replace `generate'=30 if inrange(``ffind'',4950,4959)
 qui replace `generate'=31 if inrange(``ffind'',4960,4969)
 qui replace `generate'=32 if inrange(``ffind'',4970,4979)
 qui replace `generate'=33 if inrange(``ffind'',5000,5199)
 qui replace `generate'=34 if inrange(``ffind'',5200,5999)
 qui replace `generate'=35 if inrange(``ffind'',6000,6999)
 qui replace `generate'=36 if inrange(``ffind'',7000,8999)
 qui replace `generate'=37 if inrange(``ffind'',9000,9999)
 qui replace `generate'=38 if missing(`generate') & ~missing(``ffind'')

 }
 else if ``ftyp''==48 {
 label define `generate' 1 "Agriculture" 2 "Food Products" 3 "Candy & Soda" 4 "Beer & Liquor" 5 "Tobacco Products" 6 "Recreation" 7 "Entertainment" 8 "Printing and Publishing" 9 "Consumer Goods" 10 "Apparel" 11 "Healthcare" 12 "Medical Equipment" 13 "Pharmaceutical Products" 14 "Chemicals" 15 "Rubber and Plastic Products" 16 "Textiles" 17 "Construction Materials" 18 "Construction" 19 "Steel Works Etc" 20 "Fabricated Products" 21 "Machinery" 22 "Electrical Equipment" 23 "Automobiles and Trucks" 24 "Aircraft" 25 "Shipbuilding, Railroad Equipment" 26 "Defense" 27 "Precious Metals" 28 "Non-Metallic and Industrial Metal Mining" 29 "Coal" 30 "Petroleum and Natural Gas" 31 "Utilities" 32 "Communication" 33 "Personal Services" 34 "Business Services" 35 "Computers" 36 "Electronic Equipment" 37 "Measuring and Control Equipment" 38 "Business Supplies" 39 "Shipping Containers" 40 "Transportation" 41 "Wholesale" 42 "Retail" 43 "Restaraunts, Hotels, Motels" 44 "Banking" 45 "Insurance" 46 "Real Estate" 47 "Trading" 48 "Almost Nothing"
 label values `generate' `generate'
 qui replace `generate'=1 if inrange(``ffind'',100,199) | inrange(``ffind'',200,299) | inrange(``ffind'',700,799) | inrange(``ffind'',910,919) | inrange(``ffind'',2048,2048)
 qui replace `generate'=2 if inrange(``ffind'',2000,2009) | inrange(``ffind'',2010,2019) | inrange(``ffind'',2020,2029) | inrange(``ffind'',2030,2039) | inrange(``ffind'',2040,2046) | inrange(``ffind'',2050,2059) | inrange(``ffind'',2060,2063) | inrange(``ffind'',2070,2079) | inrange(``ffind'',2090,2092) | inrange(``ffind'',2095,2095) | inrange(``ffind'',2098,2099)
 qui replace `generate'=3 if inrange(``ffind'',2064,2068) | inrange(``ffind'',2086,2086) | inrange(``ffind'',2087,2087) | inrange(``ffind'',2096,2096) | inrange(``ffind'',2097,2097)
 qui replace `generate'=4 if inrange(``ffind'',2080,2080) | inrange(``ffind'',2082,2082) | inrange(``ffind'',2083,2083) | inrange(``ffind'',2084,2084) | inrange(``ffind'',2085,2085)
 qui replace `generate'=5 if inrange(``ffind'',2100,2199)
 qui replace `generate'=6 if inrange(``ffind'',920,999) | inrange(``ffind'',3650,3651) | inrange(``ffind'',3652,3652) | inrange(``ffind'',3732,3732) | inrange(``ffind'',3930,3931) | inrange(``ffind'',3940,3949)
 qui replace `generate'=7 if inrange(``ffind'',7800,7829) | inrange(``ffind'',7830,7833) | inrange(``ffind'',7840,7841) | inrange(``ffind'',7900,7900) | inrange(``ffind'',7910,7911) | inrange(``ffind'',7920,7929) | inrange(``ffind'',7930,7933) | inrange(``ffind'',7940,7949) | inrange(``ffind'',7980,7980) | inrange(``ffind'',7990,7999)
 qui replace `generate'=8 if inrange(``ffind'',2700,2709) | inrange(``ffind'',2710,2719) | inrange(``ffind'',2720,2729) | inrange(``ffind'',2730,2739) | inrange(``ffind'',2740,2749) | inrange(``ffind'',2770,2771) | inrange(``ffind'',2780,2789) | inrange(``ffind'',2790,2799)
 qui replace `generate'=9 if inrange(``ffind'',2047,2047) | inrange(``ffind'',2391,2392) | inrange(``ffind'',2510,2519) | inrange(``ffind'',2590,2599) | inrange(``ffind'',2840,2843) | inrange(``ffind'',2844,2844) | inrange(``ffind'',3160,3161) | inrange(``ffind'',3170,3171) | inrange(``ffind'',3172,3172) | inrange(``ffind'',3190,3199) | inrange(``ffind'',3229,3229) | inrange(``ffind'',3260,3260) | inrange(``ffind'',3262,3263) | inrange(``ffind'',3269,3269) | inrange(``ffind'',3230,3231) | inrange(``ffind'',3630,3639) | inrange(``ffind'',3750,3751) | inrange(``ffind'',3800,3800) | inrange(``ffind'',3860,3861) | inrange(``ffind'',3870,3873) | inrange(``ffind'',3910,3911) | inrange(``ffind'',3914,3914) | inrange(``ffind'',3915,3915) | inrange(``ffind'',3960,3962) | inrange(``ffind'',3991,3991) | inrange(``ffind'',3995,3995)
 qui replace `generate'=10 if inrange(``ffind'',2300,2390) | inrange(``ffind'',3020,3021) | inrange(``ffind'',3100,3111) | inrange(``ffind'',3130,3131) | inrange(``ffind'',3140,3149) | inrange(``ffind'',3150,3151) | inrange(``ffind'',3963,3965)
 qui replace `generate'=11 if inrange(``ffind'',8000,8099)
 qui replace `generate'=12 if inrange(``ffind'',3693,3693) | inrange(``ffind'',3840,3849) | inrange(``ffind'',3850,3851)
 qui replace `generate'=13 if inrange(``ffind'',2830,2830) | inrange(``ffind'',2831,2831) | inrange(``ffind'',2833,2833) | inrange(``ffind'',2834,2834) | inrange(``ffind'',2835,2835) | inrange(``ffind'',2836,2836)
 qui replace `generate'=14 if inrange(``ffind'',2800,2809) | inrange(``ffind'',2810,2819) | inrange(``ffind'',2820,2829) | inrange(``ffind'',2850,2859) | inrange(``ffind'',2860,2869) | inrange(``ffind'',2870,2879) | inrange(``ffind'',2890,2899)
 qui replace `generate'=15 if inrange(``ffind'',3031,3031) | inrange(``ffind'',3041,3041) | inrange(``ffind'',3050,3053) | inrange(``ffind'',3060,3069) | inrange(``ffind'',3070,3079) | inrange(``ffind'',3080,3089) | inrange(``ffind'',3090,3099)
 qui replace `generate'=16 if inrange(``ffind'',2200,2269) | inrange(``ffind'',2270,2279) | inrange(``ffind'',2280,2284) | inrange(``ffind'',2290,2295) | inrange(``ffind'',2297,2297) | inrange(``ffind'',2298,2298) | inrange(``ffind'',2299,2299) | inrange(``ffind'',2393,2395) | inrange(``ffind'',2397,2399)
 qui replace `generate'=17 if inrange(``ffind'',800,899) | inrange(``ffind'',2400,2439) | inrange(``ffind'',2450,2459) | inrange(``ffind'',2490,2499) | inrange(``ffind'',2660,2661) | inrange(``ffind'',2950,2952) | inrange(``ffind'',3200,3200) | inrange(``ffind'',3210,3211) | inrange(``ffind'',3240,3241) | inrange(``ffind'',3250,3259) | inrange(``ffind'',3261,3261) | inrange(``ffind'',3264,3264) | inrange(``ffind'',3270,3275) | inrange(``ffind'',3280,3281) | inrange(``ffind'',3290,3293) | inrange(``ffind'',3295,3299) | inrange(``ffind'',3420,3429) | inrange(``ffind'',3430,3433) | inrange(``ffind'',3440,3441) | inrange(``ffind'',3442,3442) | inrange(``ffind'',3446,3446) | inrange(``ffind'',3448,3448) | inrange(``ffind'',3449,3449) | inrange(``ffind'',3450,3451) | inrange(``ffind'',3452,3452) | inrange(``ffind'',3490,3499) | inrange(``ffind'',3996,3996)
 qui replace `generate'=18 if inrange(``ffind'',1500,1511) | inrange(``ffind'',1520,1529) | inrange(``ffind'',1530,1539) | inrange(``ffind'',1540,1549) | inrange(``ffind'',1600,1699) | inrange(``ffind'',1700,1799)
 qui replace `generate'=19 if inrange(``ffind'',3300,3300) | inrange(``ffind'',3310,3317) | inrange(``ffind'',3320,3325) | inrange(``ffind'',3330,3339) | inrange(``ffind'',3340,3341) | inrange(``ffind'',3350,3357) | inrange(``ffind'',3360,3369) | inrange(``ffind'',3370,3379) | inrange(``ffind'',3390,3399)
 qui replace `generate'=20 if inrange(``ffind'',3400,3400) | inrange(``ffind'',3443,3443) | inrange(``ffind'',3444,3444) | inrange(``ffind'',3460,3469) | inrange(``ffind'',3470,3479)
 qui replace `generate'=21 if inrange(``ffind'',3510,3519) | inrange(``ffind'',3520,3529) | inrange(``ffind'',3530,3530) | inrange(``ffind'',3531,3531) | inrange(``ffind'',3532,3532) | inrange(``ffind'',3533,3533) | inrange(``ffind'',3534,3534) | inrange(``ffind'',3535,3535) | inrange(``ffind'',3536,3536) | inrange(``ffind'',3538,3538) | inrange(``ffind'',3540,3549) | inrange(``ffind'',3550,3559) | inrange(``ffind'',3560,3569) | inrange(``ffind'',3580,3580) | inrange(``ffind'',3581,3581) | inrange(``ffind'',3582,3582) | inrange(``ffind'',3585,3585) | inrange(``ffind'',3586,3586) | inrange(``ffind'',3589,3589) | inrange(``ffind'',3590,3599)
 qui replace `generate'=22 if inrange(``ffind'',3600,3600) | inrange(``ffind'',3610,3613) | inrange(``ffind'',3620,3621) | inrange(``ffind'',3623,3629) | inrange(``ffind'',3640,3644) | inrange(``ffind'',3645,3645) | inrange(``ffind'',3646,3646) | inrange(``ffind'',3648,3649) | inrange(``ffind'',3660,3660) | inrange(``ffind'',3690,3690) | inrange(``ffind'',3691,3692) | inrange(``ffind'',3699,3699)
 qui replace `generate'=23 if inrange(``ffind'',2296,2296) | inrange(``ffind'',2396,2396) | inrange(``ffind'',3010,3011) | inrange(``ffind'',3537,3537) | inrange(``ffind'',3647,3647) | inrange(``ffind'',3694,3694) | inrange(``ffind'',3700,3700) | inrange(``ffind'',3710,3710) | inrange(``ffind'',3711,3711) | inrange(``ffind'',3713,3713) | inrange(``ffind'',3714,3714) | inrange(``ffind'',3715,3715) | inrange(``ffind'',3716,3716) | inrange(``ffind'',3792,3792) | inrange(``ffind'',3790,3791) | inrange(``ffind'',3799,3799)
 qui replace `generate'=24 if inrange(``ffind'',3720,3720) | inrange(``ffind'',3721,3721) | inrange(``ffind'',3723,3724) | inrange(``ffind'',3725,3725) | inrange(``ffind'',3728,3729)
 qui replace `generate'=25 if inrange(``ffind'',3730,3731) | inrange(``ffind'',3740,3743)
 qui replace `generate'=26 if inrange(``ffind'',3760,3769) | inrange(``ffind'',3795,3795) | inrange(``ffind'',3480,3489)
 qui replace `generate'=27 if inrange(``ffind'',1040,1049)
 qui replace `generate'=28 if inrange(``ffind'',1000,1009) | inrange(``ffind'',1010,1019) | inrange(``ffind'',1020,1029) | inrange(``ffind'',1030,1039) | inrange(``ffind'',1050,1059) | inrange(``ffind'',1060,1069) | inrange(``ffind'',1070,1079) | inrange(``ffind'',1080,1089) | inrange(``ffind'',1090,1099) | inrange(``ffind'',1100,1119) | inrange(``ffind'',1400,1499)
 qui replace `generate'=29 if inrange(``ffind'',1200,1299)
 qui replace `generate'=30 if inrange(``ffind'',1300,1300) | inrange(``ffind'',1310,1319) | inrange(``ffind'',1320,1329) | inrange(``ffind'',1330,1339) | inrange(``ffind'',1370,1379) | inrange(``ffind'',1380,1380) | inrange(``ffind'',1381,1381) | inrange(``ffind'',1382,1382) | inrange(``ffind'',1389,1389) | inrange(``ffind'',2900,2912) | inrange(``ffind'',2990,2999)
 qui replace `generate'=31 if inrange(``ffind'',4900,4900) | inrange(``ffind'',4910,4911) | inrange(``ffind'',4920,4922) | inrange(``ffind'',4923,4923) | inrange(``ffind'',4924,4925) | inrange(``ffind'',4930,4931) | inrange(``ffind'',4932,4932) | inrange(``ffind'',4939,4939) | inrange(``ffind'',4940,4942)
 qui replace `generate'=32 if inrange(``ffind'',4800,4800) | inrange(``ffind'',4810,4813) | inrange(``ffind'',4820,4822) | inrange(``ffind'',4830,4839) | inrange(``ffind'',4840,4841) | inrange(``ffind'',4880,4889) | inrange(``ffind'',4890,4890) | inrange(``ffind'',4891,4891) | inrange(``ffind'',4892,4892) | inrange(``ffind'',4899,4899)
 qui replace `generate'=33 if inrange(``ffind'',7020,7021) | inrange(``ffind'',7030,7033) | inrange(``ffind'',7200,7200) | inrange(``ffind'',7210,7212) | inrange(``ffind'',7214,7214) | inrange(``ffind'',7215,7216) | inrange(``ffind'',7217,7217) | inrange(``ffind'',7219,7219) | inrange(``ffind'',7220,7221) | inrange(``ffind'',7230,7231) | inrange(``ffind'',7240,7241) | inrange(``ffind'',7250,7251) | inrange(``ffind'',7260,7269) | inrange(``ffind'',7270,7290) | inrange(``ffind'',7291,7291) | inrange(``ffind'',7292,7299) | inrange(``ffind'',7395,7395) | inrange(``ffind'',7500,7500) | inrange(``ffind'',7520,7529) | inrange(``ffind'',7530,7539) | inrange(``ffind'',7540,7549) | inrange(``ffind'',7600,7600) | inrange(``ffind'',7620,7620) | inrange(``ffind'',7622,7622) | inrange(``ffind'',7623,7623) | inrange(``ffind'',7629,7629) | inrange(``ffind'',7630,7631) | inrange(``ffind'',7640,7641) | inrange(``ffind'',7690,7699) | inrange(``ffind'',8100,8199) | inrange(``ffind'',8200,8299) | inrange(``ffind'',8300,8399) | inrange(``ffind'',8400,8499) | inrange(``ffind'',8600,8699) | inrange(``ffind'',8800,8899) | inrange(``ffind'',7510,7515)
 qui replace `generate'=34 if inrange(``ffind'',2750,2759) | inrange(``ffind'',3993,3993) | inrange(``ffind'',7218,7218) | inrange(``ffind'',7300,7300) | inrange(``ffind'',7310,7319) | inrange(``ffind'',7320,7329) | inrange(``ffind'',7330,7339) | inrange(``ffind'',7340,7342) | inrange(``ffind'',7349,7349) | inrange(``ffind'',7350,7351) | inrange(``ffind'',7352,7352) | inrange(``ffind'',7353,7353) | inrange(``ffind'',7359,7359) | inrange(``ffind'',7360,7369) | inrange(``ffind'',7370,7372) | inrange(``ffind'',7374,7374) | inrange(``ffind'',7375,7375) | inrange(``ffind'',7376,7376) | inrange(``ffind'',7377,7377) | inrange(``ffind'',7378,7378) | inrange(``ffind'',7379,7379) | inrange(``ffind'',7380,7380) | inrange(``ffind'',7381,7382) | inrange(``ffind'',7383,7383) | inrange(``ffind'',7384,7384) | inrange(``ffind'',7385,7385) | inrange(``ffind'',7389,7390) | inrange(``ffind'',7391,7391) | inrange(``ffind'',7392,7392) | inrange(``ffind'',7393,7393) | inrange(``ffind'',7394,7394) | inrange(``ffind'',7396,7396) | inrange(``ffind'',7397,7397) | inrange(``ffind'',7399,7399) | inrange(``ffind'',7519,7519) | inrange(``ffind'',8700,8700) | inrange(``ffind'',8710,8713) | inrange(``ffind'',8720,8721) | inrange(``ffind'',8730,8734) | inrange(``ffind'',8740,8748) | inrange(``ffind'',8900,8910) | inrange(``ffind'',8911,8911) | inrange(``ffind'',8920,8999) | inrange(``ffind'',4220,4229)
 qui replace `generate'=35 if inrange(``ffind'',3570,3579) | inrange(``ffind'',3680,3680) | inrange(``ffind'',3681,3681) | inrange(``ffind'',3682,3682) | inrange(``ffind'',3683,3683) | inrange(``ffind'',3684,3684) | inrange(``ffind'',3685,3685) | inrange(``ffind'',3686,3686) | inrange(``ffind'',3687,3687) | inrange(``ffind'',3688,3688) | inrange(``ffind'',3689,3689) | inrange(``ffind'',3695,3695) | inrange(``ffind'',7373,7373)
 qui replace `generate'=36 if inrange(``ffind'',3622,3622) | inrange(``ffind'',3661,3661) | inrange(``ffind'',3662,3662) | inrange(``ffind'',3663,3663) | inrange(``ffind'',3664,3664) | inrange(``ffind'',3665,3665) | inrange(``ffind'',3666,3666) | inrange(``ffind'',3669,3669) | inrange(``ffind'',3670,3679) | inrange(``ffind'',3810,3810) | inrange(``ffind'',3812,3812)
 qui replace `generate'=37 if inrange(``ffind'',3811,3811) | inrange(``ffind'',3820,3820) | inrange(``ffind'',3821,3821) | inrange(``ffind'',3822,3822) | inrange(``ffind'',3823,3823) | inrange(``ffind'',3824,3824) | inrange(``ffind'',3825,3825) | inrange(``ffind'',3826,3826) | inrange(``ffind'',3827,3827) | inrange(``ffind'',3829,3829) | inrange(``ffind'',3830,3839)
 qui replace `generate'=38 if inrange(``ffind'',2520,2549) | inrange(``ffind'',2600,2639) | inrange(``ffind'',2670,2699) | inrange(``ffind'',2760,2761) | inrange(``ffind'',3950,3955)
 qui replace `generate'=39 if inrange(``ffind'',2440,2449) | inrange(``ffind'',2640,2659) | inrange(``ffind'',3220,3221) | inrange(``ffind'',3410,3412)
 qui replace `generate'=40 if inrange(``ffind'',4000,4013) | inrange(``ffind'',4040,4049) | inrange(``ffind'',4100,4100) | inrange(``ffind'',4110,4119) | inrange(``ffind'',4120,4121) | inrange(``ffind'',4130,4131) | inrange(``ffind'',4140,4142) | inrange(``ffind'',4150,4151) | inrange(``ffind'',4170,4173) | inrange(``ffind'',4190,4199) | inrange(``ffind'',4200,4200) | inrange(``ffind'',4210,4219) | inrange(``ffind'',4230,4231) | inrange(``ffind'',4240,4249) | inrange(``ffind'',4400,4499) | inrange(``ffind'',4500,4599) | inrange(``ffind'',4600,4699) | inrange(``ffind'',4700,4700) | inrange(``ffind'',4710,4712) | inrange(``ffind'',4720,4729) | inrange(``ffind'',4730,4739) | inrange(``ffind'',4740,4749) | inrange(``ffind'',4780,4780) | inrange(``ffind'',4782,4782) | inrange(``ffind'',4783,4783) | inrange(``ffind'',4784,4784) | inrange(``ffind'',4785,4785) | inrange(``ffind'',4789,4789)
 qui replace `generate'=41 if inrange(``ffind'',5000,5000) | inrange(``ffind'',5010,5015) | inrange(``ffind'',5020,5023) | inrange(``ffind'',5030,5039) | inrange(``ffind'',5040,5042) | inrange(``ffind'',5043,5043) | inrange(``ffind'',5044,5044) | inrange(``ffind'',5045,5045) | inrange(``ffind'',5046,5046) | inrange(``ffind'',5047,5047) | inrange(``ffind'',5048,5048) | inrange(``ffind'',5049,5049) | inrange(``ffind'',5050,5059) | inrange(``ffind'',5060,5060) | 
inrange(``ffind'',5063,5063) | inrange(``ffind'',5064,5064) | inrange(``ffind'',5065,5065) | inrange(``ffind'',5070,5078) | inrange(``ffind'',5080,5080) | inrange(``ffind'',5081,5081) | inrange(``ffind'',5082,5082) | inrange(``ffind'',5083,5083) | inrange(``ffind'',5084,5084) | inrange(``ffind'',5085,5085) | inrange(``ffind'',5086,5087) | inrange(``ffind'',5088,5088) | inrange(``ffind'',5090,5090) | inrange(``ffind'',5091,5092) | inrange(``ffind'',5093,5093) | inrange(``ffind'',5094,5094) | inrange(``ffind'',5099,5099) | inrange(``ffind'',5100,5100) | inrange(``ffind'',5110,5113) | inrange(``ffind'',5120,5122) | inrange(``ffind'',5130,5139) | inrange(``ffind'',5140,5149) | inrange(``ffind'',5150,5159) | inrange(``ffind'',5160,5169) | inrange(``ffind'',5170,5172) | inrange(``ffind'',5180,5182) | inrange(``ffind'',5190,5199)\n qui replace `generate'=42 if inrange(``ffind'',5200,5200) | inrange(``ffind'',5210,5219) | inrange(``ffind'',5220,5229) | inrange(``ffind'',5230,5231) | inrange(``ffind'',5250,5251) | inrange(``ffind'',5260,5261) | inrange(``ffind'',5270,5271) | inrange(``ffind'',5300,5300) | inrange(``ffind'',5310,5311) | inrange(``ffind'',5320,5320) | inrange(``ffind'',5330,5331) | inrange(``ffind'',5334,5334) | inrange(``ffind'',5340,5349) | inrange(``ffind'',5390,5399) | inrange(``ffind'',5400,5400) | inrange(``ffind'',5410,5411) | inrange(``ffind'',5412,5412) | inrange(``ffind'',5420,5429) | inrange(``ffind'',5430,5439) | inrange(``ffind'',5440,5449) | inrange(``ffind'',5450,5459) | inrange(``ffind'',5460,5469) | inrange(``ffind'',5490,5499) | inrange(``ffind'',5500,5500) | inrange(``ffind'',5510,5529) | inrange(``ffind'',5530,5539) | inrange(``ffind'',5540,5549) | inrange(``ffind'',5550,5559) | inrange(``ffind'',5560,5569) | inrange(``ffind'',5570,5579) | inrange(``ffind'',5590,5599) | inrange(``ffind'',5600,5699) | inrange(``ffind'',5700,5700) | inrange(``ffind'',5710,5719) | inrange(``ffind'',5720,5722) | inrange(``ffind'',5730,5733) | 
inrange(``ffind'',5734,5734) | inrange(``ffind'',5735,5735) | inrange(``ffind'',5736,5736) | inrange(``ffind'',5750,5799) | inrange(``ffind'',5900,5900) | inrange(``ffind'',5910,5912) | inrange(``ffind'',5920,5929) | inrange(``ffind'',5930,5932) | inrange(``ffind'',5940,5940) | inrange(``ffind'',5941,5941) | inrange(``ffind'',5942,5942) | inrange(``ffind'',5943,5943) | inrange(``ffind'',5944,5944) | inrange(``ffind'',5945,5945) | inrange(``ffind'',5946,5946) | inrange(``ffind'',5947,5947) | inrange(``ffind'',5948,5948) | inrange(``ffind'',5949,5949) | inrange(``ffind'',5950,5959) | inrange(``ffind'',5960,5969) | inrange(``ffind'',5970,5979) | inrange(``ffind'',5980,5989) | inrange(``ffind'',5990,5990) | inrange(``ffind'',5992,5992) | inrange(``ffind'',5993,5993) | inrange(``ffind'',5994,5994) | inrange(``ffind'',5995,5995) | inrange(``ffind'',5999,5999)\n qui replace `generate'=43 if inrange(``ffind'',5800,5819) | inrange(``ffind'',5820,5829) | inrange(``ffind'',5890,5899) | inrange(``ffind'',7000,7000) | inrange(``ffind'',7010,7019) | inrange(``ffind'',7040,7049) | inrange(``ffind'',7213,7213)\n qui replace `generate'=44 if inrange(``ffind'',6000,6000) | inrange(``ffind'',6010,6019) | inrange(``ffind'',6020,6020) | inrange(``ffind'',6021,6021) | inrange(``ffind'',6022,6022) | inrange(``ffind'',6023,6024) | inrange(``ffind'',6025,6025) | inrange(``ffind'',6026,6026) | inrange(``ffind'',6027,6027) | inrange(``ffind'',6028,6029) | inrange(``ffind'',6030,6036) | inrange(``ffind'',6040,6059) | inrange(``ffind'',6060,6062) | inrange(``ffind'',6080,6082) | inrange(``ffind'',6090,6099) | inrange(``ffind'',6100,6100) | inrange(``ffind'',6110,6111) | inrange(``ffind'',6112,6113) | inrange(``ffind'',6120,6129) | inrange(``ffind'',6130,6139) | inrange(``ffind'',6140,6149) | inrange(``ffind'',6150,6159) | inrange(``ffind'',6160,6169) | inrange(``ffind'',6170,6179) | inrange(``ffind'',6190,6199)\n qui replace `generate'=45 if inrange(``ffind'',6300,6300) | 
inrange(``ffind'',6310,6319) | inrange(``ffind'',6320,6329) | inrange(``ffind'',6330,6331) | inrange(``ffind'',6350,6351) | inrange(``ffind'',6360,6361) | inrange(``ffind'',6370,6379) | inrange(``ffind'',6390,6399) | inrange(``ffind'',6400,6411)\n qui replace `generate'=46 if inrange(``ffind'',6500,6500) | inrange(``ffind'',6510,6510) | inrange(``ffind'',6512,6512) | inrange(``ffind'',6513,6513) | inrange(``ffind'',6514,6514) | inrange(``ffind'',6515,6515) | inrange(``ffind'',6517,6519) | inrange(``ffind'',6520,6529) | inrange(``ffind'',6530,6531) | inrange(``ffind'',6532,6532) | inrange(``ffind'',6540,6541) | inrange(``ffind'',6550,6553) | inrange(``ffind'',6590,6599) | inrange(``ffind'',6610,6611)\n qui replace `generate'=47 if inrange(``ffind'',6200,6299) | inrange(``ffind'',6700,6700) | inrange(``ffind'',6710,6719) | inrange(``ffind'',6720,6722) | inrange(``ffind'',6723,6723) | inrange(``ffind'',6724,6724) | inrange(``ffind'',6725,6725) | inrange(``ffind'',6726,6726) | inrange(``ffind'',6730,6733) | inrange(``ffind'',6740,6779) | inrange(``ffind'',6790,6791) | inrange(``ffind'',6792,6792) | inrange(``ffind'',6793,6793) | inrange(``ffind'',6794,6794) | inrange(``ffind'',6795,6795) | inrange(``ffind'',6798,6798) | inrange(``ffind'',6799,6799)\n qui replace `generate'=48 if missing(`generate') & ~missing(``ffind'')\n\n }\n else if ``ftyp''==49 {\n label define `generate' 1 \"Agriculture\" 2 \"Food Products\" 3 \"Candy & Soda\" 4 \"Beer & Liquor\" 5 \"Tobacco Products\" 6 \"Recreation\" 7 \"Entertainment\" 8 \"Printing and Publishing\" 9 \"Consumer Goods\" 10 \"Apparel\" 11 \"Healthcare\" 12 \"Medical Equipment\" 13 \"Pharmaceutical Products\" 14 \"Chemicals\" 15 \"Rubber and Plastic Products\" 16 \"Textiles\" 17 \"Construction Materials\" 18 \"Construction\" 19 \"Steel Works Etc\" 20 \"Fabricated Products\" 21 \"Machinery\" 22 \"Electrical Equipment\" 23 \"Automobiles and Trucks\" 24 \"Aircraft\" 25 \"Shipbuilding, Railroad Equipment\" 26 \"Defense\" 27 \"Precious 
Metals\" 28 \"Non-Metallic and Industrial Metal Mining\" 29 \"Coal\" 30 \"Petroleum and Natural Gas\" 31 \"Utilities\" 32 \"Communication\" 33 \"Personal Services\" 34 \"Business Services\" 35 \"Computer Hardware\" 36 \"Computer Software\" 37 \"Electronic Equipment\" 38 \"Measuring and Control Equipment\" 39 \"Business Supplies\" 40 \"Shipping Containers\" 41 \"Transportation\" 42 \"Wholesale\" 43 \"Retail\" 44 \"Restaraunts, Hotels, Motels\" 45 \"Banking\" 46 \"Insurance\" 47 \"Real Estate\" 48 \"Trading\" 49 \"Almost Nothing\"\n label values `generate' `generate'\n qui replace `generate'=1 if inrange(``ffind'',100,199) | inrange(``ffind'',200,299) | inrange(``ffind'',700,799) | inrange(``ffind'',910,919) | inrange(``ffind'',2048,2048)\n qui replace `generate'=2 if inrange(``ffind'',2000,2009) | inrange(``ffind'',2010,2019) | inrange(``ffind'',2020,2029) | inrange(``ffind'',2030,2039) | inrange(``ffind'',2040,2046) | inrange(``ffind'',2050,2059) | inrange(``ffind'',2060,2063) | inrange(``ffind'',2070,2079) | inrange(``ffind'',2090,2092) | inrange(``ffind'',2095,2095) | inrange(``ffind'',2098,2099)\n qui replace `generate'=3 if inrange(``ffind'',2064,2068) | inrange(``ffind'',2086,2086) | inrange(``ffind'',2087,2087) | inrange(``ffind'',2096,2096) | inrange(``ffind'',2097,2097)\n qui replace `generate'=4 if inrange(``ffind'',2080,2080) | inrange(``ffind'',2082,2082) | inrange(``ffind'',2083,2083) | inrange(``ffind'',2084,2084) | inrange(``ffind'',2085,2085)\n qui replace `generate'=5 if inrange(``ffind'',2100,2199)\n qui replace `generate'=6 if inrange(``ffind'',920,999) | inrange(``ffind'',3650,3651) | inrange(``ffind'',3652,3652) | inrange(``ffind'',3732,3732) | inrange(``ffind'',3930,3931) | inrange(``ffind'',3940,3949)\n qui replace `generate'=7 if inrange(``ffind'',7800,7829) | inrange(``ffind'',7830,7833) | inrange(``ffind'',7840,7841) | inrange(``ffind'',7900,7900) | inrange(``ffind'',7910,7911) | inrange(``ffind'',7920,7929) | inrange(``ffind'',7930,7933) | 
inrange(``ffind'',7940,7949) | inrange(``ffind'',7980,7980) | inrange(``ffind'',7990,7999)\n qui replace `generate'=8 if inrange(``ffind'',2700,2709) | inrange(``ffind'',2710,2719) | inrange(``ffind'',2720,2729) | inrange(``ffind'',2730,2739) | inrange(``ffind'',2740,2749) | inrange(``ffind'',2770,2771) | inrange(``ffind'',2780,2789) | inrange(``ffind'',2790,2799)\n qui replace `generate'=9 if inrange(``ffind'',2047,2047) | inrange(``ffind'',2391,2392) | inrange(``ffind'',2510,2519) | inrange(``ffind'',2590,2599) | inrange(``ffind'',2840,2843) | inrange(``ffind'',2844,2844) | inrange(``ffind'',3160,3161) | inrange(``ffind'',3170,3171) | inrange(``ffind'',3172,3172) | inrange(``ffind'',3190,3199) | inrange(``ffind'',3229,3229) | inrange(``ffind'',3260,3260) | inrange(``ffind'',3262,3263) | inrange(``ffind'',3269,3269) | inrange(``ffind'',3230,3231) | inrange(``ffind'',3630,3639) | inrange(``ffind'',3750,3751) | inrange(``ffind'',3800,3800) | inrange(``ffind'',3860,3861) | inrange(``ffind'',3870,3873) | inrange(``ffind'',3910,3911) | inrange(``ffind'',3914,3914) | inrange(``ffind'',3915,3915) | inrange(``ffind'',3960,3962) | inrange(``ffind'',3991,3991) | inrange(``ffind'',3995,3995)\n qui replace `generate'=10 if inrange(``ffind'',2300,2390) | inrange(``ffind'',3020,3021) | inrange(``ffind'',3100,3111) | inrange(``ffind'',3130,3131) | inrange(``ffind'',3140,3149) | inrange(``ffind'',3150,3151) | inrange(``ffind'',3963,3965)\n qui replace `generate'=11 if inrange(``ffind'',8000,8099)\n qui replace `generate'=12 if inrange(``ffind'',3693,3693) | inrange(``ffind'',3840,3849) | inrange(``ffind'',3850,3851)\n qui replace `generate'=13 if inrange(``ffind'',2830,2830) | inrange(``ffind'',2831,2831) | inrange(``ffind'',2833,2833) | inrange(``ffind'',2834,2834) | inrange(``ffind'',2835,2835) | inrange(``ffind'',2836,2836)\n qui replace `generate'=14 if inrange(``ffind'',2800,2809) | inrange(``ffind'',2810,2819) | inrange(``ffind'',2820,2829) | inrange(``ffind'',2850,2859) | 
inrange(``ffind'',2860,2869) | inrange(``ffind'',2870,2879) | inrange(``ffind'',2890,2899)\n qui replace `generate'=15 if inrange(``ffind'',3031,3031) | inrange(``ffind'',3041,3041) | inrange(``ffind'',3050,3053) | inrange(``ffind'',3060,3069) | inrange(``ffind'',3070,3079) | inrange(``ffind'',3080,3089) | inrange(``ffind'',3090,3099)\n qui replace `generate'=16 if inrange(``ffind'',2200,2269) | inrange(``ffind'',2270,2279) | inrange(``ffind'',2280,2284) | inrange(``ffind'',2290,2295) | inrange(``ffind'',2297,2297) | inrange(``ffind'',2298,2298) | inrange(``ffind'',2299,2299) | inrange(``ffind'',2393,2395) | inrange(``ffind'',2397,2399)\n qui replace `generate'=17 if inrange(``ffind'',800,899) | inrange(``ffind'',2400,2439) | inrange(``ffind'',2450,2459) | inrange(``ffind'',2490,2499) | inrange(``ffind'',2660,2661) | inrange(``ffind'',2950,2952) | inrange(``ffind'',3200,3200) | inrange(``ffind'',3210,3211) | inrange(``ffind'',3240,3241) | inrange(``ffind'',3250,3259) | inrange(``ffind'',3261,3261) | inrange(``ffind'',3264,3264) | inrange(``ffind'',3270,3275) | inrange(``ffind'',3280,3281) | inrange(``ffind'',3290,3293) | inrange(``ffind'',3295,3299) | inrange(``ffind'',3420,3429) | inrange(``ffind'',3430,3433) | inrange(``ffind'',3440,3441) | inrange(``ffind'',3442,3442) | inrange(``ffind'',3446,3446) | inrange(``ffind'',3448,3448) | inrange(``ffind'',3449,3449) | inrange(``ffind'',3450,3451) | inrange(``ffind'',3452,3452) | inrange(``ffind'',3490,3499) | inrange(``ffind'',3996,3996)\n qui replace `generate'=18 if inrange(``ffind'',1500,1511) | inrange(``ffind'',1520,1529) | inrange(``ffind'',1530,1539) | inrange(``ffind'',1540,1549) | inrange(``ffind'',1600,1699) | inrange(``ffind'',1700,1799)\n qui replace `generate'=19 if inrange(``ffind'',3300,3300) | inrange(``ffind'',3310,3317) | inrange(``ffind'',3320,3325) | inrange(``ffind'',3330,3339) | inrange(``ffind'',3340,3341) | inrange(``ffind'',3350,3357) | inrange(``ffind'',3360,3369) | 
inrange(``ffind'',3370,3379) | inrange(``ffind'',3390,3399)\n qui replace `generate'=20 if inrange(``ffind'',3400,3400) | inrange(``ffind'',3443,3443) | inrange(``ffind'',3444,3444) | inrange(``ffind'',3460,3469) | inrange(``ffind'',3470,3479)\n qui replace `generate'=21 if inrange(``ffind'',3510,3519) | inrange(``ffind'',3520,3529) | inrange(``ffind'',3530,3530) | inrange(``ffind'',3531,3531) | inrange(``ffind'',3532,3532) | inrange(``ffind'',3533,3533) | inrange(``ffind'',3534,3534) | inrange(``ffind'',3535,3535) | inrange(``ffind'',3536,3536) | inrange(``ffind'',3538,3538) | inrange(``ffind'',3540,3549) | inrange(``ffind'',3550,3559) | inrange(``ffind'',3560,3569) | inrange(``ffind'',3580,3580) | inrange(``ffind'',3581,3581) | inrange(``ffind'',3582,3582) | inrange(``ffind'',3585,3585) | inrange(``ffind'',3586,3586) | inrange(``ffind'',3589,3589) | inrange(``ffind'',3590,3599)\n qui replace `generate'=22 if inrange(``ffind'',3600,3600) | inrange(``ffind'',3610,3613) | inrange(``ffind'',3620,3621) | inrange(``ffind'',3623,3629) | inrange(``ffind'',3640,3644) | inrange(``ffind'',3645,3645) | inrange(``ffind'',3646,3646) | inrange(``ffind'',3648,3649) | inrange(``ffind'',3660,3660) | inrange(``ffind'',3690,3690) | inrange(``ffind'',3691,3692) | inrange(``ffind'',3699,3699)\n qui replace `generate'=23 if inrange(``ffind'',2296,2296) | inrange(``ffind'',2396,2396) | inrange(``ffind'',3010,3011) | inrange(``ffind'',3537,3537) | inrange(``ffind'',3647,3647) | inrange(``ffind'',3694,3694) | inrange(``ffind'',3700,3700) | inrange(``ffind'',3710,3710) | inrange(``ffind'',3711,3711) | inrange(``ffind'',3713,3713) | inrange(``ffind'',3714,3714) | inrange(``ffind'',3715,3715) | inrange(``ffind'',3716,3716) | inrange(``ffind'',3792,3792) | inrange(``ffind'',3790,3791) | inrange(``ffind'',3799,3799)\n qui replace `generate'=24 if inrange(``ffind'',3720,3720) | inrange(``ffind'',3721,3721) | inrange(``ffind'',3723,3724) | inrange(``ffind'',3725,3725) | 
inrange(``ffind'',3728,3729)\n qui replace `generate'=25 if inrange(``ffind'',3730,3731) | inrange(``ffind'',3740,3743)\n qui replace `generate'=26 if inrange(``ffind'',3760,3769) | inrange(``ffind'',3795,3795) | inrange(``ffind'',3480,3489)\n qui replace `generate'=27 if inrange(``ffind'',1040,1049)\n qui replace `generate'=28 if inrange(``ffind'',1000,1009) | inrange(``ffind'',1010,1019) | inrange(``ffind'',1020,1029) | inrange(``ffind'',1030,1039) | inrange(``ffind'',1050,1059) | inrange(``ffind'',1060,1069) | inrange(``ffind'',1070,1079) | inrange(``ffind'',1080,1089) | inrange(``ffind'',1090,1099) | inrange(``ffind'',1100,1119) | inrange(``ffind'',1400,1499)\n qui replace `generate'=29 if inrange(``ffind'',1200,1299)\n qui replace `generate'=30 if inrange(``ffind'',1300,1300) | inrange(``ffind'',1310,1319) | inrange(``ffind'',1320,1329) | inrange(``ffind'',1330,1339) | inrange(``ffind'',1370,1379) | inrange(``ffind'',1380,1380) | inrange(``ffind'',1381,1381) | inrange(``ffind'',1382,1382) | inrange(``ffind'',1389,1389) | inrange(``ffind'',2900,2912) | inrange(``ffind'',2990,2999)\n qui replace `generate'=31 if inrange(``ffind'',4900,4900) | inrange(``ffind'',4910,4911) | inrange(``ffind'',4920,4922) | inrange(``ffind'',4923,4923) | inrange(``ffind'',4924,4925) | inrange(``ffind'',4930,4931) | inrange(``ffind'',4932,4932) | inrange(``ffind'',4939,4939) | inrange(``ffind'',4940,4942)\n qui replace `generate'=32 if inrange(``ffind'',4800,4800) | inrange(``ffind'',4810,4813) | inrange(``ffind'',4820,4822) | inrange(``ffind'',4830,4839) | inrange(``ffind'',4840,4841) | inrange(``ffind'',4880,4889) | inrange(``ffind'',4890,4890) | inrange(``ffind'',4891,4891) | inrange(``ffind'',4892,4892) | inrange(``ffind'',4899,4899)\n qui replace `generate'=33 if inrange(``ffind'',7020,7021) | inrange(``ffind'',7030,7033) | inrange(``ffind'',7200,7200) | inrange(``ffind'',7210,7212) | inrange(``ffind'',7214,7214) | inrange(``ffind'',7215,7216) | inrange(``ffind'',7217,7217) | 
inrange(``ffind'',7219,7219) | inrange(``ffind'',7220,7221) | inrange(``ffind'',7230,7231) | inrange(``ffind'',7240,7241) | inrange(``ffind'',7250,7251) | inrange(``ffind'',7260,7269) | inrange(``ffind'',7270,7290) | inrange(``ffind'',7291,7291) | inrange(``ffind'',7292,7299) | inrange(``ffind'',7395,7395) | inrange(``ffind'',7500,7500) | inrange(``ffind'',7520,7529) | inrange(``ffind'',7530,7539) | inrange(``ffind'',7540,7549) | inrange(``ffind'',7600,7600) | inrange(``ffind'',7620,7620) | inrange(``ffind'',7622,7622) | inrange(``ffind'',7623,7623) | inrange(``ffind'',7629,7629) | inrange(``ffind'',7630,7631) | inrange(``ffind'',7640,7641) | inrange(``ffind'',7690,7699) | inrange(``ffind'',8100,8199) | inrange(``ffind'',8200,8299) | inrange(``ffind'',8300,8399) | inrange(``ffind'',8400,8499) | inrange(``ffind'',8600,8699) | inrange(``ffind'',8800,8899) | inrange(``ffind'',7510,7515)\n qui replace `generate'=34 if inrange(``ffind'',2750,2759) | inrange(``ffind'',3993,3993) | inrange(``ffind'',7218,7218) | inrange(``ffind'',7300,7300) | inrange(``ffind'',7310,7319) | inrange(``ffind'',7320,7329) | inrange(``ffind'',7330,7339) | inrange(``ffind'',7340,7342) | inrange(``ffind'',7349,7349) | inrange(``ffind'',7350,7351) | inrange(``ffind'',7352,7352) | inrange(``ffind'',7353,7353) | inrange(``ffind'',7359,7359) | inrange(``ffind'',7360,7369) | inrange(``ffind'',7374,7374) | inrange(``ffind'',7376,7376) | inrange(``ffind'',7377,7377) | inrange(``ffind'',7378,7378) | inrange(``ffind'',7379,7379) | inrange(``ffind'',7380,7380) | inrange(``ffind'',7381,7382) | inrange(``ffind'',7383,7383) | inrange(``ffind'',7384,7384) | inrange(``ffind'',7385,7385) | inrange(``ffind'',7389,7390) | inrange(``ffind'',7391,7391) | inrange(``ffind'',7392,7392) | inrange(``ffind'',7393,7393) | inrange(``ffind'',7394,7394) | inrange(``ffind'',7396,7396) | inrange(``ffind'',7397,7397) | inrange(``ffind'',7399,7399) | inrange(``ffind'',7519,7519) | inrange(``ffind'',8700,8700) | 
inrange(``ffind'',8710,8713) | inrange(``ffind'',8720,8721) | inrange(``ffind'',8730,8734) | inrange(``ffind'',8740,8748) | inrange(``ffind'',8900,8910) | inrange(``ffind'',8911,8911) | inrange(``ffind'',8920,8999) | inrange(``ffind'',4220,4229)\n qui replace `generate'=35 if inrange(``ffind'',3570,3579) | inrange(``ffind'',3680,3680) | inrange(``ffind'',3681,3681) | inrange(``ffind'',3682,3682) | inrange(``ffind'',3683,3683) | inrange(``ffind'',3684,3684) | inrange(``ffind'',3685,3685) | inrange(``ffind'',3686,3686) | inrange(``ffind'',3687,3687) | inrange(``ffind'',3688,3688) | inrange(``ffind'',3689,3689) | inrange(``ffind'',3695,3695)\n qui replace `generate'=36 if inrange(``ffind'',7370,7372) | inrange(``ffind'',7375,7375) | inrange(``ffind'',7373,7373)\n qui replace `generate'=37 if inrange(``ffind'',3622,3622) | inrange(``ffind'',3661,3661) | inrange(``ffind'',3662,3662) | inrange(``ffind'',3663,3663) | inrange(``ffind'',3664,3664) | inrange(``ffind'',3665,3665) | inrange(``ffind'',3666,3666) | inrange(``ffind'',3669,3669) | inrange(``ffind'',3670,3679) | inrange(``ffind'',3810,3810) | inrange(``ffind'',3812,3812)\n qui replace `generate'=38 if inrange(``ffind'',3811,3811) | inrange(``ffind'',3820,3820) | inrange(``ffind'',3821,3821) | inrange(``ffind'',3822,3822) | inrange(``ffind'',3823,3823) | inrange(``ffind'',3824,3824) | inrange(``ffind'',3825,3825) | inrange(``ffind'',3826,3826) | inrange(``ffind'',3827,3827) | inrange(``ffind'',3829,3829) | inrange(``ffind'',3830,3839)\n qui replace `generate'=39 if inrange(``ffind'',2520,2549) | inrange(``ffind'',2600,2639) | inrange(``ffind'',2670,2699) | inrange(``ffind'',2760,2761) | inrange(``ffind'',3950,3955)\n qui replace `generate'=40 if inrange(``ffind'',2440,2449) | inrange(``ffind'',2640,2659) | inrange(``ffind'',3220,3221) | inrange(``ffind'',3410,3412)\n qui replace `generate'=41 if inrange(``ffind'',4000,4013) | inrange(``ffind'',4040,4049) | inrange(``ffind'',4100,4100) | inrange(``ffind'',4110,4119) 
| inrange(``ffind'',4120,4121) | inrange(``ffind'',4130,4131) | inrange(``ffind'',4140,4142) | inrange(``ffind'',4150,4151) | inrange(``ffind'',4170,4173) | inrange(``ffind'',4190,4199) | inrange(``ffind'',4200,4200) | inrange(``ffind'',4210,4219) | inrange(``ffind'',4230,4231) | inrange(``ffind'',4240,4249) | inrange(``ffind'',4400,4499) | inrange(``ffind'',4500,4599) | inrange(``ffind'',4600,4699) | inrange(``ffind'',4700,4700) | inrange(``ffind'',4710,4712) | inrange(``ffind'',4720,4729) | inrange(``ffind'',4730,4739) | inrange(``ffind'',4740,4749) | inrange(``ffind'',4780,4780) | inrange(``ffind'',4782,4782) | inrange(``ffind'',4783,4783) | inrange(``ffind'',4784,4784) | inrange(``ffind'',4785,4785) | inrange(``ffind'',4789,4789)\n qui replace `generate'=42 if inrange(``ffind'',5000,5000) | inrange(``ffind'',5010,5015) | inrange(``ffind'',5020,5023) | inrange(``ffind'',5030,5039) | inrange(``ffind'',5040,5042) | inrange(``ffind'',5043,5043) | inrange(``ffind'',5044,5044) | inrange(``ffind'',5045,5045) | inrange(``ffind'',5046,5046) | inrange(``ffind'',5047,5047) | inrange(``ffind'',5048,5048) | inrange(``ffind'',5049,5049) | inrange(``ffind'',5050,5059) | inrange(``ffind'',5060,5060) | inrange(``ffind'',5063,5063) | inrange(``ffind'',5064,5064) | inrange(``ffind'',5065,5065) | inrange(``ffind'',5070,5078) | inrange(``ffind'',5080,5080) | inrange(``ffind'',5081,5081) | inrange(``ffind'',5082,5082) | inrange(``ffind'',5083,5083) | inrange(``ffind'',5084,5084) | inrange(``ffind'',5085,5085) | inrange(``ffind'',5086,5087) | inrange(``ffind'',5088,5088) | inrange(``ffind'',5090,5090) | inrange(``ffind'',5091,5092) | inrange(``ffind'',5093,5093) | inrange(``ffind'',5094,5094) | inrange(``ffind'',5099,5099) | inrange(``ffind'',5100,5100) | inrange(``ffind'',5110,5113) | inrange(``ffind'',5120,5122) | inrange(``ffind'',5130,5139) | inrange(``ffind'',5140,5149) | inrange(``ffind'',5150,5159) | inrange(``ffind'',5160,5169) | inrange(``ffind'',5170,5172) | 
inrange(``ffind'',5180,5182) | inrange(``ffind'',5190,5199)\n qui replace `generate'=43 if inrange(``ffind'',5200,5200) | inrange(``ffind'',5210,5219) | inrange(``ffind'',5220,5229) | inrange(``ffind'',5230,5231) | inrange(``ffind'',5250,5251) | inrange(``ffind'',5260,5261) | inrange(``ffind'',5270,5271) | inrange(``ffind'',5300,5300) | inrange(``ffind'',5310,5311) | inrange(``ffind'',5320,5320) | inrange(``ffind'',5330,5331) | inrange(``ffind'',5334,5334) | inrange(``ffind'',5340,5349) | inrange(``ffind'',5390,5399) | inrange(``ffind'',5400,5400) | inrange(``ffind'',5410,5411) | inrange(``ffind'',5412,5412) | inrange(``ffind'',5420,5429) | inrange(``ffind'',5430,5439) | inrange(``ffind'',5440,5449) | inrange(``ffind'',5450,5459) | inrange(``ffind'',5460,5469) | inrange(``ffind'',5490,5499) | inrange(``ffind'',5500,5500) | inrange(``ffind'',5510,5529) | inrange(``ffind'',5530,5539) | inrange(``ffind'',5540,5549) | inrange(``ffind'',5550,5559) | inrange(``ffind'',5560,5569) | inrange(``ffind'',5570,5579) | inrange(``ffind'',5590,5599) | inrange(``ffind'',5600,5699) | inrange(``ffind'',5700,5700) | inrange(``ffind'',5710,5719) | inrange(``ffind'',5720,5722) | inrange(``ffind'',5730,5733) | inrange(``ffind'',5734,5734) | inrange(``ffind'',5735,5735) | inrange(``ffind'',5736,5736) | inrange(``ffind'',5750,5799) | inrange(``ffind'',5900,5900) | inrange(``ffind'',5910,5912) | inrange(``ffind'',5920,5929) | inrange(``ffind'',5930,5932) | inrange(``ffind'',5940,5940) | inrange(``ffind'',5941,5941) | inrange(``ffind'',5942,5942) | inrange(``ffind'',5943,5943) | inrange(``ffind'',5944,5944) | inrange(``ffind'',5945,5945) | inrange(``ffind'',5946,5946) | inrange(``ffind'',5947,5947) | inrange(``ffind'',5948,5948) | inrange(``ffind'',5949,5949) | inrange(``ffind'',5950,5959) | inrange(``ffind'',5960,5969) | inrange(``ffind'',5970,5979) | inrange(``ffind'',5980,5989) | inrange(``ffind'',5990,5990) | inrange(``ffind'',5992,5992) | inrange(``ffind'',5993,5993) | 
inrange(``ffind'',5994,5994) | inrange(``ffind'',5995,5995) | inrange(``ffind'',5999,5999)\n qui replace `generate'=44 if inrange(``ffind'',5800,5819) | inrange(``ffind'',5820,5829) | inrange(``ffind'',5890,5899) | inrange(``ffind'',7000,7000) | inrange(``ffind'',7010,7019) | inrange(``ffind'',7040,7049) | inrange(``ffind'',7213,7213)\n qui replace `generate'=45 if inrange(``ffind'',6000,6000) | inrange(``ffind'',6010,6019) | inrange(``ffind'',6020,6020) | inrange(``ffind'',6021,6021) | inrange(``ffind'',6022,6022) | inrange(``ffind'',6023,6024) | inrange(``ffind'',6025,6025) | inrange(``ffind'',6026,6026) | inrange(``ffind'',6027,6027) | inrange(``ffind'',6028,6029) | inrange(``ffind'',6030,6036) | inrange(``ffind'',6040,6059) | inrange(``ffind'',6060,6062) | inrange(``ffind'',6080,6082) | inrange(``ffind'',6090,6099) | inrange(``ffind'',6100,6100) | inrange(``ffind'',6110,6111) | inrange(``ffind'',6112,6113) | inrange(``ffind'',6120,6129) | inrange(``ffind'',6130,6139) | inrange(``ffind'',6140,6149) | inrange(``ffind'',6150,6159) | inrange(``ffind'',6160,6169) | inrange(``ffind'',6170,6179) | inrange(``ffind'',6190,6199)\n qui replace `generate'=46 if inrange(``ffind'',6300,6300) | inrange(``ffind'',6310,6319) | inrange(``ffind'',6320,6329) | inrange(``ffind'',6330,6331) | inrange(``ffind'',6350,6351) | inrange(``ffind'',6360,6361) | inrange(``ffind'',6370,6379) | inrange(``ffind'',6390,6399) | inrange(``ffind'',6400,6411)\n qui replace `generate'=47 if inrange(``ffind'',6500,6500) | inrange(``ffind'',6510,6510) | inrange(``ffind'',6512,6512) | inrange(``ffind'',6513,6513) | inrange(``ffind'',6514,6514) | inrange(``ffind'',6515,6515) | inrange(``ffind'',6517,6519) | inrange(``ffind'',6520,6529) | inrange(``ffind'',6530,6531) | inrange(``ffind'',6532,6532) | inrange(``ffind'',6540,6541) | inrange(``ffind'',6550,6553) | inrange(``ffind'',6590,6599) | inrange(``ffind'',6610,6611)\n qui replace `generate'=48 if inrange(``ffind'',6200,6299) | 
inrange(``ffind'',6700,6700) | inrange(``ffind'',6710,6719) | inrange(``ffind'',6720,6722) | inrange(``ffind'',6723,6723) | inrange(``ffind'',6724,6724) | inrange(``ffind'',6725,6725) | inrange(``ffind'',6726,6726) | inrange(``ffind'',6730,6733) | inrange(``ffind'',6740,6779) | inrange(``ffind'',6790,6791) | inrange(``ffind'',6792,6792) | inrange(``ffind'',6793,6793) | inrange(``ffind'',6794,6794) | inrange(``ffind'',6795,6795) | inrange(``ffind'',6798,6798) | inrange(``ffind'',6799,6799)\n qui replace `generate'=49 if missing(`generate') & ~missing(``ffind'')\n\n }\n else {\n di as error \"Type must be 5, 10, 12, 17, 30, 38, 48 or 49\"\n exit 111\n }\n\nend\n
","tags":["Stata","Code"]},{"location":"posts/get-bank-holding-company-financials/","title":"Bank Holding Company Financials from FR Y-9C","text":"A SAS macro used to extract BHC data.
","tags":["SAS","Code"]},{"location":"posts/get-bank-holding-company-financials/#extract-bhc-balance-sheet-data","title":"Extract BHC balance sheet data","text":"This is the SAS macro I wrote to consolidate and extract BHC balance sheet data from the WRDS Bank Regulatory database. It creates a bhcf
dataset in the work directory.
%macro bhc_financials(loopdatestart,loopdateend);\n /* Specify the variables to extract */\n%let vars=rssd9999 rssd9001 rssd9007 rssd9008 bhck2170 bhck3210;\n %let loopdatestart=%sysfunc(inputn(&loopdatestart,anydtdte9.));\n %let loopdateend=%sysfunc(inputn(&loopdateend,anydtdte9.));\n %let dif=%sysfunc(intck(month,&loopdatestart,&loopdateend));\n %let dats=;\n %do i=0 %to &dif;\n %let date=%sysfunc(intnx(month,&loopdatestart,&i,e));\n %let month=%sysfunc(month(&date),z2.);\n %let year=%sysfunc(year(&date));\n %if &month=3 or &month=6 or &month=9 or &month=12 %then %do;\n %let dats=&dats bank.bhcf&year&month;\n %end;\n %end;\n %put &dats;\n data bhcf(keep=&vars); set &dats; \n rssd9999 = input(put(rssd9999, 8.), yymmdd10.);/* reporting date */\n rssd9007 = input(put(rssd9007, 8.), yymmdd10.);/* date start */\n rssd9008 = input(put(rssd9008, 8.), yymmdd10.);/* date end */\nformat rssd9999 date9.;\n format rssd9007 date9.;\n format rssd9008 date9.;\n where rssd9999 between rssd9007 and rssd9008;\n run;\n%mend bhc_financials;\n\n%bhc_financials(01jan1990,01dec2000);\n
Warning
RSSD dates are not always available, in which case lines 18-24 should be removed.
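The macro's date loop simply collects the quarter-end bhcf&year&month datasets between the two dates. The same selection logic can be sketched in Python for illustration (the bank.bhcfYYYYMM naming follows the macro above; this is a sketch, not a replacement for the SAS code):

```python
from datetime import date

def bhcf_datasets(start, end):
    """List bank.bhcfYYYYMM dataset names for quarter-ends in [start, end]."""
    names = []
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        if m in (3, 6, 9, 12):  # FR Y-9C is filed at quarter-ends
            names.append(f"bank.bhcf{y}{m:02d}")
        m += 1
        if m > 12:
            y, m = y + 1, 1
    return names

print(bhcf_datasets(date(1990, 1, 1), date(1990, 12, 31)))
# ['bank.bhcf199003', 'bank.bhcf199006', 'bank.bhcf199009', 'bank.bhcf199012']
```

This mirrors the `%do` loop with `intnx`/`intck` in the macro: only March, June, September, and December datasets are stacked.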
","tags":["SAS","Code"]},{"location":"posts/get-bank-holding-company-financials/#merge-with-compustatcrsp","title":"Merge with Compustat/CRSP","text":"The firm identifier in the Y-9C data is RSSD9001
. To merge the BHC's balance sheet data with Compustat/CRSP, I use the PERMCO-RSSD
link table by the Federal Reserve Bank of New York.1 I saved the most recent copy on my server, and formatted it so that it can be used directly. It is available at https://mingze-gao.com/data/download/crsp_20181231.csv.
%let beg_yr = 1986;\n%let end_yr = 2018;\nproc sql;\ncreate table lnk as\nselect *\nfrom crsp.ccmxpf_lnkhist\nwhere\n linktype in (\"LU\", \"LC\") and\n (&end_yr+1 >= year(linkdt) or linkdt = .B) and \n (&beg_yr-1 <= year(linkenddt) or linkenddt = .E)\norder by \n gvkey, linkdt;\nquit;\n/* PERMCO-RSSD link table by New York FED */\nfilename csv url \"https://mingze-gao.com/data/download/crsp_20181231.csv\";\nproc import datafile=csv out=work.crsp_20181231 dbms=csv replace; run;\nproc sql;\ncreate table gvkey_permno_permco_rssd as \nselect *\nfrom lnk join crsp_20181231 as fed\non lnk.lpermco=fed.permco;\nquit;\n
Note
Please run these programs on the WRDS cloud. You'll need to modify them in order to run locally with SAS/Connect.
https://www.newyorkfed.org/research/banking_research/datasets.html\u00a0\u21a9
Many research papers on Chinese firms include a control variable indicating whether the firm is a state-owned enterprise (SOE). This is important because SOEs and non-SOEs differ in many respects and may have structural differences. This post documents how to construct this indicator variable from the CSMAR databases.
Specifically, we need the CSMAR China Listed Firms Shareholders - Controlling Shareholders dataset. On WRDS, this dataset is named hld_contrshr
, located at /wrds/csmar/sasdata/hld
.
Inside this dataset there are a few variables identifying the ultimate controlling shareholder.
s0701b
: ultimate controlling shareholder.s0702b
: nature of ultimate controlling shareholder.According to the CSMAR documentation, s0702b
can be one of the following. In particular, s0702b=1100
means the firm is an SOE.
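Once the dataset is loaded into, say, a pandas DataFrame, constructing the SOE indicator is a one-liner. A minimal sketch with made-up sample rows (the stock codes and non-SOE code here are hypothetical; only s0702b=1100 as the SOE code comes from the documentation above):

```python
import pandas as pd

# Hypothetical sample of the controlling-shareholder data
df = pd.DataFrame({
    "stkcd": ["000001", "000002", "000003"],
    "s0702b": [1100, 2100, 1100],
})

# s0702b == 1100 identifies a state-owned enterprise (SOE)
df["soe"] = (df["s0702b"] == 1100).astype(int)
print(df["soe"].tolist())  # [1, 0, 1]
```

The resulting dummy can then be merged into the firm-year panel by stock code and year.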
Princeton University Library has another guide on other ways to identify Chinese SOE.
","tags":["CSMAR"]},{"location":"posts/kyleslambda/","title":"Kyle's Lambda","text":"A measure of market impact cost from Kyle (1985), which can be interpreted as the cost of demanding a certain amount of liquidity over a given time period.
","tags":["Python","Code","Liquidity"]},{"location":"posts/kyleslambda/#definition","title":"Definition","text":"Following Hasbrouck (2009) and Goyenko, Holden, Trzcinka (2009), Kyle's Lambda for a given stock \\(i\\) and day \\(t\\), is calculated as the slope coefficient \\(\\lambda_{i,t}\\) in the regression:
\\[ ret_{i,t,n}= \\delta_{i,t} + \\lambda_{i,t} S_{i,t,n}+\\epsilon_{i,t,n} \\]where for the \\(n\\)th five-minute period on date \\(t\\) and stock \\(i\\), \\(ret_{i,t,n}\\) is the stock return and \\(S_{i,t,n}\\) is the sum of the signed square-root dollar volume, that is,
\[ S_{i,t,n}=\sum_k{\text{sign}}(dvol_{i,t,n,k}) \sqrt{dvol_{i,t,n,k}} \]","tags":["Python","Code","Liquidity"]},{"location":"posts/kyleslambda/#source-code","title":"Source Code","text":"This example Python code is not optimized for speed and serves demonstration purposes only. It may contain errors.
It returns \\(\\lambda \\times 10^6\\)
# KylesLambda.py\nimport numpy as np\nname = 'KylesLambda'\ndescription = \"\"\"\nA measure of market impact cost from Kyle (1985), \nwhich can be interpreted as the cost of demanding a certain amount of liquidity over a given time period.\nResult is Lambda*1E6.\n\"\"\"\nvars_needed = ['Price', 'Volume', 'Direction']\ndef estimate(data):\nprice = data['Price'].to_numpy()\nvolume = data['Volume'].to_numpy()\ndirection = data['Direction'].to_numpy()\nsqrt_dollar_volume = np.sqrt(np.multiply(price, volume))\n# Keep the trade sign: S sums sign(dvol)*sqrt(dvol), so no np.abs() here.\nsigned_sqrt_dollar_volume = np.multiply(direction, sqrt_dollar_volume)\n# Find the total signed sqrt dollar volume and return per 5 min.\ntimestamps = np.array(data.index, dtype='datetime64')\nlast_ts, last_price = timestamps[0], price[0]\nbracket_ssdv = 0\nbracket = last_ts + np.timedelta64(5, 'm')\nrets, ssdvs = [], []\nfor idx, ts in enumerate(timestamps):\nif ts <= bracket:\nbracket_ssdv += signed_sqrt_dollar_volume[idx]\nelse:\nret = np.log(price[idx-1]/last_price)\nif not np.isnan(ret) and not np.isnan(bracket_ssdv):\nrets.append(ret)\nssdvs.append(bracket_ssdv)\n# Reset bracket\nbracket = ts + np.timedelta64(5, 'm')\nlast_price = price[idx]\nbracket_ssdv = signed_sqrt_dollar_volume[idx]\n# Perform regression.\nx = np.vstack([np.ones(len(ssdvs)), np.array(ssdvs)]).T\ntry:\ncoef, _, _, _ = np.linalg.lstsq(x, np.array(rets), rcond=None)\nexcept np.linalg.LinAlgError:\nreturn None\nelse:\nreturn None if np.isnan(coef[1]) else coef[1]*1E6\n
","tags":["Python","Code","Liquidity"]},{"location":"posts/legao-to-make-your-own-lego-mosaics/","title":"LeGao to Make Your Own LEGO Mosaics","text":"I made an online app that turns a picture into a LEGO mosaic: mingze-gao.com/legao.
A few weeks ago, I went to the new LEGO flagship store at Bondi with my fianc\u00e9e, Sherry, and we were impressed by the LEGO Mosaics -- Sydney Harbour Bridge and Opera House in sunset, designed by Ryan McNaught (photo credit: jaysbrickblog.com).
This mosaic is made of 62,300 bricks and took 282 hours to build. Every single pixel is a 1x1 LEGO brick! We love it so much so that I'm thinking of making one myself and use it to decorate a wall in our apartment in the future.
To begin this endeavour, I'll need a handy tool to convert pictures to LEGO mosaic so that I can have a preview and the data to assemble later. It turns out that there's already an open-source library named legofy for this job. So I borrowed it and wrote a small Flask app on my server to do the magic.
I wrote the frontend using React and Ant Design, and picked up the React Hooks along the way. It was great fun. I named it using a combination of LEGO and my surname Gao, so, LeGao.
Now, LeGao is served at mingze-gao.com/legao. A preview is as below:
Users can upload an image(<5MB) and decide on which palette to use and how many 1x1 bricks the output image should have for its longest axis. This is useful when we need to make a LEGO mosaic in real world, as a 1x1 brick's dimension is about 8mm x 8mm.
The output image can be downloaded, no problem. All images will be deleted from my server after 5 minutes since upload/creation for privacy concern and the fact that my server doesn't have a big storage.
LeGao also tells you about how many bricks you'll need to assemble the mosaic, if you really want to. Then you can easily order the bricks online or visit a store to purchase them all~
","tags":["Apps"]},{"location":"posts/lomackinlay1988/","title":"Variance Ratio Test - Lo and MacKinlay (1988)","text":"A simple test for the random walk hypothesis of prices and efficient market.
","tags":["Python","Code"]},{"location":"posts/lomackinlay1988/#definition","title":"Definition","text":"Let's assume:
The variance ratio of \\(k\\)-period return is defined as:
\\[ \\begin{equation} \\textit{V}(k)=\\frac{\\textit{Var}(x_t+x_{t-1}+...+x_{t-k+1})/k}{\\textit{Var}(x_t)} \\end{equation} \\]The estimator of \\(\\textit{V}(k)\\) proposed in Lo and MacKinlay (1988) is
\\[ \\begin{equation} \\textit{VR}(k)=\\frac{\\hat\\sigma^2(k)}{\\hat\\sigma^2(1)} \\end{equation} \\]where \\(\\hat\\sigma^2(1)\\) is the unbiased estimator of the one-period return variance, using the one-period returns \\(\\{x_t\\}\\), and is defined as
\\[ \\begin{equation} \\hat\\sigma^2(1)=\\frac{1}{T-1} \\sum_{t-1}^T (x_t - \\hat\\mu)^2 \\end{equation} \\]and \\(\\hat\\sigma^2(k)\\) is the estimator of \\(k\\)-period return variance using \\(k\\)-period returns. Lo and MacKinlay (1988) defined it, due to limited sample size and the desire to improve the power of the test, as
\\[ \\begin{equation} \\hat\\sigma^2(k)=\\frac{1}{m} \\sum_{t-1}^T \\left(\\ln\\frac{P_t}{P_{t-k}} - k\\hat\\mu \\right)^2 \\end{equation} \\]where \\(m=k(T-k+1)(1-k/T)\\) is chosen such that \\(\\hat\\sigma^2(k)\\) is an unbiased estimator of the \\(k\\)-period return variance when \\(\\sigma^2_t\\) is constant over time.
","tags":["Python","Code"]},{"location":"posts/lomackinlay1988/#variance-ratio-test-statistics","title":"Variance Ratio Test Statistics","text":"Lo and MacKinlay (1988) proposed that under the null hypothesis of \\(V(k)=1\\), the test statistic is given by
\\[ \\begin{equation} Z(k)=\\frac{\\textit{VR}(k)-1}{\\sqrt{\\phi(k)}} \\end{equation} \\]which follows the standard normal distribution asymptotically.
","tags":["Python","Code"]},{"location":"posts/lomackinlay1988/#homoscedasticity","title":"Homoscedasticity","text":"Under the assumption of homoscedasticity, the asymptotic variance \\(\\phi\\) is given by
\\[ \\begin{equation} \\phi(k)=\\frac{2(2k-1)(k-1)}{3kT} \\end{equation} \\]","tags":["Python","Code"]},{"location":"posts/lomackinlay1988/#heteroscedasticity","title":"Heteroscedasticity","text":"Under the assumption of heteroscedasticity, the asymptotic variance \\(\\phi\\) is given by
\\[ \\begin{equation} \\phi(k)=\\sum_{j=1}^{k-1} \\left[\\frac{2(k-j)}{k} \\right]^2\\delta(j) \\end{equation} \\] \\[ \\begin{equation} \\delta(j)=\\frac{\\sum_{t=j+1}^T (x_t - \\hat\\mu)^2(x_{t-j} - \\hat\\mu)^2}{\\left[\\sum_{t=1}^T (x_t - \\hat\\mu)^2\\right]^2} \\end{equation} \\]Erratum
Note that there's a missing \\(T\\) in the numerator of \\(\\delta(j)\\) of Equation (8). It is actually missing the 1988 RFS paper and the 1998 JE'mtric paper, but has been corrected in the 1990 RFS Issue 1: https://doi.org/10.1093/rfs/3.1.ii. The corrected version reads:
\\[ \\begin{equation} \\delta(j)=\\frac{T\\sum_{t=j+1}^T (x_t - \\hat\\mu)^2(x_{t-j} - \\hat\\mu)^2}{\\left[\\sum_{t=1}^T (x_t - \\hat\\mu)^2\\right]^2} \\end{equation} \\]To correct it in the example code below, change the line 51 below to:
delta_arr = T * b_arr / np.square(np.sum(sqr_demeaned_x))\n
I thank Simon Jurkatis for letting me know about the erratum.
","tags":["Python","Code"]},{"location":"posts/lomackinlay1988/#source-code","title":"Source Code","text":"This example Python code has been optimized for speed but serves only demonstration purpose. It may contain errors.
# LoMacKinlay.py\nimport numpy as np\nfrom numba import jit\nname = 'LoMacKinlay1988'\ndescription = 'Variance ratio and test statistics as in Lo and MacKinlay (1988)'\nvars_needed = ['Price']\n@jit(nopython=True, nogil=True, cache=True)\ndef _estimate(log_prices, k, const_arr):\n# Log returns = [x2, x3, x4, ..., xT], where x(i)=ln[p(i)/p(i-1)]\nrets = np.diff(log_prices)\n# T is the length of return series\nT = len(rets)\n# mu is the mean log return\nmu = np.mean(rets)\n# sqr_demeaned_x is the array of squared demeaned log returns\nsqr_demeaned_x = np.square(rets - mu)\n# Var(1)\n# Didn't use np.var(rets, ddof=1) because\n# sqr_demeaned_x is calculated already and will be used many times.\nvar_1 = np.sum(sqr_demeaned_x) / (T-1)\n# Var(k)\n# Variance of log returns where x(i) = ln[p(i)/p(i-k)]\n# Before np.roll() - array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])\n# After np.roll(,shift=2) - array([8, 9, 0, 1, 2, 3, 4, 5, 6, 7])\n# Discard the first k elements.\nrets_k = (log_prices - np.roll(log_prices, k))[k:]\nm = k * (T - k + 1) * (1 - k / T)\nvar_k = 1/m * np.sum(np.square(rets_k - k * mu))\n# Variance Ratio\nvr = var_k / var_1\n# a_arr is an array of { (2*(k-j)/k)^2 } for j=1,2,...,k-1, fixed for a given k:\n# When k=5, a_arr = array([2.56, 1.44, 0.64, 0.16]).\n# When k=8, a_arr = array([3.0625, 2.25, 1.5625, 1., 0.5625, 0.25, 0.0625])\n# Without JIT it's defined as:\n# a_arr = np.square(np.arange(k-1, 0, step=-1, dtype=np.int) * 2 / k)\n# But np.array creation is not allowed in nopython mode.\n# So const_arr=np.arange(k-1, 0, step=-1, dtype=np.int) is created outside.\na_arr = np.square(const_arr * 2 / k)\n# b_arr is part of the delta_arr.\nb_arr = np.empty(k-1, dtype=np.float64)\nfor j in range(1, k):\nb_arr[j-1] = np.sum((sqr_demeaned_x *\nnp.roll(sqr_demeaned_x, j))[j+1:])\ndelta_arr = b_arr / np.square(np.sum(sqr_demeaned_x))\n# Both arrarys are of length (k-1)\nassert len(delta_arr) == len(a_arr) == k-1\nphi1 = 2 * (2*k - 1) * (k-1) / (3*k*T)\nphi2 = np.sum(a_arr 
* delta_arr)\n# VR test statistics under two assumptions\nvr_stat_homoscedasticity = (vr - 1) / np.sqrt(phi1)\nvr_stat_heteroscedasticity = (vr - 1) / np.sqrt(phi2)\nreturn vr, vr_stat_homoscedasticity, vr_stat_heteroscedasticity\ndef estimate(data):\n\"A fast estimation of Variance Ratio test statistics as in Lo and MacKinlay (1988)\"\n# Prices array = [p1, p2, p3, p4, ..., pT]\nprices = data['Price'].to_numpy(dtype=np.float64)\nresult = []\n# Estimate many lags.\nfor k in [2, 4, 6, 8, 10, 15, 20, 30, 40, 50, 100, 200, 500, 1000]:\n# Compute a constant array as np.array creation is not allowed in nopython mode.\nconst_arr = np.arange(k-1, 0, step=-1, dtype=np.int)\nvr, stat1, stat2 = _estimate(np.log(prices), k, const_arr)\nresult.append({\nf'Variance Ratio (k={k})': vr,\nf'Variance Ratio Test Statistic (k={k}) Homoscedasticity Assumption': stat1,\nf'Variance Ratio Test Statistic (k={k}) Heteroscedasticity Assumption': stat2\n})\nreturn result\n
As an example, let's create 1 million prices from random walk and estimate the variance ratio and two test statistics at various lags.
if __name__ == \"__main__\":\nimport pandas as pd\nfrom pprint import pprint\nnp.random.seed(1)\n# Generate random steps with mean=0 and standard deviation=1\nsteps = np.random.normal(0, 1, size=1000000)\n# Set first element to 0 so that the first price will be the starting stock price\nsteps[0] = 0\n# Simulate stock prices, P with a large starting price\nP = 10000 + np.cumsum(steps)\n# Test\ndata = pd.DataFrame(P, columns=['Price'])\nresult = estimate(data)\npprint(result)\n
In just a few seconds, the output is:
[{'Variance Ratio (k=2)': 1.0003293867428105,\n'Variance Ratio Test Statistic (k=2) Heteroscedasticity Assumption': 0.3290463403922243,\n'Variance Ratio Test Statistic (k=2) Homoscedasticity Assumption': 0.32938657811705435},\n{'Variance Ratio (k=4)': 1.0007984480057006,\n'Variance Ratio Test Statistic (k=4) Heteroscedasticity Assumption': 0.4259533413884602,\n'Variance Ratio Test Statistic (k=4) Homoscedasticity Assumption': 0.4267881978178301},\n{'Variance Ratio (k=6)': 0.9999130202975425,\n'Variance Ratio Test Statistic (k=6) Heteroscedasticity Assumption': -0.035117568315004344,\n'Variance Ratio Test Statistic (k=6) Homoscedasticity Assumption': -0.03518500446785826},\n{'Variance Ratio (k=8)': 1.0001094011344318,\n'Variance Ratio Test Statistic (k=8) Heteroscedasticity Assumption': 0.036922688136577515,\n'Variance Ratio Test Statistic (k=8) Homoscedasticity Assumption': 0.03698431520269611},\n{'Variance Ratio (k=10)': 1.000702410129927,\n'Variance Ratio Test Statistic (k=10) Heteroscedasticity Assumption': 0.20772743120012313,\n'Variance Ratio Test Statistic (k=10) Homoscedasticity Assumption': 0.20803582207641647},\n{'Variance Ratio (k=15)': 1.0022173139633856,\n'Variance Ratio Test Statistic (k=15) Heteroscedasticity Assumption': 0.5213067838911684,\n'Variance Ratio Test Statistic (k=15) Homoscedasticity Assumption': 0.5219816274021579},\n{'Variance Ratio (k=20)': 1.0038048661705044,\n'Variance Ratio Test Statistic (k=20) Heteroscedasticity Assumption': 0.7646395131154204,\n'Variance Ratio Test Statistic (k=20) Homoscedasticity Assumption': 0.7655801985571125},\n{'Variance Ratio (k=30)': 1.0054447472916035,\n'Variance Ratio Test Statistic (k=30) Heteroscedasticity Assumption': 0.8819250061384853,\n'Variance Ratio Test Statistic (k=30) Homoscedasticity Assumption': 0.8829960534692654},\n{'Variance Ratio (k=40)': 1.0073830253022766,\n'Variance Ratio Test Statistic (k=40) Heteroscedasticity Assumption': 1.0290213306735625,\n'Variance Ratio Test Statistic (k=40) 
Homoscedasticity Assumption': 1.0303005120740392},\n{'Variance Ratio (k=50)': 1.0086502431826903,\n'Variance Ratio Test Statistic (k=50) Heteroscedasticity Assumption': 1.0741837462564026,\n'Variance Ratio Test Statistic (k=50) Homoscedasticity Assumption': 1.0755809312730416},\n{'Variance Ratio (k=100)': 1.0153961901671604,\n'Variance Ratio Test Statistic (k=100) Heteroscedasticity Assumption': 1.3415119471043384,\n'Variance Ratio Test Statistic (k=100) Homoscedasticity Assumption': 1.3434284573260773},\n{'Variance Ratio (k=200)': 1.0157046541161026,\n'Variance Ratio Test Statistic (k=200) Heteroscedasticity Assumption': 0.9639233626580027,\n'Variance Ratio Test Statistic (k=200) Homoscedasticity Assumption': 0.9653299929052963},\n{'Variance Ratio (k=500)': 1.0182166207668526,\n'Variance Ratio Test Statistic (k=500) Heteroscedasticity Assumption': 0.7055681216511915,\n'Variance Ratio Test Statistic (k=500) Homoscedasticity Assumption': 0.7065863036900429},\n{'Variance Ratio (k=1000)': 1.0187822241562863,\n'Variance Ratio Test Statistic (k=1000) Heteroscedasticity Assumption': 0.5140698821944161,\n'Variance Ratio Test Statistic (k=1000) Homoscedasticity Assumption': 0.5147582201029065}]\n
It's easy to see that at all lags tested, we cannot reject the null hypothesis that this price series follows a random walk.
For comparison purpose, below is an implementation in pure Python. It is more readable but is significantly slower.
def estimate_python(data, k=5):\n\"A slow pure python implementation\"\nprices = data['Price'].to_numpy(dtype=np.float64)\nlog_prices = np.log(prices)\nrets = np.diff(log_prices)\nT = len(rets)\nmu = np.mean(rets)\nvar_1 = np.var(rets, ddof=1, dtype=np.float64)\nrets_k = (log_prices - np.roll(log_prices, k))[k:]\nm = k * (T - k + 1) * (1 - k / T)\nvar_k = 1/m * np.sum(np.square(rets_k - k * mu))\n# Variance Ratio\nvr = var_k / var_1\n# Phi1\nphi1 = 2 * (2*k - 1) * (k-1) / (3*k*T)\n# Phi2\ndef delta(j):\nres = 0\nfor t in range(j+1, T+1):\nt -= 1 # array index is t-1 for t-th element\nres += np.square((rets[t]-mu)*(rets[t-j]-mu))\nreturn res / ((T-1) * var_1)**2\nphi2 = 0\nfor j in range(1, k):\nphi2 += (2*(k-j)/k)**2 * delta(j)\nreturn vr, (vr - 1) / np.sqrt(phi1), (vr - 1) / np.sqrt(phi2)\n
","tags":["Python","Code"]},{"location":"posts/merge-compustat-and-crsp/","title":"Merge Compustat and CRSP","text":"Using the CRSP/Compustat Merged Database (CCM) to extract data is one of the fundamental steps in most finance studies. Here I document several SAS programs for annual, quarterly and monthly data, inspired by and adapted from several examples from the WRDS.1
","tags":["CRSP","Compustat","Code","SAS","WRDS"]},{"location":"posts/merge-compustat-and-crsp/#gvkey-permno-link-table","title":"GVKEY-PERMNO
link table","text":"First, we need to create a GVKEY-PERMNO
link table.
%let beg_yr = 2000;\n%let end_yr = 2003;\nproc sql;\ncreate table lnk as\nselect *\nfrom crsp.ccmxpf_lnkhist\nwhere\n/* See below for a description of the link types */\n linktype in (\"LU\", \"LC\") and\n/* Extend the period to deal with fiscal year issues */\n/* Note that the \".B\" and \".E\" missing value codes represent the */\n/* earliest possible beginning date and latest possible end date */\n/* of the Link Date range, respectively. */\n (&end_yr+1 >= year(linkdt) or linkdt = .B) and \n (&beg_yr-1 <= year(linkenddt) or linkenddt = .E)\n /* primary link assigned by Compustat or CRSP */\nand linkprim in (\"P\", \"C\") \norder by \n gvkey, linkdt;\nquit;\n
Link Type Description LC Link research complete (after extensive research by CRSP). Standard connection between databases. LU Link is unresearched by CRSP. It is established by comparing the Compustat and historical CRSP CUSIPs. LU represents the most popular link type. LS Link valid for this security only.2 LX Link to a security that trades on foreign exchange not included in CRSP data. LD Duplicate link to a security. Two GVKEYs map to a single PERMNO
(PERMCO
) during the same period, and this link should not be used. Almost all of these cases happened before 1990. LN Primary link exists but Compustat does not have prices.3 NR No link available; confirmed by research. NU No link available; not yet confirmed. According to WRDS's support page:
LC
, LU
and LS
) account for 41% of the links in CCM.LX
, LD
and LN
) account for only 2%. NR
and NU
) account for the rest 57%, which is expected because of the different coverage of the two databases.Generally, using LC
and LU
should be sufficient.
Example ccmfunda.sas
.
proc sql;\ncreate table mydata as \nselect *\nfrom lnk, comp.funda (keep=gvkey fyear datadate indfmt datafmt popsrc consol sale) as cst\nwhere indfmt= 'INDL' \nand datafmt='STD' \nand popsrc='D' \nand consol='C' \nand lnk.gvkey = cst.gvkey\nand (&beg_yr <= fyear <= &end_yr) \nand (linkdt <= cst.datadate or linkdt = .B) \nand (cst.datadate <= linkenddt or linkenddt = .E);\nquit;\n
","tags":["CRSP","Compustat","Code","SAS","WRDS"]},{"location":"posts/merge-compustat-and-crsp/#compustat-quarterly-and-crsp","title":"Compustat Quarterly and CRSP","text":"Example ccmfundq.sas
.
proc sql;\ncreate table mydata as \nselect *\nfrom lnk, comp.fundq (keep=gvkey fyearq datadate indfmt datafmt popsrc consol saley saleq) as cst\nwhere indfmt= 'INDL' \nand datafmt='STD' \nand popsrc='D' \nand consol='C' \nand lnk.gvkey = cst.gvkey\nand (&beg_yr <= fyearq <= &end_yr) \nand (linkdt <= cst.datadate or linkdt = .B) \nand (cst.datadate <= linkenddt or linkenddt = .E);\nquit;\n
","tags":["CRSP","Compustat","Code","SAS","WRDS"]},{"location":"posts/merge-compustat-and-crsp/#compustat-monthly-and-crsp","title":"Compustat Monthly and CRSP","text":"To be done.
WRDS Overview of CRSP/COMPUSTAT Merged: https://wrds-www.wharton.upenn.edu/pages/support/manuals-and-overviews/crsp/crspcompustat-merged-ccm/wrds-overview-crspcompustat-merged-ccm/ Use CRSP-Compustat Merged Table to Add Permno to Compustat Data: https://wrds-www.wharton.upenn.edu/pages/support/research-wrds/macros/wrds-macro-ccm/ Merging CRSP and Compustat Data: https://wrds-www.wharton.upenn.edu/pages/support/applications/linking-databases/linking-crsp-and-compustat/\u00a0\u21a9
Other CRSP PERMNOs
with the same PERMCO
will link to other GVKEYs
. LS
links mainly relate to ETFs where a single CRSP PERMCO
links to multiple Compustat GVKEYs
. In Compustat, even though they may belong to the same investment company (e.g. ISHARES), ETFs are presented with different GVKEYs
and CRSP flags this situation.\u00a0\u21a9
Prices are used to check the accuracy of the link. For linktype LN there is no price information available even on a quarterly or annual basis. The user will have to decide whether or not to include these links.\u00a0\u21a9
Thomson One Banker SDC Platinum database provides comprehensive M&A transaction data from early 1980s, and is perhaps the most widely used M&A database in the world.
This post documents the steps of downloading M&A deals from the SDC Platinum database. Specifically, I show how to download the complete M&A data where:
The screenshot below is the interface we'll see on launch of SDC Platinum. Click on Login
and we'll be asked to enter our initials and a project name for billing purpose.
Click on Login
and you'll be asked to enter your initials and a project name for billing purpose.
Since we're interested in M&A deals, select the Mergers & Acquisitions
tab and check US Targets
, so that we'll be searching in the domestic mergers database.
Then select the sample period, e.g. for the entire 2020 calendar year.
","tags":["M&A","SDC"]},{"location":"posts/download-ma-deals-from-sdc-platinum/#apply-filters-on-ma-deals","title":"Apply Filters on M&A Deals","text":"Now we can apply various filters on the M&A deals we want to download.
We can quickly add some filters on the target's and acquiror's nation, and make sure we check the Action to be Select
not Exclude
. Under the Deal
tab, we set the deal value to be at least $1m.
In case we couldn't find the desired filtering variable, we can head to All Items
tab and search manually. We add restrictions on acquiror and target public status here.
Lastly, for the Form of the Deal, we restrict to A
Acquisition (Stock), M
Merger (Stock or Assets) and AM
Acquisition of Majority Interest (Stock). We do not want to include deals that are acquisition of partial interest, recapitalization or repurchases in this case.
Our search requests should now look like below. Strongly recommended saving this session for later reuse.
","tags":["M&A","SDC"]},{"location":"posts/download-ma-deals-from-sdc-platinum/#specify-deal-variables-to-download","title":"Specify Deal Variables to Download","text":"Our effort so far is only shortlisting the M&A deals that we're interested in. We now need to specify the relevant deal variables to download by creating a new custom report.
As before, we can check those variables in the Basics
tab or search under All Items
tab. Once done, we format the report like below by arranging the order of the variables. This order is preserved when exported to spreadsheet. One note here is that each page has a maximum width of 160, so we need to insert page at proper places. It does not affect the layout of output spreadsheet. It also recommended to save the custom report for later reuse.
Finally, it's time to execute the requests and download the M&A deal data.
","tags":["M&A","SDC"]},{"location":"posts/download-ma-deals-from-sdc-platinum/#final-note","title":"Final Note","text":"As a final remark, the downloaded spreadsheet can be imported into SAS and matched with CRSP/Compustat using CUSIP and Ticker (SDC doesn't have permno
or gvkey
). First, merge the SDC CUSIP with the first 6-digit CUSIP in CRSP or Compustat; if no match, then use SDC Primary Ticker Symbol to match with the ticker symbol in CRSP or Compustat.
Merton (1974) Distance to Default (DD) model is useful in forecasting defaults. This post documents a few ways to empirically estimate Merton DD (and default probability) as in Bharath and Shumway (2008 RFS).
","tags":["SAS","Code","Merton","Default Probability"]},{"location":"posts/merton-dd/#the-merton-model","title":"The Merton Model","text":"The total value of a firm follows geometric Brownian motion,
\\[ \\begin{equation} dV = \\mu Vdt+\\sigma_V VdW \\end{equation} \\]where,
Assuming the firm has one discount bond maturing in \\(T\\) periods, the equity of the firm can be viewed as a call option on the underlying value of the firm with a strike price equal to the face value of the firm's debt and a time-to-maturity of \\(T\\).
The equity value of the firm is hence a function of the firm's value (Black-Scholes-Merton model):
\\[ \\begin{equation} E=V\\mathcal{N}(d_1)-e^{-rT}F\\mathcal{N}(d_2) \\end{equation} \\]where,
and,
\\[ \\begin{equation} d_1 = \\frac{\\ln(V/F)+(r+0.5\\sigma_V^2)T}{\\sigma_V \\sqrt{T}} \\end{equation} \\]with \\(d_2 = d_1-\\sigma_V \\sqrt{T}\\).
Moreover, the volatility of the firm's equity is related to the volatility of the firm's value, which follows from Ito's lemma,
\\[ \\begin{equation} \\sigma_E = \\left(\\frac{V}{E}\\right)\\frac{\\partial E}{\\partial V}\\sigma_V \\end{equation} \\]In the Black-Scholes-Merton model, \\(\\frac{\\partial E}{\\partial V}=\\mathcal{N}(d_1)\\), so that
\\[ \\begin{equation} \\sigma_E = \\left(\\frac{V}{E}\\right)\\mathcal{N}(d_1)\\sigma_V \\end{equation} \\]We observe from the market:
We then infer and solve for:
Once we have \\(V\\) and \\(\\sigma_V\\), the distance to default (DD) can be calculated as
\\[ \\begin{equation} DD=\\frac{\\ln(V/F)+ (\\mu-0.5\\sigma_V^2)T}{\\sigma_V\\sqrt{T}} \\end{equation} \\]where,
The implied probability of default, or expected default frequency (EDF, registered trademark of Moody's KMV), is
\\[ \\begin{equation} \\pi_{Merton} = \\mathcal{N}\\left(-DD\\right) \\end{equation} \\]","tags":["SAS","Code","Merton","Default Probability"]},{"location":"posts/merton-dd/#estimation","title":"Estimation","text":"","tags":["SAS","Code","Merton","Default Probability"]},{"location":"posts/merton-dd/#an-iterative-approach","title":"An iterative approach","text":"To estimate \\(\\pi_{Merton}\\), an iterative procedure can be applied instead of solving equations (2) and (5) simultaneously (see Crosbie and Bohn (2003), Vassalou and Xing (2004), Bharath and Shumway (2008), etc.).
A na\u00efve approach by Bharath and Shumway (2008) that does not solve equations (2) and (5) is constructed as below.
The na\u00efve distance to default is then
\\[ \\begin{equation} \\text{na\u00efve } DD=\\frac{\\ln[(E+F)/F]+ (r_{it-1}-0.5\\sigma_V^2)T}{\\sigma_V\\sqrt{T}} \\end{equation} \\]and the na\u00efve default probability is
\\[ \\begin{equation} \\pi_{\\text{na\u00efve}} = \\mathcal{N}(-\\text{na\u00efve } DD) \\end{equation} \\]","tags":["SAS","Code","Merton","Default Probability"]},{"location":"posts/merton-dd/#code","title":"Code","text":"The na\u00efve method is too simple and skipped for now.
Here I discuss the iterative approach.
","tags":["SAS","Code","Merton","Default Probability"]},{"location":"posts/merton-dd/#original-sas-code-in-bharath-and-shumway-2008-rfs","title":"Original SAS code in Bharath and Shumway (2008 RFS)","text":"The original code is enclosed in the SSRN version of Bharath and Shumway (2008), and was available on Shumway's website.
However, there are two issues in this version of code:
cdt=100*year(date)+month(date)
accidentally restricts the \"past year\" daily stock returns to the \"past month\" later. Note that at line 42-43 it merges by permno
and cdt
, where cdt
refers to a certain year-month. We can pause the program after this data step to confirm that indeed there is only a month of data for each permno
.cdt=100*&yyy.+&mmm.;
.Other issues are minor and harmless.
A copy of this version can be found here on GitHub.
","tags":["SAS","Code","Merton","Default Probability"]},{"location":"posts/merton-dd/#my-code","title":"My code","text":"Based on the original SAS code in Bharath and Shumway (2008), I made some edits and below is a fully self-contained SAS code that executes smoothly. Note that I've corrected the above issues.
","tags":["SAS","Code","Merton","Default Probability"]},{"location":"posts/minimum-variance-hedge-ratio/","title":"Minimum Variance Hedge Ratio","text":"This note briefly explains what's the minimum variance hedge ratio and how to derive it in a cross hedge, where the asset to be hedged is not the same as underlying asset.
","tags":["Hedge"]},{"location":"posts/minimum-variance-hedge-ratio/#the-hedge-ratio-h","title":"The Hedge Ratio \\(h\\)","text":"The hedge ratio \\(h\\) is the ratio of the size of the hedging position to the exposure of the asset to be hedged:
Apparently, if we vary \\(h\\), the variance (risk) of the combined hedged position will also change.
","tags":["Hedge"]},{"location":"posts/minimum-variance-hedge-ratio/#the-optimal-minimum-variance-hedge-ratio-h","title":"The (Optimal) Minimum-Variance Hedge Ratio \\(h^*\\)","text":"Our objective in hedging is to manage the variance (risk) of our position, making it as low as possible by setting the hedge ratio \\(h\\) to be the optimal hedge ratio \\(h^*\\) that minimises the variance of the combined hedged position.
","tags":["Hedge"]},{"location":"posts/minimum-variance-hedge-ratio/#hedge-where-aa","title":"Hedge where \\(A'=A\\)","text":"It's relatively easy when the underlying asset of the futures (\\(A'\\)) is the same as the asset to be hedged (\\(A\\)), as they have a perfect correlation and the same variance. Thus, as long as the hedge ratio \\(h=1\\), where the size of hedging position equals the exposure of the asset held, the perfect correlation and same variance ensure the value changes in the hedging position offset the changes in the value of asset to be hedged, so that the variance of the hedged position is minimum at zero (ignoring other basis risks). This means, the optimal minimum-variance hedge ratio \\(h^*=1\\).
","tags":["Hedge"]},{"location":"posts/minimum-variance-hedge-ratio/#cross-hedge-where-a-neq-a","title":"Cross Hedge where \\(A' \\neq A\\)","text":"When the underlying asset of the futures (\\(A'\\)) differ from asset to be hedged (\\(A\\)), the optimal hedge ratio \\(h^*\\) that minimises the portfolio variance is not necessarily 1 anymore.
Let's now derive \\(h^*\\).
Let's consider a short hedge, where we long \\(S_t\\) and short \\(h\\times F_t\\), hence:
The optimal hedge ratio \\(h^*\\) is the hedge ratio that minimises the variance of \\(\\Delta C\\).
\\[ h^* =\\underset{h}{\\operatorname{argmin}} \\text{Var}(\\Delta C) =\\underset{h}{\\operatorname{argmin}} \\text{Var}(\\Delta S_t-h\\times \\Delta F_t) \\]We also know that
\\[ \\text{Var}(\\Delta S_t-h\\times \\Delta F_t) = \\sigma^2_S + h^2\\sigma^2_F - 2h(\\rho \\sigma_S \\sigma_F) \\]To minimise the variance, the first-order condition (FOC) is that
\\[ \\frac{\\partial \\text{Var}(\\Delta C)}{\\partial h}=2h\\sigma^2_F-2(\\rho \\sigma_S \\sigma_F)=0 \\]The optimal hedge ratio \\(h^*\\) is the \\(h\\) that solves the FOC above. Therefore,
\\[ h^* = \\rho \\frac{\\sigma_S}{\\sigma_F} \\]","tags":["Hedge"]},{"location":"posts/minimum-variance-hedge-ratio/#intuition","title":"Intuition","text":"The optimal hedge ratio \\(h^*\\) describes the optimal \\(N_F/N_A\\), so that the optimal size of the hedging position:
\\[ N_F^* = h^* \\times N_A \\]If \\(\\rho=1\\) and \\(\\sigma_F=\\sigma_S\\), then \\(h^*=1\\):
If \\(\\rho=1\\) and \\(\\sigma_F=2\\sigma_S\\), then \\(h^*=0.5\\):
If \\(\\rho<1\\), then \\(h^*\\) depends on \\(\\rho\\) and \\({\\sigma_S}/{\\sigma_F}\\):
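The derivation above can be sketched numerically. Below is a minimal Python example I've added for illustration (all numbers are made up): it simulates correlated spot and futures price changes, computes \\(h^*=\\rho\\,\\sigma_S/\\sigma_F\\), and confirms it equals the OLS slope of regressing \\(\\Delta S\\) on \\(\\Delta F\\).

```python
import numpy as np

# Illustrative simulation (all parameters are made up)
rng = np.random.default_rng(42)
n = 1000
dF = rng.normal(0, 2.0, n)              # futures price changes
dS = 0.45 * dF + rng.normal(0, 0.5, n)  # correlated spot price changes

# h* = rho * sigma_S / sigma_F
rho = np.corrcoef(dS, dF)[0, 1]
h_star = rho * dS.std(ddof=1) / dF.std(ddof=1)

# Equivalently, h* is the OLS slope of regressing dS on dF
slope = np.cov(dS, dF, ddof=1)[0, 1] / np.var(dF, ddof=1)
assert abs(h_star - slope) < 1e-10
```

The equivalence holds because \\(\\rho\\sigma_S/\\sigma_F = \\text{Cov}(\\Delta S, \\Delta F)/\\text{Var}(\\Delta F)\\), which is exactly the regression slope.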
Among the many reasons people find it hard to use cryptocurrency, there's a simple one -- memorising the private key is too hard. So, people invented the brain wallet, which turns a string of words into a private key and thus a wallet.
It's genius in that now a user needs only to memorise whatever he or she used to create the wallet. You can turn your name, phone number, DoB, favourite quote, lover's home address, ..., literally anything into a cryptocurrency wallet. However, this also means that if someone else successfully guesses the passphrase you used, they can sweep all the coins you have!
","tags":["Bitcoin","Python","Code"]},{"location":"posts/never-use-a-brain-wallet/#python-brain-wallet-for-bitcoin","title":"Python brain wallet for Bitcoin","text":"After a little bit of research, I've put together a simple brain wallet Python script that turns any input string to a legal Bitcoin private key and its address.
import codecs\nimport hashlib\nimport ecdsa\nclass BrainWallet:\n@staticmethod\ndef generate_address_from_passphrase(passphrase):\nprivate_key = str(hashlib.sha256(\npassphrase.encode('utf-8')).hexdigest())\naddress = BrainWallet.generate_address_from_private_key(private_key)\nreturn private_key, address\n@staticmethod\ndef generate_address_from_private_key(private_key):\npublic_key = BrainWallet.__private_to_public(private_key)\naddress = BrainWallet.__public_to_address(public_key)\nreturn address\n@staticmethod\ndef __private_to_public(private_key):\nprivate_key_bytes = codecs.decode(private_key, 'hex')\n# Get ECDSA public key\nkey = ecdsa.SigningKey.from_string(\nprivate_key_bytes, curve=ecdsa.SECP256k1).verifying_key\nkey_bytes = key.to_string()\nkey_hex = codecs.encode(key_bytes, 'hex')\n# Add bitcoin byte\nbitcoin_byte = b'04'\npublic_key = bitcoin_byte + key_hex\nreturn public_key\n@staticmethod\ndef __public_to_address(public_key):\npublic_key_bytes = codecs.decode(public_key, 'hex')\n# Run SHA256 for the public key\nsha256_bpk = hashlib.sha256(public_key_bytes)\nsha256_bpk_digest = sha256_bpk.digest()\n# Run ripemd160 for the SHA256\nripemd160_bpk = hashlib.new('ripemd160')\nripemd160_bpk.update(sha256_bpk_digest)\nripemd160_bpk_digest = ripemd160_bpk.digest()\nripemd160_bpk_hex = codecs.encode(ripemd160_bpk_digest, 'hex')\n# Add network byte\nnetwork_byte = b'00'\nnetwork_bitcoin_public_key = network_byte + ripemd160_bpk_hex\nnetwork_bitcoin_public_key_bytes = codecs.decode(\nnetwork_bitcoin_public_key, 'hex')\n# Double SHA256 to get checksum\nsha256_nbpk = hashlib.sha256(network_bitcoin_public_key_bytes)\nsha256_nbpk_digest = sha256_nbpk.digest()\nsha256_2_nbpk = hashlib.sha256(sha256_nbpk_digest)\nsha256_2_nbpk_digest = sha256_2_nbpk.digest()\nsha256_2_hex = codecs.encode(sha256_2_nbpk_digest, 'hex')\nchecksum = sha256_2_hex[:8]\n# Concatenate public key and checksum to get the address\naddress_hex = (network_bitcoin_public_key + 
checksum).decode('utf-8')\nwallet = BrainWallet.base58(address_hex)\nreturn wallet\n@staticmethod\ndef base58(address_hex):\nalphabet = '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'\nb58_string = ''\n# Get the number of leading zeros and convert hex to decimal\nleading_zeros = len(address_hex) - len(address_hex.lstrip('0'))\n# Convert hex to decimal\naddress_int = int(address_hex, 16)\n# Append digits to the start of string\nwhile address_int > 0:\ndigit = address_int % 58\ndigit_char = alphabet[digit]\nb58_string = digit_char + b58_string\naddress_int //= 58\n# Add '1' for each 2 leading zeros\nones = leading_zeros // 2\nfor one in range(ones):\nb58_string = '1' + b58_string\nreturn b58_string\n
","tags":["Bitcoin","Python","Code"]},{"location":"posts/never-use-a-brain-wallet/#easily-cracking-a-wallet","title":"Easily \"cracking\" a wallet","text":"Let me show you some really easy-to-guess passphrases and their associated private keys and addresses. As an example, the code below uses \"password\" as the input passphrase and derives the private key and address from it.
passphrase = 'password'\nwallet = BrainWallet()\nprivate_key, address = wallet.generate_address_from_passphrase(passphrase)\nprint(f'passphrase: {passphrase}')\nprint(f'private key: {private_key}')\nprint(f'address: {address}')\n
The output is:
passphrase: password\nprivate key: 5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8\naddress: 16ga2uqnF1NqpAuQeeg7sTCAdtDUwDyJav\n
As at May 22, 2019, this address has 45,014 transactions with a total of 0.3563 BTC (of course the balance is zero)! You can check its current balance at blockchain.com. Also, congratulations, you are now one of the many owners of this address/wallet. So next time you observe some coins transferred to it, you'll be able to use it as well (though I don't suggest you do so)!
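Indeed, the private key above is nothing more than the SHA-256 digest of the passphrase, which is easy to verify:

```python
import hashlib

# The brain-wallet private key is just the SHA-256 digest of the passphrase
assert hashlib.sha256(b'password').hexdigest() == \
    '5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8'
```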
","tags":["Bitcoin","Python","Code"]},{"location":"posts/never-use-a-brain-wallet/#some-other-cracked-wallets","title":"Some other \"cracked\" wallets","text":"I explored a little bit more and it's surprising to find out how easy it is to crack a wallet this way. Below is a table of some passphrases and their associated keys and addresses.
| Passphrase | Private Key | Address | Used |
| --- | --- | --- | --- |
| satoshi | da2876b3eb31edb4436fa4650673fc6f01f90de2f1793c4ec332b2387b09726f | 1ADJqstUMBB5zFquWg19UqZ7Zc6ePCpzLE | True |
| bitcoin | 6b88c087247aa2f07ee1c5956b8e1a9f4c7f892a70e324f1bb3d161e05ca107b | 1E984zyYbNmeuumzEdqT8VSL8QGJi3byAD | True |
| hello world | b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9 | 1CS8g7nwaxPPprb4vqcTVdLCuCRirsbsMb | True |
| testing | cf80cd8aed482d5d1527d7dc72fceff84e6326592848447d2dc0b0e87dfc9a90 | 1JdDsbYYRSpsTnBVgenruULVeUjt5z6WnR | True |
| god | 5723360ef11043a879520412e9ad897e0ebcb99cc820ec363bfecc9d751a1a99 | 1KxmSmcMTmPvU1qSLYpJLrqnSzBoQ53NXN | True |
| terminator | aa802f654e3ae7aaa1b73f8724056a05e2691accea8dd90057916080f84d7e93 | 18kvt3D6K1CG8MxGP6ke7q6vLU5NGpLZdR | True |
| abc | ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad | 1NEwmNSC7w9nZeASngHCd43Bc5eC2FmXpn | True |

And a lot of swear words are used as well, but I'm just going to skip them.
Apart from single words and short phrases, some people do use famous quotes. As an example, see this one from A Tale of Two Cities:
it was the best of times it was the worst of times
Its private key is af8da705bfd95621983e5cf4232ac1ca0c79b47122e3defd8a98fa9a4387d985
and its address is 17WenQJaYvqCNumebQU54TsixWtQ1GQ4ND. It has received 1 BTC in total but again the balance is zero, lol.
Never use a brain wallet. Because if you can think of it, someone else might also be able to come up with the same passphrase. But, if you are comfortable or absolutely sure that your passphrase is secret, feel free to use the script above and make yourself a wallet. \ud83d\ude0f
","tags":["Bitcoin","Python","Code"]},{"location":"posts/probability-of-informed-trading/","title":"Probability of Informed Trading (PIN)","text":"Note
This post was originally published on my old blog in March 2018 in Chinese. Translation is provided by ChatGPT-4.
The above figure is based on Binance trade data I collected in 2019.
In the market microstructure literature, Easley et al. (1996) proposed a trading model that can decompose the bid-ask spread. The most commendable aspect of this model is the introduction of the \"Probability of Informed Trading,\" or PIN, which serves as a means of measuring the informational component in the spread. As the name suggests, under ideal conditions, PIN can reflect the probability of informed trading in a market with a market maker. In this article, I attempt to comb through the modeling process in the Easley et al. (1996) paper and discuss how to handle the objective function in maximum likelihood estimation to avoid overflow errors during computation.
","tags":["PIN","Informed Trading"]},{"location":"posts/probability-of-informed-trading/#model","title":"Model","text":"Assume that the buy and sell orders of informed and uninformed traders follow independent Poisson processes, and the following tree diagram describes the entire trading process:
Next, assume that the market maker is a Bayesian, that is, he will update his understanding of the overall market status, especially whether there is new information on that day, by observing trades and trading rates. Suppose each trading day is independent, \\(P(t)=(P_n(t), P_b(t), P_g(t))\\) is the market maker's prior probability perception, where \\(n\\) represents no new information, \\(b\\) represents bearish bad news, and \\(g\\) represents bullish good news, so \\(P(t)=(1-\\alpha, \\alpha\\delta, \\alpha(1-\\delta))\\).
Let \\(S_t\\) be the event of a sell order arriving at time \\(t\\), and \\(B_t\\) be the event of a buy order arriving at time \\(t\\). Also, let \\(P(t|S_t)\\) be the updated probability perception of the market maker after observing a sell order arriving at time \\(t\\) based on the existing information. Then, according to Bayes' theorem, if there is no new information at time \\(t\\) and the market maker observes a sell order, the posterior probability \\(P_n(t|S_t)\\) should be:
\\[ \\begin{equation} P_n(t|S_t)=\\frac{P_n(t)\\varepsilon}{\\varepsilon+P_b(t)\\mu}\\end{equation} \\]Similarly, if there is bearish information and the market maker observes a sell order at time \\(t\\), the posterior probability \\(P_b(t|S_t)\\) should be:
\\[ \\begin{equation} P_b(t|S_t)=\\frac{P_b(t)(\\varepsilon+\\mu)}{\\varepsilon+P_b(t)\\mu}\\end{equation} \\]If there is bullish information and the market maker observes a sell order at time \\(t\\), the posterior probability \\(P_g(t|S_t)\\) should be:
\\[ \\begin{equation} P_g(t|S_t)=\\frac{P_g(t)\\varepsilon}{\\varepsilon+P_b(t)\\mu} \\end{equation} \\]Thus, the expected zero-profit bid price at time \\(t\\) on day \\(i\\) should be the conditional expectation of the asset value based on historical information and observing sell order at this time, that is,
\\[ \\begin{equation} b(t)=\\frac{P_n(t)\\varepsilon V^*_i+P_b(t)(\\varepsilon+\\mu)\\underline{V}_i+P_g(t)\\varepsilon\\overline{V}_i}{\\varepsilon+P_b(t)\\mu} \\end{equation} \\]Here, \\(V_i\\) is the value of the asset at the end of day \\(i\\), and let the asset value be \\(\\overline{V}_i\\) when there is positive news, \\(\\underline{V}_i\\) when there is negative news, and \\(V^*_i\\) when there is no news, with \\(\\underline{V}_i < V^*_i < \\overline{V}_i\\).
At this point, the ask price should be:
\\[ \\begin{equation} a(t)=\\frac{P_n(t)\\varepsilon V^*_i+P_b(t)\\varepsilon\\underline{V}_i+P_g(t)(\\varepsilon+\\mu)\\overline{V}_i}{\\varepsilon+P_g(t)\\mu}\\end{equation} \\]Let's associate these bid and ask prices with the expected asset value at time \\(t\\). Considering that the conditional expectation of the asset value at this time is:
\\[ \\begin{equation} E[V_i|t]=P_n(t)V^*_i+P_b(t)\\underline{V}_i+P_g(t)\\overline{V}_i\\end{equation} \\]we can write the above \\(b(t)\\) and \\(a(t)\\) as:
\\[ \\begin{equation} b(t)=E[V_i|t]-\\frac{\\mu P_b(t)}{\\varepsilon+\\mu P_b(t)}(E[V_i|t]-\\underline{V}_i)\\end{equation} \\] \\[ \\begin{equation} a(t)=E[V_i|t]+\\frac{\\mu P_g(t)}{\\varepsilon+\\mu P_g(t)}(\\overline{V}_i-E[V_i|t])\\end{equation} \\]Thus, the bid-ask spread is \\(a(t)-b(t)\\), which is:
\\[ \\begin{equation} a(t)-b(t)=\\frac{\\mu P_g(t)}{\\varepsilon+\\mu P_g(t)}(\\overline{V}_i-E[V_i|t])+\\frac{\\mu P_b(t)}{\\varepsilon+\\mu P_b(t)}(E[V_i|t]-\\underline{V}_i)\\end{equation} \\]This indicates that the bid-ask spread at time \\(t\\) is actually:
The probability of a buy order being an informed trade \\(\\times\\) the expected loss due to the informed buyer + the probability of a sell order being an informed trade \\(\\times\\) the expected loss due to the informed seller
Therefore, the probability that any trade at time \\(t\\) is based on asymmetric information from informed traders is the sum of these two probabilities:
\\[ \\begin{equation} PIN(t)=\\frac{\\mu P_g(t)}{\\varepsilon+\\mu P_g(t)}+\\frac{\\mu P_b(t)}{\\varepsilon+\\mu P_b(t)}=\\frac{\\mu(1-P_n(t))}{\\mu(1-P_n(t))+2\\varepsilon}\\end{equation} \\]If no information event occurs (\\(P_n(t)=1\\)) or there are no informed trades (\\(\\mu=0\\)), both \\(PIN\\) and the bid-ask spread should be zero. If the probabilities of positive and negative news are equal, i.e., \\(\\delta=1-\\delta\\), the bid-ask spread can be simplified to:
\\[ \\begin{equation} a(t)-b(t)=\\frac{\\alpha\\mu}{\\alpha\\mu+2\\varepsilon}[\\overline{V}_i-\\underline{V}_i]\\end{equation} \\]And our \\(PIN\\) measure is simplified to:
\\[ \\begin{equation} PIN(t)=\\frac{\\alpha\\mu}{\\alpha\\mu+2\\varepsilon}\\end{equation} \\]","tags":["PIN","Informed Trading"]},{"location":"posts/probability-of-informed-trading/#model-estimation","title":"Model Estimation","text":"After the model is established, let's talk about the parameter estimation of this model. The parameters we need to estimate, \\(\\theta=(\\alpha, \\delta, \\varepsilon, \\mu)\\), are actually very difficult to estimate. This is because we cannot directly observe them, and can only observe the arrival of buy and sell orders. In this model, the daily buy and sell orders are assumed to follow one of the three Poisson processes. Although we don't know which process it is specifically, the overall idea is: more buy orders imply potential good news, more sell orders imply potential bad news, and overall buying and selling will decrease when there is no new information. With this idea in mind, we can try to estimate \\(\\theta\\) using the maximum likelihood estimation method.
First, according to the trading model shown in the diagram, assume that there is bad news on a certain day; then the arrival rate of sell orders is \\((\\mu+\\varepsilon)\\), which means both informed and uninformed traders participate in selling. The arrival rate of buy orders is \\(\\varepsilon\\), that is, only uninformed traders will continue to buy. Therefore, the probability of observing a sequence of trades with \\(B\\) buy orders and \\(S\\) sell orders in a period of time is:
\\[ \\begin{equation} e^{-\\varepsilon} \\frac{\\varepsilon^B}{B!} e^{-(\\mu+\\varepsilon)} \\frac{(\\mu+\\varepsilon)^S}{S!} \\end{equation} \\]If there is good news on a certain day, the probability of observing a sequence of trades with \\(B\\) buy orders and \\(S\\) sell orders in a period of time is:
\\[ \\begin{equation} e^{-(\\mu+\\varepsilon)} \\frac{(\\mu+\\varepsilon)^B}{B!} e^{-\\varepsilon} \\frac{\\varepsilon^S}{S!} \\end{equation} \\]If there is no new information on a certain day, the probability of observing a sequence of trades with \\(B\\) buy orders and \\(S\\) sell orders in a period of time is:
\\[ \\begin{equation} e^{-\\varepsilon} \\frac{\\varepsilon^B}{B!} e^{-\\varepsilon} \\frac{\\varepsilon^S}{S!} \\end{equation} \\]So, the probability of observing a total of \\(B\\) buy orders and \\(S\\) sell orders on a trading day should be the weighted average of the above three possibilities, and the weights here are the probabilities of each possibility. Therefore, we can write out the likelihood function:
\\[ \\begin{align} L((B, S)| \\theta)= &(1-\\alpha)e^{-\\varepsilon} \\frac{\\varepsilon^B}{B!} e^{-\\varepsilon} \\frac{\\varepsilon^S}{S!} \\\\ &+ \\alpha\\delta e^{-\\varepsilon} \\frac{\\varepsilon^B}{B!} e^{-(\\mu+\\varepsilon)} \\frac{(\\mu+\\varepsilon)^S}{S!} \\\\ &+ \\alpha(1-\\delta) e^{-(\\mu+\\varepsilon)} \\frac{(\\mu+\\varepsilon)^B}{B!} e^{-\\varepsilon} \\frac{\\varepsilon^S}{S!} \\end{align} \\]Hence, the objective function for maximum likelihood estimation is:
\\[ \\begin{equation} L(D|\\theta)=\\prod_{i=1}^{N}L((B_i, S_i)|\\theta) \\end{equation} \\]","tags":["PIN","Informed Trading"]},{"location":"posts/probability-of-informed-trading/#bottomline","title":"Bottomline","text":"The problem seems to end here. With the objective function, it seems to be all set as long as you program it and pay attention to the parameter boundaries. However, the real challenge comes next, because if you really write the objective function like this and run it, you will inevitably encounter an overflow error. After all, this function is filled with powers and factorials. Even if the time interval is chosen to be very small, some highly liquid assets will still have hundreds of transactions within a few seconds. Therefore, any of \\(B!\\), \\(S!\\), and \\((\\mu+\\varepsilon)^B\\) can beautifully crash your program. So, further processing of the objective function here is extremely important.
By observing equation (16), the three terms in the likelihood function actually share a common factor \\(e^{-2\\varepsilon}(\\mu+\\varepsilon)^{B+S}/(B!S!)\\)! After factoring it out, you can also substitute \\(x\\equiv \\frac{\\varepsilon}{\\mu+\\varepsilon}\\in [0, 1]\\) into it. The transformed likelihood function, after taking the logarithm, will be in the form:
\\[ \\begin{align} l((B, S)| \\theta)=&\\ln(L((B, S)| \\theta))\\\\ =&-2\\varepsilon+(B+S)\\ln(\\mu+\\varepsilon) \\\\ &+\\ln((1-\\alpha)x^{B+S}+\\alpha\\delta e^{-\\mu}x^B + \\alpha(1-\\delta)e^{-\\mu}x^S) \\\\ &-\\ln(B!S!) \\end{align} \\]Now, since the last term \\(\\ln(B!S!)\\) does not affect the parameter estimation at all, it can be safely excluded. The remaining part can perfectly avoid overflow. Personally, I think the brilliant move here is the introduction of \\(x\\equiv \\frac{\\varepsilon}{\\mu+\\varepsilon}\\in [0, 1]\\), which prevents the overflow error caused by \\((\\mu+\\varepsilon)>1\\).
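To illustrate the estimation, here is a minimal Python sketch (my own addition, with made-up parameter values and starting points) that implements the stabilised log-likelihood above and recovers \\(\\theta\\) and \\(PIN\\) by maximum likelihood from data simulated under the model:

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, B, S):
    """Negative stabilised log-likelihood (the constant ln(B!S!) is dropped)."""
    a, d, eps, mu = params
    x = eps / (mu + eps)
    mix = ((1 - a) * x ** (B + S)
           + a * d * np.exp(-mu) * x ** B
           + a * (1 - d) * np.exp(-mu) * x ** S)
    # Floor the mixture to keep log() finite during the optimiser's search
    ll = -2 * eps + (B + S) * np.log(mu + eps) + np.log(np.maximum(mix, 1e-300))
    return -ll.sum()

# Simulate daily buy/sell counts from the model itself (made-up parameters)
rng = np.random.default_rng(0)
alpha, delta, eps, mu = 0.4, 0.5, 50.0, 100.0
days = 250
news = rng.random(days) < alpha           # information event?
bad = news & (rng.random(days) < delta)   # bad-news day
good = news & ~bad                        # good-news day
B = rng.poisson(eps + mu * good)          # informed traders buy on good news
S = rng.poisson(eps + mu * bad)           # informed traders sell on bad news

res = minimize(neg_loglik, x0=[0.5, 0.5, 40.0, 80.0], args=(B, S),
               bounds=[(1e-6, 1 - 1e-6), (1e-6, 1 - 1e-6),
                       (1e-6, None), (1e-6, None)])
a_hat, _, eps_hat, mu_hat = res.x
pin = a_hat * mu_hat / (a_hat * mu_hat + 2 * eps_hat)
print(f"estimated PIN = {pin:.3f}")  # true PIN = 0.4*100/(0.4*100+2*50) ≈ 0.286
```

The estimate will not match the true PIN exactly in a finite sample, but it should be in the right neighbourhood; note the log-likelihood never forms \\((\\mu+\\varepsilon)^B\\) or \\(B!\\) directly, so it does not overflow.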
","tags":["PIN","Informed Trading"]},{"location":"posts/python-shared-memory-in-multiprocessing/","title":"Python Shared Memory in Multiprocessing","text":"Python 3.8 introduced a new module multiprocessing.shared_memory
that provides shared memory for direct access across processes. My test shows that it significantly reduces the memory usage, which also speeds up the program by reducing the costs of copying and moving things around.1
In this test, I generated a 240MB numpy.recarray
from a pandas.DataFrame
with datetime
, int
and str
typed columns. I used numpy.recarray
because it can preserve the dtype
of each column, so that later I can reconstruct the same array from the buffer of shared memory.
I performed a simple numpy.nansum
on the numeric column of the data using two methods. The first method uses multiprocessing.shared_memory
where the 4 spawned processes directly access the data in the shared memory. The second method passes the data to the spawned processes, which effectively means each process will have a separate copy of the data.
A quick run of the test code below shows that the first method based on shared_memory
uses minimal memory (peak usage is 0.33MB) and is much faster (2.09s) than the second one where the entire data is copied and passed into each process (peak memory usage of 1.8G and takes 216s). More importantly, the memory usage under the second method is consistently high.
from multiprocessing.shared_memory import SharedMemory\nfrom multiprocessing.managers import SharedMemoryManager\nfrom concurrent.futures import ProcessPoolExecutor, as_completed\nfrom multiprocessing import current_process, cpu_count, Process\nfrom datetime import datetime\nimport numpy as np\nimport pandas as pd\nimport tracemalloc\nimport time\ndef work_with_shared_memory(shm_name, shape, dtype):\nprint(f'With SharedMemory: {current_process()=}')\n# Locate the shared memory by its name\nshm = SharedMemory(shm_name)\n# Create the np.recarray from the buffer of the shared memory\nnp_array = np.recarray(shape=shape, dtype=dtype, buf=shm.buf)\nreturn np.nansum(np_array.val)\ndef work_no_shared_memory(np_array: np.recarray):\nprint(f'No SharedMemory: {current_process()=}')\n# Without shared memory, the np_array is copied into the child process\nreturn np.nansum(np_array.val)\nif __name__ == \"__main__\":\n# Make a large data frame with date, float and character columns\na = [\n(datetime.today(), 1, 'string'),\n(datetime.today(), np.nan, 'abc'),\n] * 5000000\ndf = pd.DataFrame(a, columns=['date', 'val', 'character_col'])\n# Convert into numpy recarray to preserve the dtypes (1)\nnp_array = df.to_records(index=False, column_dtypes={'character_col': 'S6'})\ndel df\nshape, dtype = np_array.shape, np_array.dtype\nprint(f\"np_array's size={np_array.nbytes/1e6}MB\")\n# With shared memory\n# Start tracking memory usage\ntracemalloc.start()\nstart_time = time.time()\nwith SharedMemoryManager() as smm:\n# Create a shared memory of size np_arry.nbytes\nshm = smm.SharedMemory(np_array.nbytes)\n# Create a np.recarray using the buffer of shm\nshm_np_array = np.recarray(shape=shape, dtype=dtype, buf=shm.buf)\n# Copy the data into the shared memory\nnp.copyto(shm_np_array, np_array)\n# Spawn some processes to do some work\nwith ProcessPoolExecutor(cpu_count()) as exe:\nfs = [exe.submit(work_with_shared_memory, shm.name, shape, dtype)\nfor _ in range(cpu_count())]\nfor _ in 
as_completed(fs):\npass\n# Check memory usage\ncurrent, peak = tracemalloc.get_traced_memory()\nprint(f\"Current memory usage {current/1e6}MB; Peak: {peak/1e6}MB\")\nprint(f'Time elapsed: {time.time()-start_time:.2f}s')\ntracemalloc.stop()\n# Without shared memory\ntracemalloc.start()\nstart_time = time.time()\nwith ProcessPoolExecutor(cpu_count()) as exe:\nfs = [exe.submit(work_no_shared_memory, np_array)\nfor _ in range(cpu_count())]\nfor _ in as_completed(fs):\npass\n# Check memory usage\ncurrent, peak = tracemalloc.get_traced_memory()\nprint(f\"Current memory usage {current/1e6}MB; Peak: {peak/1e6}MB\")\nprint(f'Time elapsed: {time.time()-start_time:.2f}s')\ntracemalloc.stop()\n
Warning
A very important note about using multiprocessing.shared_memory
, as at June 2020, is that the numpy.ndarray
cannot have a dtype=dtype('O')
. That is, the dtype
cannot be dtype(object)
. If it is, there will be a segmentation fault when child processes try to access the shared memory and dereference it. It happens when the column contains strings.
To solve this problem, you need to specify the dtype
in df.to_records()
. For example:
np_array = df.to_records(index=False, column_dtypes={'character_col': 'S6'})\n
Here, we specify that character_col
contains strings of length 6. If it contains Unicode, we can use 'U6'
instead. Longer strings will then be truncated at the specified length. As such, there won't be any more segfaults.
This test is performed on a 2017 12-inch MacBook with 1.3 GHz Dual-Core Intel Core i5 and 8 GB 1867 MHz LPDDR3 RAM.\u00a0\u21a9
This note is just to show that the different variants of the Black-Scholes formula in textbooks and tutorial solutions are in fact the same.
This is the one shown in our formula sheet, and is also the traditional presentation of the Black-Scholes model.
\\[ \\begin{equation} C=SN(d_1)-N(d_2)Ke^{-r_f t} \\end{equation} \\] \\[ \\begin{equation} d_1=\\frac{ln(\\frac{S}{K})+(r_f+\\frac{\\sigma^2}{2})t}{\\sigma \\sqrt{t}} \\end{equation} \\] \\[ \\begin{equation} d_2=d_1 - \\sigma \\sqrt{t} \\end{equation} \\]","tags":["Option"]},{"location":"posts/reconciliation-of-black-scholes-variants/#variant-2","title":"Variant 2","text":"This one comes from the textbook, and looks slightly different in that \\(PV(K)\\) replaces \\(K\\) in the natural logarithm.
\\[ \\begin{equation} C=SN(d_1)-N(d_2)PV(K) \\end{equation}\\] \\[ \\begin{equation} d_1=\\frac{ln(\\frac{S}{PV(K)})}{\\sigma \\sqrt{t}}+\\frac{\\sigma \\sqrt{t}}{2} \\end{equation} \\] \\[ \\begin{equation} d_2=d_1 - \\sigma \\sqrt{t} \\end{equation} \\]However, it's in fact easy to show that \\(d_1\\) in eq. (5) is the same as in eq. (2): Under continuous compounding, \\(PV(K)=Ke^{-r_f t}\\):
\\[ \\begin{align} d_1 &=\\frac{ln(\\frac{S}{PV(K)})}{\\sigma \\sqrt{t}}+\\frac{\\sigma \\sqrt{t}}{2}\\newline &=\\frac{ln(\\frac{S}{Ke^{-r_f t}})}{\\sigma \\sqrt{t}} +\\frac{\\frac{\\sigma^2}{2}t}{\\sigma \\sqrt{t}}\\newline &=\\frac{ln(\\frac{S}{Ke^{-r_f t}})+\\frac{\\sigma^2}{2}t}{\\sigma \\sqrt{t}}\\newline &=\\frac{ln(\\frac{S}{K})+r_f t+\\frac{\\sigma^2}{2}t}{\\sigma \\sqrt{t}}\\newline &=\\frac{ln(\\frac{S}{K})+(r_f+\\frac{\\sigma^2}{2})t}{\\sigma \\sqrt{t}}=eq. (2) \\end{align} \\]Therefore, the two variants are effectively the same under continuous compounding. \u00a0
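As a quick numerical sanity check (a sketch I added; the inputs are arbitrary), the two variants can be coded side by side and shown to produce identical call prices:

```python
from math import log, sqrt, exp
from statistics import NormalDist

N = NormalDist().cdf  # standard normal CDF

def bs_call_v1(S, K, r, sigma, t):
    # Variant 1: the traditional presentation
    d1 = (log(S / K) + (r + sigma**2 / 2) * t) / (sigma * sqrt(t))
    d2 = d1 - sigma * sqrt(t)
    return S * N(d1) - N(d2) * K * exp(-r * t)

def bs_call_v2(S, K, r, sigma, t):
    # Variant 2: written in terms of PV(K) = K e^{-rt}
    pv_k = K * exp(-r * t)
    d1 = log(S / pv_k) / (sigma * sqrt(t)) + sigma * sqrt(t) / 2
    d2 = d1 - sigma * sqrt(t)
    return S * N(d1) - N(d2) * pv_k

# Arbitrary illustrative inputs: S=100, K=95, r=5%, sigma=20%, t=0.5
args = (100, 95, 0.05, 0.2, 0.5)
assert abs(bs_call_v1(*args) - bs_call_v2(*args)) < 1e-10
```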
","tags":["Option"]},{"location":"posts/specification-curve-analysis/","title":"Specification Curve Analysis","text":"","tags":["Stata","Code","Specification Curve","Apps"]},{"location":"posts/specification-curve-analysis/#motivation","title":"Motivation","text":"More often than not, empirical researchers need to argue that their chosen model specification reigns. If not, they need to run a battery of tests on alternative specifications and report them. The problem is, researchers can fit a few tables each with a few models in the paper at best, and it's extremely hard for readers to know whether the reported results are being cherry-picked.
So, why not run all possible model specifications and find a concise way to report them all?
","tags":["Stata","Code","Specification Curve","Apps"]},{"location":"posts/specification-curve-analysis/#the-specification-curve","title":"The Specification Curve","text":"The idea of specification curve is a direct answer to the question provided by Simonsohn, Simmons and Nelson (2020).1 2
To intuitively explain this concept, below is Figure 2 from my paper Organization Capital and Executive Performance Incentives in the Journal of Banking & Finance,3 which is used to show the robustness of a substitution effect of organization capital on executive pay-for-performance sensitivity. Therefore, the estimated coefficients for the variable of interest OC are expected to be negative across different model specifications.
The plot is made up of two parts. The upper panel plots the coefficient estimates of OC in various model specifications, in descending order, and the associated 95% confidence intervals. Sample sizes of each model are plotted as bars at the bottom of the upper panel. For simplicity, we annotate only the maximum and minimum coefficient estimates, as well as the threshold of zero. The lower panel reports the exact specification for each model, where colored dots indicate the choices from various specification alternatives. Both panels share the same x-axis of model number.
To interpret this specification curve, for example, OC has an estimated coefficient of \u22120.11 in the first model, which uses the natural logarithm of DELTA_MGMT (a measure of executive pay-for-performance sensitivity) as the dependent variable, and control variables as in the baseline model, including industry fixed effects and year fixed effects, clustering standard errors at the firm level, and is estimated on the full sample.
Further, the ordered nature of the curve implies that this is the minimum estimated impact of OC on ln(DELTA_MGMT), whereas the maximum estimated coefficient is double that, at \u22120.22, when the industry fixed effects are replaced with the more conservative firm fixed effects and the model is estimated on the sample excluding the global financial crisis period. More importantly, in all specifications, we find the coefficient estimates of OC to be statistically significant. Using an alternative measure of executive pay-for-performance sensitivity as the dependent variable, again, has minimal impact on the documented substitution effect of OC.
This specification curve reports a total of 2*2*4*1*2=32 specifications:
Beyond reporting all estimates from hundreds and thousands of models, the more appealing point of specification curve is that we can identify the most impactful factors in specifying the model. As the models are sorted by the coefficient estimates, the distribution of dots in the lower panel can reveal whether certain specification choices drive the results.
Of course, even 32 models cannot exhaust all possible specifications. Nevertheless, by addressing the most critical ones, we are able to use one specification curve plot to convince readers that our findings are robust.
","tags":["Stata","Code","Specification Curve","Apps"]},{"location":"posts/specification-curve-analysis/#specurve-stata-command-for-specification-curve-analysis","title":"specurve
- Stata command for specification curve analysis","text":"I developed a Stata command specurve
for specification curve analysis. It is written in Stata Mata and has no external dependencies.4 The source code is available at GitHub.
Run the following command in Stata:
net install specurve, from(\"https://raw.githubusercontent.com/mgao6767/specurve/master\") replace\n
","tags":["Stata","Code","Specification Curve","Apps"]},{"location":"posts/specification-curve-analysis/#example-usage-output","title":"Example usage & output","text":"","tags":["Stata","Code","Specification Curve","Apps"]},{"location":"posts/specification-curve-analysis/#regressions-with-reghdfe","title":"Regressions with reghdfe
","text":". use \"http://www.stata-press.com/data/r13/nlswork.dta\", clear\n(National Longitudinal Survey. Young Women 14-26 years of age in 1968)\n\n. copy \"https://mingze-gao.com/specurve/example_config_nlswork_reghdfe.yml\" ., replace\n. specurve using example_config_nlswork_reghdfe.yml, saving(specurve_demo)\n
","tags":["Stata","Code","Specification Curve","Apps"]},{"location":"posts/specification-curve-analysis/#iv-regressions-with-ivreghdfe","title":"IV regressions with ivreghdfe
","text":". copy \"https://mingze-gao.com/specurve/example_config_nlswork_ivreghdfe.yml\" ., replace\n. specurve using example_config_nlswork_ivreghdfe.yml, cmd(ivreghdfe) rounding(0.01) title(\"IV regression with ivreghdfe\")\n
Check help specurve
in Stata for a step-by-step guide.
Estimation results are saved in the frame named \"specurve\".
Use frame change specurve
to check the results.
Use frame change default
to switch back to the original dataset.
Simonsohn, Uri and Simmons, Joseph P. and Nelson, Leif D., 2020, Specification Curve Analysis, Nature Human Behaviour.\u00a0\u21a9
Special thanks to Rawley Heimer from Boston College who visited our discipline in 2019 and introduced the Specification Curve Analysis to us in the seminar on research methods.\u00a0\u21a9
This plot was made using a previous version of specurve
.\u00a0\u21a9
Previous versions depend on Stata 16's Python integration.\u00a0\u21a9
Nowadays top journals favour more granular studies. Sometimes it's useful to dig into the raw SEC filings and perform textual analysis. This note documents how I download all historical SEC filings via EDGAR and conduct some textual analyses.
Tip
If you don't require highly customized textual analysis, you could try, for example, SeekEdgar.com.
","tags":["Textual Analysis","SEC","Python","Code"]},{"location":"posts/textual-analysis-on-sec-filings/#1-build-a-master-index-of-sec-filings","title":"1. Build a master index of SEC filings","text":"I use the python-edgar
package to download quarterly zipped index files to ./edgar-idx
.
$ mkdir ~/edgar && cd ~/edgar\n$ git clone https://github.com/edouardswiac/python-edgar.git\n$ python ./python-edgar/run.py -d ./edgar-idx\n
Then merge the downloaded tsv files into a master file using cat
.
$ cat ./edgar-idx/*.tsv > ./edgar-idx/master.tsv\n$ du -h ./edgar-idx/master.tsv\n
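To sanity-check the merge, one can also count the records that end up in the master file. Below is a minimal, self-contained sketch that mimics the `cat` step in Python on two made-up quarterly TSV rows (the directory, filenames, and rows here are invented for illustration):

```python
import glob
import os
import tempfile

# Two stand-in quarterly TSV files, as produced by python-edgar
root = tempfile.mkdtemp()
rows = ['1|A CORP|8-K|2020-01-01|a.txt|a.html',
        '2|B CORP|10-Q|2020-02-01|b.txt|b.html']
for i, row in enumerate(rows):
    with open(os.path.join(root, f'{i}.tsv'), 'w') as f:
        f.write(row + '\n')

# Concatenate them into master.tsv, like `cat ./edgar-idx/*.tsv`
with open(os.path.join(root, 'master.tsv'), 'w') as out:
    for path in sorted(glob.glob(os.path.join(root, '[0-9]*.tsv'))):
        with open(path) as f:
            out.write(f.read())

# Count records in the merged file
with open(os.path.join(root, 'master.tsv')) as f:
    n_records = sum(1 for _ in f)
print(n_records)  # -> 2
```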
The resulting master.tsv
is about 2.6 GB as of Feb 2020. I then use the following Python script to build a SQLite database for more efficient queries.
# Load index files in `edgar-idx` to a sqlite database.\nimport sqlite3\n\nEDGAR_BASE = \"https://www.sec.gov/Archives/\"\n\n\ndef parse(line):\n    # each line: \"cik|firm_name|file_type|date|url_txt|url_html\"\n    # an example:\n    # \"99780|TRINITY INDUSTRIES INC|8-K|2020-01-15|edgar/data/99780/0000099780-\\\n    # 20-000008.txt|edgar/data/99780/0000099780-20-000008-index.html\"\n    fields = line.split('|')[:5]\n    fields[-1] = EDGAR_BASE + fields[-1]  # turn the relative url into a full one\n    return tuple(fields)\n\n\nif __name__ == '__main__':\n    conn = sqlite3.connect(r\"edgar-idx.sqlite3\")\n    c = conn.cursor()\n    c.execute('''CREATE TABLE IF NOT EXISTS edgar_idx\n                 (cik TEXT, firm_name TEXT, file_type TEXT, date DATE, url TEXT,\n                  PRIMARY KEY(cik, file_type, date));''')\n    with open('./edgar-idx/master.tsv', 'r') as f:\n        data = [parse(line) for line in f]\n    c.executemany('INSERT OR IGNORE INTO edgar_idx \\\n                   (cik, firm_name, file_type, date, url) VALUES (?,?,?,?,?)', data)\n    conn.commit()\n    conn.close()\n
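With the database built, a quick query confirms the index is usable. A minimal sketch (it uses an in-memory database and one made-up row for illustration; point the connection at edgar-idx.sqlite3 to query the real index):

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # use "edgar-idx.sqlite3" for the real index
c = conn.cursor()
c.execute('''CREATE TABLE IF NOT EXISTS edgar_idx
             (cik TEXT, firm_name TEXT, file_type TEXT, date DATE, url TEXT,
              PRIMARY KEY(cik, file_type, date));''')
# A made-up row for illustration
c.execute('INSERT OR IGNORE INTO edgar_idx VALUES (?,?,?,?,?)',
          ('99780', 'TRINITY INDUSTRIES INC', '8-K', '2020-01-15',
           'https://www.sec.gov/Archives/edgar/data/99780/0000099780-20-000008.txt'))
# Count filings by type, e.g. before planning a bulk download
counts = c.execute(
    'SELECT file_type, COUNT(*) FROM edgar_idx GROUP BY file_type').fetchall()
print(counts)  # -> [('8-K', 1)]
conn.close()
```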
","tags":["Textual Analysis","SEC","Python","Code"]},{"location":"posts/textual-analysis-on-sec-filings/#2-download-filings-from-edgar","title":"2. Download filings from EDGAR","text":"I write the following script to download filings from EDGAR. Note that this script is only a skeleton. The full implementation has proper logging, speed control and detailed error handling. For example, you'll need to keep track of failures and re-download them later.
Warning
As per the SEC's policy, you should limit requests to below 10 per second. Hence, there is no need to use a proxy pool, such as Scylla
.
This example script downloads all 8-K filings to ./data/{cik}/{file_type}/{date}.txt.gz
.
Compression is highly recommended unless you have TBs of free disk space!
# Download all 8-K filings.\nimport concurrent.futures\nimport gzip\nimport os\nimport sqlite3\n\nimport requests\nimport tqdm\n\n\ndef download(job):\n    cik, _, file_type, date, url = job\n    try:\n        res = requests.get(url)\n        filename = f'./data/{cik}/{file_type}/{date}.txt.gz'\n        if res.status_code == 200:\n            with gzip.open(filename, 'wb') as f:\n                f.write(res.content)\n    except Exception:\n        pass  # the full implementation logs failures for later re-download\n\n\nif __name__ == \"__main__\":\n    # select what to download\n    conn = sqlite3.connect(r\"edgar-idx.sqlite3\")\n    c = conn.cursor()\n    c.execute('SELECT * FROM edgar_idx WHERE file_type=\"8-K\";')\n    jobs = c.fetchall()\n    conn.close()\n    # start downloading\n    progress = tqdm.tqdm(total=len(jobs))\n    futures = []\n    with concurrent.futures.ThreadPoolExecutor(max_workers=16) as exe:\n        for job in jobs:\n            cik, _, file_type, date, url = job\n            filename = f'./data/{cik}/{file_type}/{date}.txt.gz'\n            os.makedirs(os.path.dirname(filename), exist_ok=True)\n            if os.path.exists(filename):\n                progress.update()\n            else:\n                f = exe.submit(download, job)\n                # a done callback receives the future itself, which must not\n                # be passed on to tqdm.update()\n                f.add_done_callback(lambda _: progress.update())\n                futures.append(f)\n        for f in concurrent.futures.as_completed(futures):\n            pass\n
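Reading a downloaded filing back is symmetric, since gzip.open decompresses transparently. A minimal self-contained sketch (the directory and filing content are stand-ins; real files follow the ./data/{cik}/{file_type}/{date}.txt.gz layout):

```python
import gzip
import os
import tempfile

# A stand-in directory mirroring the ./data/{cik}/{file_type}/ layout
root = tempfile.mkdtemp()
path = os.path.join(root, '99780', '8-K', '2020-01-15.txt.gz')
os.makedirs(os.path.dirname(path), exist_ok=True)

# Write a stand-in filing so the sketch is self-contained
with gzip.open(path, 'wb') as f:
    f.write(b'<SEC-DOCUMENT>...</SEC-DOCUMENT>')

# Read it back; decompression happens on the fly
with gzip.open(path, 'rb') as f:
    text = f.read().decode()
print(text)  # -> <SEC-DOCUMENT>...</SEC-DOCUMENT>
```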
","tags":["Textual Analysis","SEC","Python","Code"]},{"location":"posts/textual-analysis-on-sec-filings/#3-example-textual-analyses","title":"3. Example textual analyses","text":"The downloaded txt files are the text versions of the filings' HTML, and they are generally well structured. Specifically, each filing is organized as follows:
<SEC-DOCUMENT>\n<SEC-HEADER></SEC-HEADER>\n<DOCUMENT>\n<TYPE>\n<SEQUENCE>\n<FILENAME>\n<DESCRIPTION>\n<TEXT>\n</TEXT>\n</DESCRIPTION>\n</FILENAME>\n</SEQUENCE>\n</TYPE>\n</DOCUMENT>\n<DOCUMENT></DOCUMENT>\n<DOCUMENT></DOCUMENT>\n ...\n</SEC-DOCUMENT>\n
Example <SEC-DOCUMENT>\n<SEC-HEADER></SEC-HEADER>\n<DOCUMENT>\n<TYPE>8-K\n <SEQUENCE>1\n <FILENAME>f13478e8vk.htm\n <DESCRIPTION>FORM 8-K\n <TEXT>\n ...\n </TEXT>\n</DESCRIPTION>\n</FILENAME>\n</SEQUENCE>\n</TYPE>\n</DOCUMENT>\n<DOCUMENT>\n<TYPE>EX-99.1\n <SEQUENCE>2\n <FILENAME>f13478exv99w1.htm\n <DESCRIPTION>EXHIBIT 99.1\n <TEXT>\n ...\n </TEXT>\n</DESCRIPTION>\n</FILENAME>\n</SEQUENCE>\n</TYPE>\n</DOCUMENT>\n<DOCUMENT></DOCUMENT>\n ...\n</SEC-DOCUMENT>\n
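Given that structure, the individual documents inside a filing can be pulled apart with a simple regex. A minimal sketch on a stand-in filing (illustrative only, not the header-based method used later):

```python
import re

# A stand-in filing following the structure sketched above
filing = ('<SEC-DOCUMENT>\n<SEC-HEADER></SEC-HEADER>\n'
          '<DOCUMENT>\n<TYPE>8-K\n<TEXT>...</TEXT>\n</DOCUMENT>\n'
          '<DOCUMENT>\n<TYPE>EX-99.1\n<TEXT>...</TEXT>\n</DOCUMENT>\n'
          '</SEC-DOCUMENT>')

# Grab each <DOCUMENT> block, then its <TYPE> line
docs = re.findall(r'<DOCUMENT>.*?</DOCUMENT>', filing, re.DOTALL)
types = [re.search(r'<TYPE>([^\n]*)', d).group(1) for d in docs]
print(types)  # -> ['8-K', 'EX-99.1']
```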
","tags":["Textual Analysis","SEC","Python","Code"]},{"location":"posts/textual-analysis-on-sec-filings/#31-extract-all-items-reported-in-8-k-filings-since-2004","title":"3.1 Extract all items reported in 8-K filings since 2004","text":"Since 2004, the SEC has required companies to file Form 8-K within four business days of many types of events. For a short description, see the SEC's fast answer to Form 8-K. The detailed instructions (PDF) are available here.
There are several ways to extract all items reported in each filing since 2004. First, I can use a regular expression to extract all \"Item X.XX\"
from the 8-K <DOCUMENT>
. Or, I can take advantage of the information in <SEC-HEADER>
. Below is an example <SEC-HEADER>
1, in which the lines of ITEM INFORMATION
actually describe the items reported in the filing.
<SEC-HEADER>0000079732-02-000036.hdr.sgml : 20020802\n<ACCEPTANCE-DATETIME>20020802082752\nACCESSION NUMBER: 0000079732-02-000036\nCONFORMED SUBMISSION TYPE: 8-K\nPUBLIC DOCUMENT COUNT: 4\nCONFORMED PERIOD OF REPORT: 20020801\nITEM INFORMATION: Changes in control of registrant\nITEM INFORMATION: Financial statements and exhibits\nFILED AS OF DATE: 20020802\n\nFILER:\n\n COMPANY DATA: \n COMPANY CONFORMED NAME: ATLANTIC CITY ELECTRIC CO\n CENTRAL INDEX KEY: 0000008192\n STANDARD INDUSTRIAL CLASSIFICATION: ELECTRIC SERVICES [4911]\n IRS NUMBER: 210398280\n STATE OF INCORPORATION: NJ\n FISCAL YEAR END: 1231\n\n FILING VALUES:\n FORM TYPE: 8-K\n SEC ACT: 1934 Act\n SEC FILE NUMBER: 001-03559\n FILM NUMBER: 02717802\n\n BUSINESS ADDRESS: \n STREET 1: 800 KING STREET\n STREET 2: PO BOX 231\n CITY: WILMINGTON\n STATE: DE\n ZIP: 19899\n BUSINESS PHONE: 6096454100\n\n MAIL ADDRESS: \n STREET 1: 800 KING STREET\n STREET 2: PO BOX 231\n CITY: WILMINGTON\n STATE: DE\n ZIP: 19899\n</SEC-HEADER>\n
Following this strategy, I write the code below to extract all items reported in 8-K filings since 2004. I didn't use regex for this task because the text portion of the filing is actually quite dirty. For instance, you'll need to remove all HTML tags, and be careful about the \"non-breaking space\",
, etc. My experience is that using <SEC-HEADER>
for this task is the best.
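For a sense of what the regex route has to deal with, below is a minimal sketch on a made-up snippet: HTML tags and entities such as the non-breaking space must be stripped before item numbers can be matched (the tag/entity pattern mirrors the cleanr pattern used in section 3.3):

```python
import re

# Remove HTML tags and entities (e.g. the non-breaking space &nbsp;)
cleanr = re.compile(r'<.*?>|&([a-z0-9]+|#[0-9]{1,6}|#x[0-9a-f]{1,6});')

# A made-up fragment of a filing's text portion
raw = 'Item&nbsp;1.01 <b>Entry</b> into a Material Definitive Agreement'
clean = re.sub(cleanr, ' ', raw)
items = re.findall(r'Item\s*\d\.\d\d', clean)
print(items)  # -> ['Item 1.01']
```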
# Extract all items reported in 8-K filings since 2004.\nimport concurrent.futures\nimport gzip\nimport os\nimport sqlite3\n\nimport tqdm\n\nBASE_DIR = './data'\nFILE_TYPE = '8-K'\nDB = \"result.sqlite3\"\n\n\ndef walk_dirpath(cik, file_type):\n    \"\"\"Yield paths of all files for a given cik and file type\"\"\"\n    for root, _, files in os.walk(os.path.join(BASE_DIR, cik, file_type)):\n        for filename in files:\n            yield os.path.join(root, filename)\n\n\ndef regsearch(cik):\n    matches = []\n    for filepath in walk_dirpath(cik, FILE_TYPE):\n        # filenames look like \"2020-01-15.txt.gz\"; slice off the suffix\n        # (str.strip removes characters, not a suffix)\n        date = os.path.split(filepath)[1][:-len('.txt.gz')]\n        if int(date.split('-')[0]) < 2004:\n            continue\n        with gzip.open(filepath, 'rb') as f:\n            data = f.readlines()\n        ls = [l for l in data if l.startswith(b'ITEM INFORMATION')]\n        for l in ls:\n            item = l.decode().replace('\t', '').replace('ITEM INFORMATION:', '')\n            if len(item.strip()):\n                matches.append((cik, FILE_TYPE, date, item.strip()))\n    return matches\n\n\nif __name__ == \"__main__\":\n    conn = sqlite3.connect(DB)\n    c = conn.cursor()\n    c.execute('''CREATE TABLE IF NOT EXISTS files_all_items\n                 (cik TEXT, file_type TEXT, date DATE, item TEXT,\n                  PRIMARY KEY(cik, file_type, date, item));''')\n    conn.commit()\n    _, ciks, _ = next(os.walk(BASE_DIR))\n    progress = tqdm.tqdm(total=len(ciks))\n    with concurrent.futures.ProcessPoolExecutor(max_workers=16) as exe:\n        futures = [exe.submit(regsearch, cik) for cik in ciks]\n        for f in concurrent.futures.as_completed(futures):\n            res = f.result()\n            c.executemany(\n                \"INSERT OR IGNORE INTO files_all_items \\\n                 (cik, file_type, date, item) VALUES (?,?,?,?)\", res)\n            conn.commit()\n            progress.update()\n    conn.close()\n
","tags":["Textual Analysis","SEC","Python","Code"]},{"location":"posts/textual-analysis-on-sec-filings/#32-find-all-8-k-filings-with-item-101-andor-item-203","title":"3.2 Find all 8-K filings with Item 1.01 and/or Item 2.03","text":"To get those filings that have either Item 1.01 (\"Entry into a Material Definitive Agreement\") or Item 2.03 (\"Creation of a Direct Financial Obligation\"),
I run the following SQL query:
-- SQLite\nCREATE TABLE `files_with_items_101_or_203` AS\nSELECT DISTINCT cik, file_type, date\nFROM `files_all_items`\nWHERE\ninstr(lower(item), \"creation of a direct financial obligation\") > 0 OR\ninstr(lower(item), \"entry into a material definitive agreement\") > 0\nORDER BY cik, file_type, date;\n
To get those with both items, use the following query:
-- SQLite\nCREATE TABLE `files_with_items_101_and_203` AS\nSELECT cik, file_type, date\nFROM `files_all_items`\nWHERE\ninstr(lower(item), \"creation of a direct financial obligation\") > 0 OR\ninstr(lower(item), \"entry into a material definitive agreement\") > 0\nGROUP BY cik, file_type, date\nHAVING count(*) > 1\nORDER BY cik, file_type, date;\n
","tags":["Textual Analysis","SEC","Python","Code"]},{"location":"posts/textual-analysis-on-sec-filings/#33-nini-smith-and-sufi-2009","title":"3.3 Nini, Smith and Sufi (2009)","text":"This example code finds the appearance of any of the 10 search words used in \"Creditor control rights and firm investment policy\" by Nini, Smith and Sufi (JFE 2009), which are used to identify loan contracts attached to SEC filings.
import concurrent.futures\nimport gzip\nimport logging\nimport os\nimport re\nimport sqlite3\nimport sys\n\nimport tqdm\n\nlogging.basicConfig(stream=sys.stdout, level=logging.WARN)\n\nBASE_DIR = './data'\nFILE_TYPE = '10-Q'\nDB = \"result.sqlite3\"\n\n# Regex pattern used to remove html tags and entities\ncleanr = re.compile(b'<.*?>|&([a-z0-9]+|#[0-9]{1,6}|#x[0-9a-f]{1,6});')\n\n# The 10 search words used in \"Creditor control rights and firm investment\n# policy\" by Nini, Smith and Sufi (JFE 2009). An equivalent single regex:\n# pat_10_words = r\"CREDIT FACILITY|REVOLVING CREDIT|(CREDIT|LOAN|(LOAN (AND|&) \\\n# SECURITY)|(FINANCING (AND|&) SECURITY)|CREDIT (AND|&) GUARANTEE) AGREEMENT\"\nNSS_10_words = ['credit facility',\n                'revolving credit',\n                'credit agreement',\n                'loan agreement',\n                'loan and security agreement',\n                'loan & security agreement',\n                'credit and guarantee agreement',\n                'credit & guarantee agreement',\n                'financing and security agreement',\n                'financing & security agreement']\nNSS_10_words_str = '|'.join([word.upper() for word in NSS_10_words])\npat_10_words = re.compile(NSS_10_words_str.encode())\n\n# Regex pattern used in this search\npattern = pat_10_words\n\n\ndef walk_dirpath(cik, file_type):\n    \"\"\"Yield paths of all files for a given cik and file type\"\"\"\n    for root, _, files in os.walk(os.path.join(BASE_DIR, cik, file_type)):\n        for filename in files:\n            yield os.path.join(root, filename)\n\n\ndef regsearch(cik):\n    matches = []\n    for filepath in walk_dirpath(cik, FILE_TYPE):\n        # slice off the \".txt.gz\" suffix to recover the filing date\n        date = os.path.split(filepath)[1][:-len('.txt.gz')]\n        try:\n            with gzip.open(filepath, 'rb') as f:\n                data = b' '.join(f.read().splitlines())\n            data = re.sub(cleanr, b'', data)\n            match = pattern.search(data)\n            if match:\n                matches.append((cik, FILE_TYPE, date))\n                logging.info(f'{filepath}, {match.group()}')\n        except Exception as e:\n            logging.error(f'failed at {filepath}, {e}')\n    return matches\n\n\nif __name__ == \"__main__\":\n    conn = sqlite3.connect(DB)\n    c = conn.cursor()\n    # create a table to store the indices of matched filings\n    c.execute('''CREATE TABLE IF NOT EXISTS files_with_10_words\n                 (cik TEXT, file_type TEXT, date DATE,\n                  PRIMARY KEY(cik, file_type, date));''')\n    conn.commit()\n    _, ciks, _ = next(os.walk(BASE_DIR))\n    progress = tqdm.tqdm(total=len(ciks))\n    with concurrent.futures.ProcessPoolExecutor(max_workers=16) as exe:\n        futures = [exe.submit(regsearch, cik) for cik in ciks]\n        for f in concurrent.futures.as_completed(futures):\n            matches = f.result()\n            c.executemany(\n                \"INSERT OR IGNORE INTO files_with_10_words \\\n                 (cik, file_type, date) VALUES (?,?,?)\", matches)\n            conn.commit()\n            progress.update()\n    conn.close()\n    logging.info('complete')\n
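The matching logic at the core of this script can be exercised in isolation. A minimal sketch applying the same ten search phrases to a made-up fragment:

```python
import re

# The ten search phrases from Nini, Smith and Sufi (JFE 2009), upper-cased
NSS_10_words = ['credit facility', 'revolving credit', 'credit agreement',
                'loan agreement', 'loan and security agreement',
                'loan & security agreement', 'credit and guarantee agreement',
                'credit & guarantee agreement',
                'financing and security agreement',
                'financing & security agreement']
pattern = re.compile('|'.join(w.upper() for w in NSS_10_words).encode())

# A made-up fragment of a cleaned 10-Q
data = b'... THE COMPANY ENTERED INTO A REVOLVING CREDIT AGREEMENT DATED ...'
match = pattern.search(data)
print(match.group())  # -> b'REVOLVING CREDIT'
```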
The original file is at https://www.sec.gov/Archives/edgar/data/0000008192/0000079732-02-000036.txt\u00a0\u21a9
An uninitialized variable in C can be anything (most of the time). However, I find that in some cases we can know the value of an uninitialized variable and thus maybe exploit it.
The example code below compiled with gcc
, without optimization, exits successfully. Very interesting!
#include <assert.h>\n#include <limits.h>\n#include <stdlib.h>\n\nvoid f(int n) {\n    // Declare and init `a` with the value of n.\n    // This pushes n onto the stack memory.\n    int a = n;\n    return;\n    // f() returns,\n    // but `a` leaves on the stack a garbage value n\n}\n\nvoid g(int n) {\n    // Declare but do not initialize `x`, so\n    // `x` may be anything...?\n    int x;\n    assert(x == n); // Should fail here if `x` is not n\n}\n\nint main() {\n    for (int i = INT_MIN; i < INT_MAX; i++) {\n        f(i); // However, if we call f() and g() sequentially...\n        g(i); // the local variable `x` in g() will always be i,\n              // which is the garbage value left by f() on the stack.\n    }\n    // This program will end peacefully\n    return 0;\n}\n
We can also try to \"contaminate\" the stack by filling it with a value, e.g.,
#include <assert.h>\n#include <memory.h>\n#include <stdint.h>\n#include <stdio.h>\n\nvoid f(uint8_t n) {\n    // Try to \"contaminate\" the stack with value n\n    uint8_t arr[BUFSIZ];\n    memset(arr, n, BUFSIZ * sizeof(uint8_t));\n}\n\nvoid g(uint8_t n) {\n    uint8_t x;\n    assert(x == n);\n    printf(\"uninitialized x is %d\\n\", x);\n    uint8_t y;\n    assert(y == n); // uninitialized y is also n\n}\n\nint main() {\n    for (uint8_t i = 0; i < UINT8_MAX; i++) {\n        f(i);\n        g(i);\n    }\n    // This program will end peacefully!\n    return 0;\n}\n
As a result, the uninitialized local variables x
and y
both have the same value of n
because f(n)
writes many n
on the stack.
Studying C is real fun!
","tags":["C","Code"]},{"location":"posts/use-sas-macros-on-wrds/","title":"Use SAS Macros on WRDS","text":"The Wharton Research Data Services (WRDS) provides quite a few handy SAS macros that can be used directly. This article explains how to use those macros on WRDS when you use remote submission to run your code on the WRDS cloud. Lastly, it explains how to load and use third-party SAS macros from a URL.
","tags":["SAS","Code","WRDS"]},{"location":"posts/use-sas-macros-on-wrds/#prerequisite","title":"Prerequisite","text":"Before everything, just make sure that this autoexec.sas
is located in the home folder on your WRDS cloud.
* The library name definitions below are used by SAS;\n* Assign default libref for WRDS (Wharton Research Data Services);\n%include '/wrds/lib/utility/wrdslib.sas';\noptions sasautos=('/wrds/wrdsmacros/', SASAUTOS) MAUTOSOURCE;\n
This code runs automatically when you've connected to the WRDS cloud. The first line assigns the default library references for you to use, e.g. comp
for Compustat. The second line makes the macros available. A list of these handy macros can be found in the WRDS documentation.
If you don't have this SAS code in the home folder, simply create one there or you can choose to include these two lines of code in your remotely submitted code.
","tags":["SAS","Code","WRDS"]},{"location":"posts/use-sas-macros-on-wrds/#simple-usage","title":"Simple usage","text":"Let's say we want to winsorize a dataset by using the macro provided by WRDS (full code). Below is an example of winsorizing Total Assets AT
of the Compustat sample by fiscal year from 1980 to 2018.
%let wrds=wrds-cloud.wharton.upenn.edu 4016;\noptions comamid=TCP remote=WRDS;\nsignon username=_prompt_;\n\nrsubmit;\n\n/* Create a dataset in the work directory */\ndata work.funda(keep=gvkey fyear at);\n    set comp.funda;\n    if 1980 <= fyear <= 2018;\n    /* Generic filter */\n    if indfmt='INDL' and datafmt='STD' and popsrc='D' and consol='C';\nrun;\n\n/* Invoke the macro */\n/* The documentation is available at:\n   https://wrds-www.wharton.upenn.edu/pages/support/research-wrds/macros/wrds-macros-winsorize/ */\n%WINSORIZE(INSET=funda,OUTSET=funda_w,SORTVAR=fyear,VARS=at,PERC1=1,TRIM=0);\n\n/* Before the winsorization */\nproc means data=work.funda;\n    by fyear;\n    var at;\n    output out=funda_before_win min= mean= max= / autoname;\nrun;\n\n/* After the winsorization */\nproc means data=work.funda_w;\n    by fyear;\n    var at;\n    output out=funda_after_win min= mean= max= / autoname;\nrun;\n\nproc print data=funda_before_win;\nproc print data=funda_after_win;\nrun;\n\nendrsubmit;\nsignoff;\n
Invoking the macro is as simple as a single line:
%WINSORIZE(INSET=funda,OUTSET=funda_w,SORTVAR=fyear,VARS=at,PERC1=1,TRIM=0);\n
However, one thing to note about this particular winsorization macro by WRDS is that a variable named a
is used in lines 57 and 59. So if the INSET
has a variable named a
as well, there could be a data integrity issue. Hence, I prefer to use another version, described in my other post Winsorization in SAS.
I tend to collect and store all useful macros on my personal server, so I don't need to worry about the macros being lost or changed. To use these macros, simply include them before invoking.
filename winsor url \"https://mingze-gao.com/utils/winsor.sas\";\n%include winsor;\n
Then, I can simply call winsor
as below.
%let winsVars = tac inv_at_l drev drevadj ppe roa;\n%winsor(dsetin=work.funda, dsetout=work.funda_wins, byvar=fyear, vars=&winsVars, type=winsor, pctl=1 99);\n
","tags":["SAS","Code","WRDS"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/","title":"What it takes to be a CEO? A fun survey of literature","text":"Taking up the position of CEO means more than pressure from the board and investors. You\u2019ll also face heavy scrutiny from academia. Whether or not a firm\u2019s hiring and compensation committees use them as a reference, here are some of the findings that you may want to be aware of.
","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#upon-birth","title":"Upon birth","text":"There are many things determined when you\u2019re born. It\u2019ll be naive to think that they matter less than anything else. A starter example is the Journal of Financial Economics paper \"Are CEOs born leaders? Lessons from traits of a million individuals\" by Adams, Keloharju and Knupfer (2018).
","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#1-birthday-month","title":"1. Birthday (month)","text":"Birth month affects school entry, which affects whether you are relatively older in the class. If you are born after the cutoff month, you'll have to wait another year for entry. But this extra year buys you some more time to develop, which makes you more confident than your younger peers. This increased confidence is linked to adult labor market outcomes. Bai, Ma, Mullally and Solomon (2019 JFE) find that in the mutual fund industry, it's associated with better stock selection and fund performance. These relatively older fund managers also appear more confident in photographs and display more confident behaviours: making larger bets, window dressing their holdings less, and so on.
","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#2-birth-order","title":"2. Birth order","text":"The birth order also matters: negative associations between birth order and intelligence level have been found in numerous studies. More frankly, first-born kids tend to have higher IQ scores. Kristensen and Bjerkedal (2007 Science) show that this depends more on social rank in the family, where first-borns receive more-favorable family interaction and stimulation. However, birth order is still the most prominent observable factor. Custodio and Siegel (2018) published a working paper where they find CEOs are more likely to be the first-born child of their family, and the results hold for both family and non-family firms, though thankfully the advantage of being first-born seems to decay over time.
","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#3-gender","title":"3. Gender","text":"Studies on CEO gender differences and their relation with firm risk-taking, capital allocation, accounting conservatism, corporate social responsibility, and so on are plentiful. Generally, it is shown that male executives are overconfident relative to female executives (Huang and Kisgen, 2013 JFE), and we know that overconfidence is not necessarily a good thing. Firms run by female CEOs use lower leverage and have less volatile earnings (Faccio, Marchica and Mura 2016 JCF), and there are a lot more differences in terms of firm operational, financial, and M&A performances. Tate and Yang (2015 JFE) show that female leaders cultivate more female-friendly cultures inside their firms. Moreover, (having) a female director may bring a firm more access to external finance. Goldman Sachs announced on 23 January 2020 that they won't take companies public anymore unless they have at least one \"diverse\" board member.
","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#4-hometown","title":"4. Hometown","text":"Everyone has some sort of hometown biases as well as hometown advantages. For example, Jiang, Qian and Yonker (2019 JFQA) find that CEOs are over twice as likely to acquire targets located in the states of their childhood homes than similar targets elsewhere. Smaller such deals on average destroy shareholder value, but bigger ones tend to be value-enhancing. They conclude that CEOs may seek private benefits when acquiring small targets in their hometown but can also avoid poor deals due to hometown advantages. In a Chinese study, Kong, Pan, Tian and Zhang (2020 JCF) show that CEOs' hometown connections increase access to trade credit, and this effect is more pronounced for non-SOEs and firms in poor regions. In another Chinese study on commercial banks, Bian, Ji and Zhang (2019 JBF) find that a higher degree of dialect similarity between the chairman and the CEO is associated with a higher ROA and ROE and a lower cost-to-income ratio, but not with bank risk, CEO pay or pay-performance sensitivity. They conclude that speaking a similar dialect with the chairman doesn't undermine monitoring and reduces agency costs.
","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#5-cultural-heritage","title":"5. Cultural heritage","text":"The place where you're born has even more profound implications through cultural heritage. Nguyen, Hagendorff and Eshraghi (2018 RFS) show that following shocks to industry competition, firms led by CEOs who are second- or third-generation immigrants are associated with a 6.2% higher profitability compared with the average firm. Their analysis attributes this effect to various cultural values that prevail in a CEO\u2019s ancestral country of origin. Through an epidemiological approach, Liu (2016 JFE) shows that a corruption culture of corporate insiders' country of ancestry is associated with a higher likelihood of earnings management, accounting fraud, option backdating and opportunistic insider trading.
","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#early-in-life","title":"Early in life","text":"Many early life experiences are closely linked to natural and family endowment, yet others may be random and exogenous. Either way, early life experience is something that will have an impact on CEO behaviours later on.
","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#1-education","title":"1. Education","text":"No doubt education matters for everyone, including CEOs. Custodio and Metzger (2014 JFE) find that financial expert CEOs tend to be hired by more mature firms. Firms with financial expert CEOs hold less cash, more debt and engage in more share repurchases. They are able to raise external funds even when credit conditions are tight and their investments are less sensitive to cash flows. On the other hand, CEOs with an engineering (or scientific) education display higher investment-cash flow sensitivity (Malmendier and Tate (2005 JFE)). Similar findings appear in the banking sector, as shown by King, Srivastav and Williams (2016 JCF) focusing on CEO's MBA quality. Moreover, education offers more than just knowledge and skills. Although Khanna, Kim and Lu (2015 JF) do not find evidence that connections and network ties developed during education are associated with corporate fraud, such CEO connectedness certainly affects information sharing, investments and so on. Wang and Yin (2018 JCF) find that CEOs tend to initiate more, larger and better M&A deals where target firms are headquartered in those states where they received their undergraduate and graduate degrees.
","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#2-disaster-experience","title":"2. Disaster experience","text":"People are shaped by their experiences, and disasters are a major one. Several Chinese studies have shown that CEOs who have experienced famine are more risk-averse and hold more cash. They conduct fewer takeovers, but the M&A deals tend to perform better when they do, according to Zhang (2017 PBFJ). Such risk aversion can sometimes be good, as Hu, Li and Luo (2019 PBFJ) find that firms governed by CEOs who experienced the great famine have higher market value during crises. But generally speaking, this effect is mitigated by a higher education background and is weaker in SOEs, as well as for CEOs who also experienced economic reform, which is shown to increase CEO risk tolerance by Hao, Wang, Chou and Ko (2018 IRF). American CEOs, for sure, are no exception. A famous Journal of Finance paper, \"what doesn't kill you will only make you more risk-loving\" by Bernile, Bhagwat and Rau (2016), concludes like its title. But more importantly, CEOs who experienced fatal disasters without extremely negative consequences lead firms more aggressively, whereas CEOs who witness the extreme downside of disasters behave more conservatively.
","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#3-academic-military-and-other-experiences","title":"3. Academic, military and other experiences","text":"Apart from previous industry experience, researchers have also examined the role of many other executive experiences. Shen, Lan, Xiong, Lv and Jian (2019 Economic Modelling) find that a top management team's academic experience promotes corporate innovation, and attribute the effect to improved internal controls and reduced information asymmetry. Benmelech and Frydman (2015 JFE) find that military service could make CEOs pursue lower corporate investment, and military CEOs are less likely to be involved in corporate fraudulent activity, performing better during industry downturns.
","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#personality-traits","title":"Personality traits","text":"","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#1-masculinity","title":"1. Masculinity","text":"Masculinity is a long-studied factor in many fields of research, and there are also many interesting papers specifically on male CEOs. Since CEO testosterone levels cannot be tested directly, a common proxy in the literature is the facial width-to-height ratio (fWHR). Jia, Van Lent and Zeng (2014 JAR) find that a higher fWHR of a male CEO, representing a more masculine face, is associated with more misreporting and predicts his firm's likelihood of being subject to SEC enforcement action as well as the incidence of insider trading and option backdating. They also find that an executive's facial masculinity is associated with the likelihood of being named as a perpetrator by the SEC. In a forthcoming European Financial Management paper by Kamiya, Kim and Park (2018), male CEOs' facial masculinity is found to be related to higher stock return volatility, higher financial leverage and more M&A activities. A paper at the 2018 Academy of Management Annual Meeting by Joshi, Misangyi, Rizzi and Neely (2018), however, finds that masculinity does not have a direct effect on the firm's operational performance. The researchers also find that masculinity worked to the detriment of CEOs in female-dominated industries; less masculine CEOs also performed poorly in highly male-dominated environments.
","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#2-sensation-seeking-corruption-and-frugality","title":"2. Sensation-seeking, corruption and frugality","text":"In the \"desperate\" search for proxies and signals of CEO/manager quality and traits, studies have turned to some really interesting areas, such as the cars they drive and whether they can fly airplanes. Brown, Lu, Ray and Teo (2018 JF) show that sensation-seeking hedge fund managers who own powerful sports cars take on more investment risks but do not deliver higher returns. \"Red Ferrari syndrome\", as Business Insider described it in February 2016. Unfortunately, some investors themselves are susceptible to sensation seeking and hence fuel the demand for such managers. Mironov (2015 JFE) has an interesting study and finds that if you can get out of a traffic violation through a bribe, as a manager, you may deliver some outperformance through, for instance, tax evasion, because corruption sometimes promotes efficiency.
Sunder, Sunder and Zhang (2017 JFE) look at pilot CEOs who fly airplanes as a hobby and find that they are significantly associated with better corporate innovation outcomes. They conclude that sensation seeking combines risk taking with a desire to pursue novel experiences and has been associated with creativity. Davidson, Dey and Smith (2015 JFE) even hired private investigators to collect data on executives' legal infractions and ownership of real estate, boats, luxury vehicles and motorcycles. They find no direct evidence of a relation between executives' frugality and the propensity to perpetrate fraud. But during unfrugal CEOs' reigns, there tends to be a relatively loose control environment, characterized by relatively high and increasing probabilities of other insiders perpetrating fraud and of unintentional material reporting errors.
","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#3-creativity-and-innovation","title":"3. Creativity and innovation","text":"One in five U.S. high-technology firms are led by CEOs with hands-on innovation experience as inventors. Islam and Zein (2020 JFE) show that firms led by \u201cInventor CEOs\u201d are associated with higher quality innovation, especially when the CEO is a high-impact inventor. During an inventor CEO's tenure, firms file a greater number of patents and more valuable patents in technology classes where the CEO's hands-on experience lies. It is possible that such inventor CEOs are more capable of evaluating, selecting and executing innovative investment projects.
","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#family-marriage-and-fidelity","title":"Family, marriage and fidelity","text":"","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#1-newborns-and-loss-of-family-members","title":"1. Newborns and loss of family members","text":"\"Corporate executives managing some of the largest public companies in the U.S. are shaped by their daughters\". Cronqvist and Yu (2017 JFE) find that when a firm\u2019s CEO has a daughter, the corporate social responsibility rating (CSR) is about 9.1% higher, compared to a median firm. This finding perhaps reveals another plausibly exogenous determinant of CEO's styles. On the other hand, a loss of important family member poses a significant negative shock. In the 2020 AFA Annual Meeting, I encountered a paper by Liu, Shu, Sulaeman and Yeung (2019) who find that after deaths in the family, bereaved managers take significantly less risk. Firms managed by bereaved CEOs exhibit lower capital expenditures, fewer acquisitions, lower debt issuance and lower CEO ownership after the bereavement events.
","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#2-marriage-divorce-and-infidelity","title":"2. Marriage, divorce and (in)fidelity","text":"While previous studies focused on the cultural background of the CEOs themselves, another paper I encountered in AFA Annual Meeting by Antoniou, Cuculiza, Kumar and Yang (2019) incorporates CEO spouses into the research. They show that the high uncertainty avoidance of CEO spouses will influence CEOs\u2019 personal uncertainty avoidance, and then lead to less corporate risk-taking. Larcker, McCall and Tayan (2013) show that CEO's divorce is impactful because it causes loss of control due to sale of stocks for divorce settlement, affects productivity, and attitude towards defaults.
One final interesting paper I want to mention to conclude this post is \"The Geography of Financial Misconduct\" by Parsons, Sulaeman and Titman (2018 JF). In 2015, the website Ashley Madison, whose target clients are married people seeking an extramarital affair, was hacked, leaking names, addresses and billing information for some 40 million user accounts. The researchers use the data to measure the intensity of spousal infidelity in a local area and find that financial misconduct is strongly related to unfaithfulness in the city.
","tags":["Literature","CEO"]},{"location":"posts/what-it-takes-to-be-a-ceo-a-fun-survey-of-literature/#final-note","title":"Final note","text":"This short survey of CEO literature is not meant to be comprehensive, but to list a few very interesting papers that I find fun to read. I guess the message is that being a CEO means a lot more than managing the firm and stakeholders, and shareholders also need to open their minds and eyes.
A funny takeaway: next time you hire a CEO, other things equal, maybe you'll want a female immigrant who is the first-born kid, was born in August, attended certain schools in certain areas, experienced natural disasters, served in the military, has a daughter and no sports cars, knows how to fly airplanes, is loyal to her spouse from certain countries, and all of whose family members are alive and well...
","tags":["Literature","CEO"]},{"location":"posts/winsorization-in-sas/","title":"Winsorization in SAS","text":"These are two versions of winsorization in SAS, of which I recommend the first one.
","tags":["SAS","Code","WRDS"]},{"location":"posts/winsorization-in-sas/#version-1-unknown-author","title":"Version 1 (Unknown Author)","text":"/*****************************************\nAuthor unknown - that is a pity because this macro is the best since sliced bread! \nTrim or winsorize macro\n* byvar = none for no byvar;\n* type = delete/winsor (delete will trim, winsor will winsorize;\n*dsetin = dataset to winsorize/trim;\n*dsetout = dataset to output with winsorized/trimmed values;\n*byvar = subsetting variables to winsorize/trim on;\nSample usage:\n%winsor(dsetin=work.myDsetIn, byvar=fyear, \n dsetout=work.myDsOut, vars=btm roa roe, type=winsor, pctl=1 99);\n****************************************/\n%macro winsor(dsetin=, dsetout=, byvar=none, vars=, type=winsor, pctl=1 99);\n\n%if &dsetout = %then %let dsetout = &dsetin;\n\n%let varL=;\n%let varH=;\n%let xn=1;\n\n%do %until ( %scan(&vars,&xn)= );\n %let token = %scan(&vars,&xn);\n %let varL = &varL &token.L;\n %let varH = &varH &token.H;\n %let xn=%EVAL(&xn + 1);\n%end;\n\n%let xn=%eval(&xn-1);\ndata xtemp;\n set &dsetin;\n run;\n%if &byvar = none %then %do;\n data xtemp;\n set xtemp;\n xbyvar = 1;\n run;\n%let byvar = xbyvar;\n\n%end;\nproc sort data = xtemp;\n by &byvar;\n run;\nproc univariate data = xtemp noprint;\n by &byvar;\n var &vars;\n output out = xtemp_pctl PCTLPTS = &pctl PCTLPRE = &vars PCTLNAME = L H;\n run;\ndata &dsetout;\n merge xtemp xtemp_pctl;\n by &byvar;\n array trimvars{&xn} &vars;\n array trimvarl{&xn} &varL;\n array trimvarh{&xn} &varH;\n\ndo xi = 1 to dim(trimvars);\n\n%if &type = winsor %then %do;\n if not missing(trimvars{xi}) then do;\n if (trimvars{xi} < trimvarl{xi}) then trimvars{xi} = trimvarl{xi};\n if (trimvars{xi} > trimvarh{xi}) then trimvars{xi} = trimvarh{xi};\n end;\n %end;\n\n%else %do;\n if not missing(trimvars{xi}) then do;\n if (trimvars{xi} < trimvarl{xi}) then delete;\n if (trimvars{xi} > trimvarh{xi}) then delete;\n end;\n %end;\n\nend;\n drop &varL &varH 
xbyvar xi;\n run;\n%mend winsor;\n
","tags":["SAS","Code","WRDS"]},{"location":"posts/winsorization-in-sas/#version-2-wrds","title":"Version 2 (WRDS)","text":"A potential problem with this WRDS macro is that a variable named a
is used in lines 57 and 59 (highlighted below). So if the INSET
has a variable named a
as well, there\u2019ll be a possible data integrity issue.
WINSORIZE
macro /* ********************************************************************************* */\n/* ******************** W R D S R E S E A R C H M A C R O S ******************** */\n/* ********************************************************************************* */\n/* WRDS Macro: WINSORIZE */\n/* Summary : Winsorizes or Trims Outliers */\n/* Date : April 14, 2009 */\n/* Author : Rabih Moussawi, WRDS */\n/* Variables : - INSET and OUTSET are input and output datasets */\n/* - SORTVAR: sort variable used in ranking */\n/* - VARS: variables to trim and winsorize */\n/* - PERC1: trimming and winsorization percent, each tail (default=1%) */\n/* - TRIM: trimming=1/winsorization=0, default=0 */\n/* ********************************************************************************* */\n%MACRO WINSORIZE (INSET=,OUTSET=,SORTVAR=,VARS=,PERC1=1,TRIM=0);\n\n/* List of all variables */\n%let vars = %sysfunc(compbl(&vars));\n%let nvars = %nwords(&vars);\n\n/* Display Output */\n%put ### START.;\n/* Trimming / Winsorization Options */\n%if &trim=0 %then %put ### Winsorization; %else %put ### Trimming;\n%put ### Number of Variables: &nvars;\n%put ### List of Variables: &vars;\noptions nonotes;\n\n/* Ranking within &sortvar levels */\n%put ### Sorting... ;\nproc sort data=&inset; by &sortvar; run;\n/* 2-tail winsorization/trimming */\n%let perc2 = %eval(100-&perc1);\n\n%let var2 = %sysfunc(tranwrd(&vars,%str( ),%str(__ )))__;\n%let var_p1 = %sysfunc(tranwrd(&vars,%str( ),%str(__&perc1 )))__&perc1 ;\n%let var_p2 = %sysfunc(tranwrd(&vars,%str( ),%str(__&perc2 )))__&perc2 ;\n\n/* Calculate upper and lower percentiles */\nproc univariate data=&inset noprint;\nby &sortvar;\nvar &vars;\noutput out=_perc pctlpts=&perc1 &perc2 pctlpre=&var2;\nrun;\n%if &trim=1 %then\n%let condition = %str(if myvars(i)>=perct2(i) or myvars(i)<=perct1(i) then myvars(i)=. 
);\n%else %let condition = %str(myvars(i)=min(perct2(i),max(perct1(i),myvars(i))) );\n\n%if &trim=0 %then %put ### Winsorizing at &perc1.%... ;\n%else %put ### Trimming at &perc1.%... ;\n\n/* Save output with trimmed/winsorized variables */\ndata &outset;\nmerge &inset (in=a) _perc;\nby &sortvar;\nif a;\narray myvars {&nvars} &vars;\narray perct1 {&nvars} &var_p1;\narray perct2 {&nvars} &var_p2;\ndo i = 1 to &nvars;\n if not missing(myvars(i)) then\ndo;\n &condition;\n end;\nend;\ndrop i &var_p1 &var_p2;\nrun;\n/* House Cleaning */\nproc sql; drop table _perc; quit;\noptions notes;\n\n%put ### DONE . ; %put ;\n%MEND WINSORIZE;\n\n/* ********************************************************************************* */\n/* ************* Material Copyright Wharton Research Data Services *************** */\n/* ****************************** All Rights Reserved ****************************** */\n/* ********************************************************************************* */\n
","tags":["SAS","Code","WRDS"]},{"location":"posts/working-remotely-on-a-windows-machine-wsl-from-vscode-on-a-mac/","title":"Working Remotely on a Windows Machine from VSCode on a Mac","text":"Now I only need a MacBook (1.3 GHz dual-core i5) to do all my work anywhere, thanks to a powerful workstation provided by the university. Yet the workstation is based on Windows 10 and sitting behind the university VPN. I don't want to use Remote Desktop every time I need to do some coding, so I decided to make it so I can code remotely on the workstation but from the lovely VSCode on my little MacBook.
"},{"location":"posts/working-remotely-on-a-windows-machine-wsl-from-vscode-on-a-mac/#1-set-up-the-windows-10-host-machine","title":"1. Set up the Windows 10 host machine","text":"The first step is to enable remote SSH login on the Windows machine. It is now super easy to do with the Windows Subsystem for Linux (WSL). I use the Ubuntu 18.04 LTS distro but other Linux distros should work just fine. This will be the remote environment that I work in. Then I follow the instruction in SSH on Windows Subsystem for Linux (WSL). The post is in great detail with step-by-step guidance. So I won't repeat it again.
"},{"location":"posts/working-remotely-on-a-windows-machine-wsl-from-vscode-on-a-mac/#2-set-up-the-vscode-on-mac","title":"2. Set up the VSCode on Mac","text":"The second step is to install the Remote-SSH extension on VSCode. Then simply ssh into the Ubuntu environment on Windows 10 host machine using the username and password created for the Ubuntu distro. In my case is ssh
myusername
@asgard.econ.usyd.edu.au
. A password prompt will of course kindly show up.
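As an optional extra (my own habit, not something required by this setup), adding a Host alias to ~/.ssh/config on the Mac spares typing the full address, and the Remote-SSH extension lists the alias automatically:

```
# ~/.ssh/config on the Mac
Host asgard
    HostName asgard.econ.usyd.edu.au
    User myusername
```

After this, ssh asgard (and the same name inside Remote-SSH) is enough.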
The annoying thing is that each time the window reloads, and every time I start VSCode, I need to manually type in my lengthy password. The better way must be to use an SSH key instead.
To do so, open up the Terminal on the Mac and run:
ssh-keygen\n
A public-private key pair will be generated as ~/.ssh/id_rsa.pub
and ~/.ssh/id_rsa
. Then we need to tell the host machine that this key can be used to identify myself, so I can skip entering the password next time:
ssh-copy-id myusername@asgard.econ.usyd.edu.au\n
It will ask for the password on the host machine to confirm I am who I am. But after this, starting VSCode will never ask for my password again. What a relief!
"},{"location":"posts/working-remotely-on-a-windows-machine-wsl-from-vscode-on-a-mac/#lastly","title":"Lastly...","text":"Because the host machine is inside the university network, I need to first connect to the university VPN, otherwise the host address asgard.econ.usyd.edu.au
will not resolve. Still, it's really great that I can code and run my programs remotely on the powerful 8-core, 16-thread machine without the heat and the noise, which turns out to be really important in an Australian summer...
Question
Given a centrifuge with \\(n\\) holes, can we balance it with \\(k\\) (\\(1\\le k \\le n\\)) identical test tubes?
This is a simple yet interesting problem, very well illustrated by Numberphile and discussed on Matt Baker's blog.
The now-proven solution is:
Note
You can balance \\(k\\) identical test tubes, \\(1\\le k\\le n\\), in an \\(n\\)-hole centrifuge if and only if both \\(k\\) and \\(n-k\\) can be expressed as a sum of prime divisors of \\(n\\).
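The statement is easy to spot-check for small \(n\) by brute force: place \(k\) unit vectors in the holes (the \(n\)-th roots of unity) and ask whether any size-\(k\) subset sums to zero. This checker is my own sketch, independent of the methods in this post:

```python
import cmath
from itertools import combinations

def balanceable_by_search(n: int, k: int, eps: float = 1e-9) -> bool:
    """Brute force: does some size-k subset of the n-th roots of unity sum to zero?"""
    roots = [cmath.exp(2j * cmath.pi * i / n) for i in range(n)]
    # sum of an empty subset is 0, so k=0 is trivially balanced
    return any(abs(sum(c)) < eps for c in combinations(roots, k))
```

For \(n=6\) this reproduces the known answer: every \(k\) except 1 and 5 balances.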
(Examples in the original post: an 18-hole and a 20-hole centrifuge.) Below is my attempt to programmatically answer the centrifuge problem.
"},{"location":"posts/centrifuge-problem/#method-1-naive-dfs","title":"Method 1: Na\u00efve DFS","text":"The very first method literally follows the solution. For a given \\((n,k)\\) pair, check if \\(k\\) and \\(n-k\\) can be written as a linear combination of the prime divisors of \\(n\\) (with non-negative coefficients).
def is_linear_combination(x: int, prime_numbers: list) -> bool:\n\"\"\"Check if `x` can be written as a linear combination of prime numbers, i.e.,\n x = b1*p1 + b2*p2 + b3*p3 + ... + bn*pn\n where pi represents a prime number in `prime_numbers`, bi is a non-negative integer.\n \"\"\"\n# very naive and not optimized\nfor n in prime_numbers:\n# n divides x \nif x % n == 0:\nreturn True\n# n does not divide x, check if the difference between x and multiples of n can be\n# a linear combination of other remaining prime numbers\nfor i in range(x//n):\nif is_linear_combination(x - i*n, [p for p in prime_numbers if p!=n]):\nreturn True\nreturn False \ndef centrifuge_naive(n: int, k: int) -> bool:\n\"\"\"Check if an `n`-hole centrifuge can be balanced with `k` identical test tubes.\n True if both `k` and `n-k` can be written as a linear combination of the prime divisors of `n`.\n \"\"\"\nprime_divisors = get_prime_divisors(n) # simple cached function, skipped\nreturn is_linear_combination(k, prime_divisors) and is_linear_combination(n-k, prime_divisors)\n
"},{"location":"posts/centrifuge-problem/#some-optimizations","title":"Some Optimizations","text":"The above method works just fine, but very slow if we want to compute the total number of solutions, instead of just checking whether a particular \\(k\\) works.
There are a few possible optimizations. For example, we can compute only the lower half of the \(k\)s:
from functools import lru_cache\n@lru_cache(maxsize=None)\ndef centrifuge_naive(n: int, k: int) -> bool:\nprime_divisors = get_prime_divisors(n) # cached\nif k > n//2:\nreturn centrifuge_naive(n, n-k)\nreturn is_linear_combination(k, prime_divisors) and is_linear_combination(n-k, prime_divisors)\n
Further, if \(n\) is a (large) prime number itself, we know that no \(1\le k\lt n\) will work. Similarly, if \(n\) is a power of a prime, we can bypass many values of \(k\) too.
@lru_cache(maxsize=None)\ndef centrifuge_naive(n, k):\nprime_divisors = get_prime_divisors(n)\n# ...\n# special case when n is power of prime\nif len(prime_divisors) == 1:\np = prime_divisors[0]\nreturn (k % p == 0) and ((n - k) % p == 0)\n# ...\n
At a certain point, we realize that it would be faster to simply compute all possible \(k\)s instead of checking one by one whether a certain \(k\) can balance the centrifuge. This leads us to the second approach, which I call \"bootstrap\".
"},{"location":"posts/centrifuge-problem/#method-2-bootstrap","title":"Method 2: Bootstrap","text":"The bootstrap method is a variant of DFS, which essentially generates all possible \\(k\\) for a given \\(n\\) by exhausting the values from linear combinations of \\(n\\)'s prime divisors. The generated values should be between 2 and \\(n\\). Then we can tell if \\(k'\\) can balance the \\(n\\)-hole centrifuge by checking whether \\(k'\\) and \\(n-k'\\) are in the generated values.
def bootstrap(x, n, numbers, result):\n\"\"\"Compute all linear combinations of the given numbers smaller than n\"\"\"\nfor p in numbers:\nif p+x > n:\nbreak\nfor i in range((n-x) // p):\np_ = x + p * i # p_ <= n\nif not result[p_]:\n# x + p*i has not been tested, and is a linear combination of given numbers \nresult[p_] = True\n# check whether we can add multiples of remaining numbers\nbootstrap(p_, n, [n2 for n2 in numbers if n2 != p], result)\ndef centrifuge_bootstrap(n: int, k: int) -> bool:\nprime_divisors = get_prime_divisors(n) # cached, `prime_divisors` is sorted\n# result[k] represents whether k is valid, k=0...n\nresult = [True] + [False] * (n-1) + [True]\nbootstrap(0, n, prime_divisors, result) # TODO: bootstrap only once for a given `n`\nreturn result[k] and result[n-k]\n
This method invests some time in pre-computing all possible linear combinations of the prime divisors of \(n\). If we are only interested in a particular \((n,k)\) pair, we can break out early once we have computed result[k]
and result[n-k]
in bootstrap()
.
The last method uses dynamic programming. We can use \\(f[k]\\)=True
to represent that \(k\) is a linear combination of \(n\)'s prime divisors. A value \(i\) is either itself a prime divisor of \(n\) (and thus a linear combination of the prime divisors), or the sum of one of \(n\)'s prime divisors \(p\) and \((i-p)\). In the latter case, if \((i-p)\) is a linear combination of \(n\)'s prime divisors, so is \(p+(i-p)=i\).
Hint
If \\((i-p)\\) is a linear combination of \\(n\\)'s prime divisors, i.e., \\(i-p=a_1p_1+a_2p_2+...+a_np_n\\), where \\(\\{p_i\\}\\) are the prime divisors of \\(n\\) and \\(\\{a_i\\}\\) are non-negative integers, then \\(i-p+p\\) is definitely a linear combination too: \\(p\\)'s coefficient becomes \\(a+1\\ge0\\).
Hence, for each prime divisor \(p\) of \(n\) and each \(p\le i\le n\), we can update \(f[i] \leftarrow f[i] \lor f[i-p]\).
The boundary condition is \\(f[0]\\) = True
, i.e., an empty centrifuge is balanced.
The whole function is extremely short:
def centrifuge_dp(n: int, k: int) -> bool:\nprime_divisors = get_prime_divisors(n) # cached, `prime_divisors` is sorted\nf = [True] + [False] * n\nfor p in prime_divisors: # TODO: DP only once for a given `n`\nfor i in range(p, n+1):\nf[i] = f[i] or f[i-p]\nreturn f[k] and f[n-k]\n
"},{"location":"posts/centrifuge-problem/#performance-comparison","title":"Performance Comparison","text":"Obviously, the Method 2 and 3 are much faster than the na\u00efve Method 1. Method 3 does not even use recursion and is the fastest.
Note
Note that if we are to check all \(1\le k\le n\), e.g., [i for i in range(1, n+1) if centrifuge(n,i)]
, we need to make some adjustments to the functions above so as to bootstrap or perform the DP only once for each \(n\). This is trivial.
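For instance, a minimal once-per-\(n\) version of the DP (my own sketch; it inlines a prime-divisor helper, since get_prime_divisors is not shown in this post) caches the full list of balanced \(k\)s:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def balanced_ks(n: int) -> tuple:
    """All k (including 0 and n) that balance an n-hole centrifuge; DP runs once per n."""
    divisors, m, p = [], n, 2
    while p * p <= m:
        if m % p == 0:
            divisors.append(p)
            while m % p == 0:
                m //= p
        p += 1
    if m > 1:
        divisors.append(m)
    # f[i]: i is a non-negative combination of the prime divisors of n
    f = [True] + [False] * n
    for q in divisors:
        for i in range(q, n + 1):
            f[i] = f[i] or f[i - q]
    return tuple(k for k in range(n + 1) if f[k] and f[n - k])
```

With this, checking every \(k\) for a given \(n\) is a single cached call instead of \(n\) separate DP runs.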
Below are some plots of balanced centrifuges. Note that for a particular value of \\(k\\), there can be more than one way to balance the centrifuge. Here, I illustrate only one.
plot_centrifuge(6, \"6-hole-centrifuge.svg\")\n
plot_centrifuge(10, \"10-hole-centrifuge.svg\")\n
plot_centrifuge(12, \"12-hole-centrifuge.svg\")\n
plot_centrifuge(18, \"18-hole-centrifuge.svg\")\n
plot_centrifuge(20, \"20-hole-centrifuge.svg\")\n
plot_centrifuge(24, \"24-hole-centrifuge.svg\")\n
plot_centrifuge(33, \"33-hole-centrifuge.svg\")\n
"},{"location":"posts/centrifuge-problem/#python-code","title":"Python code","text":"The code to generate the plots above:
from functools import lru_cache\nimport numpy as np\nimport matplotlib.pyplot as plt\n@lru_cache(maxsize=None)\ndef prime_divisors(n):\n\"\"\"Return list of n's prime divisors\"\"\"\nprimes = []\np = 2\nwhile p**2 <= n:\nif n % p == 0:\nprimes.append(p)\nn //= p\nelse:\np += 1 if p % 2 == 0 else 2\nif n > 1:\nprimes.append(n)\nreturn primes\ndef centrifuge(n):\n\"\"\"Return a list of which the k-th element represents if k tubes can balance the n-hole centrifuge\"\"\"\nF = [True] + [False] * n\nfor p in prime_divisors(n):\nfor i in range(p, n + 1):\nF[i] = F[i] or F[i - p]\nreturn [F[k] and F[n - k] for k in range(n + 1)]\ndef factorize(k: int, nums: list) -> list:\n\"\"\"Given k, return the list of numbers from the given numbers which add up to k.\n The given numbers are guaranteed to be able to generate k via a linear combination.\n Examples:\n >>> factorize(5, [2, 3])\n [2, 3]\n >>> factorize(6, [2, 3])\n [2, 2, 2]\n >>> factorize(7, [2, 3])\n [2, 2, 3]\n \"\"\"\ndef _factorize(k, nums, res: list):\nfor p in nums:\nif k % p == 0:\nres.extend([p] * (k // p))\nreturn True\nelse:\nfor i in range(1, k // p):\nif _factorize(k - p * i, [n for n in nums if n != p], res):\nres.extend([p] * i)\nreturn True\nreturn False\nres = []\n_factorize(k, nums, res)\nreturn res\n@lru_cache(maxsize=None)\ndef centrifuge_k(n, k):\n\"\"\"Given (n, k) and that k balances a n-hole centrifuge, find the positions of k tubes\"\"\"\nif n == k:\nreturn [True] * n\nfactors = factorize(k, prime_divisors(n))\npos = [False] * n\ndef c(factors: list, pos: list) -> bool:\nif sum(pos) == k:\nreturn True\nif not factors:\nreturn False\np = factors.pop(0)\npos_wanted = [n // p * i for i in range(p)]\nfor offset in range(n):\npos_rotated = [(i + offset) % n for i in pos_wanted]\n# the intended positions of the p tubes are all available\nif not any(pos[i] for i in pos_rotated):\n# claim the positions\nfor i in pos_rotated:\npos[i] = True\nif not c(factors, pos):\n# unclaim the positions\nfor i in 
pos_rotated:\npos[i] = False\nelse:\nreturn True\n# all rotated positions failed, add p back to factors to place later\nfactors.append(p)\nc(factors, pos)\nreturn pos\ndef plot_centrifuge(n, figname=\"centrifuge.svg\"):\nncols = max(int(n**0.5), 1) # minimum 1 column\nnrows = n // ncols if n % ncols == 0 else n // ncols + 1\nheight = 3 if nrows == ncols else 2\nwidth = 2\nfig, axes = plt.subplots(nrows, ncols, figsize=(height * nrows, width * ncols))\nz = np.exp(2 * np.pi * 1j / n)\ntheta = np.linspace(0, 2 * np.pi, 20)\nradius = 1 / (ncols + nrows)\na = radius * np.cos(theta)\nb = radius * np.sin(theta)\ncent = centrifuge(n)\nfor nr in range(nrows):\nfor nc in range(ncols):\nk = nr * ncols + nc + 1\naxis = axes[nr, nc] if ncols > 1 else axes[nr]\nif k > n:\naxis.axis(\"off\")\ncontinue\n# draw the n-holes\nfor i in [z**i for i in range(n)]:\naxis.plot(a + i.real, b + i.imag, color=\"b\" if cent[k] else \"gray\")\n# draw the k tubes\nif cent[k]:\nif k > n // 2:\npos = [not b for b in centrifuge_k(n, n - k)]\nelse:\npos = centrifuge_k(n, k)\nfor i, ok in enumerate(pos):\ni = z**i\nif ok:\naxis.fill(a + i.real, b + i.imag, color=\"r\")\naxis.set_aspect(1)\naxis.set(xticklabels=[], yticklabels=[])\naxis.set(xlabel=None)\naxis.set_ylabel(f\"k={k}\", rotation=0, labelpad=10)\naxis.tick_params(bottom=False, left=False)\nfig.suptitle(f\"$k$ Test Tubes to Balance a {n}-Hole Centrifuge\")\nfig.text(0.1, 0.05, \"Red dot represents the position of test tubes.\")\nplt.savefig(figname)\nplt.close(fig)\nif __name__ == \"__main__\":\nfor n in range(6, 51):\nprint(f\"Balancing {n}-hole centrifuge...\")\nplot_centrifuge(n, f\"{n}-hole-centrifuge.png\")\n
"},{"location":"posts/centrifuge-problem/#download-plots-of-balanced-centrifuges","title":"Download plots of balanced centrifuges","text":"Success
You can download the Python code and all plots of balanced \\(n\\)-hole centrifuge, \\(6\\le n\\le50\\), which I calculated using the code above.
"},{"location":"tags/","title":"Tags","text":""},{"location":"tags/#8-k","title":"8-K","text":"