Lecture Notes for ORIE 473: Empirical Methods in Financial Engineering
by D. Ruppert
© Copyright D. Ruppert, 2001

Contents

1 Introduction: 3/21/01
2 Review of Prob and Stats: 3/12/01
  2.1 Densities, CDF's, means, variances, and correlation
  2.2 Best Linear Prediction
    2.2.1 Prediction Error
  2.3 Conditional Distributions
  2.4 The Normal Distribution
    2.4.1 Conditional expectations and variance
  2.5 Linear Functions of Random Variables
  2.6 Maximum Likelihood Estimation
  2.7 Likelihood Ratio Tests
3 Returns: 3/12/01
  3.1 Prices and returns
  3.2 Log returns
  3.3 Behavior of returns
  3.4 Common Model: IID Normal Returns
  3.5 The Lognormal Model
  3.6 Random Walk
    3.6.1 Geometric Random Walk
  3.7 Are log returns really normally distributed?
    3.7.1 Do the GE daily returns look like a geometric random walk?
  3.8 Portrait of an econometrician, Eugene Fama
  3.9 Other empirical work related to Fama's
  3.10 Technical Analysis
  3.11 Fundamental Analysis
  3.12 Efficient Markets Hypothesis (EMH)
    3.12.1 Three types of efficiency
    3.12.2 Testing market efficiency
  3.13 Summary
4 Univariate Time Series Models: 3/12/01
  4.1 Time Series
  4.2 Stationary Processes
    4.2.1 Weak White Noise
    4.2.2 Estimating parameters of a stationary process
  4.3 AR(1) processes
    4.3.1 Properties of a stationary AR(1) process
    4.3.2 Nonstationary AR(1) processes
    4.3.3 Estimation
  4.4 AR(p) models
    4.4.1 Example: GE daily returns
  4.5 Moving Average (MA) Processes
    4.5.1 MA(1) processes
    4.5.2 General MA processes
  4.6 ARIMA Processes
    4.6.1 The backwards operator
    4.6.2 ARMA Processes
    4.6.3 The differencing operator
    4.6.4 From ARMA processes to ARIMA process
  4.7 Model Selection
    4.7.1 AIC and SBC
    4.7.2 Stepwise regression applied to AR processes
    4.7.3 Using ARIMA in SAS: Cree data
  4.8 Example: Three-month Treasury bill rates
  4.9 Forecasting
    4.9.1 GE daily returns
5 Portfolio Selection: 3/12/01
  5.1 Trading off expected return and risk
  5.2 One risky asset and one risk-free asset
    5.2.1 Example
    5.2.2 Estimating E(R) and σ_R
  5.3 Two risky assets
    5.3.1 Estimating means, standard deviations, and covariances
  5.4 Combining two risky assets with a risk-free asset
    5.4.1 Tangency portfolio with two risky assets
    5.4.2 Effect of ρ_12
  5.5 Harry Markowitz
  5.6 Risk-efficient portfolios with N risky assets
    5.6.1 Efficient-set mathematics
    5.6.2 Selling short
    5.6.3 The Interior decorator fallacy
    5.6.4 Back to the math
    5.6.5 Example: N = 2
  5.7 Is the theory useful?
  5.8 Example: Global Asset Allocation
  5.9 Quadratic programming
6 The Capital Asset Pricing Model: 3/26/01
  6.1 Introduction to CAPM
  6.2 The capital market line (CML)
  6.3 Betas and the Security Market Line
    6.3.1 Examples of betas
    6.3.2 Comparison of the CML with the SML
  6.4 The security characteristic line
    6.4.1 Reducing unique risk by diversification
  6.5 Some theory
    6.5.1 Contributions to the market portfolio's risk
    6.5.2 Derivation of the SML
  6.6 Estimation of beta and testing the CAPM
    6.6.1 Interpretation of alpha
  6.7 Summary
7 Pricing Options: 4/12/01
  7.1 Introduction
  7.2 Call options
  7.3 The law of one price
    7.3.1 Arbitrage
  7.4 Time value of money and present value
  7.5 A simple binomial example
  7.6 Two-step binomial option pricing
  7.7 Arbitrage pricing by expectation
  7.8 A general binomial tree model
  7.9 Martingales
    7.9.1 The risk-neutral world
  7.10 Trees to random walks to Brownian motion
    7.10.1 Getting more realistic
    7.10.2 A three-step binomial tree
    7.10.3 More time steps
    7.10.4 Properties of Brownian motion
  7.11 Geometric Brownian motion
  7.12 Using the Black-Scholes formula
    7.12.1 How does the option price depend on the inputs?
    7.12.2 An example: GE
    7.12.3 Early exercise of calls is never optimal
    7.12.4 Are there returns on non-trading days?
    7.12.5 Implied volatility
  7.13 Puts
    7.13.1 Pricing puts by binomial trees
    7.13.2 Why are puts different than calls?
    7.13.3 Put-call parity
  7.14 The evolution of option prices
  7.15 Intrinsic value and time value
  7.16 Black, Scholes, and Merton
  7.17 Summary
  7.18 References
8 GARCH models: 4/24/01
  8.1 Introduction
  8.2 Modeling conditional means and variances
  8.3 ARCH(1) processes
    8.3.1 Example
  8.4 The AR(1)/ARCH(1) model
  8.5 ARCH(q) models
  8.6 GARCH(p, q) models
  8.7 Heavy-tailed distributions
  8.8 Comparison of ARMA and GARCH processes
  8.9 Fitting GARCH models
    8.9.1 Example: S&P 500 returns
  8.10 I-GARCH models
    8.10.1 What does it mean to have an infinite variance?
  8.11 GARCH-M processes
  8.12 E-GARCH
  8.13 Back to the S&P 500 example
  8.14 The GARCH zoo
  8.15 Applications of GARCH in finance
  8.16 Summary
  8.17 References
9 Fixed Income Securities: 4/30/01
  9.1 Introduction
  9.2 Zero coupon bonds
    9.2.1 Price and returns fluctuate with the interest rate
  9.3 Coupon bonds
  9.4 Yield to maturity
    9.4.1 Spot rates
  9.5 Term structure
  9.6 Continuous compounding
    9.6.1 Continuous forward rates
  9.7 Summary
    9.7.1 Introduction
    9.7.2 Zero coupon bonds
    9.7.3 Risk due to interest rate changes
    9.7.4 Coupon bonds
    9.7.5 Term structure of interest rates
    9.7.6 Continuous compounding
  9.8 References
10 Behavioral finance: 5/1/01
  10.1 Introduction
  10.2 Defense of EMH
  10.3 Challenges to the EMH
  10.4 Can arbitrageurs save the day?
  10.5 What do the data say?
  10.6 References

Chapter 1 Introduction: 3/21/01

The title of this course is "Empirical Research Methods in Financial Engineering." "Empirical" means derived from experience, observation, or experiment, so we are going to work with data. We'll be doing statistics.

Financial engineering is the construction of financial products such as stock options. Financial engineering uses probability models, e.g., those used to derive the famous Black-Scholes formula.

• Are these models supported by financial markets data?
• How are the parameters in these models estimated?

Let's look ahead to the Black-Scholes formula for the price of a European call option. "Now" is called time 0. The maturity date of the option is $T$. The option gives us the right to purchase one share of stock for $E$ dollars at time $T$. Let $S_T$ be the price of the stock at time $T$. At time 0, $T$ and $E$ are known but $S_T$ is unknown. At time $T$, $S_T$ will become known. If at time $T$ we learn that $S_T > E$ then we will exercise the option and purchase one share.
We can immediately sell the share for $S_T$ dollars and earn a profit of $S_T - E$ dollars. If at time $T$, $S_T < E$ then we do not exercise the option. The option expires and we lose the original cost of the option, but no more. The value of the option at time $T$ is, therefore, $\max\{0, S_T - E\}$.

But right now at time 0, what is the value of the option, i.e., the price for which it should sell on the market? Prior to the 1970's, options were priced by the "seat of the pants." Then Black, Scholes, and Merton deduced the correct price of a call option from a mathematical model (and much hard thinking).

They assumed that one can lend and borrow at a risk-free rate $r$. Thus, if $B_t$ is the price at time $t$ of a risk-free bond purchased for \$1 at time 0, then $B_t = \exp(rt)$.

Let $S_t$ be the price of the underlying stock. They assumed that
$$S_t = S_0 \exp(\mu t + \sigma W_t),$$
where $\mu$ is a "drift" or "growth rate," $W_t$ is a Brownian motion stochastic process, and $\sigma$ is a standard deviation that measures the volatility of the stock. In this course, you will learn exactly what this model means. Right now, the "take home" message is that there are precise mathematical models of stock price movements that we can check against the data. Also, there are important parameters such as $\mu$ and $\sigma$ that must be estimated from data.

The Black-Scholes formula is
$$C = \Phi(d_1) S_0 - \Phi(d_2) E \exp(-rT),$$
where $C$ is the price of the option at time 0, $\Phi$ is the standard normal CDF,
$$d_1 = \frac{\log(S_0/E) + (r + \sigma^2/2)T}{\sigma\sqrt{T}}, \quad \text{and} \quad d_2 = d_1 - \sigma\sqrt{T}.$$

The formula is, quite obviously, complicated and is not easy to derive, but it is easy to compute and was hard-wired into calculators almost immediately; the Black-Scholes formula and hand-held calculators both emerged in the early 1970's.

We will be interested in the underlying assumptions behind the formula. Remember: GIGO (garbage in, garbage out). If the assumptions don't hold, then there is no reason to trust the Black-Scholes formula, despite the impressive mathematics behind it.
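To see how easy the formula is to compute, here is a minimal Python sketch. It assumes the standard form $d_1 = \{\log(S_0/E) + (r + \sigma^2/2)T\}/(\sigma\sqrt{T})$ and $d_2 = d_1 - \sigma\sqrt{T}$. The function name `black_scholes_call` and the input values are hypothetical, chosen only for illustration; they are not data from these notes.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def black_scholes_call(s0, strike, r, sigma, maturity):
    """Time-0 price of a European call under the Black-Scholes assumptions.

    s0: current stock price S_0, strike: exercise price E,
    r: risk-free rate, sigma: volatility, maturity: T (same time unit as r).
    """
    big_phi = NormalDist().cdf  # standard normal CDF, Phi
    d1 = (log(s0 / strike) + (r + sigma ** 2 / 2) * maturity) / (sigma * sqrt(maturity))
    d2 = d1 - sigma * sqrt(maturity)
    return big_phi(d1) * s0 - big_phi(d2) * strike * exp(-r * maturity)


# Hypothetical inputs (not from the notes): an at-the-money one-year call.
price = black_scholes_call(s0=100.0, strike=100.0, r=0.05, sigma=0.2, maturity=1.0)
print(round(price, 2))
```

Note that the drift $\mu$ does not appear among the inputs; only $r$, $\sigma$, $T$, $E$, and $S_0$ are needed. The output is only as trustworthy as the model assumptions and the estimate of $\sigma$ fed into it.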
The equation $B_t = \exp(rt)$ of continuous compounding is the solution to the differential equation
$$\frac{d}{dt} B_t = r B_t.$$
The general solution is $B_t = B_0 \exp(rt)$, and $B_0 = 1$ since we have assumed that the bond can be purchased for \$1 at time 0.

Where does $S_t = S_0 \exp(\sigma W_t + \mu t)$ come from? If $\sigma$ were 0, then this would be exponential growth, $S_t = S_0 \exp(\mu t)$, just like the bond price $B_t$. The term $\sigma W_t$ comes from the random behavior of stock prices; $\sigma$ is a standard deviation, essentially of the changes in the stock prices. The random process $W_t$ is something we will need to learn much more about, and we will.

In this course we will

• study models of financial markets (they are complex but fascinating)
• learn to test the models: do they fit financial markets data adequately?
• estimate parameters in the models, such as $\mu$ and $\sigma$, that are essential for correct pricing of financial products such as a call option.

Key question: How do the prices of stocks and other financial assets behave?

Looking ahead to where this course is going

• We will start by defining "returns" on the prices of a stock.
• We will then look at "ARIMA models"
  - these are models for "time series," which are sequences of data sampled over time
  - ARIMA models are stochastic processes
• After looking at returns and time series models of return behavior, we will look at optimal portfolios of risky assets (e.g., stocks) and of risky assets and risk-free bonds (e.g., US Treasury bills).
  - This will take us to the famous Capital Asset Pricing Model (CAPM).

Looking even farther ahead, we will later return to the pricing of stock options by the Black-Scholes formula and cover other areas of financial engineering such as the term structure of interest rates.

But before we get into applications of probability and statistics in financial engineering, we need to review probability and statistics so that we are all up to speed.
Chapter 2 Review of Prob and Stats: 3/12/01

2.1 Densities, CDF's, means, variances, and correlation

random variable: large set of possible values but only one will actually occur

continuous random variable: $X$ is a continuous random variable if it has a p.d.f. $f_X$ such that
$$P(X \in A) = \int_A f_X(x)\,dx \quad \text{for all sets } A.$$

The CDF of $X$ is
$$F_X(x) := \int_{-\infty}^{x} f_X(u)\,du.$$

The expectation of $X$ is
$$E(X) := \int_{-\infty}^{+\infty} x f_X(x)\,dx.$$

The variance of $X$ is
$$\sigma_X^2 := \int \{x - E(X)\}^2 f_X(x)\,dx = E\{X - E(X)\}^2.$$

Useful formula: $\sigma_X^2 = E(X^2) - \{E(X)\}^2$.

The standard deviation is the square root of the variance.

A pair of random variables, $(X, Y)$, has a bivariate density $f_{XY}(x, y)$.

Covariance:
$$\sigma_{XY} = E[\{X - E(X)\}\{Y - E(Y)\}] = \int\!\!\int \{x - E(X)\}\{y - E(Y)\} f_{XY}(x, y)\,dx\,dy.$$

Useful formulas:

• $\sigma_{XY} = E(XY) - E(X)E(Y)$
• $\sigma_{XY} = E[\{X - E(X)\}Y]$
• $\sigma_{XY} = E[\{Y - E(Y)\}X]$
• $\sigma_{XY} = E(XY)$ if $E(X) = 0$ or $E(Y) = 0$

The correlation coefficient between $X$ and $Y$ is $\rho_{XY} := \sigma_{XY}/(\sigma_X \sigma_Y)$.

2.2 Best Linear Prediction

Suppose we observe $X$ and want to predict $Y$; this can be done if $\rho_{XY}$ is not zero. Best linear prediction means finding $\beta_0$ and $\beta_1$ so that
$$H(\beta_0, \beta_1) := E\{Y - (\beta_0 + \beta_1 X)\}^2$$
is minimized. Expanding the square,
$$H(\beta_0, \beta_1) = E(Y^2) - 2\beta_0 E(Y) - 2\beta_1 E(XY) + E(\beta_0 + \beta_1 X)^2.$$
Setting the partial derivatives to zero we get
$$0 = -E(Y) + \beta_0 + \beta_1 E(X) \quad \text{and} \quad 0 = -E(XY) + \beta_0 E(X) + \beta_1 E(X^2).$$
After some algebra we find that
$$\beta_1 = \sigma_{XY}/\sigma_X^2 \qquad (2.1)$$
and
$$\beta_0 = E(Y) - \beta_1 E(X) = E(Y) - \frac{\sigma_{XY}}{\sigma_X^2} E(X).$$
Thus, the best linear predictor of $Y$ is
$$\hat{Y} := \beta_0 + \beta_1 X = E(Y) + \frac{\sigma_{XY}}{\sigma_X^2}\{X - E(X)\}.$$

2.2.1 Prediction Error

The prediction error is $Y - \hat{Y}$. It is easy to prove that $E\{Y - \hat{Y}\} = 0$, so that the prediction is "unbiased." With a little algebra we can show that the expected squared prediction error is
$$E\{Y - \hat{Y}\}^2 = \sigma_Y^2 - \frac{\sigma_{XY}^2}{\sigma_X^2} = \sigma_Y^2 (1 - \rho_{XY}^2).$$
If we do not observe $X$, then the best predictor of $Y$ is $E(Y)$ and the expected squared prediction error is $\sigma_Y^2$. Therefore, $\rho_{XY}^2$ is the fraction by which the expected squared prediction error is reduced when $X$ is known.
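The formulas for $\beta_0$, $\beta_1$, and the expected squared prediction error can be checked numerically. The following is a minimal sketch, not part of the original notes; the function name `best_linear_predictor` and the moment values are hypothetical and chosen only so that $\rho_{XY} = .5$ and $\sigma_Y^2 = 3$.

```python
from math import sqrt


def best_linear_predictor(mean_x, mean_y, var_x, var_y, cov_xy):
    """Return (beta0, beta1, expected squared prediction error) from the moments."""
    beta1 = cov_xy / var_x                 # equation (2.1)
    beta0 = mean_y - beta1 * mean_x
    rho = cov_xy / sqrt(var_x * var_y)     # correlation coefficient
    mse = var_y * (1.0 - rho ** 2)         # sigma_Y^2 (1 - rho^2)
    return beta0, beta1, mse


# Hypothetical moments: sigma_X^2 = 4, sigma_Y^2 = 3, rho = .5.
var_x, var_y = 4.0, 3.0
cov_xy = 0.5 * sqrt(var_x) * sqrt(var_y)
beta0, beta1, mse = best_linear_predictor(1.0, 2.0, var_x, var_y, cov_xy)
print(beta0, beta1, mse)
```

With these inputs the expected squared prediction error falls from $\sigma_Y^2 = 3$ to $3(1 - .25) = 2.25$, which also equals $\sigma_Y^2 - \sigma_{XY}^2/\sigma_X^2$.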
Example: If $\rho_{XY} = .5$, then the expected squared prediction error is reduced by 25% by observing $X$. If $\sigma_Y^2 = 3$, then the expected squared prediction error is 3 if $X$ is unobserved but only 2 1/4 if $X$ is observed.

2.3 Conditional Distributions

Let $f_{XY}(x, y)$ be the joint density of a pair of random variables, $(X, Y)$. The marginal density of $X$ is
$$f_X(x) := \int f_{XY}(x, y)\,dy$$
and similarly for $f_Y$. The conditional density of $Y$ given $X$ is
$$f_{Y|X}(y|x) = \frac{f_{XY}(x, y)}{f_X(x)}.$$
The conditional expectation of $Y$ given $X$ is just the expectation calculated using $f_{Y|X}(y|x)$:
$$E(Y|X = x) = \int y f_{Y|X}(y|x)\,dy,$$
which is, of course, a function of $x$. The conditional variance of $Y$ given $X$ is
$$\mathrm{Var}(Y|X = x) = \int \{y - E(Y|X = x)\}^2 f_{Y|X}(y|x)\,dy.$$

Example: Suppose $f_{XY}(x, y) = 2$ on $0 < x < 1$ and $x < y < 1$. Then the marginal density of $X$ is $f_X(x) = 2(1 - x)$. The conditional density of $Y$ given $X$ is $f_{Y|X}(y|x) = (1 - x)^{-1}$ for $x < y < 1$. The conditional expectation of $Y$ is
$$E(Y|X = x) = \frac{1 + x}{2}.$$
The conditional variance of $Y$ is
$$\mathrm{Var}(Y|X = x) = \frac{(1 - x)^2}{12}.$$

2.4 The Normal Distribution

The standard normal distribution has density
$$\phi(x) := \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right).$$
The $N(\mu, \sigma^2)$ density is
$$\frac{1}{\sigma}\,\phi\!\left(\frac{x - \mu}{\sigma}\right).$$
The standard normal CDF is
$$\Phi(x) := \int_{-\infty}^{x} \phi(u)\,du.$$
$\Phi$ can be evaluated using tables (ugh!) or more easily using software such as MATLAB or MINITAB. If $X \sim N(\mu, \sigma^2)$ then $P(X < x) = \Phi\{(x - \mu)/\sigma\}$.

Example: If $X \sim N(5, 4)$ then what is $P(X < 7)$? Answer: $\Phi\{(7 - 5)/2\} = \Phi(1) = .8413$. In MATLAB, "cdfn(1)" gives "ans = 0.8413".

2.4.1 Conditional expectations and variance

The calculation of conditional expectations and variances can be difficult for some probability distributions, but it is quite easy for a pair $(X, Y)$ that has a bivariate normal distribution. For a bivariate normal pair, the conditional expectation of $Y$ given $X$ equals the best linear predictor of $Y$ given $X$:
$$E(Y|X) = E(Y) + \frac{\sigma_{XY}}{\sigma_X^2}\{X - E(X)\}.$$
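The conditional variance of $Y$ given $X$ is the expected squared prediction error:
$$\mathrm{Var}(Y|X) = \sigma_Y^2 (1 - \rho_{XY}^2).$$

2.5 Linear Functions of Random Variables

$$E(aY + b) = aE(Y) + b,$$
where $Y$ is a random variable and $a$ and $b$ are constants. Also,
$$\mathrm{Var}(aY + b) = a^2\,\mathrm{Var}(Y).$$
If $X$ and $Y$ are random variables and $w_1$ and $w_2$ are constants, then
$$E(w_1 X + w_2 Y) = w_1 E(X) + w_2 E(Y),$$
and
$$\mathrm{Var}(w_1 X + w_2 Y) = w_1^2\,\mathrm{Var}(X) + 2 w_1 w_2\,\mathrm{Cov}(X, Y) + w_2^2\,\mathrm{Var}(Y).$$
Check that this can be written in matrix form as
$$\mathrm{Var}(w_1 X + w_2 Y) = \begin{pmatrix} w_1 & w_2 \end{pmatrix} \begin{pmatrix} \mathrm{Var}(X) & \mathrm{Cov}(X, Y) \\ \mathrm{Cov}(X, Y) & \mathrm{Var}(Y) \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \end{pmatrix}.$$

Let $\boldsymbol{X} = (X_1, \ldots, X_N)^T$ be a random vector. We define the expectation vector of $\boldsymbol{X}$ to be
$$E(\boldsymbol{X}) := \begin{pmatrix} E(X_1) \\ \vdots \\ E(X_N) \end{pmatrix}.$$
The covariance matrix of $\boldsymbol{X}$ is
$$\mathrm{COV}(\boldsymbol{X}) := \begin{pmatrix} \mathrm{Var}(X_1) & \mathrm{Cov}(X_1, X_2) & \cdots & \mathrm{Cov}(X_1, X_N) \\ \mathrm{Cov}(X_2, X_1) & \mathrm{Var}(X_2) & \cdots & \mathrm{Cov}(X_2, X_N) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(X_N, X_1) & \mathrm{Cov}(X_N, X_2) & \cdots & \mathrm{Var}(X_N) \end{pmatrix}.$$
Let $\boldsymbol{w} = (w_1, \ldots, w_N)^T$ be a vector of weights. Then
$$\boldsymbol{w}^T \boldsymbol{X} = \sum_{i=1}^{N} w_i X_i$$
is a weighted average of the components of $\boldsymbol{X}$; it is a random variable. One can show that
$$E(\boldsymbol{w}^T \boldsymbol{X}) = \boldsymbol{w}^T \{E(\boldsymbol{X})\}.$$
Also
$$\mathrm{Var}(\boldsymbol{w}^T \boldsymbol{X}) = \sum_{i=1}^{N} \sum_{j=1}^{N} w_i w_j\,\mathrm{Cov}(X_i, X_j).$$
This result can be expressed more simply using vector/matrix notation:
$$\mathrm{Var}(\boldsymbol{w}^T \boldsymbol{X}) = \boldsymbol{w}^T\,\mathrm{COV}(\boldsymbol{X})\,\boldsymbol{w}. \qquad (2.2)$$

Important fact: If $\boldsymbol{X}$ has a multivariate normal distribution, then $\boldsymbol{w}^T \boldsymbol{X}$ is a normally distributed random variable.

Example: Suppose that $E(X_1) = 1$, $E(X_2) = 1.5$, $\sigma_{X_1}^2 = 1$, $\sigma_{X_2}^2 = 2$, and $\mathrm{Cov}(X_1, X_2) = .5$. Find $E(.3X_1 + .7X_2)$ and $\mathrm{Var}(.3X_1 + .7X_2)$. If $(X_1\ X_2)^T$ is bivariate normal, find $P(.3X_1 + .7X_2 < 2)$.

Answer: $E(.3X_1 + .7X_2) = (.3)(1) + (.7)(1.5) = 1.35$, $\mathrm{Var}(.3X_1 + .7X_2) = (.3)^2(1) + 2(.3)(.7)(.5) + (.7)^2(2) = 1.28$, and $P(.3X_1 + .7X_2 < 2) = \Phi\{(2 - 1.35)/\sqrt{1.28}\} = \Phi(.5745) = .7172$.

2.6 Maximum Likelihood Estimation

Maximum likelihood is the most important and widespread method of estimation. Many well-known estimators, such as the sample mean and the least-squares estimator in regression, are maximum likelihood estimators. Maximum likelihood is very useful in practice and tends to give more precise estimates than other methods of estimation.

Let $\boldsymbol{Y} = (Y_1, \ldots, Y_n)^T$ be a vector of data and let $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_p)^T$ be a vector of parameters.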
Suppose that $f(\boldsymbol{y}; \boldsymbol{\theta})$ is the density of $\boldsymbol{Y}$, which depends on the parameters.

Example: Suppose that $Y_1, \ldots, Y_n$ are IID $N(\mu, \sigma^2)$. Then $\boldsymbol{\theta} = (\mu, \sigma^2)$. Also,
$$f(\boldsymbol{y}; \boldsymbol{\theta}) = \prod_{i=1}^{n} \frac{1}{\sigma}\,\phi\!\left(\frac{y_i - \mu}{\sigma}\right) = \frac{1}{\sigma^n (2\pi)^{n/2}} \exp\left\{-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \mu)^2\right\}.$$

$L(\boldsymbol{\theta}) := f(\boldsymbol{Y}; \boldsymbol{\theta})$ is called the "likelihood function" and is the density evaluated at the observed data. It tells us the likelihood of what was actually observed.

The maximum likelihood estimator (MLE) is the value of $\boldsymbol{\theta}$ that maximizes the likelihood function. In other words, the MLE is the value of $\boldsymbol{\theta}$ that maximizes the likelihood of the data that were observed. We will denote the MLE by $\hat{\boldsymbol{\theta}}_{ML}$. Often it is mathematically easier to maximize $\log\{L(\boldsymbol{\theta})\}$; since the log function is increasing, maximizing $\log\{L(\boldsymbol{\theta})\}$ is equivalent to maximizing $L(\boldsymbol{\theta})$.

Example: In the example above, it is an easy calculus exercise to show that $\hat{\mu}_{ML} = \bar{Y}$. Also, with $\mu$ fixed at its MLE, the MLE of $\sigma^2$ solves
$$0 = \frac{d}{d\sigma^2} \log\{L(\bar{Y}, \sigma^2)\} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (Y_i - \bar{Y})^2.$$
The solution to this equation is
$$\hat{\sigma}^2_{ML} = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})^2.$$
The MLE of $\sigma^2$ has a small bias. The "bias-corrected" MLE is the so-called "sample variance," defined as
$$s^2 := \frac{1}{n - 1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2.$$

In a "textbook example" such as the one above, it is possible to find an explicit formula for the MLE. With more complex models, there is no explicit formula for the MLE. Rather, one writes a program to compute $\log\{L(\boldsymbol{\theta})\}$ for any value of $\boldsymbol{\theta}$ and then uses optimization software to maximize this function numerically. For some models, such as the ARIMA time series models discussed in Chapter 4, there are software packages, e.g., MINITAB and SAS, that compute the MLE; the computation of the log-likelihood function has been pre-programmed.

2.7 Likelihood Ratio Tests

Likelihood ratio tests, like maximum likelihood estimation, are a convenient, all-purpose tool. We will consider likelihood ratio tests when we wish to test a restriction on a subset of the parameters.

Let $\boldsymbol{\theta} = (\boldsymbol{\theta}_1, \boldsymbol{\theta}_2)$ be a partitioning of the parameter vector into two components. Suppose we want to test a hypothesis about $\boldsymbol{\theta}_1$ without making any hypothesis
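about the value of $\boldsymbol{\theta}_2$. For example, we might want to test that a population mean is zero; then $\boldsymbol{\theta}_1 = \mu$ and $\boldsymbol{\theta}_2 = \sigma^2$. Let $\boldsymbol{\theta}_{1,0}$ be the hypothesized value of $\boldsymbol{\theta}_1$, e.g., $\boldsymbol{\theta}_{1,0} = 0$ if we want to test that $\mu$ is zero. Then the hypotheses are
$$H_0: \boldsymbol{\theta}_1 = \boldsymbol{\theta}_{1,0} \quad \text{and} \quad H_1: \boldsymbol{\theta}_1 \neq \boldsymbol{\theta}_{1,0}.$$
For example, if we are testing that $\mu$ is zero then the hypotheses are $H_0: \mu = 0$ and $H_1: \mu \neq 0$.

Let $\hat{\boldsymbol{\theta}}_{ML}$ be the maximum likelihood estimator and let $\hat{\boldsymbol{\theta}}_{2,0}$ be the value of $\boldsymbol{\theta}_2$ that maximizes $L(\boldsymbol{\theta})$ when $\boldsymbol{\theta}_1 = \boldsymbol{\theta}_{1,0}$.

Idea: If $H_0$ is true, then $L(\boldsymbol{\theta}_{1,0}, \hat{\boldsymbol{\theta}}_{2,0})$ should be similar to $L(\hat{\boldsymbol{\theta}}_{ML})$. Otherwise, $L(\boldsymbol{\theta}_{1,0}, \hat{\boldsymbol{\theta}}_{2,0})$ should be smaller than $L(\hat{\boldsymbol{\theta}}_{ML})$.

The likelihood ratio test rejects $H_0$ if
$$2\left[\log\{L(\hat{\boldsymbol{\theta}}_{ML})\} - \log\{L(\boldsymbol{\theta}_{1,0}, \hat{\boldsymbol{\theta}}_{2,0})\}\right] > \chi^2_{\alpha, \dim(\boldsymbol{\theta}_1)}.$$
Here $\dim(\boldsymbol{\theta}_1)$ is the dimension (number of components) of $\boldsymbol{\theta}_1$ and $\chi^2_{\alpha, k}$ is the $\alpha$ upper-probability value of the chi-squared distribution with $k$ degrees of freedom; the probability above $\chi^2_{\alpha, k}$ is $\alpha$.

Example: Suppose again that $Y_1, \ldots, Y_n$ are IID $N(\mu, \sigma^2)$ and $\boldsymbol{\theta} = (\mu, \sigma^2)$. We want to test that $\mu$ is zero. Note that
$$\log(L) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(Y_i - \mu)^2.$$
If we evaluate $\log(L)$ at the MLE, we get
$$\log\{L(\bar{Y}, \hat{\sigma}^2_{ML})\} = -\frac{n}{2}\{1 + \log(2\pi) + \log(\hat{\sigma}^2_{ML})\}.$$
The value of $\sigma^2$ that maximizes $L$ when $\mu = 0$ is
$$\hat{\sigma}^2_0 = \frac{1}{n}\sum_{i=1}^{n} Y_i^2.$$
Therefore,
$$2\left[\log\{L(\bar{Y}, \hat{\sigma}^2_{ML})\} - \log\{L(0, \hat{\sigma}^2_0)\}\right] = n \log\left(\frac{\hat{\sigma}^2_0}{\hat{\sigma}^2_{ML}}\right) = n \log\left(\frac{\sum_{i=1}^{n} Y_i^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}\right).$$
The likelihood ratio test rejects $H_0$ if
$$n \log\left(\frac{\sum_{i=1}^{n} Y_i^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}\right) > \chi^2_{\alpha, 1}.$$

Chapter 3 Returns: 3/12/01

3.1 Prices and returns

Let $P_t$ be the price of an asset at time $t$. Assuming no dividends, the net return is
$$R_t = \frac{P_t}{P_{t-1}} - 1 = \frac{P_t - P_{t-1}}{P_{t-1}}.$$
The simple gross return is
$$\frac{P_t}{P_{t-1}} = 1 + R_t.$$

Example: If $P_t = 2$ and $P_{t+1} = 2.1$ then $1 + R_{t+1} = 1.05$ and $R_{t+1} = .05$.

The gross return over the most recent $k$ periods ($t - k$ to $t$) is
$$1 + R_t(k) := \frac{P_t}{P_{t-k}} = \left(\frac{P_t}{P_{t-1}}\right)\left(\frac{P_{t-1}}{P_{t-2}}\right) \cdots \left(\frac{P_{t-k+1}}{P_{t-k}}\right) = (1 + R_t) \cdots (1 + R_{t-k+1}).$$

Returns are scale-free, meaning that they do not depend on units (dollars, cents, etc.).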
Returns are not unitless. Their unit is time; they depend on the units of $t$ (hour, day, etc.).

Example:

Time        t-2    t-1    t      t+1
P           200    210    206    212
1 + R              1.05   .981   1.03
1 + R(2)                  1.03   1.01
1 + R(3)                         1.06

3.2 Log returns

Continuously compounded returns, also known as "log returns," are
$$r_t := \log(1 + R_t) = \log\left(\frac{P_t}{P_{t-1}}\right) = p_t - p_{t-1},$$
where $p_t := \log(P_t)$.

[Notation: $\log(x)$ will mean the natural logarithm of $x$ throughout these notes. $\log_{10}(x)$ will be used to denote the logarithm to base ten, if it is needed.]

Advantage: simplicity of multiperiod returns.
$$r_t(k) := \log\{1 + R_t(k)\} = \log\{(1 + R_t) \cdots (1 + R_{t-k+1})\} = \log(1 + R_t) + \cdots + \log(1 + R_{t-k+1}) = r_t + r_{t-1} + \cdots + r_{t-k+1}.$$

3.3 Behavior of returns

What can we say about returns?

• They cannot be perfectly predicted, i.e., they are random.
• If we were ancient Greeks, we would think of the returns as determined by the Gods or Fates (three Goddesses of destiny). The Greeks did not seem to realize that random phenomena do exhibit some regularities such as the law of large numbers and the central limit theorem.

Peter Bernstein has written an interesting popular book, "Against the Gods: The Remarkable Story of Risk." He chronicles the developments of probability theory and our understanding of risk. It took a surprisingly long time for probability theory to develop. The ancient Greeks did not have probability theory. Probability arose out of gambling during the Renaissance.

University of Chicago economist Frank Knight (1916 Cornell PhD) distinguishes between

• measurable uncertainty or "risk proper" (e.g., games of chance) where the probabilities are known
• unmeasurable uncertainty (e.g., finance) where the probabilities are unknown

At time $t - 1$, $P_t$ and $R_t$ are not only unknown, but we do not know their probability distributions. However, we can estimate these probability distributions if we are willing to make an assumption.

Leap of Faith: Future returns will be similar to past returns.
More precisely, the probability distribution of $P_t$ can be determined from past data.

With this (big) assumption, we can get somewhere, and we will!

Asset pricing models (e.g., CAPM) use the joint distribution of a cross-section $\{R_{1t}, \ldots, R_{Nt}\}$ of returns on $N$ assets at a single time $t$.

Other models use the time series $\{R_1, R_2, \ldots, R_t\}$ of returns on a single asset.

We will start with a single asset.

3.4 Common Model: IID Normal Returns

Here $R_1, R_2, \ldots$ are the returns from a single asset. A common model is that they are

1. mutually independent
2. identically distributed, i.e., they have the same mean and variance
3. normally distributed

IID = independent and identically distributed

There are (at least) two problems with this model:

• The model implies the possibility of unlimited losses, but liability is usually limited; $R_t \geq -1$ since you can lose no more than your investment.
• $1 + R_t(k) = (1 + R_t)(1 + R_{t-1}) \cdots (1 + R_{t-k+1})$ is not normal: sums of normals are normal but not so with products.

3.5 The Lognormal Model

A second model assumes that the continuously compounded single-period returns, a.k.a. the log returns and denoted by $r_t$, are IID normal. Recall that the log return is
$$r_t = \log(1 + R_t),$$
where $1 + R_t$ is the simple gross return.

Thus, we assume that
$$\log(1 + R_t) \sim N(\mu, \sigma^2),$$
so that $1 + R_t = \exp(r_t) > 0$ and therefore $R_t > -1$. This solves the first problem. Also,
$$1 + R_t(k) = (1 + R_t) \cdots (1 + R_{t-k+1}) = \exp(r_t) \cdots \exp(r_{t-k+1}) = \exp(r_t + \cdots + r_{t-k+1}).$$
Therefore,
$$\log\{1 + R_t(k)\} = r_t + \cdots + r_{t-k+1}.$$
Sums of normals are normal, so the second problem is solved: normality of single-period log returns implies normality of multiple-period log returns.

The lognormal distribution goes back to Louis Bachelier (1900).

• dissertation at the Sorbonne called The Theory of Speculation
• Poincaré: "M. Bachelier has evidenced an original and precise mind [but] the subject is somewhat remote from those our other candidates are in the habit of treating."
• Bachelier was awarded "mention honorable" rather than "mention très honorable"; Bachelier never found a decent academic job.
• Bachelier anticipated Einstein's (1905) theory of Brownian motion.

In 1827, Brown, a Scottish botanist, observed the erratic, unpredictable motion of pollen grains under a microscope.

Einstein (1905) attributed the movement to bombardment by water molecules and developed a mathematical theory giving precise quantitative predictions.

Later, Norbert Wiener, an MIT mathematician, developed a more precise mathematical model of Brownian motion. This model is now called the Wiener process.

[Aside: 1905 was a good year for Einstein. He published:

• the paper introducing special relativity
• a paper on the quantization of light which led to quantum theory (which he never embraced: "God does not play dice with the world")
• the paper on Brownian motion]

Bachelier stated that

• "The mathematical expectation of the speculator is zero" (this is essentially true of short-term speculation but not of long-term investing)
• "It is evident that the present theory solves the majority of problems in the study of speculation by the calculus of probability"

Bachelier's thesis came to light accidentally more than 50 years after he wrote it. Jimmie Savage found a book by Bachelier in the U. Chicago library and asked other economists about it. Paul Samuelson found Bachelier's thesis in the MIT library. The English translation was published in 1964 in The Random Character of Stock Market Prices, an edited volume.

Example: A simple gross return, $(1 + R)$, is lognormal$(0, (.1)^2)$. What is $P(1 + R < .9)$?

Answer: Since $\log(.9) = -.105$,
$$P(1 + R < .9) = P\{\log(1 + R) < \log(.9)\} = \Phi\{(-.105 - 0)/.1\} = \Phi(-1.05) = .1469.$$
In MATLAB, cdfn(-1.05) = .1469.

Example: Assume again that $1 + R$ is lognormal$(0, (.1)^2)$. Find the probability that a simple gross two-period return is less than .9.
Answer: The two-period gross return is lognormal$(0, 2(.1)^2)$ so this probability is
$$\Phi\left\{\frac{\log(.9)}{\sqrt{2}\,(.1)}\right\} = \Phi(-.745) = .2281.$$

Let's find a general formula for the $k$-period returns. Assume that

• $1 + R_t(k) = (1 + R_t) \cdots (1 + R_{t-k+1})$.
• $\log(1 + R_i) \sim N(\mu, \sigma^2)$ for all $i$.
• The $\{R_i\}$ are mutually independent.

Then $\log\{1 + R_t(k)\}$ is the sum of $k$ independent $N(\mu, \sigma^2)$ random variables, so that $\log\{1 + R_t(k)\} \sim N(k\mu, k\sigma^2)$. Therefore,
$$P\{1 + R_t(k) < x\} = \Phi\left\{\frac{\log(x) - k\mu}{\sqrt{k}\,\sigma}\right\}.$$

3.6 Random Walk

Let $Z_1, Z_2, \ldots$ be IID with mean $\mu$ and standard deviation $\sigma$. $Z_0$ is an arbitrary starting point. Let $S_0 = Z_0$ and
$$S_t := Z_0 + Z_1 + \cdots + Z_t, \quad t \geq 1.$$
$S_0, S_1, \ldots$ is called a random walk. We have $E(S_t|Z_0) = Z_0 + \mu t$ and $\mathrm{Var}(S_t|Z_0) = \sigma^2 t$.

Figure 3.1: Mean and probability bounds on a random walk with $S_0 = 0$, $\mu = .5$ and $\sigma = 1$. At any given time, the probability of being between the probability bounds (dashed curves) is 68%.

3.6.1 Geometric Random Walk

Recall that $\log\{1 + R_t(k)\} = r_t + \cdots + r_{t-k+1}$. Therefore
$$\frac{P_t}{P_{t-k}} = 1 + R_t(k) = \exp(r_t + \cdots + r_{t-k+1}),$$
so taking $k = t$ we have
$$P_t = P_0 \exp(r_t + r_{t-1} + \cdots + r_1).$$
Conclusion: If the log returns are IID normals, then the process $\{P_t : t = 1, 2, \ldots\}$ is the exponential of a random walk. We call such a process a "geometric random walk."

If $r = \log(1 + R)$ is $N(\mu, \sigma^2)$, then the median of $R$ is $\exp(\mu) - 1$ since
$$P\{R < \exp(\mu) - 1\} = P\{1 + R < \exp(\mu)\} = P(r < \mu) = P\{N(\mu, \sigma^2) < \mu\} = \frac{1}{2}.$$

Figure 3.2: Lognormal densities.

3.7 Are log returns really normally distributed?

There are several ways to check whether log returns are really normally distributed. One way is to look at a normal probability plot of the log returns to see if the plot is approximately a straight line.
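The coordinates of a normal probability plot are simple to compute by hand. The sketch below is an illustration rather than the procedure used by any particular package: the plotting positions $(i + .5)/n$ are one common convention (an assumption here), the function names are hypothetical, and the "log returns" are simulated normal draws, not real data.

```python
import random
from math import sqrt
from statistics import NormalDist


def normal_plot_points(data):
    """Pair each ordered observation with the matching standard normal quantile."""
    n = len(data)
    std_normal = NormalDist()
    # Plotting positions (i + 0.5)/n for i = 0, ..., n-1 (an assumed convention).
    return [(std_normal.inv_cdf((i + 0.5) / n), x)
            for i, x in enumerate(sorted(data))]


def plot_line_correlation(pairs):
    """Pearson correlation of the plotted points; near 1 means nearly straight."""
    n = len(pairs)
    mean_q = sum(q for q, _ in pairs) / n
    mean_x = sum(x for _, x in pairs) / n
    s_qx = sum((q - mean_q) * (x - mean_x) for q, x in pairs)
    s_qq = sum((q - mean_q) ** 2 for q, _ in pairs)
    s_xx = sum((x - mean_x) ** 2 for _, x in pairs)
    return s_qx / sqrt(s_qq * s_xx)


# Simulated stand-in for about a year of daily log returns (not real data).
rng = random.Random(1)
log_returns = [rng.gauss(0.0, 0.02) for _ in range(250)]
points = normal_plot_points(log_returns)
corr = plot_line_correlation(points)
print(round(corr, 3))
```

Plotting the second coordinate against the first gives the normal plot. For normally distributed data the points fall close to a line and the correlation is close to 1; heavy tails show up as curvature at both ends.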
Another method is to look at the sample skewness and kurtosis of the log returns and to check if their values are near those of the normal distribution; any normal distribution has a skewness coefficient of 0 and a kurtosis of 3.

Suppose we have a time series of log returns, $r_1, \ldots, r_t, \ldots, r_T$, on some asset, with sample mean $\hat{\mu}$ and sample standard deviation $\hat{\sigma}$. The sample skewness, denoted by $\hat{S}$, is
$$\hat{S} = \frac{1}{T} \sum_{t=1}^{T} \left(\frac{r_t - \hat{\mu}}{\hat{\sigma}}\right)^3.$$
The sample kurtosis is
$$\hat{K} = \frac{1}{T} \sum_{t=1}^{T} \left(\frac{r_t - \hat{\mu}}{\hat{\sigma}}\right)^4.$$
The "excess kurtosis" is $\hat{K} - 3$. Both the sample skewness and the excess kurtosis should be near 0 if the log returns are normally distributed.

Table 1.1 of Campbell et al. gives $\hat{S}$ and $\hat{K} - 3$ for several market indices and common stocks. In that table, $\hat{S}$ is generally close to zero, which indicates that log returns are not very skewed. However, the excess kurtosis is typically rather large for daily returns, and positive though not as large for monthly returns. By the CLT, the distribution of log returns over longer periods should approach the normal distribution. Therefore, the smaller excess kurtosis for monthly log returns, in contrast to daily log returns, is expected. The large kurtosis of daily returns indicates that they are "heavy-tailed."

Normal probability plots can be supplemented by tests of normality based on the sample CDF, $\hat{F}$. $\hat{F}(x)$ is defined to be the proportion of the sample that is less than or equal to $x$; if 10 out of 40 data points are 3 or less then $\hat{F}(3) = .25$. Normality is tested by comparing the sample CDF with the normal CDF with mean and variance equal to the sample mean and variance, i.e., we compare $\hat{F}(x)$ with $\Phi\{(x - \bar{x})/s\}$.

Three common tests of normality that compare the sample CDF with the normal CDF are the Anderson-Darling test, the Shapiro-Wilk test, and the Kolmogorov-Smirnov test. All three are available in MINITAB. Actually, MINITAB uses the Ryan-Joiner test, which is close to the Shapiro-Wilk test. In MINITAB, go to "Stat," then "Basic Statistics," and then "Normality test."
You will need to choose one of the three tests. The output is a normal plot plus the results of the test. You can re-run the procedure to run the other tests.

The Kolmogorov-Smirnov test is based on the maximum distance between the sample CDF and the normal CDF. The Shapiro-Wilk test is closely tied to the normal probability plot, since it is based on the correlation between the normal quantiles and the sample quantiles. The correlation measures how close the normal plot is to being a straight line.

CDF's and quantiles are closely related. In fact, quantiles are given by the inverse of the CDF; if a random variable $X$ has CDF $F$, then the $p$th quantile of $X$ is $F^{-1}(p)$ since
$$P\{X \le F^{-1}(p)\} = F\{F^{-1}(p)\} = p.$$

Let's look at daily returns for GE common stock from December 1999 to December 2000. The daily price $P_t$ is taken to be the average of the high and the low for the day. It might have been better to use the closing price for each day. Why? As can be seen in Figure 3.3, the net returns $R$ and the log returns $r$ are very similar. A normal plot is roughly linear. The log returns have a sample mean, standard deviation, skewness, and excess kurtosis of .00014, .0176, -.094, and .094, respectively. The values of the sample skewness and excess kurtosis suggest that the log returns are approximately normally distributed. From MINITAB, the Kolmogorov-Smirnov, Anderson-Darling, and Ryan-Joiner tests of normality have p-values of .15, .40, and .10, respectively. Since each p-value exceeds .05, each test would accept the null hypothesis of normality at $\alpha = .05$.

3.7.1 Do the GE daily returns look like a geometric random walk?

Figure 3.4 shows five independent simulated geometric random walks with the same parameters as the GE daily log returns. Note that the geometric random walks seem to have "patterns" and "momentum" even though they do not. The GE log returns look similar to the geometric random walks.
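A simulation of this kind can be sketched in a few lines. The example below is only an illustration, not the code behind Figure 3.4: it draws IID normal log returns with the GE sample mean and standard deviation quoted above, exponentiates their running sum to get prices, and then computes the sample skewness and excess kurtosis of the simulated log returns. The starting price of 50, the seed, and the use of divisor $T$ in the moment estimates are assumptions.

```python
import math
import random


def geometric_random_walk(p0, mu, sigma, n, rng):
    """Simulate n IID N(mu, sigma^2) log returns and the implied prices."""
    prices = [p0]
    log_returns = []
    for _ in range(n):
        r = rng.gauss(mu, sigma)
        log_returns.append(r)
        prices.append(prices[-1] * math.exp(r))  # P_t = P_{t-1} * exp(r_t)
    return prices, log_returns


def skewness_and_excess_kurtosis(r):
    """Sample skewness and excess kurtosis, using divisor T (an assumption)."""
    T = len(r)
    mean = sum(r) / T
    sd = math.sqrt(sum((x - mean) ** 2 for x in r) / T)
    skew = sum(((x - mean) / sd) ** 3 for x in r) / T
    kurt = sum(((x - mean) / sd) ** 4 for x in r) / T
    return skew, kurt - 3.0


rng = random.Random(473)
# mu and sigma are the GE sample values from the text; p0 = 50 is hypothetical.
prices, log_returns = geometric_random_walk(
    p0=50.0, mu=0.00014, sigma=0.0176, n=252, rng=rng)
skew, excess_kurt = skewness_and_excess_kurtosis(log_returns)
print(f"final price: {prices[-1]:.2f}")
print(f"skewness: {skew:.3f}, excess kurtosis: {excess_kurt:.3f}")
```

Because the simulated log returns are normal by construction, both statistics should come out near 0, while the price path stays positive however far it wanders.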
It is somewhat difficult to distinguish between a random walk and a geometric random walk. Figure 3.5 shows three independent simulated time series. For each pair, the log price series (a random walk) is plotted on the left while the price series (a geometric random walk) is plotted on the right. Note the subtle differences between the prices and the log prices.

We prefer the geometric random walk model to the random walk model, because the geometric random walk model is more realistic: the geometric random walk implies non-negative prices and net returns that are at least -1.

This graphical comparison of GE prices to geometric random walks is not strong evidence in favor of the geometric random walk hypothesis. This hypothesis implies that the log returns are mutually independent and, therefore, uncorrelated. Therefore we should check for evidence that the log returns are correlated. If we find no such evidence, then we have more reason to believe the geometric random walk hypothesis.

Figure 3.3: GE daily returns, 12/17/99 to 12/15/00. The first plot is the prices. The second and third are the net returns and the log returns. The fourth plot is a normal probability plot of the log returns. The final plot is of the absolute log returns; there is a scatterplot smooth to help show whether the volatility is constant.

Figure 3.4: Five independent geometric random walks and GE daily log returns. The geometric random walks have the same expected log return, volatility, and starting point as the GE log returns.

Figure 3.5: Three independent simulated price series. On left: log prices. On right: prices.

3.8 Portrait of an econometrician, Eugene Fama

This material is taken from Chapter 7 of Capital Ideas by Peter Bernstein.

Fama was born in 1939 in Boston, majored in French at Tufts, and was an outstanding student-athlete.

In college, Fama earned extra money working for Harry Ernst, who published a stock market newsletter:

• Fama's job was to find workable buy and sell signals.
• Ernst believed that trends, once in place, would continue because of "price momentum."
• Bernstein writes that "Fama's efforts to develop profitable trading rules were by no means unsuccessful" but "the ones he found worked only on the old data, not on the new."
  - like many other investors, Fama found that rules that worked well on "backtests" couldn't beat the market when applied in real time.
  - the market environment would shift or too many people would be using the same strategy

Fama decided to go to business school to learn what was really going on.

• 1964 doctorate at University of Chicago.
• he thought of going to Harvard but was told that he was "more intellectual than the typical Harvard type"

Fama stayed at Chicago where he taught finance.

• scholars at Chicago were keenly interested in collecting facts (empirical research!)
• at Chicago, James Lorie and Lawrence Fisher were demonstrating what the computer could offer to economic research
  - 1964: Lorie and Fisher published a "bombshell": $1000 invested in 1926 would grow to almost $30,000 in 1960, a growth of over 9% a year (log(30)/35 = .097)
    * Remember: 1929 was the great crash and the ensuing great depression lasted until the US entered WW II in the 40's. This was not exactly a favorable time for investing.
    * These findings increased the interest in stocks as long-term investments
• 1965: Fama published "The Behavior of Stock Market Prices" (his thesis) in Journal of Business.
  - a less technical version was published in 1966 as "Random Walks in Stock Market Prices" in Financial Analysts Journal.
  - the less technical version was reprinted in Institutional Investor.

Fama's first target was "technical analysis" as practiced by so-called "chartists."

  - technical analysts believe that future prices can be predicted from past patterns
  - Charting stock prices was once fashionable
    * I remember as a young child my grandmother explaining to me how to chart stock prices.
  - Fama: "The chartist must admit that the evidence in favor of the random walk model is both consistent and voluminous, whereas there is precious little published in discussion of rigorous empirical test of various technical theories."

Fama's next target was "fundamental analysis" as practiced by securities analysts.

  - Fundamental analysts examine accounting data, interview management, and look at economic forecasts, interest rates, and political trends.
  - Selecting stocks by fundamental analysis seems to do no better than using a dartboard
  - Of course, good management, favorable economic trends, etc. influence the prices of assets, but Fama claimed that this information is already fully reflected in stock prices by the time we learn it; markets react instantaneously to information.
  - Security analysis is essential in order for stocks to be priced correctly, but ironically it means that there are few discrepancies between actual prices and the values of stocks
  - William Sharpe discussed the antagonism of professional investors to the random walk theories of Fama and other academics. He stated that "Interestingly, professional economists seem to think more highly of professional investors than do other professional investors." (Later we will learn more about Sharpe, the economist who developed the CAPM and winner of the Nobel Prize.)

3.9 Other empirical work related to Fama's

Fama's work was preceded by that of other researchers.

• In 1933 Alfred Cowles published "Can stock market forecasters forecast?" The three-word abstract stated "It is doubtful." The article appeared in the brand-new journal Econometrica. Econometrica is now the leading journal in econometrics.
  - Cowles analyzed the track records of:
    * 16 leading financial services that furnished their subscribers with selected lists of common stocks
    * purchases and sales of stock by 20 leading fire insurance companies
    * 24 publications by financial services, financial weeklies, bank letters, etc.
    * editorials in The Wall Street Journal by William Peter Hamilton, an expounder of the "Dow Theory" due to Charles Dow (the Dow of Dow-Jones). Dow compared stock prices to tides and ocean waves; the tides were a way to explain "price momentum."
  - Cowles found that only 6 of 16 financial services had achieved any measure of success
    * even the best record could not be definitely attributed to skill rather than luck (one needs statistical analysis to reach such a conclusion)
  - In 1944, Cowles published a new study with basically the same conclusions.
• In 1936, Holbrook Working published a paper in The Journal of the American Statistical Association on commodity prices.
  - These were once believed to have rhythms and trends.
  - Working found that he could not distinguish the price changes from an independent sequence of random changes.
  - Perturbed, Working took his data to professional commodity traders.
    * He also showed them graphs of random series.
    * The professionals could not distinguish the random series from real commodity prices.
    * Of course, Working's study does not prove anything about stock returns, but it is an interesting example of a financial time series where momentum was thought to exist, but where no evidence of momentum was found in a statistical analysis.
• Maurice Kendall published the paper "The analysis of economic time series" in the Journal of the Royal Statistical Society in 1953.
  - Kendall wrote "the patterns of events in the price series was much less systematic than is generally believed," and
  - "Investors can, perhaps, make money on the Stock Exchange, but not, apparently by watching price movements and coming in on what looks like a good thing ... But it is unlikely that anything I say or demonstrate will destroy the illusion that the outside investor can make money by playing the markets, so let us leave him to his own devices."

There is no question as to whether one can make money in the stock market. Over the long haul, stocks outperform bonds which outperform savings accounts. The question is rather whether anyone can "beat the market."

3.10 Technical Analysis

"A Random Walk Down Wall Street" was written by Burton G.
Malkiel, a professor of economics at Princeton. It is a perennial best seller and has been revised several times. It contains much sensible advice for the small investor. This book is also quite humorous, and the discussion of technical analysts is particularly amusing (unless you are a technical analyst). Malkiel writes

    I, personally, have never known a successful technician, but I have seen the wrecks of several unsuccessful ones. (This is, of course, in terms of following their own technical advice. Commissions from urging customers to act on their recommendations are very lucrative.)

Malkiel describes many of the technical theories, including the Dow Theory, the Filter System, and the Relative-Strength system, which advises buying stocks that have done well recently. There is also the hemline theory, which predicts price changes by the lengths of women's dresses, and the super bowl indicator, which says that "a victory by an NFL team predicts a bull market, whereas a victory by a former AFL team is bad news for stock-market investors."

There is also the odd-lot theory. It is based on the impeccable logic that a person who is always wrong is a reliable source of information: just negate whatever that person says. The belief is that the odd-lot trader is precisely that sort of person. It turns out that the odd-lotter isn't such a dolt after all.

Human nature seems to find randomness very hard to accept. For example, sports fans have many theories of streaks in athletics, e.g., the "hot hand" theory of basketball. Extensive testing of basketball players' performances has shown no evidence of streaks beyond what would be expected by pure chance. The point is that streaks will occur by chance, but you cannot make money on the basis of random streaks since you cannot predict if they will continue.

Why are technicians hired? Malkiel has the skeptical view that it is because their theories recommend a lot of trading.
"The technicians do not help produce yachts for the customers, but they do help generate the trading that provides yachts for the brokers." 3.11 Fundamental Analysis The practitioners of fundamental analysis are called security analysts. Their job is basically to predict future earnings of companies, since it is future earnings that ultimately drive prices. Although few on Wall Street still have much faith in technical analysis, there is much faith in fundamental analysis. However, some academics studying the financial markets data have come to the conclusion that security analysts can do no better than blindfolded monkeys that throw darts at the Wall Street Journal. 36 CHAPTER 3. RETURNS: 3/12/01 3.12 Efficient Markets Hypothesis (EMH) As evidence accumulated that stock price fluctuated like random walks, economists sought a theory as to why that would be so. In 1965 Paul Samuelson published a paper "Proof that properly anticipated prices fluctuate randomly." The idea is that random walk behavior is due to the very efficiency of the market. • A market is information efficient if prices "fully reflect" available information • A market is "efficient with respect to an information set" if prices would be unchanged by revealing that information to all participants - this implies that it is impossible to make economic profits by trading on the basis of this information set • This last idea is the key to testing (empirically) the EMH. 3.12.1 Three types of efficiency weak-form efficiency the information set includes only the history of prices or returns semi-strong efficiency the information set includes all information that is publically available strong-form efficiency the information set includes all information known to any market participant Weak-form efficiency =>■ technical analysis will not make money Semistrong-form efficiency =>• fundamental analysis will not help the average investor 3.12. 
EFFICIENT MARKETS HYPOTHESIS (EMH) 37 3.12.2 Testing market efficiency The research of Fama, Cowles, Working, and Kendall just described tests the various forms of the EMH. Cowles's work supports the semi-strong and perhaps the strong form of the EMH. In their book Investments, Bodie, Kane, and Marcus discuss some of the issues involved when testing the EMH. One is the magnitude issue. No one believes that markets are perfectly efficient. The small inefficiencies might be important to the manager of a large portfolio. If one is managing a $5 billion portfolio, beating the market by .1% results in a $5 million increase in profit. This is clearly worth achieving. Yet, no statistical test is likely to undercover a .1% inefficiency amidst typical market fluctuations. The S&P 500 index has a 20% standard deviation in annual returns. Another issue is selection bias. If there is someone who can consistently beat the market, they probably are keeping that a secret. We can only test market efficiency by testing methods of technical or fundamental analysis that are publicized. These may be the ones that don't reveal market inefficiencies. Another problem is that for any time periods, by chance there will be some investment managers that consistently beat the the market. • if 2,000 people each toss a coin 10 times, it is likely that at least one will get 10 heads since 2,000*2"10 = 1.95. Using the Poisson approximation to the binomial, the probability that no one tosses 10 heads is exp(—1.95) = .14. If some does toss 10 heads, it would be a mistake to say that that person has skill in tossing heads. • Peter Lynch's Magellan Fund outperformed the S&P 500 in 11 of 13 years ending in 1989. Was Lynch a skilled investment manager or 38 CHAPTER 3. RETURNS: 3/12/01 just lucky? (If he really was skilled, then this is evidence against the semi-strong form of the EMH.) 
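The coin-tossing arithmetic above is easy to check directly. The short sketch below computes the expected number of people with 10 heads, the exact binomial probability that nobody gets 10 heads, and the Poisson approximation used in the text; the variable names are chosen only for this example.

```python
import math

n_people = 2000
p_ten_heads = 0.5 ** 10          # probability one person tosses 10 heads in a row

# Expected number of people who toss 10 heads.
expected_count = n_people * p_ten_heads

# Exact probability that none of the 2,000 people tosses 10 heads.
p_none_exact = (1.0 - p_ten_heads) ** n_people

# Poisson approximation to the same probability.
p_none_poisson = math.exp(-expected_count)

print(f"expected count:          {expected_count:.3f}")
print(f"P(no one), exact:        {p_none_exact:.4f}")
print(f"P(no one), Poisson:      {p_none_poisson:.4f}")
print(f"P(at least one), exact:  {1.0 - p_none_exact:.4f}")
```

The exact and approximate values agree to about three decimal places, so the chance that at least one "skilled" coin tosser appears purely by luck is roughly 86%.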
Campbell, Lo, and MacKinlay and Bodie, Kane, and Marcus discuss much of the empirical literature on testing the EMH and give references to the original studies. Fama has written a review article:

Fama, E. (1970), "Efficient Capital Markets: A Review of Theory and Empirical Work," Journal of Finance, 25, 383-417.

There is a sequel as well:

Fama, E. (1991), "Efficient Capital Markets: II," Journal of Finance, 46, 1575-1618.

The Journal of Finance, as well as many other journals in economics and finance, is available online at JStor:

http://www.jstor.org/cgi-bin/jstor/listjournal

However, the most recent five years of these journals are not available online.

Good course project: Read one or more of the studies of the EMH and prepare a report summarizing the work. The two review articles by Fama could help you find studies that would interest you. Using some new financial markets data, try to replicate some of the original work.

3.13 Summary

Let $P_t$ be the price of an asset at time $t$. Then $P_t/P_{t-1}$ is the simple gross return and $R_t = P_t/P_{t-1} - 1$ is the simple net return. ("Simple" means one period.) The gross return over the last $k$ periods is $1 + R_t(k) = P_t/P_{t-k}$. Let $p_t = \log(P_t)$. The (one-period) log return is $r_t = p_t - p_{t-1}$. $R_t \approx r_t$.

Prices are often modeled as geometric random walks. This model implies that log returns are mutually independent; one cannot predict future returns from past returns. The model also implies that the gross return $1 + R_t$ is lognormally distributed.

Empirical research by Eugene Fama, Alfred Cowles, Holbrook Working, Maurice Kendall, and other econometricians supports the geometric random walk model.

The geometric random walk suggests the efficient market hypothesis (EMH), which states that all valuable information is reflected in the market prices; price changes occur because of unanticipated new information.

There are three forms of the EMH: the weak form, the semi-strong form, and the strong form.

Chapter 4

Univariate Time Series Models: 3/12/01

4.1 Time Series

A univariate time series is a sequence of observations taken over time, for example, a sequence of daily returns on a stock. A multivariate time series is a sequence of vectors of observations taken over time, for example, the sequence of vectors of returns on a fixed set of stocks. In this chapter, we will study statistical models for univariate time series. These models are widely used in econometrics as well as in other business and OR applications. For example, time series models are routinely used to model the output of simulations.

4.2 Stationary Processes

A process is stationary if its behavior is unchanged by shifts in time. More precisely, $X_1, X_2, \ldots$ is a weakly stationary process if

• $E(X_i) = \mu$ (a constant) for all $i$
• $\mathrm{Var}(X_i) = \sigma^2$ (a constant) for all $i$
• $\mathrm{Corr}(X_i, X_j) = \rho(|i - j|)$ for all $i$ and $j$

Thus, the mean and variance do not change with time and the correlation between two observations depends only on the time distance between them. For example, if the process is stationary then the correlation between $X_2$ and $X_5$ is the same as the correlation between $X_7$ and $X_{10}$, since each pair is separated by three units of time.

$\rho$ is called the correlation function of the process. Note that $\rho(h) = \rho(-h)$. Why?

The covariance between $X_t$ and $X_{t+h}$ is denoted by $\gamma(h)$. $\gamma(\cdot)$ is called the autocovariance function. Note that $\gamma(h) = \sigma^2 \rho(h)$ and that $\gamma(0) = \sigma^2$ since $\rho(0) = 1$.

4.2.1 Weak White Noise

White noise is the simplest example of a stationary process. $X_1, X_2, \ldots$ is a WN$(0, \sigma^2)$ process (weak white noise process) if

• $E(X_i) = 0$ for all $i$
• $\mathrm{Var}(X_i) = \sigma^2$ (a constant) for all $i$
• $\mathrm{Corr}(X_i, X_j) = 0$ for all $i \ne j$

If, in addition, $X_1, X_2, \ldots$ are independent normal random variables, then the process is called a Gaussian white noise process. (The normal distribution is sometimes called the Gaussian distribution.)
A weak white noise process is weakly stationary with
$$\rho(0) = 1, \qquad \rho(t) = 0 \text{ if } t \ne 0.$$

Properties of Gaussian white noise

$$E(X_{i+t} \mid X_1, \ldots, X_i) = 0 \text{ for all } t \ge 1.$$
(You cannot predict the future, because the future is independent of the past and present.)

To us, "white noise" will mean weak white noise, which includes Gaussian white noise as a special case. White noise (either weak or Gaussian) is uninteresting in itself but is the building block of important time series models used for economic data.

4.2.2 Estimating parameters of a stationary process

Suppose we observe $y_1, \ldots, y_n$ from a stationary process. To estimate the mean $\mu$ and variance $\sigma^2$ of the process we use the sample mean $\bar{y}$ and sample variance $s^2$. To estimate the autocovariance function we use
$$\hat{\gamma}(h) = n^{-1} \sum_{j=1}^{n-h} (y_{j+h} - \bar{y})(y_j - \bar{y}).$$
To estimate $\rho(\cdot)$ we use the sample autocorrelation function (SACF) defined as
$$\hat{\rho}(h) = \frac{\hat{\gamma}(h)}{\hat{\gamma}(0)}.$$

4.3 AR(1) processes

Let $\epsilon_1, \epsilon_2, \ldots$ be WN$(0, \sigma_\epsilon^2)$. We say that $y_1, y_2, \ldots$ is an AR(1) process if for some constant parameters $\mu$ and $\phi$
$$y_t - \mu = \phi(y_{t-1} - \mu) + \epsilon_t \tag{4.1}$$
for all $t$. If $|\phi| < 1$, then $y_1, y_2, \ldots$ is a weakly stationary process. Its mean is $\mu$. Simple algebra shows that (4.1) can be rewritten as
$$y_t = (1 - \phi)\mu + \phi y_{t-1} + \epsilon_t. \tag{4.2}$$

Remember the linear regression model, $y_t = \beta_0 + \beta_1 x_t + \epsilon_t$, from your statistics courses. (4.2) is just a linear regression model with $\beta_0 = (1 - \phi)\mu$ and $\beta_1 = \phi$. If it is assumed that $\mu = 0$, then $\beta_0 = 0$ as well. Linear regression with $\beta_0 = 0$ is the "linear regression through the origin" model. The term autoregression refers to the regression of the process on its own past values.

When $|\phi| < 1$ (stationarity), then

1. $E(y_t) = \mu$ for all $t$.
2. $\gamma(0) = \mathrm{Var}(y_t) = \dfrac{\sigma_\epsilon^2}{1 - \phi^2}$ for all $t$.
3. $\gamma(h) = \mathrm{Cov}(y_t, y_{t+h}) = \dfrac{\sigma_\epsilon^2 \phi^{|h|}}{1 - \phi^2}$ for all $t$.
4. $\rho(h) = \mathrm{Corr}(y_t, y_{t+h}) = \phi^{|h|}$ for all $t$.
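The four properties can be checked numerically. The sketch below simulates a long stationary AR(1) series and compares the sample autocovariance and SACF from Section 4.2.2 with $\sigma_\epsilon^2/(1 - \phi^2)$ and $\phi^{|h|}$. The parameter values ($\phi = 0.6$, $\sigma_\epsilon = 1$, $\mu = 0$), the burn-in length, the seed, and the helper names are all assumptions made for illustration.

```python
import random


def simulate_ar1(mu, phi, sigma, n, rng, burn_in=500):
    """Simulate y_t - mu = phi * (y_{t-1} - mu) + e_t with Gaussian white noise.

    The first burn_in values are discarded so the arbitrary start has little effect.
    """
    y = mu
    series = []
    for t in range(n + burn_in):
        y = mu + phi * (y - mu) + rng.gauss(0.0, sigma)
        if t >= burn_in:
            series.append(y)
    return series


def sample_autocovariance(y, h):
    """gamma_hat(h) = (1/n) * sum_{j=1}^{n-h} (y_{j+h} - ybar)(y_j - ybar)."""
    n = len(y)
    ybar = sum(y) / n
    return sum((y[j + h] - ybar) * (y[j] - ybar) for j in range(n - h)) / n


def sample_acf(y, h):
    """rho_hat(h) = gamma_hat(h) / gamma_hat(0)."""
    return sample_autocovariance(y, h) / sample_autocovariance(y, 0)


rng = random.Random(1)
mu, phi, sigma = 0.0, 0.6, 1.0            # hypothetical parameter values
y = simulate_ar1(mu, phi, sigma, n=20000, rng=rng)

gamma0_theory = sigma ** 2 / (1.0 - phi ** 2)
print(f"gamma(0): theory {gamma0_theory:.4f}, sample {sample_autocovariance(y, 0):.4f}")
for h in (1, 2, 3):
    print(f"rho({h}):   theory {phi ** h:.4f}, sample {sample_acf(y, h):.4f}")
```

With a series this long the sample values should sit close to the theoretical ones; with only a few hundred observations the agreement is noticeably rougher.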
It is important to remember that these formulas hold only if $|\phi| < 1$. If $|\phi| \ge 1$, then the AR(1) process is nonstationary, and the mean, variance, and correlation are not constant.

These formulas can be proved using (4.3). For example,
$$\mathrm{Var}(y_t) = \mathrm{Var}\left( \sum_{h=0}^{\infty} \phi^h \epsilon_{t-h} \right) = \sigma_\epsilon^2 \sum_{h=0}^{\infty} \phi^{2h} = \frac{\sigma_\epsilon^2}{1 - \phi^2}.$$
Also, for $h > 0$,
$$\mathrm{Cov}\left( \sum_{i=0}^{\infty} \epsilon_{t-i}\, \phi^i,\ \sum_{j=0}^{\infty} \epsilon_{t+h-j}\, \phi^j \right) = \frac{\sigma_\epsilon^2 \phi^{|h|}}{1 - \phi^2}.$$

Be sure to distinguish between $\sigma_\epsilon^2$, which is the variance of the stationary white noise process $\epsilon_1, \epsilon_2, \ldots$, and $\gamma(0)$, which is the variance of the AR(1) process $y_1, y_2, \ldots$. We can see from the result above that $\gamma(0)$ is bigger than $\sigma_\epsilon^2$ unless $\phi = 0$, in which case $y_t = \mu + \epsilon_t$.

4.3.2 Nonstationary AR(1) processes

Random Walk

If $\phi = 1$ then
$$y_t = y_{t-1} + \epsilon_t$$
and the process is not stationary. This is the random walk process we saw in Chapter 3. It is easy to see that
$$y_t = y_0 + \epsilon_1 + \cdots + \epsilon_t.$$

Suppose we start the process at an arbitrary point $y_0$. Then $E(y_t \mid y_0) = y_0$ for all $t$, which is constant but depends entirely on the arbitrary starting point. Moreover, $\mathrm{Var}(y_t \mid y_0) = t\sigma_\epsilon^2$, which is not stationary but rather increases linearly with time. The increasing variance makes the random walk "wander."

AR(1) processes when $|\phi| > 1$

When $|\phi| > 1$, an AR(1) process has explosive behavior. This can be seen in Figure 4.1. This figure shows simulations of 200 observations from AR(1) processes with various values of $\phi$. The explosive case where $\phi = 1.02$ clearly is different from the other cases where $|\phi| \le 1$. However, the case where $\phi = 1$ is not that much different from $\phi = .9$ even though the former is nonstationary while the latter is stationary.

The ability to distinguish the three types of AR(1) processes (stationary, random walk, and explosive) depends on the length of the observed series. For a short AR(1) series, it is very difficult to tell if the process is stationary, random walk, or explosive. For example, in Figure 4.2, we see 30 observations from processes with the same parameter values as in Figure 4.1.
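Simulations along the lines of Figures 4.1 and 4.2 can be reproduced in outline as follows. This is a minimal sketch, not the code used for the figures: it drives AR(1) recursions for several values of $\phi$ with one shared white noise sequence, starting from $y_0 = 0$ with $\mu = 0$. The seed, the function name, and the unit noise variance are assumptions.

```python
import random


def ar1_path(phi, noise, y0=0.0, mu=0.0):
    """Run y_t = mu + phi * (y_{t-1} - mu) + e_t over a given noise sequence."""
    y = y0
    path = []
    for e in noise:
        y = mu + phi * (y - mu) + e
        path.append(y)
    return path


rng = random.Random(2001)
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]   # shared by all six processes

phis = [0.9, 0.6, 0.2, -0.9, 1.0, 1.02]
paths = {phi: ar1_path(phi, noise) for phi in phis}

for phi in phis:
    full = paths[phi]
    short = full[:30]   # a short stretch, as in a 30-observation plot
    print(f"phi = {phi:5.2f}: max |y_t| over 30 obs = {max(abs(v) for v in short):7.2f}, "
          f"over 200 obs = {max(abs(v) for v in full):7.2f}")
```

Plotting `paths[phi]` against time for each value of `phi` gives panels of the same kind as in the figures; with $\phi = 1$ the path is simply the running sum of the noise.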
If we observe the AR processes for longer than 200 observations, then the behavior of the $\phi = .9$ and $\phi = 1$ processes would not look as similar as in Figure 4.1. For example, in Figure 4.3 there are 1,000 observations from each of the processes. Now the processes with $\phi = .9$ and $\phi = 1$ look dissimilar. The stationary process ($\phi = .9$) continues to return to its mean of zero. The random walk ($\phi = 1$) wanders without tending to return to zero.

Suppose an explosive AR(1) process starts at $y_0$ and has $\mu = 0$. Then
$$y_t = \phi y_{t-1} + \epsilon_t = \phi(\phi y_{t-2} + \epsilon_{t-1}) + \epsilon_t = \cdots = \epsilon_t + \phi\epsilon_{t-1} + \phi^2\epsilon_{t-2} + \cdots + \phi^{t-1}\epsilon_1 + \phi^t y_0.$$
Therefore, for a fixed starting value $y_0$,
$$\mathrm{Var}(y_t) = \sigma_\epsilon^2 (1 + \phi^2 + \phi^4 + \cdots + \phi^{2(t-1)}) = \sigma_\epsilon^2\, \frac{\phi^{2t} - 1}{\phi^2 - 1} \to \infty \text{ as } t \to \infty.$$
Explosive AR processes are not widely used in econometrics since economic growth is usually not explosive, though these processes may serve as good models of rabbit populations.

Figure 4.1: Simulations of 200 observations from AR(1) processes with various values of $\phi$ and $\mu = 0$. The six panels are $\phi = 0.9$, $0.6$, $0.2$, $-0.9$, $1$, and $1.02$. The white noise "residual" or "error" process $\epsilon_1, \epsilon_2, \ldots$ is the same for all six AR(1) processes.

Figure 4.2: Simulations of 30 observations from AR(1) processes with various values of $\phi$ and $\mu = 0$. The six panels are $\phi = 0.9$, $0.6$, $0.2$, $-0.9$, $1$, and $1.02$. The white noise "residual" or "error" process $\epsilon_1, \epsilon_2, \ldots$ is the same for all six AR(1) processes.

Figure 4.3: Simulations of 1000 observations from AR(1) processes with various values of $\phi$ and $\mu = 0$. The six panels are $\phi = 0.9$, $0.6$, $0.2$, $-0.9$, $1$, and $1.02$. The white noise "residual" or "error" process $\epsilon_1, \epsilon_2, \ldots$ is the same for all six AR(1) processes.

4.3.3 Estimation

Depending upon the application, one will want to fit an AR(1) to either one of the variables in the raw data or a variable that has been constructed from the raw data. In finance applications, one often has the prices as the raw data but wants to fit an AR(1) to the log returns. To create the log returns, one first log-transforms the prices and then differences the log prices.
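The two steps, log-transforming and then differencing, look like this in code. The sketch is only an illustration: the four prices are made up, and `log_returns` is a helper name chosen for this example.

```python
import math


def log_returns(prices):
    """Log-transform the prices, then take first differences of the log prices."""
    log_prices = [math.log(p) for p in prices]
    return [log_prices[t] - log_prices[t - 1] for t in range(1, len(log_prices))]


prices = [50.0, 51.0, 50.5, 52.0]      # hypothetical prices, for illustration only
r = log_returns(prices)

# Simple net returns, for comparison with the log returns.
net = [prices[t] / prices[t - 1] - 1.0 for t in range(1, len(prices))]

for net_return, log_return in zip(net, r):
    print(f"net return {net_return: .5f}   log return {log_return: .5f}")
```

Differencing loses one observation, so four prices give three log returns, and the log returns add up to the log of the overall gross return.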
MINITAB and SAS both have functions to do differencing. For example, in MINITAB, go to the "Stat" menu, then the "Time Series" menu, and then select "differences."

Once a variable containing the log returns has been created, one then can fit an AR(1) model to it. Let's assume we have a time series $y_1, \ldots, y_n$ and we want to fit an AR(1) model to this series. Since an AR(1) model is a linear regression model, it can be analyzed using linear regression software. One creates a lagged variable in $y_t$ and uses this as the "x-variable" in the regression. MINITAB and SAS both support lagging. For example, in MINITAB, go to the "Stat" menu, then the "Time Series" menu, and then select "lag."

The least-squares estimates of $\phi$ and $\mu$ minimize
$$\sum_{t=2}^{n} \left[ \{y_t - \mu\} - \{\phi(y_{t-1} - \mu)\} \right]^2.$$
If the errors $(\epsilon_1, \ldots, \epsilon_n)$ are Gaussian white noise, then the least-squares estimate is also the MLE. Moreover, both MINITAB and SAS have special procedures for fitting AR models. In MINITAB, go to the "Stat" menu, then the "Time Series" menu, and then choose ARIMA. Use 1 autoregressive parameter, 0 differencing, and 0 moving average parameters. In SAS, use the "AUTOREG" or the "ARIMA" procedure.

Once $\phi$ has been estimated, one can calculate the residuals, $\hat{\epsilon}_1, \hat{\epsilon}_2, \ldots, \hat{\epsilon}_n$, defined by
$$\hat{\epsilon}_t = y_t - \hat{\mu} - \hat{\phi}(y_{t-1} - \hat{\mu}).$$
The residuals estimate $\epsilon_1, \epsilon_2, \ldots, \epsilon_n$ and can be used to check the assumption that $y_1, y_2, \ldots, y_n$ is an AR(1) process; any autocorrelation in the residuals is evidence against the assumption of an AR(1) process.

To test for residual autocorrelation one can use the "test bounds" provided by MINITAB's or SAS's autocorrelation plots. One can also use the Ljung-Box test, which simultaneously tests that all autocorrelations up to a specified lag are zero.

Example: GE daily returns

Autoregressive models can be analyzed in both MINITAB and SAS. The MINITAB output was obtained by running MINITAB interactively. Here is the MINITAB output.

    2/2/01 10:45:25 AM
    Welcome to Minitab, press F1 for help.
    Retrieving worksheet from file: C:\COURSES\OR473\MINITAB\GE_DAILY.MTW
    # Worksheet was saved on Wed Jan 10 2001

    Results for: GE_DAILY.MTW

    ARIMA Model: logR

    ARIMA model for logR

    Estimates at each iteration
    Iteration       SSE     Parameters
            0   2.11832    0.100    0.090
            1   0.12912    0.228    0.015
            2   0.07377    0.233    0.001
            3   0.07360    0.230    0.000
            4   0.07360    0.230   -0.000
            5   0.07360    0.230   -0.000
    Relative change in each estimate less than 0.0010

    Final Estimates of Parameters
    Type           Coef    SE Coef       T       P
    AR   1       0.2299     0.0621    3.70   0.000
    Constant  -0.000031   0.001081   -0.03   0.977
    Mean      -0.000040   0.001403

    Number of observations: 252
    Residuals:  SS = 0.0735911 (backforecasts excluded)
                MS = 0.0002944  DF = 250

    Modified Box-Pierce (Ljung-Box) Chi-Square statistic
    Lag             12      24      36      48
    Chi-Square    23.0    33.6    47.1    78.6
    DF              10      22      34      46
    P-Value      0.011   0.054   0.066   0.002

The SAS output comes from running the following program.

    options linesize = 72 ;
    comment Restrict the linesize to 72 characters ;
    data ge ;
    comment Start the data step ;
    infile 'c:\courses\or473\data\ge.dat' ;
    comment Specify the input data set ;
    input close ;
    comment Create a new variable ;
    D_p = dif(close);
    comment Take first differences ;
    logP = log(close) ;
    logR = dif(logP) ;
    comment logR = log returns ;
    run ;
    title 'GE - Daily prices, Dec 17, 1999 to Dec 15, 2000' ;
    title2 'AR(1)' ;
    proc autoreg ;
    model logR =/nlag = 1 ;
    run ;

Here is the SAS output.
    GE - Daily prices, Dec 17, 1999 to Dec 15, 2000
    AR(1)
    10:32 Friday, February 2, 20

    The AUTOREG Procedure

    Dependent Variable    logR

    Ordinary Least Squares Estimates
    SSE                0.07762133    DFE                     251
    MSE                 0.0003092    Root MSE            0.01759
    SBC                -1316.8318    AIC              -1320.3612
    Regress R-Square       0.0000    Total R-Square       0.0000
    Durbin-Watson          1.5299

                                 Standard              Approx
    Variable    DF    Estimate      Error   t Value   Pr > |t|
    Intercept    1   -0.000011   0.001108     -0.01     0.9917

    Estimates of Autocorrelations
    Lag   Covariance   Correlation
      0     0.000308      1.000000
      1     0.000069      0.225457

    Estimates of Autocorrelations
    Lag   -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
      0   |                    |********************|
      1   |                    |*****               |

    Preliminary MSE    0.000292

    Estimates of Autoregressive Parameters
                          Standard
    Lag   Coefficient        Error   t Value
      1     -0.225457     0.061617     -3.66

    GE - Daily prices, Dec 17, 1999 to Dec 15, 2000
    AR(1)
    10:32 Friday, February 2, 20

    The AUTOREG Procedure

    Yule-Walker Estimates
    SSE                0.07359998    DFE                     250
    MSE                 0.0002944    Root MSE            0.01716
    SBC                -1324.6559    AIC              -1331.7148
    Regress R-Square       0.0000    Total R-Square       0.0518
    Durbin-Watson          1.9326

                                 Standard              Approx
    Variable    DF    Estimate      Error   t Value   Pr > |t|
    Intercept    1   -0.000040   0.001394     -0.03     0.9773

From MINITAB we see that $\hat{\phi} = .2299$ and the estimated standard deviation of $\hat{\phi}$ is 0.0621. The t-value for testing $H_0: \phi = 0$ versus $H_1: \phi \ne 0$ is $.2299/.0621 = 3.70$ and the p-value is .000 (zero to three decimals). Since the p-value is so small, we reject the null hypothesis. [Note: Recall from your statistics course that small p-values are significant; we reject the null hypothesis if the p-value is less than $\alpha$, e.g., less than .05.] The null hypothesis is that the log returns are white noise and the alternative is that they are correlated. Thus, we have evidence against the geometric random walk hypothesis. However, $\hat{\phi} = .2299$ is not large.
Since $\rho(h) = \phi^h$, the correlation between successive log returns is .2299 and the squared correlation is only .0528; that is, only about five percent of the variation in a log return can be predicted by the previous day's return.

We have seen that an AR(1) process fits the GE log returns better than a white noise model. Of course, this is not proof that the AR(1) fits these data, only that it fits better than a white noise model. To check that the AR(1) fits well, one looks at the sample autocorrelation function (SACF) of the residuals. A plot of the residual SACF can be requested when using either MINITAB or SAS.

The SACF of the residuals from the GE daily log returns shows high negative autocorrelation at lag 6; $\hat\rho(6)$ is outside the test limits so is "significant" at $\alpha = .05$; see Figure 4.4. This is disturbing.

[Figure 4.4 plot: ACF of residuals for logR (daily log returns), with 95% confidence limits for the autocorrelations; horizontal axis: lag; vertical axis: autocorrelation.]

Figure 4.4: SACF of residuals from an AR(1) fit to the GE daily log returns. Notice the large negative residual autocorrelation at lag 6. This is a sign that the AR(1) model does not fit well.

Moreover, the more conservative Ljung-Box "simultaneous" test that $\rho(1) = \cdots = \rho(12) = 0$ has p = .011. Since the AR(1) model does not fit well, one might consider more complex models. These will be discussed in the following sections.

The SAS estimate of $\phi$ is $-.2254$. SAS uses the model

$$y_t = -\phi y_{t-1} + \epsilon_t$$

so SAS's $\phi$ is the negative of $\phi$ as we, and MINITAB, define it. The difference, .2299 versus .2254, between MINITAB and SAS is due to slight variation in the estimation algorithm.

We can also estimate $\mu$ and test that $\mu$ is zero. From the MINITAB output, we see $\hat\mu$ is nearly zero, the t-value for testing that $\mu$ is zero is very small while the p-value is near one.
Remember that small values of the p-value are significant; since the p-value is large we accept (that is, we do not reject) the null hypothesis that $\mu$ is zero.

4.4 AR(p) models

$y_t$ is an AR(p) process if

$$(y_t - \mu) = \phi_1 (y_{t-1} - \mu) + \phi_2 (y_{t-2} - \mu) + \cdots + \phi_p (y_{t-p} - \mu) + \epsilon_t$$

where $\epsilon_1, \ldots, \epsilon_n$ is WN$(0, \sigma_\epsilon^2)$. This is a multiple linear regression model with lagged values of the time series as the "x-variables." The model can be reexpressed as

$$y_t = \beta_0 + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \epsilon_t,$$

where $\beta_0 = \{1 - (\phi_1 + \cdots + \phi_p)\}\mu$. The least-squares estimator minimizes

$$\sum_{t=p+1}^{n} \{y_t - (\beta_0 + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p})\}^2.$$

The least-squares estimator can be calculated using a multiple linear regression program but one must create "x-variables" by lagging the time series with lags 1 through p. It is easier to use the ARIMA command in MINITAB or SAS or SAS's AUTOREG procedure; these procedures do the lagging automatically.

4.4.1 Example: GE daily returns

The SAS program shown above was rerun with

    model logR =/nlag = 1

replaced by

    model logR =/nlag = 6 .

The output is on the course's web site as "GE DAILY, AR(6) (SAS)." The autoregression coefficients (the $\phi_i$) are "significant" at lags 1 and 6 but not at lags 2 through 5. Here "significant" means at $\alpha = .05$, which corresponds to an absolute t-value bigger than 2. MINITAB will not allow p > 5 but SAS does not have such a constraint.

4.5 Moving Average (MA) Processes

4.5.1 MA(1) processes

The moving average process of order 1 [MA(1)] is

$$y_t - \mu = \epsilon_t - \theta \epsilon_{t-1},$$

where as before the $\epsilon_t$'s are WN$(0, \sigma_\epsilon^2)$. One can show that

$$E(y_t) = \mu, \quad \mathrm{Var}(y_t) = \sigma_\epsilon^2 (1 + \theta^2), \quad \gamma(1) = -\theta \sigma_\epsilon^2, \quad \gamma(h) = 0 \text{ if } |h| > 1,$$

and $\rho(h) = 0$ if $|h| > 1$.

4.5.2 General MA processes

The MA(q) process is

$$y_t - \mu = \epsilon_t - \theta_1 \epsilon_{t-1} - \cdots - \theta_q \epsilon_{t-q}.$$

One can show that $\gamma(h) = 0$ and $\rho(h) = 0$ if $|h| > q$.
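The MA autocovariance facts above can be checked numerically. The following Python sketch is only an illustration and is not part of the course software; the function names `ma_autocovariance` and `ma_autocorrelation` are made up for this example. It assumes the sign convention used in these notes, so the coefficient on $\epsilon_{t-j}$ is $-\theta_j$, and it uses the standard result that the autocovariance of a finite moving average is $\sigma_\epsilon^2$ times the sum of products of coefficients that are $h$ lags apart.

```python
def ma_autocovariance(thetas, sigma2, h):
    """gamma(h) for y_t - mu = e_t - theta_1 e_{t-1} - ... - theta_q e_{t-q}."""
    # Coefficients on e_t, e_{t-1}, ..., e_{t-q} under the notes' sign convention.
    psi = [1.0] + [-theta for theta in thetas]
    h = abs(h)
    if h >= len(psi):  # this is the case |h| > q
        return 0.0
    return sigma2 * sum(psi[j] * psi[j + h] for j in range(len(psi) - h))


def ma_autocorrelation(thetas, h):
    """rho(h) = gamma(h) / gamma(0); the noise variance cancels."""
    return ma_autocovariance(thetas, 1.0, h) / ma_autocovariance(thetas, 1.0, 0)


# MA(1) with theta = 0.5 and noise variance 2 (illustrative values only).
gamma0 = ma_autocovariance([0.5], 2.0, 0)   # sigma^2 (1 + theta^2)
gamma1 = ma_autocovariance([0.5], 2.0, 1)   # -theta sigma^2
gamma2 = ma_autocovariance([0.5], 2.0, 2)   # zero beyond lag q = 1
rho1 = ma_autocorrelation([0.5], 1)
```

For the MA(1) case the sketch reproduces $\mathrm{Var}(y_t) = \sigma_\epsilon^2(1 + \theta^2)$ and $\gamma(1) = -\theta\sigma_\epsilon^2$, and it returns zero for every lag beyond $q$.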
4.6 ARIMA Processes

Stationary time series with complex autocorrelation behavior are better modeled by mixed autoregressive and moving average (ARMA) processes than by either a pure AR or pure MA process. ARIMA (autoregressive, integrated, moving average) processes are based on ARMA processes and are models for nonstationary time series. ARIMA processes are more easily described if we introduce the "backwards" operator, B.

4.6.1 The backwards operator

The backwards operator B is defined by

$$B y_t = y_{t-1}$$

and, more generally,

$$B^k y_t = y_{t-k}.$$

Note that $B c = c$ for any constant c since a constant does not change with time.

4.6.2 ARMA Processes

The ARMA(p, q) process satisfies the equation

$$(1 - \phi_1 B - \cdots - \phi_p B^p)(y_t - \mu) = (1 - \theta_1 B - \cdots - \theta_q B^q)\epsilon_t.$$

A white noise process is ARMA(0,0).

4.6.3 The differencing operator

The differencing operator is $\Delta = 1 - B$ so that

$$\Delta y_t = y_t - B y_t = y_t - y_{t-1}.$$

Thus, differencing a time series produces a new time series consisting of the changes in the original series. For example, if $p_t = \log(P_t)$ is the log price, then the log return is $r_t = \Delta p_t$.

Differencing can be iterated. For example,

$$\Delta^2 y_t = \Delta(\Delta y_t) = \Delta(y_t - y_{t-1}) = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2}) = y_t - 2y_{t-1} + y_{t-2}.$$

4.6.4 From ARMA processes to ARIMA process

Often the first or second differences of nonstationary time series are stationary. For example, the first differences of a random walk (nonstationary) are white noise (stationary).

A time series $y_t$ is said to be ARIMA(p, d, q) if $\Delta^d y_t$ is ARMA(p, q). Also, if log returns ($r_t$) on an asset are ARMA(p, q), then the log prices ($p_t$) are ARIMA(p, 1, q). The ARIMA procedures in MINITAB and SAS allow one to specify p, d, and q.

Notice that an ARIMA(p, 0, q) model is the same as an ARMA(p, q) model. ARIMA(p, 0, 0), ARMA(p, 0), and AR(p) models are the same. Also, ARIMA(0, 0, q), ARMA(0, q), and MA(q) models are the same. A random walk is an ARIMA(0,1,0) model. Why?

The inverse of differencing is "integrating."
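As a small numerical illustration of differencing and of integration as its inverse, here is a Python sketch. It is not part of the course software; the function names `difference` and `integrate` and the toy series are assumptions made up for this example. Integration is implemented as a running sum that starts from a supplied starting value.

```python
def difference(y, d=1):
    """Apply the differencing operator d times; each pass drops one observation."""
    out = list(y)
    for _ in range(d):
        out = [out[t] - out[t - 1] for t in range(1, len(out))]
    return out


def integrate(changes, start):
    """Running sum that undoes one round of differencing, given the starting value."""
    w = [start]
    for change in changes:
        w.append(w[-1] + change)
    return w


y = [3, 5, 4, 8, 13]           # toy series, illustrative only
d1 = difference(y)              # first differences y_t - y_{t-1}
d2 = difference(y, 2)           # second differences y_t - 2 y_{t-1} + y_{t-2}
recovered = integrate(d1, y[0]) # integrating the differences recovers y
```

The second differences agree with the expansion $y_t - 2y_{t-1} + y_{t-2}$; for instance the first entry is $4 - 2(5) + 3 = -3$.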
The integral of a process $y_t$ is the process $w_t$ where

$$w_t = w_{t_0} + y_{t_0} + y_{t_0+1} + \cdots + y_t,$$

where $t_0$ is an arbitrary starting time point and $w_{t_0}$ is the starting value of the $w_t$ process. Figure 4.5 shows an AR(1), its "integral" and its "second integral," meaning the integral of its integral.

[Figure 4.5 plots: three panels titled "ARIMA(1,0,0) with $\mu = 0$ and $\phi = 0.4$", "ARIMA(1,1,0)", and "ARIMA(1,2,0)"; horizontal axes run from 0 to 400.]

Figure 4.5: The top plot is of an AR(1) process with $\mu = 0$ and $\phi = 0.4$. The middle and bottom plots are, respectively, the integral and second integral of this AR(1) process.

4.7 Model Selection

Once the parameters p, d, and q of an ARIMA process have been selected, the AR and MA coefficients can be estimated by maximum likelihood. But how do we choose p, d, and q?

Generally, d is either 0, 1, or 2 and is chosen by looking at the SACF of $y_t$, $\Delta y_t$, and $\Delta^2 y_t$. A sign that a process is nonstationary is that its SACF decays to zero very slowly. If this is true of $y_t$ then the original series is nonstationary and should be differenced at least once. If the SACF of $\Delta y_t$ looks stationary then we use d = 1. Otherwise, we look at the SACF of $\Delta^2 y_t$; if this looks stationary we use d = 2. I have never seen a real time series where $\Delta^2 y_t$ did not look stationary, but if one were encountered then d > 2 would be used.

Once d has been chosen, we know that we will fit an ARMA(p, q) process to $\Delta^d y_t$, but we still need to select p and q. This can be done by comparing various choices of p and q by some criterion that measures how well a model fits the data.

4.7.1 AIC and SBC

AIC and SBC are model selection criteria based on the log-likelihood. Akaike's information criterion (AIC) is defined as

$$-2\log(L) + 2(p + q),$$

where L is the likelihood evaluated at the MLE.
Schwarz's Bayesian Criterion (SBC) is also called the Bayesian Information Criterion (BIC) and is defined as

$$-2\log(L) + \log(n)(p + q),$$

where n is the length of the time series.

The "best" model according to either criterion is the model that minimizes that criterion. Either criterion will tend to select models with large values of the likelihood; this makes perfect sense since a large value of L means that the observed data are likely under that model.

The term $2(p + q)$ in AIC or $\log(n)(p + q)$ in SBC is a penalty on having too many parameters. Therefore, AIC and SBC both try to trade off a good fit to the data, measured by L, with the desire to use as few parameters as possible.

Note that $\log(n) > 2$ if $n > 8$. Since most time series are much longer than 8, SBC penalizes p + q more than AIC. Therefore, AIC will tend to choose models with more parameters than SBC. Compared to SBC, with AIC the tradeoff is more in favor of a large value of L than a small value of p + q.

This difference between AIC and SBC is due to the way they were designed. AIC is designed to select the model that will predict best and is less concerned with having a few too many parameters. SBC is designed to select the true values of p and q exactly. In practice the best AIC model is usually close to the best SBC model; often they are the same model.

Two models can be compared by likelihood ratio testing when one model is "bigger" than the other. Therefore, AIC and SBC are closely connected with likelihood ratio tests.

4.7.2 Stepwise regression applied to AR processes

Stepwise regression is a way of looking at a variety of regression models to see which ones fit the data well. You may encounter stepwise regression if you take an advanced regression course.
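Returning briefly to the criteria of Section 4.7.1, the following Python sketch shows how AIC and SBC can disagree. It is only an illustration: the function names `aic` and `sbc` and the dictionary of candidate models are made up for this example, and the log-likelihood values are hypothetical numbers, not results from the GE data. The formulas follow the definitions in these notes, which penalize p + q only.

```python
import math


def aic(loglik, p, q):
    """AIC = -2 log(L) + 2 (p + q), as defined in Section 4.7.1."""
    return -2.0 * loglik + 2.0 * (p + q)


def sbc(loglik, p, q, n):
    """SBC = -2 log(L) + log(n) (p + q), with n the length of the series."""
    return -2.0 * loglik + math.log(n) * (p + q)


n = 252  # series length, chosen to resemble one year of daily returns

# Hypothetical maximized log-likelihoods for three candidate (p, q) choices.
candidates = {(0, 0): 660.0, (1, 0): 666.8, (3, 0): 670.0}

best_by_aic = min(candidates, key=lambda pq: aic(candidates[pq], *pq))
best_by_sbc = min(candidates, key=lambda pq: sbc(candidates[pq], pq[0], pq[1], n))
```

With these made-up numbers AIC prefers the model with three AR terms while SBC prefers the model with one, which is the pattern described above: AIC tends to choose models with more parameters than SBC.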
In backwards regression, sometimes called backstepping, one starts with all possible x-variables and eliminates them one at a time until all remaining variables are "significant" by some criterion. Stepwise regression can, of course, be applied to AR models since these are a type of multiple regression model. SAS's AUTOREG procedure allows backstepping as an option.

The following SAS program starts with an AR(6) model and backsteps.

    options linesize = 72 ;    comment Restrict the linesize to 72 characters ;
    data ge ;                  comment Start the data step ;
    infile 'c:\courses\or473\data\ge_quart.dat' ;
                               comment Specify the input data set ;
    input close ;
    D_p = dif(close);          comment Take first differences ;
    logP = log(close) ;
    logR = dif(logP) ;         comment logR = log returns ;
    run ;
    title 'GE - Quarterly closing prices, Dec 1900 to Dec 2000' ;
    title2 'AR(6) with backstepping' ;
    proc autoreg ;
    model logR =/nlag = 6 backstep ;
    run ;

Here is the SAS output:

    GE - Quarterly closing prices, Dec 1900 to Dec 2000              1
    AR(6) with backstepping
    23:32 Tuesday, January 30, 2001

    The AUTOREG Procedure
    Dependent Variable    logR

    Ordinary Least Squares Estimates
    SSE                0.15125546    DFE
    MSE                   0.00398    Root MSE           0.06309
    SBC                -102.20076    AIC             -103.86432
    Regress R-Square       0.0000    Total R-Square      0.0000
    Durbin-Watson          2.0710

                               Standard              Approx
    Variable    DF   Estimate     Error   t Value   Pr > |t|
    Intercept    1     0.0627    0.0101      6.21     <.0001

    Estimates of Autocorrelations
    Lag   Covariance   Correlation
      0      0.00388      1.000000
      1     -0.00014     -0.036627
      2     -0.00023     -0.059114
      3      0.00152      0.392878
      4     -0.00014     -0.035792
      5     -0.00075     -0.193269
      6     0.000337      0.086919
    The AUTOREG Procedure

    Backward Elimination of Autoregressive Terms
    Lag    Estimate   t Value   Pr > |t|
      4    0.020648      0.12     0.9058
      2    0.023292     -0.14     0.8921
      1    0.035577      0.23     0.8226
      6    0.082465      0.50     0.6215
      5    0.170641      1.13     0.2655

    Preliminary MSE    0.00328

    Estimates of Autoregressive Parameters
                         Standard
    Lag   Coefficient       Error   t Value
      3     -0.392878    0.151180     -2.60

    Expected Autocorrelations
    Lag   Autocorr
      0     1.0000
      1     0.0000
      2     0.0000
      3     0.3929

    Yule-Walker Estimates
    SSE                0.12476731    DFE                     37
    MSE                   0.00337    Root MSE           0.05807
    SBC                 -105.5425    AIC             -108.86962
    Regress R-Square       0.0000    Total R-Square      0.1751
    Durbin-Watson          1.9820

                               Standard              Approx
    Variable    DF   Estimate     Error   t Value   Pr > |t|
    Intercept    1     0.0632    0.0146      4.33     0.0001

4.7.3 Using ARIMA in SAS: Cree data

Daily returns of Cree from December 1999 to December 2000 are shown in Figure 4.6.

[Figure 4.6 plots: CREE, daily, 12/17/99 to 12/15/00; five panels, one of them titled "Normal plot of log returns" with horizontal axis "log return".]

Figure 4.6: Cree daily returns. The first plot is the prices. The second and third are the net returns and the log returns. The fourth plot is a normal probability plot of the log returns. The final plot is of the absolute log returns; there is a scatterplot smooth to help show whether the volatility is constant.

In this example, we will illustrate fitting an ARMA model in SAS. We use daily log returns on Cree from December 1999 to December 2000.
The SAS program is:

    options linesize = 72 ;
    data cree ;
    infile 'U:\courses\473\data\cree_daily.dat ' ;
    input month day year volume high low close ;
    logP = log(close) ;
    logR = dif(logP) ;
    run ;
    title Cree daily log returns ;
    title2 ARMA(1,1) ;
    proc arima ;
    identify var=logR ;
    estimate p=1 q=1 ;
    run ;

The "identify" statement specifies the input series and tells SAS to compute the SACF. It can also be used to specify the amount of differencing; "identify var=logP(1) ;" would tell SAS to use the first differences of the log prices as input.

Here is the SAS output. The result is that the Cree log returns appear to be white noise since $\phi_1$ (denoted by AR1,1 in SAS), $\theta_1$ (denoted by MA1,1), and $\mu$ are not significantly different from zero.

    Cree daily log returns                                           1
    ARMA(1,1)
    15:18 Friday, February 2, 2001

    The ARIMA Procedure
    Name of Variable = logR
    Mean of Working Series     -0.00071
    Standard Deviation         0.067473
    Number of Observations          252

    Autocorrelations
    Lag    Covariance   Correlation
      0     0.0045526       1.00000
      1    0.00031398       0.06897
      2    -0.0000160       -.00351
      3    -5.5958E-6       -.00123
      4    -0.0002213       -.04862
      5    0.00002748       0.00604
      6    -0.0000779       -.01712
      7    -0.0000207       -.00454
      8    -0.0003281       -.07207
      9    0.00015664       0.03441
     10    0.00057077       0.12537
     11    0.00023632       0.05191
     12    -0.0003475       -.07633
     13    -0.0001348       -.02961
     14    -0.0005590       -.12278
     15    0.00023425       0.05145
     16    -0.0001021       -.02242
     17    -0.0000582       -.01278
     18    -0.0007147       -.15699
     19    0.00006314       0.01387
     20    -0.0000466       -.01024
     21    -0.0001681       -.03692
     22    -0.0001439       -.03161
     23    -0.0002135       -.04690
     24    0.00007502       0.01648
    The ARIMA Procedure

    Inverse Autocorrelations
    Lag   Correlation
      1      -0.11452
      2       0.06356
      3      -0.08905
      4       0.12788
      5      -0.04576
      6       0.07209
      7      -0.06322
      8       0.09828
      9      -0.04639
     10      -0.05006
     11      -0.09283
     12       0.10049
     13      -0.02141
     14       0.15284
     15      -0.09318
     16       0.05864
     17      -0.02983
     18       0.16300
     19      -0.05602
     20       0.05126
     21       0.01713
     22       0.04942
     23       0.00197
     24      -0.01745

    Partial Autocorrelations
    Lag   Correlation
      1       0.06897
      2      -0.00830
      3      -0.00041
      4      -0.04877
      5       0.01287
      6      -0.01916
      7      -0.00183
      8      -0.07486
      9       0.04628
     10       0.11841
     11       0.03697
     12      -0.09207
     13      -0.01457
     14      -0.11485
     15       0.07540
     16      -0.04385
     17       0.00180
     18      -0.16594
     19       0.05041
     20      -0.06240
     21      -0.02732
     22      -0.05643
     23       0.00111
     24       0.01957

    Autocorrelation Check for White Noise
    To    Chi-          Pr >
    Lag   Square   DF   ChiSq    -------------Autocorrelations-------------
      6     1.91    6   0.9276    0.069  -0.004  -0.001  -0.049   0.006  -0.017
     12    10.02   12   0.6143   -0.005  -0.072   0.034   0.125   0.052  -0.076
     18    21.95   18   0.2344   -0.030  -0.123   0.051  -0.022  -0.013  -0.157
     24    23.37   24   0.4978    0.014  -0.010  -0.037  -0.032  -0.047   0.016

    Conditional Least Squares Estimation
                               Standard              Approx
    Parameter     Estimate        Error   t Value   Pr > |t|   Lag
    MU          -0.0006814    0.0045317     -0.15     0.8806     0
    MA1,1         -0.18767      0.88710     -0.21     0.8326     1
    AR1,1         -0.11768      0.89670     -0.13     0.8957     1

    Constant Estimate       -0.00076
    Variance Estimate       0.004585
    Std Error Estimate      0.067712
    AIC                     -638.889
    SBC                     -628.301
    Number of Residuals          252
    * AIC and SBC do not include log determinant.
    The ARIMA Procedure

    Correlations of Parameter Estimates
    Parameter       MU    MA1,1    AR1,1
    MU           1.000    0.005    0.006
    MA1,1        0.005    1.000    0.998
    AR1,1        0.006    0.998    1.000

    Autocorrelation Check of Residuals
    To    Chi-          Pr >
    Lag   Square   DF   ChiSq    -------------Autocorrelations-------------
      6     0.75    4   0.9444    0.000   0.004   0.001  -0.049   0.010  -0.019
     12     8.54   10   0.5761    0.003  -0.075   0.032   0.118   0.050  -0.079
     18    21.12   16   0.1741   -0.014  -0.127   0.062  -0.029   0.001  -0.159
     24    22.48   22   0.4314    0.025  -0.011  -0.035  -0.026  -0.045   0.016
     30    32.65   28   0.2490    0.054   0.127   0.102  -0.023  -0.029   0.070
     36    38.16   34   0.2858   -0.055  -0.038  -0.026  -0.079   0.021   0.083
     42    47.23   40   0.2009   -0.061  -0.092  -0.004  -0.028  -0.118  -0.055
     48    49.15   46   0.3480   -0.032  -0.011  -0.004   0.027   0.054  -0.036

    Model for variable logR
    Estimated Mean    -0.00068

    Autoregressive Factors
    Factor 1: 1 + 0.11768 B**(1)

    Moving Average Factors
    Factor 1: 1 + 0.18767 B**(1)

4.8 Example: Three-month Treasury bill rates

The efficient market hypothesis predicts that log returns will be white noise, and our empirical results are that log returns have little autocorrelation even if they are not exactly white noise. Other financial time series do have substantial autocorrelation, as is shown in this example.

The time series in this example is monthly interest rates on three-month US Treasury bills from December 1950 until February 1996. The data come from Example 16.1 of Pindyck and Rubinfeld (1998), Econometric Models and Economic Forecasts. The rates are plotted in Figure 4.7. The first differences look somewhat stationary, and we will fit ARMA models to the first differences.
[Figure 4.7 plots: panel titled "3 month T-bills"; horizontal axes: month since Jan 1950.]

Figure 4.7: Time series plot of 3 month Treasury bill rates, plot of first differences, and sample autocorrelation function of first differences. Monthly values from January 1950 until March 1996.

First we try fitting an AR(10) model with ARIMA. Here is the SAS program. Note that the statement "identify var=z(1);" specifies that the model should be fit to the first differences of the variable z; z is the interest rate.

    options linesize = 72 ;
    data ratel ;
    infile 'c:\courses\or473\data\fygn.dat' ;
    input date $ z;
    title 'Three month treasury bills' ;
    title2 'ARIMA model - to first differences' ;
    proc arima ;
    identify var=z(1) ;
    estimate p=10 plot;
    run ;

Here is the SAS output.

    Three month treasury bills                                       1
    ARIMA model - to first differences
    14:41 Saturday, February 3, 2001

    The ARIMA Procedure
    Name of Variable = z
    Period(s) of Differencing                      1
    Mean of Working Series                  0.006986
    Standard Deviation                      0.494103
    Number of Observations                       554
    Observation(s) eliminated by differencing      1

    Autocorrelations
    Lag    Covariance   Correlation
      0      0.244138       1.00000
      1      0.067690       0.27726
      2     -0.026212       -.10736
      3     -0.022360       -.09159
      4    -0.0091143       -.03733
      5      0.011399       0.04669
      6     -0.045339       -.18571
      7     -0.047987       -.19656
      8      0.022734       0.09312
      9      0.047441       0.19432
     10      0.014282       0.05850
     11    -0.0017082       -.00700
     12     -0.022600       -.09257
     13     0.0087638       0.03590
     14      0.038426       0.15739
     15     -0.024885       -.10193
     16     0.0012018       0.00492
     17      0.020048       0.08212
     18      0.019043       0.07800
     19    -0.0081609       -.03343
     20     -0.056547       -.23162
     21     -0.038945       -.15952
     22    -0.0035774       -.01465
     23    -0.0018465       -.00756
     24    -0.0080554       -.03300

    Inverse Autocorrelations
    Lag   Correlation
      1      -0.38226
      2       0.17388
      3      -0.03944
      4       0.09813
      5      -0.15403
      6       0.16052
      7       0.03458
      8      -0.07833
      9      -0.01029
     10      -0.01264
     11      -0.07557
     12      -0.00166
     13       0.12786
     14      -0.22060
     15       0.19060
     16      -0.10958
     17       0.03736
     18      -0.05356
     19       0.07262
     20       0.03663
     21       0.03580
     22       0.02890
     23       0.00507
     24       0.00765

    Partial Autocorrelations
    Lag   Correlation
      1       0.27726
      2      -0.19958
      3      -0.00061
      4      -0.03172
      5       0.05661
      6      -0.25850
      7      -0.05221
      8       0.14071
      9       0.08439
     10      -0.04699
     11       0.06148
     12      -0.11389
     13       0.05561
     14       0.13716
     15      -0.13273
     16       0.15741
     17       0.02301
     18       0.01777
     19      -0.13330
     20      -0.08447
     21      -0.07718
     22      -0.04553
     23      -0.01479
     24      -0.01071

    Autocorrelation Check for White Noise
    To    Chi-          Pr >
    Lag   Square   DF   ChiSq    -------------Autocorrelations-------------
      6    75.33    6   <.0001    0.277  -0.107  -0.092  -0.037   0.047  -0.186
     12   130.15   12   <.0001   -0.197   0.093   0.194   0.059  -0.007  -0.093
     18   158.33   18   <.0001    0.036   0.157  -0.102   0.005   0.082   0.078
     24   205.42   24   <.0001   -0.033  -0.232  -0.160  -0.015  -0.008  -0.033

    Conditional Least Squares Estimation
                              Standard              Approx
    Parameter    Estimate        Error   t Value   Pr > |t|   Lag
    MU          0.0071463      0.02056      0.35     0.7283     0
    AR1,1         0.33494      0.04287      7.81     <.0001     1
    AR1,2        -0.16456      0.04501     -3.66     0.0003     2
    AR1,3         0.01712      0.04535      0.38     0.7060     3
    AR1,4        -0.10901      0.04522     -2.41     0.0163     4
    AR1,5         0.14252      0.04451      3.20     0.0014     5
    AR1,6        -0.21560      0.04451     -4.84     <.0001     6
    AR1,7        -0.08347      0.04522     -1.85     0.0655     7
    AR1,8         0.10382      0.04536      2.29     0.0225     8
    AR1,9         0.10007      0.04502      2.22     0.0267     9
    AR1,10       -0.04723      0.04290     -1.10     0.2714    10

    Constant Estimate       0.006585
    Variance Estimate       0.198648
    Std Error Estimate      0.445699
    AIC                     687.6855
    SBC                     735.1743
    Number of Residuals          554
    * AIC and SBC do not include log determinant.

    Correlations of Parameter Estimates
    Parameter       MU    AR1,1    AR1,2    AR1,3    AR1,4    AR1,5
    MU           1.000    0.001   -0.000   -0.001   -0.001   -0.000
    AR1,1        0.001    1.000   -0.315    0.160   -0.020    0.095
    AR1,2       -0.000   -0.315    1.000   -0.357    0.166   -0.033
    AR1,3       -0.001    0.160   -0.357    1.000   -0.350    0.204
    AR1,4       -0.001   -0.020    0.166   -0.350    1.000   -0.375
    AR1,5       -0.000    0.095   -0.033    0.204   -0.375    1.000
    AR1,6       -0.001   -0.131    0.122   -0.068    0.218   -0.367
    AR1,7       -0.001    0.200   -0.178    0.161   -0.078    0.218
    AR1,8       -0.001    0.080    0.163   -0.166    0.161   -0.068
    AR1,9       -0.001   -0.106    0.123    0.163   -0.178    0.122
    AR1,10      -0.003   -0.085   -0.106    0.080    0.200   -0.131

    Correlations of Parameter Estimates
    Parameter    AR1,6    AR1,7    AR1,8    AR1,9   AR1,10
    MU          -0.001   -0.001   -0.001   -0.001   -0.003
    AR1,1       -0.131    0.200    0.080   -0.106   -0.085
    AR1,2        0.122   -0.178    0.163    0.123   -0.106
    AR1,3       -0.068    0.161   -0.166    0.163    0.080
    AR1,4        0.218   -0.078    0.161   -0.178    0.200
    AR1,5       -0.367    0.218   -0.068    0.122   -0.131
    AR1,6        1.000   -0.375    0.204   -0.033    0.096
    AR1,7       -0.375    1.000   -0.350    0.166   -0.020
    AR1,8        0.204   -0.350    1.000   -0.357    0.161
    AR1,9       -0.033    0.166   -0.357    1.000   -0.315
    AR1,10       0.096   -0.020    0.161   -0.315    1.000

    Autocorrelation Check of Residuals
    To    Chi-          Pr >
    Lag   Square   DF   ChiSq    -------------Autocorrelations-------------
      6     0.00    0   <.0001    0.003  -0.011   0.003   0.021  -0.015  -0.031
     12     9.56    2   0.0084    0.036  -0.001  -0.031   0.018   0.105  -0.040
     18    42.72    8   <.0001   -0.076   0.177  -0.115   0.081   0.019   0.025
     24    62.06   14   <.0001   -0.062  -0.149  -0.078  -0.025  -0.024  -0.013
     30    65.76   20   <.0001    0.002   0.008   0.045   0.048  -0.043  -0.007
     36    73.52   26   <.0001   -0.070  -0.004  -0.051  -0.003  -0.053  -0.052
     42    74.14   32   <.0001   -0.007   0.028  -0.007  -0.005   0.010   0.006
     48    82.20   38   <.0001   -0.011  -0.000  -0.006   0.001  -0.103   0.050

    Autocorrelation Plot of Residuals
    Lag    Covariance   Correlation
      0      0.198648       1.00000
      1    0.00057812       0.00291
      2    -0.0020959       -.01055
      3    0.00068451       0.00345
      4     0.0041792       0.02104
      5    -0.0030362       -.01528
      6    -0.0061377       -.03090
      7     0.0071315       0.03590
      8    -0.0001693       -.00085
      9    -0.0061781       -.03110
     10     0.0036055       0.01815
     11      0.020788       0.10465
     12    -0.0078818       -.03968
     13     -0.015171       -.07637
     14      0.035240       0.17740
     15     -0.022934       -.11545
     16      0.016000       0.08054
     17     0.0037288       0.01877
     18     0.0049781       0.02506
     19     -0.012221       -.06152
     20     -0.029590       -.14896
     21     -0.015566       -.07836
     22    -0.0050098       -.02522
     23    -0.0048445       -.02439
     24    -0.0026174       -.01318
    The ARIMA Procedure

    Inverse Autocorrelations
    Lag   Correlation
      1      -0.04462
      2       0.02988
      3       0.02921
      4      -0.04817
      5       0.00308
      6       0.02072
      7      -0.02134
      8      -0.01272
      9       0.01308
     10      -0.02753
     11      -0.10241
     12       0.03617
     13       0.06350
     14      -0.16306
     15       0.12298
     16      -0.08990
     17      -0.02141
     18      -0.00130
     19       0.04419
     20       0.11901
     21       0.08929
     22       0.02613
     23       0.00628
     24       0.00879

    Partial Autocorrelations
    Lag   Correlation
      1       0.00291
      2      -0.01056
      3       0.00351
      4       0.02091
      5      -0.01534
      6      -0.03040
      7       0.03569
      8      -0.00204
      9      -0.02966
     10       0.01926
     11       0.10200
     12      -0.04035
     13      -0.07248
     14       0.17834
     15      -0.13109
     16       0.09936
     17       0.02268
     18       0.00293
     19      -0.05597
     20      -0.13881
     21      -0.10044
     22      -0.02905
     23      -0.00750
     24      -0.00979

    Model for variable z
    Estimated Mean               0.007146
    Period(s) of Differencing           1

    Autoregressive Factors
    Factor 1: 1 - 0.33494 B**(1) + 0.16456 B**(2) - 0.01712 B**(3)
              + 0.10901 B**(4) - 0.14252 B**(5) + 0.2156 B**(6)
              + 0.08347 B**(7) - 0.10382 B**(8) - 0.10007 B**(9)
              + 0.04723 B**(10)

The AR(10) model does not fit well. Next we try an AR(24) model with backstepping (called "backfitting" in the program titles). Here is the SAS program:

    options linesize = 72 ;
    data ratel ;
    infile 'c:\courses\or473\data\fygn.dat' ;
    input date $ z;
    zdif=dif (z) ;
    title 'Three month treasury bills' ;
    title2 'AR(24) model to first differences with backfitting' ;
    proc autoreg ;
    model zdif= / nlag=24 backstep;
    run ;

Here is the output.
    Three month treasury bills                                       1
    AR(24) model to first differences with backfitting
    10:32 Wednesday, February 14, 2001

    The AUTOREG Procedure
    Dependent Variable    zdif

    Ordinary Least Squares Estimates
    SSE                 135.25253    DFE                    553
    MSE                   0.24458    Root MSE           0.49455
    SBC                 797.34939    AIC             793.032225
    Regress R-Square       0.0000    Total R-Square      0.0000
    Durbin-Watson          1.4454

                               Standard              Approx
    Variable    DF   Estimate     Error   t Value   Pr > |t|
    Intercept    1   0.006986    0.0210      0.33     0.7397

    Estimates of Autocorrelations
    Lag   Covariance   Correlation
      0       0.2441      1.000000
      1       0.0677      0.277260
      2      -0.0262     -0.107364
      3      -0.0224     -0.091587
      4     -0.00911     -0.037332
      5       0.0114      0.046690
      6      -0.0453     -0.185710
      7      -0.0480     -0.196558
      8       0.0227      0.093118
      9       0.0474      0.194318
     10       0.0143      0.058501
     11     -0.00171     -0.006997
     12      -0.0226     -0.092572
     13      0.00876      0.035897
     14       0.0384      0.157393
     15      -0.0249     -0.101930
     16      0.00120      0.004923
     17       0.0200      0.082117
     18       0.0190      0.078001
     19     -0.00816     -0.033427
     20      -0.0565     -0.231618
     21      -0.0389     -0.159520
     22     -0.00358     -0.014653
     23     -0.00185     -0.007563
     24     -0.00806     -0.032995
    The AUTOREG Procedure

    Backward Elimination of Autoregressive Terms
    Lag    10 23 17 3 24 13 7 18 22 20 4

    Estimate   t Value   Pr > |t|
    0.007567      0.16     0.8721
    0.010212      0.22     0.8241
    0.008951      0.19     0.8492
    0.014390     -0.32     0.7496
    0.015798      0.40     0.6907
    0.041434      0.92     0.3605
    0.038880      0.85     0.3964
    0.037456     -0.90     0.3702
    0.042555      1.02     0.3090
    0.058230      1.31     0.1912
    0.059903      1.48     0.1389
    0.058141     -1.42     0.1562

    Preliminary MSE    0.1765

    Estimates of Autoregressive Parameters
                         Standard
    Lag   Coefficient       Error   t Value
      1     -0.388246    0.040419     -9.61
      2      0.200242    0.040438      4.95
      5     -0.108069    0.040513     -2.67
      6      0.249095    0.039719      6.27
      8     -0.103462    0.039668     -2.61
     11     -0.102896    0.040278     -2.55
     12      0.119950    0.040704      2.95
     14     -0.204702    0.040427     -5.06
     15      0.223381    0.042441      5.26
     16     -0.151917    0.040811     -3.72
     19      0.103356    0.038847      2.66
     21      0.108074    0.039511      2.74

    Expected Autocorrelations
    Lag   Autocorr
      0     1.0000
      1     0.2840
      2    -0.1196
      3    -0.0801
      4     0.0273
      5     0.0656
      6    -0.1914
      7    -0.1923
      8     0.0880
      9     0.1549
     10     0.0223
     11    -0.0229
     12    -0.0737
     13     0.0767
     14     0.1628
     15    -0.1000
     16    -0.0017
     17     0.0685
     18     0.0437
     19    -0.0638
     20    -0.1968
     21    -0.1296

    Yule-Walker Estimates
    SSE                97.7597462    DFE                    541
    MSE                   0.18070    Root MSE           0.42509
    SBC                695.767655    AIC             639.644514
    Regress R-Square       0.0000    Total R-Square      0.2772
    Durbin-Watson          2.0627

                               Standard              Approx
    Variable    DF   Estimate     Error   t Value   Pr > |t|
    Intercept    1   0.006664    0.0192      0.35     0.7289

4.9 Forecasting

ARIMA models are often used to forecast future values of a time series. Consider first an AR(1) process. Suppose that we have data $y_1, \ldots, y_n$ and estimates $\hat\mu$ and $\hat\phi$. Then we estimate $y_{n+1}$ by

$$\hat y_{n+1} := \hat\mu + \hat\phi (y_n - \hat\mu)$$

and $y_{n+2}$ by

$$\hat y_{n+2} := \hat\mu + \hat\phi (\hat y_{n+1} - \hat\mu) = \hat\mu + \hat\phi \{\hat\phi (y_n - \hat\mu)\},$$

etc. In general, $\hat y_{n+k} = \hat\mu + \hat\phi^k (y_n - \hat\mu)$. If $|\hat\phi| < 1$, as is expected for a stationary series, then as k increases the forecasts will decay exponentially fast to $\hat\mu$.

Forecasting general AR(p) processes is similar. For example, for an AR(2) process

$$\hat y_{n+1} := \hat\mu + \hat\phi_1 (y_n - \hat\mu) + \hat\phi_2 (y_{n-1} - \hat\mu)$$

and

$$\hat y_{n+2} := \hat\mu + \hat\phi_1 (\hat y_{n+1} - \hat\mu) + \hat\phi_2 (y_n - \hat\mu).$$

Forecasting ARMA and ARIMA processes is only slightly more complicated than forecasting AR processes and is discussed in time series courses such as ORIE 563. Moreover, the forecasts can be generated automatically by MINITAB and SAS, so you don't need to know the details in order to forecast.

4.9.1 GE daily returns

We have learned that fitting an ARIMA(1,0,0) model to log returns is equivalent to fitting an ARIMA(1,1,0) model to the log prices. Here we will fit both models to the GE daily price data. Figure 4.8 shows the forecasts of the log returns up to 24 days ahead.
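The AR(1) forecasting recursion of Section 4.9 can be sketched in Python as follows. This is only an illustration, not how MINITAB or SAS compute their forecasts; the function name `ar1_forecasts` and the last observed value 0.01 are assumptions made up for the example, while $\hat\mu = -0.000040$ and $\hat\phi = 0.2299$ are the MINITAB estimates for the GE log returns quoted earlier.

```python
def ar1_forecasts(y_n, mu_hat, phi_hat, k_max):
    """Forecasts of y_{n+1}, ..., y_{n+k_max} from an AR(1) fit, computed recursively."""
    forecasts = []
    previous = y_n
    for _ in range(k_max):
        # Each forecast plugs the previous forecast into the AR(1) equation.
        previous = mu_hat + phi_hat * (previous - mu_hat)
        forecasts.append(previous)
    return forecasts


mu_hat = -0.000040   # MINITAB estimate of the mean (from the output above)
phi_hat = 0.2299     # MINITAB estimate of phi (from the output above)
y_n = 0.01           # hypothetical last observed log return

forecasts = ar1_forecasts(y_n, mu_hat, phi_hat, 24)

# Closed form from the notes: mu_hat + phi_hat^k (y_n - mu_hat), k = 1, ..., 24.
closed_form = [mu_hat + phi_hat ** k * (y_n - mu_hat) for k in range(1, 25)]
```

The recursive and closed-form forecasts agree up to rounding, and because $\hat\phi$ is small the forecasts are essentially equal to $\hat\mu$ after only a few days.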
The forecasts are given in red and 95% confidence limits on the forecasts are shown in blue. The observed time series is plotted in black.

[Figure: Time Series Plot for logR (with forecasts and their 95% confidence limits); horizontal axis: Time.]

Figure 4.8: Time series plot of the daily GE log returns with forecasts from an AR(1) model.

Next we fit an ARIMA(1,1,0) model to the log prices. Although this model is equivalent to the last model, it generates forecasts of the log prices, not the log returns. (MINITAB always forecasts the input series.) The forecasts are given in Figure 4.9. Notice that the forecasts predict that the price of GE will stay constant, but the confidence limits on the forecasts get wider as we forecast further ahead. This is exactly the type of behavior we would expect from a random walk [ARIMA(0,1,0)] model. The ARIMA(1,1,0) model for the log prices isn't quite a random walk model, but it is similar to a random walk model with zero drift (\(\mu = 0\)) since \(\hat\phi\) is close to 0 and \(\hat\mu\) is extremely close to 0. The forecast limits suggest that accurately forecasting future GE stock prices is pretty hopeless. For practical purposes the log prices behave like a random walk so that the prices behave like a geometric random walk.

[Figure: Time Series Plot for logP (with forecasts and their 95% confidence limits); horizontal axis: Time.]

Figure 4.9: Time series plot of the daily GE log prices with forecasts from an AR(1) model.

Chapter 5

Portfolio Selection: 3/12/01

5.1 Trading off expected return and risk

How should we invest our wealth? There are two principles:

• we want to maximize the expected return
• we want to minimize the risk = variance of return

These goals are somewhat at odds.
Nonetheless, there are optimal compromises between expected return and risk. In this chapter we will see how to maximize expected return subject to an upper bound on the risk, or to minimize the risk subject to a lower bound on the expected return. The key concept that we will discuss is reduction of risk by diversifying the portfolio of assets held.

Diversification was not always considered as favorably as it is now.

The investment philosophy of Keynes

The famous economist, John Maynard Keynes, did not believe in diversifying a portfolio. He wrote:

    ... the management of stock exchange investment of any kind is a low pursuit ... from which it is a good thing for most members of society to be free

    I am in favor of having as large a unit as market conditions will allow ... to suppose that safety-first consists in having a small gamble in a large number of different [companies] where I have no information to reach a good judgement, as compared with a substantial stake in a company where one's information is adequate, strikes me as a travesty of investment policy

This quote is taken from Bernstein, Capital Ideas: The Improbable Origins of Modern Wall Street.

Keynes is advocating stock picking or "fundamental analysis." But the semi-strong version of the EMH says that fundamental analysis does not lead to economic profit. Of course, Keynes lived well before the EMH and one wonders what Keynes would think about diversification if he were alive now.

Modern portfolio theory takes a very different viewpoint than Keynes. This is not to say that Keynes was wrong. Keynes was investing on a long time horizon, and fundamental analysis, if done well, might be very successful in the long run. However, portfolio managers are judged on short-term successes. Also, using fundamental analysis to find bargains is probably more difficult now than in Keynes's time.
5.2 One risky asset and one risk-free asset

We will start with a simple example where we have

• one risky asset, which could be a portfolio, e.g., a mutual fund
  - expected return is .15
  - standard deviation of the return is .25
• one risk-free asset, e.g., a 30-day T-bill
  - expected value of the return is .06
  - standard deviation of the return is 0 by definition of "risk-free."

We are faced with the problem of constructing an investment portfolio that we will hold for one time period, which could be an hour, a day, a month, a quarter, a year, ten years, etc. At the end of the time period we might want to readjust the portfolio, so for now we are only looking at returns over one time period. Suppose that

• a fraction \(w\) of our wealth is invested in the risky asset
• the remaining fraction \(1 - w\) is invested in the risk-free asset
• then the expected return is \(E(R) = w(.15) + (1 - w)(.06) = .06 + .09w\)
• the variance of the return is \(\sigma_R^2 = w^2(.25)^2 + (1 - w)^2(0)^2 = w^2(.25)^2\), or \(\sigma_R = .25\,w\).

Would \(w > 1\) make any sense?

[Figure: expected return \(E(R)\) plotted against \(w\) for \(0 \le w \le 1\).]

Figure 5.1: Expected return for a portfolio with allocation \(w\) to the risky asset with expected return 0.15 and allocation \(1 - w\) to the risk-free asset with return 0.06.

Question: Suppose you want an expected return of .10. What should \(w\) be? [answer: 4/9]

Question: Suppose you want \(\sigma_R = .05\). What should \(w\) be? [answer: 0.2]

More generally, if the expected returns on the risky and risk-free assets are \(\mu_1\) and \(\mu_f\) and if the standard deviation of the risky asset is \(\sigma_1\), then the expected return on the portfolio is \(w\mu_1 + (1 - w)\mu_f\) while the standard deviation of the portfolio's return is \(w\,\sigma_1\).

This model is simple but not as useless as it might seem at first. Finding an optimal portfolio can be achieved in two steps:

1. finding the "optimal" portfolio of risky assets, called the "tangency portfolio"
2. finding the appropriate mix of the risk-free asset and the tangency portfolio from step one

So we now know how to do the second step. What we need to learn is how to mix optimally a number of risky assets; we will do that in the next sections. First, we look at a related example.

5.2.1 Example

In the February 2001 issue of Paine Webber's Investment Intelligence: A Report for Our Clients, the advantages of holding municipal bonds are touted. Paine Webber says "The chart at the right shows that a 20% municipal/80% S&P 500 mix sacrificed only 0.42% annual after-tax return relative to a 100% S&P 500 portfolio, while reducing risk by 13.6% from 14.91% to 12.88%." The chart is shown here as Figure 5.2. Although Paine Webber's point is correct, the chart is cleverly designed to over-emphasize the reduction in volatility; how?

[Figure: "Municipal/S&P 500 Balanced Portfolios": annualized after-tax returns and portfolio volatility plotted against the percentage of the portfolio invested in municipal bonds (balance invested in the S&P 500). Source: Nuveen Investments.]

Figure 5.2: Chart from PaineWebber newsletter showing reduction in volatility by mixing municipal bonds with the S&P 500 index.

5.2.2 Estimating \(E(R)\) and \(\sigma_R\)

The risk-free rate, \(\mu_f\), will be known; Treasury bill rates are published in most newspapers. What should we use as the values of \(E(R)\) and \(\sigma_R\)? If returns on the asset are assumed to be stationary, then we can take a time series of past returns and use the sample mean and standard deviation. Whether the stationarity assumption is realistic or not is always debatable.
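Under the stationarity assumption, the estimation step and the one-risky-asset formulas of Section 5.2 fit in a few lines. The following is only a sketch in Python: the return series, the risk-free rate, and the target expected return are made-up numbers used for illustration, and the variable names are our own.

```python
import statistics

# Hypothetical monthly returns on the risky asset (made-up numbers).
past_returns = [0.021, -0.013, 0.034, 0.008, -0.027, 0.015,
                0.042, -0.006, 0.019, 0.011, -0.018, 0.029]
mu_f = 0.004           # hypothetical monthly risk-free rate
target_return = 0.007  # hypothetical target expected return

r_bar = statistics.mean(past_returns)  # estimate of E(R)
s_r = statistics.stdev(past_returns)   # estimate of sigma_R (divisor n - 1)

# The mix has expected return w * r_bar + (1 - w) * mu_f; solve for w.
w = (target_return - mu_f) / (r_bar - mu_f)

# Standard deviation of the mix is w times that of the risky asset.
portfolio_sd = w * s_r

print(f"estimated E(R) = {r_bar:.4f}, estimated sigma_R = {s_r:.4f}")
print(f"w = {w:.3f}, portfolio standard deviation = {portfolio_sd:.4f}")
```

The answer is only as good as the estimates that go into it, which is why the choice of sample period matters.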
If we think that \(E(R)\) and \(\sigma_R\) will be different than in the past, we could subjectively adjust these estimates upward or downward according to our opinions, but we must live with the consequences if our opinions prove to be incorrect.

Another question is how long a time series to use, that is, how far back in time we should gather data. A long series, say 10 or 20 years, will give much less variable estimates. However, if the series is not stationary but rather has slowly drifting parameters, then a shorter series (maybe 1 or 2 years) will be more representative of the future.

5.3 Two risky assets

The mathematics of mixing risky assets is most easily understood when there are only two risky assets. This is where we will start.

Suppose the two risky assets have returns \(R_1\) and \(R_2\) and that we mix them in proportions \(w\) and \(1 - w\), respectively. The return is \(R = wR_1 + (1 - w)R_2\). The expected return on the portfolio is \(E(R) = w\mu_1 + (1 - w)\mu_2\). Let \(\rho_{12}\) be the correlation between the returns on the two risky assets. The variance of the return on the portfolio is

\[ \sigma_R^2 = w^2\sigma_1^2 + (1 - w)^2\sigma_2^2 + 2w(1 - w)\rho_{12}\,\sigma_1\sigma_2. \]

[Note: \(\sigma_{R_1,R_2} = \rho_{12}\,\sigma_1\sigma_2\).]

Example: If \(\mu_1 = .14\), \(\mu_2 = .08\), \(\sigma_1 = .2\), \(\sigma_2 = .15\), and \(\rho_{12} = 0\), then

\[ E(R) = .08 + .06w. \]

Also, because \(\rho_{12} = 0\),

\[ \sigma_R^2 = (.2)^2 w^2 + (.15)^2 (1 - w)^2. \]

Using differential calculus, one can easily show that the portfolio with the minimum risk is \(w = .045/.125 = .36\). For this portfolio \(E(R) = .08 + (.06)(.36) = .1016\) and \(\sigma_R = \sqrt{(.2)^2(.36)^2 + (.15)^2(.64)^2} = .12\).

Here are values of \(E(R)\) and \(\sigma_R\) for some other values of \(w\):

    w      E(R)    sigma_R
    0      .080    .150
    1/4    .095    .123
    1/2    .110    .125
    3/4    .125    .155
    1      .140    .200

The somewhat parabolic curve in Figure 5.3 is the locus of values of \((\sigma_R, E(R))\) when \(0 \le w \le 1\). The points labeled \(R_1\) and \(R_2\) correspond to \(w = 1\) and \(w = 0\), respectively. The other features of this figure will be explained in the next section.
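The numbers in this example are easy to check numerically. Here is a minimal Python sketch using the example's parameters; the function and variable names are our own, and the minimum-risk weight uses the general two-asset expression obtained by setting the derivative of the variance with respect to \(w\) equal to zero.

```python
import math

# Parameters of the example in Section 5.3.
mu1, mu2 = 0.14, 0.08
sigma1, sigma2, rho12 = 0.20, 0.15, 0.0


def expected_return(w):
    """E(R) for weight w on asset 1 and 1 - w on asset 2."""
    return w * mu1 + (1 - w) * mu2


def risk(w):
    """Standard deviation of the portfolio return."""
    variance = ((w * sigma1) ** 2 + ((1 - w) * sigma2) ** 2
                + 2 * w * (1 - w) * rho12 * sigma1 * sigma2)
    return math.sqrt(variance)


# Minimum-risk weight: solve d(variance)/dw = 0 for w.
cov12 = rho12 * sigma1 * sigma2
w_min = (sigma2 ** 2 - cov12) / (sigma1 ** 2 + sigma2 ** 2 - 2 * cov12)

print(f"minimum-risk w = {w_min:.2f}, E(R) = {expected_return(w_min):.4f}, "
      f"sigma_R = {risk(w_min):.3f}")
for w in (0, 0.25, 0.5, 0.75, 1):
    print(f"w = {w:<4}  E(R) = {expected_return(w):.3f}  sigma_R = {risk(w):.3f}")
```

Changing `rho12` shows how the correlation moves the minimum-risk portfolio.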
5.3.1 Estimating means, standard deviations, and covariances

Estimates of \(\mu_1\) and \(\sigma_1\) can be obtained from a univariate time series of past returns on the first risky asset; denote this time series by \(R_{1,1}, \ldots, R_{1,n}\), where the first subscript indicates the asset and the second subscript is for time. Let \(\bar R_1\) and \(s_{R_1}\) be the sample mean and standard deviation of this series. Similarly, \(\mu_2\) and \(\sigma_2\) can be estimated from a time series of past returns on the second risky asset. The covariance \(\sigma_{12}\) can be estimated by the sample covariance

\[ \hat\sigma_{12} = n^{-1} \sum_{t=1}^{n} (R_{1,t} - \bar R_1)(R_{2,t} - \bar R_2). \]

[Figure: expected return plotted against risk for the two-asset example, with the points F, T, \(R_1\), and \(R_2\) marked.]

Figure 5.3: Expected return versus risk. The parabola is the locus of portfolios combining the two risky assets. The lines are the locus of portfolios of two risky assets and the risk-free asset. F = risk-free asset. T = tangency portfolio. \(R_1\) is the first risky asset. \(R_2\) is the second risky asset.

The correlation \(\rho_{12}\) can be estimated by the sample correlation

\[ \hat\rho_{12} = \frac{\hat\sigma_{12}}{s_{R_1} s_{R_2}}. \]

\(\hat\rho_{12}\), sometimes denoted by \(r_{12}\), is called the cross-correlation coefficient between \(R_1\) and \(R_2\) at lag 0, since we are correlating the return on the first risky asset with the return on the second during the same time periods. Cross-correlations at other lags can be defined but are not needed here. In fact, we can define a cross-correlation function, which is a function of lag. The cross-correlation function plays an important role in the analysis of multivariate time series.

Sample correlations and covariances can be computed on MINITAB. Go to "Stat," then "Basic statistics," and then "Correlation" or "Covariance."

5.4 Combining two risky assets with a risk-free asset

As mentioned at the end of the last section, each point on the parabola in Figure 5.3 is \((\sigma_R, E(R))\) for some value of \(w\) between 0 and 1. If we fix \(w\), then we have a fixed portfolio of the two risky assets. Now let us mix that portfolio of risky assets with the risk-free asset.
The point F in Figure 5.3 gives \((\sigma_R, E(R))\) for the risk-free asset; of course \(\sigma_R = 0\) at F. The possible values of \((\sigma_R, E(R))\) for a portfolio consisting of the fixed portfolio of two risky assets and the risk-free asset lie on a line connecting the point F with a point on the parabola, e.g., the dashed line. The dotted line connecting F with \(R_1\) mixes the risk-free asset with the first risky asset. Notice that the dotted line lies above the dashed line. This means that for any value of \(\sigma_R\), the dotted line gives a higher expected return than the dashed line. The slope of any such line is called the "Sharpe ratio" of the line; it is named after William Sharpe, whom we have met before in Section 3.8 and will meet again in Chapter 6. Sharpe's ratio can be thought of as a "reward-to-risk" ratio. It is the ratio of the "excess expected return" to the risk as measured by the standard deviation.

Clearly, the bigger the Sharpe ratio the better. Why? The point T on the parabola represents the portfolio with the highest Sharpe ratio. It is the optimal portfolio for the purpose of mixing with the risk-free asset. This portfolio is called the "tangency portfolio" since its line is tangent to the parabola.

Key result: The optimal or "efficient" portfolios mix the tangency portfolio of two risky assets with the risk-free asset. Each efficient portfolio has two properties:

• it has a higher expected return than any other portfolio with the same (or smaller) risk
• it has a smaller risk than any other portfolio with the same (or larger) expected return.

Thus we can only improve (reduce) the risk of an efficient portfolio by accepting a worse (smaller) expected return, and we can only improve (increase) the expected return of an efficient portfolio by accepting worse (higher) risk.

Note that all efficient portfolios use the same mix of the two risky assets, namely the tangency portfolio.
Only the proportion allocated to the tangency portfolio and the proportion allocated to the risk-free asset vary.

5.4.1 Tangency portfolio with two risky assets

Given the importance of the tangency portfolio, you may be wondering "how do we find it?" Again let \(\mu_1\), \(\mu_2\), and \(\mu_f\) be the expected returns on the two risky assets and the return on the risk-free asset. Let \(\sigma_1\) and \(\sigma_2\) be the standard deviations of the returns on the two risky assets and let \(\rho_{12}\) be the correlation between the returns on the risky assets.

Define \(V_1 = \mu_1 - \mu_f\) and \(V_2 = \mu_2 - \mu_f\); \(V_1\) and \(V_2\) are called the "excess returns." Then the tangency portfolio uses weight

\[ w_T = \frac{V_1\sigma_2^2 - V_2\,\rho_{12}\,\sigma_1\sigma_2}{V_1\sigma_2^2 + V_2\sigma_1^2 - (V_1 + V_2)\,\rho_{12}\,\sigma_1\sigma_2}. \tag{5.1} \]

This formula will be derived in Section 5.6.5.

The tangency portfolio allocates a fraction \(w_T\) of the investment to the first risky asset and \((1 - w_T)\) to the second risky asset.

Let \(R_T\), \(E(R_T)\), and \(\sigma_T\) be the return, expected return, and standard deviation of the return on the tangency portfolio.

Example: Suppose as before that \(\mu_1 = .14\), \(\mu_2 = .08\), \(\sigma_1 = .2\), \(\sigma_2 = .15\), and \(\rho_{12} = 0\). Suppose as well that \(\mu_f = .06\). Then \(V_1 = .14 - .06 = .08\) and \(V_2 = .08 - .06 = .02\). Using (5.1) we get \(w_T = .693\). Therefore,

\[ E(R_T) = (.693)(.14) + (.307)(.08) = .122, \]

and

\[ \sigma_T = \sqrt{(.693)^2(.2)^2 + (.307)^2(.15)^2} = .146. \]

Let \(R\) be the return on the portfolio that allocates a fraction \(\omega\) of the investment to the tangency portfolio and \(1 - \omega\) to the risk-free asset. Then \(R = \omega R_T + (1 - \omega)\mu_f = \mu_f + \omega(R_T - \mu_f)\) so that

\[ E(R) = \mu_f + \omega\{E(R_T) - \mu_f\} \quad\text{and}\quad \sigma_R = \omega\,\sigma_T. \]

Continuation of previous example: What is the optimal investment with \(\sigma_R = .05\)?

answer: The maximum expected return with \(\sigma_R = .05\) mixes the tangency portfolio and the risk-free asset such that \(\sigma_R = .05\). Since \(\sigma_T = .146\), we have that \(.05 = \sigma_R = \omega\,\sigma_T = .146\,\omega\), so that \(\omega = .05/.146 = .343\) and \(1 - \omega = .657\). So 65.7% of the portfolio should be in the risk-free asset.
34.3% should be in the tangency portfolio. Thus (.343)(69.3%) = 23.7% should be in the first risky asset and (.343)(30.7%) = 10.5% should be in the second risky asset. In summary:

    Asset        Allocation
    risk-free       65.7%
    risky 1         23.7%
    risky 2         10.5%
    Total           99.9%

The total is not quite 100% because of rounding errors.

Now suppose that you want a 10% expected return. Compare

• the best portfolio of only risky assets
• the best portfolio of the risky assets and the risk-free asset

Answer:

• (best portfolio of risky assets)
  - \(.1 = w(.14) + (1 - w)(.08)\) implies that \(w = 1/3\).
  - This is the only portfolio of risky assets with \(E(R) = .1\), so by default it is best.
  - Then \(\sigma_R = \sqrt{w^2(.2)^2 + (1 - w)^2(.15)^2} = \sqrt{(1/9)(.2)^2 + (4/9)(.15)^2} = .120\).
• (best portfolio of the two risky assets and the risk-free asset)
  - \(.1 = E(R) = .06 + .062\,\omega = .06 + .425\,\sigma_R\), since \(\sigma_R = \omega\,\sigma_T\) or \(\omega = \sigma_R/\sigma_T = \sigma_R/.146\).
  - This implies that \(\sigma_R = .04/.425 = .094\) and \(\omega = .04/.062 = .645\).

So combining the risk-free asset with the two risky assets reduces \(\sigma_R\) from .120 to .094 while maintaining \(E(R)\) at .1. Relative to the portfolio of risky assets only, the reduction in risk is (.120 - .094)/.120 = 22%.

More on the example: What is the best we can do combining the risk-free asset with only one risky asset? Assume that we still want to have \(E(R) = .1\).

• Second risky asset with the risk-free
  - Since \(\mu_f = .06 < .1\) and \(\mu_2 = .08 < .1\), no portfolio with only the second risky asset and the risk-free asset will have an expected return of .1.
• First risky asset with the risk-free
  - \(.1 = \omega(.14) + (1 - \omega)(.06) = .06 + \omega(.08)\) implies that \(\omega = .04/.08 = 1/2\).
  - Then \(\sigma_R = \omega(.20) = .10\), which is greater than .094, the smallest risk with two risky assets and the risk-free asset such that \(E(R) = .1\).

The minimum values of \(\sigma_R\) under various combinations of available assets are given in Table 5.1.
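The calculations in this example can be reproduced with a short script. The following Python sketch applies formula (5.1) and the mixing rules of Section 5.4.1 to the example's parameters; the helper names `mix_for_target_return` and `mix_for_target_risk` are our own. Because the script carries full precision rather than the rounded intermediate values .693, .122, and .146, its output can differ from the text in the third decimal place.

```python
import math

# Parameters of the running example.
mu1, mu2, mu_f = 0.14, 0.08, 0.06
sigma1, sigma2, rho12 = 0.20, 0.15, 0.0

# Excess returns and the tangency weight from formula (5.1).
v1, v2 = mu1 - mu_f, mu2 - mu_f
cov12 = rho12 * sigma1 * sigma2
w_t = (v1 * sigma2 ** 2 - v2 * cov12) / (
    v1 * sigma2 ** 2 + v2 * sigma1 ** 2 - (v1 + v2) * cov12)

# Expected return and standard deviation of the tangency portfolio.
mu_t = w_t * mu1 + (1 - w_t) * mu2
sigma_t = math.sqrt((w_t * sigma1) ** 2 + ((1 - w_t) * sigma2) ** 2
                    + 2 * w_t * (1 - w_t) * cov12)


def mix_for_target_return(target):
    """Fraction in the tangency portfolio, and resulting risk, for a target E(R)."""
    omega = (target - mu_f) / (mu_t - mu_f)
    return omega, omega * sigma_t


def mix_for_target_risk(target_sigma):
    """Fraction in the tangency portfolio, and resulting E(R), for a target risk."""
    omega = target_sigma / sigma_t
    return omega, mu_f + omega * (mu_t - mu_f)


print(f"w_T = {w_t:.3f}, E(R_T) = {mu_t:.3f}, sigma_T = {sigma_t:.3f}")
print("target sigma_R = .05:", mix_for_target_risk(0.05))
print("target E(R) = .10:  ", mix_for_target_return(0.10))
```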
    Available Assets          Minimum sigma_R
    1st risky, risk-free      0.1
    2nd risky, risk-free      -
    Both riskies              0.12
    All three                 0.094

Table 5.1: Minimum value of \(\sigma_R\) as a function of the available assets.

5.4.2 Effect of \(\rho_{12}\)

Positive correlation between the two risky assets is bad. With positive correlation, the two assets tend to move together, which increases the volatility of the portfolio. Conversely, negative correlation is good. If the assets are negatively correlated, a negative return of one tends to occur with a positive return of the other, so the volatility of the portfolio decreases. Figure 5.4 shows the efficient frontier and tangency portfolio when \(\mu_1 = .14\), \(\mu_2 = .09\), \(\sigma_1 = .2\), \(\sigma_2 = .15\), and \(\mu_f = .03\). The value of \(\rho_{12}\) is varied from .7 to -.7. Notice that the standard deviation of the portfolio returns decreases as \(\rho_{12}\) decreases.

[Figure: four panels of expected return versus standard deviation of return, for \(\rho = 0.7\), \(\rho = 0.3\), \(\rho = 0\), and \(\rho = -0.7\).]

Figure 5.4: Efficient frontier and tangency portfolio when \(\mu_1 = .14\), \(\mu_2 = .09\), \(\sigma_1 = .2\), \(\sigma_2 = .15\), and \(\mu_f = .03\). The value of \(\rho_{12}\) is varied from .7 to -.7.

5.5 Harry Markowitz

Chapter Two of Capital Ideas: The Improbable Origins of Modern Wall Street by Peter Bernstein is titled "Fourteen Pages to Fame." The title refers to the paper "Portfolio Selection" by Harry Markowitz that was published in the Journal of Finance in 1952. This article is indeed only fourteen pages, though it was later expanded to the book Portfolio Selection: Efficient Diversification of Investments that was published by Markowitz in 1959.

Markowitz was not primarily interested in the stock market or investing. Rather, he was drawn to the more general issue of how people make tradeoffs. Investors are faced with a trade-off between risk and expected return.
The maxim "nothing ventured, nothing gained" isn't quite true, but risk-free rates of return can be smaller than many investors find acceptable. Markowitz's solution to the problem of risk also can be expressed as a maxim, "don't put all your eggs in one basket." (Keynes would have agreed with Mark Twain, who said, "put all your eggs in one basket, and then watch that basket!")

Markowitz was born in 1927 and grew up in Chicago. His high school grades were not impressive, but he was intellectually curious and read a great deal on his own. At fourteen, he read Darwin's Origin of Species and later his hero was the philosopher David Hume. The knowledge he acquired on his own got him into the University of Chicago and even exempted him from the required science courses there. This self-study may have been ideal preparation for the highly original work that came later.

After graduation, Markowitz became a research associate at the Cowles Commission and a graduate student at his alma mater. While waiting outside his advisor's office one day, he began a conversation with a stock broker who suggested that he write his thesis on the stock market. Markowitz was somewhat surprised when later his advisor was enthusiastic over this idea.

Markowitz started to read what he could about investing. In the 1937 book The Theory of Investment Value by John Burr Williams, he found Williams's prescription for selecting stocks: one estimated the "intrinsic value" of a stock by forecasting all future dividends and calculating the "present value" of all future dividends, that is, the discounted sum of all future dividends. Williams then recommended that one put all one's capital in the stock with the highest intrinsic value. Markowitz had enough knowledge of the world to realize that this is not how investors actually operated.
He had the key insight that humans are risk-averse, and he began to explore the relationship between diversification and risk.

Interestingly, Markowitz did not recommend that expected returns be estimated from past data but rather from Williams's Dividend Discount Model.

5.6 Risk-efficient portfolios with N risky assets

5.6.1 Efficient-set mathematics

Efficient-set mathematics generalizes our previous analysis with two risky assets to the more realistic case of many risky assets. This material is taken from Section 5.2 of Campbell, Lo, and MacKinlay.

Assume that we have \(N\) risky assets and that the return on the \(i\)th risky asset is \(R_i\). Define

\[ \mathbf{R} = (R_1, \ldots, R_N)^T \]

to be the random vector of returns. Then

\[ E(\mathbf{R}) = \boldsymbol{\mu} = (\mu_1, \ldots, \mu_N)^T. \]

Let \(\Omega_{ij}\) be the covariance between \(R_i\) and \(R_j\). Also, let \(\sigma_i = \sqrt{\Omega_{ii}}\) be the standard deviation of \(R_i\). Define \(\rho_{ij} = \Omega_{ij}/(\sigma_i\sigma_j)\) as the correlation between \(R_i\) and \(R_j\). Finally, let \(\boldsymbol{\Omega}\) be the covariance matrix of \(\mathbf{R}\), i.e.,

\[ \boldsymbol{\Omega} = \mathrm{COV}(\mathbf{R}), \]

so that the \(i,j\)th element of \(\boldsymbol{\Omega}\) is \(\Omega_{ij}\).

Let

\[ \boldsymbol{\omega} = (\omega_1, \ldots, \omega_N)^T \]

be a vector of portfolio weights and let

\[ \mathbf{1} = (1, \ldots, 1)^T \]

be a column of \(N\) ones. We assume that \(\omega_1 + \cdots + \omega_N = \mathbf{1}^T\boldsymbol{\omega} = 1\). The expected return on a portfolio with weights \(\boldsymbol{\omega}\) is

\[ \sum_{i=1}^{N} \omega_i\mu_i = \boldsymbol{\omega}^T\boldsymbol{\mu}. \]

When \(N = 2\), \(\omega_2 = 1 - \omega_1\).

Suppose there is a target value, \(\mu_P\), of the expected return on the portfolio. We assume that

\[ \min_i \mu_i \le \mu_P \le \max_i \mu_i, \]

since, without short selling (Section 5.6.2), no portfolio can have an expected return higher than the individual asset with the highest expected return or smaller than the individual asset with the lowest expected return. When \(N = 2\) the target, \(\mu_P\), is achieved by only one portfolio and its \(\omega_1\) value solves

\[ \mu_P = \omega_1\mu_1 + \omega_2\mu_2 = \mu_2 + \omega_1(\mu_1 - \mu_2). \]

For \(N \ge 3\), there will be an infinite number of portfolios achieving the target, \(\mu_P\). The one with the smallest variance is called the "efficient" portfolio. Our goal is to find the efficient portfolio.
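By equation (2.2), the variance of the return on the portfolio with weights \(\boldsymbol{\omega}\) is

\[ \sum_{i=1}^{N}\sum_{j=1}^{N} \omega_i\,\omega_j\,\Omega_{ij} = \boldsymbol{\omega}^T\boldsymbol{\Omega}\,\boldsymbol{\omega}. \tag{5.2} \]

Thus, given a target \(\mu_P\), the efficient portfolio minimizes (5.2) subject to

\[ \boldsymbol{\omega}^T\boldsymbol{\mu} = \mu_P \tag{5.3} \]

and

\[ \boldsymbol{\omega}^T\mathbf{1} = 1. \tag{5.4} \]

We will denote the weights of the efficient portfolio by \(\boldsymbol{\omega}_{\mu_P}\). To find \(\boldsymbol{\omega}_{\mu_P}\), form the Lagrangian

\[ L = \boldsymbol{\omega}^T\boldsymbol{\Omega}\,\boldsymbol{\omega} + \delta_1(\mu_P - \boldsymbol{\omega}^T\boldsymbol{\mu}) + \delta_2(1 - \boldsymbol{\omega}^T\mathbf{1}). \]

Then solve

\[ 0 = \frac{\partial}{\partial\boldsymbol{\omega}} L = 2\boldsymbol{\Omega}\,\boldsymbol{\omega}_{\mu_P} - \delta_1\boldsymbol{\mu} - \delta_2\mathbf{1} \]

together with (5.3) and (5.4) for \(\boldsymbol{\omega}_{\mu_P}\), \(\delta_1\), and \(\delta_2\). The first equation gives \(\boldsymbol{\omega}_{\mu_P} = \tfrac{1}{2}\boldsymbol{\Omega}^{-1}(\delta_1\boldsymbol{\mu} + \delta_2\mathbf{1})\). Define the scalars

\[ A = \mathbf{1}^T\boldsymbol{\Omega}^{-1}\boldsymbol{\mu}, \quad B = \boldsymbol{\mu}^T\boldsymbol{\Omega}^{-1}\boldsymbol{\mu}, \quad C = \mathbf{1}^T\boldsymbol{\Omega}^{-1}\mathbf{1}, \quad D = BC - A^2. \]

Substituting into (5.3) and (5.4) and solving for \(\delta_1\) and \(\delta_2\) yields

\[ \boldsymbol{\omega}_{\mu_P} = \mathbf{g} + \mu_P\,\mathbf{h}, \]

where

\[ \mathbf{g} = \frac{B\,\boldsymbol{\Omega}^{-1}\mathbf{1} - A\,\boldsymbol{\Omega}^{-1}\boldsymbol{\mu}}{D} \tag{5.7} \]

and

\[ \mathbf{h} = \frac{C\,\boldsymbol{\Omega}^{-1}\boldsymbol{\mu} - A\,\boldsymbol{\Omega}^{-1}\mathbf{1}}{D}. \tag{5.8} \]

Notice that \(\mathbf{g}\) and \(\mathbf{h}\) are fixed vectors, since they depend on the fixed vector \(\boldsymbol{\mu}\) and the fixed matrix \(\boldsymbol{\Omega}\). Also, the scalars \(A\), \(B\), \(C\), and \(D\) are functions of \(\boldsymbol{\mu}\) and \(\boldsymbol{\Omega}\), so they are also fixed. The target expected return, \(\mu_P\), can be varied over the range

\[ \min_i \mu_i \le \mu_P \le \max_i \mu_i. \]

As \(\mu_P\) varies over this range, we get a locus \(\boldsymbol{\omega}_{\mu_P}\) of efficient portfolios called the "efficient frontier." We can illustrate the efficient frontier by the following algorithm:

1. Vary \(\mu_P\) along a grid. For each value of \(\mu_P\) on this grid, compute \(\sigma_{\mu_P}\) by:
   (a) computing \(\boldsymbol{\omega}_{\mu_P} = \mathbf{g} + \mathbf{h}\,\mu_P\)
   (b) then computing \(\sigma_{\mu_P} = \sqrt{\boldsymbol{\omega}_{\mu_P}^T\boldsymbol{\Omega}\,\boldsymbol{\omega}_{\mu_P}}\)
2. Plot the values \((\mu_P, \sigma_{\mu_P})\).

This algorithm is implemented in the MATLAB program "portfolio02.m" on the course's web site and listed below:

    % Input mean vector and covariance matrix of returns here
    bmu = [.08 ; .03 ; .05] ;
    bOmega = [.3  .02 .01 ;
              .02 .15 .03 ;
              .01 .03 .18] ;
    bone = ones(length(bmu),1) ;

    short = 1 ;  % short = 1 implies extensive short selling
                 % short = 0 reduces the short selling, but
                 % does not eliminate short selling

    ngrid = 200 ;
    if short == 1 ;
       muP = linspace(-.02,.2,ngrid) ;
       w = linspace(-5,7,ngrid) ;
    else ;
       muP = linspace(min(bmu),max(bmu),ngrid) ;
       w = linspace(0,1,ngrid) ;
    end ;

    % Storage for the efficient frontier and the two-asset frontiers
    sigmaP = zeros(1,ngrid) ;
    omegaP = zeros(3,ngrid) ;
    mu12 = zeros(1,ngrid) ;
    sigma12 = mu12 ;
    mu13 = zeros(1,ngrid) ;
    sigma13 = mu12 ;
    mu23 = zeros(1,ngrid) ;
    sigma23 = mu12 ;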
    ibOmega = inv(bOmega) ;
    A = bone'*ibOmega*bmu ;
    B = bmu'*ibOmega*bmu ;
    C = bone'*ibOmega*bone ;
    D = B*C - A^2 ;
    bg = (B*ibOmega*bone - A*ibOmega*bmu)/D ;   % vector g in (5.7)
    bh = (C*ibOmega*bmu - A*ibOmega*bone)/D ;   % vector h in (5.8)

    for i=1:ngrid ;
       % efficient portfolio for target muP(i) and its standard deviation
       omegaP(:,i) = bg + muP(i)*bh ;
       sigmaP(i) = sqrt(omegaP(:,i)'*bOmega*omegaP(:,i)) ;

       % portfolios of two assets only, with weight w(i) on the first
       mu12(i) = w(i)*bmu(1) + (1-w(i))*bmu(2) ;
       sigma12(i) = sqrt(w(i)^2*bOmega(1,1) + ...
          2*w(i)*(1-w(i))*bOmega(1,2) + ...
          (1-w(i))^2*bOmega(2,2)) ;
       mu13(i) = w(i)*bmu(1) + (1-w(i))*bmu(3) ;
       sigma13(i) = sqrt(w(i)^2*bOmega(1,1) + ...
          2*w(i)*(1-w(i))*bOmega(1,3) + ...
          (1-w(i))^2*bOmega(3,3)) ;
       mu23(i) = w(i)*bmu(2) + (1-w(i))*bmu(3) ;
       sigma23(i) = sqrt(w(i)^2*bOmega(2,2) + ...
          2*w(i)*(1-w(i))*bOmega(2,3) + ...
          (1-w(i))^2*bOmega(3,3)) ;
    end ;

    fsize = 16 ;
    figure(1)
    p = plot(sigmaP,muP,sigma12,mu12,'--',sigma13,mu13,'-.',sigma23,mu23,':') ;
    set(p,'linewidth',6) ;
    xlabel('standard deviation of return (\sigma_P)','fontsize',fsize) ;
    ylabel('expected return (\mu_P)','fontsize',fsize) ;
    text(sqrt(bOmega(1,1)),bmu(1),'1','fontsize',24) ;
    text(sqrt(bOmega(2,2)),bmu(2),'2','fontsize',24) ;
    text(sqrt(bOmega(3,3)),bmu(3),'3','fontsize',24) ;
    set(gca,'fontsize',fsize) ;
    if short == 0 ;
       set(gca,'ylim',[.025,.085]) ;
    end ;
    if short == 1 ;
       set(gca,'ylim',[-.02,.2]) ;
       set(gca,'xlim',[.2,2]) ;
    end ;
    grid ;
    if short == 0 ;
       print portfolio02.ps -depsc ;
       !mv portfolio02.ps ~/public_html/or473/LectNotes/portfolio02.ps ;
    else ;
       print portfolio02SH.ps -depsc ;
       !mv portfolio02SH.ps ~/public_html/or473/LectNotes/portfolio02SH.ps ;
    end ;

    figure(2)
    p2 = plot(muP,omegaP(1,:),muP,omegaP(2,:),'--',muP,omegaP(3,:),'-.') ;
    set(p2,'linewidth',6) ;
    set(gca,'fontsize',fsize) ;
    grid ;
    xlabel('\mu_P','fontsize',fsize) ;
    ylabel('weight','fontsize',fsize) ;
    legend('w_1','w_2','w_3',0) ;
    if short == 0
       print portfolio02_wt.ps -depsc ;
       !mv portfolio02_wt.ps ~/public_html/or473/LectNotes/portfolio02_wt.ps ;
    else ;
       print portfolio02_wtSH.ps -depsc ;
       !mv portfolio02_wtSH.ps ~/public_html/or473/LectNotes/portfolio02_wtSH.ps ;
    end ;

To use this program, replace bmu and bOmega in the program by the vector of expected returns and covariance matrix of returns for the assets you wish to analyze. The parameter "short" should be set equal to 0 or 1. If "short" is 1, then there is extensive short selling, i.e., weights get quite negative. If "short" is 0, then the amount of short selling is small.

Figure 5.5 was produced by this program with "short" equal to 0.

[Figure: expected return versus standard deviation of return (\(\sigma_P\)).]

Figure 5.5: Efficient frontier (solid) plotted for N = 3 assets by the program "portfolio02.m" with the parameter "short" equal to 0. "1," "2," and "3" are the three single assets. The efficient frontiers for just two assets are dashed (1 and 2), dashed-and-dotted (1 and 3), and dotted (2 and 3).

The portfolio weights as functions of \(\mu_P\) are plotted in Figure 5.6. The weights can be negative. Negative weights can be obtained by the technique of selling short, which is described in Section 5.6.2.

[Figure: portfolio weights plotted against \(\mu_P\) from 0.03 to 0.08.]

Figure 5.6: Weights for assets 1, 2, and 3 as functions of \(\mu_P\).
Note that the weights for assets 1 and 2 can be negative, so that short selling would be required.

If one wants to avoid short selling, then one must impose the additional constraints that \(\omega_i \ge 0\) for \(i = 1, \ldots, N\). Minimization of portfolio risk subject to \(\boldsymbol{\omega}^T\boldsymbol{\mu} = \mu_P\), \(\boldsymbol{\omega}^T\mathbf{1} = 1\), and these additional nonnegativity constraints is a quadratic programming problem. (This minimization problem cannot be solved by the method of Lagrange multipliers because of the inequality constraints.) Quadratic programming algorithms are not hard to find. For example, the program "quadprog" in MATLAB's Optimization Toolbox does quadratic programming. Figures 5.7 and 5.8 were produced by the program "portfolio02QP.m" that uses "quadprog" in MATLAB. Quadratic programming in MATLAB and "portfolio02QP.m" are discussed in Section 5.9.

[Figure: expected return versus standard deviation of return (\(\sigma_P\)); legend: no negative wts (solid), unconstrained wts (dashed).]

Figure 5.7: Efficient frontier plotted by the program "portfolio02QP.m" for N = 3 assets. "1," "2," and "3" are the three single assets. The efficient frontiers are found with and without the constraint of no negative weights. The constrained efficient frontier is computed using MATLAB's quadratic programming algorithm.

[Figure: portfolio weights plotted against \(\mu_P\) from 0.03 to 0.08.]

Figure 5.8: Weights for assets 1, 2, and 3 as functions of \(\mu_P\). The weights for all three assets are constrained to be nonnegative.

The portfolio weights with the nonnegativity constraint are plotted as functions of \(\mu_P\) in Figure 5.8.
That figure was made using the program "portfolio02QP.m" listed here:

    % Input mean vector and covariance matrix of returns here
    bmu = [.08 ; .03 ; .05] ;
    bOmega = [.3  .02 .01 ;
              .02 .15 .03 ;
              .01 .03 .18] ;

    % Equality constraints: weights sum to 1 and expected return equals muP(i)
    A = [ones(1,3) ; bmu'] ;

    ngrid = 50 ;
    muP = linspace(.03,.08,ngrid)' ;

    icompute = 1 ;  % set to 0 to reuse results already in the workspace
    if icompute == 1 ;
       sigmaP = muP ;
       sigmaP2 = sigmaP ;
       omegaP = zeros(3,ngrid) ;
       omegaP2 = omegaP ;
       for i = 1:ngrid ;
          % weights constrained to be nonnegative
          omegaP(:,i) = quadprog(bOmega,zeros(3,1),-eye(3),zeros(3,1),A,[1;muP(i)]) ;
          % unconstrained weights (the inequality 0 <= 0 is always satisfied)
          omegaP2(:,i) = quadprog(bOmega,zeros(3,1),zeros(1,3),0,A,[1;muP(i)]) ;
          sigmaP(i) = sqrt(omegaP(:,i)'*bOmega*omegaP(:,i)) ;
          sigmaP2(i) = sqrt(omegaP2(:,i)'*bOmega*omegaP2(:,i)) ;
       end ;
    end ;

    fsize = 16 ;
    figure(1)
    clf
    p = plot(sigmaP,muP,sigmaP2,muP,'--') ;
    l = legend('no negative wts','unconstrained wts',4) ;
    set(gca,'fontsize',fsize) ;
    set(l,'fontsize',fsize) ;
    xlabel('standard deviation of return (\sigma_P)','fontsize',fsize) ;
    ylabel('expected return (\mu_P)','fontsize',fsize) ;
    text(sqrt(bOmega(1,1)),bmu(1),'1','fontsize',24) ;
    text(sqrt(bOmega(2,2)),bmu(2),'2','fontsize',24) ;
    text(sqrt(bOmega(3,3)),bmu(3),'3','fontsize',24) ;
    set(gca,'ylim',[.025,.085]) ;
    set(p,'linewidth',4) ;
    grid ;
    print portfolio02QP.ps -depsc ;
    !mv portfolio02QP.ps ~/public_html/or473/LectNotes/portfolio02QP.ps ;

    figure(2)
    p2 = plot(muP,omegaP(1,:),muP,omegaP(2,:),'--',muP,omegaP(3,:),'-.') ;
    set(p2,'linewidth',6) ;
    set(gca,'fontsize',fsize) ;
    grid ;
    xlabel('\mu_P','fontsize',fsize) ;
    ylabel('weight','fontsize',fsize) ;
    legend('w_1','w_2','w_3',0) ;
    print portfolio02_wtQP.ps -depsc ;
    !mv portfolio02_wtQP.ps ~/public_html/or473/LectNotes/portfolio02_wtQP.ps ;

Now suppose that we have a risk-free asset and we want to mix the risk-free asset with some efficient portfolio. One can see geometrically that there is a tangency portfolio; see Figure 5.9.
The optimal portfolio always is a mixture of the risk-free asset with the tangency portfolio. This is a remarkable simplification.

[Figure: expected return (mu) versus standard deviation (sigma), showing the efficient frontier, the risk-free asset R, the tangency portfolio T, an arbitrary efficient portfolio P of risky assets, the line of mixtures of P and R, and the line of best portfolios of risky and risk-free assets.]

Figure 5.9: Finding the best portfolios that combine risky and risk-free assets. R is the risk-free asset. T is the tangency portfolio. The optimal portfolios are on the line connecting R and T. The efficient frontier gives the set of optimal portfolios of risky assets.

5.6.2 Selling short

Selling short is a way to profit if a stock price goes down. To sell a stock short, one sells the stock without owning it. The stock must be borrowed from a broker or another customer of the broker. At a later point in time, one buys the stock and gives it back to the lender. This closes the short position.

Suppose a stock is selling at $25/share and you sell 100 shares short. This gives you $2,500. If the price goes down to $17/share, you can buy the 100 shares for $1,700 and close out your short position, keeping the difference of $800 as profit.

Suppose that you have $100 and there are two risky assets. With your money you could buy $150 worth of risky asset 1 and sell $50 short of risky asset 2. The net cost would be exactly $100. If \(R_1\) and \(R_2\) are the returns on risky assets 1 and 2, then the return on your portfolio would be

\[ \frac{3}{2}R_1 + \left(-\frac{1}{2}\right)R_2. \]

Your portfolio weights are \(w_1 = 3/2\) and \(w_2 = -1/2\). Thus, you hope that risky asset 1 rises in price and risky asset 2 falls in price. Here, as elsewhere, we have ignored transaction costs.

Figure 5.10 is the same as Figure 5.5 except that the range of values of \(\mu_P\) has been expanded. Values of \(\mu_P\) below \(\min_i(\mu_i)\) and above \(\max_i(\mu_i)\) are possible by using short selling.
In principle, there is no upper limit to \(\mu_P\), but in practice security exchanges place limits on the amount of stock one can sell short because selling short increases risk.

Figure 5.10: Efficient frontier (solid) plotted for N = 3 assets by the program "portfolio02.m" with the parameter "short" equal to 1. "1," "2," and "3" are the three single assets. The efficient frontiers for just two assets are dashed (1 and 2), dashed-and-dotted (1 and 3), and dotted (2 and 3). This figure is the same as Figure 5.5 except that the range of \(\mu_P\) has been expanded.

5.6.3 The Interior decorator fallacy

It is often thought that a stock portfolio should be tailored to the financial circumstances of a client, as an interior decorator furnishes your home to suit your tastes. For example, widows and orphans should hold conservative "income stocks," or so it is said. Bernstein, in his book Capital Ideas, calls this the "interior decorator fallacy."

Bernstein tells the story of a woman in her forties who came to him in 1961 for investment advice. She was married to a clergyman with a modest income. She had just inherited money which she wanted to invest. Bernstein recommended a portfolio that included stocks with good growth potential but low dividends, e.g., Georgia Pacific, IBM, and Gillette. The client was worried that these were too risky, but she eventually took Bernstein's advice, which turned out to be sound. Bernstein reasoned that even someone with modest means should benefit from the long-term growth potential of the "hot" stocks.

In another case, Bernstein recommended electric utilities, a conservative choice, to a young business executive who wanted a more aggressive portfolio. Again, this recommendation was at odds with conventional wisdom.
A new view, based both on mathematical theory and experience, is that there is a best portfolio (the tangency portfolio) that is the same for everyone. An individual's circumstances determine only the appropriate mix between risk-free assets and the tangency portfolio. The clergyman's wife should invest a higher percentage of her money in risk-free assets than the young business executive. In 1961, Bernstein had the right intuition but he had not yet heard of the Efficient Frontier or the tangency portfolio.

5.6.4 Back to the math

Here's the mathematics behind Figure 5.9. We now remove the assumption that \(\omega^T \mathbf{1} = 1\). The quantity \(1 - \omega^T \mathbf{1}\) is invested in the risk-free asset. (Does it make sense to have \(1 - \omega^T \mathbf{1} < 0\)?) The expected return is

\[ \omega^T \mu + (1 - \omega^T \mathbf{1})\mu_f, \tag{5.9} \]

where \(\mu_f\) is the return on the risk-free asset. The constraint to be satisfied is that (5.9) is equal to \(\mu_P\). Thus, the Lagrangian function is

\[ L = \omega^T \Omega \omega + \delta\{\mu_P - \omega^T \mu - (1 - \omega^T \mathbf{1})\mu_f\}. \]

Here \(\delta\) is a Lagrange multiplier. Since

\[ 0 = \frac{\partial}{\partial \omega} L = 2\Omega\omega + \delta(-\mu + \mathbf{1}\mu_f), \]

the optimal weight vector, i.e., the vector of weights that minimizes risk subject to the constraint on the expected return, is

\[ \omega_{\mu_P} = \lambda \Omega^{-1}(\mu - \mu_f \mathbf{1}), \tag{5.10} \]

where \(\lambda = \delta/2\). To find \(\lambda\), we use our constraint:

\[ \omega_{\mu_P}^T \mu + (1 - \omega_{\mu_P}^T \mathbf{1})\mu_f = \mu_P. \tag{5.11} \]

Rearranging (5.11), we get

\[ \omega_{\mu_P}^T (\mu - \mu_f \mathbf{1}) = \mu_P - \mu_f. \tag{5.12} \]

Therefore, substituting (5.10) into (5.12) we have

\[ \lambda (\mu - \mu_f \mathbf{1})^T \Omega^{-1} (\mu - \mu_f \mathbf{1}) = \mu_P - \mu_f, \]

or

\[ \lambda = \frac{\mu_P - \mu_f}{(\mu - \mu_f \mathbf{1})^T \Omega^{-1} (\mu - \mu_f \mathbf{1})}. \tag{5.13} \]

Then substituting (5.13) into (5.10)

\[ \omega_{\mu_P} = c_P \bar{\omega}, \]

where

\[ c_P = \frac{\mu_P - \mu_f}{(\mu - \mu_f \mathbf{1})^T \Omega^{-1} (\mu - \mu_f \mathbf{1})} \]

and

\[ \bar{\omega} = \Omega^{-1}(\mu - \mu_f \mathbf{1}). \tag{5.14} \]

Note that \((\mu - \mu_f \mathbf{1})\) is the vector of "excess returns," that is, the amount by which the expected returns on the risky assets exceed the risk-free return. The excess returns measure how much the market pays for assuming risk.
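The derivation can be checked numerically. The sketch below is plain Python written under stated assumptions: it borrows the mean vector and covariance matrix from "portfolio02QP.m" and the risk-free rate 0.02 from "portfolio03.m", picks the target \(\mu_P = 0.05\) arbitrarily, and uses a small hand-written Gaussian elimination (the helper `solve` is hypothetical and not part of the course programs) instead of a matrix inverse. It computes \(\bar{\omega}\) from (5.14), then \(c_P\) and \(\omega_{\mu_P} = c_P \bar{\omega}\), and confirms that the constraint (5.11) holds.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]    # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


mu = [0.08, 0.03, 0.05]                       # expected returns (bmu in the notes)
omega_mat = [[0.30, 0.02, 0.01],
             [0.02, 0.15, 0.03],
             [0.01, 0.03, 0.18]]              # covariance matrix (bOmega)
mu_f = 0.02                                   # risk-free rate (muf in portfolio03.m)
mu_p = 0.05                                   # target expected return (assumed here)

excess = [mu_j - mu_f for mu_j in mu]         # vector of excess returns
omega_bar = solve(omega_mat, excess)          # (5.14): Omega^{-1} (mu - mu_f 1)
denom = sum(e * w for e, w in zip(excess, omega_bar))
c_p = (mu_p - mu_f) / denom                   # the constant c_P
weights = [c_p * w for w in omega_bar]        # (5.10) with lambda = c_P

risk_free_weight = 1.0 - sum(weights)         # amount left for the risk-free asset
achieved = sum(w * mu_j for w, mu_j in zip(weights, mu)) + risk_free_weight * mu_f
print(weights, risk_free_weight, achieved)    # achieved should match mu_p
```

Changing `mu_p` rescales `weights` through `c_p` but leaves `omega_bar` untouched, which is the point of the factorization \(\omega_{\mu_P} = c_P \bar{\omega}\).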
\(\bar{\omega}\) is not quite a portfolio because these weights do not necessarily sum to one. The tangency portfolio is a scalar multiple of \(\bar{\omega}\):

\[ \omega_T = \frac{\bar{\omega}}{\mathbf{1}^T \bar{\omega}}. \tag{5.15} \]

\(c_P\) tells us how much weight to put on \(\bar{\omega}\) and therefore on the tangency portfolio. The amount of weight to put on the risk-free asset is \(1 - \omega_{\mu_P}^T \mathbf{1} = 1 - c_P(\bar{\omega}^T \mathbf{1})\). The weight on the tangency portfolio is \(c_P(\bar{\omega}^T \mathbf{1})\). Note that \(\bar{\omega}\) and \(\omega_T\) do not depend on \(\mu_P\).

The MATLAB program "portfolio03.m" on the course web site is an extension of "portfolio02.m." portfolio03.m, which is listed below, also plots the tangency portfolio (T) and the line connecting the risk-free asset (F) with the tangency portfolio.

    % portfolio03 - extension of portfolio02
    % Input mean vector and covariance matrix of returns here
    bmu = [ .08; .03; .05] ;
    bOmega = [ .3 .02 .01 ;
               .02 .15 .03 ;
               .01 .03 .18 ] ;
    muf = .02 ;                              % risk-free rate
    bone = ones(length(bmu),1) ;
    muP = linspace(min(bmu),max(bmu),50) ;
    sigmaP = zeros(1,50) ;
    ibOmega = inv(bOmega) ;
    A = bone'*ibOmega*bmu ;
    B = bmu'*ibOmega*bmu ;
    C = bone'*ibOmega*bone ;
    D = B*C - A^2 ;
    bg = (B*ibOmega*bone - A*ibOmega*bmu)/D ;
    bh = (C*ibOmega*bmu - A*ibOmega*bone)/D ;
    for i=1:50 ;
        omegaP = bg + muP(i)*bh ;            % efficient weights for mean muP(i)
        sigmaP(i) = sqrt(omegaP'*bOmega*omegaP) ;
    end ;
    bomegabar = ibOmega*(bmu - muf*bone) ;   % equation (5.14)
    bomegaT = bomegabar/(bone'*bomegabar) ;  % tangency portfolio, equation (5.15)
    sigmaT = sqrt(bomegaT'*bOmega*bomegaT) ;
    muT = bmu'*bomegaT ;
    fsize = 16 ;
    fsize2 = 28 ;
    bomegaP2 = [0; .3; .7] ;                 % a portfolio not on the efficient frontier
    sigmaP2 = sqrt(bomegaP2'*bOmega*bomegaP2) ;
    muP2 = bmu'*bomegaP2 ;
    clf ;
    p1 = plot(sigmaP,muP) ;
    l1 = line([0,sigmaT],[muf,muT]) ;
    t1 = text(sigmaP2,muP2,'* P','fontsize',fsize2) ;
    t2 = text(sigmaT,muT,'* T','fontsize',fsize2) ;
    t3 = text(.01,muf+.006,'F','fontsize',fsize2) ;
    t3B = text(0,muf,'*','fontsize',fsize2) ;
    set(p1,'linewidth',2) ;
    set(l1,'linewidth',2) ;
    set(l1,'linestyle','--') ;
    xlabel('standard deviation of return','fontsize',fsize) ;
    ylabel('expected return','fontsize',fsize) ;
    print portfolio03.ps -deps ;

[Figure 5.11: expected return (0.02 to 0.08) plotted against standard deviation of return (0 to 0.5).]

Figure 5.11: Efficient frontier and line of optimal combinations of risky and risk-free assets plotted by the program "portfolio03.m" for N = 3 assets. "P" is the portfolio with weights (0, .3, .7) that is not on the efficient frontier. "T" is the tangency portfolio and "F" is the risk-free asset.

5.6.5 Example: N = 2

If N = 2, then

\[ \Omega = \begin{pmatrix} \sigma_1^2 & \rho_{12}\sigma_1\sigma_2 \\ \rho_{12}\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}. \]

You should check that [...]

5.9 Quadratic programming

[...] \(H = \Omega\), \(f\) equal to an N x 1 vector of zeros, \(A = -I\) (where \(I\) is the N x N identity matrix), \(b\) equal to an N x 1 vector of zeros,

\[ A_{eq} = \begin{pmatrix} \mathbf{1}^T \\ \mu^T \end{pmatrix}, \quad \text{and} \quad b_{eq} = \begin{pmatrix} 1 \\ \mu_P \end{pmatrix}. \]

Here is the documentation for MATLAB's "quadprog" illustrating several ways that this program can be used. In our applications, e.g., in the program "portfolio02QP.m," we call the program "quadprog" with a command of the type "X=QUADPROG(H,f,A,b,Aeq,beq)". This can be seen in the listing of "portfolio02QP.m" which is given later.

    QUADPROG Quadratic programming.
       X=QUADPROG(H,f,A,b) solves the quadratic programming problem:

                min 0.5*x'*H*x + f'*x   subject to:  A*x <= b
                 x

       X=QUADPROG(H,f,A,b,Aeq,beq) solves the problem above while
       additionally satisfying the equality constraints Aeq*x = beq.

       X=QUADPROG(H,f,A,b,Aeq,beq,LB,UB) defines a set of lower and upper
       bounds on the design variables, X, so that the solution is in the
       range LB <= X <= UB. Use empty matrices for LB and UB
       if no bounds exist. Set LB(i) = -Inf if X(i) is unbounded below;
       set UB(i) = Inf if X(i) is unbounded above.

       X=QUADPROG(H,f,A,b,Aeq,beq,LB,UB,X0) sets the starting point to X0.

       X=QUADPROG(H,f,A,b,Aeq,beq,LB,UB,X0,OPTIONS) minimizes with the
       default optimization parameters replaced by values in the structure
       OPTIONS, an argument created with the OPTIMSET function. See
       OPTIMSET for details. Used options are Display, Diagnostics, TolX,
       TolFun, HessMult, LargeScale, MaxIter, PrecondBandWidth, TypicalX,
       TolPCG, and MaxPCGIter. Currently, only 'final' and 'off' are valid
       values for the parameter Display ('iter' is not available).

       X=QUADPROG(Hinfo,f,A,b,Aeq,beq,LB,UB,X0,OPTIONS,P1,P2,...) passes
       the problem-dependent parameters P1,P2,... directly to the HMFUN
       function when OPTIMSET('HessMult',HMFUN) is set. HMFUN is provided
       by the user. Pass empty matrices for A, b, Aeq, beq, LB, UB, X0,
       OPTIONS, to use the default values.

       [X,FVAL]=QUADPROG(H,f,A,b) returns the value of the objective
       function at X: FVAL = 0.5*X'*H*X + f'*X.

       [X,FVAL,EXITFLAG] = QUADPROG(H,f,A,b) returns a string EXITFLAG that
       describes the exit condition of QUADPROG.
       If EXITFLAG is:
          > 0 then QUADPROG converged with a solution X.
          0   then the maximum number of iterations was exceeded (only
              occurs with large-scale method).
          < 0 then the problem is unbounded, infeasible, or
              QUADPROG failed to converge with a solution X.

       [X,FVAL,EXITFLAG,OUTPUT] = QUADPROG(H,f,A,b) returns a structure
       OUTPUT with the number of iterations taken in OUTPUT.iterations,
       the type of algorithm used in OUTPUT.algorithm, the number of
       conjugate gradient iterations (if used) in OUTPUT.cgiterations,
       and a measure of first order optimality (if used) in
       OUTPUT.firstorderopt.

       [X,FVAL,EXITFLAG,OUTPUT,LAMBDA]=QUADPROG(H,f,A,b) returns the set of
       Lagrangian multipliers LAMBDA, at the solution: LAMBDA.ineqlin for
       the linear inequalities A, LAMBDA.eqlin for the linear equalities
       Aeq, LAMBDA.lower for LB, and LAMBDA.upper for UB.

Here is the program "portfolio02QP.m":

    % Input mean vector and covariance matrix of returns here
    bmu = [ .08; .03; .05] ;
    bOmega = [ .3 .02 .01 ;
               .02 .15 .03 ;
               .01 .03 .18 ] ;
    A = [ones(1,3);bmu'] ;        % Aeq: weights sum to 1, mean equals muP
    ngrid = 50 ;
    muP = linspace(.03,.08,ngrid)' ;
    % With icompute = 0 the loop below is skipped, so sigmaP, sigmaP2,
    % omegaP, and omegaP2 must already be in the workspace.
    icompute = 0 ;
    if icompute == 1 ;
        sigmaP = muP ;
        sigmaP2 = sigmaP ;
        omegaP = zeros(3,ngrid) ;
        omegaP2 = omegaP ;
        for i = 1:ngrid ;
            omegaP(:,i) = quadprog(bOmega,zeros(3,1),-eye(3),...
                zeros(3,1),A,[1;muP(i)]) ;
            omegaP2(:,i) = quadprog(bOmega,zeros(3,1),zeros(1,3),...
                0,A,[1;muP(i)]) ;
            sigmaP(i) = sqrt(omegaP(:,i)'*bOmega*omegaP(:,i)) ;
            sigmaP2(i) = sqrt(omegaP2(:,i)'*bOmega*omegaP2(:,i)) ;
        end ;
    end ;
    fsize = 16 ;
    figure(1)
    clf
    p = plot(sigmaP,muP,sigmaP2,muP,'--') ;
    l = legend('no negative wts','unconstrained wts',4) ;
    set(gca,'fontsize',fsize) ;
    set(l,'fontsize',fsize) ;
    xlabel('standard deviation of return (\sigma_P)',...
        'fontsize',fsize) ;
    ylabel('expected return (\mu_P)','fontsize',fsize) ;
    text(sqrt(bOmega(1,1)),bmu(1),'1','fontsize',24) ;
    text(sqrt(bOmega(2,2)),bmu(2),'2','fontsize',24) ;
    text(sqrt(bOmega(3,3)),bmu(3),'3','fontsize',24) ;
    set(gca,'ylim',[ .025, .085]) ;
    set(p,'linewidth',4) ;
    grid ;
    print portfolio02QP.ps -depsc ;
    !mv portfolio02QP.ps ...
        ~/public_html/or473/LectNotes/portfolio02QP.ps ;
    figure(2)
    p2 = plot(muP,omegaP(1,:),muP,omegaP(2,:),'--',muP,...
        omegaP(3,:),'-.') ;
    set(p2,'linewidth',6) ;
    set(gca,'fontsize',fsize) ;
    grid ;
    xlabel('\mu_P','fontsize',fsize) ;
    ylabel('weight','fontsize',fsize) ;
    legend('w_1','w_2','w_3',0) ;
    print portfolio02_wtQP.ps -depsc ;
    !mv portfolio02_wtQP.ps ...
        ~/public_html/or473/LectNotes/portfolio02_wtQP.ps ;

Chapter 6 The Capital Asset Pricing Model: 3/26/01

6.1 Introduction to CAPM

The CAPM (capital asset pricing model) has a variety of uses:
• It provides a theoretical justification for the widespread practice of "passive" investing known as indexing.
  - Indexing means holding a diversified portfolio in which securities are held in the same relative proportions as in a broad market index such as the S&P 500.
    Individual investors can do this easily by holding shares in an index fund.
• CAPM can provide estimates of expected rates of return on individual investments.
• CAPM can establish "fair" rates of return on invested capital in regulated firms or in firms working on a cost-plus basis; what should the "plus" be?
• CAPM starts with the question, what would be the risk premiums on securities if the following assumptions were true?
  - The market prices are "in equilibrium."
    * In particular, for each asset, supply equals demand.
  - Everyone has the same forecasts of expected returns and risks.
  - All investors choose portfolios optimally according to the principles of efficient diversification discussed in Chapter 5.
    * This implies that everyone holds the tangency portfolio of risky assets.
  - The market rewards people for assuming unavoidable risk, but there is no reward for needless risks due to inefficient portfolio selection.
    * Therefore, the risk premium on a single security is not due to its "stand alone" risk, but rather to its contribution to the risk of the tangency portfolio.
• The various components of risk will be discussed in Section 6.4.

As in Chapter 5, "return" can either refer to one-period net returns or one-period log returns.

Suppose that there are exactly three assets with a total market value of $100 billion.
• Stock A: $60 billion
• Stock B: $30 billion
• risk-free: $10 billion

The ratio of Stock A to Stock B in the market portfolio is 2:1. CAPM says that under equilibrium, all investors will hold Stock A to Stock B in a 2:1 ratio. Therefore, the tangency portfolio puts weight 2/3 on Stock A and 1/3 on Stock B, and all investors will have two-thirds of their allocation to risky assets in Stock A and one-third in Stock B.

Suppose there was too little of Stock A and too much of Stock B for everyone to have a 2:1 allocation.
For example, suppose that there were one million shares of each stock and the price per share was $60 for Stock A and $40 for Stock B. Then the market portfolio must hold Stock A to Stock B in a 3:2 ratio, not 2:1. Not everyone could hold the tangency portfolio, though everyone would want to. Thus, prices would be in disequilibrium and would change. The price of Stock A would go up since the supply of Stock A is less than the demand. Similarly the price of Stock B would go down. As these prices changed, so would expected returns and the tangency portfolio would change. These changes in prices and expected returns would stop when the market portfolio was equal to the tangency portfolio, so that prices were in equilibrium.

At least, this adjustment to equilibrium would happen under the ideal conditions of economic theory. The real world would be a little messier. The underlying message from theory is, however, correct. Prices adjust as all investors look for an efficient portfolio and supply and demand converge to each other.

The market portfolio is 9:1 risky to risk-free. In total, investors must hold risky to risk-free in a 9:1 ratio; they are the market. For an individual investor, the risky:risk-free ratio will depend on that investor's risk aversion.
• At one extreme, a portfolio of all risk-free assets has a standard deviation of returns equal to 0.
• At the other extreme, all risky assets, the standard deviation is maximized. (This assumes no margin. If we allow negative positions in the risk-free asset, then there is no limit to the risk.)

At equilibrium, returns on risky and risk-free assets are such that aggregate demand for risk-free assets equals supply.

6.2 The capital market line (CML)

The capital market line (CML) relates the excess expected return on an efficient portfolio to its risk; "excess expected return" means the amount by which the expected return exceeds the risk-free rate of return.
The CML is

\[ \mu_R = \mu_f + \frac{\mu_M - \mu_f}{\sigma_M}\,\sigma_R, \tag{6.1} \]

where \(R\) is the return on a given efficient portfolio (mixture of the market portfolio and the risk-free asset), \(\mu_R = E(R)\), \(\mu_f\) is the rate of return on the risk-free asset, \(R_M\) is the return on the market portfolio, \(\mu_M = E(R_M)\), \(\sigma_M\) is the standard deviation of the return on the market portfolio, and \(\sigma_R\) is the standard deviation of return on the portfolio. The slope of the CML is, of course,

\[ \frac{\mu_M - \mu_f}{\sigma_M}, \]

which can be interpreted as the ratio of the "risk premium" to the standard deviation of the market portfolio. This is Sharpe's "reward-to-risk ratio." Equation (6.1) can be rewritten as

\[ \frac{\mu_R - \mu_f}{\sigma_R} = \frac{\mu_M - \mu_f}{\sigma_M}, \]

which says that the reward-to-risk ratio for any efficient portfolio equals that ratio for the market portfolio.

Example: Suppose that the risk-free rate of interest is \(\mu_f = 0.06\), that the expected return on the market portfolio is \(\mu_M = 0.15\), and the risk of the market portfolio is \(\sigma_M = 0.22\). Then the slope of the CML is (.15 - .06)/.22 = 9/22. The CML of this example is illustrated in Figure 6.1.

The CML is easy to derive. Consider an efficient portfolio that allocates a proportion \(w\) of its assets to the market portfolio and \((1 - w)\) to the risk-free asset. Then

\[ R = wR_M + (1 - w)\mu_f = \mu_f + w(R_M - \mu_f). \]

Therefore,

\[ \mu_R = \mu_f + w(\mu_M - \mu_f). \tag{6.2} \]

Also, \(\sigma_R = w\sigma_M\), or

\[ w = \frac{\sigma_R}{\sigma_M}. \tag{6.3} \]

Substituting (6.3) into (6.2) gives the CML.

[Figure 6.1: expected return plotted against standard deviation of return, with the market portfolio at standard deviation .22.]

Figure 6.1: CML when \(\mu_f = 0.06\), \(\mu_M = 0.15\), and \(\sigma_M = 0.22\). All efficient portfolios are on the line connecting the risk-free asset (F) and the market portfolio (M). Therefore, the reward-to-risk ratio is the same for all efficient portfolios, including the market portfolio.

CAPM says that the optimal way to invest is to:
1. Decide on the risk \(\sigma_R\) that you can tolerate, \(0 \le \sigma_R \le \sigma_M\). (\(\sigma_R > \sigma_M\) is possible by borrowing money to buy risky assets.)
2. Calculate \(w = \sigma_R/\sigma_M\).
3. Invest \(w\) proportion of your investment in an index fund, i.e., a fund that tracks a market index.
4. Invest \(1 - w\) proportion of your investment in risk-free treasury bills, or a money-market fund that invests in T-bills.

Alternatively,
1. Choose the reward \(\mu_R - \mu_f\) that you want.
2. Calculate \(w = (\mu_R - \mu_f)/(\mu_M - \mu_f)\).
3. Do steps 3 and 4 as above.

One can view \(w = \sigma_R/\sigma_M\) as an index of the risk aversion of the investor. The smaller the value of \(w\), the more risk averse the investor. If an investor has \(w\) equal to 0, then that investor is 100% in risk-free assets. Similarly, an investor with \(w = 1\) is totally invested in the tangency portfolio of risky assets.

6.3 Betas and the Security Market Line

The Security Market Line (SML) relates the excess return on an asset to the slope of its regression on the market portfolio. Suppose that there are many securities indexed by \(j\). Define \(\sigma_{jM}\) as the covariance between the returns on the jth security and the market portfolio, and define

\[ \beta_j = \frac{\sigma_{jM}}{\sigma_M^2}, \]

the slope of the regression of the jth security's returns on the returns of the market portfolio. [...] The SML is

\[ \mu_j - \mu_f = \beta_j(\mu_M - \mu_f). \tag{6.4} \]

[...]

\(\beta_j > 1 \Rightarrow\) "aggressive"
\(\beta_j = 1 \Rightarrow\) "average risk"
\(\beta_j < 1 \Rightarrow\) "not aggressive".

Figure 6.2 illustrates the SML and an asset, J, that is not on the SML. This asset contradicts the CAPM; according to CAPM no such asset exists.

[Figure 6.2: risk premium plotted against beta.]

Figure 6.2: Security market line (SML) showing that the risk premium of an asset is a linear function of the asset's beta. J is a security not on the line and a contradiction to CAPM. Theory predicts that the price of J will decrease until J is on the SML.

Consider what would happen if an asset like J did exist. Investors would not want to buy it because its risk premium is too low. They would invest less in J and more in other securities. Therefore the price of J would decline and its expected return would increase. After that increase, the asset J would be on the SML, or so the theory predicts. In other words, J is mispriced according to CAPM.
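To make the SML concrete, the following Python sketch evaluates the relation \(\mu_j - \mu_f = \beta_j(\mu_M - \mu_f)\) and measures how far a security lies from the line. It reuses \(\mu_f = 0.06\) and \(\mu_M = 0.15\) from the CML example; the betas, the function names, and the expected return assigned to the security J are hypothetical values chosen only for illustration.

```python
def sml_expected_return(beta, mu_f, mu_m):
    """Expected return implied by the security market line."""
    return mu_f + beta * (mu_m - mu_f)


def distance_from_sml(expected_return, beta, mu_f, mu_m):
    """Expected return minus the SML value; negative means the risk premium is too low."""
    return expected_return - sml_expected_return(beta, mu_f, mu_m)


mu_f, mu_m = 0.06, 0.15           # risk-free rate and market return from the CML example

for beta in (0.5, 1.0, 1.5):      # hypothetical betas: not aggressive, average, aggressive
    print(beta, round(sml_expected_return(beta, mu_f, mu_m), 4))

# A hypothetical security J with beta 1.5 but an expected return of only 16%.
gap_j = distance_from_sml(0.16, 1.5, mu_f, mu_m)
print(round(gap_j, 4))            # negative: J lies below the SML
```

A negative gap corresponds to the situation of asset J described above: its risk premium is too low for its beta, so the theory predicts that its price falls until the gap closes.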
6.3.1 Examples of betas

Netscape's home page has a link to stock quotes from Salomon Smith Barney. If you request a quote on a stock, you will be given a menu for choosing further information about the company. Under "profile" you will find the five-year beta of the company, its industry, and the S&P 500. Table 6.3.1 has some "five-year betas" that I took from the net on February 27 and March 5, 2001. The beta for the S&P 500 is given as 1.00; why?

Table 6.3.1

    Stock (symbol)           Industry                 Stock's beta   Ind's beta
    Celanese (CZ)            Synthetics                   0.13          0.86
    General Mills (GIS)      Food - major, diversif       0.29          0.39
    Kellogg (K)              Food - major, diversif       0.30          0.39
    Procter & Gamble (PG)    Cleaning Prod                0.35          0.40
    Exxon-Mobil (XOM)        Oil/gas                      0.39          0.56
    7-Eleven (SE)            Grocery stores               0.55          0.38
    Merck (Mrk)              Major drug manuf             0.56          0.62
    McDonalds (MCD)          Restaurants                  0.71          0.63
    McGraw-Hill (MHP)        Pub - books                  0.87          0.77
    Ford (F)                 Auto                         0.89          1.00
    Aetna (AET)              Health care plans            1.11          0.98
    General Motors (GM)      Major auto manuf             1.11          1.09
    AT&T (T)                 Long dist carrier            1.19          1.34
    General Electric (GE)    Conglomerates                1.22          0.99
    Genentech (DNA)          Biotech                      1.43          0.69
    Microsoft (MSFT)         Software applic.             1.77          1.72
    Cree (Cree)              Semicond equip               2.16          2.30
    Amazon (AMZN)            Net soft & serv              2.99          2.46
    Doubleclick (Dclk)       Net soft & serv              4.06          2.46

6.3.2 Comparison of the CML with the SML

The CML applies only to the return R of an efficient portfolio. It can be arranged so as to relate the excess expected return of that portfolio to the excess expected return of the market portfolio:

\[ \mu_R - \mu_f = \left(\frac{\sigma_R}{\sigma_M}\right)(\mu_M - \mu_f). \tag{6.5} \]

The SML applies to any asset and like the CML relates its excess expected return to the excess expected return of the market portfolio:

\[ \mu_j - \mu_f = \beta_j(\mu_M - \mu_f). \tag{6.6} \]

If we take an efficient portfolio and consider it as an asset, then \(\mu_R\) and \(\mu_j\) both denote the expected return on that portfolio/asset.
Both (6.5) and (6.6) hold, so that the beta of an efficient portfolio is

\[ \beta_R = \frac{\sigma_R}{\sigma_M}. \]

6.4 The security characteristic line

Let \(R_{jt}\) be the return at time \(t\) on the jth asset. Similarly, let \(R_{Mt}\) and \(\mu_{ft}\) be the return on the market portfolio and the risk-free return at time \(t\). The security characteristic line (sometimes shortened to the characteristic line) is a regression model:

\[ R_{jt} = \mu_{ft} + \beta_j(R_{Mt} - \mu_{ft}) + \epsilon_{jt}, \tag{6.7} \]

where \(\epsilon_{jt}\) is \(N(0, \sigma_{\epsilon,j}^2)\). It is often assumed that the \(\epsilon_{jt}\)'s are uncorrelated across assets, that is, that \(\epsilon_{jt}\) is uncorrelated with \(\epsilon_{j't}\) for \(j \ne j'\). This assumption has important ramifications for risk reduction by diversification; see Section 6.4.1.

Let \(\mu_j = E(R_{jt})\) and \(\mu_M = E(R_{Mt})\). Taking expectations in (6.7) we get

\[ \mu_j = \mu_f + \beta_j(\mu_M - \mu_f), \]

which is our friend the SML again. The SML gives us information about expected returns, but not about the variance of the returns. For the latter we need the characteristic line. The characteristic line is said to be a "return generating process" since it gives us a probability model of the returns, not just a model of their expected values.

An analogy to the distinction between the SML and characteristic line is this. The regression line

\[ E(Y|X) = \beta_0 + \beta_1 X \]

gives us the expected value of \(Y\) given \(X\) but not the conditional probability distribution of \(Y\) given \(X\). The regression model

\[ Y_t = \beta_0 + \beta_1 X_t + \epsilon_t, \quad \epsilon_t \sim N(0, \sigma^2), \]

does give us this conditional probability distribution.

The characteristic line implies that

\[ \sigma_j^2 = \beta_j^2 \sigma_M^2 + \sigma_{\epsilon,j}^2, \]

\[ \sigma_{jj'} = \beta_j \beta_{j'} \sigma_M^2 \]

for \(j \ne j'\), and that

\[ \sigma_{Mj} = \beta_j \sigma_M^2. \]

The total risk of the jth asset is

\[ \sigma_j = \sqrt{\beta_j^2 \sigma_M^2 + \sigma_{\epsilon,j}^2}. \]

The risk has two components: \(\beta_j^2 \sigma_M^2\) is called the market or systematic component of risk and \(\sigma_{\epsilon,j}^2\) is called the unique, nonmarket, or unsystematic component of risk.

6.4.1 Reducing unique risk by diversification

The market component cannot be reduced by diversification, but the unique component can be reduced in this way. Suppose that there are \(N\) assets with returns \(R_{1t}, \ldots, R_{Nt}\) for holding period \(t\).
If we form a portfolio with weights \(w_1, \ldots, w_N\) then the return of the portfolio is

\[ R_{Pt} = w_1 R_{1t} + \cdots + w_N R_{Nt}. \]

Let \(R_{Mt}\) be the return on the market portfolio. According to the characteristic line model

\[ R_{jt} = \mu_{ft} + \beta_j(R_{Mt} - \mu_{ft}) + \epsilon_{jt}, \]

so that, since the weights sum to one,

\[ R_{Pt} = \mu_{ft} + \left(\sum_{j=1}^N \beta_j w_j\right)(R_{Mt} - \mu_{ft}) + \sum_{j=1}^N w_j \epsilon_{jt}. \]

Therefore, the portfolio beta is

\[ \beta_P = \sum_{j=1}^N w_j \beta_j, \]

and the "epsilon" for the portfolio is

\[ \epsilon_{Pt} = \sum_{j=1}^N w_j \epsilon_{jt}. \]

We will now assume that \(\epsilon_{1t}, \ldots, \epsilon_{Nt}\) are uncorrelated. Therefore,

\[ \sigma_{\epsilon,P}^2 = \sum_{j=1}^N w_j^2 \sigma_{\epsilon,j}^2. \]

Example: Suppose that \(w_j = 1/N\) for all \(j\). Then

\[ \beta_P = \frac{\sum_{j=1}^N \beta_j}{N}, \]

and

\[ \sigma_{\epsilon,P}^2 = \frac{N^{-1}\sum_{j=1}^N \sigma_{\epsilon,j}^2}{N} = \frac{\bar{\sigma}_\epsilon^2}{N}, \]

where \(\bar{\sigma}_\epsilon^2\) is the average of the \(\sigma_{\epsilon,j}^2\). If \(\sigma_{\epsilon,j}^2\) is a constant, say \(\sigma_\epsilon^2\), for all \(j\), then

\[ \sigma_{\epsilon,P} = \frac{\sigma_\epsilon}{\sqrt{N}}. \]

For example, suppose that [...]

6.5 Some theory

[...] weight \(w_i\) given to the ith risky asset and weight \((1 - w_i)\) given to the market portfolio. The return on this portfolio is

\[ R_{Pt} = w_i R_{it} + (1 - w_i) R_{Mt}. \]

The expected return is

\[ \mu_P = w_i \mu_i + (1 - w_i)\mu_M, \]

and the risk is

\[ \sigma_P = \sqrt{w_i^2 \sigma_i^2 + (1 - w_i)^2 \sigma_M^2 + 2 w_i (1 - w_i)\sigma_{iM}}. \]

As we vary \(w_i\) we get the locus of points in \((\sigma, \mu)\) space that is shown as a blue curve in Figure 6.3.

Key idea: The derivative of this locus of points evaluated at the market portfolio is equal to the slope of the CML. We can calculate this derivative and equate it to the slope of the CML to see what we get. The result will be the SML.

[Figure 6.3: plot showing the efficient frontier and the curve of portfolios of M and i.]

Figure 6.3: Derivation of the SML. M is the market portfolio and T is the tangency portfolio; they are equal according to the CAPM. The blue curve is the locus of portfolios combining asset i and the market portfolio. The derivative of this curve at M is equal to the slope of the CML, since this curve is tangent to the CML at M.

We have

\[ \frac{d\mu_P}{dw_i} = \mu_i - \mu_M \]

and

\[ \frac{d\sigma_P}{dw_i} = \frac{1}{2}\sigma_P^{-1}\left\{2 w_i \sigma_i^2 - 2(1 - w_i)\sigma_M^2 + 2(1 - 2 w_i)\sigma_{iM}\right\}. \]

[...] if \(\alpha > 0\) then the security is underpriced; the returns are too large on average. This is an indication of an asset worth purchasing. Of course, one must be careful.
If we reject the null hypothesis that \(\alpha = 0\), all we have shown is that the security was mispriced in the past. Since for the Microsoft data we accepted the null hypothesis that \(\alpha\) is zero, there is no evidence that Microsoft was mispriced.

6.7 Summary

The CAPM assumes that prices are in equilibrium, that everyone has the same forecasts of returns, and that everyone uses the principles of portfolio selection introduced in Chapter 5. The CAPM assumptions imply that everyone will hold risk-efficient portfolios which mix the tangency portfolio and risk-free assets. This fact implies that the market portfolio will equal the tangency portfolio. A further consequence is that the Sharpe ratio for any efficient portfolio will equal the Sharpe ratio of the market portfolio:

\[ \frac{\mu_R - \mu_f}{\sigma_R} = \frac{\mu_M - \mu_f}{\sigma_M}. \]

[...]

Chapter 7 Pricing Options: 4/12/01

[...] \((x)_+ = x\) if \(x > 0\) and \(= 0\) if \(x \le 0\). With this notation, the value of a call at the exercise date is \((S_T - E)_+\), where \(E\) is the exercise price and \(S_T\) is the stock's price on the exercise date, \(T\).

A call is never exercised if the strike price is greater than the price of the stock, since exercising the option would amount to buying the stock for more than it would cost on the market. If a call is not exercised, then one loses the cost of the option.

One can lose money on an option even if it is exercised, because the amount gained by exercising the option might be less than the cost of the option. In the example above, if Stock A were selling for $71 at the exercise date, then one would exercise the option and gain $100. This would be less than the $200 paid for the option. Even though exercising the option results in a loss, the loss is less than it would be if the option were not exercised. An option should always be exercised if \((S_T - E)\) is positive.

7.3 The law of one price

The "law of one price" states that if two financial instruments have exactly the same payoffs, then they will have the same price. This principle is used to price options.
To valuate an option, one must find a portfolio or a self-financing¹ trading strategy with a known price and which has exactly the same payoffs as the option. The price of the option is then known; it must be the same as the price of the portfolio or self-financing trading strategy.

Here's a simple example of pricing by the law of one price. Suppose stock in Company A sells at $100/share. The risk-free rate of borrowing is 6% compounded annually. Consider a futures contract obliging one party to sell to the other party one share of Company A exactly one year from now at a price P. (No money changes hands now.) What is the fair market price, i.e., what should P be? Note that this contract is not an option. The sale must take place.

It would seem that P should depend on the expected price of Company A stock one year from now. However, this is not the case. Consider the following strategy. The party that, one year from now, must sell the share of Company A can borrow $100 and buy one share now; this involves no capital since the share is purchased with borrowed money. A year from now that party sells the share for P dollars and pays back $106 (principal plus interest) to the lender, who could be a third party. The profit is P - 106. The fair profit is 0 since no capital was used and there is no risk. Therefore, P should be $106.

Consider what would happen if P were not $106. You should be able to see that any other value of P besides $106 would lead to unlimited risk-free profits. As investors rushed in to take advantage of this situation, the market would immediately correct the value of P to be $106.

¹ A trading strategy is "self-financing" if it requires no investment other than the initial investment. After the initial investment, any further purchases of assets are financed by the sale of other assets or by borrowing.
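The borrow-and-buy argument for the contract on Company A can be written out in a few lines of Python. This is a sketch for illustration only: the function names and the mispriced delivery price of $108 are assumptions made here, not part of the notes.

```python
def fair_delivery_price(spot, rate):
    """Delivery price that leaves zero profit for the borrow-and-buy strategy."""
    return spot * (1.0 + rate)


def seller_profit(delivery_price, spot, rate):
    """Profit per share from borrowing `spot`, buying one share now, and
    delivering it one year later for `delivery_price`."""
    return delivery_price - spot * (1.0 + rate)


# The example in the text: $100 per share, 6% compounded annually.
fair_p = fair_delivery_price(100.0, 0.06)
print(round(fair_p, 2))                      # 106.0

# Hypothetical mispricing: a delivery price of $108 gives a risk-free $2 per share.
extra = seller_profit(108.0, 100.0, 0.06)
print(round(extra, 2))                       # 2.0
```

Because the strategy uses no capital, any nonzero value of `seller_profit` could be scaled up without limit, which is why the text concludes that P must be $106.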
7.3.1 Arbitrage

Arbitrage is the making of a guaranteed risk-free profit by trading in the market with no invested capital.² Speaking informally, arbitrage is a "free lunch." The arbitrage price of a security is the price that guarantees no arbitrage opportunities. The law of one price is equivalent to stating that the market is free of arbitrage opportunities, i.e., that there are no free lunches. Arbitrage pricing is the same as pricing by the law of one price. The price of $106 that we just derived in the example of the futures contract is, therefore, the arbitrage price.

² Investing in risk-free T-bills guarantees a positive net return but is not arbitrage since capital is invested.

7.4 Time value of money and present value

"Time is money" is an old adage that is still true. A dollar a year from now is worth less to us than a dollar now. In finance it is essential that we be able to convert the value of future payments to their present values, or vice versa. For example, we saw in Section 7.3 that the arbitrage-enforced future price of a stock is simply the present price converted into a "future value" by multiplying by 1 + r.

Let r be the risk-free annual interest rate. Then the "present value" of $D one year from now is $D/(1 + r) without compounding or $D exp(-r) under continuous compounding. Another way of stating this is that $D now is worth $(1 + r)D a year from now without compounding, or $exp(r)D a year from now under continuous compounding. When $D is a future cash flow, then its present value is also called a discounted value and r is the discount rate.

The distinction between simple interest and compounding is not essential since an interest rate of r without compounding is equivalent to an interest rate of r' with continuous compounding, where

\[ 1 + r = \exp(r'), \]

so that r = exp(r') - 1 or r' = log(1 + r).
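These conversions are easy to check numerically. The following Python sketch, whose function names are assumptions made for illustration, computes present values under both conventions and converts between r and r'.

```python
import math


def present_value(amount, rate, continuous=False):
    """Present value of `amount` received one year from now."""
    if continuous:
        return amount * math.exp(-rate)
    return amount / (1.0 + rate)


def simple_to_continuous(r):
    """Continuously compounded rate r' equivalent to the simple annual rate r."""
    return math.log(1.0 + r)


def continuous_to_simple(r_prime):
    """Simple annual rate r equivalent to the continuously compounded rate r'."""
    return math.exp(r_prime) - 1.0


r = 0.05
r_prime = simple_to_continuous(r)
print(round(r_prime, 4))                        # 0.0488
print(round(continuous_to_simple(0.04), 4))     # 0.0408

# Equivalent rates give the same present value of $100 due in one year.
pv_simple = present_value(100.0, r)
pv_continuous = present_value(100.0, r_prime, continuous=True)
print(pv_simple, pv_continuous)
```

The two present values agree up to floating-point rounding, which is the sense in which the choice between the two conventions is not essential.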
We will work with both simple and compound interest, whichever is most convenient.

Examples: If r = 5%, then r' = log(1.05) = .0488 or 4.88%. If r' = 4%, then r = exp(.04) - 1 = 1.0408 - 1 or 4.08%. In general, r > r'.

Occasionally, we will simplify life by making the unrealistic assumption that r = 0 so that present and future values are equal. This simplifying assumption allows us to focus on other concepts besides discounting.

7.5 A simple binomial example

We will start our study of options with a very simple example. Suppose that a stock is currently selling for $80. At the end of one time period it can either have increased to $100 or decreased to $60. What is the current value of a call option that allows one to purchase one share of the stock for $80, the exercise price, after one time period?

At the end of the time period, the call option will be worth $20 ($100 - $80) if the stock has gone up and worth $0 if the stock has gone down. See Figure 7.1. However, the question is "what is the option worth now?" This question is vital since the answer is, of course, the fair market price for the option at the current time.

[Figure 7.1: binomial trees for the stock and the option. Given: exercise price = $80. Hedge ratio = 1/2. Buy 1/2 share. Borrow $30. Initial value of portfolio = 1/2(80) - 30 = $10. This must be the value of the call option.]

Figure 7.1: Example of one-step binomial option pricing. For simplicity, it is assumed that the interest rate is 0. The portfolio of 1/2 share of stock and -$30 of risk-free assets replicates the call option.

One might think that the current value of the option depends on the probability that the stock will go up. However, this is not true. The current value of the option depends only on the rate of risk-free borrowing. For simplicity, we will assume that this rate is 0; later we will see how to valuate options when the rate of interest is positive.
It turns out that the value of the option is $10. How did I get this value? Consider the following investment strategy. Borrow $30 and buy one-half of a share of stock. The cost up front is $40 - $30 = $10, so the value now of the portfolio is $10. If after one time period the stock goes up, then the portfolio is worth 100/2 - 30 = 20 dollars. If the stock goes down, then the portfolio is worth 60/2 - 30 = 0 dollars. Thus after one time period, the portfolio's value will be exactly the same as the value of the call option, no matter which way the stock moves. By the law of one price, the value of the call option now must be the same as the value now of the portfolio, which is $10.

Let's summarize what we have done. We have found a portfolio of the stock and the risk-free asset that replicates the call option. The current value of the portfolio is easy to calculate. Since the portfolio replicates the option, the option must have the same value as the portfolio.

Suppose we have just sold a call option. By purchasing this portfolio we have hedged the option. Hedging means that we have eliminated all risk, because the net return of selling the option and purchasing the portfolio is exactly 0 no matter what happens to the stock price.

How did I know that the portfolio should be 1/2 share of stock and -$30 in cash? I didn't use trial-and-error; that would have been tedious. Rather, I used the following logic. First, the volatility of the stock is $100 - $60 = $40 while the volatility of the option is $20 - $0 = $20. The ratio of the volatility of the option to the volatility of the stock is 1/2; this is called the hedge ratio. If the portfolio is to exactly replicate the option, then the portfolio must have exactly the same volatility as the option; this means the portfolio must have one-half a share.

Key point: The number of shares in the portfolio must equal the hedge ratio, where
\[ \text{hedge ratio} = \frac{\text{volatility of option}}{\text{volatility of stock}}. \]

If the stock goes down, the portfolio is worth $30 minus the amount borrowed. But we want the portfolio's value to equal that of the option, which is $0. Thus, the amount borrowed is $30.

Key point: We can determine the amount borrowed by equating the value of the portfolio when the stock goes down to the value of the option when the stock goes down. (Alternatively, we could equate the value of the portfolio to the value of the option when the stock goes up. This would tell us that $50 minus the amount borrowed equals $20, or that $30 must be borrowed.)

Now suppose that the interest rate is 10%. Then we borrow $30/(1.1) = $27.27 so that the amount owed after one year is $30. The cost of the portfolio is 40 - 30/1.1 = $12.7273. Thus, the value of the option is $12.7273 if the risk-free rate of interest is 10%. This value is higher than the value of the option when the risk-free rate is 0 because borrowing is more expensive when the interest rate is higher: only $27.27 can be borrowed against a repayment of $30, so more of the $40 cost of the stock must be paid up front.

Here's how to valuate one-step binomial options for other values of the parameters. Suppose the current price is \(s_1\) and after one time period the stock either goes up to \(s_3\) or down to \(s_2\). The exercise price is \(E\). The risk-free rate of interest is \(r\). It is assumed that \(s_2 < E < s_3\), so the option is exercised if and only if the stock goes up (see footnote 3).

Footnote 3: If \(s_2 < s_3 < E\), then the option will not be exercised under any circumstances. We are certain the option will be worthless, so its price must be 0. If \(E < s_2 < s_3\), then the option will always be exercised; it really isn't an option, it is a futures contract. We have already seen how to valuate a futures contract; that was done in Section 7.3.

Then the hedge ratio is

\[ \delta = \frac{s_3 - E}{s_3 - s_2}. \tag{7.1} \]

This is the number of shares of stock that are purchased; the cost is \(\delta s_1\).
The amount borrowed is

\[ \frac{\delta s_2}{1 + r} \tag{7.2} \]

and the amount that will be paid back to the lender will be \(\delta s_2\). Therefore, the price of the option is

\[ \delta \left\{ s_1 - \frac{s_2}{1 + r} \right\} = \frac{s_3 - E}{s_3 - s_2} \left\{ s_1 - \frac{s_2}{1 + r} \right\}. \tag{7.3} \]

If the stock goes up, then the option is worth \(s_3 - E\) and the portfolio is also worth \(\delta s_3 - \delta s_2 = s_3 - E\). If the stock goes down, both the option and the portfolio are worth 0. Thus, the portfolio does replicate the option.

Example

In the example analyzed before, \(s_1 = 80\), \(s_3 = 100\), \(s_2 = 60\), and \(E = 80\). Therefore,

\[ \delta = \frac{100 - 80}{100 - 60} = \frac{1}{2}. \]

The price of the option is

\[ \frac{1}{2} \left\{ 80 - \frac{60}{1 + r} \right\}, \]

which is $10 if \(r = 0\) and $12.7273 if \(r = 0.1\). The amount borrowed is

\[ \frac{\delta s_2}{1 + r} = \frac{(1/2)60}{1 + r} = \frac{30}{1 + r}, \]

which is $30 if \(r = 0\) and $27.27 if \(r = .1\).

7.6 Two-step binomial option pricing

A one-step binomial model for a stock price may be realistic for very short maturities. For longer maturities, multiple-step binomial models are needed. A multiple-step model can be analyzed by analyzing the individual steps, going backwards in time.

Figure 7.2: Two-step binomial model for option pricing.

To illustrate multi-step binomial pricing, consider the two-step model of a European call option in Figure 7.2. The option matures after the second step. The stock price can either go up $10 or down $10 on each step. Assume that \(r = 0\).

Using the pricing principles just developed and working backwards, we can fill in the question marks in Figure 7.2. See Figure 7.3. For example, at node B, the hedge ratio is \(\delta = 1\) so we need to own one share, which at this node is worth $90. Also, we need to have borrowed \(\delta s_2/(1 + r) = (1)(80)/(1 + 0) = \$80\) so that our portfolio has the same value at nodes E and F as the option, that is, the portfolio should be worth $0 at node E and $20 at node F. Since at node B we have stock worth $90 and risk-free assets worth -$80, the net value of our portfolio is $10.
By the same reasoning, at node C the hedge ratio is 0 and we should have no stock and no borrowing, so our portfolio is worth $0.

[Figure 7.3 panel text: Option. Given: exercise price is $80.]

Figure 7.3: Pricing the option by backwards induction.

We can see in Figure 7.3 that at the end of the first step the option is worth $10 if the stock is up (node B) and $0 if it is down (node C). Applying one-step pricing at node A, at the beginning of trading the hedge ratio is 1/2 and we should own 1/2 share of stock (worth $40) and we should have borrowed $35. Therefore, the portfolio is worth $5 at node A, which shows that $5 is the correct price of the option.

Note the need to work backwards. We could not apply one-step pricing at node A until we had already found the value of the portfolio (and of the option) at nodes B and C.

Let's show that our trading strategy is self-financing. To do this we need to show that we invest no money other than the initial $5. Suppose that the stock is up on the first step, so we are at node B. Then our portfolio is worth $90/2 - $35 or $10. At this point we borrow $45 and buy another half-share for $45; this is self-financing. If the stock is down on the first step, we sell the half share of stock for $35 and pay off our debt; again the step is self-financing.

7.7 Arbitrage pricing by expectation

It was stated earlier that one prices an option by arbitrage, that is, the price is determined by the requirement that the market be arbitrage-free. The expected value of the option is not used to price the option. In fact, we do not even consider the probabilities that the stock moves up or down. However, there is a remarkable result showing that arbitrage pricing can be done using expectations.
More specifically, there exist probabilities of the stock moving up and down such that the arbitrage price of the option is equal to the expected value of the option according to these probabilities. Whether these are the "true" probabilities of the stock moving up or down is irrelevant. The fact is that these probabilities give the correct arbitrage price when they are used to calculate expectations.

Let "now" be time 0 and let "one step ahead" be time 1. Because of the time value of money, the present value of \(D\) dollars at time 1 is \(D/(1 + r)\) dollars, where \(r\) is the interest rate. Let \(f(2) = 0\) and \(f(3) = s_3 - E\) be the values of the option if the stock moves down or up, respectively. We will now show that there is a value of \(q\) between 0 and 1 such that the present value of the option is

\[ \frac{1}{1 + r} \left\{ q f(3) + (1 - q) f(2) \right\}. \tag{7.4} \]

The quantity in (7.4) is the present value of the expectation of the option at time 1. To appreciate this, notice that the quantity in curly brackets is the value of the option if the stock goes up times \(q\), which is the arbitrage-determined "probability" that the stock goes up, plus the option's value if the stock goes down times \(1 - q\). Thus the quantity in curly brackets is the expectation of the option's value at the end of the holding period. Dividing by \(1 + r\) converts this to a "present value."

Okay, how do we find this magical value of \(q\)? That's easy. We know that \(q\) must satisfy

\[ \frac{1}{1 + r} \left\{ q f(3) + (1 - q) f(2) \right\} = \frac{s_3 - E}{s_3 - s_2} \left\{ s_1 - \frac{s_2}{1 + r} \right\}, \tag{7.5} \]

since the left-hand side of this equation is (7.4) and the right-hand side is the value of the option according to (7.3). Substituting \(f(2) = 0\) and \(f(3) = s_3 - E\) into (7.5) we get an equation that can be solved for \(q\) to find that

\[ q = \frac{(1 + r) s_1 - s_2}{s_3 - s_2}. \tag{7.6} \]

We want \(q\) to be between 0 and 1 so that it can be interpreted as a probability. From (7.6) one can see that \(0 < q < 1\) if \(s_2 < (1 + r) s_1 < s_3\). Why should the latter hold?
We will show that \(s_2 < (1 + r) s_1 < s_3\) is required in order for the market to be arbitrage-free. If we invest \(s_1\) in a risk-free asset at time 0, then the value of our holdings at time 1 will be \((1 + r) s_1\). If we invest \(s_1\) in the stock, then the value of our holdings at time 1 will be either \(s_2\) or \(s_3\). If \(s_2 < (1 + r) s_1 < s_3\) were not true, then there would be an arbitrage opportunity. For example, if \((1 + r) s_1 < s_2 < s_3\), then we could borrow at the risk-free rate and invest the borrowed money in the stock with a guaranteed profit; at time 1 we would pay back \((1 + r) s_1\) and receive at least \(s_2\), which is greater than \((1 + r) s_1\).

Exercise: How would we make a guaranteed profit if \(s_2 < s_3 < (1 + r) s_1\)?

Answer: Sell the stock short and invest the \(s_1\) dollars in the risk-free asset. At the end of the holding period (maturity) receive \((1 + r) s_1\) from the risk-free investment and buy the stock for at most \(s_3 < (1 + r) s_1\).

Thus, the requirement that the market be arbitrage-free ensures that \(0 < q < 1\).

Figure 7.4: Two-step non-recombinant tree. The \(q(j)\) is the risk-neutral probability at node \(j\) of the stock moving upward.

7.8 A general binomial tree model

The material in this section follows Chapter 2 of Financial Calculus by Baxter and Rennie.

Consider a possibly non-recombinant tree (see footnote 4) as seen in Figure 7.4.

Footnote 4: The tree would be recombinant if the stock prices at nodes 5 and 6 were equal so that these two nodes could be combined.

Assume that:

• At the \(j\)th node the stock is worth \(s_j\) and the option is worth \(f(j)\).

• The \(j\)th node leads to either the \((2j + 1)\)th node or the \(2j\)th node after one time "tick."

• The actual time between ticks is \(\delta t\).

• Interest is compounded continuously at a fixed rate \(r\) so that \(B_0\) dollars now is worth \(\exp(r n\, \delta t) B_0\) dollars after \(n\) time ticks. (Or, \(B_0\) dollars after \(n\) ticks is worth \(\exp(-r n\, \delta t) B_0\) dollars now.)
Then at node \(j\):

• The value of the option is

\[ f(j) = \exp(-r\, \delta t) \left\{ q_j f(2j + 1) + (1 - q_j) f(2j) \right\}, \]

where \(q_j\) is given in the next item.

• The arbitrage-determined \(q_j\) is

\[ q_j = \frac{e^{r\, \delta t} s_j - s_{2j}}{s_{2j+1} - s_{2j}}. \tag{7.7} \]

• The number of shares of stock to be holding is

\[ \phi_j = \frac{f(2j + 1) - f(2j)}{s_{2j+1} - s_{2j}} = \text{hedge ratio}. \]

• Denote the amount of capital to hold in the risk-free asset by \(\psi_j\); typically \(\psi_j\) is negative because money has been borrowed. Since the portfolio replicates the option, at node \(j\) the option's value, which is \(f(j)\), must equal the portfolio's value, which is \(s_j \phi_j + \psi_j\). Therefore,

\[ \psi_j = f(j) - \phi_j s_j. \tag{7.8} \]

(\(\psi_j\) becomes \(e^{r\, \delta t} \{ f(j) - \phi_j s_j \}\) after one more time tick.)

Expectations for paths along the tree are computed using the \(q_j\)'s. The probability of any path is just the product of all the probabilities along the path.

An example

The tree for the example of Section 7.6 is shown in Figure 7.5. Because \(r = 0\) is assumed and because the stock moves either up or down the same amount ($10), the \(q_j\) are all equal to 1/2 (see footnote 5). The probability of each full path from node 1 to one of nodes 4, 5, 6, or 7 is 1/4. Given the values of the option at nodes 4, 5, 6, and 7, it is easy to compute the expectations of the option's value at other nodes. These expectations are shown in magenta in Figure 7.5.

The path probabilities are independent of the exercise price, since they depend only on the prices of the stock at the nodes and on \(r\). Therefore, it is easy to price options with other exercise prices.

Exercise: Assuming the same stock price process as in Figure 7.5, price the call option with an exercise price of $70.

Answer: Given this exercise price, it is clear that the option is worth $0, $10, $10, and $30 at nodes 4, 5, 6, and 7, respectively. Then we can use expectation to find that the option is worth $5 and $20 at nodes 2 and 3, respectively. Therefore, the option's value at node 1 is $12.50; this is the price of the option.
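The node-by-node recipe of this section can be sketched in Python as follows. This is a minimal illustration, not code from the notes: the function name `price_tree` and the dictionary layout are assumptions, and the node prices are reconstructed from the text (stock at $80, moving up or down $10 per step) because the figure itself is not reproduced here.

```python
import math

def price_tree(stock, payoff, r=0.0, dt=1.0):
    """Price an option on a binomial tree by backward induction.

    stock  : dict mapping node j to the stock price s_j, where node j
             leads to node 2j (down) or node 2j + 1 (up)
    payoff : function giving the option's value at a terminal node
    Returns dicts f (option values), q (risk-neutral up probabilities),
    phi (shares held) and psi (risk-free holdings), keyed by node.
    """
    growth = math.exp(r * dt)
    f, q, phi, psi = {}, {}, {}, {}
    for j in sorted(stock, reverse=True):       # children before parents
        down, up = 2 * j, 2 * j + 1
        if up not in stock:                     # terminal node
            f[j] = payoff(stock[j])
            continue
        spread = stock[up] - stock[down]
        q[j] = (growth * stock[j] - stock[down]) / spread      # (7.7)
        f[j] = (q[j] * f[up] + (1.0 - q[j]) * f[down]) / growth
        phi[j] = (f[up] - f[down]) / spread                    # hedge ratio
        psi[j] = f[j] - phi[j] * stock[j]                      # (7.8)
    return f, q, phi, psi

# Two-step example: node prices inferred from the text, r = 0.
stock = {1: 80, 2: 70, 3: 90, 4: 60, 5: 80, 6: 80, 7: 100}
f80, q80, phi80, psi80 = price_tree(stock, lambda s: max(s - 80, 0))
f70, _, _, _ = price_tree(stock, lambda s: max(s - 70, 0))
print(f80[1], phi80[1], psi80[1])   # price, shares, and borrowing at node 1
print(f70[1])                       # price with exercise price $70
```

Because the tree is stored by node number rather than by level, the same sketch should also handle non-recombinant trees, as long as every non-terminal node has both children present.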
7.9 Martingales

A martingale is a probability model for a fair game, that is, a game where the expected changes in one's fortune are always zero. More formally, a stochastic process \(Y_0, Y_1, Y_2, \ldots\) is a martingale if

\[ E(Y_{t+1} \mid Y_t) = Y_t \]

for all \(t\).

Footnote 5: It follows from (7.7) that whenever \(r = 0\) and the up moves and down moves are of equal length, then \(q_j = 1/2\) for all \(j\).

Figure 7.5: Two-step example with pricing by probabilities. Red is node number. Blue is value of the stock. Magenta is value of the option. Path probabilities are in dark green. The exercise price is $80.

Let \(P_t\), \(t = 0, 1, \ldots\) be the price of the stock at the end of the \(t\)th step in a binomial model. Then \(P_t^* := \exp(-r t\, \delta t) P_t\) is the discounted price process.

Key fact: Under the \(\{q_j\}\) probabilities, the discounted price process \(P_t^*\) is a martingale.

To see that \(P_t^*\) is a martingale, we calculate:

\[
\begin{aligned}
E(P_{t+1} \mid P_t = s_j) &= q_j s_{2j+1} + (1 - q_j) s_{2j} \\
&= s_{2j} + q_j (s_{2j+1} - s_{2j}) \\
&= s_{2j} + \{ \exp(r\, \delta t) s_j - s_{2j} \} \\
&= \exp(r\, \delta t) s_j.
\end{aligned}
\]

This holds for all values of \(s_j\). Therefore,

\[ E(P_{t+1} \mid P_t) = \exp(r\, \delta t) P_t, \]

so that

\[ E\{ \exp(-r (t + 1)\, \delta t) P_{t+1} \mid P_t \} = \exp(-r t\, \delta t) P_t, \]

or

\[ E(P_{t+1}^* \mid P_t^*) = P_t^*. \]

This shows that \(P_t^*\) is a martingale.

Any set of path probabilities, \(\{p_j\}\), is called a measure of the process. The measure \(\{q_j\}\) is called the martingale measure or the risk-neutral measure. We will also call \(\{q_j\}\) the risk-neutral path probabilities.

7.9.1 The risk-neutral world

If all investors were risk-neutral, that is, indifferent to risk, then there would be no risk premiums and all expected asset prices would rise at the risk-free rate. Therefore, all discounted asset prices, with discounting at the risk-free rate, would be martingales. We know that we do not live in such a risk-neutral world, but there is a general principle that expectations taken with respect to a risk-neutral model give correct, i.e., arbitrage-free, prices of options and other financial instruments.
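The martingale property of the discounted price can be checked numerically at a single node. The sketch below is only an illustration under assumed numbers (a stock at 100 that moves to 120 or 80, with r = 5% and a tick length of 1); the function names are hypothetical.

```python
import math

def risk_neutral_q(s, s_up, s_down, r, dt):
    """Risk-neutral up probability at a node, as in (7.7)."""
    return (math.exp(r * dt) * s - s_down) / (s_up - s_down)

def discounted(price, r, dt, t):
    """Discounted price exp(-r t dt) * price at step t."""
    return math.exp(-r * t * dt) * price

def expected_discounted_next(s, s_up, s_down, r, dt, t):
    """E(P*_{t+1} | P_t = s) computed under the risk-neutral probability."""
    q = risk_neutral_q(s, s_up, s_down, r, dt)
    expected_next = q * s_up + (1.0 - q) * s_down
    return discounted(expected_next, r, dt, t + 1)

# Assumed illustrative node: price 100 at step t = 3.
s, s_up, s_down, r, dt, t = 100.0, 120.0, 80.0, 0.05, 1.0, 3
lhs = expected_discounted_next(s, s_up, s_down, r, dt, t)
rhs = discounted(s, r, dt, t)
print(lhs, rhs)   # the two values should agree up to rounding
```

Using any probability other than the one from (7.7) in place of `q` would generally break the equality, which is one way to see why the risk-neutral measure is special.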
Example

In Section 7.3 it was argued that if a stock is selling at $100/share and the risk-free interest rate is 6%, then the correct future delivery price of a share one year from now is $106. We can now calculate this value using the risk-neutral measure: in the risk-neutral world, the expected stock price will increase to exactly $106 one year from now.

7.10 Trees to random walks to Brownian motion

7.10.1 Getting more realistic

Binomial trees are useful because they illustrate several important concepts, in particular:

• arbitrage pricing

• self-financing trading strategies

• hedging

• computation of arbitrage prices by expectations with respect to an appropriate set of probabilities called the risk-neutral measure

However, binomial trees are not realistic, because stock prices are continuous, or at least approximately continuous. This lack of realism can be alleviated by increasing the number of steps. In fact, one can increase the number of steps without limit to derive the Black-Scholes model and formula. That is the goal of Section 7.11. The present section will get us closer to that goal.

7.10.2 A three-step binomial tree

Figure 7.6 is a three-step tree where at each step the stock price either goes up $10 or down $10. Assume that the risk-free rate is \(r = 0\). Now consider the price of the stock, call it \(P_t\), at time \(t\) where \(t = 0, 1, 2, 3\). Using the risk-neutral path probabilities, which are each 1/2 in this example, \(P_t\) is a stochastic process, that is, a process that evolves randomly in time. In fact, since \(P_{t+1}\) equals \(P_t \pm \$10\), this process is a random walk.

We have

\[ P_t = P_0 + (\$10) \{ 2 (W_1 + \cdots + W_t) - t \}, \tag{7.9} \]

where \(W_1, \ldots, W_3\) are independent and each \(W_t\) equals 0 or 1, each with probability 1/2. If \(W_t\) is 1, then \(2 W_t - 1 = 1\) and the price jumps up $10 on the \(t\)th step. If \(W_t\) is 0, then \(2 W_t - 1 = -1\) and the price jumps down $10.
The random sum \(W_1 + \cdots + W_t\) is Binomial\((t, 1/2)\) distributed and so has a mean of \(t/2\) and variance equal to \(t/4\).

The value of the call option is

\[ E\{ (P_3 - E)^+ \}, \tag{7.10} \]

where \(x^+\) equals \(x\) if \(x > 0\) and equals 0 otherwise. The expectation in (7.10) is with respect to the risk-neutral probabilities. Since \(W_1 + W_2 + W_3\) is Binomial\((3, 1/2)\), it equals 0, 1, 2, or 3 with probabilities 1/8, 3/8, 3/8, and 1/8, respectively. Therefore,

\[
\begin{aligned}
E\{ (P_3 - E)^+ \} = \frac{1}{8} \Big[ & \{ P_0 - 30 - E + (20)(0) \}^+ + 3 \{ P_0 - 30 - E + (20)(1) \}^+ \\
& + 3 \{ P_0 - 30 - E + (20)(2) \}^+ + \{ P_0 - 30 - E + (20)(3) \}^+ \Big].
\end{aligned}
\]

[Figure 7.6 panel text: All \(q\)'s are 1/2. Horizontal axis: time. Exercise price = $80.]

Figure 7.6: Three-step example of pricing a European call option by probabilities. Red is node number. Blue is value of the stock. Magenta is value of the option. Risk-neutral path probabilities are not shown, but they are all equal to 1/2. The exercise price is $80. The risk-free rate is \(r = 0\).

Examples

If \(P_0 = 100\) and \(E = 80\), then \(P_0 - 30 - E = -10\) and

\[
\begin{aligned}
E\{ (P_3 - E)^+ \} &= \frac{1}{8} \{ (-10 + 0)^+ + 3 (-10 + 20)^+ + 3 (-10 + 40)^+ + (-10 + 60)^+ \} \\
&= \frac{1}{8} (0 + 30 + 90 + 50) = \frac{170}{8} = 21.25,
\end{aligned}
\]

as seen in Figure 7.6. Similarly, if \(P_0 = 100\) and \(E = 100\), then \(P_0 - 30 - E = -30\) and

\[
\begin{aligned}
E\{ (P_3 - E)^+ \} &= \frac{1}{8} \{ (-30 + 0)^+ + 3 (-30 + 20)^+ + 3 (-30 + 40)^+ + (-30 + 60)^+ \} \\
&= \frac{1}{8} (0 + 0 + 30 + 30) = \frac{60}{8} = 7.5.
\end{aligned}
\]

7.10.3 More time steps

Let's consider a call option with maturity date equal to 1. Take the time interval \([0, 1]\) and divide it into \(n\) steps, each of length \(1/n\). Suppose that the stock price goes up or down \(\sigma/\sqrt{n}\) at each step. Then the price after \(m\) steps (\(0 \le m \le n\)), when \(t = m/n\), is

\[ P_{m/n} = P_0 + \frac{\sigma}{\sqrt{n}} \{ 2 (W_1 + \cdots + W_m) - m \}. \tag{7.11} \]

Since \(W_1 + \cdots + W_m\) is Binomial\((m, 1/2)\) distributed, it follows that

\[ E(P_t \mid P_0) = P_0 \tag{7.12} \]

and

\[ \mathrm{Var}(P_t \mid P_0) = \frac{4 \sigma^2}{n} \cdot \frac{m}{4} = t \sigma^2. \]

Moreover, by the central limit theorem, as \(n \to \infty\), \(P_1\) converges to a \(N(P_0, \sigma^2)\) random variable.

Let \(E\) be the exercise price.
Remember that the value of an option is the expectation with respect to the risk-neutral measure of the present value of the option at expiration. Therefore, in the limit, as the number of steps goes to \(\infty\), the price of the option converges to

\[ E\{ (P_0 + \sigma Z - E)^+ \}, \tag{7.15} \]

where \(Z\) is \(N(0, 1)\) so that \(P_1 = P_0 + \sigma Z\) is \(N(P_0, \sigma^2)\).

For a fixed value of \(n\), \(P_t\) is a discrete-time stochastic process since \(t = 0, 1/n, 2/n, \ldots, (n - 1)/n, 1\). In fact, as we saw before, for any finite value of \(n\), \(P_t\) is a random walk. However, in the limit as \(n \to \infty\), \(P_t\) becomes a continuous-time stochastic process. This limit process is called Brownian motion. In other words, the continuous-time limit of random walks is Brownian motion.

7.10.4 Properties of Brownian motion

We have seen that Brownian motion is a continuous-time stochastic process that is the limit of discrete-time random walk processes. A Brownian motion process, \(B_t\), starting at 0, i.e., with \(B_0 = 0\), has the following mathematical properties:

1. \(E(B_t) = 0\) for all \(t\).

2. \(\mathrm{Var}(B_t) = t \sigma^2\) for all \(t\). Here \(\sigma^2\) is the volatility of \(B_t\).

3. Changes over non-overlapping increments are independent. More precisely, if \(t_1 < t_2 < t_3 < t_4\) then \(B_{t_2} - B_{t_1}\) and \(B_{t_4} - B_{t_3}\) are independent.

4. \(B_t\) is normally distributed for any \(t\).

If \(B_0\) is not zero, then each of these properties holds for the process \(B_t - B_0\), which is the change in \(B_t\) from time 0 to \(t\). All of these properties but the last are shared by random walks with mean-zero steps.

7.11 Geometric Brownian motion

Random walks are not realistic models for stock prices, since a random walk can go negative. Therefore, (7.15) is close to but not quite the correct price of the option. To get the correct price we need to make our model more realistic. We saw in Chapter 3 that geometric random walks are much better than random walks as models for stock prices since geometric random walks are always non-negative.
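The convergence of the random-walk price to (7.15) can be checked numerically. The sketch below is an illustration rather than part of the notes: the function names are hypothetical, r = 0 and q = 1/2 are assumed as in Section 7.10, and the closed form used for (7.15) is the standard identity \(E\{(m + \sigma Z)^+\} = m \Phi(m/\sigma) + \sigma \varphi(m/\sigma)\) for a standard normal \(Z\), which is brought in here and is not derived in the text.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def random_walk_call(p0, strike, sigma, n):
    """Risk-neutral expectation of (P_1 - strike)^+ when the price moves
    up or down sigma / sqrt(n) on each of n steps (r = 0, q = 1/2)."""
    step = sigma / math.sqrt(n)
    total = 0.0
    for k in range(n + 1):                       # k = number of up moves
        prob = math.comb(n, k) / 2 ** n          # Binomial(n, 1/2) probability
        price = p0 + step * (2 * k - n)
        total += prob * max(price - strike, 0.0)
    return total

def normal_limit_call(p0, strike, sigma):
    """Closed form of E{(P0 + sigma Z - strike)^+} for standard normal Z."""
    d = (p0 - strike) / sigma
    return (p0 - strike) * norm_cdf(d) + sigma * norm_pdf(d)

# The three-step tree with $10 moves corresponds to sigma / sqrt(3) = 10.
three_step = random_walk_call(100.0, 80.0, 10.0 * math.sqrt(3.0), 3)
# With many steps the tree price approaches the normal limit.
many_steps = random_walk_call(100.0, 100.0, 10.0, 2000)
limit = normal_limit_call(100.0, 100.0, 10.0)
print(three_step, many_steps, limit)
```

The first call reuses the three-step example, so it doubles as a check that the general n-step sum reduces to the hand calculation when n = 3.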
We will now introduce a binomial tree model that is a geometric random walk. We do this by making the steps proportional to the current stock price. Thus, if \(s\) is the stock price at the current node, then the price at the next node is

\[ s \exp(\mu/n \pm \sigma/\sqrt{n}) = (s_{\mathrm{up}}, s_{\mathrm{down}}). \]

Notice that the log of the stock price is a random walk since

\[ (\log(s_{\mathrm{up}}), \log(s_{\mathrm{down}})) = \log(s) + \frac{\mu}{n} \pm \frac{\sigma}{\sqrt{n}}. \]

Therefore, the stock price process is a geometric random walk. There is a drift if \(\mu \ne 0\), but we will see that the amount of drift is irrelevant. We could have set the drift equal to 0, but we didn't, in order to show later that the drift does NOT affect the option's price.

The risk-neutral probability of an up jump is

\[
\begin{aligned}
q &= \frac{s \exp(r/n) - s_{\mathrm{down}}}{s_{\mathrm{up}} - s_{\mathrm{down}}}
= \frac{\exp(r/n) - \exp(\mu/n - \sigma/\sqrt{n})}{\exp(\mu/n + \sigma/\sqrt{n}) - \exp(\mu/n - \sigma/\sqrt{n})} \\
&\approx \frac{1}{2} \left( 1 - \frac{\mu - r + \sigma^2/2}{\sigma \sqrt{n}} \right).
\end{aligned}
\]

Then

\[ P_t = P_{m/n} = P_0 \exp\left\{ \mu t + \frac{\sigma}{\sqrt{n}} \sum_{i=1}^{m} (2 W_i - 1) \right\}, \]

where as before \(W_i\) is either 0 or 1 (so \(2 W_i - 1 = \pm 1\)).

Using risk-neutral probabilities, we have

\[ E\left\{ \frac{\sigma}{\sqrt{n}} \sum_{i=1}^{m} (2 W_i - 1) \right\} = \frac{\sigma m}{\sqrt{n}} (2q - 1) \approx t \left( r - \mu - \frac{\sigma^2}{2} \right) \]

and

\[ \mathrm{Var}\left\{ \frac{\sigma}{\sqrt{n}} \sum_{i=1}^{m} (2 W_i - 1) \right\} = \frac{4 \sigma^2 m\, q (1 - q)}{n} \to t \sigma^2, \]

since \(q \to 1/2\) as \(n \to \infty\). Therefore, in the risk-neutral world

\[ P_t \approx P_0 \exp\{ (r - \sigma^2/2) t + \sigma B_t \}, \tag{7.16} \]

where \(B_t\) is Brownian motion and \(0 \le t \le 1\). Time could easily be extended beyond 1 by adding more steps. We will assume that this has been done.

Notice that (7.16) does NOT depend on \(\mu\), only on \(\sigma\) (and \(r\)). The reason is that in the risk-neutral world, the expected values of all assets increase at rate \(r\). The rate of increase in the real world is \(\mu\) but this is irrelevant for risk-neutral calculations. Remember that risk-neutral expectations DO give the correct option price in the real world even if they do not correctly describe real-world probability distributions.

If \(E\) is the exercise or strike price and \(T\) is the expiration date of a European call option, then the value of the option at maturity is

\[ \left[ P_0 \exp\{ (r - \sigma^2/2) T + \sigma B_T \} - E \right]^+. \tag{7.17} \]

Since \(B_T \sim N(0, T)\), we can write \(B_T = \sqrt{T} Z\) where \(Z \sim N(0, 1)\).
The discounted value of (7.17) is

\[ \left[ P_0 \exp\left\{ -\frac{\sigma^2 T}{2} + \sigma \sqrt{T} Z \right\} - \exp(-rT) E \right]^+. \tag{7.18} \]

We will again use the principle that the price of an option is the risk-neutral expectation of the option's discounted value at expiration. By this principle, the call's price at time \(t = 0\) is the expectation of (7.18). Therefore,

\[ C = \int \left[ P_0 \exp\left\{ -\frac{\sigma^2 T}{2} + \sigma \sqrt{T} z \right\} - \exp(-rT) E \right]^+ \phi(z)\, dz, \tag{7.19} \]

where \(\phi\) is the \(N(0, 1)\) pdf (probability density function).

Computing this integral is not easy, but it can be done. The result is the famous Black-Scholes formula: Let \(S_0\) be the current stock price (we have switched notation from \(P_0\)), let \(E\) be the exercise price, let \(r\) be the continuously compounded interest rate, let \(\sigma\) be the volatility, and let \(T\) be the expiration date of a call option. Then by evaluating the integral in (7.19) it can be shown that

\[ C = \Phi(d_1) S_0 - \Phi(d_2) E \exp(-rT), \]

where \(\Phi\) is the standard normal CDF,

\[ d_1 = \frac{\log(S_0/E) + (r + \sigma^2/2) T}{\sigma \sqrt{T}}, \quad \text{and} \quad d_2 = d_1 - \sigma \sqrt{T}. \]

Example

Here's a numerical example. Suppose that \(S_0 = 100\), \(E = 90\), \(\sigma = .4\), \(r = .1\), and \(T = .25\). Then

\[ d_1 = \frac{\log(100/90) + \{ .1 + (.4)^2/2 \}(.25)}{.4 \sqrt{.25}} = .7518 \]

and \(d_2 = d_1 - .4 \sqrt{.25} = .5518\). Then \(\Phi(d_1) = .7739\) and \(\Phi(d_2) = .7095\). Also, \(\exp(-rT) = \exp\{ -(.1)(.25) \} = .9753\). Therefore,

\[ C = (100)(.7739) - (90)(.9753)(.7095) = 15.1. \]

7.12 Using the Black-Scholes formula

7.12.1 How does the option price depend on the inputs?

Figure 7.7 shows the variation in the price of a call option as the parameters change. The baseline values of the parameters are \(S_0 = 100\), \(E = 100 \exp(rT)\), \(T = .25\), \(r = .06\), and \(\sigma = .1\). The exercise price \(E\) and initial price \(S_0\) have been chosen so that if invested at the risk-free rate, \(S_0\) would increase to \(E\) at expiration time. In each of the subplots in Figure 7.7, one of the parameters is varied while the others are held at baseline.

One sees that the price of the call increases with \(\sigma\).
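The Black-Scholes formula and its dependence on \(\sigma\) can be sketched in Python using only the standard library. The function names below are illustrative choices rather than anything from the notes, and the standard normal CDF is computed from `math.erf`.

```python
import math

def norm_cdf(x):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes_call(s0, strike, r, sigma, t):
    """Black-Scholes price of a European call.

    s0: current stock price, strike: exercise price E,
    r: continuously compounded rate, sigma: volatility,
    t: time to expiration (same time unit as r and sigma).
    """
    d1 = (math.log(s0 / strike) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return norm_cdf(d1) * s0 - norm_cdf(d2) * strike * math.exp(-r * t)

# The numerical example from the text.
example_price = black_scholes_call(100.0, 90.0, 0.1, 0.4, 0.25)
print(round(example_price, 2))

# Baseline of Section 7.12.1 with sigma varied: the price rises with sigma.
base_strike = 100.0 * math.exp(0.06 * 0.25)
prices = [black_scholes_call(100.0, base_strike, 0.06, s, 0.25)
          for s in (0.05, 0.10, 0.20, 0.40)]
print(prices)
```

The time unit is left to the caller on purpose: the formula works with annual or daily inputs as long as \(r\), \(\sigma\), and \(T\) are all expressed in the same unit.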
That the price increases with \(\sigma\) makes sense since \(E = S_0 \exp(rT) = E(S_T)\) in this example, where \(E(S_T)\) denotes the risk-neutral expectation of \(S_T\). Thus the expected value of \(S_T\) is at the money, and \(S_T\) is roughly equally likely to be in the money or out of the money. As \(\sigma\) increases, the likelihood that \(S_T\) is considerably larger than \(E\) also increases. As an extreme case, suppose that \(\sigma = 0\). Then in the risk-neutral world \(S_T = \exp(rT) S_0 = E\) and the option at expiration is at the money so its value is 0.

The value at maturity is \((S_T - E)^+\), so we expect that the price of the call will increase as \(S_0\) increases and decrease as \(E\) increases. This is exactly the behavior seen in Figure 7.7. Also, note that the price of the call increases as either \(r\) or \(T\) increases.

Figure 7.7: Price of a call as a function of volatility (\(\sigma\)), exercise price (\(E\)), initial price (\(S_0\)), risk-free rate (\(r\)), and expiration date (\(T\)). Baseline values of the parameters are \(S_0 = 100\), \(E = 100 \exp(rT)\), \(T = .25\), \(r = .06\), and \(\sigma = .1\). In each subplot, all parameters except the one on the horizontal axis are fixed at baseline.

7.12.2 An example: GE

Table 7.1 gives the exercise price \(E\), month of expiration, and the price of call options on GE on February 13, 2001. This information was taken from The Wall Street Journal, February 14. Traded options are generally American rather than European and that is true of the options in Table 7.1. However, under the Black-Scholes theory it can be proved that the price of an American call option is identical to the price of a European call option (see footnote 6). See Section 7.12.3 for discussion of this point. Since an American call has the same price as a European call, we can use the Black-Scholes formula for European call options to price the options in Table 7.1. We will compare the Black-Scholes prices with the actual market prices.

Only the month of maturity is listed in a newspaper.
However, maturities (days until expiration) can be determined as follows. An option expires at 10:59pm Central Time on the Saturday after the third Friday in the month of expiration (Hull, 1995, page 180). February 16, 2001 was the third Friday of its month, so that on February 13, an option with a February expiration date had three trading days (and four calendar days) until expiration. Since there are returns on stocks only on trading days, \(T = 3\) for options expiring in February. Similarly, on February 13 an option expiring in March had \(T = 23\) trading days until expiration. Since there are 253 trading days per year, there are \(253/12 \approx 21\) trading days per month. For June, I used \(T = 23 + (21)(3) = 86\) and for September I used \(T = 23 + (6)(21) = 149\).

GE closed at $47.16 on February 13, so we use \(S_0 = 47.16\). On February 13, the 3-month T-bill rate was 4.91%. Thus, the daily rate of return on T-bills would be \(r = 0.0491/253 = .00019470\), assuming that a T-bill only has a return on the 253 trading days per year; see Section 7.12.4.

I used two values of \(\sigma\). The first, 0.0176, was based on daily GE returns from December 1999 to December 2000. The second, 0.025, was chosen to give prices somewhat similar to the actual market prices.

7.12.3 Early exercise of calls is never optimal

It can be proved that early exercise of an American call option is never optimal. The reason is that at any time before the expiration date, the price of the option will be higher than the value of the option if exercised.

Footnote 6: However, American and European put options will in general have different prices.
E       Month of     T           Actual   B&S calculated price         Implied
        expiration   (in days)   price    sigma = .0176  sigma = .025  volatility
35      Sep          149         14.90    13.40          14.03         .0320
40      Sep          149         10.80     9.22          10.37         .0275
42.50   Mar           23          5.30     5.03           5.38         .0235
45      Feb            3          2.40     2.22           2.32         .0290
45      Mar           23          3.40     3.00           3.57         .0228
50      Feb            3          0.10     0.016          0.09         .0258
50      Mar           23          0.90     0.64           1.23         .0209
50      Sep          149          4.70     3.42           5.12         .0232
55      Mar           23          0.20     0.06           0.28         .0223
55      Jun           86          1.30     0.92           2.00         .0204

Table 7.1: Actual prices and prices determined by the Black-Scholes formula for options on February 13, 2001. E is the exercise price. T is the maturity.

Therefore, it is always better to sell the option rather than to exercise it early. To see empirical evidence of this principle, consider the first option in Table 7.1. The strike price is 35 and the closing price of GE was 47.16. Thus, if the option had been exercised at the closing of the market, the option holder would have gained $(47.16 - 35) = $12.16. However, the option was selling on the market for $14.90 that day. Thus, one would gain $(14.90 - 12.16) = $2.74 more by selling the option rather than exercising it.

Similarly, the other options in Table 7.1 are worth more if sold than if exercised. The second option is worth $(47.16 - 40) = $7.16 if exercised but $10.80 if sold. The third option is worth $(47.16 - 42.5) = $4.66 if exercised but $5.30 if sold.

Since it is never optimal to exercise an American call option early, the ability to exercise an American call early is not worth anything. This is why American calls are equal in value to European calls with the same exercise price and expiration date.

7.12.4 Are there returns on non-trading days?

We have assumed that there are no returns on non-trading days. For T-bills, this assumption is justified by the way we calculated the daily interest rate.
We took the daily rate to be the annual rate divided by 253 on trading days and 0 on non-trading days. If instead we took the daily rate to be the annual rate divided by 365 on every calendar day, then the interest on T-bills over a year, or a quarter, would be the same.

A stock price is unchanged over a non-trading day. However, the efficient market theory says that stock prices change due to new information. Thus, we might expect that there is a return on a stock over a weekend or holiday but that it is not realized until the market reopens. If this were true, then returns from Friday to Monday would be more volatile than returns over a single trading day. Empirical evidence fails to find such an effect.

A reason why returns over weekends are not overly volatile might be that there is little business news over a weekend. However, this does not seem to be the explanation for the absence of excess volatility over a weekend. In 1968, the NYSE was closed for a series of Wednesdays. Of course, other businesses were open on these Wednesdays so there was the usual amount of business news during the Wednesdays when the NYSE was closed. For this reason, one would expect increased volatility for Tuesday to Thursday price changes on weeks with a Wednesday closing compared to, say, Tuesday to Wednesday price changes on weeks without a Wednesday market closing. However, no such effect has been detected (French and Roll, 1986).

Trading appears to generate volatility by itself. Traders react to each other. Stock prices react both to trading "noise" and to new information. Short-term volatility might be mostly due to noise trading.

7.12.5 Implied volatility

Given the exercise price, current price, and maturity of an option and given the risk-free rate, there is some value of \(\sigma\) that makes the price determined by the Black-Scholes formula equal to the current market price. This value of \(\sigma\) is called the implied volatility.
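A minimal sketch of solving for the \(\sigma\) that matches a given market price is shown below. It uses bisection, which is a choice made for this illustration because the Black-Scholes call price is increasing in \(\sigma\); it is not claimed to be the method used to produce the numbers in these notes. The function names, the bracketing interval, and the tolerance are all assumptions.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def black_scholes_call(s0, strike, r, sigma, t):
    """Black-Scholes price of a European call (consistent time units)."""
    d1 = (math.log(s0 / strike) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    return norm_cdf(d1) * s0 - norm_cdf(d2) * strike * math.exp(-r * t)

def implied_volatility(price, s0, strike, r, t, lo=1e-8, hi=10.0, tol=1e-10):
    """Find sigma in [lo, hi] whose Black-Scholes price equals `price`.

    Bisection is valid because the call price is increasing in sigma.
    The bracket [lo, hi] and tolerance are illustrative defaults.
    """
    if not (black_scholes_call(s0, strike, r, lo, t) <= price
            <= black_scholes_call(s0, strike, r, hi, t)):
        raise ValueError("price is not attainable for sigma in [lo, hi]")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if black_scholes_call(s0, strike, r, mid, t) < price:
            lo = mid          # model price too low: volatility must be higher
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Round trip: price an option at sigma = 0.4, then recover sigma from the price.
target = black_scholes_call(100.0, 90.0, 0.1, 0.4, 0.25)
recovered = implied_volatility(target, 100.0, 90.0, 0.1, 0.25)
print(recovered)
```

With the daily inputs quoted in Section 7.12.2, a call such as `implied_volatility(5.30, 47.16, 42.50, 0.0491 / 253, 23)` would give a daily implied volatility for the March option with exercise price $42.50; the result need not agree with the tabulated value to every digit.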
One might think of implied volatility as the amount of volatility the market believes to exist currently. How does one determine the implied volatility? The Black-Scholes formula gives price as a function of $\sigma$ with all other parameters held fixed. What we need is the inverse of this function, that is, $\sigma$ as a function of the option price. The inverse function exists, but unfortunately there is no explicit formula for it. However, using interpolation one can invert the Black-Scholes formula numerically to get $\sigma$ as a function of price. Figure 7.8 shows how this could be done for the third option in Table 7.1. The implied volatility in Figure 7.8 is 0.0235 and was determined by MATLAB's interpolation function, interp1.m. The implied volatilities of the other options in Table 7.1 were determined in the same manner.

Figure 7.8: Calculating the volatility implied by the option with an exercise price of $42.50 expiring in March 2001. The price was $5.30 on February 13, 2001. The blue curve is the price given by the Black-Scholes formula as a function of $\sigma$. The horizontal line is drawn where price is $5.30. This line intersects the curve at $\sigma = .0242$. This value of $\sigma$ is the volatility implied by the option's price. (Horizontal axis: sigma.)

Notice that the implied volatilities are substantially higher than 0.0176, the average volatility over the previous year. However, there is evidence that the volatility of GE was increasing at the end of last year; see the estimated volatility in Figure 3.3. In that figure, volatility is estimated from December 15, 1999 to December 15, 2000. Volatility is highest at the end of this period and shows some sign of continuing to increase. The estimated volatility on December 15, 2000 was 0.023, which is similar to the implied volatilities in Table 7.1. It would be worthwhile to re-estimate volatility with data from December 15, 2000 to February 13, 2001. It may be that the implied volatilities in Table 7.1 are similar to the observed volatility in early 2001.

The implied volatilities also vary somewhat among themselves. One reason for this variation is that the option prices and closing price of GE stock are not concurrent. Rather, each price is for the last trade of the day for that option or for the stock. This lack of concurrence introduces some error into pricing by the Black-Scholes formula and therefore into the implied volatilities. Another problem with these prices is that the Black-Scholes formula assumes that the stock pays no dividends, but GE does pay dividends.[7]

[7] Modifications of the formula to accommodate dividend payments are possible, but we will not pursue that topic here.

7.13 Puts

Recall that a put option gives one the right to sell a certain number of shares of a certain stock at the exercise price. The pricing of puts is similar to the pricing of calls, but as we will see in this section, there are some differences.

7.13.1 Pricing puts by binomial trees

Put options can be priced by binomial trees in the same way that call options are priced. Figure 7.9 shows a two-step binomial tree where the stock price starts at $100 and increases or decreases by 20% at each step. Assume that the interest rate is 5% compounded continuously and that the strike price of the put is $110. In this example, European and American puts do not have the same price at all nodes. We will start with a European put and then see how an American put differs.

Figure 7.9: Pricing a put option. The stock price is in blue and the price of a European put option is in magenta. The price of an American put option is shown in black with parentheses when it differs from the price of a European put. Exercise price = $110, r = 5%.

At each step,
$$q = \frac{\exp(.05) - .8}{1.2 - .8} = .6282.$$
The value of a put after two steps is $(110 - S)_+$ where $S$ is the price of the stock after two steps. Thus the put is worth $46, $14, and $0 at nodes 4, 5, and 6 respectively. Therefore, the price of the option at node 3 is
$$e^{-.05}\{(q)(0) + (1 - q)(14)\} = e^{-.05}\{(.6282)(0) + (.3718)(14)\} = 4.95.$$
The price of the option at node 2 is
$$e^{-.05}\{(q)(14) + (1 - q)(46)\} = 24.63.$$
Finally the price of the put at node 1 is
$$e^{-.05}\{(q)(4.95) + (1 - q)(24.63)\} = 11.67.$$

Now consider an American option. At nodes 4, 5, and 6 we have reached the expiration time so that the American option has the same value as the European option. At node 3 the European option is worth $4.95. At this node, should we exercise the American option early? Clearly not, since the strike price ($110) is less than the stock price ($120). Since early exercise is suboptimal at node 3, the American option is equivalent to the European option at this node and both options are worth $4.95. At node 2 the European option is worth $24.63. The American option can be exercised to earn ($110 - $80) = $30. Therefore, the American option should be exercised early since early exercise earns $30 while holding the option is worth only $24.63. Thus, at node 2 the European option is worth $24.63 but the American option is worth $30. At node 1, the American option is worth
$$e^{-.05}\{(q)(4.95) + (1 - q)(30)\} = 13.57.$$

[...] $E$. In other words, the payoff is either $E$ or $S_T$, whichever is larger. The second portfolio holds a put and one share of stock. Its payoff at time $T$ is $S_T$ if $S_T > E$ so that the put is not exercised. If $S_T < E$, then the put is exercised and the stock is sold for a payoff of $E$. Thus, the payoff is $E$ or $S_T$, whichever is larger, which is the same payoff as the first portfolio. Since the two portfolios have the same payoff for all values of $S_T$, their initial values at time 0 must be equal to avoid arbitrage.
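The equality of the two initial values can be checked numerically on the two-step tree of Section 7.13.1. The following is a minimal sketch, not the method used to produce the figures in these notes: `binomial_price` is a hypothetical helper that prices an option by backward induction, and the check compares the time-0 values $C + e^{-rT}E$ and $P + S_0$ for European options, where $T$ corresponds to two steps. The same helper also prices the American put by allowing early exercise at every node.

```python
import math


def binomial_price(payoff, s0, up, down, r, steps, american=False):
    """Price an option on a recombining binomial tree by backward induction.

    payoff(s) is the exercise value at stock price s; r is the continuously
    compounded interest rate per step.
    """
    q = (math.exp(r) - down) / (up - down)   # risk-neutral probability of an up move
    disc = math.exp(-r)                      # one-step discount factor
    # Option values at expiration; index j counts the number of up moves.
    values = [payoff(s0 * up ** j * down ** (steps - j)) for j in range(steps + 1)]
    for n in range(steps - 1, -1, -1):
        for j in range(n + 1):
            hold = disc * (q * values[j + 1] + (1.0 - q) * values[j])
            if american:
                stock = s0 * up ** j * down ** (n - j)
                hold = max(hold, payoff(stock))   # exercise early if that is worth more
            values[j] = hold
    return values[0]


S0, E, R, STEPS = 100.0, 110.0, 0.05, 2


def put_payoff(s):
    return max(E - s, 0.0)


def call_payoff(s):
    return max(s - E, 0.0)


euro_put = binomial_price(put_payoff, S0, 1.2, 0.8, R, STEPS)
amer_put = binomial_price(put_payoff, S0, 1.2, 0.8, R, STEPS, american=True)
euro_call = binomial_price(call_payoff, S0, 1.2, 0.8, R, STEPS)

print(f"European put {euro_put:.2f}, American put {amer_put:.2f}, European call {euro_call:.2f}")
# Time-0 values of the two portfolios; T is two steps, so the discount factor is exp(-2r).
print(f"C + exp(-rT) E = {euro_call + math.exp(-R * STEPS) * E:.4f}")
print(f"P + S0         = {euro_put + S0:.4f}")
```

The last two printed values should agree up to rounding error, while replacing the European put by the American put breaks the equality.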
Thus, $C + e^{-rT}E = P + S_0$, which can be rearranged to yield equation (7.20).[8]

Relationship (7.20) holds only for European options. American calls have the same price as European calls so that the right hand side of (7.20) is the same for European and American options. American puts are worth more than European puts, so the left hand side of (7.20) is larger for American than for European puts. Thus, (7.20) becomes
$$P > C + e^{-rT}E - S_0, \qquad (7.21)$$
for American options, and clearly (7.21) does not tell us the price of an American put.

[8] As usual in these notes, we are assuming that the stock pays no dividend, at least not during the lifetime of the two options. If there are dividends, then a simple adjustment of formula (7.20) is needed. The reason the adjustment is needed is that the two portfolios will no longer have exactly the same payoff. The second portfolio, which holds the stock, will receive a dividend and so will have a higher payoff than the first portfolio, which will not receive the dividend.

7.14 The evolution of option prices

As time passes the price of an option changes with the changing stock price and the decreasing amount of time until the expiration date. We will assume that $r$ and $\sigma$ are constant, though in the real financial world these could change too. The Black-Scholes formula remains in effect and can be used to update the price of an option. Suppose that $t = 0$ is when the option was written and $t = T$ is the expiration date. Consider a time point $t$ such that $0 < t < T$. Then the Black-Scholes formula can be used with $S_0$ in the formula set equal to $S_t$ and $T$ in the formula set equal to $T - t$.

Figure 7.10 illustrates the evolution of option prices for two simulations of the geometric Brownian motion process of the stock price. Here $T = 1$, $\sigma = .1$, $r = .06$, $S_0 = 100$, and $E = 100$ for both the put and the call.
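The updating rule of this section (replace $S_0$ by $S_t$ and $T$ by $T - t$) can be sketched in a few lines. This is an illustration under stated assumptions rather than the code behind Figure 7.10: the parameter values are the ones just listed, but the snapshots `(t, S_t)` are hypothetical points chosen by hand, not values from the simulated paths.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call_put(s, strike, tau, r, sigma):
    """Black-Scholes prices (call, put) with time tau > 0 remaining to expiration."""
    vol = sigma * math.sqrt(tau)
    d1 = (math.log(s / strike) + (r + 0.5 * sigma ** 2) * tau) / vol
    d2 = d1 - vol
    call = s * norm_cdf(d1) - strike * math.exp(-r * tau) * norm_cdf(d2)
    put = strike * math.exp(-r * tau) * norm_cdf(-d2) - s * norm_cdf(-d1)
    return call, put


T, SIGMA, R, E = 1.0, 0.1, 0.06, 100.0

# Hypothetical snapshots (t, S_t), chosen only for illustration; t must be less than T.
for t, s_t in [(0.0, 100.0), (0.18, 110.0), (0.95, 110.0)]:
    call, put = bs_call_put(s_t, E, T - t, R, SIGMA)
    print(f"t = {t:.2f}, S_t = {s_t:.0f}: call = {call:.3f}, put = {put:.6f}")
```

With the stock at 110, the put should retain some value early in the option's life and be nearly worthless close to expiration, because little time remains for the price to fall below the exercise price.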
In one case the call was in the money at expiration, while in the second case it was the put that was in the money. Notice that around $t = .18$ the stock price is around 110 in the red simulation but the put is still worth something, since there is still plenty of time for the price to go down. Around $t = 1$ the stock price of the blue simulation is around 110 but the value of the put is essentially 0; now there is too little time for the put to go in the money (the risk-neutral probability is not 0, but almost 0).

Figure 7.10: Evolution of option prices. The stock price is a geometric Brownian motion. Two independent simulations of the stock price are shown and color coded. Here $T = 1$, $\sigma = .1$, $r = .06$, $S_0 = 100$, and $E = 100$ for both the put and the call. In the blue and red simulations, respectively, the call and the put are in the money at the expiration date. (Horizontal axes: time.)

7.15 Intrinsic value and time value

Figure 7.11: Price (for European or American option), intrinsic value, and adjusted intrinsic value of a call option. The intrinsic value is the payoff if one exercises early. Here $E = 100$, $T = .25$, $r = 0.06$, and $\sigma = 0.1$. (Legend: price of European call, intrinsic value, adjusted intrinsic value.)

The intrinsic value of a call is $(S_0 - E)_+$, the payoff one would obtain for immediate exercise of the option (which would be possible only for an American option). The intrinsic value is always less than the price, so immediate exercise is never optimal. The difference between the intrinsic value and the price is called the time value of the option. Time value has two components. The first is a volatility component. The stock price could drop between now and the expiration date; by waiting until the last moment, one can avoid exercising the option when $S_T < E$. The second component is the time value of money.
If you do exercise the option, it is best to wait until time $T$ so that you delay payment of the exercise price. The adjusted intrinsic value is $(S_0 - e^{-rT}E)_+$. The difference between the price and the adjusted intrinsic value is the volatility component of the time value of the option. As $S_0 \to \infty$, the price converges to the adjusted intrinsic value and the volatility component converges to 0. The reason this happens is that as $S_0 \to \infty$ you become sure that the option will be in the money at the expiration date. Figure 7.11 shows the price, intrinsic value, and adjusted intrinsic value of a call option when $S_0 = 100$, $E = 100$, $T = .25$, $r = 0.06$, and $\sigma = 0.1$.

The intrinsic value of a put is $(E - S_0)_+$, which again is the payoff one would obtain for immediate exercise of the option, if that is possible (American option). The intrinsic value is sometimes greater than the price, in which case immediate exercise is optimal. The adjusted intrinsic value is $(e^{-rT}E - S_0)_+$. As $S_0 \to 0$, the likelihood that the option will be in the money at the expiration date increases to 1 and the price converges to the adjusted intrinsic value. Figure 7.12 shows the price, intrinsic value, and adjusted intrinsic value of a put option when $S_0 = 100$, $E = 100$, $T = .25$, $r = 0.06$, and $\sigma = 0.1$.

Figure 7.12: Price (for European option), intrinsic value, and adjusted intrinsic value of a put option. The intrinsic value is the payoff if one exercises early. The price of an American put would be either the price of the European put or the intrinsic value, whichever is larger. Here $S_0 = 100$, $E = 100$, $T = .25$, $r = 0.06$, and $\sigma = 0.1$.

7.16 Black, Scholes, and Merton

This section is based on chapter 11 of Bernstein's (1992) book Capital Ideas.

Fischer Black graduated in 1959 from Harvard with a degree in physics.
In 1964 he received a PhD in applied mathematics from Harvard where he studied operations research, computer design, and artificial intelligence. He never took a course in either finance or economics. Finding his doctoral studies a bit too abstract, he went to work at Arthur D. Little where he became acquainted with the CAPM. He found this subject so fascinating that he moved into finance. At ADL, Black tried to apply the CAPM to the pricing of warrants, which are much like options. Bernstein (1992) quotes Black as recalling

    I applied the Capital Asset Pricing Model to every moment in a warrant's life, for every possible stock price and warrant value .... I stared at the differential equation for many, many months. I made hundreds of silly mistakes that led me down blind alleys. Nothing worked ... [The calculations revealed that] the warrant value did not depend on the stock's expected return, or on any other asset's expected return. That fascinated me. ... Then Myron Scholes and I started working together.

Scholes received a bachelor's degree from McMaster University in Ontario in 1962, earned a doctorate in finance from Chicago, and then took a teaching job at MIT. When Scholes met Black he too was working intensely on warrant pricing by the CAPM. Realizing that they were working on the same problem, they began a collaboration that proved to be very fruitful. Black and Scholes came to understand that the expected return on a stock or option had no effect on what the current price of the option should be. With this insight and building on the CAPM, they arrived at the option equation and derived the formula for the option price.

In 1970, Scholes described his work with Black on options pricing to Robert C. Merton. Merton had studied engineering mathematics at Columbia and then Cal Tech. He developed an interest in economics and planned to study that subject in graduate school.
His lack of formal training in economics put off many graduate schools, but MIT offered him a fellowship where he worked under the direction of Paul Samuelson. Merton developed the "intertemporal capital asset pricing model" that converted the CAPM from a static model describing the market for a single discrete holding period to a model for finance in continuous time. Merton realized that Ito's stochastic calculus was a goldmine for someone working on finance theory in continuous time. In the preface to his book, "Continuous-Time Finance," Merton has written

    The mathematics of the continuous-time model contains some of the most beautiful applications of probability and optimization theory. But, of course, not all that is beautiful in science need also be practical. And surely, not all that is practical in science is beautiful. Here we have both.

Merton developed a much more elegant derivation of the Black-Scholes formula, a derivation based on an arbitrage argument. Black has said "A key part of the options paper I wrote with Myron Scholes was the arbitrage argument for deriving the formula. Bob gave us that argument. It should probably be called the Black-Merton-Scholes paper."

In 1997, Merton shared the Nobel Prize in Economics with Scholes. Sadly, Black had died at a young age and could not share the prize, since the Nobel Prize cannot be awarded posthumously. Merton has been called "the Isaac Newton of modern finance."

7.17 Summary

• An option gives the holder the right but not the obligation to do something, for example, to purchase a certain amount of a certain stock at a fixed price within a certain time frame.
• A call option gives one the right to purchase (call in) a stock. A put gives one the right to sell (put away) a stock.
• European options can be exercised only at their expiration date. American options can be exercised on or before their expiration date.
• Arbitrage is making a guaranteed profit without investing capital.
• Arbitrage pricing means determining the unique price of a financial instrument that guarantees that the market is free of arbitrage opportunities.
• Options can be priced by arbitrage using binomial trees.
• The "measure" of a binomial tree model or other stochastic process model gives the set of path probabilities of that model.
• There exists a risk-neutral measure such that expected prices calculated with respect to this measure are equal to arbitrage-determined prices.
• In a binomial tree model with price changes proportional to the current price, as the number of steps increases the limit process is a geometric Brownian motion and the price of the option in the limit is given by the Black-Scholes formula.
• To price an option by the Black-Scholes formula, one needs an estimate of the stock price's volatility. This can be obtained from historical data. Conversely, the implied volatility of a stock is the volatility which makes the actual market price equal to the price given by the Black-Scholes formula.
• Within the Black-Scholes model, the early exercise of calls is never optimal but the early exercise of puts is sometimes optimal. Therefore, European and American calls have equal prices, but American puts are generally worth more than European puts.
• Put-call parity is the relationship $P = C + e^{-rT}E - S_0$ between $P$, the price of a European put, and $C$, the price of a European call. It is assumed that both have exercise price $E$ and expiration date $T$. $S_0$ is the price of the stock.

7.18 References

Baxter, M., and Rennie, A. (1998), Financial Calculus: An Introduction to Derivative Pricing, Cambridge University Press.
French, K. R., and Roll, R. (1986), "Stock return variances: the arrival of information and the reaction of traders," Journal of Financial Economics, 17, 5-26.
Hull, J. C.
(1995), Introduction to Futures and Options Markets, Prentice Hall, Englewood Cliffs, NJ.
Merton, R. C. (1992), Continuous-Time Finance, revised ed., Blackwell, Cambridge, MA and Oxford, UK.

Chapter 8

GARCH models: 4/24/01

8.1 Introduction

Despite the popularity of ARMA models, they have a significant limitation, namely, that they assume a constant volatility. In finance, where correct specification of volatility is of the utmost importance, this can be a severe limitation. In this chapter we look at time series models that have randomly varying volatility.

ARMA models are used to model the conditional expectation of the current observation, $Y_t$, of a process given the past observations. ARMA models do this by writing $Y_t$ as a linear function of the past plus a white noise term. ARMA models also allow us to predict future observations given the past and present. The prediction of $Y_{t+1}$ given $Y_t, Y_{t-1}, \ldots$ is simply the conditional expectation of $Y_{t+1}$ given $Y_t, Y_{t-1}, \ldots$.

However, ARMA models have rather boring conditional variances: the conditional variance of $Y_t$ given the past is always a constant. What does this mean for, say, modeling stock returns? Suppose we have noticed that recent daily returns have been unusually volatile. We might suppose that tomorrow's return will also be more variable than usual. However, if we are modeling returns as an ARMA process, we cannot capture this type of behavior because the conditional variance is constant. So we need better time series models if we want to model the nonconstant volatility often seen in financial time series.

In this chapter we will study models of nonconstant volatility. ARCH is an acronym meaning AutoRegressive Conditional Heteroscedasticity.[1] In ARCH models the conditional variance has a structure very similar to the structure of the conditional expectation in an AR model.
We will first study the ARCH(1) model, which is similar to an AR(1) model. Then we will look at ARCH(p) models which are analogous to AR(p) models. Finally, we will look at GARCH (Generalized ARCH) models which model conditional variances much like the conditional expectation of an ARMA model.

[1] Heteroscedasticity is a fancy way of saying non-constant variance. Homoscedasticity means constant variance. Alternate spellings are heteroskedasticity and homoskedasticity.

8.2 Modeling conditional means and variances

Before looking at GARCH models, we will study some general principles on how one models non-constant variance.

The general form for the regression of $Y_t$ on $X_{1,t}, \ldots, X_{p,t}$ is
$$Y_t = f(X_{1,t}, \ldots, X_{p,t}) + \epsilon_t \qquad (8.1)$$
where $\epsilon_t$ has expectation equal to 0 and a constant variance $\sigma^2$. The function $f$ is the conditional expectation of $Y_t$ given $X_{1,t}, \ldots, X_{p,t}$. To appreciate this fact, notice that if we take the conditional (given the $X_{i,t}$ values) expectation of (8.1), $f(X_{1,t}, \ldots, X_{p,t})$ is treated as a constant and the conditional expectation of $\epsilon_t$ is 0. Moreover, the conditional variance is simply the variance of $\epsilon_t$, that is, $\sigma^2$. Frequently, $f$ is linear so that
$$f(X_{1,t}, \ldots, X_{p,t}) = \beta_0 + \beta_1 X_{1,t} + \cdots + \beta_p X_{p,t}.$$

Principle: To model the conditional mean of $Y_t$ given $X_{1,t}, \ldots, X_{p,t}$, write $Y_t$ as the conditional mean plus white noise.

Equation (8.1) can be modified to allow a nonconstant conditional variance. Let $\sigma^2(X_{1,t}, \ldots, X_{p,t})$ be the conditional variance of $Y_t$ given $X_{1,t}, \ldots, X_{p,t}$. Then the model
$$Y_t = f(X_{1,t}, \ldots, X_{p,t}) + \sigma(X_{1,t}, \ldots, X_{p,t})\,\epsilon_t, \qquad (8.2)$$
where $\epsilon_t$ now has unit variance, gives the correct conditional mean and variance.

Principle: To allow a nonconstant conditional variance in the model, multiply the white noise term by the conditional standard deviation. This product is added to the conditional mean as in the previous principle.

The function $\sigma(X_{1,t}, \ldots, X_{p,t})$ must be non-negative since it is a standard deviation.
If the function $\sigma(\cdot)$ is linear, then its coefficients must be constrained to ensure non-negativity. Modeling non-constant conditional variances in regression is treated in depth in the book by Carroll and Ruppert (1988). Models for conditional variances are often called "variance function models." The GARCH models of this chapter are a special class of variance function models.

8.3 ARCH(1) processes

Let $\epsilon_1, \epsilon_2, \ldots$ be Gaussian white noise with unit variance, that is, let this process be independent $N(0,1)$. Then
$$E(\epsilon_t \mid \epsilon_{t-1}, \ldots) = 0, \quad \text{and} \quad \mathrm{Var}(\epsilon_t \mid \epsilon_{t-1}, \ldots) = 1. \qquad (8.3)$$
Property (8.3) is called conditional homoscedasticity.

The process $a_t$ is an ARCH(1) process if
$$a_t = \epsilon_t \sqrt{\alpha_0 + \alpha_1 a_{t-1}^2}. \qquad (8.4)$$
We require that $\alpha_0 \ge 0$ and $\alpha_1 \ge 0$ because a standard deviation cannot be negative. It is also required that $\alpha_1 < 1$ in order for $a_t$ to be stationary with a finite variance. If $\alpha_1 = 1$ then $a_t$ is stationary but its variance is $\infty$; see below. Equation (8.4) is somewhat like an AR(1) but in $a_t^2$, not $a_t$, and the ARCH(1) model induces an ACF in $a_t^2$ that is like an AR(1)'s ACF.

Define
$$\sigma_t^2 = \mathrm{Var}(a_t \mid a_{t-1}, \ldots)$$
to be the conditional variance of $a_t$ given past values. Since $\epsilon_t$ is independent of $a_{t-1}$ and $\mathrm{Var}(\epsilon_t) = 1$,
$$E(a_t \mid a_{t-1}, \ldots) = 0, \qquad (8.5)$$
and
$$\sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2. \qquad (8.6)$$

Understanding equation (8.6) is crucial to understanding how GARCH processes work. This equation shows that if $a_{t-1}$ has an unusually large deviation from its expectation of 0, so that $a_{t-1}^2$ is large, then the conditional variance of $a_t$ is larger than usual. Therefore, $a_t$ is also expected to have an unusually large deviation from its mean. This volatility will propagate since $a_t$ having a large deviation makes $\sigma_{t+1}^2$ large so that $a_{t+1}$ will tend to be large. Similarly, if $a_{t-1}$ is unusually small, then $\sigma_t^2$ will be small, and $a_t$ is expected to also be small, etc. Because of this behavior, unusual volatility in $a_t$ tends to persist, though not forever.
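The recursion in (8.4) and (8.6) is easy to simulate, and a simulation makes the persistence of volatility visible. The sketch below is only illustrative: `arch1_path` and `sample_acf1` are hypothetical helper names, the seed is arbitrary, and $\alpha_1 = 0.5$ is an assumed value chosen so that $a_t$ has a finite fourth moment and the sample statistics behave reasonably.

```python
import math
import random


def arch1_path(shocks, alpha0, alpha1, a_start=0.0):
    """Turn unit-variance white noise into an ARCH(1) path.

    Implements a_t = eps_t * sqrt(alpha0 + alpha1 * a_{t-1}^2) and also
    returns the conditional standard deviations sigma_t.
    """
    a_prev, a, sigma = a_start, [], []
    for eps in shocks:
        s = math.sqrt(alpha0 + alpha1 * a_prev ** 2)   # sigma_t from equation (8.6)
        a_prev = eps * s                               # a_t from equation (8.4)
        sigma.append(s)
        a.append(a_prev)
    return a, sigma


def sample_acf1(x):
    """Lag-1 sample autocorrelation."""
    m = sum(x) / len(x)
    num = sum((u - m) * (v - m) for u, v in zip(x[:-1], x[1:]))
    return num / sum((u - m) ** 2 for u in x)


rng = random.Random(0)   # fixed seed, an arbitrary choice
eps = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
# ASSUMPTION: alpha0 = 1 and alpha1 = 0.5 are illustrative values, not taken from the notes.
a, sigma = arch1_path(eps, alpha0=1.0, alpha1=0.5)

print("sample variance of a_t:", sum(v * v for v in a) / len(a))
print("lag-1 ACF of a_t      :", sample_acf1(a))
print("lag-1 ACF of a_t^2    :", sample_acf1([v * v for v in a]))
```

One would expect the lag-1 autocorrelation of $a_t$ to be close to 0 while that of $a_t^2$ is clearly positive, which is the signature of a process that is uncorrelated but not independent.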
The conditional variance tends to revert to the unconditional variance provided that $\alpha_1 < 1$ so that the process is stationary with a finite variance.

The unconditional, i.e., marginal, variance of $a_t$, denoted by $\gamma_a(0)$, is obtained by taking expectations in (8.6), which gives us
$$\gamma_a(0) = \alpha_0 + \alpha_1 \gamma_a(0).$$
This equation has a positive solution if $\alpha_1 < 1$:
$$\gamma_a(0) = \alpha_0/(1 - \alpha_1).$$
If $\alpha_1 \ge 1$ then $\gamma_a(0)$ is infinite. It turns out that $a_t$ is stationary nonetheless. The integrated GARCH model (I-GARCH) has $\alpha_1 = 1$ and is discussed in Section 8.10.

Straightforward calculations using (8.5) show that the ACF of $a_t$ is
$$\rho_a(h) = 0 \quad \text{if } h \ne 0.$$
In fact, any process such that the conditional expectation of the present observation given the past is constant is an uncorrelated process. In introductory statistics courses, it is often mentioned that independence implies zero correlation but not vice versa. A process, such as the GARCH processes, where the conditional mean is constant but the conditional variance is non-constant is a good example of a process that is uncorrelated but not independent. The dependence of the conditional variance on the past is the reason the process is not independent. The fact that the conditional mean does not depend on the past is the reason that the process is uncorrelated.

Although $a_t$ is uncorrelated just like the white noise process $\epsilon_t$, the process $a_t^2$ has a more interesting ACF: if $\alpha_1 < 1$ then
$$\rho_{a^2}(h) = \alpha_1^{|h|}, \quad \forall\, h.$$
If $\alpha_1 \ge 1$, then $a_t^2$ is nonstationary, so of course it does not have an ACF.

8.3.1 Example

A simulated ARCH(1) process is shown in Figure 8.1. The top left panel shows the independent white noise process, $\epsilon_t$. The top right panel shows $\sigma_t = \sqrt{1 + .95\,a_{t-1}^2}$, the conditional standard deviation process. The bottom left panel shows $a_t = \sigma_t \epsilon_t$, the ARCH(1) process. As discussed in the next section, an ARCH(1) process can be used as the noise term of an AR(1) process. This is shown in the bottom right panel. The AR(1) parameters are $\mu = .1$ and $\phi = .8$.
The variance of $a_t$ is $\gamma_a(0) = 1/(1 - .95) = 20$ so the standard deviation is $\sqrt{20} = 4.47$.

Figure 8.1: Simulation of 60 observations from an ARCH(1) process and an AR(1)/ARCH(1) process. The parameters are $\alpha_0 = 1$, $\alpha_1 = .95$, $\mu = .1$, and $\phi = .8$. (Panels: White noise; Conditional std dev; ARCH(1); AR(1)/ARCH(1).)

The processes were all started at 0 and simulated for 70 observations. The first 10 observations were treated as a burn-in period where the process was converging to its stationary distribution. In the figure, only the last 60 observations are plotted. The white noise process in the top left panel is normally distributed and has a standard deviation of 1, so it will be less than 2 in absolute value about 95% of the time. Notice that just before $t = 10$, the process is a little less than $-2$, which is a somewhat large deviation from the mean of 0. This deviation causes the conditional standard deviation ($\sigma_t$) shown in the top right panel to increase and this increase persists for about 10 observations though it slowly decays. The result is that the ARCH(1) process exhibits more volatility than usual when $t$ is between 10 and 15.

Figure 8.2 shows a simulation of 600 observations from the same processes as in Figure 8.1. A normal probability plot of $a_t$ is also included. Notice that this ARCH(1) process exhibits extreme non-normality. This is typical of ARCH processes. Conditionally they are normal with a nonconstant variance, but their marginal distribution is non-normal with a constant variance.

8.4 The AR(1)/ARCH(1) model

As we have seen, an AR(1) has a nonconstant conditional mean but a constant conditional variance, while an ARCH(1) process is just the opposite. If we think that both the conditional mean and variance of a process will depend on the past then we need the features of both the AR and ARCH models. Thus, we will combine the two models.
In this section we start simple and combine an AR(1) model with an ARCH(1) model.

Figure 8.2: Simulation of 600 observations from an ARCH(1) process and an AR(1)/ARCH(1) process. The parameters are $\alpha_0 = 1$, $\alpha_1 = .95$, $\mu = .1$, and $\phi = .8$. (Panels: White noise; Conditional std dev; AR(1)/ARCH(1); normal plot of ARCH(1).)

Let $a_t$ be an ARCH(1) process and suppose that
$$u_t - \mu = \phi(u_{t-1} - \mu) + a_t.$$
The process $u_t$ looks like an AR(1) process, except that the noise term is not independent white noise but rather an ARCH(1) process.

Although $a_t$ is not independent white noise, we saw in the last section that it is an uncorrelated process; $a_t$ has the same ACF as independent white noise. Therefore, $u_t$ has the same ACF as an AR(1) process:
$$\rho_u(h) = \phi^{|h|} \quad \forall\, h.$$
Moreover, $a_t^2$ has the ARCH(1) ACF:
$$\rho_{a^2}(h) = \alpha_1^{|h|} \quad \forall\, h.$$
We need to assume that both $|\phi| < 1$ and $\alpha_1 < 1$ in order for $u_t$ to be stationary with a finite variance. Of course, $\alpha_0 \ge 0$ and $\alpha_1 \ge 0$ are also assumed.

The process $u_t$ is such that its conditional mean and variance, given the past, are both nonconstant so a wide variety of real time series can be modeled.

Example

A simulation of an AR(1)/ARCH(1) process is shown in the bottom right panel of Figure 8.1. Notice that when the ARCH(1) noise term in the bottom left panel is more volatile, then the AR(1)/ARCH(1) process moves more rapidly.

8.5 ARCH(q) models

[...]

8.7 Heavy-tailed distributions

Researchers have long noticed that stock returns have "heavy-tailed" or "outlier-prone" probability distributions. This means that they have more extreme outliers than expected from a normal distribution. The reason for the outliers may be that the conditional variance is not constant. In fact, GARCH processes exhibit heavy tails.
Therefore, when we use GARCH models in finance we can model both the conditional heteroscedasticity and the heavy-tailed distributions of financial market data.

To understand how a non-constant variance induces outliers, we look at a simple case. Consider a distribution which is 90% $N(0, 1)$ and 10% $N(0, 25)$. This is an example of a "normal mixture" distribution. The variance of this distribution is $(.9)(1) + (.1)(25) = 3.4$ so its standard deviation is 1.844. This distribution is much different than a $N(0, 3.4)$ distribution, even though both distributions have the same mean (0) and variance (3.4). To appreciate this, look at Figure 8.4.

Figure 8.4: Comparison of normal and heavy-tailed distributions. (Panels: Densities; Densities - detail; Normal plot - normal; Normal plot - normal mix. Legend: normal, normal mix.)

You can see in the top left panel that the two densities look quite different. The normal density looks much more dispersed than the normal mixture, but we know that they actually have the same variances. What's happening? Look at the detail of the right tails in the top right panel. The normal mixture density is much higher than the normal density when $x$ (the variable on the horizontal axis) is greater than 6. This is the "outlier" region (along with $x < -6$). The normal mixture has more outliers and they come from the 10% of the population with a variance of 25. Outliers have a powerful effect on the variance and this small fraction of outliers inflates the variance from 1.0 (the variance of 90% of the population) to 3.4.

Let's see how much more probability the normal mixture distribution has in the outlier range $|x| > 6$ compared to the normal distribution.[3] For a $N(0, \sigma^2)$ random variable $X$,
$$P\{|X| > x\} = 2(1 - \Phi(x/\sigma)).$$
Therefore, for the normal distribution with variance 3.4,
$$P\{|X| > 6\} = 2(1 - \Phi(6/\sqrt{3.4})) = .0011.$$
For the normal mixture population which has variance 1 with probability .9 and variance 25 with probability .1 we have that
$$P\{|X| > 6\} = 2\{.9(1 - \Phi(6)) + .1(1 - \Phi(6/5))\} = (.9)(0) + (.1)(.23) = .023.$$
Since $.023/.0011 \approx 21$, the normal mixture distribution is 21 times more likely to be in this outlier range than the normal distribution.

[3] There is nothing special about "6" to define the boundary of the outlier range. I just needed a specific number to make numerical comparisons. Clearly, $|x| > 7$ or $|x| > 8$, say, would have been just as appropriate as outlier ranges.

Normal probability plots of samples of size 200 from the normal and the normal mixture distributions are shown in the bottom panels. Notice how the outliers in the normal mixture sample give the data a nonlinear, almost S-shaped, pattern. The deviation of the normal sample from linearity is small and is due entirely to randomness.

In this example, the variance is conditional upon which component of the mixture an observation comes from. The conditional variance is 1 with probability .9 and 25 with probability .1. Because the conditional variance is discrete, in fact, with only two possible values, the example was easy to analyze. The marginal distribution of a GARCH process is also a normal mixture, but with a continuous distribution of components corresponding to the continuous distribution of the conditional variance.
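The tail probabilities in this example can be reproduced with a few lines of code. The sketch below uses only the formula $P\{|X| > x\} = 2(1 - \Phi(x/\sigma))$ from the text; the function names are hypothetical, and the upper tail is computed with the complementary error function to avoid rounding problems far out in the tail.

```python
import math


def upper_tail(z):
    """1 - Phi(z) for the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def two_sided_tail(c, sd):
    """P(|X| > c) for X ~ N(0, sd^2)."""
    return 2.0 * upper_tail(c / sd)


# Mixture from the text: variance 1 with probability .9, variance 25 with probability .1.
weights, sds = [0.9, 0.1], [1.0, 5.0]
mix_var = sum(w * s ** 2 for w, s in zip(weights, sds))   # (.9)(1) + (.1)(25)

p_normal = two_sided_tail(6.0, math.sqrt(mix_var))        # normal with the same variance
p_mix = sum(w * two_sided_tail(6.0, s) for w, s in zip(weights, sds))

print(f"variance {mix_var:.1f}, normal tail {p_normal:.4f}, mixture tail {p_mix:.4f}")
print(f"ratio {p_mix / p_normal:.1f}")
```

Without intermediate rounding the ratio comes out at roughly 20, in line with the rounded figure quoted above. Changing the cutoff from 6 to 7 or 8 in the calls to `two_sided_tail` shows how quickly the ratio grows further out in the tails.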
Although GARCH processes are more complex than the simple model in this section, the same theme applies: conditional heteroscedasticity induces heavy-tailed marginal distributions even though the conditional distributions are light-tailed normal distributions.

8.8 Comparison of ARMA and GARCH processes

Table 8.8 compares Gaussian white noise, ARMA, GARCH, and ARMA/GARCH processes according to various properties: conditional means, conditional variances, conditional distributions, marginal means, marginal variances, and marginal distributions.

    Property            Gaussian WN   ARMA        GARCH          ARMA/GARCH
    Cond. mean          constant      non-const   0              non-const
    Cond. var           constant      constant    non-const      non-const
    Cond. dist'n        normal        normal      normal         normal
    Marg. mean & var.   constant      constant    constant       constant
    Marg. dist'n        normal        normal      heavy-tailed   heavy-tailed

All of the processes are stationary so that their marginal means and variances are constant. Gaussian white noise is the "baseline" process. Because it is an independent process the conditional distributions are the same as the marginal distribution. Thus, its conditional means and variances are constant and both its conditional and marginal distributions are normal. Gaussian white noise is the "driver" or "source of randomness" behind all the other processes. Therefore, they all have normal conditional distributions just like Gaussian white noise.

8.9 Fitting GARCH models

A time series was simulated using the same program that generated the data in Figure 8.1, the only difference being that 300 observations were generated rather than only 60 as in the figure. The data were saved as "garch02.dat" and analyzed with SAS using the following program.

Listing of the SAS program for the simulated data

    options linesize = 65 ;
    data arch ;
    infile 'C:\courses\or473\sas\garch02.dat' ;
    input y ;
    run ;
    title 'Simulated ARCH(1)/AR(1) data' ;
    proc autoreg ;
    model y =/nlag = 1 archtest garch=(q=1);
    run ;

This program uses the "autoreg" command that fits AR models. Since nlag = 1, an AR(1) model is being fit. However, the noise is not modeled as independent white noise. Rather an ARCH(1) model is used because of the specification "garch=(q=1)" in the "model" statement below the "autoreg" command. More complex GARCH models can be fit using, for example, "garch=(p=2,q=1)." The specification "archtest" requests tests of ARCH effects, that is, tests of the null hypothesis of conditional homoscedasticity versus the alternative of conditional heteroscedasticity.

The output from this SAS program is listed below. The tests of conditional homoscedasticity all reject with p-values of .0001 or smaller. The estimate of the AR parameter is $\hat\phi = -.8226$ in SAS's notation, which is $+.8226$ in our notation. This is close to the true value of 0.8. The estimates of the ARCH parameters are $\hat\alpha_0 = 1.12$ and $\hat\alpha_1 = .70$. The true values are $\alpha_0 = 1$ and $\alpha_1 = .95$. The standard errors of the ARCH parameters are rather large. This is a general phenomenon; time series usually have less information about variance parameters than about the parameters specifying the conditional expectation. An approximate 95% confidence interval for $\alpha_1$ is

$$.70 \pm (2)(0.117) = (.466, .934),$$

which does not quite include the true parameter, 0.95. This could have just been bad luck, though it may indicate that $\hat\alpha_1$ is downward biased. The confidence interval is based on the assumption of unbiasedness and is not valid if there is a sizeable bias.

Listing of the SAS output for the simulated data

    Simulated ARCH(1)/AR(1) data                                  1
                                    13:01 Wednesday, April 4, 2001

    The AUTOREG Procedure
    Dependent Variable    y

    Ordinary Least Squares Estimates
    SSE                2693.22931    DFE
    MSE                   9.00746    Root MSE          3
    SBC                1515.48103    AIC               1511
    Regress R-Square       0.0000    Total R-Square
    Durbin-Watson          0.4373

    Q and LM Tests for ARCH Disturbances
    Order         Q    Pr > Q          LM    Pr > LM
        1  119.7578    <.0001    118.6797     <.0001
        2  137.9967    <.0001    129.8491     <.0001
        3  140.5454    <.0001    131.4911     <.0001
        4  140.6837    <.0001    132.1098     <.0001
        5  140.6925    <.0001    132.3810     <.0001
        6  140.7476    <.0001    132.7534     <.0001
        7  141.0173    <.0001    132.7543     <.0001
        8  141.5401    <.0001    132.8874     <.0001
        9  142.1243    <.0001    132.8879     <.0001
       10  142.6266    <.0001    132.9226     <.0001
       11  142.7506    <.0001    133.0153     <.0001
       12  142.7508    <.0001    133.0155     <.0001

                                Standard             Approx
    Variable    DF   Estimate      Error   t Value   Pr
    Intercept    1     0.8910     0.1733      5.14

    Estimates of Autocorrelations
    Lag   Covariance   Correlation
      0       8.9774      1.000000
      1       7.0075      0.780567

    Estimates of Autocorrelations
    Lag   -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
      1   |                    |****************    |

    Simulated ARCH(1)/AR(1) data
                                    13:01 Wednesday, April 4, 2001

    The AUTOREG Procedure
    Preliminary MSE    3.5076

    Estimates of Autoregressive Parameters
                         Standard
    Lag   Coefficient       Error   t Value
      1     -0.780567    0.036209    -21.56

    Algorithm converged.

    GARCH Estimates
    SSE               1056.42037    Observations             300
    MSE                  3.52140    Uncond Var        3.72785257
    Log Likelihood    -549.43844    Total R-Square        0.6077
    SBC               1121.69201    AIC               1106.87688
    Normality Test        1.5134    Pr > ChiSq            0.4692

                                Standard             Approx
    Variable    DF   Estimate      Error   t Value   Pr > |t|
    Intercept    1     0.4810     0.3910      1.23     0.2187
    AR1          1    -0.8226     0.0266    -30.92     <.0001
    ARCH0        1     1.1241     0.1729      6.50     <.0001
    ARCH1        1     0.6985     0.1167      5.98     <.0001

8.9.1 Example: S&P 500 returns

This example is Example 10.5 in Pindyck and Rubinfeld (1998). The data are monthly from 1960 to 1996.
The variables are the S&P 500 index (FSPCOM), the return on the S&P 500 (RETURNSP), the dividend yield on the S&P 500 index (FSDXP), the 3-month T-bill rate (R3), the change in the 3-month T-bill rate (DR3), the wholesale price index (PW), and the rate of wholesale price inflation (GPW). In this analysis, only RETURNSP, DR3, and GPW are used.

It is expected that variation in stock returns is in part caused by changes in interest rates and changes in the rate of inflation. Therefore, a regression model where RETURNSP is regressed on DR3 and GPW is used. Regression models that regress returns on macroeconomic variables in this way are sometimes called "factor models"; see Bodie, Kane, and Marcus (1999).

Figure 8.5 shows the residuals from this regression. The residuals represent the part of the S&P 500 returns that cannot be explained by changes in interest rates and the inflation rate. In the figure, there is some sign of nonconstant volatility. Also, there is no reason to assume that the residuals are uncorrelated as is assumed in a standard regression model. At the very least, this assumption should be checked. If the data contradict the assumption, then a model with correlated errors should be used.

[Figure 8.5: Residuals when the S&P 500 returns are regressed against the change in the 3-month T-bill rates and the rate of inflation. Horizontal axis: year.]

An analysis more appropriate for this data set is to use a regression model to specify the conditional expectation of RETURNSP given DR3 and GPW, but not to assume a "standard" regression model with errors (which the residuals estimate) that are independent white noise.
Rather we will assume the model

$$\text{RETURNSP} = \gamma_0 + \gamma_1 \text{DR3} + \gamma_2 \text{GPW} + u_t \qquad (8.7)$$

where $u_t$ is an AR(1)/GARCH(1,1) process.[4] Therefore,

$$u_t = \phi_1 u_{t-1} + a_t,$$

where $a_t$ is a GARCH(1,1) process:

$$a_t = \epsilon_t \sigma_t$$

where

$$\sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + \beta_1 \sigma_{t-1}^2.$$

[4] We denote the regression coefficients by gamma rather than beta, as is standard, because beta is used for parameters in the GARCH model for $a_t$.

Below is a listing of the SAS program used to fit this model. The regression model with AR(1)/GARCH(1,1) errors is specified by the command:

    proc autoreg ;
    model returnsp = DR3 gpw/nlag = 1 archtest garch=(p=1,q=1);

In this command,

• the statement "returnsp = DR3 gpw" specifies the regression model, that is, that "returnsp" is the dependent variable and "DR3" and "gpw" are the independent variables.
• "nlag = 1" specifies the AR(1) structure.
• "garch=(p=1,q=1)" specifies the GARCH(1,1) structure.
• "archtest" specifies that tests of conditional heteroscedasticity be performed.

Listing of the SAS program

    options linesize = 65 ;
    data arch ;
    infile 'C:\courses\or473\data\pindyckl05.dat' ;
    input month year RETURNSP FSPCOM FSDXP R3 PW GPW;
    DR3 = dif(R3) ;
    run ;
    title 'S&P 500 monthly data from Pindyck & Rubinfeld, Ex 10.5' ;
    title2 'AR(1)/GARCH(1,1) model' ;
    proc autoreg ;
    model returnsp =/nlag = 1 archtest garch=(p=1,q=1);
    run ;
    title2 'Regression model with AR(1)/GARCH(1,1)' ;
    proc autoreg ;
    model returnsp = DR3 gpw/nlag = 1 archtest garch=(p=1,q=1);
    run ;

The SAS output is listed below. From examination of the output, the following conclusions can be reached:

• The p-values of the Q and LM tests are all very small, less than .0001. Therefore, the errors in the regression model exhibit conditional heteroscedasticity.

• Ordinary least squares estimates of the regression parameters are:

                                Standard             Approx
    Variable    DF   Estimate      Error   t Value   Pr > |t|
    Intercept    1     0.0120   0.001755      6.86     <.0001
    DR3          1    -0.8293     0.3061     -2.71     0.0070
    GPW          1    -0.8550     0.2349     -3.64     0.0003

• Using residuals from the OLS estimates, the estimated residual autocorrelations are:

    Estimates of Autocorrelations
    Lag   Covariance   Correlation
      0      0.00108      1.000000
      1     0.000253      0.234934

• Also, using OLS residuals, the estimated AR parameter is:

    Estimates of Autoregressive Parameters
                         Standard
    Lag   Coefficient       Error   t Value
      1     -0.234934    0.046929     -5.01

• Assuming AR(1)/GARCH(1,1) errors, the estimated parameters of the regression are:

                                Standard             Approx
    Variable    DF   Estimate      Error   t Value   Pr > |t|
    Intercept    1     0.0125   0.001875      6.66     <.0001
    DR3          1    -1.0665     0.3282     -3.25     0.0012
    GPW          1    -0.7239     0.1992     -3.63     0.0003

  - Notice that these differ slightly from OLS estimates.
  - Since all p-values are small, both independent variables are significant.
  - However, the Total R-square value is only 0.0551, so the regression has little predictive value.

• The estimated GARCH parameters are:

    AR1          1    -0.2016     0.0603     -3.34     0.0008
    ARCH0        1   0.000147  0.0000688      2.14     0.0320
    ARCH1        1     0.1337     0.0404      3.31     0.0009
    GARCH1       1     0.7254     0.0918      7.91     <.0001

  - The estimate of $\phi_1$ is $-.2016$ in SAS's notation but $+.2016$ in our notation. Thus, there is a positive association between returns and lagged returns.
  - Since all p-values are small, all GARCH parameters are significant.
  - The GARCH1 estimate (0.7254) is larger than the ARCH1 estimate (0.1337); this implies that the conditional variance will exhibit reasonably long persistence of volatility.

Listing of SAS output

    S&P 500 monthly data from Pindyck & Rubinfeld, Ex 10.5        1
    Regression model with AR(1)/GARCH(1,1)
                                    17:04 Tuesday, April 10, 2001

    The AUTOREG Procedure
    Dependent Variable    RETURNSP

    Ordinary Least Squares Estimates
    SSE                0.46677572    DFE                      430
    MSE                   0.00109    Root MSE             0.03295
    SBC                -1711.5219    AIC               -1723.7341
    Regress R-Square       0.0551    Total R-Square        0.0551
    Durbin-Watson          1.5203

    Q and LM Tests for ARCH Disturbances
    Order         Q    Pr > Q         LM    Pr > LM
        1   26.8804    <.0001    26.5159     <.0001
        2   27.1508    <.0001    27.1519     <.0001
        3   28.2188    <.0001    28.4391     <.0001
        4   28.6957    <.0001    28.4660     <.0001
        5   33.4112    <.0001    32.6168     <.0001
        6   34.0892    <.0001    32.6962     <.0001
        7   34.4187    <.0001    32.9617     <.0001
        8   34.6542    <.0001    32.9636     <.0001
        9   35.2228    <.0001    33.3330     0.0001
       10   35.3047    0.0001    33.4174     0.0002
       11   35.8274    0.0002    33.9440     0.0004
       12   36.0142    0.0003    33.9507     0.0007

                                Standard             Approx
    Variable    DF   Estimate      Error   t Value   Pr > |t|
    Intercept    1     0.0120   0.001755      6.86     <.0001
    DR3          1    -0.8293     0.3061     -2.71     0.0070
    GPW          1    -0.8550     0.2349     -3.64     0.0003

    Estimates of Autocorrelations
    Lag   Covariance   Correlation
      0      0.00108      1.000000
      1     0.000253      0.234934

    S&P 500 monthly data from Pindyck & Rubinfeld, Ex 10.5        2
    Regression model with AR(1)/GARCH(1,1)
                                    17:04 Tuesday, April 10, 2001

    The AUTOREG Procedure

    Estimates of Autocorrelations
    Lag   -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1
      1   |                    |*****               |

    Preliminary MSE    0.00102

    Estimates of Autoregressive Parameters
                         Standard
    Lag   Coefficient       Error   t Value
      1     -0.234934    0.046929     -5.01

    Algorithm converged.

    GARCH Estimates
    SSE               0.44176656    Observations             433
    MSE                  0.00102    Uncond Var        0.00104656
    Log Likelihood    889.071523    Total R-Square        0.1058
    SBC               -1735.6479    AIC                -1764.143
    Normality Test       43.0751    Pr > ChiSq            <.0001

                                Standard             Approx
    Variable    DF   Estimate      Error   t Value   Pr > |t|
    Intercept    1     0.0125   0.001875      6.66     <.0001
    DR3          1    -1.0665     0.3282     -3.25     0.0012
    GPW          1    -0.7239     0.1992     -3.63     0.0003
    AR1          1    -0.2016     0.0603     -3.34     0.0008
    ARCH0        1   0.000147  0.0000688      2.14     0.0320
    ARCH1        1     0.1337     0.0404      3.31     0.0009
    GARCH1       1     0.7254     0.0918      7.91     <.0001

8.10 I-GARCH models

I-GARCH or integrated GARCH processes were designed to model data that has persistent changes in volatility. A GARCH(p, q) process is stationary with a finite variance if

$$\sum_{i=1}^{q} \alpha_i + \sum_{i=1}^{p} \beta_i < 1.$$

A GARCH(p, q) process is called an I-GARCH process if

$$\sum_{i=1}^{q} \alpha_i + \sum_{i=1}^{p} \beta_i = 1.$$

I-GARCH processes are either non-stationary or have an infinite variance.
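The finite-variance condition is easy to check for a fitted model. The sketch below is in Python rather than SAS, and the helper names `persistence` and `unconditional_variance` are invented for this illustration. It assumes the standard result that, when the coefficient sum is below 1, the marginal variance of a GARCH process is $\alpha_0/(1 - \sum\alpha_i - \sum\beta_i)$, and it plugs in the rounded estimates from the two SAS outputs shown earlier.

```python
import math

def persistence(alphas, betas):
    """Sum of the ARCH (alpha) and GARCH (beta) coefficients."""
    return sum(alphas) + sum(betas)

def unconditional_variance(alpha0, alphas, betas):
    """alpha0 / (1 - persistence) when persistence < 1.

    Returns math.inf when the sum is 1 or more, i.e. when no finite
    stationary variance exists (the I-GARCH boundary and beyond).
    """
    s = persistence(alphas, betas)
    if s >= 1.0:
        return math.inf
    return alpha0 / (1.0 - s)

# Rounded estimates from the S&P 500 AR(1)/GARCH(1,1) fit:
# ARCH0 = 0.000147, ARCH1 = 0.1337, GARCH1 = 0.7254
sp_persistence = persistence([0.1337], [0.7254])
sp_var = unconditional_variance(0.000147, [0.1337], [0.7254])

# Rounded estimates from the simulated ARCH(1) fit:
# ARCH0 = 1.1241, ARCH1 = 0.6985 (no GARCH terms)
sim_var = unconditional_variance(1.1241, [0.6985], [])

print(f"S&P 500 persistence  : {sp_persistence:.4f}")
print(f"S&P 500 marginal var : {sp_var:.6f}")
print(f"simulated data var   : {sim_var:.4f}")
```

Both values agree with the "Uncond Var" entries in the SAS output up to the rounding of the printed estimates. A coefficient sum of about 0.86 is well inside the finite-variance region, which fits with the later finding that the I-GARCH(1,1) model is not preferred for these data.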
Infinite variance implies heavy-tailed, though a distribution can be heavy-tailed with a finite variance.

To appreciate what an infinite variance process can look like, we will do some simulation. Figure 8.6 shows 40,000 observations of ARCH(1) processes with $\alpha_1 = .9$, 1, and 1.8. The same white noise process is used in each of the ARCH(1) processes. All three ARCH(1) processes are stationary but only the one with $\alpha_1 = .9$ has a finite variance. The second process is an I-GARCH process (actually, I-ARCH since q = 0). The third process has $\alpha_1 > 1$ and so is more extreme than an I-GARCH process. Notice how all three processes do revert to their conditional mean of 0. The larger the value of $\alpha_1$, the more the volatility comes in sharp bursts. The processes with $\alpha_1 = .9$ and $\alpha_1 = 1$ look similar; there is no sudden change in behavior when the variance becomes infinite. The process with $\alpha_1 = .9$ already has heavy tails despite having a finite variance. Increasing $\alpha_1$ from 0.9 to 1 does not increase the tail weight dramatically.

Normal plots of the simulated data in Figure 8.6 are shown in Figure 8.7. Clearly, the larger the value of $\alpha_1$, the heavier the tails of the marginal distribution.

[Figure 8.6: Simulated ARCH(1) processes with $\alpha_1 = .9$, 1, and 1.8.]

[Figure 8.7: Normal plots of the simulated ARCH(1) processes in Figure 8.6.]

None of the processes in Figure 8.6 show much persistence of higher volatility. To model persistence of higher volatility, one needs an I-GARCH(p, q) process with $q \ge 1$. Figure 8.8 shows simulations from I-GARCH(1,1) processes. Since $\alpha_1 + \beta_1 = 1$ for these processes, $\beta_1 = 1 - \alpha_1$, and the process is completely specified by $\alpha_0$ and $\alpha_1$. In this figure, $\alpha_0$ is fixed at 1 and $\alpha_1$ is varied.
Notice that the conditional variance is very bursty when $\alpha_1 = .95$. When $\alpha_1 = .05$, the conditional standard deviation looks somewhat like a random walk.

[Figure 8.8: Simulations of I-GARCH(1,1) processes. Panels show the conditional standard deviation and the simulated GARCH(1,1) process for $\alpha_1 = 0.95$, 0.4, 0.2, and 0.05.]

I-GARCH processes can be fit by SAS by adding the specification "type = integrated" into the program, e.g., for the previous example with S&P 500 returns:

    proc autoreg ;
    model returnsp =/nlag = 1 garch=(p=1,q=1,type=integrated) ;
    run ;

For this example, the I-GARCH(1,1) model seems to fit worse than a GARCH(1,1) model according to AIC; see Section 8.13.

8.10.1 What does it mean to have an infinite variance?

A random variable need not have a finite variance. Also, its expectation need not exist at all. To appreciate these facts, let $X$ be a random variable with density $f_X$. The expectation of $X$ is

$$\int_{-\infty}^{\infty} x f_X(x)\,dx$$

provided that this integral is defined. If

$$\int_{-\infty}^{0} x f_X(x)\,dx = -\infty \qquad (8.8)$$

and

$$\int_{0}^{\infty} x f_X(x)\,dx = \infty \qquad (8.9)$$

then the expectation is, formally, $-\infty + \infty$, which is not defined. If the integrals on the left hand sides of (8.8) and (8.9) are both finite, then $E(X)$ exists and equals the sum of these two integrals.

Exercise. Suppose that $f_X(x) = 1/4$ if $|x| < 1$ and $f_X(x) = 1/(4x^2)$ if $|x| \ge 1$. Show that

$$\int_{-\infty}^{\infty} f_X(x)\,dx = 1,$$

so that $f_X$ really is a density, but that

$$\int_{-\infty}^{0} x f_X(x)\,dx = -\infty$$

and

$$\int_{0}^{\infty} x f_X(x)\,dx = \infty.$$

One consequence of the expectation not existing is this.
Suppose we have a sample of iid random variables with density $f_X$. The law of large numbers says that the sample mean will converge to $E(X)$ as the sample size goes to infinity. However, the law of large numbers holds only if $E(X)$ is defined. Otherwise, there is no point to which the sample mean can converge and it will just wander without converging.

Figure 8.9 shows the sample mean of the first $t$ observations plotted against $t$ for the data in Figure 8.6. The sample mean appears to converge to 0 when $\alpha_1 = .9$ or 1, but when $\alpha_1 = 1.8$ it is unclear what the sample mean is doing. The sample mean decays towards 0 when the process is not in a high volatility period, but can shoot up or down during a burst of volatility.

[Figure 8.9: Sample means of simulated ARCH(1) processes with $\alpha_1 = .9$, 1, and 1.8.]

Now suppose that the expectation of $X$ exists and equals $\mu_X$. Then the variance of $X$ equals

$$\int_{-\infty}^{\infty} (x - \mu_X)^2 f_X(x)\,dx.$$

If this integral is $+\infty$, then the variance is infinite. The law of large numbers also implies that the sample variance will converge to the variance of $X$ as the sample size increases. If the variance of $X$ is infinity, then the sample variance will converge to infinity.

Figure 8.10 shows the sample variance of the first $t$ observations plotted against $t$ for the data in Figure 8.6. In the top panel, the sample variance should be converging to $10 = (1 - \alpha_1)^{-1}$. Maybe it is converging to 10, but it is hard to tell even with 40,000 observations. In the middle and bottom panels the variance is infinity so the sample variance will converge to infinity. This convergence does appear to be happening in the bottom panel, but it is hard to see in the middle panel. Of course, in the middle panel the value of $\alpha_1$ is on the borderline between finite and infinite variance, and the infinite variance may take a very long time to have its effect.
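The running sample means and variances discussed in this section can be reproduced in spirit with a short simulation. This is only a sketch in Python, not the program that produced the figures: the seed, the starting value $a_0 = 0$, and the function names are assumptions made for the illustration, and $\alpha_0 = 1$ is inferred from the stated limit $(1 - \alpha_1)^{-1}$. As in the text, the same white noise drives all three ARCH(1) processes.

```python
import math
import random

def simulate_arch1(alpha1, noise, alpha0=1.0):
    """ARCH(1): a_t = eps_t * sqrt(alpha0 + alpha1 * a_{t-1}^2), with a_0 = 0."""
    a_prev = 0.0
    path = []
    for eps in noise:
        a_t = eps * math.sqrt(alpha0 + alpha1 * a_prev ** 2)
        path.append(a_t)
        a_prev = a_t
    return path

def running_mean_and_variance(xs):
    """Sample mean and variance of the first t observations, for each t."""
    means, variances = [], []
    mean, m2 = 0.0, 0.0
    for n, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / n           # update the running mean
        m2 += delta * (x - mean)    # update the running sum of squared deviations
        means.append(mean)
        variances.append(m2 / (n - 1) if n > 1 else 0.0)
    return means, variances

rng = random.Random(473)  # arbitrary seed, chosen only for reproducibility
noise = [rng.gauss(0.0, 1.0) for _ in range(40000)]

results = {}
for alpha1 in (0.9, 1.0, 1.8):
    path = simulate_arch1(alpha1, noise)
    means, variances = running_mean_and_variance(path)
    results[alpha1] = (means[-1], variances[-1])
    print(f"alpha1 = {alpha1}: final sample mean = {means[-1]:.3f}, "
          f"final sample variance = {variances[-1]:.1f}")
```

Plotting `means` and `variances` against $t$ gives pictures of the same kind as Figures 8.9 and 8.10. The exact numbers depend on the seed, so no particular output should be expected; what matters is how erratic the running variance becomes as $\alpha_1$ grows.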
8.11 GARCH-M processes

We have seen that one can fit regression models with AR/GARCH errors. In fact, we have done that with the S&P 500 data. In some examples, it makes sense to use the conditional standard deviation as one of the regression variables. For example, when the dependent variable is a return we might expect that higher conditional variability will cause higher returns. This is because the market demands a higher risk premium for higher risk.

[Figure 8.10: Sample variances of simulated ARCH(1) processes with $\alpha_1 = .9$, 1, and 1.8.]

Models where the conditional standard deviation is a regression variable are called GARCH-in-mean, or GARCH-M, models. They have the form

$$Y_t = X_t^{T}\gamma + \delta\sigma_t + a_t,$$

where $a_t$ is a GARCH process with conditional standard deviation $\sigma_t$. GARCH-M models can be fit in SAS by adding the keyword "mean" to the GARCH specification, e.g.,

    proc autoreg ;
    model returnsp =/nlag = 1 garch=(p=1,q=1,mean);
    run ;

or for I-GARCH-M

    proc autoreg ;
    model returnsp =/nlag = 1 garch=(p=1,q=1,mean,type=integrated);
    run ;

For the S&P 500 returns data, a GARCH(1,1)-M model was fit in SAS. The estimate of $\delta$ was .5150 with a standard error of .3695. This gives a t-value of 1.39 and a p-value of .1633. Since the p-value is reasonably large we could accept the null hypothesis that $\delta = 0$. Therefore, we see no strong evidence that there are higher returns during times of higher volatility. The volatility of the S&P 500 is market risk so this finding is a bit surprising. It may be that the effect is small ($\hat\delta$ is positive, after all) and cannot be detected with certainty. The AIC criterion does select the GARCH-M model; see Section 8.13.

8.12 E-GARCH

The exponential GARCH, or E-GARCH, model is

$$\log(\sigma_t) = \alpha_0 + \sum_{i=1}^{q} \alpha_i\, g(\epsilon_{t-i}) + \sum_{i=1}^{p} \beta_i \log(\sigma_{t-i}),$$

where

$$g(\epsilon_t) = \theta\epsilon_t + \gamma\{|\epsilon_t| - E(|\epsilon_t|)\}$$

and $\epsilon_t = a_t/\sigma_t$. Since $\log(\sigma_t)$ can be negative, there are no constraints on the parameters.
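To get a feel for the function $g$ just defined, here is a small sketch in Python (not SAS). The function and variable names are made up for this illustration; it assumes standard normal $\epsilon_t$, for which $E(|\epsilon_t|) = \sqrt{2/\pi}$, and it uses $\gamma = 1$ together with the rounded value $\theta = -.7$ from the S&P 500 fit.

```python
import math

# E|eps| for a standard normal eps (assumed distribution of the noise)
E_ABS_EPS = math.sqrt(2.0 / math.pi)

def g(eps, theta, gamma=1.0):
    """g(eps) = theta * eps + gamma * (|eps| - E|eps|)."""
    return theta * eps + gamma * (abs(eps) - E_ABS_EPS)

theta = -0.7  # rounded estimate for the S&P 500 example

# Slopes of g on either side of zero: gamma + theta and gamma - theta
slope_positive = g(1.0, theta) - g(0.0, theta)
slope_negative = g(-1.0, theta) - g(0.0, theta)

print(f"E|eps|                   : {E_ABS_EPS:.4f}")
print(f"g(0)                     : {g(0.0, theta):.4f}")
print(f"rise per unit, eps > 0   : {slope_positive:.2f}")
print(f"rise per unit, eps < 0   : {slope_negative:.2f}")
for eps in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"g({eps:+.1f}) = {g(eps, theta):+.4f}")
```

With $\theta = -.7$ the function rises much faster for negative shocks than for positive shocks of the same size, which is the asymmetry the E-GARCH model is meant to capture.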
Notice that

$$g(\epsilon_t) = -\gamma E(|\epsilon_t|) + (\gamma + \theta)|\epsilon_t| \quad \text{if } \epsilon_t > 0,$$

and

$$g(\epsilon_t) = -\gamma E(|\epsilon_t|) + (\gamma - \theta)|\epsilon_t| \quad \text{if } \epsilon_t < 0.$$

It is a good calculus exercise to show that $E(|\epsilon_t|) = \sqrt{2/\pi} = .7979$. Typically, $-1 < \theta < 0$ so that $0 < \gamma + \theta < \gamma - \theta$. For example, $\theta = -.7$ in the S&P 500 example; see below. The function $g$ with $\theta = -.7$ is plotted in the top left panel of Figure 8.11. Notice that $g(\epsilon_t)$ is negative if $|\epsilon_t|$ is close to zero; small values of noise decrease $\sigma_t$. If $|\epsilon_t|$ is large, then $\sigma_t$ increases. With a negative value of $\theta$, $\sigma_t$ increases more rapidly as a function of $|\epsilon_t|$ when $\epsilon_t$ is negative than when $\epsilon_t$ is positive.

In finance, the "leverage effect" predicts that an asset's price will become more volatile when its price decreases. This is the type of behavior obtained when $\theta < 0$. The ability to accommodate leverage effects was the reason that the E-GARCH model was introduced by Daniel Nelson.

The function $g$ for several other values of $\theta$ is also shown in Figure 8.11. When $\theta = 0$ (top right) the function is symmetric about 0. The bottom right panel where $\theta = -1$ shows an extreme case where $g(\epsilon_t)$ is negative for all positive $\epsilon_t$.

[Figure 8.11: The g function for the S&P 500 data (top left panel) and several other values of $\theta$. Panels: $\theta = -0.7$, $\theta = 0$, $\theta = 0.7$, $\theta = -1$.]

SAS fits the E-GARCH model with $\gamma$ fixed as 1 and $\theta$ estimated. The E-GARCH model is specified by using "type=exp" as in

    proc autoreg ;
    model returnsp =/nlag = 1 garch=(p=1,q=1,mean,type=exp);
    run ;

This command specifies both a GARCH-in-mean effect and the E-GARCH model. Omitting "mean" removes the GARCH-in-mean effect.

8.13 Back to the S&P 500 example

SAS can fit six different AR(1)/GARCH(1,1) models since SAS allows "type" to be "integrated," "exp," or "nonneg." The last is the default and specifies a GARCH model with non-negativity constraints.
Moreover, for each of these three types we can specify that a GARCH-in-mean effect be included or not. Table 8.1 contains the AIC statistics for the six models. The models are ordered from best fitting to worst fitting according to AIC; remember that a smaller AIC is better. It seems that the E-GARCH-M model is best, though the E-GARCH model fits nearly as well. The E-GARCH-M model will be used in the remaining discussion.

    Model          AIC       ΔAIC
    E-GARCH-M    -1783.9      0
    E-GARCH      -1783.1      0.8
    GARCH-M      -1764.6     19.3
    GARCH        -1764.1     19.8
    I-GARCH-M    -1758.0     25.9
    I-GARCH      -1756.4     27.5

Table 8.1: AIC statistics for six AR(1)/GARCH(1,1) models fit to the S&P 500 returns data. ΔAIC is the change in AIC between a given model and E-GARCH-M.

To see if more AR or GARCH parameters would improve the fit, AR(2) and E-GARCH(1,2)-M, E-GARCH(2,1)-M, and E-GARCH(2,2)-M models were tried, but none of these lowered AIC or had all parameters significant at p = .1. Thus, AR(1)/E-GARCH(1,1) appears to be a good fit to the noise, and adding a GARCH-in-mean term to the regression model seems reasonable although it does not improve the fit very much. The fit to this model is in the SAS output listed below.

Listing of SAS output for the E-GARCH-M model:

    S&P 500 monthly data from Pindyck & Rubinfeld, Ex 10.5        2
    Regression model with AR(1)/E-GARCH(1,1)-M
                                    11:52 Sunday, April 15, 2001

    The AUTOREG Procedure

    Estimates of Autoregressive Parameters
                         Standard
    Lag   Coefficient       Error   t Value
      1     -0.234934    0.046929     -5.01

    Algorithm converged.
    Exponential GARCH Estimates
    SSE               0.44211939    Observations             433
    MSE                  0.00102    Uncond Var
    Log Likelihood    900.962569    Total R-Square        0.1050
    SBC               -1747.2885    AIC               -1783.9251
    Normality Test       24.9607    Pr > ChiSq            <.0001

                                Standard             Approx
    Variable    DF   Estimate      Error   t Value   Pr > |t|
    Intercept    1  -0.003791     0.0102     -0.37     0.7095
    DR3          1    -1.2062     0.3044     -3.96     <.0001
    GPW          1    -0.6456     0.2153     -3.00     0.0027
    AR1          1    -0.2376     0.0592     -4.01     <.0001
    EARCH0       1    -1.2400     0.4251     -2.92     0.0035
    EARCH1       1     0.2520     0.0691      3.65     0.0003
    EGARCH1      1     0.8220     0.0606     13.55     <.0001
    THETA        1    -0.6940     0.2646     -2.62     0.0087
    DELTA        1     0.5067     0.3511      1.44     0.1490

8.14 The GARCH zoo

There are many more types of GARCH models than the few mentioned so far. I've discussed only the most widely used models that can be fit in SAS. The number of models seems limited only by the number of letters in the alphabet, not the imagination of econometricians! Here's a sample of other GARCH models mentioned in Bollerslev, Engle, and Nelson (1994):

• QARCH = quadratic ARCH
• TARCH = threshold ARCH
• STARCH = structural ARCH
• SWARCH = switching ARCH
• QTARCH = quantitative threshold ARCH
• vector ARCH
• diagonal ARCH
• factor ARCH

8.15 Applications of GARCH in finance

GARCH models were developed by econometricians working with business and finance data, and their applications to finance have been extensive. The review paper by Bollerslev, Engle, and Nelson lists hundreds of references. Finance models such as the CAPM and the Black-Scholes model for option pricing assume a constant conditional variance. When this assumption is false, use of these models can lead to serious errors. Therefore, generalization of finance models to include GARCH errors has been a hot topic. See Bollerslev, Engle, and Wooldridge (1988) and Duan (1996a, 1996b) for some examples of finance models with conditional heteroscedasticity. Rossi (1996) is a collection of papers, many reprinted from finance journals, on modeling stock market volatility with GARCH models.

8.16 Summary

• The marginal, or unconditional, distribution of a stationary process is the distribution of an observation from the process given no information about the previous or future observations
  - by stationarity the marginal distribution must be constant
  - in particular, the marginal mean and variance are constant
• Besides the marginal distribution, we are interested in the conditional distribution of the next observation given the current information set of present and past values of the process, and perhaps of other processes
• For ARMA processes the conditional mean is non-constant but the conditional variance is constant
• The constant conditional variance of ARMA processes makes them unsuitable for modeling the volatility of financial markets
• GARCH processes have non-constant conditional variance and were developed to model volatility
• GARCH processes can be used as the "noise" term of an ARMA process
  - ARMA/GARCH processes have both a non-constant conditional mean and a non-constant conditional variance
  - GARCH and ARMA/GARCH processes can be estimated by maximum likelihood
  - Proc Autoreg in SAS fits AR/GARCH models
• The simple ARCH(q) models have bursts of volatility but cannot model persistent volatility
• The generalized ARCH (GARCH) models can model persistent volatility
• The marginal distribution of a GARCH process has heavier tails than the normal distribution.
  - heavy tails = outlier prone
  - in fact, for certain parameter values a GARCH process will have an infinite variance, which is an extreme case of heavy tails
    * I-GARCH (integrated GARCH) models are examples of GARCH models with infinite variance
• If the marginal variance is infinite, then the sample variance will converge to infinity as the sample size increases
• For extremely heavy tails, the marginal expectation may not exist
  - then there exists no point to which the sample mean can converge
    * the sample mean will wander aimlessly
• ARMA/GARCH processes can be used as the noise term in regression models
  - SAS's Proc Autoreg can use an AR/GARCH noise term in a regression model
• The GARCH-M models use the conditional standard deviation as an independent variable in the regression
• The "leverage effect" occurs when a negative return (drop in price) increases the volatility of future returns because the denominator of those returns is smaller.
• E-GARCH models were designed to capture the leverage effect
  - in an E-GARCH model, the log of the conditional standard deviation is modeled as an ARMA process but with the white noise process $\epsilon_t$ replaced by another white noise process $g(\epsilon_t)$
  - there is no need for non-negativity constraints on the parameters, such as those in an ordinary GARCH model, since the log standard deviation can be negative
  - the parameter $\theta$ in an E-GARCH model determines the leverage effects
    * $\theta < 0$ => leverage effects
    * $\theta = 0$ => no leverage
    * $\theta > 0$ => positive returns increase volatility (this would be the opposite of the leverage effect and is not expected to happen in practice)
• In the S&P 500 example we found that
  - returns are negatively associated with changes in interest rates (an increase in interest rates decreases returns)
  - returns are negatively associated with changes in wholesale prices
  - returns are positively associated with returns lagged one month ($\hat\phi = -.2376$ in the SAS output, but because SAS's $\phi$ has the opposite sign from ours, the estimate is $+.2376$ in our notation)
  - there are leverage effects since an E-GARCH model fits better than a GARCH model and $\hat\theta = -.7$
  - there is slight evidence of a GARCH-in-mean effect, that is, there is some reason to believe that there is a risk premium
• There is a wide variety of other GARCH models in the literature, but the ones discussed here, ARCH(q), GARCH(p, q), E-GARCH, GARCH-M, and I-GARCH, are probably enough to know about since they can model a wide variety of data types
  - the models discussed in these notes are the ones that can be fit by SAS
• There is a large and growing literature on financial models with returns following GARCH processes

8.17 References

Bodie, Z., Kane, A., and Marcus, A. (1999), Investments, Irwin/McGraw-Hill, Boston.

Bollerslev, T. (1986), Generalized autoregressive conditional heteroskedasticity, J. of Econometrics, 31, 307-327.

Bollerslev, T., Chou, R.Y., and Kroner, K.F. (1992), ARCH modeling in finance, J. of Econometrics, 52, 5-59.

Bollerslev, T., and Engle, R.F. (1993), Common persistence in conditional variances, Econometrica, 61, 167-186.

Bollerslev, T., Engle, R.F., and Nelson, D.B. (1994), ARCH models, in Handbook of Econometrics, Volume IV, Engle, R.F., and McFadden, D.L., editors, Elsevier.

Bollerslev, T., Engle, R.F., and Wooldridge, J.M. (1988), A capital asset pricing model with time-varying covariances, J. of Political Economy, 96, 116-131.

Carroll, R.J., and Ruppert, D. (1988), Transformation and Weighting in Regression, Chapman & Hall, New York.

Duan, J-C. (1996a), A unified theory of option pricing under stochastic volatility: from GARCH to diffusion, manuscript (available at http://www.bm.ust.hk/ fina/staff/jcduan.html).

Duan, J-C. (1996b), Term structure and bond option pricing under GARCH, manuscript (available at http://www.bm.ust.hk/ fina/staff/jcduan.html).

Enders, W. (1995), Applied Econometric Time Series, Wiley, New York.

Engle, R.F. (1982), Autoregressive conditional heteroskedasticity with estimates of variance of U.K. inflation, Econometrica, 50, 987-1008.

Nelson, D.B. (1989), Modelling stock market volatility changes, ASA 1989 Proceedings of the Business and Economics Statistics Section, pp. 93-98. [Reprinted in Rossi (1996)]

Pindyck, R.S., and Rubinfeld, D.L. (1998), Econometric Models and Economic Forecasts, Irwin/McGraw Hill, Boston.

Rossi, P.E. (1996), Modelling Stock Market Volatility, Academic Press, San Diego.

SAS Institute (1993), SAS/ETS User's Guide, Version 6, 2nd Edition, SAS Institute, Cary, NC.

Chapter 9 Fixed Income Securities: 4/30/01

9.1 Introduction

Corporations finance their operations by selling stock and bonds. Owning a share of stock means partial ownership of the company.
You share in both the profits and losses of the company, so nothing is guaranteed. Owning a bond is different. When you buy a bond you are loaning money to the corporation. The corporation is obligated to pay back the principal and to pay interest as stipulated by the bond. You receive a fixed stream of income, unless the corporation defaults on the bond. For this reason, bonds are called "fixed-income" securities.

It might appear that bonds are risk-free, almost stodgy. This is not the case. Many bonds are long-term, e.g., 20 or 30 years. Even if the corporation stays solvent, or if you buy a US Treasury bond where default is virtually impossible, your income from the bond is guaranteed only if you keep the bond to maturity. If you sell the bond before maturity, your return will depend on changes in the price of the bond due to changes in interest rates. The interest rate of your bond is fixed, but in the market interest rates fluctuate. Therefore, the market value of your bond fluctuates too. For example, if you buy a bond paying 5% and the rate of interest increases to 6%, then your bond is inferior to new bonds offering 6%. Consequently, the price of your bond will decrease. If you sell the bond you will lose money. So much for a "fixed income" stream!

If you have ever bought a CD, which really is a bond that you buy from a bank or credit union, you will have noticed that the interest rate depends on the maturity of the CD. This is a general phenomenon. For example, on March 28, 2001, the interest rate of Treasury bills[1] was 4.23% for 3-month bills. The yields on Treasuries were 4.41%, 5.01%, and 5.46% for 2, 10, and 30 year maturities, respectively. The term structure of interest rates describes how rates of interest change with the maturity of bonds.

In this chapter we will study how bond prices fluctuate due to interest rate changes. We will also study how the term structure of interest rates can be determined.
9.2 Zero coupon bonds

Zero-coupon bonds, also called pure discount bonds, pay no principal or interest until maturity. A "zero" has a par value, which is the payment made to the bond holder at maturity. The zero sells for less than par, which is the reason it is a "discount bond."

For example, consider a 20-year zero with a par value of $1000 and 6% interest compounded annually. The price is the present value of $1000 with discounting annually at 6%. That is, the price is

\[ \frac{\$1000}{(1.06)^{20}} = \$311.80. \]

[1] Treasury bills have maturities of one year or less, Treasury notes have maturities from one to ten years, and Treasury bonds have maturities from 10 to 30 years.

If the interest is 6% but compounded every six months, then the price is

\[ \frac{\$1000}{(1.03)^{40}} = \$306.56, \]

and if the interest is 6% compounded continuously, then the price is

\[ \frac{\$1000}{\exp\{(.06)(20)\}} = \$301.19. \]

9.2.1 Price and returns fluctuate with the interest rate

For concreteness, assume semi-annual compounding. Suppose you just bought the zero for $306.56 and then six months later the interest rate increased to 7%. The price would now be

\[ \frac{\$1000}{(1.035)^{39}} = \$261.41, \]

so your investment would drop by ($306.56 - $261.41) = $45.15. You will still get your $1000 if you keep the bond for 20 years, but if you sell it now you will lose $45.15. This is a return of

\[ \frac{-45.15}{306.56} = -14.73\% \]

for a half-year, or -29.46% per year. And the interest rate only changed from 6% to 7%!

If the interest rate dropped to 5% after six months, then your bond would be worth

\[ \frac{\$1000}{(1.025)^{39}} = \$381.74. \]

This would be an annual rate of return of

\[ 2\left(\frac{381.74 - 306.56}{306.56}\right) = 49.05\%. \]

If the interest rate remained unchanged at 6%, then the price of the bond would be

\[ \frac{\$1000}{(1.03)^{39}} = \$315.75. \]

The annual rate of return would be

\[ 2\left(\frac{315.75 - 306.56}{306.56}\right) = 6\%. \]

Thus, if the interest rate does not change, you can earn a 6% annual rate of return by selling the bond before maturity.
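The zero-coupon calculations above are easy to reproduce. The following is a minimal Python sketch, not part of the original notes; the function name `zero_price` and its `periods_per_year` argument are illustrative choices, with `None` standing for continuous compounding.

```python
import math

def zero_price(par, annual_rate, years, periods_per_year=None):
    """Price of a zero-coupon bond paying `par` after `years` years.

    periods_per_year=None means continuous compounding (an assumption of
    this sketch, chosen only to keep the interface small).
    """
    if periods_per_year is None:
        return par * math.exp(-annual_rate * years)
    n_periods = periods_per_year * years
    return par / (1 + annual_rate / periods_per_year) ** n_periods

# The 20-year, par $1000, 6% zero under the three compounding conventions
p_annual = zero_price(1000, 0.06, 20, 1)
p_semi = zero_price(1000, 0.06, 20, 2)
p_cont = zero_price(1000, 0.06, 20)

# Six months later 19.5 years (39 half-years) remain; the rate is now 7%
p_after_rise = zero_price(1000, 0.07, 19.5, 2)
half_year_return = (p_after_rise - p_semi) / p_semi

print(round(p_annual, 2), round(p_semi, 2), round(p_cont, 2))
print(round(p_after_rise, 2), round(100 * half_year_return, 2))
```

The printed values should match the prices $311.80, $306.56, $301.19, and $261.41 and the half-year return of -14.73% given in the text.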
If the interest rate does change, however, the 6% annual rate of return is guaranteed only if you keep the bond until maturity.

9.3 Coupon bonds

Coupon bonds make regular interest payments.[2] Coupon bonds generally sell at par when issued. At maturity, one receives the principal and the final interest payment.

As an example, consider a 20-year coupon bond with a par value of $1000 and 6% annual interest with semi-annual coupon payments. Each coupon payment will be $30. Thus, the bond holder receives 40 payments of $30, one every six months, plus a principal payment of $1000 after 20 years. One can check that the present value of all payments, with discounting at the 6% annual rate (3% semi-annual), equals $1000:

\[ \sum_{t=1}^{40} \frac{30}{(1.03)^t} + \frac{1000}{(1.03)^{40}} = 1000. \]

After six months, if the interest rate is unchanged, then the bond (including the first coupon payment, which is now due) is worth

\[ \sum_{t=0}^{39} \frac{30}{(1.03)^t} + \frac{1000}{(1.03)^{39}} = (1.03)\left( \sum_{t=1}^{40} \frac{30}{(1.03)^t} + \frac{1000}{(1.03)^{40}} \right) = 1030, \]

which is a 6% annual return, as expected. If the interest rate increases to 7%, then after six months the bond (plus the interest due) is only worth

\[ \sum_{t=0}^{39} \frac{30}{(1.035)^t} + \frac{1000}{(1.035)^{39}} = (1.035)\left( \sum_{t=1}^{40} \frac{30}{(1.035)^t} + \frac{1000}{(1.035)^{40}} \right) = 924.49. \]

[2] At one time actual coupons were attached to the bond, one coupon for each interest payment. When a payment was due, its coupon could be clipped off and sent to the issuing company for payment.

This is an annual return of

\[ 2\left(\frac{924.49 - 1000}{1000}\right) = -15.1\%. \]

If the interest rate drops to 5% after six months, then the investment is worth

\[ \sum_{t=0}^{39} \frac{30}{(1.025)^t} + \frac{1000}{(1.025)^{39}} = (1.025)\left( \sum_{t=1}^{40} \frac{30}{(1.025)^t} + \frac{1000}{(1.025)^{40}} \right) = 1153.7, \tag{9.1} \]

and the annual return is

\[ 2\left(\frac{1153.6 - 1000}{1000}\right) = 30.72\%. \]

Some general formulas

Let's derive some useful formulas.
If a bond with a par value of PAR matures in T years and makes semi-annual payments of C, and the discount rate (rate of interest) is r per half-year, then the value of the bond when it is issued is

\[ \sum_{t=1}^{2T} \frac{C}{(1+r)^t} + \frac{\mathrm{PAR}}{(1+r)^{2T}} = \frac{C}{r}\left\{1 - (1+r)^{-2T}\right\} + \frac{\mathrm{PAR}}{(1+r)^{2T}} = \frac{C}{r} + \left\{\mathrm{PAR} - \frac{C}{r}\right\}(1+r)^{-2T}. \tag{9.2} \]

If C = PAR × r, then the value of the bond when issued is PAR. The value six months later is (1 + r) times the value in equation (9.2).

The MATLAB function "bondvalue.m" computes (9.2). The call to this function is bondvalue(c,T,r,par). For example, if the coupon is C = 30, if T = 30, and if after six months r = 3.1%/half-year (or 6.2%/year), then the bond is worth

\[ (1.031)\left[ \frac{30}{.031}\left\{1 - (1.031)^{-60}\right\} + \frac{1000}{(1.031)^{60}} \right] = 1003.1. \]

This value was computed by MATLAB with the call 1.031*bondvalue(30,30,.031,1000). Similarly, (9.1) was computed with the MATLAB call 1.025*bondvalue(30,20,.025,1000).

Derivation of (9.2)

The summation formula for a finite geometric series with ratio \(a\) is

\[ \sum_{i=0}^{T} a^i = \frac{1 - a^{T+1}}{1 - a}, \tag{9.3} \]

provided that \(a \ne 1\). Applying this with ratio \(a = 1/(1+r)\),

\[ \sum_{t=1}^{2T} \frac{C}{(1+r)^t} = \frac{C}{1+r} \sum_{t=0}^{2T-1} \left(\frac{1}{1+r}\right)^t = \frac{C\left\{1 - (1+r)^{-2T}\right\}}{(1+r)\left\{1 - (1+r)^{-1}\right\}} \tag{9.4} \]

\[ = \frac{C}{r}\left\{1 - (1+r)^{-2T}\right\}. \tag{9.5} \]

9.4 Yield to maturity

Suppose a bond with T = 30 and C = 40 is selling for $1200, $200 above par. If the bond were selling at par, then the interest rate would be .04/half-year (= .08/year). The 4%/half-year rate is called the coupon rate.

But the bond is not selling at par. If you purchase the bond at $1200 you will make less than 8% per year interest. There are two reasons why the rate of return is less than 8%. First, the coupon payments are $40, or 40/1200 = 3.333%/half-year of the $1200 investment; 3.333% is called the current yield. Second, at maturity you only get back $1000 of the $1200 investment. The current yield overestimates the return since it does not account for this loss of capital.
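Formula (9.2) and the two bondvalue calls quoted above can be checked with a short Python sketch. The source of bondvalue.m is not shown in these notes, so `bond_value` below is only an assumed equivalent written directly from (9.2); `bond_value_by_sum` is a hypothetical helper that adds up the discounted payments one by one as a cross-check on the closed form.

```python
def bond_value(c, T, r, par):
    """Value at issue from the closed form (9.2).

    c: semi-annual coupon, T: maturity in years,
    r: interest rate per half-year, par: par value.
    """
    return c / r + (par - c / r) * (1 + r) ** (-2 * T)

def bond_value_by_sum(c, T, r, par):
    """Same value as the explicit sum of discounted payments."""
    coupons = sum(c / (1 + r) ** t for t in range(1, 2 * T + 1))
    return coupons + par / (1 + r) ** (2 * T)

# Coupon equals par * r, so the bond should be worth par at issue
v_par = bond_value(30, 20, 0.03, 1000)

# Values six months later: (1 + r) times the value from (9.2)
v_after_31 = 1.031 * bond_value(30, 30, 0.031, 1000)
v_after_25 = 1.025 * bond_value(30, 20, 0.025, 1000)   # the case in (9.1)

print(round(v_par, 2), round(v_after_31, 1), round(v_after_25, 1))
```

The closed form is preferable to the explicit sum for repeated evaluation, for example when searching over r, because its cost does not grow with the number of payments.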
The yield to maturity is a measure of the average rate of return, including the loss (or gain) of capital because the bond was purchased above (or below) par. For this bond, the yield to maturity is the value of r that solves

\[ 1200 = \frac{40}{r} + \left\{1000 - \frac{40}{r}\right\}(1+r)^{-60}. \tag{9.6} \]

The right hand side of (9.6) is (9.2) with C = 40, T = 30, and PAR = 1000.

It is easy to solve equation (9.6) numerically. The MATLAB program yield.m does the following:

• computes the bond price for each r value on a grid
• graphs bond price versus r (this is not necessary but it's fun to see the graph)
• interpolates to find the value of r when bond value equals 1200

One finds that the yield to maturity is 0.0324. Figure 9.1 shows the graph of bond price versus r and shows that r = .0324 maps to a bond price of $1200.

[Figure 9.1: Bond price versus the interest rate r and determining by interpolation the yield to maturity when the price equals $1200. Panel title: par = 1000, coupon payment = 40, T = 30; horizontal axis: yield to maturity.]

The yield to maturity of .0324 is less than the current yield of 0.0333, which is less than the coupon rate of 40/1000 = .04. (All three rates are rates per half-year.) Thus, we see that

• coupon rate > current yield
  - since the bond sells above par
• current yield > yield to maturity
  - since yield to maturity accounts for the loss of capital when at the maturity date you only get back $1000 of the $1200 investment

Whenever, as in this example, the bond is selling above par, we have

coupon rate > current yield > yield to maturity. (9.7)

Everything is reversed if the bond is selling below par. For example, if the price of the bond were only $900, then
• the yield to maturity would be 0.0448 (as before, this value can be determined by "yield.m" using interpolation)
• the current yield would be 40/900 = 0.0444
• the coupon rate would still be 40/1000 = .04

Therefore we would have

coupon rate < current yield < yield to maturity, (9.8)

which is just the opposite of (9.7).

9.4.1 Spot rates

The yield to maturity of a zero coupon bond of maturity n years is called the n-year spot rate.

A coupon bond is a bundle of zero coupon bonds, one for each coupon payment and a final one for the principal payment. The component zeros have different maturity dates and therefore different spot rates. The yield to maturity of the coupon bond is a complex "average" of the spot rates of the zeros in this bundle.

9.5 Term structure

On January 26, 2001, the Ithaca Journal stated that the 1-year T-bill rate was 4.83% and the 30-year Treasury bond rate was 6.11%. This is typical: short- and long-term rates usually differ. Such differences can be seen in Figure 10.5 of Campbell et al. or Figure 15.7 of Bodie, Kane, and Marcus (1999). Often short-term rates are lower than long-term rates. This makes sense since long-term bonds are riskier. Long-term bond prices fluctuate more with interest rate changes, and these bonds are often sold before maturity. In contrast, a 90-day or even 1-year T-bill is often kept to maturity and so is really a risk-free "fixed income security."

However, during periods of very high short-term rates, the short-term rates may be higher than the long-term rates. The reason is that the market believes that rates will return to historic levels and no one will commit to the high interest rate for, say, 20 or 30 years.

The term structure of interest rates is a description of how, at a given time, yield to maturity depends on maturity.
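The yield-to-maturity values quoted in Section 9.4 (0.0324 at a price of $1200 and 0.0448 at a price of $900) can be reproduced numerically. The sketch below is not a port of yield.m: instead of evaluating the price on a grid and interpolating, it uses bisection, which works here because the bond value in (9.2) decreases as r increases. The function names and the bracketing interval are assumptions of this sketch.

```python
def bond_value(c, T, r, par):
    """Bond value at issue, formula (9.2); r is the rate per half-year."""
    return c / r + (par - c / r) * (1 + r) ** (-2 * T)

def yield_to_maturity(price, c, T, par, lo=1e-6, hi=1.0, tol=1e-10):
    """Half-year rate r at which bond_value(c, T, r, par) equals price.

    Bisection on [lo, hi]; assumes the yield lies inside that bracket.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if bond_value(c, T, mid, par) > price:
            lo = mid   # value still too high, so the yield is larger
        else:
            hi = mid
    return (lo + hi) / 2

ytm_above_par = yield_to_maturity(1200, 40, 30, 1000)
ytm_below_par = yield_to_maturity(900, 40, 30, 1000)

coupon_rate = 40 / 1000
print(round(ytm_above_par, 4), round(40 / 1200, 4), coupon_rate)
print(round(ytm_below_par, 4), round(40 / 900, 4), coupon_rate)
```

The two printed lines let one confirm the orderings (9.7) and (9.8): above par the yield to maturity is the smallest of the three rates, and below par it is the largest.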
Term structure for all maturities up to n years can be described by any one of the following:

• prices of zero coupon bonds of maturities 1-year, 2-years, ..., n-years, denoted here by \(P(1), P(2), \ldots, P(n)\)
• spot rates (yields to maturity of zero coupon bonds) of maturities 1-year, 2-years, ..., n-years, denoted by \(y_1, \ldots, y_n\)
• forward rates \(r_1, \ldots, r_n\)

As will be seen below, each of the sets

• \(\{P(1), \ldots, P(n)\}\),
• \(\{y_1, \ldots, y_n\}\), and
• \(\{r_1, \ldots, r_n\}\)

can be computed from either one of the other sets. For example, (9.10) gives \(\{P(1), \ldots, P(n)\}\) in terms of \(\{r_1, \ldots, r_n\}\). Also, equations (9.11) and (9.12) give \(\{y_1, \ldots, y_n\}\) in terms of \(\{P(1), \ldots, P(n)\}\) or \(\{r_1, \ldots, r_n\}\), respectively.

Term structure can be described by breaking down the time interval between the present time and the maturity time of a bond into short time segments with a constant interest rate within each segment, but with interest rates varying between segments. For example, a 3-year loan can be considered as three consecutive 1-year loans.

Example: As an illustration, suppose that the three 1-year loans have the forward interest rates listed in Table 9.1.

Year (i)    Interest rate (r_i)
1           6%
2           7%
3           8%

Table 9.1: Forward interest rate example

Using the forward rates in Table 9.1, we see that a par $1000 1-year zero would sell for

\[ \frac{1000}{1 + r_1} = \frac{1000}{1.06} = \$943.40 = P(1). \]

A par $1000 2-year zero would sell for

\[ \frac{1000}{(1+r_1)(1+r_2)} = \frac{1000}{(1.06)(1.07)} = \$881.68 = P(2). \]

A par $1000 3-year zero would sell for

\[ \frac{1000}{(1+r_1)(1+r_2)(1+r_3)} = \frac{1000}{(1.06)(1.07)(1.08)} = \$816.37 = P(3). \]

The general formula for the present value of $1 paid n periods from now is

\[ \frac{1}{(1+r_1)(1+r_2)\cdots(1+r_n)}. \tag{9.9} \]

Here \(r_i\) is the forward interest rate during the ith period. By "forward rate" we mean the interest rate for that period that is agreed upon now.
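The discount factor (9.9) is a running product, so the prices of zeros of every maturity can be built up in one pass over the forward rates. A minimal Python sketch, with the hypothetical helper name `zero_prices_from_forwards`, applied to the rates in Table 9.1:

```python
def zero_prices_from_forwards(forwards, par=1000.0):
    """Prices P(1), ..., P(n) of zeros implied by forward rates r_1, ..., r_n.

    Each price is par times the discount factor in (9.9).
    """
    prices = []
    discount = 1.0
    for r in forwards:
        discount /= (1 + r)          # extend the product by one more period
        prices.append(par * discount)
    return prices

table_9_1 = [0.06, 0.07, 0.08]       # forward rates for years 1, 2, 3
prices = zero_prices_from_forwards(table_9_1)
print([round(p, 2) for p in prices])
```

The three values should agree with P(1) = $943.40, P(2) = $881.68, and P(3) = $816.37 from the example.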
Letting \(P(n)\) be the price of an n-year zero coupon bond with par $1000, \(P(n)\) is $1000 times the discount factor in (9.9), that is,

\[ P(n) = \frac{1000}{(1+r_1)\cdots(1+r_n)}. \tag{9.10} \]

Back to the example

Let's look at the yields to maturity. For a 1-year zero, the yield to maturity \(y_1\) solves

\[ \frac{1000}{1 + y_1} = 943.40, \]

which implies that \(y_1 = .06\). Nothing surprising here, since \(r_1 = .06\)!

For a 2-year zero, the yield to maturity is the \(y_2\) that solves

\[ \frac{1000}{(1+y_2)^2} = 881.68. \]

Thus,

\[ y_2 = \sqrt{\frac{1000}{881.68}} - 1 = .0650. \]

It is easy to show that \(y_2\) is also given by

\[ y_2 = \sqrt{(1+r_1)(1+r_2)} - 1 = \sqrt{(1.06)(1.07)} - 1 = .0650. \]

For a 3-year zero, the yield to maturity \(y_3\) solves

\[ \frac{1000}{(1+y_3)^3} = 816.37. \]

Also,

\[ y_3 = \{(1+r_1)(1+r_2)(1+r_3)\}^{1/3} - 1 = \{(1.06)(1.07)(1.08)\}^{1/3} - 1 = .0700, \]

or, more precisely, .069969. Thus, \((1 + y_3)\) is the geometric average of 1.06, 1.07, and 1.08 and approximately equal to their arithmetic average.

Recall that \(P(n)\) is the price of a par $1000 n-year zero coupon bond. The general formulas for the yield to maturity \(y_n\) of an n-year zero are

\[ y_n = \left\{\frac{1000}{P(n)}\right\}^{1/n} - 1 \tag{9.11} \]

and

\[ y_n = \{(1+r_1)\cdots(1+r_n)\}^{1/n} - 1. \tag{9.12} \]

Equations (9.11) and (9.12) give the yields to maturity in terms of the bond prices and forward rates, respectively. Also,

\[ P(n) = \frac{1000}{(1+y_n)^n}, \tag{9.13} \]

which gives \(P(n)\) in terms of the yield to maturity.

As mentioned before, interest rates for future years are called forward rates. A forward contract is an agreement to buy or sell an asset at some fixed future date at a fixed price. Since \(r_2, r_3, \ldots\) are rates at future dates that are fixed now when a long-term bond is purchased, they are forward rates.

Maturity    Price
1 year      $920
2 year      $830
3 year      $760

Table 9.2: Bond price example

The general formula for determining forward rates from yields to maturity is

\[ r_1 = y_1, \tag{9.14} \]

and

\[ r_n = \frac{(1+y_n)^n}{(1+y_{n-1})^{n-1}} - 1. \tag{9.15} \]

Now suppose that we only observed bond prices.
Can we calculate yields to maturity and forward rates? The answer is "yes," using (9.11) and then (9.15).

Example: Suppose that 1, 2, and 3-year par 1000 zeros are priced as in Table 9.2. Then, using (9.11), the yields to maturity are:

\[ y_1 = \frac{1000}{920} - 1 = .087, \]

\[ y_2 = \left\{\frac{1000}{830}\right\}^{1/2} - 1 = .0976, \]

\[ y_3 = \left\{\frac{1000}{760}\right\}^{1/3} - 1 = .096. \]

Then, using (9.14) and (9.15),

\[ r_1 = y_1 = .087, \]

\[ r_2 = \frac{(1+y_2)^2}{(1+y_1)} - 1 = \frac{(1.0976)^2}{1.087} - 1 = .108, \]

and

\[ r_3 = \frac{(1+y_3)^3}{(1+y_2)^2} - 1 = \frac{(1.096)^3}{(1.0976)^2} - 1 = .093. \]

The formula for finding \(r_n\) from the prices of zero coupon bonds is

\[ r_n = \frac{P(n-1)}{P(n)} - 1, \tag{9.16} \]

which can be derived from

\[ P(n) = \frac{1000}{(1+r_1)(1+r_2)\cdots(1+r_n)} \]

and

\[ P(n-1) = \frac{1000}{(1+r_1)(1+r_2)\cdots(1+r_{n-1})}. \]

To calculate \(r_1\) using (9.16), we need \(P(0)\), the price of a 0-year bond, but \(P(0)\) is simply the par value. (Trivially, a bond which must be paid back immediately is worth exactly its par value.)

Example

Thus, using (9.16),

\[ r_1 = \frac{1000}{920} - 1 = .0870, \]

\[ r_2 = \frac{920}{830} - 1 = .1084, \]

and

\[ r_3 = \frac{830}{760} - 1 = .0921. \]

9.6 Continuous compounding

Now we will assume continuous compounding with forward rates \(r_1, \ldots, r_n\). We will see that the use of continuously compounded rates simplifies the relationships between the forward rates, the yields to maturity, and the prices of zero coupon bonds.

If \(P(n)\) is the price of a $1000 par n-year zero coupon bond, then

\[ P(n) = \frac{1000}{\exp(r_1 + r_2 + \cdots + r_n)}. \tag{9.17} \]

Therefore,

\[ \frac{P(n-1)}{P(n)} = \frac{\exp(r_1 + \cdots + r_n)}{\exp(r_1 + \cdots + r_{n-1})} = \exp(r_n), \tag{9.18} \]

and

\[ \log\left\{\frac{P(n-1)}{P(n)}\right\} = r_n. \]

The yield to maturity of an n-year zero coupon bond solves the equation

\[ P(n) = \frac{1000}{\exp(n y_n)}, \]

and is easily seen to be

\[ y_n = (r_1 + \cdots + r_n)/n. \]

Therefore, \(\{r_1, \ldots, r_n\}\) is easily found from \(\{y_1, \ldots, y_n\}\) by the relationship

\[ r_1 = y_1, \]

and

\[ r_n = n y_n - (n-1) y_{n-1} \quad \text{for } n > 1. \]
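Both routes from observed zero prices to rates, the annual-compounding formulas (9.11) and (9.16) and the continuous-compounding formulas of this section, can be sketched in a few lines of Python. The function names below are illustrative, not from the notes, and \(P(0)\) is taken to be the par value as explained above. The continuous spot rate is written as \(\log(\text{par}/P(n))/n\), which is the equation \(P(n) = 1000/\exp(n y_n)\) solved for \(y_n\).

```python
import math

def spot_rates(prices, par=1000.0):
    """Annual compounding, formula (9.11)."""
    return [(par / p) ** (1.0 / n) - 1 for n, p in enumerate(prices, start=1)]

def forward_rates(prices, par=1000.0):
    """Annual compounding, formula (9.16), with P(0) = par."""
    previous = [par] + list(prices[:-1])
    return [p_prev / p - 1 for p_prev, p in zip(previous, prices)]

def forward_rates_continuous(prices, par=1000.0):
    """Continuous compounding: r_n = log(P(n-1) / P(n)), with P(0) = par."""
    previous = [par] + list(prices[:-1])
    return [math.log(p_prev / p) for p_prev, p in zip(previous, prices)]

def spot_rates_continuous(prices, par=1000.0):
    """Continuous compounding: y_n = log(par / P(n)) / n."""
    return [math.log(par / p) / n for n, p in enumerate(prices, start=1)]

table_9_2 = [920.0, 830.0, 760.0]            # prices of 1, 2, 3-year zeros
y = spot_rates(table_9_2)
r = forward_rates(table_9_2)
rc = forward_rates_continuous(table_9_2)
yc = spot_rates_continuous(table_9_2)

print([round(v, 4) for v in y], [round(v, 4) for v in r])
print([round(v, 4) for v in yc], [round(v, 4) for v in rc])
```

Under continuous compounding the yield for maturity n is the plain average of the first n forward rates, which is the simplification this section refers to; under annual compounding the corresponding relationship is the geometric average in (9.12).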
Example

Suppose that, with par converted from 1000 to 1, the prices of the zeros are \(P(1) = .930\), \(P(2) = .850\), and \(P(3) = .760\). (These are close to, but not identical to, the prices in Table 9.2.) Therefore,

\[ r_1 = \log\left\{\frac{1}{.930}\right\} = .0726, \]

\[ r_2 = \log\left\{\frac{.930}{.850}\right\} = .0899, \]

and

\[ r_3 = \log\left\{\frac{.850}{.760}\right\} = .1119. \]

Also,

\[ y_1 = r_1 = .0726, \]

\[ y_2 = (r_1 + r_2)/2 = .0813, \]

and

\[ y_3 = (r_1 + r_2 + r_3)/3 = .0915. \]

9.6.1 Continuous forward rates

So far, we have assumed that forward interest rates vary from year to year, but that these rates are constant within each year. This assumption is, of course, unrealistic. The forward rates should be modeled as a function varying continuously in time, rather than as functions that are constant for one year at a time. It is also unrealistic to assume a starting time and that interest rates change each year after this starting time. In fact, bonds are issued repeatedly and bonds of many maturities are on the market.

To specify the term structure in a realistic way, we will assume that there is a function \(r(t)\), called the forward rate function, such that the price of a zero coupon bond of maturity T and with par equal to 1 is given by

\[ P(T) = \exp\left\{ -\int_0^T r(t)\,dt \right\}. \tag{9.19} \]

Formula (9.19) is a generalization of formula (9.17). To appreciate this, suppose that \(r(t) = r_k\) for \(k - 1 < t \le k\). With this piecewise constant r,

\[ \int_0^T r(t)\,dt = r_1 + r_2 + \cdots + r_T, \]

so that

\[ \exp\left\{ -\int_0^T r(t)\,dt \right\} = \exp\{-(r_1 + \cdots + r_T)\}, \]

and therefore (9.17) agrees with (9.19).

The yield to maturity of a bond with maturity date T is

\[ y_T = \frac{1}{T} \int_0^T r(t)\,dt. \tag{9.20} \]

Think of (9.20) as the average of \(r(t)\) over the interval \(0 \le t \le T\).

Jarrow, Ruppert, and Yu (2001) estimate \(r(t)\) in (9.19) under the assumption that \(r(t)\) is a member of a flexible class of functions called splines. Figure 9.2 shows estimated forward rate curves for US Treasury bonds. Figure 9.3 shows estimated forward rate curves for AT&T bonds. These two figures come from Jarrow, Ruppert, and Yu (2001).
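Formulas (9.19) and (9.20) can be illustrated by integrating a forward rate function numerically. The sketch below uses a simple midpoint rule; it is not the spline method of Jarrow, Ruppert, and Yu. Two forward rate functions are tried: a piecewise-constant one with rates 6%, 7%, and 8% (here treated as continuously compounded rates, an assumption made only for this illustration), and a hypothetical linear curve \(r(t) = .05 + .002t\) that does not come from the notes.

```python
import math

def integrate(f, a, b, steps=10000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def zero_price(forward_rate, T):
    """Price of a par-1 zero of maturity T, formula (9.19)."""
    return math.exp(-integrate(forward_rate, 0.0, T))

def yield_to_maturity(forward_rate, T):
    """Average of the forward rate over [0, T], formula (9.20)."""
    return integrate(forward_rate, 0.0, T) / T

def piecewise(t, rates=(0.06, 0.07, 0.08)):
    """Piecewise-constant forward rate: r(t) = r_k for k-1 < t <= k."""
    k = min(max(int(math.ceil(t)), 1), len(rates))
    return rates[k - 1]

def linear(t):
    """Hypothetical smooth forward rate curve, for illustration only."""
    return 0.05 + 0.002 * t

p3 = zero_price(piecewise, 3.0)            # compare with (9.17), par 1
y3 = yield_to_maturity(piecewise, 3.0)     # average of the three rates
y10_linear = yield_to_maturity(linear, 10.0)

print(round(p3, 6), round(math.exp(-(0.06 + 0.07 + 0.08)), 6))
print(round(y3, 6), round(y10_linear, 6))
```

For the piecewise-constant curve the numerical price should coincide with the value given by (9.17), and the yield is the average of the three rates; for the linear curve the yield at maturity 10 is the average of the forward rate over ten years.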
[Figure 9.2: Spline estimates of forward rates of US Treasury bonds. Horizontal axis: time to maturity.]

[Figure 9.3: Spline estimates of forward rates of AT&T bonds. Horizontal axis: time to maturity.]

9.7 Summary

9.7.1 Introduction

• buy a bond = making a loan to the company
  - corporation is obligated to pay back the principal and interest (unless it defaults)
  - you receive a fixed stream of income
  - bonds are called "fixed-income" securities
  - for a long-term bond your income is guaranteed only if you keep the bond to maturity

9.7.2 Zero coupon bonds

• Zero-coupon bonds pay no principal or interest until maturity
• zero-coupon bond = pure discount bond
• par value is the payment made to the bond holder at maturity
• a zero sells for less than par
• Example: 20-year zero
  - par value of $1000
  - interest 6% compounded every six months ⇒ price is

\[ \frac{\$1000}{(1.03)^{40}} = \$306.56 \]

9.7.3 Risk due to interest rate changes

• bond prices fluctuate with the interest rate
  - Example: assume semi-annual compounding
  - you just bought the zero for $306.56
    * six months later the interest rate increased to 7%
  - price would now be

\[ \frac{\$1000}{(1.035)^{39}} = \$261.41 \]

  - investment would drop by ($306.56 - $261.41) = $45.15
  - return of

\[ \frac{-45.15}{306.56} = -14.73\% \]

    for a half-year, or -29.46% per year
  - however, if the interest rate remains unchanged then the bond is worth

\[ \frac{\$1000}{(1.03)^{39}} = (1.03)(\$306.56) \]

    * 3%/half-year return

9.7.4 Coupon bonds

• coupon bonds make regular interest payments
• consider a 20-year coupon bond with a par value of $1000 and 6% annual interest with semi-annual coupon payments
  - coupon payment will be $30
  - bond holder receives 40 payments of $30
    * plus a principal payment of $1000 after 20 years
  - present value of all payments, with discounting at the 6% annual rate (3% semi-annual), equals $1000:

\[ \sum_{t=1}^{40} \frac{30}{(1.03)^t} + \frac{1000}{(1.03)^{40}} = 1000. \]

• General formula:
  - Notation
    * PAR = par value
    * C = coupon payment
    * T = maturity
    * r = interest rate per half-year
  - bond price =

\[ \sum_{t=1}^{2T} \frac{C}{(1+r)^t} + \frac{\mathrm{PAR}}{(1+r)^{2T}} = \frac{C}{r} + \left\{\mathrm{PAR} - \frac{C}{r}\right\}(1+r)^{-2T} \]

• Yield to maturity
  - Example: a bond with T = 30 and C = 40 is selling for $1200
  - bond selling at par ⇒ interest rate = .04/half-year (= .08/year)
  - 4%/half-year rate = coupon rate
• but not selling at par ⇒ if you purchase the bond at $1200 you will make less than 8% per year
• two problems
  - coupon payments are $40, or 40/1200 = 3.333%/half-year of the $1200 investment
    * 3.333% is called the current yield
  - at maturity you only get back $1000 of the $1200 investment
  - yield to maturity = the average rate of return
• Spot rates
  - The yield to maturity of a zero coupon bond of maturity n years is called the n-year spot rate.
  - A coupon bond is a bundle of zeros, each with a different maturity and therefore a different spot rate
    * the yield to maturity of a coupon bond is a complex "average" of these different spot rates

9.7.5 Term structure of interest rates

• term structure is a description of how, at a given time, yield to maturity depends on maturity
• term structure for all maturities up to n years can be described by any one of the following sets:
  - prices of zero coupon bonds of maturities 1-year, 2-years, ..., n-years, denoted here by \(P(1), P(2), \ldots, P(n)\)
  - spot rates (yields to maturity of zero coupon bonds) of maturities 1-year, 2-years, ..., n-years, denoted by \(y_1, \ldots, y_n\)
  - forward rates \(r_1, \ldots, r_n\)
• each of the above sets can be computed from either of the other sets

9.7.6 Continuous compounding

• continuous compounding simplifies the relationships between
  - forward rates
  - yields to maturity of zeros (spot rates)
  - prices of zeros
• prices from forward rates:

\[ P(1) = \frac{1000}{\exp(r_1)}, \qquad P(2) = \frac{1000}{\exp(r_1)\exp(r_2)}, \]

  etc., so that

\[ P(n) = \frac{1000}{\exp(r_1 + r_2 + \cdots + r_n)} \]

• forward rates from prices:

\[ \frac{P(n-1)}{P(n)} = \frac{\exp(r_1 + \cdots + r_n)}{\exp(r_1 + \cdots + r_{n-1})} = \exp(r_n), \qquad r_n = \log\left\{\frac{P(n-1)}{P(n)}\right\} \]

• yield to maturity \(y_n\) solves

\[ P(n) = \frac{1000}{\exp(n y_n)}, \qquad y_n = (r_1 + \cdots + r_n)/n \]

• \(\{r_1, \ldots, r_n\}\) is easily found from \(\{y_1, \ldots, y_n\}\) by:

\[ r_1 = y_1, \quad \text{and} \quad r_n = n y_n - (n-1) y_{n-1} \text{ for } n > 1 \]

9.8 References

Jarrow, R., Ruppert, D., and Yu, Yan (2001), Estimating the term structure of corporate debt with a semiparametric penalized spline model, manuscript. Available at http://www.orie.cornell.edu/~davidr/papers/

Chapter 10

Behavioral finance: 5/1/01

10.1 Introduction

• behavioral finance is an alternative to the EMH
• this material is taken mostly from the 2000 book by Andrei Shleifer of Harvard:
  - Inefficient Markets: An Introduction to Behavioral Finance
• EMH has been the central tenet of finance for almost 30 years
• power of the EMH assumption is remarkable
• EMH started in the 1960's
  - immediate success in theory and empirically
  - early empirical work gave overwhelming support to EMH
  - EMH invented at Chicago, and Chicago became a world center of research in finance
  - Jensen (1978): "no other proposition in economics ... has more solid empirical support"
• verdict is changing
  - efficiency of arbitrage is much weaker than expected
  - true arbitrage possibilities are rare
  - near arbitrage is riskier than expected
  - "Markets can remain irrational longer than you can remain solvent" (John Maynard Keynes)
    * quoted by Roger Lowenstein in When Genius Failed: The Rise and Fall of Long-Term Capital Management

10.2 Defense of EMH

• three lines of defense of the EMH:
  - investors are rational
  - trading of irrational investors is random and their trades cancel each other
  - even if a "herd" of irrational investors trade in similar ways, rational arbitrageurs will eliminate their influence on market price
• each of these defenses is weaker than had been thought
• rational investing = "value a security by its fundamental value"
  - "fundamental value" = net present worth of all future cash flows
• rational investing ⇒ prices are (geometric) random walks
• but prices being random walks (or nearly so) does not imply rational investing
• there is good evidence that irrational trading is correlated
  - look at the internet stock bubble
• initial tests of the semi-strong form of efficiency supported that theory
  - event studies showed that the market did react immediately to news and then stopped reacting
    * so reaction to news, as EMH predicts
    * also no reaction to stale news, again as EMH predicts
  - Scholes (1972) found little reaction to "non news"
    * block sales had little effect on prices

10.3 Challenges to the EMH

• it is difficult to maintain that all investors are rational
  - many investors react to irrelevant information
  - Black calls them noise traders
• investors act irrationally when they
  - fail to diversify
  - purchase actively and expensively managed mutual funds
  - churn their portfolios
• investors do not look at final levels of wealth when assessing risky situations ("prospect theory")
• there is a serious "loss aversion"
• people do not follow Bayes rule for evaluating new information
  - too much attention is paid to recent history
• overreaction is commonplace
• these deviations from fully rational behavior are not random
• moreover, noise traders will follow each other's mistakes
• thus, noise trading will be correlated across investors
• managers of funds are themselves human and will make these errors too
• managers also have their own types of errors
  - buying portfolios excessively close to a benchmark
  - buying the same stocks as other fund managers (so as not to look bad)
  - window dressing: adding stocks to the portfolio that have been performing well recently
  - on average, pension and mutual fund managers underperform passive investment strategies
    * these managers might be noise traders too

10.4 Can arbitrageurs save the day?
• the last defense of the EMH depends on arbitrage
• even if investor sentiment is correlated and noise traders create incorrectly priced assets
  - arbitrageurs are expected to take the other side of these trades and drive prices back to fundamental values
• a fundamental assumption of behavioral finance is that real-world arbitrage is risky and limited
• arbitrage depends on the existence of "close substitutes" for assets whose prices have been driven to incorrect levels by noise traders
• many securities do not have true substitutes
• often there are no risk-less hedges for arbitrageurs
• mispricing can get even worse, as the managers of LTCM learned
  - this is called noise trader risk

10.5 What do the data say?

• Shiller (1981), "Do stock prices move too much to be justified by subsequent changes in dividends":
  - market prices are too volatile
  - more volatile than explained by a model where prices are expected net present values
  - this work has been criticized by Merton, who said that Shiller did not correctly specify fundamental value
• De Bondt and Thaler (1985), "Does the stock market overreact?":
  - frequently cited and reprinted paper
  - work done at Cornell
  - compare extreme winners and losers
  - find strong evidence of overreaction
  - for every year starting at 1933 they formed portfolios of the best performing stocks over the previous three years
    * "winner portfolios"
  - they also formed portfolios of the worst performing stocks
    * "loser portfolios"
  - then examined returns on these portfolios over the next five years
    * losers consistently outperformed winners
  - difference is difficult to explain as due to differences in risk, at least according to standard models such as CAPM
  - De Bondt and Thaler claim that investors overreact
    * extreme losers are too cheap
    * so they bounce back
  - the opposite is true of extreme winners
• historically, small stocks have earned higher returns than large stocks
  - no evidence that the difference is due to higher risk
  - superior returns of small stocks have been concentrated in January
  - small firm effect and January effect seem to have disappeared over the last 15 years
• market to book value is a measure of "cheapness"
  - high market to book value firms are "growth" stocks
    * they tend to underperform
    * also they tend to be riskier, especially in severe down markets
• October 19, 1987: Dow Jones index dropped 22.6%
  - there was no apparent news that day
• Cutler et al. (1991) looked at the 50 largest one-day market changes
  - many came on days with no major news announcements
• Roll (1988) tried to predict the share of return variation that could be explained by
  - economic influences
  - returns on other stocks in the same industry
  - public firm-specific news
• Roll's findings:
  - \(R^2 = .35\) for monthly data
  - \(R^2 = .2\) for daily data
• Roll's study also shows that there are no "close substitutes" for stocks
  - this lack of close substitutes limits arbitrage
• stocks rise if the company is put on the S&P 500 index
  - this is reaction to "non news"
  - America Online rose 18% when included on the S&P
• In summary, there is now considerable evidence against the EMH
  - This evidence was not found during early testing of the EMH
  - Researchers needed to know what to look for

10.6 References

Cutler, D., Poterba, J., and Summers, L. (1991), Speculative dynamics, Review of Economic Studies, 53, 1839-1885.

De Bondt, W. and Thaler, R. (1985), Does the stock market overreact?, J. of Finance, 40, 793-805.

Jensen, M. (1978), Some anomalous evidence regarding market efficiency, J. of Financial Economics, 6, 95-101.

Roll, R. (1988), R2, J. of Finance, 43, 541-566.

Shiller, R. (1981), Do stock prices move too much to be justified by subsequent changes in dividends, American Economic Review, 71, 421-436.
Shleifer, Andrei (2000), Inefficient Markets: An Introduction to Behavioral Finance, Oxford University Press, Oxford.