Lecture Notes for ORIE 473:
Empirical Methods in Financial Engineering
by D. Ruppert
© Copyright D. Ruppert, 2001
Contents
1 Introduction: 3/21/01
2 Review of Prob and Stats: 3/12/01
2.1 Densities, CDF's, means, variances, and correlation
2.2 Best Linear Prediction
2.2.1 Prediction Error
2.3 Conditional Distributions
2.4 The Normal Distribution
2.4.1 Conditional expectations and variance
2.5 Linear Functions of Random Variables
2.6 Maximum Likelihood Estimation
2.7 Likelihood Ratio Tests
3 Returns: 3/12/01
3.1 Prices and returns
3.2 Log returns
3.3 Behavior of returns
3.4 Common Model — IID Normal Returns
3.5 The Lognormal Model
3.6 Random Walk
3.6.1 Geometric Random Walk
3.7 Are log returns really normally distributed?
3.7.1 Do the GE daily returns look like a geometric random walk?
3.8 Portrait of an econometrician, Eugene Fama
3.9 Other empirical work related to Fama's
3.10 Technical Analysis
3.11 Fundamental Analysis
3.12 Efficient Markets Hypothesis (EMH)
3.12.1 Three types of efficiency
3.12.2 Testing market efficiency
3.13 Summary
4 Univariate Time Series Models: 3/12/01
4.1 Time Series
4.2 Stationary Processes
4.2.1 Weak White Noise
4.2.2 Estimating parameters of a stationary process
4.3 AR(1) processes
4.3.1 Properties of a stationary AR(1) process
4.3.2 Nonstationary AR(1) processes
4.3.3 Estimation
4.4 AR(p) models
4.4.1 Example: GE daily returns
4.5 Moving Average (MA) Processes
4.5.1 MA(1) processes
4.5.2 General MA processes
4.6 ARIMA Processes
4.6.1 The backwards operator
4.6.2 ARMA Processes
4.6.3 The differencing operator
4.6.4 From ARMA processes to ARIMA process
4.7 Model Selection
4.7.1 AIC and SBC
4.7.2 Stepwise regression applied to AR processes
4.7.3 Using ARIMA in SAS: Cree data
4.8 Example: Three-month Treasury bill rates
4.9 Forecasting
4.9.1 GE daily returns
5 Portfolio Selection: 3/12/01
5.1 Trading off expected return and risk
5.2 One risky asset and one risk-free asset
5.2.1 Example
5.2.2 Estimating $E(R)$ and $\sigma_R$
5.3 Two risky assets
5.3.1 Estimating means, standard deviations, and covariances
5.4 Combining two risky assets with a risk-free asset
5.4.1 Tangency portfolio with two risky assets
5.4.2 Effect of $\rho_{12}$
5.5 Harry Markowitz
5.6 Risk-efficient portfolios with N risky assets
5.6.1 Efficient-set mathematics
5.6.2 Selling short
5.6.3 The Interior decorator fallacy
5.6.4 Back to the math
5.6.5 Example: N = 2
5.7 Is the theory useful?
5.8 Example—Global Asset Allocation
5.9 Quadratic programming
6 The Capital Asset Pricing Model: 3/26/01
6.1 Introduction to CAPM
6.2 The capital market line (CML)
6.3 Betas and the Security Market Line
6.3.1 Examples of betas
6.3.2 Comparison of the CML with the SML
6.4 The security characteristic line
6.4.1 Reducing unique risk by diversification
6.5 Some theory
6.5.1 Contributions to the market portfolio's risk
6.5.2 Derivation of the SML
6.6 Estimation of beta and testing the CAPM
6.6.1 Interpretation of alpha
6.7 Summary
7 Pricing Options: 4/12/01
7.1 Introduction
7.2 Call options
7.3 The law of one price
7.3.1 Arbitrage
7.4 Time value of money and present value
7.5 A simple binomial example
7.6 Two-step binomial option pricing
7.7 Arbitrage pricing by expectation
7.8 A general binomial tree model
7.9 Martingales
7.9.1 The risk-neutral world
7.10 Trees to random walks to Brownian motion
7.10.1 Getting more realistic
7.10.2 A three-step binomial tree
7.10.3 More time steps
7.10.4 Properties of Brownian motion
7.11 Geometric Brownian motion
7.12 Using the Black-Scholes formula
7.12.1 How does the option price depend on the inputs?
7.12.2 An example—GE
7.12.3 Early exercise of calls is never optimal
7.12.4 Are there returns on non-trading days?
7.12.5 Implied volatility
7.13 Puts
7.13.1 Pricing puts by binomial trees
7.13.2 Why are puts different than calls?
7.13.3 Put-call parity
7.14 The evolution of option prices
7.15 Intrinsic value and time value
7.16 Black, Scholes, and Merton
7.17 Summary
7.18 References
8 GARCH models: 4/24/01
8.1 Introduction
8.2 Modeling conditional means and variances
8.3 ARCH(1) processes
8.3.1 Example
8.4 The AR(1)/ARCH(1) model
8.5 ARCH(q) models
8.6 GARCH(p, q) models
8.7 Heavy-tailed distributions
8.8 Comparison of ARMA and GARCH processes
8.9 Fitting GARCH models
8.9.1 Example: S&P 500 returns
8.10 I-GARCH models
8.10.1 What does it mean to have an infinite variance?
8.11 GARCH-M processes
8.12 E-GARCH
8.13 Back to the S&P 500 example
8.14 The GARCH zoo
8.15 Applications of GARCH in finance
8.16 Summary
8.17 References
9 Fixed Income Securities: 4/30/01
9.1 Introduction
9.2 Zero coupon bonds
9.2.1 Price and returns fluctuate with the interest rate
9.3 Coupon bonds
9.4 Yield to maturity
9.4.1 Spot rates
9.5 Term structure
9.6 Continuous compounding
9.6.1 Continuous forward rates
9.7 Summary
9.7.1 Introduction
9.7.2 Zero coupon bonds
9.7.3 Risk due to interest rate changes
9.7.4 Coupon bonds
9.7.5 Term structure of interest rates
9.7.6 Continuous compounding
9.8 References
10 Behavioral finance: 5/1/01
10.1 Introduction
10.2 Defense of EMH
10.3 Challenges to the EMH
10.4 Can arbitrageurs save the day?
10.5 What do the data say?
10.6 References
Chapter 1 Introduction: 3/21/01
The title of this course is "Empirical Research Methods in Financial Engineering."
"Empirical" means derived from experience, observation, or experiment — so we are going to work with data. We'll be doing statistics.
Financial engineering is the construction of financial products such as stock options.
Financial engineering uses probability models, e.g., those used to derive the famous Black-Scholes formula.
•  Are these models supported by financial markets data?
•  How are the parameters in these models estimated?
Let's look ahead to the Black-Scholes formula for the price of a European call option. "Now" is called time 0. The maturity date of the option is $T$. The option gives us the right to purchase one share of stock for $E$ dollars at time $T$. Let $S_T$ be the price of the stock at time $T$. At time 0, $T$ and $E$ are known but $S_T$ is unknown.
At time $T$, $S_T$ will become known. If at time $T$ we learn that $S_T > E$, then we will exercise the option and purchase one share. We can immediately sell the share for $S_T$ dollars and earn a profit of $S_T - E$ dollars.
If at time $T$, $S_T \le E$, then we do not exercise the option. The option expires and we lose the original cost of the option, but no more.
The value of the option at time $T$ is, therefore, $\max\{0, S_T - E\}$. But right now, at time 0, what is the value of the option, i.e., the price for which it should sell on the market?
Prior to the 1970's, options were priced by the "seat of the pants." Then Black, Scholes, and Merton deduced the correct price of a call option from a mathematical model (and much hard thinking).
They assumed that one can lend and borrow at a risk-free rate $r$. Thus, if $B_t$ is the price at time $t$ of a risk-free bond purchased for one dollar at time 0, then
$$B_t = \exp(rt).$$
Let $S_t$ be the price of the underlying stock. They assumed that
$$S_t = S_0\exp(\mu t + \sigma W_t),$$
where $\mu$ is a "drift" or "growth rate," $W_t$ is a Brownian motion stochastic process, and $\sigma$ is a standard deviation that measures the volatility of the stock. In this course, you will learn exactly what this model means. Right now, the "take home" message is that there are precise mathematical models of stock price movements that we can check against the data. Also, there are important parameters such as $\mu$ and $\sigma$ that must be estimated from data.
The Black-Scholes formula is
$$C = \Phi(d_1)S_0 - \Phi(d_2)E\exp(-rT),$$
where $C$ is the price of the option at time 0, $\Phi$ is the standard normal CDF,
$$d_1 = \frac{\log(S_0/E) + (r + \sigma^2/2)T}{\sigma\sqrt{T}}, \qquad d_2 = d_1 - \sigma\sqrt{T}.$$
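To make the formula concrete, here is a minimal sketch that evaluates the Black-Scholes call price $C = \Phi(d_1)S_0 - \Phi(d_2)E\exp(-rT)$, with the standard $d_1$ and $d_2$, in standard-library Python, alongside the payoff $\max\{0, S_T - E\}$ at maturity. The function names `norm_cdf`, `black_scholes_call`, and `call_payoff` and the inputs ($S_0 = 100$, $E = 100$, $r = .05$, $\sigma = .2$, $T = 1$) are hypothetical choices for illustration; $r$ and $\sigma$ are assumed to be per year, $T$ is in years, and log is the natural log.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x: float) -> float:
    """Standard normal CDF Phi(x), written with the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def black_scholes_call(S0: float, E: float, r: float, sigma: float, T: float) -> float:
    """Black-Scholes price at time 0 of a European call with exercise price E and maturity T."""
    d1 = (log(S0 / E) + (r + sigma ** 2 / 2.0) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return norm_cdf(d1) * S0 - norm_cdf(d2) * E * exp(-r * T)


def call_payoff(S_T: float, E: float) -> float:
    """Value of the call at maturity: max{0, S_T - E}."""
    return max(0.0, S_T - E)


# Hypothetical inputs, chosen only for illustration.
price = black_scholes_call(S0=100.0, E=100.0, r=0.05, sigma=0.2, T=1.0)
print(round(price, 4))  # about 10.4506
```

Raising $\sigma$ in this sketch raises the price, which is one way to see why the volatility parameter must be estimated carefully.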
The formula is, quite obviously, complicated and it is not easy to derive, but it is easy to compute and was hard-wired into calculators almost immediately; the Black-Scholes formula and hand-held calculators both emerged in the early 1970's.
We will be interested in the underlying assumptions behind the formula. Remember: GI - GO (garbage in, garbage out). If the assumptions don't hold, then there is no reason to trust the Black-Scholes formula, despite the impressive mathematics behind it.
The equation $B_t = \exp(rt)$ of continuous compounding is the solution to the differential equation
$$\frac{d}{dt}B_t = rB_t.$$
The general solution is $B_t = B_0\exp(rt)$, and $B_0 = 1$ since we have assumed that the bond can be purchased for one dollar at time 0.
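A quick numerical check that $\exp(rt)$ solves $dB_t/dt = rB_t$ with $B_0 = 1$: integrate the equation with Euler's method and watch the result approach $\exp(rt)$ as the step size shrinks. Each Euler step multiplies $B$ by $(1 + rh)$, which is the same as compounding at rate $rh$ per step. The rate $r = .05$, horizon $t = 10$, and the helper name `euler_bond_price` are hypothetical.

```python
from math import exp


def euler_bond_price(r: float, t: float, n_steps: int) -> float:
    """Approximate B_t from dB/dt = r * B with B_0 = 1 using n_steps Euler steps."""
    h = t / n_steps
    B = 1.0  # bond bought for one dollar at time 0
    for _ in range(n_steps):
        B += r * B * h  # Euler step: B <- B * (1 + r * h)
    return B


r, t = 0.05, 10.0  # hypothetical rate and horizon
for n in (10, 1_000, 100_000):
    print(n, euler_bond_price(r, t, n))
print("exp(rt) =", exp(r * t))
```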
Where does
$$S_t = S_0\exp(\sigma W_t + \mu t)$$
come from? If $\sigma$ were 0, then this would be exponential growth, $S_t = S_0\exp(\mu t)$, just like the bond price $B_t$. The term $\sigma W_t$ comes from the random behavior of stock prices; $\sigma$ is a standard deviation, essentially of the changes in the stock prices. The random process $W_t$ is something we will need to learn much more about; and we will.
In this course we will
•  study models of financial markets (they are complex but fascinating)
•  learn to test the models: do they fit financial markets data adequately?
•  estimate parameters in the models such as $\mu$ and $\sigma$ that are essential for correct pricing of financial products such as a call option.
Key question: How do the prices of stocks and other financial assets behave?
Looking ahead to where this course is going
•  We will start by defining "returns" on the prices of a stock
•  We will then look at "ARIMA models"
-  these are models for "time series," which are sequences of data sampled over time
-  ARIMA models are stochastic processes
•  After looking at returns and time series models of return behavior we will look at optimal portfolios of risky assets (e.g., stocks) and of risky assets and risk-free bonds (e.g., US Treasury bills).
-  This will take us to the famous Capital Asset Pricing Model (CAPM)
Looking even farther ahead, we will later return to the pricing of stock options by the Black-Scholes formula and cover other areas of financial engineering such as the term structure of interest rates.
But before we get into applications of probability and statistics in financial engineering, we need to review probability and statistics so that we are all up to speed.
Chapter 2
Review of Prob and Stats: 3/12/01
2.1   Densities, CDF's, means, variances, and correlation
random variable: a quantity with a large set of possible values, only one of which will actually occur
continuous random variable: $X$ is a continuous random variable if it has a p.d.f. $f_X$ such that
$$P(X \in A) = \int_A f_X(x)\,dx \quad \text{for all sets } A.$$
The CDF of $X$ is
$$F_X(x) := \int_{-\infty}^{x} f_X(u)\,du.$$
The expectation of $X$ is
$$E(X) := \int_{-\infty}^{+\infty} x f_X(x)\,dx.$$
The variance of $X$ is
$$\sigma_X^2 := \int \{x - E(X)\}^2 f_X(x)\,dx = E\{X - E(X)\}^2.$$
Useful formula: $\sigma_X^2 = E(X^2) - \{E(X)\}^2$.
The standard deviation is the square root of the variance.
A pair of random variables, $(X, Y)$, has a bivariate density $f_{XY}(x, y)$.
Covariance:
$$\sigma_{XY} = E[\{X-E(X)\}\{Y-E(Y)\}] = \iint \{x-E(X)\}\{y-E(Y)\}f_{XY}(x,y)\,dx\,dy.$$
Useful formulas:
•  $\sigma_{XY} = E(XY) - E(X)E(Y)$
•  $\sigma_{XY} = E[\{X - E(X)\}Y]$
•  $\sigma_{XY} = E[\{Y - E(Y)\}X]$
•  $\sigma_{XY} = E(XY)$ if $E(X) = 0$ or $E(Y) = 0$
The correlation coefficient between $X$ and $Y$ is $\rho_{XY} := \sigma_{XY}/(\sigma_X\sigma_Y)$.
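As a small check of the first useful formula, the sketch below treats a made-up data set as a distribution that puts equal probability on each observed pair (so averages divide by $n$). It computes the covariance both from the definition and from $E(XY) - E(X)E(Y)$, then the correlation. The data and function names are hypothetical.

```python
def mean(v):
    return sum(v) / len(v)


def cov_definition(x, y):
    """Average of {x - mean(x)}{y - mean(y)}: the definition of sigma_XY."""
    mx, my = mean(x), mean(y)
    return mean([(a - mx) * (b - my) for a, b in zip(x, y)])


def cov_shortcut(x, y):
    """The same quantity computed as E(XY) - E(X)E(Y)."""
    return mean([a * b for a, b in zip(x, y)]) - mean(x) * mean(y)


def correlation(x, y):
    """rho_XY = sigma_XY / (sigma_X * sigma_Y)."""
    return cov_definition(x, y) / (cov_definition(x, x) * cov_definition(y, y)) ** 0.5


x = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up data
y = [2.0, 4.1, 5.9, 8.2, 9.8]
print(cov_definition(x, y), cov_shortcut(x, y), correlation(x, y))
```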
2.2    Best Linear Prediction
Suppose we observe $X$ and want to predict $Y$; observing $X$ helps with linear prediction only if $\rho_{XY}$ is not zero.
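Best linear prediction means finding $\beta_0$ and $\beta_1$ so that
$$H(\beta_0,\beta_1) := E\{Y-(\beta_0 + \beta_1X)\}^2$$
is minimized. Expanding the square gives
$$H(\beta_0,\beta_1) = E(Y^2) - 2\beta_0E(Y) - 2\beta_1E(XY) + E\{(\beta_0 + \beta_1X)^2\}.$$
Setting the partial derivatives to zero we get
$$0 = -E(Y) + \beta_0 + \beta_1E(X) \quad\text{and}$$
$$0 = -E(XY) + \beta_0E(X) + \beta_1E(X^2).$$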
After some algebra we find that
$$\beta_1 = \sigma_{XY}/\sigma_X^2 \tag{2.1}$$
and
$$\beta_0 = E(Y) - \beta_1E(X) = E(Y) - \frac{\sigma_{XY}}{\sigma_X^2}E(X).$$
Thus, the best linear predictor of $Y$ is
$$\hat{Y} := \beta_0 + \beta_1X = E(Y) + \frac{\sigma_{XY}}{\sigma_X^2}\{X - E(X)\}.$$
2.2.1    Prediction Error
The prediction error is $Y - \hat{Y}$. It is easy to prove that $E\{Y - \hat{Y}\} = 0$, so that the prediction is "unbiased." With a little algebra we can show that the expected squared prediction error is
$$E\{Y - \hat{Y}\}^2 = \sigma_Y^2 - \frac{\sigma_{XY}^2}{\sigma_X^2} = \sigma_Y^2(1 - \rho_{XY}^2).$$
If we do not observe $X$, then the best predictor of $Y$ is $E(Y)$ and the expected squared prediction error is $\sigma_Y^2$. Therefore, $\rho_{XY}^2$ is the fraction by which the expected squared prediction error is reduced when $X$ is known. Example: If $\rho_{XY} = .5$, then the expected squared prediction error is reduced by 25% by observing $X$. If $\sigma_Y^2 = 3$, then the expected squared prediction error is 3 if $X$ is unobserved but only 2.25 if $X$ is observed.
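The numbers in this example can be checked by simulation. The sketch below generates pairs with $\rho_{XY} = .5$ and $\sigma_Y^2 = 3$ (the construction of $Y$ from $X$ plus independent normal noise is an assumption made only to get those values), fits $\beta_0$ and $\beta_1$ by the sample analogues of the formulas above, and compares the average squared prediction errors with and without $X$.

```python
import random
from math import sqrt

random.seed(1)
n = 100_000
rho, var_y = 0.5, 3.0  # population values taken from the example in the text
sd_y = sqrt(var_y)

# Hypothetical construction: X ~ N(0, 1) and Y = sd_y*(rho*X + sqrt(1 - rho^2)*e),
# so that Var(Y) = var_y and Corr(X, Y) = rho.
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
ys = [sd_y * (rho * x + sqrt(1.0 - rho ** 2) * random.gauss(0.0, 1.0)) for x in xs]

mx, my = sum(xs) / n, sum(ys) / n
s_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
s_xx = sum((x - mx) ** 2 for x in xs) / n
beta1 = s_xy / s_xx  # sample analogue of (2.1)
beta0 = my - beta1 * mx

mse_without_x = sum((y - my) ** 2 for y in ys) / n  # predict Y by its mean
mse_with_x = sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys)) / n  # best linear predictor
print(beta1, mse_without_x, mse_with_x)  # roughly 0.87, 3, and 2.25
```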
2.3    Conditional Distributions
Let $f_{XY}(x, y)$ be the joint density of a pair of random variables, $(X, Y)$.
The marginal density of $X$ is $f_X(x) := \int f_{XY}(x, y)\,dy$, and similarly for $f_Y$.
The conditional density of $Y$ given $X$ is
$$f_{Y|X}(y|x) = \frac{f_{XY}(x,y)}{f_X(x)}.$$
The conditional expectation of $Y$ given $X$ is just the expectation calculated using $f_{Y|X}(y|x)$:
$$E(Y|X = x) = \int y f_{Y|X}(y|x)\,dy,$$
which is, of course, a function of $x$. The conditional variance of $Y$ given $X$ is
$$\mathrm{Var}(Y|X = x) = \int \{y - E(Y|X = x)\}^2 f_{Y|X}(y|x)\,dy.$$
Example: Suppose $f_{XY}(x, y) = 2$ on $0 < x < 1$ and $x < y < 1$.
Then the marginal density of $X$ is $f_X(x) = 2(1 - x)$.
The conditional density of $Y$ given $X$ is $f_{Y|X}(y|x) = (1-x)^{-1}$ for $x < y < 1$.
The conditional expectation of $Y$ is
$$E(Y|X = x) = \frac{1+x}{2}.$$
The conditional variance of $Y$ is
$$\mathrm{Var}(Y|X = x) = \frac{(1-x)^2}{12}.$$
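A Monte Carlo sketch can confirm the conditional mean and variance in this example. It relies on the fact that the smaller and larger of two independent Uniform(0,1) draws have joint density 2 on $0 < x < y < 1$, exactly the density above. Conditioning on $X = x_0$ is approximated by keeping draws with $X$ in a narrow bin around $x_0$; the choice $x_0 = .4$ and the bin half-width are hypothetical.

```python
import random

random.seed(2)
n = 400_000
x0, half_width = 0.4, 0.01  # hypothetical conditioning point and bin half-width

# (min, max) of two independent Uniform(0,1) draws has joint density 2 on 0 < x < y < 1.
ys_near_x0 = []
for _ in range(n):
    u, v = random.random(), random.random()
    x, y = min(u, v), max(u, v)
    if abs(x - x0) < half_width:
        ys_near_x0.append(y)

m = len(ys_near_x0)
cond_mean = sum(ys_near_x0) / m
cond_var = sum((y - cond_mean) ** 2 for y in ys_near_x0) / m
print(m, cond_mean, (1 + x0) / 2)        # simulated vs. (1 + x)/2
print(cond_var, (1 - x0) ** 2 / 12)     # simulated vs. (1 - x)^2/12
```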
2.4   The Normal Distribution
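The standard normal distribution has density
$$\phi(x) := \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right).$$
The $N(\mu, \sigma^2)$ density is
$$\frac{1}{\sigma}\phi\left(\frac{x-\mu}{\sigma}\right).$$
The standard normal CDF is
$$\Phi(x) := \int_{-\infty}^{x}\phi(u)\,du.$$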
$\Phi$ can be evaluated using tables (ugh!) or, more easily, using software such as MATLAB or MINITAB.
If $X \sim N(\mu, \sigma^2)$, then $P(X \le x) = \Phi\{(x - \mu)/\sigma\}$.
Example: If $X \sim N(5,4)$, then what is $P(X \le 7)$? Answer: $P(X \le 7) = \Phi\{(7 - 5)/2\} = \Phi(1) = .8413$. In MATLAB, "cdfn(1)" gives "ans = 0.8413".
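Outside MATLAB, $\Phi$ is easy to evaluate from the error function, $\Phi(x) = \{1 + \mathrm{erf}(x/\sqrt{2})\}/2$. The sketch below reproduces the example above in standard-library Python and cross-checks it with `statistics.NormalDist`, which is parameterized by the mean and the standard deviation (not the variance). The helper name `Phi` is our own.

```python
from math import erf, sqrt
from statistics import NormalDist


def Phi(x: float) -> float:
    """Standard normal CDF via the error function: Phi(x) = {1 + erf(x / sqrt(2))} / 2."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


# X ~ N(5, 4): mean 5 and variance 4, so the standard deviation is 2.
mu, sigma = 5.0, 2.0
p = Phi((7.0 - mu) / sigma)
print(round(p, 4))                               # 0.8413
print(round(NormalDist(mu, sigma).cdf(7.0), 4))  # same value from the standard library
```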
2.4.1    Conditional expectations and variance
The calculation of conditional expectations and variances can be difficult for some probability distributions, but it is quite easy for a pair (X, Y) that has a bivariate normal distribution.
For a bivariate normal pair, the conditional expectation of $Y$ given $X$ equals the best linear predictor of $Y$ given $X$:
$$E(Y|X) = E(Y) + \frac{\sigma_{XY}}{\sigma_X^2}\{X - E(X)\}.$$
The conditional variance of $Y$ given $X$ is the expected squared prediction error:
$$\mathrm{Var}(Y|X) = \sigma_Y^2(1 - \rho_{XY}^2).$$
2.5   Linear Functions of Random Variables
$E(aY + b) = aE(Y) + b$, where $Y$ is a random variable and $a$ and $b$ are constants. Also,
$$\mathrm{Var}(aY + b) = a^2\mathrm{Var}(Y).$$
If $X$ and $Y$ are random variables and $w_1$ and $w_2$ are constants, then
$$E(w_1X + w_2Y) = w_1E(X) + w_2E(Y),$$
and
$$\mathrm{Var}(w_1X + w_2Y) = w_1^2\mathrm{Var}(X) + 2w_1w_2\mathrm{Cov}(X, Y) + w_2^2\mathrm{Var}(Y).$$
Check that
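Let $\mathbf{X} = (X_1, \ldots, X_N)^T$ be a random vector. We define the expectation vector of $\mathbf{X}$ to be
$$E(\mathbf{X}) := \begin{pmatrix} E(X_1) \\ \vdots \\ E(X_N) \end{pmatrix}.$$
The covariance matrix of $\mathbf{X}$ is
$$\mathrm{COV}(\mathbf{X}) := \begin{pmatrix} \mathrm{Var}(X_1) & \mathrm{Cov}(X_1,X_2) & \cdots & \mathrm{Cov}(X_1,X_N) \\ \mathrm{Cov}(X_2,X_1) & \mathrm{Var}(X_2) & \cdots & \mathrm{Cov}(X_2,X_N) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(X_N,X_1) & \mathrm{Cov}(X_N,X_2) & \cdots & \mathrm{Var}(X_N) \end{pmatrix}.$$
Let $\mathbf{w} = (w_1, \ldots, w_N)^T$ be a vector of weights. Then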
$$\mathbf{w}^T\mathbf{X} = \sum_{i=1}^{N} w_iX_i$$
is a weighted sum of the components of $\mathbf{X}$; it is a random variable.
One can show that
$$E(\mathbf{w}^T\mathbf{X}) = \mathbf{w}^T\{E(\mathbf{X})\}.$$
Also
$$\mathrm{Var}(\mathbf{w}^T\mathbf{X}) = \sum_{i=1}^{N}\sum_{j=1}^{N} w_iw_j\mathrm{Cov}(X_i, X_j).$$
This result can be expressed more simply using vector/matrix notation:
$$\mathrm{Var}(\mathbf{w}^T\mathbf{X}) = \mathbf{w}^T\mathrm{COV}(\mathbf{X})\mathbf{w}. \tag{2.2}$$
Important fact: If $\mathbf{X}$ has a multivariate normal distribution, then $\mathbf{w}^T\mathbf{X}$ is a normally distributed random variable.
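Example: Suppose that $E(X_1) = 1$, $E(X_2) = 1.5$, $\sigma_{X_1}^2 = 1$, $\sigma_{X_2}^2 = 2$, and $\mathrm{Cov}(X_1,X_2) = .5$. Find $E(.3X_1 + .7X_2)$ and $\mathrm{Var}(.3X_1 + .7X_2)$. If $(X_1, X_2)^T$ is bivariate normal, find $P(.3X_1 + .7X_2 < 2)$.
Answer: $E(.3X_1 + .7X_2) = (.3)(1) + (.7)(1.5) = 1.35$, $\mathrm{Var}(.3X_1 + .7X_2) = (.3)^2(1) + 2(.3)(.7)(.5) + (.7)^2(2) = 1.28$, and $P(.3X_1 + .7X_2 < 2) = \Phi\{(2 - 1.35)/\sqrt{1.28}\} = \Phi(.5745) = .7172$.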
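The same example can be worked in code, which also shows that the double sum and the matrix form (2.2) give the same variance. This is a plain-Python sketch with lists standing in for vectors and matrices; the variable names are our own.

```python
from math import sqrt
from statistics import NormalDist

w = [0.3, 0.7]          # weights
mean_vec = [1.0, 1.5]   # E(X_1), E(X_2)
cov = [[1.0, 0.5],      # COV(X): variances on the diagonal,
       [0.5, 2.0]]      # Cov(X_1, X_2) = .5 off the diagonal

# E(w^T X) = w^T E(X)
mean_wx = sum(wi * mi for wi, mi in zip(w, mean_vec))

# Var(w^T X) as the double sum over i and j ...
var_double_sum = sum(w[i] * w[j] * cov[i][j] for i in range(2) for j in range(2))

# ... and as the matrix product w^T COV(X) w from (2.2)
cov_w = [sum(cov[i][j] * w[j] for j in range(2)) for i in range(2)]
var_matrix = sum(w[i] * cov_w[i] for i in range(2))

# Under bivariate normality, w^T X is N(mean_wx, var_double_sum); NormalDist takes the sd.
prob = NormalDist(mean_wx, sqrt(var_double_sum)).cdf(2.0)
print(round(mean_wx, 2), round(var_double_sum, 2), round(var_matrix, 2), round(prob, 4))
```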
2.6    Maximum Likelihood Estimation
Maximum likelihood is the most important and widespread method of estimation. Many well-known estimators such as the sample mean and the least-squares estimator in regression are maximum likelihood estimators. Maximum likelihood is very useful in practice and tends to give more precise estimates than other methods of estimation.
Let $\mathbf{Y} = (Y_1, \ldots, Y_n)^T$ be a vector of data and let $\theta = (\theta_1, \ldots, \theta_p)^T$ be a vector of parameters. Suppose that $f(\mathbf{y}; \theta)$ is the density of $\mathbf{Y}$, which depends on the parameters.
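Example: Suppose that $Y_1, \ldots, Y_n$ are IID $N(\mu, \sigma^2)$. Then $\theta = (\mu, \sigma^2)$. Also,
$$f(\mathbf{y};\theta) = \prod_{i=1}^{n}\frac{1}{\sigma}\phi\left(\frac{y_i-\mu}{\sigma}\right) = \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i-\mu)^2\right\}.$$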
$L(\theta) := f(\mathbf{Y}; \theta)$ is called the "likelihood function" and is the density evaluated at the observed data. It tells us the likelihood of what was actually observed. The maximum likelihood estimator (MLE) is the value of $\theta$ that maximizes the likelihood function. In other words, the MLE is the value of $\theta$ that maximizes the likelihood of the data that were observed. We will denote the MLE by $\hat\theta_{ML}$. Often it is mathematically easier to maximize $\log\{L(\theta)\}$; since the log function is increasing, maximizing $\log\{L(\theta)\}$ is equivalent to maximizing $L(\theta)$.
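Example: In the example above, it is an easy calculus exercise to show that $\hat\mu_{ML} = \bar{Y}$. Also, with $\mu$ fixed at its MLE, the MLE of $\sigma^2$ solves
$$0 = \frac{d}{d\sigma^2}\log\{L(\bar{Y}, \sigma^2)\} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(Y_i - \bar{Y})^2.$$
The solution to this equation is
$$\hat\sigma^2_{ML} = \frac{1}{n}\sum_{i=1}^{n}(Y_i - \bar{Y})^2.$$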
The MLE of $\sigma^2$ has a small bias. The "bias-corrected" MLE is the so-called "sample variance," defined as
$$s^2 := \frac{1}{n-1}\sum_{i=1}^{n}(Y_i - \bar{Y})^2.$$
In a "textbook example" such as the one above, it is possible to find an explicit formula for the MLE. With more complex models, there is no explicit formula for the MLE. Rather, one writes a program to compute $\log\{L(\theta)\}$ for any value of $\theta$ and then uses optimization software to maximize this function numerically. For some models, such as the ARIMA time series models discussed in Chapter 4, there are software packages, e.g., MINITAB and SAS, that compute the MLE; the computation of the log-likelihood function has been pre-programmed.
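The general recipe (code $\log\{L(\theta)\}$, then maximize it numerically) can be sketched on the normal example, where the answer is known: $\hat\mu_{ML} = \bar{Y}$ and $\hat\sigma^2_{ML} = n^{-1}\sum(Y_i - \bar{Y})^2$. The optimizer here is a hand-written golden-section search applied to each parameter in turn (coordinate ascent), standing in for real optimization software. The simulated data (true $\mu = 1$, $\sigma^2 = 4$), search ranges, and function names are hypothetical.

```python
import random
from math import log, pi

random.seed(3)
y = [random.gauss(1.0, 2.0) for _ in range(500)]  # simulated data: mu = 1, sigma = 2
n = len(y)


def log_lik(mu: float, sigma2: float) -> float:
    """Normal log-likelihood log L(mu, sigma^2) for the data y."""
    ss = sum((v - mu) ** 2 for v in y)
    return -0.5 * n * log(2 * pi) - 0.5 * n * log(sigma2) - ss / (2 * sigma2)


def golden_max(f, lo: float, hi: float, tol: float = 1e-8) -> float:
    """Maximize a unimodal function of one variable on [lo, hi] by golden-section search."""
    g = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):      # maximum lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:                # maximum lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2


# Coordinate ascent: maximize over mu, then sigma^2, and repeat from crude starting values.
mu_hat, s2_hat = 0.0, 1.0
for _ in range(3):
    mu_hat = golden_max(lambda m: log_lik(m, s2_hat), -10.0, 10.0)
    s2_hat = golden_max(lambda s2: log_lik(mu_hat, s2), 1e-6, 100.0)

ybar = sum(y) / n
s2_closed = sum((v - ybar) ** 2 for v in y) / n
print(mu_hat, ybar)       # numerical vs. explicit MLE of mu
print(s2_hat, s2_closed)  # numerical vs. explicit MLE of sigma^2
```

In practice one would call a tested optimizer rather than write one; the point is only that the numerical and explicit answers agree.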
2.7   Likelihood Ratio Tests
Likelihood ratio tests, like maximum likelihood estimation, are a convenient, all-purpose tool. We will consider likelihood ratio tests when we wish to test a restriction on a subset of the parameters. Let
$$\theta = \begin{pmatrix}\theta_1 \\ \theta_2\end{pmatrix}$$
be a partitioning of the parameter vector into two components. Suppose we want to test a hypothesis about $\theta_1$ without making any hypothesis about the value of $\theta_2$. For example, we might want to test that a population mean is zero; then $\theta_1 = \mu$ and $\theta_2 = \sigma^2$.
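Let $\theta_{1,0}$ be the hypothesized value of $\theta_1$, e.g., $\theta_{1,0} = 0$ if we want to test that $\mu$ is zero. Then the hypotheses are
$$H_0: \theta_1 = \theta_{1,0} \quad\text{and}\quad H_1: \theta_1 \ne \theta_{1,0}.$$
For example, if we are testing that $\mu$ is zero, then the hypotheses are
$$H_0: \mu = 0 \quad\text{and}\quad H_1: \mu \ne 0.$$
Let $\hat\theta_{ML}$ be the maximum likelihood estimator and let $\hat\theta_{2,0}$ be the value of $\theta_2$ that maximizes $L(\theta)$ when $\theta_1 = \theta_{1,0}$.
Idea: If $H_0$ is true, then $L(\theta_{1,0}, \hat\theta_{2,0})$ should be similar to $L(\hat\theta_{ML})$. Otherwise, $L(\theta_{1,0}, \hat\theta_{2,0})$ should be smaller than $L(\hat\theta_{ML})$.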
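The likelihood ratio test rejects $H_0$ if
$$2\left[\log\{L(\hat\theta_{ML})\} - \log\{L(\theta_{1,0}, \hat\theta_{2,0})\}\right] > \chi^2_{\alpha,\dim(\theta_1)}.$$
Here $\dim(\theta_1)$ is the dimension (number of components) of $\theta_1$, and $\chi^2_{\alpha,k}$ is the $\alpha$ upper-probability value of the chi-squared distribution with $k$ degrees of freedom; the probability above $\chi^2_{\alpha,k}$ is $\alpha$.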
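Example: Suppose again that $Y_1, \ldots, Y_n$ are IID $N(\mu, \sigma^2)$ and $\theta = (\mu, \sigma^2)$. We want to test that $\mu$ is zero. Note that
$$\log(L) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(Y_i - \mu)^2.$$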
If we evaluate $\log(L)$ at the MLE, we get
$$\log\{L(\bar{Y}, \hat\sigma^2_{ML})\} = -\frac{n}{2}\{1 + \log(2\pi) + \log(\hat\sigma^2_{ML})\}.$$
The value of $\sigma^2$ that maximizes $L$ when $\mu = 0$ is
$$\hat\sigma^2_0 = \frac{1}{n}\sum_{i=1}^{n}Y_i^2.$$
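Therefore,
$$2\left[\log\{L(\bar{Y}, \hat\sigma^2_{ML})\} - \log\{L(0, \hat\sigma^2_0)\}\right] = n\log\left(\frac{\hat\sigma^2_0}{\hat\sigma^2_{ML}}\right) = n\log\left\{\frac{\sum_{i=1}^{n}Y_i^2}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}\right\}.$$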
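The statistic $n\log(\hat\sigma^2_0/\hat\sigma^2_{ML})$ takes only a few lines to compute. In the sketch below the data are simulated (true mean .3, $n = 100$, and the seed are arbitrary choices, not from the notes). The chi-squared tail probability with one degree of freedom uses the identity $P(\chi^2_1 > x) = P(|Z| > \sqrt{x}) = \mathrm{erfc}(\sqrt{x/2})$ for standard normal $Z$, so only the standard library is needed; the helper name `chi2_1_upper_tail` is hypothetical.

```python
import random
from math import erfc, log, sqrt

random.seed(4)
y = [random.gauss(0.3, 1.0) for _ in range(100)]  # simulated data; true mean 0.3
n = len(y)
ybar = sum(y) / n

s2_ml = sum((v - ybar) ** 2 for v in y) / n  # MLE of sigma^2 with mu unrestricted
s2_0 = sum(v ** 2 for v in y) / n            # MLE of sigma^2 when mu = 0

lr_stat = n * log(s2_0 / s2_ml)  # 2[log L(MLE) - log L(restricted MLE)]


def chi2_1_upper_tail(x: float) -> float:
    """P(chi-squared with 1 df > x); a chi2(1) variable is Z^2 with Z standard normal."""
    return erfc(sqrt(x / 2.0))


critical_value = 3.841  # chi^2_{.05,1}, approximately
p_value = chi2_1_upper_tail(lr_stat)
print(lr_stat, p_value, lr_stat > critical_value)
```

Because $\sum Y_i^2 = \sum(Y_i - \bar{Y})^2 + n\bar{Y}^2$, the statistic is never negative.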
The likelihood ratio test rejects $H_0$ if
$$n\log\left\{\frac{\sum_{i=1}^{n}Y_i^2}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2}\right\} > \chi^2_{\alpha,1}.$$
Chapter 3 Returns: 3/12/01
3.1   Prices and returns
Let $P_t$ be the price of an asset at time $t$. Assuming no dividends, the net return is
$$R_t = \frac{P_t}{P_{t-1}} - 1 = \frac{P_t - P_{t-1}}{P_{t-1}}.$$
The simple gross return is
$$\frac{P_t}{P_{t-1}} = 1 + R_t.$$
Example: If $P_t = 2$ and $P_{t+1} = 2.1$, then $1 + R_{t+1} = 1.05$ and $R_{t+1} = .05$. The gross return over the most recent $k$ periods ($t - k$ to $t$) is
$$1 + R_t(k) := \frac{P_t}{P_{t-k}} = \left(\frac{P_t}{P_{t-1}}\right)\left(\frac{P_{t-1}}{P_{t-2}}\right)\cdots\left(\frac{P_{t-k+1}}{P_{t-k}}\right) = (1 + R_t)\cdots(1 + R_{t-k+1}).$$
Returns are scale-free, meaning that they do not depend on units (dollars, cents, etc.). Returns are not unitless, however: they depend on the units of $t$ (hour, day, etc.). Example:
Time        t-2     t-1     t       t+1
P           200     210     206     212
1 + R               1.05    .981    1.03
1 + R(2)                    1.03    1.01
1 + R(3)                            1.06
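The table can be regenerated from the four prices. The sketch below computes $1 + R(k) = P_t/P_{t-k}$ for $k = 1, 2, 3$ and checks that the three-period gross return equals the product of the one-period gross returns. The list layout (oldest price first) and the helper name `gross_return` are our own choices.

```python
P = [200.0, 210.0, 206.0, 212.0]  # prices at t-2, t-1, t, t+1


def gross_return(P, end, k=1):
    """1 + R(k) = P[end] / P[end - k], the gross return over the k periods ending at index end."""
    return P[end] / P[end - k]


for k in (1, 2, 3):
    row = [round(gross_return(P, end, k), 3) for end in range(k, len(P))]
    print(f"1 + R({k}):", row)

# The k-period gross return is the product of the one-period gross returns.
prod = gross_return(P, 3) * gross_return(P, 2) * gross_return(P, 1)
print(prod, gross_return(P, 3, 3))
```

Printed to three decimals, the last one-period return is 1.029, which the table rounds to 1.03.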
3.2    Log returns
Continuously compounded returns, also known as "log returns," are
$$r_t := \log(1 + R_t) = \log\left(\frac{P_t}{P_{t-1}}\right) = p_t - p_{t-1},$$
where
$$p_t := \log(P_t).$$
[Notation: $\log(x)$ will mean the natural logarithm of $x$ throughout these notes. $\log_{10}(x)$ will be used to denote the logarithm to base ten, if it is needed.]
Advantage — simplicity of multiperiod returns
$$\begin{aligned}
r_t(k) &:= \log\{1 + R_t(k)\} \\
&= \log\{(1 + R_t)\cdots(1 + R_{t-k+1})\} \\
&= \log(1 + R_t) + \cdots + \log(1 + R_{t-k+1}) \\
&= r_t + r_{t-1} + \cdots + r_{t-k+1}.
\end{aligned}$$
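Using the four prices from the returns table in Section 3.1, a short check confirms that the multiperiod log return is just the sum of the one-period log returns, and that exponentiating the sum recovers the multiperiod net return. Variable names are our own.

```python
from math import exp, log

P = [200.0, 210.0, 206.0, 212.0]  # prices at t-2, t-1, t, t+1
r = [log(P[i] / P[i - 1]) for i in range(1, len(P))]  # one-period log returns

r3 = log(P[-1] / P[0])  # three-period log return, log{1 + R(3)}
print(r3, sum(r))       # equal up to rounding error
print(exp(sum(r)) - 1)  # back to the three-period net return, .06
```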
3.3    Behavior of returns
What can we say about returns?
•  They cannot be perfectly predicted — i.e., they are random.
•  If we were ancient Greeks, we would think of the returns as determined by the Gods or Fates (three Goddesses of destiny). The Greeks did not seem to realize that random phenomena do exhibit some regularities such as the law of large numbers and the central limit theorem.
Peter Bernstein has written an interesting popular book "Against the Gods: The Remarkable Story of Risk." He chronicles the developments of probability theory and our understanding of risk.
It took a surprisingly long time for probability theory to develop. The ancient Greeks did not have probability theory.
Probability arose out of gambling during the Renaissance.
University of Chicago economist Frank Knight (1916 Cornell PhD) distinguishes between
•  measurable uncertainty or "risk proper" (e.g., games of chance) where the probabilities are known
•  unmeasurable uncertainty (e.g., finance) where the probabilities are unknown
At time $t - 1$, $P_t$ and $R_t$ are not only unknown, but we do not know their probability distributions.
However, we can estimate these probability distributions if we are willing to make an assumption.
Leap of Faith
Future returns will be similar to past returns.
More precisely, the probability distribution of $P_t$ can be determined from past data.
With this (big) assumption, we can get somewhere — and we will!
Asset pricing models (e.g., CAPM) use the joint distribution of a cross-section $\{R_{1t}, \ldots, R_{Nt}\}$ of returns on $N$ assets at a single time $t$.
Other models use the time series $\{R_1, R_2, \ldots, R_t\}$ of returns on a single asset. We will start with a single asset.
3.4    Common Model — IID Normal Returns
Here $R_1, R_2, \ldots$ are the returns from a single asset. A common model is that they are
1.  mutually independent
2.  identically distributed, i.e., they all have the same distribution (in particular, the same mean and variance)
3.  normally distributed
IID = independent and identically distributed
There are (at least) two problems with this model:
•  The model implies the possibility of unlimited losses, but liability is usually limited; $R_t \ge -1$ since you can lose no more than your investment.
•  $1 + R_t(k) = (1 + R_t)(1 + R_{t-1})\cdots(1 + R_{t-k+1})$ is not normal: sums of normals are normal, but products of normals are not.
3.5    The Lognormal Model
A second model assumes that the continuously compounded single-period returns, a.k.a. the log returns and denoted by $r_t$, are IID. Recall that the log return is
$$r_t = \log(1 + R_t),$$
where $1 + R_t$ is the simple gross return.
Thus, we assume that
$$\log(1 + R_t) \sim N(\mu, \sigma^2),$$
so that $1 + R_t = \exp(\text{normal r.v.}) > 0$ and hence $R_t > -1$. This solves the first problem.
Also,
$$1 + R_t(k) = (1 + R_t)\cdots(1 + R_{t-k+1}) = \exp(r_t)\cdots\exp(r_{t-k+1}) = \exp(r_t + \cdots + r_{t-k+1}).$$
Therefore,
$$\log\{1 + R_t(k)\} = r_t + \cdots + r_{t-k+1}.$$
Sums of independent normals are normal $\Rightarrow$ the second problem is solved: normality of single-period log returns implies normality of multiple-period log returns.
The lognormal distribution goes back to Louis Bachelier (1900).
•  dissertation at Sorbonne called The Theory of Speculation
•  Poincaré: "M. Bachelier has evidenced an original and precise mind [but] the subject is somewhat remote from those our other candidates are in the habit of treating."
•  Bachelier was awarded "mention honorable" rather than "mention très honorable"; Bachelier never found a decent academic job.
•  Bachelier anticipated Einstein's (1905) theory of Brownian motion.
In 1827, Robert Brown, a Scottish botanist, observed the erratic, unpredictable motion of pollen grains under a microscope.
Einstein (1905): the movement is due to bombardment by water molecules. Einstein developed a mathematical theory giving precise quantitative predictions.
Later, Norbert Wiener, an MIT mathematician, developed a more precise mathematical model of Brownian motion. This model is now called the Wiener process.
[Aside: 1905 was a good year for Einstein. He published:
•  the paper introducing special relativity
•  a paper on the quantization of light, which led to quantum theory (which he never embraced: "God does not play dice with the world")
•  the paper on Brownian motion.]
Bachelier stated that
•  "The mathematical expectation of the speculator is zero" (this is essentially true of short-term speculation but not of long term investing)
•  "It is evident that the present theory solves the majority of problems in the study of speculation by the calculus of probability"
Bachelier's thesis came to light accidentally more than 50 years after he wrote it. Jimmie Savage found a book by Bachelier in the U. Chicago library and asked other economists about it. Paul Samuelson found Bachelier's thesis in the MIT library. The English translation was published in 1964 in The Random Character of Stock Market Prices, an edited volume.
Example: A simple gross return, $(1 + R)$, is lognormal$(0, (.1)^2)$. What is $P(1 + R < .9)$?
Answer: Since $\log(.9) = -.105$,
$$P(1 + R < .9) = P\{\log(1 + R) < \log(.9)\} = \Phi\{(-.105 - 0)/.1\} = \Phi(-1.05) = .1469.$$
In MATLAB, cdfn(-1.05) = .1469.
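The same number can be computed without MATLAB. Below is a minimal Python sketch using only the standard library; `normal_cdf` is a helper introduced here for illustration, built from `math.erf` through the identity $\Phi(z) = \{1 + \mathrm{erf}(z/\sqrt{2})\}/2$.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# 1 + R ~ lognormal(0, (.1)^2) means log(1 + R) ~ N(0, (.1)^2).
mu, sigma = 0.0, 0.1
z_exact = (math.log(0.9) - mu) / sigma   # log(.9) not rounded
print(f"Phi(-1.05) = {normal_cdf(-1.05):.4f}")
print(f"Phi({z_exact:.4f}) = {normal_cdf(z_exact):.4f}")
```

The first printed value reproduces .1469. Keeping $\log(.9)$ unrounded gives a $z$-value of about $-1.054$ and a probability of about .146, so the answer in the text differs only through rounding.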
Example: Assume again that $1 + R$ is lognormal$(0, (.1)^2)$. Find the probability that a simple gross two-period return is less than .9.
Answer: The two-period gross return is lognormal$(0, 2(.1)^2)$, so this probability is
$$\Phi\left\{\frac{\log(.9)}{\sqrt{2}\,(.1)}\right\} = \Phi(-.745) = .2281.$$
Let's find a general formula for the $k$-period returns. Note that
$$1 + R_t(k) = (1 + R_t) \cdots (1 + R_{t-k+1}),$$
and assume that
•  $\log(1 + R_i) \sim N(\mu, \sigma^2)$ for all $i$
•  the $\{R_i\}$ are mutually independent.
Then $\log\{1 + R_t(k)\}$ is the sum of $k$ independent $N(\mu, \sigma^2)$ random variables, so that $\log\{1 + R_t(k)\} \sim N(k\mu, k\sigma^2)$. Therefore
$$P\{1 + R_t(k) < x\} = \Phi\left\{\frac{\log(x) - k\mu}{\sqrt{k}\,\sigma}\right\}.$$
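The $k$-period formula can be wrapped in a small function. This is a sketch; the name `prob_k_period_gross_below` is hypothetical, and it assumes, as above, IID normal log returns. With $k = 2$ it reproduces the two-period example (about .2281); with $k = 1$ it uses the unrounded $\log(.9)$ and so gives about .146 rather than .1469.

```python
import math
from statistics import NormalDist

def prob_k_period_gross_below(x, k, mu, sigma):
    """P{1 + R_t(k) < x} when the one-period log returns are IID N(mu, sigma^2).

    The k-period log return is then N(k * mu, k * sigma^2).
    """
    z = (math.log(x) - k * mu) / (math.sqrt(k) * sigma)
    return NormalDist().cdf(z)

for k in (1, 2, 5):
    p = prob_k_period_gross_below(0.9, k, mu=0.0, sigma=0.1)
    print(f"k = {k}: P(1 + R_t(k) < .9) = {p:.4f}")
```

With $\mu = 0$ the probability grows with $k$, because the spread $\sqrt{k}\,\sigma$ of the $k$-period log return grows while its center stays at zero.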
3.6    Random Walk
Let $Z_1, Z_2, \ldots$ be IID with mean $\mu$ and standard deviation $\sigma$. $Z_0$ is an arbitrary starting point. Let $S_0 = Z_0$ and
$$S_t = Z_0 + Z_1 + \cdots + Z_t, \quad t \ge 1.$$
$S_0, S_1, \ldots$ is called a random walk. We have $E(S_t \mid Z_0) = Z_0 + \mu t$ and $\mathrm{Var}(S_t \mid Z_0) = \sigma^2 t$.
Figure 3.1: Mean and probability bounds on a random walk with $S_0 = 0$, $\mu = .5$, and $\sigma = 1$. At any given time, the probability of being between the probability bounds (dashed curves) is 68%.
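A quick Monte Carlo check of $E(S_t \mid Z_0) = Z_0 + \mu t$ and $\mathrm{Var}(S_t \mid Z_0) = \sigma^2 t$ is sketched below. It assumes normal steps (the text only requires IID steps with mean $\mu$ and standard deviation $\sigma$) and uses the parameters of Figure 3.1; the function name `simulate_random_walk` and the seed are arbitrary choices.

```python
import random
import statistics

def simulate_random_walk(z0, mu, sigma, t, rng):
    """Return [S_0, ..., S_t], where S_t = Z_0 + Z_1 + ... + Z_t.

    The steps Z_1, ..., Z_t are drawn IID N(mu, sigma^2); normality is an
    assumption of this sketch, not of the random walk definition.
    """
    path = [z0]
    for _ in range(t):
        path.append(path[-1] + rng.gauss(mu, sigma))
    return path

rng = random.Random(473)                # fixed seed for reproducibility
z0, mu, sigma, t = 0.0, 0.5, 1.0, 50    # S_0 = 0, mu = .5, sigma = 1 as in Figure 3.1
endpoints = [simulate_random_walk(z0, mu, sigma, t, rng)[-1] for _ in range(10000)]

print("mean of S_t:", statistics.fmean(endpoints), "theory:", z0 + mu * t)
print("var of S_t: ", statistics.variance(endpoints), "theory:", sigma ** 2 * t)
```

With 10,000 simulated paths, the mean and variance of $S_{50}$ should land close to 25 and 50.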
3.6.1    Geometric Random Walk
Recall that $\log\{1 + R_t(k)\} = r_t + \cdots + r_{t-k+1}$. Therefore
$$\frac{P_t}{P_{t-k}} = 1 + R_t(k) = \exp(r_t + \cdots + r_{t-k+1}),$$
so taking $k = t$ we have
$$P_t = P_0 \exp(r_t + r_{t-1} + \cdots + r_1).$$
Conclusion: If the log returns are IID normal, then the process $\{P_t : t = 1, 2, \ldots\}$ is the exponential of a random walk. We call such a process a "geometric random walk." If $r = \log(1 + R)$ is $N(\mu, \sigma^2)$, then the median of $R$ is $\exp(\mu) - 1$ since
$$P\{R < \exp(\mu) - 1\} = P\{1 + R < \exp(\mu)\} = P(r < \mu) = P\{N(\mu, \sigma^2) < \mu\} = 1/2.$$
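The geometric random walk and the median result can be illustrated by simulation. In this sketch the daily parameters `mu` and `sigma` are hypothetical values chosen for illustration, and prices are built exactly as $P_t = P_0\exp(r_1 + \cdots + r_t)$.

```python
import math
import random
import statistics

def geometric_random_walk(p0, mu, sigma, t, rng):
    """Return [P_0, ..., P_t] with P_t = P_0 * exp(r_1 + ... + r_t), r_i IID N(mu, sigma^2)."""
    prices = [p0]
    log_price = math.log(p0)
    for _ in range(t):
        log_price += rng.gauss(mu, sigma)   # the log price is a random walk
        prices.append(math.exp(log_price))
    return prices

rng = random.Random(2001)
mu, sigma = 0.0005, 0.02                    # hypothetical daily log-return mean and sd
prices = geometric_random_walk(100.0, mu, sigma, 250, rng)
net_returns = [b / a - 1 for a, b in zip(prices, prices[1:])]
print("all prices positive:", min(prices) > 0)
print("all net returns > -1:", min(net_returns) > -1)

# The median of R = exp(r) - 1 should be exp(mu) - 1.
draws = [math.exp(rng.gauss(mu, sigma)) - 1 for _ in range(100000)]
print("sample median of R:", statistics.median(draws), "theory:", math.exp(mu) - 1)
```

Because every price is an exponential, all prices are positive and every net return exceeds $-1$; the sample median of the simulated $R$ should sit near $\exp(\mu) - 1$.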
Figure 3.2: Lognormal densities.
3.7   Are log returns really normally distributed?
There are several ways to check whether log returns are really normally distributed. One way is to look at a normal probability plot of the log returns to see if the plot is approximately a straight line. Another method is to look at the sample skewness and kurtosis of the log returns and to check if their values are near those of the normal distribution; any normal distribution has a skewness coefficient of 0 and a kurtosis of 3.
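Suppose we have a time series of log returns $r_1, \ldots, r_t, \ldots, r_T$ on some asset. The sample skewness, denoted by $S$, is
$$S = \frac{1}{T}\sum_{t=1}^{T}\left(\frac{r_t - \bar r}{\hat\sigma}\right)^3,$$
where $\bar r$ and $\hat\sigma$ are the sample mean and sample standard deviation of the log returns. The sample kurtosis, denoted by $K$, is
$$K = \frac{1}{T}\sum_{t=1}^{T}\left(\frac{r_t - \bar r}{\hat\sigma}\right)^4.$$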
The "excess kurtosis" is $K - 3$. Both the sample skewness and the excess kurtosis should be near 0 if the log returns are normally distributed.
Table 1.1 of Campbell et al. gives $S$ and $K - 3$ for several market indices and common stocks. In that table, $S$ is generally close to zero, which indicates that log returns are not very skewed. However, the excess kurtosis is typically rather large for daily returns, while for monthly returns it is positive but not as large. A monthly log return is the sum of the daily log returns within the month, so by the CLT the distribution of log returns over longer periods should approach the normal distribution. Therefore, the smaller excess kurtosis for monthly log returns, in contrast to daily log returns, is expected. The large kurtosis of daily returns indicates that they are "heavy-tailed."
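A direct translation of the definitions of $S$ and $K$ into Python follows, as a sketch; `sample_skewness_kurtosis` is a hypothetical helper, and $\hat\sigma$ is taken to be the usual sample standard deviation (divisor $T - 1$). The two simulated samples are only for illustration: one normal, and one Laplace (double exponential), a heavy-tailed distribution with excess kurtosis 3.

```python
import random
import statistics

def sample_skewness_kurtosis(r):
    """Return (S, K) for log returns r_1, ..., r_T.

    rbar is the sample mean and sigma_hat the sample standard deviation
    (divisor T - 1); some texts use divisor T instead, which matters
    little when T is large.
    """
    T = len(r)
    rbar = statistics.fmean(r)
    sigma_hat = statistics.stdev(r)
    S = sum(((x - rbar) / sigma_hat) ** 3 for x in r) / T
    K = sum(((x - rbar) / sigma_hat) ** 4 for x in r) / T
    return S, K

rng = random.Random(0)
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(100000)]
# The difference of two independent exponentials is Laplace: excess kurtosis 3.
laplace_sample = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(100000)]

for name, sample in (("normal", normal_sample), ("Laplace", laplace_sample)):
    S, K = sample_skewness_kurtosis(sample)
    print(f"{name:8s} S = {S:+.3f}  excess kurtosis K - 3 = {K - 3:+.3f}")
```

The normal sample should give $S$ and $K - 3$ near 0, while the Laplace sample should give an excess kurtosis near 3, the kind of heavy tail that shows up in daily returns.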
Normal probability plots can be supplemented by tests of normality based on the sample CDF, $\hat F$. $\hat F(x)$ is defined to be the proportion of the sample that is less than or equal to $x$; if 10 out of 40 data points are 3 or less, then $\hat F(3) = .25$. Normality is tested by comparing the sample CDF with the normal CDF with mean and variance equal to the sample mean and variance, i.e., we compare $\hat F(x)$ with $\Phi\{(x - \bar x)/s\}$. Three common tests of normality that compare the sample CDF with the normal CDF are the Anderson-Darling test, the Shapiro-Wilk test, and the Kolmogorov-Smirnov test. All three are available in MINITAB. Actually, MINITAB uses the Ryan-Joiner test, which is close to the Shapiro-Wilk test. In MINITAB, go to "Stat," then "Basic Statistics," and then "Normality test." You will need to choose one of the three tests. The output is a normal plot plus the results of the test. You can re-run the procedure to run the other tests.
The Kolmogorov-Smirnov test is based on the maximum distance between the sample CDF and the normal CDF.
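The sample CDF and the Kolmogorov-Smirnov distance can be computed as below. This is a sketch with hypothetical helper names; it compares $\hat F$ with $\Phi\{(x - \bar x)/s\}$ as described above but does not compute a p-value. Because the mean and variance are estimated from the same data, the standard Kolmogorov-Smirnov tables do not apply directly (the Lilliefors problem), so the statistic is shown only to illustrate what the test measures.

```python
import random
from statistics import NormalDist, fmean, stdev

def sample_cdf(data, x):
    """F_hat(x): the proportion of the sample that is less than or equal to x."""
    return sum(1 for d in data if d <= x) / len(data)

def ks_statistic_vs_fitted_normal(data):
    """Largest gap between F_hat and the normal CDF with the sample mean and sd."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(fmean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        F = fitted.cdf(x)
        # F_hat jumps from i/n to (i + 1)/n at x, so check both sides of the jump.
        d = max(d, (i + 1) / n - F, F - i / n)
    return d

data = [1.0] * 10 + [5.0] * 30      # 10 of 40 points are 3 or less
print(sample_cdf(data, 3))          # 0.25, as in the text

rng = random.Random(1)
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]
skewed_sample = [rng.expovariate(1.0) for _ in range(1000)]
print("normal sample:     ", ks_statistic_vs_fitted_normal(normal_sample))
print("exponential sample:", ks_statistic_vs_fitted_normal(skewed_sample))
```

The strongly skewed exponential sample produces a much larger distance than the normal sample.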
The Shapiro-Wilk test is closely tied to the normal probability plot, since it is based on the correlation between the normal quantiles and the sample quantiles. The correlation measures how close the normal plot is to being a straight line.
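The idea behind the Shapiro-Wilk and Ryan-Joiner tests can be illustrated by correlating the ordered sample with normal quantiles. This is a simplified sketch, not either test: the plotting positions $(i - .5)/n$ are an assumed convention, the helper name is hypothetical, and no p-value is computed.

```python
import math
import random
from statistics import NormalDist, fmean

def normal_plot_correlation(data):
    """Correlation between the ordered sample and standard normal quantiles.

    Plotting positions (i - 0.5)/n are an assumed convention; the Shapiro-Wilk
    and Ryan-Joiner statistics refine this idea and come with p-values.
    """
    xs = sorted(data)
    n = len(xs)
    q = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx, mq = fmean(xs), fmean(q)
    sxq = sum((a - mx) * (b - mq) for a, b in zip(xs, q))
    sxx = sum((a - mx) ** 2 for a in xs)
    sqq = sum((b - mq) ** 2 for b in q)
    return sxq / math.sqrt(sxx * sqq)

rng = random.Random(2)
normal_sample = [rng.gauss(0.0, 1.0) for _ in range(500)]
skewed_sample = [rng.expovariate(1.0) for _ in range(500)]
print("normal sample:     ", normal_plot_correlation(normal_sample))
print("exponential sample:", normal_plot_correlation(skewed_sample))
```

A normal sample gives a correlation very close to 1 (a nearly straight normal plot); the skewed sample gives a visibly smaller value.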
CDF's and quantiles are closely related. In fact, quantiles are given by the inverse of the CDF: if a random variable $X$ has a continuous, strictly increasing CDF $F$, then the $p$th quantile of $X$ is $F^{-1}(p)$ since $P\{X \le F^{-1}(p)\} = F\{F^{-1}(p)\} = p$.
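Python's standard library exposes both $\Phi$ and $\Phi^{-1}$ through `statistics.NormalDist`, which makes the inverse relationship easy to check. The probabilities in the loop are arbitrary, except that .1469 is the answer from the lognormal example in Section 3.5.

```python
from statistics import NormalDist

Z = NormalDist()                      # standard normal: F = Phi
for p in (0.025, 0.1469, 0.5, 0.975):
    q = Z.inv_cdf(p)                  # p-th quantile, F^{-1}(p)
    print(f"p = {p:<7} quantile = {q:+.4f}   F(quantile) = {Z.cdf(q):.4f}")
```

Each quantile maps back to its probability, and $\Phi^{-1}(.1469)$ comes out close to $-1.05$.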
Let's look at daily returns for GE common stock from December 1999 to December 2000. The daily price $P_t$ is taken to be the average of the high and the low for the day. It might have been better to use the closing price for each day. Why?
As can be seen in Figure 3.3, the net returns $R$ and the log returns $r$ are very similar. A normal plot of the log returns is roughly linear.
The log returns have a sample mean, standard deviation, skewness, and excess kurtosis of .00014, .0176, -.094, and .094, respectively. The values of the sample skewness and excess kurtosis suggest that the log returns are approximately normally distributed.
From MINITAB, the Kolmogorov-Smirnov, Anderson-Darling, and Ryan-Joiner tests of normality have p-values of .15, .40, and .10, respectively. Since each p-value exceeds .05, none of the tests rejects the null hypothesis of normality at $\alpha = .05$.
3.7.1    Do the GE daily returns look like a geometric random walk?
Figure 3.4 shows five independent simulated geometric random walks with the same parameters as the GE daily log returns. Note that the geometric random walks seem to have "patterns" and "momentum" even though they do not. The GE log returns look similar to the geometric random walks.
It is somewhat difficult to distinguish between a random walk and a geometric random walk. Figure 3.5 shows three independent simulated time series. For each pair, the log price series (a random walk) is plotted on the left while the price series (a geometric random walk) is plotted on the right. Note the subtle differences between the prices and the log prices.
We prefer the geometric random walk model to the random walk model, because the geometric random walk model is more realistic: the geometric random walk implies non-negative prices and net returns that are at least -1.
This graphical comparison of GE prices to geometric random walks is not strong evidence in favor of the geometric random walk hypothesis. This hypothesis implies that the log returns are mutually independent and, therefore, uncorrelated. Therefore we should check for evidence that the log returns are correlated. If we find no such evidence, then we have more reason to believe the geometric random walk hypothesis.
[Plot panels: "GE, daily - 12/17/99 to 12/15/00"; "Normal plot of log returns" (horizontal axis: log return).]
Figure 3.3: GE daily returns. The first plot is the prices. The second and third are the net returns and the log returns. The fourth plot is a normal probability plot of the log returns. The final plot is of the absolute log returns; there is a scatterplot smooth to help show whether the volatility is constant.
[Plot panels: five titled "Geometric Random Walk" and one titled "GE, daily - 12/17/99 to 12/15/00".]
Figure 3.4: Five independent geometric random walks and GE daily log returns. The geometric random walks have the same expected log return, volatility, and starting point as the GE log returns.
[Plot panels: three rows; left panels titled "Random Walk", right panels titled "Geometric Random Walk".]
Figure 3.5: Three independent simulated price series. On left: log prices. On right: prices.
3.8   Portrait of an econometrician, Eugene Fama
This material is taken from Chapter 7 of Capital Ideas by Peter Bernstein.
Fama was born in 1939 in Boston, majored in French at Tufts, and was an outstanding student-athlete.
In college, Fama earned extra money working for Harry Ernst, who published a stock market newsletter:
•  Fama's job was to find workable buy and sell signals.
•  Ernst believed that trends, once in place, would continue because of "price momentum."
•  Bernstein writes that "Fama's efforts to develop profitable trading rules were by no means unsuccessful" but "the ones he found worked only on the old data, not on the new."
-  like many other investors, Fama found that rules that worked well on "backtests" couldn't beat the market when applied in real time.
-  the market environment would shift or too many people would be using the same strategy
Fama decided to go to business school to learn what was really going on.
•  1964 doctorate at University of Chicago.
•  he thought of going to Harvard but was told that he was "more intellectual than the typical Harvard type"
Fama stayed at Chicago where he taught finance.
•  scholars at Chicago were keenly interested in collecting facts (empirical research!)
•  At Chicago, James Lorie and Lawrence Fisher were demonstrating what the computer could offer to economic research
-  1964: Lorie and Fisher published a "bombshell" — $1000 invested in 1926 would grow to almost $30,000 in 1960, a growth of over 9% a year (log(30)/35 = .097)
*  Remember: 1929 was the year of the Great Crash, and the ensuing Great Depression lasted until the US entered World War II in the 1940s. This was not exactly a favorable time for investing.
*  These findings increased the interest in stocks as long-term investments
•  1965: Fama published "The Behavior of Stock Market Prices" (his thesis) in the Journal of Business.
-  a less technical version was published in 1966 as "Random Walks in Stock Market Prices" in Financial Analysts Journal.
-  the less technical version was reprinted in Institutional Investor.
Fama's first target was "technical analysis" as practiced by so-called "chartists."
-  technical analysts believe that future prices can be predicted from past patterns
-  Charting stock prices was once fashionable
*  I remember as a young child my grandmother explaining to me how to chart stock prices.
-  Fama: "The chartist must admit that the evidence in favor of the random walk model is both consistent and voluminous, whereas there is precious little published in discussion of rigorous empirical test of various technical theories."
Fama's next target was "fundamental analysis" as practiced by securities analysts.
-  Fundamental analysts examine accounting data, interview management, and look at economic forecasts, interest rates, and political trends.
-  Selecting stocks by fundamental analysis seems to do no better than using a dartboard
-  Of course, good management, favorable economic trends, etc. influence the prices of assets, but Fama claimed that this information is already fully reflected in stock prices by the time we learn it — markets react instantaneously to information.
-  Security analysis is essential in order for stocks to be priced correctly, but ironically it means that there are few discrepancies between actual prices and the values of stocks
-  William Sharpe discussed the antagonism of professional investors to the random walk theories of Fama and other academics. He stated that "Interestingly, professional economists seem to think more highly of professional investors than do other professional investors." (Later we will learn more about Sharpe, the economist who developed the CAPM and winner of the Nobel Prize.)
3.9    Other empirical work related to Fama's
Fama's work was preceded by that of other researchers.
• In 1933 Alfred Cowles published "Can stock market forecasters forecast?" The three-word abstract stated "It is doubtful." The article appeared in the brand-new journal Econometrica. Econometrica is now the leading journal in econometrics.
-  Cowles analyzed the track records of:
* 16 leading financial services that furnished their subscribers with selected lists of common stocks
*  purchases and sales of stock by 20 leading fire insurance companies
*  24 publications by financial services, financial weeklies, bank letters, etc.
*  editorials in The Wall Street Journal by William Peter Hamilton, an expounder of the "Dow Theory" due to Charles Dow (the Dow of Dow-Jones). Dow compared stock prices to tides and ocean waves; the tides were a way to explain "price momentum."
-  Cowles found that only 6 of 16 financial services had achieved any measure of success
*  even the best record could not be definitely attributed to skill rather than luck (one needs statistical analysis to reach such a conclusion)
-  In 1944, Cowles published a new study with basically the same conclusions.
• In 1936, Holbrook Working published a paper in The Journal of the American Statistical Association on commodity prices.
-  These were once believed to have rhythms and trends.
-  Working found that he could not distinguish the price changes from an independent sequence of random changes.
-  Perturbed, Working took his data to professional commodity traders.
*  He also showed them graphs of random series.
*  The professionals could not distinguish the random series from real commodity prices.
*  Of course, Working's study does not prove anything about stock returns, but it is an interesting example of a financial time series where momentum was thought to exist but where no evidence of momentum was found in a statistical analysis.
• Maurice Kendall published the paper "The analysis of economic time series" in the Journal of the Royal Statistical Society in 1953.
-  Kendall wrote "the patterns of events in the price series was much less systematic than is generally believed," and
-  "Investors can, perhaps, make money on the Stock Exchange, but not, apparently by watching price movements and coming in on what looks like a good thing ... But it is unlikely that anything I say or demonstrate will destroy the illusion that the outside investor can make money by playing the markets, so let us leave him to his own devices."
There is no question as to whether one can make money in the stock market. Over the long haul, stocks outperform bonds which outperform savings accounts. The question is rather whether anyone can "beat the market."
3.10   Technical Analysis
"A Random Walk Down Wall Street" was written by Burton G. Malkiel, a professor of economics at Princeton. It is a perennial best seller and has been revised several times. It contains much sensible advice for the small investor. This book is also quite humorous, and the discussion of technical analysts is particularly amusing (unless you are a technical analyst).
Malkiel writes
I, personally, have never known a successful technician, but I have seen the wrecks of several unsuccessful ones. (This is, of course, in terms of following their own technical advice. Commissions from urging customers to act on their recommendations are very lucrative.)
Malkiel describes many of the technical theories, including the Dow Theory, the Filter System, and the Relative-Strength system, which advises
buying stocks that have done well recently. There is also the hemline theory, which predicts price changes by the lengths of women's dresses, and the Super Bowl indicator, which says that "a victory by an NFL team predicts a bull market, whereas a victory by a former AFL team is bad news for stock-market investors." There is also the odd-lot theory. It is based on the impeccable logic that a person who is always wrong is a reliable source of information: just negate whatever that person says. The belief is that the odd-lot trader is precisely that sort of person. It turns out that the odd-lotter isn't such a dolt after all.
Human nature seems to find randomness very hard to accept. For example, sports fans have many theories of streaks in athletics, e.g., the "hot hand" theory of basketball. Extensive testing of basketball players' performances has shown no evidence of streaks beyond what would be expected by pure chance. The point is that streaks will occur by chance, but you cannot make money on the basis of random streaks since you cannot predict if they will continue.
Why are technicians hired? Malkiel has the skeptical view that it is because their theories recommend a lot of trading. "The technicians do not help produce yachts for the customers, but they do help generate the trading that provides yachts for the brokers."
3.11    Fundamental Analysis
The practitioners of fundamental analysis are called security analysts. Their job is basically to predict future earnings of companies, since it is future earnings that ultimately drive prices.
Although few on Wall Street still have much faith in technical analysis, there is much faith in fundamental analysis. However, some academics studying financial markets data have concluded that security analysts can do no better than blindfolded monkeys throwing darts at the Wall Street Journal.
3.12   Efficient Markets Hypothesis (EMH)
As evidence accumulated that stock prices fluctuate like random walks, economists sought a theory as to why that would be so. In 1965 Paul Samuelson published a paper "Proof that properly anticipated prices fluctuate randomly." The idea is that random walk behavior is due to the very efficiency of the market.
•  A market is information efficient if prices "fully reflect" available information
•  A market is "efficient with respect to an information set" if prices would be unchanged by revealing that information to all participants
- this implies that it is impossible to make economic profits by trading on the basis of this information set
•  This last idea is the key to testing (empirically) the EMH.
3.12.1    Three types of efficiency
weak-form efficiency: the information set includes only the history of prices or returns
semi-strong efficiency: the information set includes all information that is publicly available
strong-form efficiency: the information set includes all information known to any market participant
Weak-form efficiency ⇒ technical analysis will not make money
Semistrong-form efficiency ⇒ fundamental analysis will not help the average investor
3.12.2   Testing market efficiency
The research of Fama, Cowles, Working, and Kendall just described tests the various forms of the EMH. Cowles's work supports the semi-strong and perhaps the strong form of the EMH.
In their book Investments, Bodie, Kane, and Marcus discuss some of the issues involved when testing the EMH. One is the magnitude issue. No one believes that markets are perfectly efficient. The small inefficiencies might be important to the manager of a large portfolio. If one is managing a $5 billion portfolio, beating the market by .1% results in a $5 million increase in profit. This is clearly worth achieving. Yet no statistical test is likely to uncover a .1% inefficiency amidst typical market fluctuations. The S&P 500 index has a 20% standard deviation in annual returns.
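A back-of-the-envelope calculation shows why. Suppose, as a simplifying assumption, that the annual amounts by which a manager beats (or trails) the market are independent with standard deviation $\sigma$, and that detecting an edge requires it to be about two standard errors, $2\sigma/\sqrt{n}$, above zero after $n$ years. The sketch solves for $n$; the 20% figure is the S&P 500 value quoted above, while 5% is a hypothetical smaller tracking error, and `years_to_detect` is an illustrative name.

```python
def years_to_detect(edge, annual_sd, z=2.0):
    """Years n for which z standard errors, z * annual_sd / sqrt(n), equal the edge.

    Assumes independent annual excess returns with standard deviation annual_sd.
    """
    return (z * annual_sd / edge) ** 2

portfolio = 5e9        # $5 billion
edge = 0.001           # beating the market by .1% per year (assumed annual)
print(f"extra profit per year: ${portfolio * edge:,.0f}")
for sd in (0.20, 0.05):   # 20% from the S&P 500 figure; 5% is hypothetical
    print(f"sd = {sd:.0%}: about {years_to_detect(edge, sd):,.0f} years of data needed")
```

Under these assumptions the required track record is about 160,000 years at 20% volatility and still about 10,000 years at 5%, far longer than any fund's history.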
Another issue is selection bias. If there is someone who can consistently beat the market, they probably are keeping that a secret. We can only test market efficiency by testing methods of technical or fundamental analysis that are publicized. These may be the ones that don't reveal market inefficiencies.
Another problem is that, for any time period, by chance there will be some investment managers who consistently beat the market.
•  If 2,000 people each toss a coin 10 times, it is likely that at least one will get 10 heads, since the expected number who do is
$$2{,}000 \times 2^{-10} = 1.95.$$
Using the Poisson approximation to the binomial, the probability that no one tosses 10 heads is $\exp(-1.95) = .14$.
If someone does toss 10 heads, it would be a mistake to say that that person has skill in tossing heads.
•  Peter Lynch's Magellan Fund outperformed the S&P 500 in 11 of 13 years ending in 1989. Was Lynch a skilled investment manager or just lucky? (If he really was skilled, then this is evidence against the semi-strong form of the EMH.)
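The coin-tossing arithmetic can be checked in a few lines, and the same reasoning gives a benchmark for Lynch's record. The benchmark treats each year as an independent fair coin flip, a hypothetical "no skill" model rather than a claim about how fund returns actually behave.

```python
import math

# 2,000 people each toss a fair coin 10 times.
n_people = 2000
p_ten_heads = 0.5 ** 10
expected = n_people * p_ten_heads              # expected number with 10 heads, about 1.95
p_none_poisson = math.exp(-expected)           # Poisson approximation
p_none_exact = (1 - p_ten_heads) ** n_people   # exact binomial probability
print(f"expected = {expected:.3f}, Poisson = {p_none_poisson:.4f}, exact = {p_none_exact:.4f}")

# Hypothetical no-skill benchmark: each year is an independent fair coin flip.
p_11_of_13 = sum(math.comb(13, k) for k in range(11, 14)) / 2 ** 13
print(f"P(at least 11 wins in 13 years) = {p_11_of_13:.4f}")
```

The Poisson approximation and the exact binomial answer agree closely (both about .142). Under the coin-flip model, 11 or more wins in 13 years has probability $92/8192 \approx .011$, so if several hundred funds were flipping coins, a handful of such records would be expected by chance alone.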
Campbell, Lo, and MacKinlay, as well as Bodie, Kane, and Marcus, discuss much of the empirical literature on testing the EMH and give references to the original studies.
Fama has written a review article:
Fama, E. (1970), "Efficient Capital Markets: A Review of Theory and Empirical Work," Journal of Finance, 25, 383-417.
There is a sequel as well:
Fama, E. (1991), "Efficient Capital Markets: II," Journal of Finance, 46, 1575-1618.
The Journal of Finance, as well as many other journals in economics and finance, is available online at JSTOR:
http://www.jstor.org/cgi-bin/jstor/listjournal
However, the most recent five years of these journals are not available online.
Good course project: Read one or more of the studies of the EMH and prepare a report summarizing the work. The two review articles by Fama could help you find studies that would interest you. Using some new financial markets data, try to replicate some of the original work.
3.13    Summary
Let $P_t$ be the price of an asset at time $t$. Then $P_t/P_{t-1}$ is the simple gross return and $R_t = P_t/P_{t-1} - 1$ is the simple net return. ("Simple" means one period.) The gross return over the last $k$ periods is $1 + R_t(k) = P_t/P_{t-k}$. Let $p_t = \log(P_t)$. The (one-period) log return is $r_t = p_t - p_{t-1}$, and $R_t \approx r_t$.
Prices are often modeled as geometric random walks. This model implies that log returns are mutually independent; one cannot predict future returns from past returns. The model also implies that $1 + R_t$ is lognormally distributed.
Empirical research by Eugene Fama, Alfred Cowles, Holbrook Working, Maurice Kendall, and other econometricians supports the geometric random walk model.
The geometric random walk suggests the efficient market hypothesis (EMH), which states that all valuable information is reflected in market prices; price changes occur because of unanticipated new information. There are three forms of the EMH: the weak form, the semi-strong form, and the strong form.
Chapter 4
Univariate Time Series Models: 3/12/01
4.1    Time Series
A univariate time series is a sequence of observations taken over time, for example, a sequence of daily returns on a stock. A multivariate time series is a sequence of vectors of observations taken over time, for example, the sequence of vectors of returns on a fixed set of stocks.
In this chapter, we will study statistical models for univariate time series. These models are widely used in econometrics as well as in other business and OR applications. For example, time series models are routinely used to model the output of simulations.
4.2    Stationary Processes
A process is stationary if its behavior is unchanged by shifts in time. More precisely, $X_1, X_2, \ldots$ is a weakly stationary process if
•  $E(X_i) = \mu$ (a constant) for all $i$
•  $\mathrm{Var}(X_i) = \sigma^2$ (a constant) for all $i$
•  $\mathrm{Corr}(X_i, X_j) = \rho(|i - j|)$ for all $i$ and $j$
Thus, the mean and variance do not change with time, and the correlation between two observations depends only on the time distance between them. For example, if the process is stationary, then the correlation between $X_2$ and $X_5$ is the same as the correlation between $X_7$ and $X_{10}$, since each pair is separated by three units of time.
$\rho$ is called the correlation function of the process. Note that $\rho(h) = \rho(-h)$. Why?
The covariance between $X_t$ and $X_{t+h}$ is denoted by $\gamma(h)$. $\gamma(\cdot)$ is called the autocovariance function. Note that $\gamma(h) = \sigma^2\rho(h)$ and that $\gamma(0) = \sigma^2$ since $\rho(0) = 1$.
4.2.1   Weak White Noise
White noise is the simplest example of a stationary process. $X_1, X_2, \ldots$ is a WN$(0, \sigma^2)$ process (weak white noise process) if
•  $E(X_i) = 0$ for all $i$
•  $\mathrm{Var}(X_i) = \sigma^2$ (a constant) for all $i$
•  $\mathrm{Corr}(X_i, X_j) = 0$ for all $i \ne j$
If, in addition, $X_1, X_2, \ldots$ are independent normal random variables, then the process is called a Gaussian white noise process. (The normal distribution is sometimes called the Gaussian distribution.)
A weak white noise process is weakly stationary with
$$\rho(0) = 1, \qquad \rho(t) = 0 \text{ if } t \ne 0.$$
Properties of Gaussian white noise:
$$E(X_{i+t} \mid X_1, \ldots, X_i) = 0 \text{ for all } t \ge 1.$$
(You cannot predict the future, because the future is independent of the past and present.)
To us, "white noise" will mean weak white noise, which includes Gaussian white noise as a special case.
White noise (either weak or Gaussian) is uninteresting in itself but is the building block of important time series models used for economic data.
4.2.2    Estimating parameters of a stationary process
Suppose we observe $y_1, \ldots, y_n$ from a stationary process. To estimate the mean $\mu$ and variance $\sigma^2$ of the process, we use the sample mean $\bar y$ and sample variance $s^2$.
To estimate the autocovariance function we use
$$\hat\gamma(h) = n^{-1}\sum_{j=1}^{n-h}(y_{j+h} - \bar y)(y_j - \bar y).$$
To estimate $\rho(\cdot)$ we use the sample autocorrelation function (SACF) defined as
$$\hat\rho(h) = \frac{\hat\gamma(h)}{\hat\gamma(0)}.$$
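The estimators $\hat\gamma(h)$ and $\hat\rho(h)$ translate directly into code. The sketch below (helper names are hypothetical) follows the $n^{-1}$ divisor used above and applies the SACF to simulated Gaussian white noise, for which every $\rho(h)$ with $h \ne 0$ is zero.

```python
import random
from statistics import fmean

def sample_autocovariance(y, h):
    """gamma_hat(h) = n^{-1} * sum_{j=1}^{n-h} (y_{j+h} - ybar) * (y_j - ybar)."""
    n = len(y)
    ybar = fmean(y)
    # Python indexes from 0, so j = 0, ..., n-h-1 here matches j = 1, ..., n-h above.
    return sum((y[j + h] - ybar) * (y[j] - ybar) for j in range(n - h)) / n

def sample_acf(y, max_lag):
    """SACF rho_hat(h) = gamma_hat(h) / gamma_hat(0) for h = 0, ..., max_lag."""
    g0 = sample_autocovariance(y, 0)
    return [sample_autocovariance(y, h) / g0 for h in range(max_lag + 1)]

rng = random.Random(4)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # Gaussian white noise
rho_hat = sample_acf(white_noise, 5)
print([round(r, 3) for r in rho_hat])
```

Apart from $\hat\rho(0) = 1$, the printed values should be small, within a few multiples of $1/\sqrt{n} \approx .014$ of zero.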
4.3    AR(1) processes
Let $\epsilon_1, \epsilon_2, \ldots$ be WN$(0, \sigma_\epsilon^2)$. We say that $y_1, y_2, \ldots$ is an AR(1) process if for some constant parameters $\mu$ and $\phi$
$$y_t - \mu = \phi(y_{t-1} - \mu) + \epsilon_t \qquad (4.1)$$
for all $t$.
If $|\phi| < 1$, then $y_1, \ldots$ is a weakly stationary process. Its mean is $\mu$. Simple algebra shows that (4.1) can be rewritten as
$$y_t = (1 - \phi)\mu + \phi y_{t-1} + \epsilon_t. \qquad (4.2)$$
Remember the linear regression model, $y_t = \beta_0 + \beta_1 x_t + \epsilon_t$, from your statistics courses. (4.2) is just a linear regression model with $\beta_0 = (1 - \phi)\mu$ and $\beta_1 = \phi$. If it is assumed that $\mu = 0$, then $\beta_0 = 0$ as well. Linear regression with $\beta_0 = 0$ is the "linear regression through the origin" model. The term autoregression refers to the regression of the process on its own past values.
When $|\phi| < 1$, then
$$y_t = \mu + \epsilon_t + \phi\epsilon_{t-1} + \phi^2\epsilon_{t-2} + \cdots = \mu + \sum_{h=0}^{\infty}\phi^h\epsilon_{t-h} \qquad (4.3)$$
(infinite moving average (MA($\infty$)) representation).
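The regression form (4.2) and the moving average form (4.3) describe the same process. The sketch below checks this numerically under one simplifying assumption: the series is started at $y_0 = \mu$, so iterating (4.1) gives $y_t - \mu = \sum_{h=0}^{t-1}\phi^h\epsilon_{t-h}$ exactly, a finite version of (4.3). The parameter values and the function name are illustrative.

```python
import random

def simulate_ar1(mu, phi, eps, y0):
    """Iterate (4.2): y_t = (1 - phi) * mu + phi * y_{t-1} + eps_t, starting from y0."""
    ys, prev = [], y0
    for e in eps:
        prev = (1 - phi) * mu + phi * prev + e
        ys.append(prev)
    return ys

rng = random.Random(43)
mu, phi = 1.0, 0.6                                # illustrative parameter values
eps = [rng.gauss(0.0, 1.0) for _ in range(200)]   # eps_1, ..., eps_200
y = simulate_ar1(mu, phi, eps, y0=mu)             # start at the mean: y_0 - mu = 0

# Finite version of (4.3): y_t = mu + sum_{h=0}^{t-1} phi^h * eps_{t-h}.
t = len(eps)
ma_value = mu + sum(phi ** h * eps[t - 1 - h] for h in range(t))
print(y[-1], ma_value)
```

The two printed numbers agree up to floating-point rounding. With any other starting value, the extra term $\phi^t(y_0 - \mu)$ appears, and it dies out geometrically when $|\phi| < 1$.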
4.3.1    Properties of a stationary AR(1) process
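When $|\phi| < 1$ (stationarity), then
1. $E(y_t) = \mu$ for all $t$
2. $\gamma(0) = \mathrm{Var}(y_t) = \dfrac{\sigma_\epsilon^2}{1 - \phi^2}$ for all $t$
3. $\gamma(h) = \mathrm{Cov}(y_t, y_{t+h}) = \dfrac{\sigma_\epsilon^2\phi^{|h|}}{1 - \phi^2}$ for all $t$
4. $\rho(h) = \mathrm{Corr}(y_t, y_{t+h}) = \phi^{|h|}$ for all $t$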
It is important to remember that these formulas hold only if $|\phi| < 1$ and only for AR(1) processes. If $|\phi| \ge 1$, then the AR(1) process is nonstationary, and the mean, variance, and correlation are not constant.
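These formulas can be proved using (4.3). For example,
$$\mathrm{Var}(y_t) = \mathrm{Var}\left(\sum_{h=0}^{\infty}\phi^h\epsilon_{t-h}\right) = \sigma_\epsilon^2\sum_{h=0}^{\infty}\phi^{2h} = \frac{\sigma_\epsilon^2}{1 - \phi^2}.$$
Also, for $h > 0$, since $\epsilon_{t-i}$ and $\epsilon_{t+h-j}$ are uncorrelated unless $j = i + h$,
$$\mathrm{Cov}(y_t, y_{t+h}) = \mathrm{Cov}\left(\sum_{i=0}^{\infty}\phi^i\epsilon_{t-i}, \sum_{j=0}^{\infty}\phi^j\epsilon_{t+h-j}\right) = \sigma_\epsilon^2\sum_{i=0}^{\infty}\phi^i\phi^{i+h} = \frac{\sigma_\epsilon^2\phi^h}{1 - \phi^2}.$$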
Be sure to distinguish between $\sigma_\epsilon^2$, which is the variance of the stationary white noise process $\epsilon_1, \epsilon_2, \ldots$, and $\gamma(0)$, which is the variance of the AR(1) process $y_1, y_2, \ldots$. We can see from the result above that $\gamma(0)$ is bigger than $\sigma_\epsilon^2$ unless $\phi = 0$, in which case $y_t = \mu + \epsilon_t$.
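The formulas $\gamma(0) = \sigma_\epsilon^2/(1 - \phi^2)$ and $\rho(h) = \phi^{|h|}$ can be checked against a long simulated series. In this sketch the values $\phi = .9$ and $\sigma_\epsilon = 1$ are arbitrary, the errors are Gaussian, the helper names are hypothetical, and a burn-in period (an assumed device) discards the early observations so that the arbitrary starting value has no visible effect.

```python
import random
from statistics import fmean, pvariance

def ar1_path(mu, phi, sigma_eps, n, rng, burn_in=500):
    """Simulate n values of y_t - mu = phi * (y_{t-1} - mu) + eps_t with Gaussian eps_t.

    Assumes |phi| < 1; the first burn_in values are discarded so that the
    arbitrary start y_0 = mu has no noticeable effect.
    """
    y, out = mu, []
    for t in range(burn_in + n):
        y = mu + phi * (y - mu) + rng.gauss(0.0, sigma_eps)
        if t >= burn_in:
            out.append(y)
    return out

def sample_acf_at(y, h):
    """rho_hat(h) = gamma_hat(h) / gamma_hat(0) with the n^{-1} divisor."""
    n, ybar = len(y), fmean(y)

    def gamma_hat(k):
        return sum((y[j + k] - ybar) * (y[j] - ybar) for j in range(n - k)) / n

    return gamma_hat(h) / gamma_hat(0)

rng = random.Random(7)
phi, sigma_eps = 0.9, 1.0
y = ar1_path(0.0, phi, sigma_eps, 50000, rng)
print("Var(y_t):", pvariance(y), " theory:", sigma_eps ** 2 / (1 - phi ** 2))
for h in (1, 2, 5):
    print(f"rho({h}) = {sample_acf_at(y, h):.3f}   theory phi^h = {phi ** h:.3f}")
```

With $\phi = .9$, $\gamma(0) = 1/.19 \approx 5.26$, much larger than $\sigma_\epsilon^2 = 1$, which illustrates the distinction drawn above.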
4.3.2    Nonstationary AR(1) processes
Random Walk
If $\phi = 1$, then
$$y_t = y_{t-1} + \epsilon_t$$
and the process is not stationary. This is the random walk process we saw in Chapter 3.
It is easy to see that
$$y_t = y_0 + \epsilon_1 + \cdots + \epsilon_t.$$
Suppose we start the process at an arbitrary point $y_0$. Then $E(y_t \mid y_0) = y_0$ for all $t$, which is constant but depends entirely on the arbitrary starting point. Moreover, $\mathrm{Var}(y_t \mid y_0) = t\sigma_\epsilon^2$, which is not constant but rather increases linearly with time. The increasing variance makes the random walk "wander."
AR(1) processes when $|\phi| > 1$
When $|\phi| > 1$, an AR(1) process has explosive behavior. This can be seen in Figure 4.1. This figure shows simulations of 200 observations from AR(1) processes with various values of $\phi$. The explosive case where $\phi = 1.02$ is clearly different from the other cases, where $|\phi| \le 1$. However, the case where $\phi = 1$ is not that much different from $\phi = .9$, even though the former is nonstationary while the latter is stationary.
The ability to distinguish the three types of AR(1) processes (stationary, random walk, and explosive) depends on the length of the observed series. For a short AR(1) series, it is very difficult to tell if the process is stationary, random walk, or explosive. For example, in Figure 4.2, we see 30 observations from processes with the same parameter values as in Figure 4.1. If we observe the AR processes for longer than 200 observations, then the behavior of the $\phi = .9$ and $\phi = 1$ processes would not look as similar as in Figure 4.1. For example, in Figure 4.3 there are 1,000 observations from each of the processes. Now the processes with $\phi = .9$ and $\phi = 1$ look dissimilar. The stationary process ($\phi = .9$) continues to return to its mean of zero. The random walk ($\phi = 1$) wanders without tending to return to zero.
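Suppose an explosive AR(1) process starts at $y_0$ and has $\mu = 0$. Then
$$y_t = \phi y_{t-1} + \epsilon_t = \phi(\phi y_{t-2} + \epsilon_{t-1}) + \epsilon_t = \cdots = \epsilon_t + \phi\epsilon_{t-1} + \phi^2\epsilon_{t-2} + \cdots + \phi^{t-1}\epsilon_1 + \phi^t y_0.$$
Therefore,
$$\mathrm{Var}(y_t \mid y_0) = \sigma_\epsilon^2(1 + \phi^2 + \phi^4 + \cdots + \phi^{2(t-1)}) = \sigma_\epsilon^2\,\frac{\phi^{2t} - 1}{\phi^2 - 1}.$$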
This variance increases geometrically fast as $t \to \infty$.
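The geometric-series step can be verified numerically. The sketch compares the term-by-term sum with the closed form for the explosive value $\phi = 1.02$ used in Figure 4.1; the function names are hypothetical and $\sigma_\epsilon^2 = 1$ is assumed.

```python
def explosive_var(phi, sigma2, t):
    """Var(y_t | y_0) = sigma2 * (1 + phi^2 + ... + phi^(2(t-1))), summed term by term."""
    return sigma2 * sum(phi ** (2 * i) for i in range(t))

def explosive_var_closed(phi, sigma2, t):
    """Closed form sigma2 * (phi^(2t) - 1) / (phi^2 - 1); valid only when |phi| != 1."""
    return sigma2 * (phi ** (2 * t) - 1) / (phi ** 2 - 1)

for t in (10, 100, 200):
    print(t, explosive_var(1.02, 1.0, t), explosive_var_closed(1.02, 1.0, t))
print("random walk, t = 200:", explosive_var(1.0, 1.0, 200))   # equals t * sigma2
```

For comparison, the random walk ($\phi = 1$) has variance $t\sigma_\epsilon^2$, only 200 at $t = 200$, while the explosive process is already in the tens of thousands.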
Explosive AR processes are not widely used in econometrics since economic growth is usually not explosive, though these processes may serve as good models of rabbit populations.
[Plot panels: "AR(1): $\phi = 0.9$", "AR(1): $\phi = 0.6$", "AR(1): $\phi = 0.2$", "AR(1): $\phi = -0.9$", "AR(1): $\phi = 1$", "AR(1): $\phi = 1.02$".]
Figure 4.1: Simulations of 200 observations from AR(1) processes with various values of $\phi$ and $\mu = 0$. The white noise "residual" or "error" process $\epsilon_1, \epsilon_2, \ldots$ is the same for all six AR(1) processes.
[Plot panels: "AR(1): $\phi = 0.9$", "AR(1): $\phi = 0.6$", "AR(1): $\phi = 0.2$", "AR(1): $\phi = -0.9$", "AR(1): $\phi = 1$", "AR(1): $\phi = 1.02$".]
Figure 4.2: Simulations of 30 observations from AR(1) processes with various values of $\phi$ and $\mu = 0$. The white noise "residual" or "error" process $\epsilon_1, \epsilon_2, \ldots$ is the same for all six AR(1) processes.
[Figure 4.3: six panels, titled AR(1): φ = 0.9, AR(1): φ = 0.6, AR(1): φ = 0.2, AR(1): φ = -0.9, AR(1): φ = 1, and AR(1): φ = 1.02; horizontal axes run from 0 to 1000.]
Figure 4.3: Simulations of 1000 observations from AR(1) processes with various values of $\phi$ and $\mu = 0$. The white noise "residual" or "error" process $\epsilon_1, \epsilon_2, \ldots$ is the same for all six AR(1) processes.
4.3.3    Estimation
Depending upon the application, one will want to fit an AR(1) to either one of the variables in the raw data or a variable that has been constructed from the raw data. In finance applications, one often has the prices as the raw data but wants to fit an AR(1) to the log returns. To create the log returns, one first log-transforms the prices and then differences the log prices. MINITAB and SAS both have functions to do differencing. For example, in MINITAB, go to the "Stat" menu, then the "Time Series" menu, and then select "differences." Once a variable containing the log returns has been created, one then can fit an AR(1) model to it.
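The two steps just described (log-transform, then difference) look like this in a short Python sketch. The prices are hypothetical and are used only to show the computation.

```python
import math


def log_returns(prices):
    """Log-transform the prices, then take first differences of the log prices."""
    log_p = [math.log(p) for p in prices]
    return [later - earlier for earlier, later in zip(log_p, log_p[1:])]


# Hypothetical closing prices, for illustration only
print(log_returns([100.0, 110.0, 99.0]))
```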
Let's assume we have a time series $y_1, \ldots, y_n$ and we want to fit an AR(1) model to this series. Since an AR(1) model is a linear regression model, it can be analyzed using linear regression software. One creates a lagged variable from $y_t$ and uses this as the "x-variable" in the regression. MINITAB and SAS both support lagging. For example, in MINITAB, go to the "Stat" menu, then the "Time Series" menu, and then select "lag."
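Lagging can be sketched in a few lines of Python. Here `None` marks the first observation, which has no lagged value. This plays the role of the missing value that MINITAB or SAS would create; the series itself is hypothetical.

```python
def lag(series, k=1):
    """Shift a series back k steps; the first k entries have no lagged value (None)."""
    n = len(series)
    if k <= 0:
        return list(series)
    return [None] * min(k, n) + list(series[: max(n - k, 0)])


def ar1_regression_data(y):
    """(x, response) pairs for regressing y_t on y_{t-1}, t = 2, ..., n."""
    x = lag(y, 1)
    return [(xt, yt) for xt, yt in zip(x, y) if xt is not None]


y = [0.5, -0.2, 0.1, 0.3]        # hypothetical series
print(lag(y))                    # [None, 0.5, -0.2, 0.1]
print(ar1_regression_data(y))    # [(0.5, -0.2), (-0.2, 0.1), (0.1, 0.3)]
```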
The least-squares estimates of $\phi$ and $\mu$ minimize
$$\sum_{t=2}^{n} \left[\{y_t - \mu\} - \{\phi(y_{t-1} - \mu)\}\right]^2.$$
If the errors $\epsilon_1, \ldots, \epsilon_n$ are Gaussian white noise, then the least-squares estimate is also the MLE (conditional on the first observation $y_1$).
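Writing $\beta_0 = (1 - \phi)\mu$ turns the criterion above into an ordinary simple regression of $y_t$ on $y_{t-1}$. The Python sketch below uses that fact. The simulated data ($\phi = 0.6$, $\mu = 2$, N(0, 1) errors, fixed seed) are assumptions for illustration. The sketch is not what MINITAB or SAS does internally.

```python
import random


def fit_ar1(y):
    """Least-squares AR(1) fit: regress y_t on y_{t-1} with an intercept.

    The intercept is beta0 = (1 - phi) * mu, so mu_hat = beta0_hat / (1 - phi_hat).
    Returns (mu_hat, phi_hat).
    """
    x = y[:-1]  # lagged values y_1, ..., y_{n-1}
    z = y[1:]   # responses     y_2, ..., y_n
    xbar = sum(x) / len(x)
    zbar = sum(z) / len(z)
    sxz = sum((a - xbar) * (b - zbar) for a, b in zip(x, z))
    sxx = sum((a - xbar) ** 2 for a in x)
    phi_hat = sxz / sxx
    beta0_hat = zbar - phi_hat * xbar
    return beta0_hat / (1 - phi_hat), phi_hat


# Simulated AR(1) with phi = 0.6 and mu = 2 (values chosen only for illustration)
rng = random.Random(7)
y, prev = [], 2.0
for _ in range(20000):
    prev = 2.0 + 0.6 * (prev - 2.0) + rng.gauss(0.0, 1.0)
    y.append(prev)

mu_hat, phi_hat = fit_ar1(y)
print(f"mu_hat = {mu_hat:.3f}, phi_hat = {phi_hat:.3f}")
```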
Moreover, both MINITAB and SAS have special procedures for fitting AR models.
In MINITAB, go to the "Stat" menu, then the "Time Series" menu, and then choose ARIMA. Use 1 autoregressive parameter, 0 differencing, and 0 moving average parameters.
In SAS, use the "AUTOREG" or the "ARIMA" procedure.
Once $\phi$ and $\mu$ have been estimated, one can calculate the residuals $\hat\epsilon_2, \ldots, \hat\epsilon_n$, defined by
$$\hat\epsilon_t = y_t - \hat\mu - \hat\phi(y_{t-1} - \hat\mu).$$
The residuals estimate $\epsilon_2, \ldots, \epsilon_n$ and can be used to check the assumption that $y_1, y_2, \ldots, y_n$ is an AR(1) process; any autocorrelation in the residuals is evidence against the assumption of an AR(1) process.
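A minimal sketch of the residual computation and a lag-1 autocorrelation check follows. The estimates passed in are placeholders, not values from any real fit.

```python
def ar1_residuals(y, mu_hat, phi_hat):
    """Residuals e_t = y_t - mu_hat - phi_hat * (y_{t-1} - mu_hat) for t = 2, ..., n."""
    return [yt - mu_hat - phi_hat * (yprev - mu_hat) for yprev, yt in zip(y, y[1:])]


def lag1_autocorrelation(e):
    """Sample lag-1 autocorrelation; values far from 0 argue against the AR(1) fit."""
    ebar = sum(e) / len(e)
    num = sum((a - ebar) * (b - ebar) for a, b in zip(e, e[1:]))
    den = sum((a - ebar) ** 2 for a in e)
    return num / den


resid = ar1_residuals([1.0, 2.0, 3.0], mu_hat=0.0, phi_hat=0.5)
print(resid)  # [1.5, 2.0]
```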
To test for residual autocorrelation one can use the "test bounds" provided by MINITAB's or SAS's autocorrelation plots. One can also use the Ljung-Box test that simultaneously tests that all autocorrelations up to a specified lag are zero.
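The Ljung-Box statistic is $Q = n(n+2)\sum_{h=1}^{m} \hat\rho(h)^2/(n-h)$. It is compared with a chi-square distribution. The Python sketch below computes $Q$ and, for even degrees of freedom only, the chi-square tail probability in closed form. The even-df restriction is a simplification made here to avoid a special-function library. For the degrees of freedom, we simply take as given that MINITAB reports 10 degrees of freedom at lag 12 in the output that follows.

```python
import math


def sample_acf(y, max_lag):
    """rho_hat(1), ..., rho_hat(max_lag) using the usual divisor-n estimator."""
    n = len(y)
    ybar = sum(y) / n
    c0 = sum((v - ybar) ** 2 for v in y)
    return [
        sum((y[t] - ybar) * (y[t + h] - ybar) for t in range(n - h)) / c0
        for h in range(1, max_lag + 1)
    ]


def ljung_box(y, m):
    """Ljung-Box statistic Q = n (n + 2) * sum_{h=1}^{m} rho_hat(h)^2 / (n - h)."""
    n = len(y)
    rho = sample_acf(y, m)
    return n * (n + 2) * sum(r * r / (n - h) for h, r in enumerate(rho, start=1))


def chi2_sf_even_df(x, df):
    """P(chi-square with df degrees of freedom > x), for even df only, via
    exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!."""
    if df % 2:
        raise ValueError("closed form used here needs even degrees of freedom")
    half = x / 2.0
    term = total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


# MINITAB's lag-12 line below: Q = 23.0 on 10 degrees of freedom
print(round(chi2_sf_even_df(23.0, 10), 3))  # prints 0.011
```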
Example: GE daily returns
Autoregressive models can be analyzed in both MINITAB and SAS.
The MINITAB output was obtained by running MINITAB interactively. Here is the MINITAB output.
2/2/01   10:45:25  AM
Welcome to Minitab, press F1 for help.
Retrieving worksheet from file: C:\COURSES\OR473\MINITAB\GE_DAILY.MTW
# Worksheet was saved on Wed Jan 10 2001
Results for: GE_DAILY.MTW
ARIMA Model: logR
ARIMA model for logR
Estimates at each iteration
Iteration      SSE      Parameters
0          2.11832     0.100    0.090
1          0.12912     0.228    0.015
2          0.07377     0.233    0.001
3          0.07360     0.230    0.000
4          0.07360     0.230   -0.000
5          0.07360     0.230   -0.000
Relative change in each estimate less than     0.0010
Final Estimates of Parameters
Type         Coef     SE Coef        T                  P
AR   1      0.2299      0.0621      3.70         0.000
Constant -0.000031    0.001081     -0.03         0.977
Mean     -0.000040    0.001403
Number of observations:  252
Residuals:    SS =  0.0735911  (backforecasts excluded)
              MS =  0.0002944  DF = 250
Modified Box-Pierce (Ljung-Box) Chi-Square   statistic
Lag               12        24        36                  48
Chi-Square      23.0      33.6      47.1             78.6
DF                10        22        34                  46
P-Value        0.011     0.054     0.066            0.002
The SAS output comes from running the following program.
options linesize = 72 ;
comment Restrict the linesize to 72 characters ;
data ge ;  comment Start the data step ;
infile 'c:\courses\or473\data\ge.dat' ;
comment Specify the input data set ;
input close ;
comment Create a new variable ;
D_p = dif(close);
comment Take first differences ;
logP = log(close) ;
logR = dif(logP) ;
comment logR = log returns ;
run ;
title 'GE - Daily prices, Dec 17, 1999 to Dec 15, 2000' ;
title2 'AR(1)' ;
proc autoreg ;
model logR =/nlag = 1 ;
run ;
Here is the SAS output.
GE - Daily prices, Dec 17, 1999 to Dec 15, 2000
AR(1)    10:32 Friday, February 2, 20
The AUTOREG Procedure
Dependent Variable    logR
Ordinary Least Squares Estimates
SSE                 0.07762133    DFE                                                251
MSE                  0.0003092    Root MSE                             0.01759
SBC                 -1316.8318    AIC                                 -1320.3612
Regress R-Square        0.0000    Total R-Square                  0.0000
Durbin-Watson           1.5299
Standard                                    Approx
Variable       DF    Estimate       Error    t Value    Pr > |t|
Intercept       1    -0.000011     0.001108      -0.01      0.9917
Estimates of Autocorrelations

Lag    Covariance     Correlation
0      0.000308        1.000000
1      0.000069        0.225457
Preliminary MSE    0.000292
Estimates of Autoregressive Parameters
                   Standard
Lag    Coefficient      Error    t Value
1        -0.225457   0.061617      -3.66
The AUTOREG Procedure
Yule-Walker Estimates
SSE                                     0.07359998         DFE                                                250
MSE                                       0.0002944         Root MSE                             0.01716
SBC                                     -1324.6559         AIC                                 -1331.7148
Regress R-Square                  0.0000         Total R-Square                  0.0518
Durbin-Watson                        1.9326
Standard                                    Approx
Variable       DF    Estimate       Error    t Value    Pr > |t|
Intercept        1    -0.000040     0.001394      -0.03      0.9773
From MINITAB we see that $\hat\phi = 0.2299$ and the estimated standard deviation (standard error) of $\hat\phi$ is 0.0621. The t-value for testing $H_0: \phi = 0$ versus $H_1: \phi \neq 0$ is $0.2299/0.0621 = 3.70$, and the p-value is .000 (zero to three decimals). Since the p-value is so small, we reject the null hypothesis.
[Note: Recall from your statistics course that small p-values are significant; we reject the null hypothesis if the p-value is less than $\alpha$, e.g., less than .05.]
The null hypothesis is that the log returns are white noise and the alternative is that they are correlated. Thus, we have evidence against the geometric random walk hypothesis. However, $\hat\phi = 0.2299$ is not large. Since $\rho(h) = \phi^h$, the estimated correlation between successive log returns is 0.2299 and the squared correlation is only about 0.053, so only about five percent of the variation in a log return can be predicted by the previous day's return.
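The arithmetic in the last two paragraphs can be reproduced as follows. This is a sketch that uses the normal approximation to the t distribution, which is our simplification. MINITAB's exact p-value comes from its own routine, but with 250 degrees of freedom the two are very close.

```python
import math


def t_test(estimate, std_error):
    """t statistic for H0: parameter = 0 and a two-sided p-value.

    The p-value uses the standard normal approximation to the t distribution,
    which is close when the degrees of freedom are large (here DF = 250).
    """
    t = estimate / std_error
    p = math.erfc(abs(t) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|t|))
    return t, p


phi_hat, se = 0.2299, 0.0621  # MINITAB's AR(1) estimate and standard error
t, p = t_test(phi_hat, se)
print(f"t = {t:.2f}, approximate p-value = {p:.5f}")

# Implied autocorrelations rho(h) = phi^h and the predictable share rho(1)^2
for h in (1, 2, 3):
    print(f"rho({h}) = {phi_hat ** h:.4f}")
print(f"rho(1)^2 = {phi_hat ** 2:.4f}")
```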
We have seen that an AR(1) process fits the GE log returns better than a white noise model. Of course, this is not proof that the AR(1) fits these data, only that it fits better than a white noise model. To check that the AR(1) fits well, one looks at the sample autocorrelation function (SACF) of the residuals. A plot of the residual SACF can be requested when using either MINITAB or SAS.
The SACF of the residuals from the GE daily log returns shows a large negative autocorrelation at lag 6; $\hat\rho(6)$ is outside the test limits, so it is "significant" at $\alpha = 0.05$; see Figure 4.4. This is disturbing.
[Figure 4.4 plot, titled "ACF of Residuals for logR (with 95% confidence limits for the autocorrelations)"; vertical axis from -1.0 to 1.0, horizontal axis lag from 0 to 60.]
Figure 4.4: SACF of residuals from an AR(1) fit to the GE daily log returns. Notice the large negative residual autocorrelation at lag 6. This is a sign that the AR(1) model does not fit well.
Moreover, the more conservative Ljung-Box "simultaneous" test that $\rho(1) = \cdots = \rho(12) = 0$ has a p-value of 0.011. Since the AR(1) model does not fit well, one might consider more complex models. These will be discussed in the following sections.
The SAS estimate of $\phi$ is -0.2255 (printed as -0.225457). SAS uses the model
$$y_t = -\phi y_{t-1} + \epsilon_t,$$
so SAS's $\phi$ is the negative of $\phi$ as we, and MINITAB, define it. The difference between MINITAB and SAS, 0.2299 versus 0.2255, is due to slight variation in the estimation algorithm.
We can also estimate $\mu$ and test whether $\mu$ is zero. From the MINITAB output, we see that $\hat\mu$ is nearly zero: the t-value for testing that $\mu$ is zero is very small, and the p-value is near one. Remember that small p-values are significant; since the p-value is large, we accept the null hypothesis that $\mu$ is zero.
4.4   AR(p) models
$y_t$ is an AR(p) process if
$$(y_t - \mu) = \phi_1(y_{t-1} - \mu) + \phi_2(y_{t-2} - \mu) + \cdots + \phi_p(y_{t-p} - \mu) + \epsilon_t,$$
where $\epsilon_1, \ldots, \epsilon_n$ is WN$(0, \sigma_\epsilon^2)$.
This is a multiple linear regression model with lagged values of the time series as the "x-variables." The model can be reexpressed as
$$y_t = \beta_0 + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \epsilon_t,$$
where $\beta_0 = \{1 - (\phi_1 + \cdots + \phi_p)\}\mu$.
The least-squares estimator minimizes
$$\sum_{t=p+1}^{n} \{y_t - (\beta_0 + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p})\}^2.$$
The least-squares estimator can be calculated using a multiple linear regression program, but one must create "x-variables" by lagging the time series with lags 1 through p. It is easier to use the ARIMA command in MINITAB or SAS or SAS's AUTOREG procedure; these procedures do the lagging automatically.
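What "a multiple linear regression program" does here can be sketched by building the lagged design matrix and solving the normal equations $X^\top X b = X^\top y$. This is a teaching sketch under stated assumptions: the simulated AR(2) ($\phi_1 = 0.5$, $\phi_2 = -0.3$, $\mu = 0$, N(0, 1) errors) and the hand-written Gaussian elimination are ours. Production regression code would normally use a QR decomposition for numerical stability.

```python
import random


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ar(y, p):
    """Least-squares AR(p): regress y_t on (1, y_{t-1}, ..., y_{t-p}) for t = p+1, ..., n.

    Returns [beta0, phi_1, ..., phi_p] from the normal equations X'X b = X'y.
    """
    rows = [[1.0] + [y[t - k] for k in range(1, p + 1)] for t in range(p, len(y))]
    resp = [y[t] for t in range(p, len(y))]
    k = p + 1
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * v for row, v in zip(rows, resp)) for i in range(k)]
    return solve(xtx, xty)


# Simulated AR(2) with phi_1 = 0.5, phi_2 = -0.3 and mu = 0 (illustrative values)
rng = random.Random(11)
y = [0.0, 0.0]
for _ in range(20000):
    y.append(0.5 * y[-1] - 0.3 * y[-2] + rng.gauss(0.0, 1.0))

beta0, phi1, phi2 = fit_ar(y, 2)
print(f"beta0 = {beta0:.3f}, phi1 = {phi1:.3f}, phi2 = {phi2:.3f}")
```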
4.4.1    Example: GE daily returns
The SAS program shown above was rerun with
model  logR =/nlag =   1
replaced by
model logR =/nlag = 6 .
The output is on the course's web site as "GE DAILY, AR(6) (SAS)."
The autoregression coefficients (the $\phi_i$) are "significant" at lags 1 and 6 but not at lags 2 through 5. Here "significant" means at $\alpha = 0.05$, which corresponds approximately to an absolute t-value bigger than 2. MINITAB will not allow p > 5, but SAS does not have such a constraint.
4.5   Moving Average (MA) Processes
4.5.1    MA(1) processes
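The moving average process of order 1 [MA(1)] is
$$y_t - \mu = \epsilon_t - \theta\epsilon_{t-1},$$
where, as before, the $\epsilon_t$'s are WN$(0, \sigma_\epsilon^2)$.
One can show that
$$E(y_t) = \mu,$$
$$\mathrm{Var}(y_t) = \sigma_\epsilon^2(1 + \theta^2),$$
$$\gamma(1) = -\theta\sigma_\epsilon^2,$$
$$\gamma(h) = 0 \quad \text{if } |h| > 1,$$
and
$$\rho(h) = 0 \quad \text{if } |h| > 1.$$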
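Dividing $\gamma(1)$ by $\mathrm{Var}(y_t)$ gives $\rho(1) = -\theta/(1 + \theta^2)$. A quick Monte Carlo sketch compares this with the sample autocorrelations of a long simulated MA(1). The choices $\theta = 0.5$, Gaussian errors, $n = 50{,}000$, and the seed are assumptions.

```python
import random


def simulate_ma1(theta, n, mu=0.0, sigma=1.0, seed=0):
    """y_t = mu + e_t - theta * e_{t-1}, with Gaussian white noise e_t (assumed)."""
    rng = random.Random(seed)
    e_prev = rng.gauss(0.0, sigma)
    y = []
    for _ in range(n):
        e = rng.gauss(0.0, sigma)
        y.append(mu + e - theta * e_prev)
        e_prev = e
    return y


def sample_acf(y, h):
    """Lag-h sample autocorrelation with divisor n."""
    n = len(y)
    ybar = sum(y) / n
    c0 = sum((v - ybar) ** 2 for v in y)
    return sum((y[t] - ybar) * (y[t + h] - ybar) for t in range(n - h)) / c0


theta = 0.5
y = simulate_ma1(theta, 50000, seed=3)
print("theory: rho(1) =", -theta / (1 + theta ** 2), " rho(2) = 0")
print("sample: rho(1) =", round(sample_acf(y, 1), 3), " rho(2) =", round(sample_acf(y, 2), 3))
```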
4.5.2    General MA processes
The MA(q) process is
$$y_t - \mu = \epsilon_t - \theta_1\epsilon_{t-1} - \cdots - \theta_q\epsilon_{t-q}.$$
One can show that $\gamma(h) = 0$ and $\rho(h) = 0$ if $|h| > q$.
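The cutoff after lag q can be seen directly from the autocovariance formula. Write $y_t - \mu = \sum_{j=0}^{q} c_j \epsilon_{t-j}$ with $c_0 = 1$ and $c_j = -\theta_j$. Then $\gamma(h) = \sigma_\epsilon^2 \sum_{j=0}^{q-h} c_j c_{j+h}$ for $0 \le h \le q$, and $\gamma(h) = 0$ for $h > q$. A short Python sketch of that formula (the MA(2) coefficients are arbitrary examples):

```python
def ma_acf(thetas, max_lag):
    """Theoretical rho(0), ..., rho(max_lag) of the MA(q) process
    y_t - mu = e_t - theta_1 e_{t-1} - ... - theta_q e_{t-q}.

    With c_0 = 1 and c_j = -theta_j, gamma(h) = sigma^2 * sum_{j=0}^{q-h} c_j c_{j+h}
    for 0 <= h <= q and gamma(h) = 0 for h > q; sigma^2 cancels in rho(h).
    """
    c = [1.0] + [-th for th in thetas]
    q = len(thetas)
    gamma = [
        sum(c[j] * c[j + h] for j in range(q - h + 1)) if h <= q else 0.0
        for h in range(max_lag + 1)
    ]
    return [g / gamma[0] for g in gamma]


print(ma_acf([0.5], 3))        # MA(1) with theta = 0.5
print(ma_acf([0.5, -0.2], 4))  # an MA(2) example; rho(3) = rho(4) = 0
```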
4.6   ARIMA Processes
Stationary time series with complex autocorrelation behavior are better modeled by mixed autoregressive and moving average (ARMA) processes than by either a pure AR or pure MA process. ARIMA (autoregressive, integrated, moving average) processes are based on ARMA processes and are models for nonstationary time series.
ARIMA processes are more easily described if we introduce the "backwards" operator, B.
4.6.1    The backwards operator
The backwards operator B is defined by
$$B y_t = y_{t-1}$$
and, more generally,
$$B^k y_t = y_{t-k}.$$
Note that $Bc = c$ for any constant $c$, since a constant does not change with time.
4.6.2    ARMA Processes
The ARMA(p, q) process satisfies the equation
$$(1 - \phi_1 B - \cdots - \phi_p B^p)(y_t - \mu) = (1 - \theta_1 B - \cdots - \theta_q B^q)\epsilon_t.$$
A white noise process is ARMA(0,0).
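Expanding the operator equation gives the recursion $y_t - \mu = \sum_i \phi_i (y_{t-i} - \mu) + \epsilon_t - \sum_j \theta_j \epsilon_{t-j}$, which the sketch below implements. Treating pre-sample values of $y_t - \mu$ and $\epsilon_t$ as zero is an assumption made here for simplicity. The noise passed in is supplied by the caller.

```python
def arma_recursion(phis, thetas, noise, mu=0.0):
    """Generate y_t from (1 - phi_1 B - ... - phi_p B^p)(y_t - mu)
                        = (1 - theta_1 B - ... - theta_q B^q) e_t,
    treating pre-sample values of y_t - mu and e_t as zero (an assumption)."""
    dev, past_e, out = [], [], []
    for e in noise:
        ar = sum(phi * dev[-i] for i, phi in enumerate(phis, start=1) if i <= len(dev))
        ma = sum(th * past_e[-j] for j, th in enumerate(thetas, start=1) if j <= len(past_e))
        d = ar + e - ma
        dev.append(d)
        past_e.append(e)
        out.append(mu + d)
    return out


print(arma_recursion([0.5], [0.2], [1.0, 0.0, 0.0]))  # an ARMA(1,1) impulse response
print(arma_recursion([], [], [1.0, 2.0], mu=5.0))     # ARMA(0,0): white noise plus mu
```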
4.6.3    The differencing operator
The differencing operator is $\Delta = 1 - B$, so that
$$\Delta y_t = y_t - B y_t = y_t - y_{t-1}.$$
Thus, differencing a time series produces a new time series consisting of the changes in the original series. For example, if $p_t = \log(P_t)$ is the log price, then the log return is
$$r_t = \Delta p_t.$$
Differencing can be iterated. For example,
$$\Delta^2 y_t = \Delta(\Delta y_t) = \Delta(y_t - y_{t-1}) = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2}) = y_t - 2y_{t-1} + y_{t-2}.$$
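Iterated differencing is a one-line loop. In the sketch below, the squares are an arbitrary example whose second differences are constant.

```python
def diff(y, d=1):
    """Apply the differencing operator Delta = 1 - B to the list y, d times."""
    for _ in range(d):
        y = [later - earlier for earlier, later in zip(y, y[1:])]
    return y


squares = [1, 4, 9, 16, 25]
print(diff(squares))       # [3, 5, 7, 9]
print(diff(squares, d=2))  # [2, 2, 2], i.e. y_t - 2 y_{t-1} + y_{t-2}
```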
4.6.4    From ARMA processes to ARIMA processes
Often the first or second differences of nonstationary time series are stationary. For example, the first differences of a random walk (nonstationary) are white noise (stationary).
A time series $y_t$ is said to be ARIMA(p, d, q) if $\Delta^d y_t$ is ARMA(p, q). For example, if the log returns ($r_t$) on an asset are ARMA(p, q), then the log prices ($p_t$) are ARIMA(p, 1, q).
The ARIMA procedures in MINITAB and SAS allow one to specify p, d, and q.
Notice that an ARIMA(p, 0, q) model is the same as an ARMA(p, q) model. ARIMA(p, 0, 0), ARMA(p, 0), and AR(p) models are the same. Also, ARIMA(0, 0, q), ARMA(0, q), and MA(q) models are the same. A random walk is an ARIMA(0, 1, 0) model. Why?
The inverse of differencing is "integrating." The integral of a process $y_t$ is the process $w_t$, where
$$w_t = w_{t_0} + y_{t_0+1} + y_{t_0+2} + \cdots + y_t,$$
$t_0$ is an arbitrary starting time point, and $w_{t_0}$ is the starting value of the $w_t$ process. Then $\Delta w_t = y_t$.
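Integration is a running sum started at $w_{t_0}$, and differencing the result recovers the original series. In the sketch below, the series and starting value are arbitrary.

```python
def integrate(y, w0):
    """Running sum w_t = w_{t0} + y_{t0+1} + ... + y_t, returned with w_{t0} first."""
    w, total = [w0], w0
    for v in y:
        total += v
        w.append(total)
    return w


w = integrate([1, 2, 3], w0=10)
print(w)                                  # [10, 11, 13, 16]
print([b - a for a, b in zip(w, w[1:])])  # [1, 2, 3]: differencing undoes integrating
```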
Figure 4.5 shows an AR(1), its "integral" and its "second integral," meaning the integral of its integral.
[Figure 4.5: three panels, titled "ARIMA(1,0,0) with μ = 0 and φ = 0.4", "ARIMA(1,1,0)", and "ARIMA(1,2,0)"; horizontal axes run from 0 to 400.]
Figure 4.5: The top plot is of an AR(1) process with $\mu = 0$ and $\phi = 0.4$. The middle and bottom plots are, respectively, the integral and second integral of this AR(1) process.
4.7   Model Selection
Once the parameters p, d, and q of an ARIMA process have been selected, the AR and MA coefficients can be estimated by maximum likelihood. But how do we choose p, d, and q?
Generally, d is either 0, 1, or 2 and is chosen by looking at the SACF of $y_t$, $\Delta y_t$, and $\Delta^2 y_t$.
A sign that a process is nonstationary is that its SACF decays to zero very slowly. If this is true of yt then the original series is nonstationary and should be differenced at least once.
If the SACF of $\Delta y_t$ looks stationary, then we use d = 1. Otherwise, we look at the SACF of $\Delta^2 y_t$; if this looks stationary, we use d = 2.
I have never seen a real time series where $\Delta^2 y_t$ did not look stationary, but if one were encountered then d > 2 would be used.
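The "slowly decaying SACF" symptom is easy to see in simulation. The sketch below compares the SACF of a simulated random walk with that of its first differences. The random walk, its length, and the seed are assumptions chosen only to make the contrast visible.

```python
import random


def sample_acf(y, h):
    """Lag-h sample autocorrelation, with the usual divisor-n estimator."""
    n = len(y)
    ybar = sum(y) / n
    c0 = sum((v - ybar) ** 2 for v in y)
    return sum((y[t] - ybar) * (y[t + h] - ybar) for t in range(n - h)) / c0


rng = random.Random(5)
walk, level = [], 0.0
for _ in range(2000):
    level += rng.gauss(0.0, 1.0)  # random walk: a nonstationary series
    walk.append(level)
steps = [b - a for a, b in zip(walk, walk[1:])]  # first differences: white noise here

for h in (1, 5, 10, 20):
    print(f"lag {h:2d}: SACF of y_t = {sample_acf(walk, h):6.3f}, "
          f"SACF of Delta y_t = {sample_acf(steps, h):6.3f}")
```

For this series one would choose d = 1: the SACF of $y_t$ stays near one across lags, while that of $\Delta y_t$ is near zero.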
Once d has been chosen, we know that we will fit an ARMA(p, q) process to $\Delta^d y_t$, but we still need to select p and q. This can be done by comparing various choices of p and q by some criterion that measures how well a model fits the data.
4.7.1    AIC and SBC
AIC and SBC are model selection criteria based on the log-likelihood.
Akaike's information criterion (AIC) is defined as
$$-2\log(L) + 2(p + q),$$
where $L$ is the likelihood evaluated at the MLE. Schwarz's Bayesian Criterion (SBC) is also called the Bayesian Information Criterion (BIC) and is defined as
$$-2\log(L) + \log(n)(p + q),$$
where n is the length of the time series.
The "best" model according to either criterion is the model that minimizes that criterion.
Either criterion will tend to select models with large values of the likelihood; this makes perfect sense, since a large value of $L$ means that the observed data are likely under that model.
The term 2(p + q) in AIC, or log(n)(p + q) in SBC, is a penalty on having too many parameters. Therefore, AIC and SBC both try to trade off a good fit to the data, measured by $L$, against the desire to use as few parameters as possible.
Note that log(n) > 2 if n > 8. Since most time series are much longer than 8, SBC penalizes p + q more than AIC. Therefore, AIC will tend to choose models with more parameters than SBC. Compared to SBC, with AIC the tradeoff is more in favor of a large value of L than a small value of p + q.
This difference between AIC and SBC is due to the way they were designed. AIC is designed to select the model that will predict best and is less concerned with having a few too many parameters. SBC is designed to select the true values of p and q exactly.
In practice the best AIC model is usually close to the best SBC model; often they are the same model.
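The two penalties can be compared with a small sketch. The two log-likelihood values below are invented purely to show how AIC and SBC can disagree. They do not come from any fit in these notes; n = 252 simply matches the length of the daily series used earlier.

```python
import math


def aic(loglik, p, q):
    """Akaike's information criterion: -2 log(L) + 2 (p + q)."""
    return -2.0 * loglik + 2.0 * (p + q)


def sbc(loglik, p, q, n):
    """Schwarz's Bayesian criterion: -2 log(L) + log(n) (p + q)."""
    return -2.0 * loglik + math.log(n) * (p + q)


# Hypothetical log-likelihoods for two candidate models, made up for illustration
n = 252
candidates = {(1, 0): 700.0, (2, 0): 702.5}

best_aic = min(candidates, key=lambda pq: aic(candidates[pq], *pq))
best_sbc = min(candidates, key=lambda pq: sbc(candidates[pq], *pq, n))
print("AIC picks ARMA", best_aic, "; SBC picks ARMA", best_sbc)
print("log(7) =", round(math.log(7), 3), ", log(8) =", round(math.log(8), 3))
```

With these numbers the gain of 2.5 in log-likelihood outweighs AIC's penalty of 2 for the extra parameter, but not SBC's penalty of log(252), which is about 5.5.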
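Two models can be compared by likelihood ratio testing when one model is "bigger" than the other, that is, when the smaller model is a special case of the larger one. AIC and SBC are closely connected with likelihood ratio tests, since all three compare models through their log-likelihoods.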
4.7.2    Stepwise regression applied to AR processes
Stepwise regression is a way of looking at a variety of regression models to see which ones fit the data well. You may encounter stepwise regression if you take an advanced regression course. In backwards regression, sometimes called backstepping, one starts with all possible x-variables and eliminates them one at a time until all remaining variables are "significant" by some criterion.
Stepwise regression can, of course, be applied to AR models since these are a type of multiple regression model. SAS's AUTOREG procedure allows backstepping as an option.
The following SAS program starts with an AR(6) model and backsteps.
options linesize = 72 ;
comment Restrict the linesize to 72 characters ;
data ge ;  comment Start the data step ;
infile 'c:\courses\or473\data\ge_quart.dat' ;
comment Specify the input data set ;
input close ;
D_p = dif(close);
comment Take first differences ;
logP = log(close) ;
logR = dif(logP) ;
comment logR = log returns ;
run ;
title 'GE - Quarterly closing prices, Dec 1900 to Dec 2000' ;
title2 'AR(6) with backstepping' ;
proc autoreg ;
model logR =/nlag = 6 backstep ;
run ;
Here is the SAS output:
GE - Quarterly closing prices, Dec 1900 to Dec 2000                      1
AR(6) with backstepping
23:32 Tuesday, January 30, 2001
The AUTOREG Procedure
Dependent Variable    logR

Ordinary Least Squares Estimates

SSE                 0.15125546    DFE                        38
MSE                    0.00398    Root MSE              0.06309
SBC                 -102.20076    AIC                -103.86432
Regress R-Square        0.0000    Total R-Square         0.0000
Durbin-Watson           2.0710

                                 Standard                 Approx
Variable        DF    Estimate      Error    t Value    Pr > |t|
Intercept        1      0.0627     0.0101       6.21      <.0001

Estimates of Autocorrelations

Lag    Covariance    Correlation
  0       0.00388       1.000000
  1      -0.00014      -0.036627
  2      -0.00023      -0.059114
  3       0.00152       0.392878
  4      -0.00014      -0.035792
  5      -0.00075      -0.193269
  6      0.000337       0.086919
The AUTOREG Procedure
Backward Elimination of Autoregressive Terms
Lag    Estimate    t Value    Pr > |t|
  4    0.020648       0.12      0.9058
  2    0.023292      -0.14      0.8921
  1    0.035577       0.23      0.8226
  6    0.082465       0.50      0.6215
  5    0.170641       1.13      0.2655
Preliminary MSE     0.0032E
Estimates of Autoregressive Parameters
                   Standard
Lag    Coefficient      Error    t Value
  3      -0.392878   0.151180      -2.60
Expected Autocorrelations
Lag    Autocorr
0       1.0000
1       0.0000
2       0.0000
3       0.3929
Yule-Walker Estimates
SSE                 0.12476731    DFE                                                  37
MSE                     0.00337    Root MSE                             0.05807
SBC                  -105.5425    AIC                                 -108.86962
Regress R-Square        0.0000    Total R-Square                  0.1751
Durbin-Watson           1.9820
The AUTOREG Procedure
                                 Standard                 Approx
Variable        DF    Estimate      Error    t Value    Pr > |t|
Intercept        1      0.0632     0.0146       4.33      0.0001

Expected Autocorrelations

Lag    Autocorr
  0      1.0000
  1      0.0000
  2      0.0000
  3      0.3929
4.7.3    Using ARIMA in SAS: Cree data
Daily returns of Cree from December 1999 to December 2000 are shown in Figure 4.6.
[Figure 4.6: five panels, including one titled "CREE, daily - 12/17/99 to 12/15/00" (horizontal axis 0 to 250) and a "Normal plot of log returns" (horizontal axis: log return; vertical axis: probability from 0.001 to 0.999).]
Figure 4.6: Cree daily returns. The first plot is the prices. The second and third are the net returns and the log returns. The fourth plot is a normal probability plot of the log returns. The final plot is of the absolute log returns; there is a scatterplot smooth to help show whether the volatility is constant.
In this example, we will illustrate fitting an ARMA model in SAS. We use daily log returns on Cree from December 1999 to December 2000. The SAS program is:
options linesize = 72 ;
data cree ;
infile 'U:\courses\473\data\cree_daily.dat' ;
input month day year volume high low close ;
logP = log(close) ;
logR = dif(logP) ;
run ;
title 'Cree daily log returns' ;
title2 'ARMA(1,1)' ;
proc arima ;
identify var=logR ;
estimate p=1 q=1 ;
run ;
The "identify" statement specifies the input series and tells SAS to compute the SACF. It can also be used to specify the amount of differencing; "identify var=logP(1) ;" would tell SAS to use the first differences of the log prices as input.
Here is the SAS output. The result is that the Cree log returns appear to be white noise, since $\hat\phi_1$ (denoted by AR1,1 in SAS), $\hat\theta_1$ (denoted by MA1,1), and $\hat\mu$ are not significantly different from zero.
Cree daily log returns                                                    1
ARMA(1,1)  15:18 Friday, February 2, 2001
The ARIMA Procedure
Name of Variable = logR
Mean of Working Series    -0.00071
Standard Deviation        0.067473
Number of Observations         252
Autocorrelations

Lag    Covariance    Correlation
  0     0.0045526        1.00000
  1    0.00031398        0.06897
  2    -0.0000160       -0.00351
  3    -5.5958E-6       -0.00123
  4    -0.0002213       -0.04862
  5    0.00002748        0.00604
  6    -0.0000779       -0.01712
  7    -0.0000207       -0.00454
  8    -0.0003281       -0.07207
  9    0.00015664        0.03441
 10    0.00057077        0.12537
 11    0.00023632        0.05191
 12    -0.0003475       -0.07633
 13    -0.0001348       -0.02961
 14    -0.0005590       -0.12278
 15    0.00023425        0.05145
 16    -0.0001021       -0.02242
 17    -0.0000582       -0.01278
 18    -0.0007147       -0.15699
 19    0.00006314        0.01387
 20    -0.0000466       -0.01024
 21    -0.0001681       -0.03692
 22    -0.0001439       -0.03161
 23    -0.0002135       -0.04690
 24    0.00007502        0.01648
Inverse Autocorrelations

Lag    Correlation
  1       -0.11452
  2        0.06356
  3       -0.08905
  4        0.12788
  5       -0.04576
  6        0.07209
  7       -0.06322
  8        0.09828
  9       -0.04639
 10       -0.05006
 11       -0.09283
 12        0.10049
 13       -0.02141
 14        0.15284
 15       -0.09318
 16        0.05864
 17       -0.02983
 18        0.16300
 19       -0.05602
 20        0.05126
 21        0.01713
 22        0.04942
 23        0.00197
 24       -0.01745

Partial Autocorrelations

Lag    Correlation
  1        0.06897
  2       -0.00830
  3       -0.00041
  4       -0.04877
  5        0.01287
  6       -0.01916
  7       -0.00183
  8       -0.07486
  9        0.04628
 10        0.11841
 11        0.03697
 12       -0.09207
 13       -0.01457
 14       -0.11485
 15        0.07540
 16       -0.04385
 17        0.00180
 18       -0.16594
 19        0.05041
 20       -0.06240
 21       -0.02732
 22       -0.05643
 23        0.00111
 24        0.01957
Autocorrelation Check for White Noise

To       Chi-             Pr >
Lag    Square    DF     ChiSq    ---------------Autocorrelations---------------
 6       1.91     6    0.9276    0.069  -0.004  -0.001  -0.049   0.006  -0.017
12      10.02    12    0.6143   -0.005  -0.072   0.034   0.125   0.052  -0.076
18      21.95    18    0.2344   -0.030  -0.123   0.051  -0.022  -0.013  -0.157
24      23.37    24    0.4978    0.014  -0.010  -0.037  -0.032  -0.047   0.016
Conditional Least Squares Estimation

                           Standard                 Approx
Parameter      Estimate       Error    t Value    Pr > |t|    Lag
MU           -0.0006814   0.0045317      -0.15      0.8806      0
MA1,1          -0.18767     0.88710      -0.21      0.8326      1
AR1,1          -0.11768     0.89670      -0.13      0.8957      1

Constant Estimate      -0.00076
Variance Estimate      0.004585
Std Error Estimate     0.067712
AIC                    -638.889
SBC                    -628.301
Number of Residuals         252
* AIC and SBC do not include log determinant.
The ARIMA Procedure
Correlations of Parameter Estimates

Parameter       MU    MA1,1    AR1,1
MU           1.000    0.005    0.006
MA1,1        0.005    1.000    0.998
AR1,1        0.006    0.998    1.000
Autocorrelation Check of Residuals

To       Chi-             Pr >
Lag    Square    DF     ChiSq    ---------------Autocorrelations---------------
 6       0.75     4    0.9444    0.000   0.004   0.001  -0.049   0.010  -0.019
12       8.54    10    0.5761    0.003  -0.075   0.032   0.118   0.050  -0.079
18      21.12    16    0.1741   -0.014  -0.127   0.062  -0.029   0.001  -0.159
24      22.48    22    0.4314    0.025  -0.011  -0.035  -0.026  -0.045   0.016
30      32.65    28    0.2490    0.054   0.127   0.102  -0.023  -0.029   0.070
36      38.16    34    0.2858   -0.055  -0.038  -0.026  -0.079   0.021   0.083
42      47.23    40    0.2009   -0.061  -0.092  -0.004  -0.028  -0.118  -0.055
48      49.15    46    0.3480   -0.032  -0.011  -0.004   0.027   0.054  -0.036
Model for variable logR

Estimated Mean    -0.00068

Autoregressive Factors
Factor 1:  1 + 0.11768 B**(1)

Moving Average Factors
Factor 1:  1 + 0.18767 B**(1)
4.8   Example: Three-month Treasury bill rates
The efficient market hypothesis predicts that log returns will be white noise, and our empirical results are that log returns have little autocorrelation even if they are not exactly white noise. Other financial time series do have substantial autocorrelation, as is shown in this example.
The time series in this example is monthly interest rates on three-month US Treasury bills from December 1950 until February 1996. The data come from Example 16.1 of Pindyck and Rubinfeld (1998), Econometric Models and Economic Forecasts. The rates are plotted in Figure 4.7. The first differences look somewhat stationary, and we will fit ARMA models to the first differences.
[Figure 4.7: panels titled "3 month T-bills"; horizontal axes show month since Jan 1950, from 0 to 600.]
Figure 4.7: Time series plot of 3 month Treasury bill rates, plot of first differences, and sample autocorrelation function of first differences. Monthly values from January 1950 until March 1996.
First we try fitting an AR(10) model with ARIMA. Here is the SAS program. Note that the statement "identify var=z(1);" specifies that the model should be fit to the first differences of the variable z; z is the interest rate.
options linesize = 72 ;
data ratel ;
infile 'c:\courses\or473\data\fygn.dat' ;
input date $ z;
title 'Three month treasury bills' ;
title2 'ARIMA model - to first differences' ;
proc arima ;
identify var=z(1) ;
estimate p=10 plot;
run ;
Here is the SAS output.
Three month treasury bills                                                1
ARIMA model - to first differences
14:41 Saturday, February 3, 2001
The ARIMA Procedure
Name of Variable = z
Period(s) of Differencing                                                          1
Mean of Working Series                                                  0.006986
Standard Deviation                                                          0.494103
Number of Observations                                                             554
Observation(s) eliminated by  differencing          1
Autocorrelations

Lag    Covariance    Correlation
  0      0.244138        1.00000
  1      0.067690        0.27726
  2     -0.026212       -0.10736
  3     -0.022360       -0.09159
  4    -0.0091143       -0.03733
  5      0.011399        0.04669
  6     -0.045339       -0.18571
  7     -0.047987       -0.19656
  8      0.022734        0.09312
  9      0.047441        0.19432
 10      0.014282        0.05850
 11    -0.0017082       -0.00700
 12     -0.022600       -0.09257
 13     0.0087638        0.03590
 14      0.038426        0.15739
 15     -0.024885       -0.10193
 16     0.0012018        0.00492
 17      0.020048        0.08212
 18      0.019043        0.07800
 19    -0.0081609       -0.03343
 20     -0.056547       -0.23162
 21     -0.038945       -0.15952
 22    -0.0035774       -0.01465
 23    -0.0018465       -0.00756
 24    -0.0080554       -0.03300

Inverse Autocorrelations

Lag    Correlation
  1       -0.38226
  2        0.17388
  3       -0.03944
  4        0.09813
  5       -0.15403
  6        0.16052
  7        0.03458
  8       -0.07833
  9       -0.01029
 10       -0.01264
 11       -0.07557
 12       -0.00166
 13        0.12786
 14       -0.22060
 15        0.19060
 16       -0.10958
 17        0.03736
 18       -0.05356
 19        0.07262
 20        0.03663
 21        0.03580
 22        0.02890
 23        0.00507
 24        0.00765

Partial Autocorrelations

Lag    Correlation
  1        0.27726
  2       -0.19958
  3       -0.00061
  4       -0.03172
  5        0.05661
  6       -0.25850
  7       -0.05221
  8        0.14071
  9        0.08439
 10       -0.04699
 11        0.06148
 12       -0.11389
 13        0.05561
 14        0.13716
 15       -0.13273
 16        0.15741
 17        0.02301
 18        0.01777
 19       -0.13330
 20       -0.08447
 21       -0.07718
 22       -0.04553
 23       -0.01479
 24       -0.01071

Autocorrelation Check for White Noise
To       Chi-               Pr >
Lag    Square    DF       ChiSq    ---------------Autocorrelations---------------
 6      75.33     6      <.0001    0.277  -0.107  -0.092  -0.037   0.047  -0.186
12     130.15    12      <.0001   -0.197   0.093   0.194   0.059  -0.007  -0.093
18     158.33    18      <.0001    0.036   0.157  -0.102   0.005   0.082   0.078
24     205.42    24      <.0001   -0.033  -0.232  -0.160  -0.015  -0.008  -0.033
Conditional Least Squares Estimation
                            Standard                 Approx
Parameter      Estimate        Error    t Value    Pr > |t|    Lag
MU            0.0071463      0.02056       0.35      0.7283      0
AR1,1           0.33494      0.04287       7.81      <.0001      1
AR1,2          -0.16456      0.04501      -3.66      0.0003      2
AR1,3           0.01712      0.04535       0.38      0.7060      3
AR1,4          -0.10901      0.04522      -2.41      0.0163      4
AR1,5           0.14252      0.04451       3.20      0.0014      5
AR1,6          -0.21560      0.04451      -4.84      <.0001      6
AR1,7          -0.08347      0.04522      -1.85      0.0655      7
AR1,8           0.10382      0.04536       2.29      0.0225      8
AR1,9           0.10007      0.04502       2.22      0.0267      9
AR1,10         -0.04723      0.04290      -1.10      0.2714     10
Constant Estimate       0.006585
Variance Estimate       0.198648
Std Error Estimate      0.445699
AIC                     687.6855
SBC                     735.1743
Number of Residuals          554
* AIC and SBC do not include log determinant.
Correlations of Parameter Estimates
Parameter      MU     AR1,1    AR1,2    AR1,3    AR1,4    AR1,5
MU          1.000     0.001   -0.000   -0.001   -0.001   -0.000
AR1,1       0.001     1.000   -0.315    0.160   -0.020    0.095
AR1,2      -0.000    -0.315    1.000   -0.357    0.166   -0.033
AR1,3      -0.001     0.160   -0.357    1.000   -0.350    0.204
AR1,4      -0.001    -0.020    0.166   -0.350    1.000   -0.375
AR1,5      -0.000     0.095   -0.033    0.204   -0.375    1.000
AR1,6      -0.001    -0.131    0.122   -0.068    0.218   -0.367
AR1,7      -0.001     0.200   -0.178    0.161   -0.078    0.218
AR1,8      -0.001     0.080    0.163   -0.166    0.161   -0.068
AR1,9      -0.001    -0.106    0.123    0.163   -0.178    0.122
AR1,10     -0.003    -0.085   -0.106    0.080    0.200   -0.131

Parameter   AR1,6     AR1,7    AR1,8    AR1,9   AR1,10
MU         -0.001    -0.001   -0.001   -0.001   -0.003
AR1,1      -0.131     0.200    0.080   -0.106   -0.085
AR1,2       0.122    -0.178    0.163    0.123   -0.106
AR1,3      -0.068     0.161   -0.166    0.163    0.080
AR1,4       0.218    -0.078    0.161   -0.178    0.200
AR1,5      -0.367     0.218   -0.068    0.122   -0.131
AR1,6       1.000    -0.375    0.204   -0.033    0.096
AR1,7      -0.375     1.000   -0.350    0.166   -0.020
AR1,8       0.204    -0.350    1.000   -0.357    0.161
AR1,9      -0.033     0.166   -0.357    1.000   -0.315
AR1,10      0.096    -0.020    0.161   -0.315    1.000
Autocorrelation Check of Residuals
 To      Chi-          Pr >
Lag    Square    DF   ChiSq    --------------------Autocorrelations--------------------
  6      0.00     0  <.0001     0.003   -0.011    0.003    0.021   -0.015   -0.031
 12      9.56     2  0.0084     0.036   -0.001   -0.031    0.018    0.105   -0.040
 18     42.72     8  <.0001    -0.076    0.177   -0.115    0.081    0.019    0.025
 24     62.06    14  <.0001    -0.062   -0.149   -0.078   -0.025   -0.024   -0.013
 30     65.76    20  <.0001     0.002    0.008    0.045    0.048   -0.043   -0.007
 36     73.52    26  <.0001    -0.070   -0.004   -0.051   -0.003   -0.053   -0.052
 42     74.14    32  <.0001    -0.007    0.028   -0.007   -0.005    0.010    0.006
 48     82.20    38  <.0001    -0.011   -0.000   -0.006    0.001   -0.103    0.050
Autocorrelation Plot of Residuals
Lag    Covariance    Correlation
  0      0.198648        1.00000
  1    0.00057812        0.00291
  2    -0.0020959       -0.01055
  3    0.00068451        0.00345
  4     0.0041792        0.02104
  5    -0.0030362       -0.01528
  6    -0.0061377       -0.03090
  7     0.0071315        0.03590
  8    -0.0001693       -0.00085
  9    -0.0061781       -0.03110
 10     0.0036055        0.01815
 11      0.020788        0.10465
 12    -0.0078818       -0.03968
 13     -0.015171       -0.07637
 14      0.035240        0.17740
 15     -0.022934       -0.11545
 16      0.016000        0.08054
 17     0.0037288        0.01877
 18     0.0049781        0.02506
 19     -0.012221       -0.06152
 20     -0.029590       -0.14896
 21     -0.015566       -0.07836
 22    -0.0050098       -0.02522
 23    -0.0048445       -0.02439
 24    -0.0026174       -0.01318
"." marks two standard errors
Inverse Autocorrelations
Lag    Correlation
  1     -0.04462
  2      0.02988
  3      0.02921
  4     -0.04817
  5      0.00308
  6      0.02072
  7     -0.02134
  8     -0.01272
  9      0.01308
 10     -0.02753
 11     -0.10241
 12      0.03617
 13      0.06350
 14     -0.16306
 15      0.12298
 16     -0.08990
 17     -0.02141
 18     -0.00130
 19      0.04419
 20      0.11901
 21      0.08929
 22      0.02613
 23      0.00628
 24      0.00879
Partial Autocorrelations
Lag    Correlation
  1      0.00291
  2     -0.01056
  3      0.00351
  4      0.02091
  5     -0.01534
  6     -0.03040
  7      0.03569
  8     -0.00204
  9     -0.02966
 10      0.01926
 11      0.10200
 12     -0.04035
 13     -0.07248
 14      0.17834
 15     -0.13109
 16      0.09936
 17      0.02268
 18      0.00293
 19     -0.05597
 20     -0.13881
 21     -0.10044
 22     -0.02905
 23     -0.00750
 24     -0.00979
Model for variable z
Estimated Mean                 0.007146
Period(s) of Differencing             1
Autoregressive Factors
Factor 1:  1 - 0.33494 B**(1) + 0.16456 B**(2) - 0.01712 B**(3) + 0.10901 B**(4) - 0.14252 B**(5) + 0.2156 B**(6) + 0.08347 B**(7) - 0.10382 B**(8) - 0.10007 B**(9) + 0.04723 B**(10)
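The AR(10) model does not fit well: the autocorrelation check of the residuals rejects white noise (Pr > ChiSq is below .01 at every lag from 12 to 48), and the residual autocorrelations at lags 14 and 20 (.177 and -.149) are large. Next we fit an AR(24) model to the first differences, using backward elimination ("backstepping") to drop autoregressive terms that are not statistically significant. Here is the SAS program: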
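options linesize = 72;
data rate1;
   infile 'c:\courses\or473\data\fygn.dat';
   input date $ z;          /* character date and the T-bill rate z */
   zdif = dif(z);           /* first differences, z(t) - z(t-1) */
run;
title 'Three month treasury bills';
title2 'AR(24) model to first differences with backfitting';
proc autoreg data=rate1;
   /* no regressors: zdif is modeled as a mean plus AR(24) errors; */
   /* BACKSTEP drops insignificant AR lags by backward elimination */
   model zdif = / nlag=24 backstep;
run;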
Here is the output.
Three month treasury bills
AR(24) model to first differences with backfitting
10:32 Wednesday, February 14, 2001
The AUTOREG Procedure
Dependent Variable    zdif
Ordinary Least Squares Estimates
SSE                 135.25253    DFE                    553
MSE                   0.24458    Root MSE           0.49455
SBC                 797.34939    AIC             793.032225
Regress R-Square       0.0000    Total R-Square      0.0000
Durbin-Watson          1.4454
                                 Standard                 Approx
Variable     DF    Estimate        Error    t Value    Pr > |t|
Intercept     1    0.006986       0.0210       0.33      0.7397
Estimates of Autocorrelations
Lag    Covariance    Correlation
  0      0.2441       1.000000
  1      0.0677       0.277260
  2     -0.0262      -0.107364
  3     -0.0224      -0.091587
  4     -0.00911     -0.037332
  5      0.0114       0.046690
  6     -0.0453      -0.185710
  7     -0.0480      -0.196558
  8      0.0227       0.093118
  9      0.0474       0.194318
 10      0.0143       0.058501
 11     -0.00171     -0.006997
 12     -0.0226      -0.092572
 13      0.00876      0.035897
 14      0.0384       0.157393
 15     -0.0249      -0.101930
 16      0.00120      0.004923
 17      0.0200       0.082117
 18      0.0190       0.078001
 19     -0.00816     -0.033427
 20     -0.0565      -0.231618
 21     -0.0389      -0.159520
 22     -0.00358     -0.014653
 23     -0.00185     -0.007563
 24     -0.00806     -0.032995
Backward Elimination of Autoregressive Terms
Lag: 10, 23, 17, 3, 24, 13, 7, 18, 22, 20, 4
Estimate     t Value    Pr > |t|
 0.007567       0.16      0.8721
 0.010212       0.22      0.8241
 0.008951       0.19      0.8492
-0.014390      -0.32      0.7496
 0.015798       0.40      0.6907
 0.041434       0.92      0.3605
 0.038880       0.85      0.3964
-0.037456      -0.90      0.3702
 0.042555       1.02      0.3090
 0.058230       1.31      0.1912
 0.059903       1.48      0.1389
-0.058141      -1.42      0.1562
Preliminary MSE    0.1765
Estimates of Autoregressive Parameters
                       Standard
Lag    Coefficient        Error    t Value
  1      -0.388246     0.040419      -9.61
  2       0.200242     0.040438       4.95
  5      -0.108069     0.040513      -2.67
  6       0.249095     0.039719       6.27
  8      -0.103462     0.039668      -2.61
 11      -0.102896     0.040278      -2.55
 12       0.119950     0.040704       2.95
 14      -0.204702     0.040427      -5.06
 15       0.223381     0.042441       5.26
 16      -0.151917     0.040811      -3.72
 19       0.103356     0.038847       2.66
 21       0.108074     0.039511       2.74
Expected Autocorrelations
Lag    Autocorr
  0      1.0000
  1      0.2840
  2     -0.1196
  3     -0.0801
  4      0.0273
  5      0.0656
  6     -0.1914
  7     -0.1923
  8      0.0880
  9      0.1549
 10      0.0223
 11     -0.0229
 12     -0.0737
 13      0.0767
 14      0.1628
 15     -0.1000
 16     -0.0017
 17      0.0685
 18      0.0437
 19     -0.0638
 20     -0.1968
 21     -0.1296
Yule-Walker Estimates
SSE                97.7597462    DFE                    541
MSE                   0.18070    Root MSE           0.42509
SBC                695.767655    AIC             639.644514
Regress R-Square       0.0000    Total R-Square      0.2772
Durbin-Watson          2.0627
                                 Standard                 Approx
Variable     DF    Estimate        Error    t Value    Pr > |t|
Intercept     1    0.006664       0.0192       0.35      0.7289
4.9    Forecasting
ARIMA models are often used to forecast future values of a time series. Consider first an AR(1) process. Suppose that we have data $y_1, \ldots, y_n$ and estimates $\hat\mu$ and $\hat\phi$. Then we estimate $y_{n+1}$ by
$$\hat y_{n+1} := \hat\mu + \hat\phi(y_n - \hat\mu)$$
and $y_{n+2}$ by
$$\hat y_{n+2} := \hat\mu + \hat\phi(\hat y_{n+1} - \hat\mu) = \hat\mu + \hat\phi\{\hat\phi(y_n - \hat\mu)\},$$
etc. In general, $\hat y_{n+k} = \hat\mu + \hat\phi^k(y_n - \hat\mu)$. If $|\hat\phi| < 1$, as is expected for a stationary series, then as $k$ increases the forecasts converge exponentially fast to $\hat\mu$.
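A minimal Python sketch of the AR(1) forecast recursion, checked against the closed form $\hat y_{n+k} = \hat\mu + \hat\phi^k(y_n - \hat\mu)$. The values of `y_n`, `mu_hat`, and `phi_hat` are hypothetical, chosen only to make the decay toward $\hat\mu$ easy to see, and the function names are illustrative.

```python
def ar1_forecasts(y_n, mu_hat, phi_hat, horizon):
    """Forecasts of y_{n+1}, ..., y_{n+horizon} from a fitted AR(1) model,
    computed recursively: yhat_{n+k} = mu_hat + phi_hat * (yhat_{n+k-1} - mu_hat)."""
    forecasts = []
    previous = y_n  # the recursion starts from the last observation
    for _ in range(horizon):
        previous = mu_hat + phi_hat * (previous - mu_hat)
        forecasts.append(previous)
    return forecasts


def ar1_closed_form(y_n, mu_hat, phi_hat, k):
    """Closed form of the k-step-ahead forecast: mu_hat + phi_hat**k * (y_n - mu_hat)."""
    return mu_hat + phi_hat ** k * (y_n - mu_hat)


# Hypothetical inputs for illustration only.
y_n, mu_hat, phi_hat = 2.0, 1.0, 0.5
recursive = ar1_forecasts(y_n, mu_hat, phi_hat, horizon=10)
closed = [ar1_closed_form(y_n, mu_hat, phi_hat, k) for k in range(1, 11)]
for k, (r, c) in enumerate(zip(recursive, closed), start=1):
    print(f"k={k:2d}  recursive={r:.6f}  closed form={c:.6f}")
```

With $\hat\phi = 0.5$ the distance to $\hat\mu$ halves at each step, so the forecasts are essentially at $\hat\mu$ after about ten steps.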
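Forecasting general AR(p) processes is similar. For example, for an AR(2) process
$$\hat y_{n+1} := \hat\mu + \hat\phi_1(y_n - \hat\mu) + \hat\phi_2(y_{n-1} - \hat\mu)$$
and
$$\hat y_{n+2} := \hat\mu + \hat\phi_1(\hat y_{n+1} - \hat\mu) + \hat\phi_2(y_n - \hat\mu).$$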
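The same idea extends to any AR(p) model: each forecast uses observed values where they exist and earlier forecasts in place of unobserved future values. A minimal sketch, assuming the fitted mean `mu_hat` and the coefficients `phi_hat` $= [\hat\phi_1, \ldots, \hat\phi_p]$ are already available; the function name and the numbers in the example are hypothetical.

```python
def arp_forecasts(y, mu_hat, phi_hat, horizon):
    """Forecasts of y_{n+1}, ..., y_{n+horizon} from a fitted AR(p) model.

    y       -- observed series y_1, ..., y_n (at least p values)
    mu_hat  -- estimated mean
    phi_hat -- estimated coefficients [phi_1, ..., phi_p]
    """
    p = len(phi_hat)
    if len(y) < p:
        raise ValueError("need at least p observations")
    history = list(y[-p:])  # the p most recent values, oldest first
    forecasts = []
    for _ in range(horizon):
        # history[-1] is paired with phi_1, history[-2] with phi_2, and so on
        yhat = mu_hat + sum(
            phi * (history[-j] - mu_hat) for j, phi in enumerate(phi_hat, start=1)
        )
        forecasts.append(yhat)
        history.append(yhat)  # later steps treat this forecast as data
    return forecasts


# Hypothetical AR(2) example with y_{n-1} = 1.0 and y_n = 3.0
print(arp_forecasts([0.0, 1.0, 3.0], mu_hat=1.0, phi_hat=[0.5, 0.2], horizon=3))
```

With a single coefficient the function reproduces the AR(1) recursion above.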
Forecasting ARMA and ARIMA processes is only slightly more complicated than forecasting AR processes and is discussed in time series courses such as ORIE 563. Moreover, the forecasts can be generated automatically by MINITAB and SAS, so you don't need to know the details in order to forecast.
4.9.1    GE daily returns
We have learned that fitting an ARIMA(1,0,0) model to log returns is equivalent to fitting an ARIMA(1,1,0) model to the log prices. Here we will fit both models to the GE daily price data. Figure 4.8 shows the forecasts of the log returns up to 24 days ahead. The forecasts are given in red and 95% confidence limits on the forecasts are shown in blue. The observed time series is plotted in black.
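Why the two fits are equivalent can be seen from the identity $\log P_{n+k} = \log P_n + r_{n+1} + \cdots + r_{n+k}$: because the forecast of a sum is the sum of the forecasts, the log-price forecast is the last log price plus the cumulative sum of the log-return forecasts. A minimal Python sketch with hypothetical numbers (not the GE estimates), using an AR(1) model for the log returns; the function names are illustrative.

```python
from itertools import accumulate


def ar1_return_forecasts(r_n, mu_hat, phi_hat, horizon):
    """k-step-ahead AR(1) forecasts of the log return, k = 1, ..., horizon."""
    return [mu_hat + phi_hat ** k * (r_n - mu_hat) for k in range(1, horizon + 1)]


def log_price_forecasts(log_price_n, return_forecasts):
    """Log-price forecasts implied by log-return forecasts:
    log P_{n+k} = log P_n + r_{n+1} + ... + r_{n+k}."""
    return [log_price_n + s for s in accumulate(return_forecasts)]


# Hypothetical values for illustration only.
r_hat = ar1_return_forecasts(r_n=0.02, mu_hat=0.0, phi_hat=0.5, horizon=5)
logp_hat = log_price_forecasts(4.0, r_hat)
print(r_hat)     # decays geometrically toward mu_hat = 0
print(logp_hat)  # levels off, since the later return forecasts are nearly 0
```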
[Figure: Time Series Plot for logR (with forecasts and their 95% confidence limits); horizontal axis: Time]
Figure 4.8: Time series plot of the daily GE log returns with forecasts from an AR(1) model.
Next we fit an ARIMA(1,1,0) model to the log prices. Although this model is equivalent to the last model, it generates forecasts of the log prices, not the log returns. (MINITAB always forecasts the input series.) The forecasts are given in Figure 4.9. Notice that the forecasts predict that the price of GE will stay constant, but the confidence limits on the forecasts get wider as we forecast further ahead. This is exactly the type of behavior we would expect from a random walk [ARIMA(0,1,0)] model. The ARIMA(1,1,0) model for the log prices isn't quite a random walk model, but it is similar to a random walk model with zero drift ($\mu = 0$) since $\hat\phi$ is close to 0 and $\hat\mu$ is extremely close to 0.
[Figure: Time Series Plot for logP (with forecasts and their 95% confidence limits); horizontal axis: Time]
Figure 4.9: Time series plot of the daily GE log prices with forecasts from an AR(1) model.
The forecast limits suggest that accurately forecasting future GE stock prices is pretty hopeless. For practical purposes the log prices behave like a random walk so that the prices behave like a geometric random walk.
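The widening limits can be made concrete. If the log prices follow a random walk $\log P_t = \log P_{t-1} + \mu + \epsilon_t$ with the $\epsilon_t$ iid $N(0, \sigma^2)$, then the $k$-step-ahead forecast is $\log P_n + k\mu$ and its approximate 95% limits are $\pm 1.96\,\sigma\sqrt{k}$ around it, so the band grows like the square root of the horizon. A minimal sketch; the values of `sigma` and of the starting log price are hypothetical, and estimation error in $\mu$ and $\sigma$ is ignored.

```python
import math


def random_walk_forecast_limits(log_price_n, drift, sigma, horizon, z=1.96):
    """Forecasts and approximate 95% limits for log P_{n+k}, k = 1, ..., horizon,
    when log prices follow a random walk with iid N(0, sigma^2) steps.
    Estimation error in drift and sigma is ignored."""
    rows = []
    for k in range(1, horizon + 1):
        center = log_price_n + k * drift
        half_width = z * sigma * math.sqrt(k)  # sd of a sum of k steps is sigma*sqrt(k)
        rows.append((k, center, center - half_width, center + half_width))
    return rows


# Hypothetical inputs: zero drift, daily log-return sd of 0.015.
rows = random_walk_forecast_limits(log_price_n=3.9, drift=0.0, sigma=0.015, horizon=24)
for k, center, lower, upper in rows[::6]:
    # exponentiating the log-price limits gives limits for the price itself
    print(f"k={k:2d}  log P: {center:.3f} [{lower:.3f}, {upper:.3f}]  "
          f"P: [{math.exp(lower):.2f}, {math.exp(upper):.2f}]")
```

Quadrupling the horizon doubles the width of the band, which is the slow but steady widening seen in Figure 4.9.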
Chapter 5
Portfolio Selection: 3/12/01
5.1   Trading off expected return and risk
How should we invest our wealth? There are two principles:
•  we want to maximize the expected return
•  we want to minimize the risk = variance of return
These goals are somewhat at odds. Nonetheless, there are optimal compromises between expected return and risk. In this chapter we will see how to maximize expected return subject to an upper bound on the risk, or to minimize the risk subject to a lower bound on the expected return.
The key concept that we will discuss is reduction of risk by diversifying the portfolio of assets held. Diversification was not always considered as favorably as it is now.
The investment philosophy of Keynes
The famous economist, John Maynard Keynes, did not believe in diversifying a portfolio. He wrote:
... the management of stock exchange investment of any kind is a low pursuit... from which it is a good thing for most members of society to be free
I am in favor of having as large a unit as market conditions will allow ... to suppose that safety-first consists in having a small gamble in a large number of different [companies] where I have no information to reach a good judgement, as compared with a substantial stake in a company where one's information is adequate, strikes me as a travesty of investment policy
This quote is taken from Bernstein, Capital Ideas: The Improbable Origins of Modern Wall Street.
Keynes is advocating stock picking or "fundamental analysis." But the semi-strong version of the EMH says that fundamental analysis does not lead to economic profit. Of course, Keynes lived well before the EMH and one wonders what Keynes would think about diversification if he were alive now. Modern portfolio theory takes a very different viewpoint than Keynes. This is not to say that Keynes was wrong. Keynes was investing on a long time horizon, and fundamental analysis, if done well, might be very successful in the long run. However, portfolio managers are judged on short-term successes. Also, using fundamental analysis to find bargains is probably more difficult now than in Keynes's time.
5.2   One risky asset and one risk-free asset
We will start with a simple example where we have
•  one risky asset, which could be a portfolio, e.g., a mutual fund
-  expected return is .15
-  standard deviation of the return is .25
•  one risk-free asset, e.g., a 30-day T-bill
-  expected value of the return is .06
-  standard deviation of the return is 0 by definition of "risk-free."
We are faced with the problem of constructing an investment portfolio that we will hold for one time period which could be an hour, a day, a month, a quarter, a year, ten years, etc. At the end of the time period we might want to readjust the portfolio, so for now we are only looking at returns over one time period. Suppose that
•  a fraction w of our wealth is invested in the risky asset
•  the remaining fraction 1 — w is invested in the risk-free asset
•  then the expected return is $E(R) = w(.15) + (1 - w)(.06) = .06 + .09w$.
•  the variance of the return is $\sigma_R^2 = w^2(.25)^2 + (1 - w)^2(0)^2 = w^2(.25)^2$, or $\sigma_R = .25w$. Would $w > 1$ make any sense?
[Figure: $E(R)$ plotted against $w$ for $0 \le w \le 1$]
Figure 5.1: Expected return for a portfolio with allocation $w$ to the risky asset with expected return 0.15 and allocation $1 - w$ to the risk-free asset with return 0.06.
Question: Suppose you want an expected return of .10. What should $w$ be? [answer: 4/9]
Question: Suppose you want $\sigma_R = .05$. What should $w$ be? [answer: 0.2]
More generally, if the expected returns on the risky and risk-free assets are $\mu_1$ and $\mu_f$ and if the standard deviation of the risky asset is $\sigma_1$, then the expected return on the portfolio is $w\mu_1 + (1 - w)\mu_f$ while the standard deviation of the portfolio's return is $w\sigma_1$.
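The two questions above, and the general formulas, can be checked with a few lines of Python. This is a sketch that assumes $w \ge 0$ (so that $\sigma_R = w\sigma_1$); the function names are illustrative.

```python
def one_risky_mean_sd(w, mu_1, sigma_1, mu_f):
    """Expected return and standard deviation of a portfolio with fraction w
    in the risky asset and 1 - w in the risk-free asset (w >= 0 assumed)."""
    return w * mu_1 + (1 - w) * mu_f, w * sigma_1


def w_for_target_mean(target_mean, mu_1, mu_f):
    """Solve w * mu_1 + (1 - w) * mu_f = target_mean for w."""
    return (target_mean - mu_f) / (mu_1 - mu_f)


def w_for_target_sd(target_sd, sigma_1):
    """Solve w * sigma_1 = target_sd for w."""
    return target_sd / sigma_1


mu_1, sigma_1, mu_f = 0.15, 0.25, 0.06
w_a = w_for_target_mean(0.10, mu_1, mu_f)  # expected return of .10
w_b = w_for_target_sd(0.05, sigma_1)       # standard deviation of .05
print(w_a, one_risky_mean_sd(w_a, mu_1, sigma_1, mu_f))
print(w_b, one_risky_mean_sd(w_b, mu_1, sigma_1, mu_f))
```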
This model is simple but not as useless as it might seem at first. Finding an optimal portfolio can be achieved in two steps.
1.  finding the "optimal" portfolio of risky assets, called the "tangency portfolio"
2.  finding the appropriate mix of the risk-free asset and the tangency portfolio from step one
So we now know how to do the second step. What we need to learn is how to mix optimally a number of risky assets; we will do that in the next sections. First, we look at a related example.
5.2.1    Example
In the February 2001 issue of Paine Webber's Investment Intelligence: A Report for Our Clients, the advantages of holding municipal bonds are touted. Paine Webber says, "The chart at the right shows that a 20% municipal/80% S&P 500 mix sacrificed only 0.42% annual after-tax return relative to a 100% S&P 500 portfolio, while reducing risk by 13.6% from 14.91% to 12.88%." The chart is shown here as Figure 5.2. Although Paine Webber's point is correct, the chart is cleverly designed to over-emphasize the reduction in volatility; how?
[Figure: Municipal/S&P 500 balanced portfolios, annualized after-tax returns and portfolio volatility (bars labeled Return and Volatility), plotted against the % of portfolio invested in municipal bonds (balance invested in S&P 500). Source: Nuveen Investments, "Two Is Greater Than One," January 2001. Past performance is no guarantee of future results.]
Figure 5.2: Chart from PaineWebber newsletter showing reduction in volatility by mixing municipal bonds with the S&P 500 index.
5.2.2    Estimating $E(R)$ and $\sigma_R$
The risk-free rate, $\mu_f$, will be known; Treasury bill rates are published in most newspapers.
What should we use as the values of $E(R)$ and $\sigma_R$? If returns on the asset are assumed to be stationary, then we can take a time series of past returns and use the sample mean and standard deviation. Whether the stationarity assumption is realistic or not is always debatable. If we think that $E(R)$ and $\sigma_R$ will be different from their past values, we could subjectively adjust these estimates upward or downward according to our opinions, but we must live with the consequences if our opinions prove to be incorrect.
Another question is how long a time series to use, that is, how far back in time we should gather data. A long series, say 10 or 20 years, will give much less variable estimates. However, if the series is not stationary but rather has slowly drifting parameters, then a shorter series (maybe 1 or 2 years) will be more representative of the future.
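A minimal sketch of the estimation step in Python, comparing a long window with a short one. The return series below is made up for illustration, the function name is hypothetical, and `statistics.stdev` uses the $n - 1$ divisor.

```python
import statistics


def estimate_mean_sd(returns, window=None):
    """Sample mean and standard deviation of past returns.
    If window is given, only the most recent `window` returns are used."""
    data = returns if window is None else returns[-window:]
    return statistics.mean(data), statistics.stdev(data)


# Hypothetical monthly returns, oldest first (illustration only).
returns = [0.02, -0.01, 0.03, 0.01, -0.02, 0.04,
           0.00, 0.01, 0.02, -0.03, 0.05, 0.01]

mean_all, sd_all = estimate_mean_sd(returns)           # whole series
mean_recent, sd_recent = estimate_mean_sd(returns, 6)  # last 6 returns only
print(f"all 12 months: mean={mean_all:.4f}, sd={sd_all:.4f}")
print(f"last 6 months: mean={mean_recent:.4f}, sd={sd_recent:.4f}")
```

The short-window estimates respond faster to drifting parameters but rest on half as many observations, which is the trade-off described above.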
5.3    Two risky assets
The mathematics of mixing risky assets is most easily understood when there are only two risky assets. This is where we will start.
Suppose the two risky assets have returns $R_1$ and $R_2$ and that we mix them in proportions $w$ and $1 - w$, respectively. The return is $R = wR_1 + (1 - w)R_2$. The expected return on the portfolio is $E(R) = w\mu_1 + (1 - w)\mu_2$. Let $\rho_{12}$ be the correlation between the returns on the two risky assets. The variance of the return on the portfolio is
$$\sigma_R^2 = w^2\sigma_1^2 + (1 - w)^2\sigma_2^2 + 2w(1 - w)\rho_{12}\sigma_1\sigma_2.$$
[Note: $\sigma_{R_1,R_2} = \rho_{12}\sigma_1\sigma_2$.]
Example:
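If $\mu_1 = .14$, $\mu_2 = .08$, $\sigma_1 = .2$, $\sigma_2 = .15$, and $\rho_{12} = 0$, then
$$E(R) = .08 + .06w.$$
Also, because $\rho_{12} = 0$,
$$\sigma_R^2 = (.2)^2 w^2 + (.15)^2(1 - w)^2.$$
Using differential calculus, one can easily show that the portfolio with the minimum risk is $w = .045/.125 = .36$: setting the derivative of $\sigma_R^2$ with respect to $w$ to zero gives $2(.2)^2 w - 2(.15)^2(1 - w) = 0$, that is, $.08w = .045(1 - w)$, so $.125w = .045$. For this portfolio $E(R) = .08 + (.06)(.36) = .1016$ and $\sigma_R = \sqrt{(.2)^2(.36)^2 + (.15)^2(.64)^2} = .12$. Here are values of $E(R)$ and $\sigma_R$ for some other values of $w$: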
w       E(R)     $\sigma_R$
0       .080     .150
1/4     .095     .123
1/2     .110     .125
3/4     .125     .155
1       .140     .200
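The example and the table can be reproduced in Python. Setting the derivative of $\sigma_R^2$ to zero in general (not only when $\rho_{12} = 0$) gives the minimum-variance weight $w = (\sigma_2^2 - \rho_{12}\sigma_1\sigma_2)/(\sigma_1^2 + \sigma_2^2 - 2\rho_{12}\sigma_1\sigma_2)$, which reduces to $.0225/.0625 = .36$ here. A sketch with illustrative function names:

```python
import math


def two_asset_mean_sd(w, mu_1, mu_2, sigma_1, sigma_2, rho_12):
    """Mean and standard deviation of R = w R_1 + (1 - w) R_2."""
    mean = w * mu_1 + (1 - w) * mu_2
    var = (w ** 2 * sigma_1 ** 2 + (1 - w) ** 2 * sigma_2 ** 2
           + 2 * w * (1 - w) * rho_12 * sigma_1 * sigma_2)
    return mean, math.sqrt(var)


def min_variance_weight(sigma_1, sigma_2, rho_12):
    """Weight on asset 1 that minimizes the portfolio variance."""
    cov = rho_12 * sigma_1 * sigma_2
    return (sigma_2 ** 2 - cov) / (sigma_1 ** 2 + sigma_2 ** 2 - 2 * cov)


params = dict(mu_1=0.14, mu_2=0.08, sigma_1=0.2, sigma_2=0.15, rho_12=0.0)
w_min = min_variance_weight(params["sigma_1"], params["sigma_2"], params["rho_12"])
print("minimum-risk w:", round(w_min, 4), two_asset_mean_sd(w_min, **params))

for w in (0, 0.25, 0.5, 0.75, 1):
    mean, sd = two_asset_mean_sd(w, **params)
    print(f"{w:4}  {mean:.3f}  {sd:.3f}")
```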
The somewhat parabolic curve in Figure 5.3 is the locus of values of $(\sigma_R, E(R))$ when $0 \le w \le 1$. The points labeled $R_1$ and $R_2$ correspond to $w = 1$ and $w = 0$, respectively. The other features of this figure will be explained in the next section.
5.3.1    Estimating means, standard deviations, and covariances
Estimates of $\mu_1$ and $\sigma_1$ can be obtained from a univariate time series of past returns on the first risky asset; denote this time series by $R_{1,1}, \ldots, R_{1,n}$, where the first subscript indicates the asset and the second subscript is for time. Let $\bar R_1$ and $s_{R_1}$ be the sample mean and standard deviation of this series. Similarly, $\mu_2$ and $\sigma_2$ can be estimated from a time series of past returns on the second risky asset. The covariance $\sigma_{12}$ can be estimated by the sample covariance
$$\hat\sigma_{12} = n^{-1}\sum_{t=1}^{n}(R_{1,t} - \bar R_1)(R_{2,t} - \bar R_2).$$
[Figure: expected return $E(R)$ plotted against risk $\sigma_R$]
Figure 5.3: Expected return versus risk. The parabola is the locus of portfolios combining the two risky assets. The lines are the locus of portfolios of two risky assets and the risk-free asset. F = risk-free asset. T = tangency portfolio. $R_1$ is the first risky asset. $R_2$ is the second risky asset.
The correlation $\rho_{12}$ can be estimated by the sample correlation
$$\hat\rho_{12} = \frac{\hat\sigma_{12}}{s_{R_1} s_{R_2}}.$$
$\hat\rho_{12}$, sometimes denoted by $r_{12}$, is called the cross-correlation coefficient between $R_1$ and $R_2$ at lag 0, since we are correlating the return on the first risky asset with the return on the second during the same time periods. Cross-correlations at other lags can be defined but are not needed here. In fact, we can define a cross-correlation function, which is a function of lag. The cross-correlation function plays an important role in the analysis of multivariate time series.
Sample correlations and covariances can be computed on MINITAB. Go to "Stat," then "Basic statistics," and then "Correlation" or "Covariance."
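Outside MINITAB, the same quantities take only a few lines. A minimal Python sketch; dividing by $n$ (`ddof=0`) is an assumption of this sketch, and many packages divide by $n - 1$ instead. The divisor cancels in the correlation as long as the same one is used for the covariance and both variances. The return series in the example are hypothetical.

```python
import math


def sample_cov(x, y, ddof=0):
    """Sample covariance of two equal-length series.
    ddof=0 divides by n; ddof=1 divides by n - 1."""
    n = len(x)
    if len(y) != n or n <= ddof:
        raise ValueError("need two series of equal length with n > ddof")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - ddof)


def sample_corr(x, y):
    """Lag-0 sample cross-correlation of x and y."""
    return sample_cov(x, y) / math.sqrt(sample_cov(x, x) * sample_cov(y, y))


# Hypothetical returns on two assets over the same five periods.
r1 = [0.03, -0.01, 0.02, 0.04, -0.02]
r2 = [0.01, 0.00, 0.02, 0.01, -0.01]
print(sample_cov(r1, r2), sample_cov(r1, r2, ddof=1), sample_corr(r1, r2))
```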
5.4   Combining two risky assets with a risk-free asset
As mentioned at the end of the last section, each point on the parabola in Figure 5.3 is $(\sigma_R, E(R))$ for some value of $w$ between 0 and 1. If we fix $w$, then we have a fixed portfolio of the two risky assets. Now let us mix that portfolio of risky assets with the risk-free asset. The point F in Figure 5.3 gives $(\sigma_R, E(R))$ for the risk-free asset; of course $\sigma_R = 0$ at F. The possible values of $(\sigma_R, E(R))$ for a portfolio consisting of the fixed portfolio of two risky assets and the risk-free asset form a line connecting the point F with a point on the parabola, e.g., the dashed line. The dotted line connecting F with $R_1$ mixes the risk-free asset with the first risky asset.
Notice that the dotted line lies above the dashed line. This means that for any value of $\sigma_R$, the dotted line gives a higher expected return than the dashed line. The slope of any line is called the "Sharpe ratio" of the line; it is named after William Sharpe, whom we have met before in Section 3.8 and will meet again in Chapter 6. Sharpe's ratio can be thought of as a "reward-to-risk" ratio. It is the ratio of the "excess expected return" to the risk as measured by the standard deviation: a portfolio with return $R$ has Sharpe ratio $\{E(R) - \mu_f\}/\sigma_R$.
Clearly, the bigger the Sharpe ratio the better. Why? The point T on the parabola represents the portfolio with the highest Sharpe ratio. It is the optimal portfolio for the purpose of mixing with the risk-free asset. This portfolio is called the "tangency portfolio" since its line is tangent to the parabola.
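The tangency portfolio can be located numerically by computing the Sharpe ratio $\{E(R) - \mu_f\}/\sigma_R$ for many portfolios on the parabola and keeping the largest. A brute-force sketch in Python, using the two-asset example of Section 5.3 and, as an assumption, a risk-free rate $\mu_f = .06$; a grid search is used only for transparency, since formula (5.1) in Section 5.4.1 gives the answer directly. The function names are illustrative.

```python
import math


def sharpe_ratio(mean, sd, mu_f):
    """Reward-to-risk ratio: excess expected return per unit of standard deviation."""
    return (mean - mu_f) / sd


def tangency_by_grid(mu_1, mu_2, sigma_1, sigma_2, rho_12, mu_f, steps=10_000):
    """Grid search over w in [0, 1] for the two-asset portfolio with the
    largest Sharpe ratio (assumes sigma_R > 0 at every grid point)."""
    best_w, best_sr = None, -math.inf
    for i in range(steps + 1):
        w = i / steps
        mean = w * mu_1 + (1 - w) * mu_2
        sd = math.sqrt(w ** 2 * sigma_1 ** 2 + (1 - w) ** 2 * sigma_2 ** 2
                       + 2 * w * (1 - w) * rho_12 * sigma_1 * sigma_2)
        sr = sharpe_ratio(mean, sd, mu_f)
        if sr > best_sr:
            best_w, best_sr = w, sr
    return best_w, best_sr


w_t, sr_t = tangency_by_grid(0.14, 0.08, 0.2, 0.15, 0.0, mu_f=0.06)
print(f"tangency weight on asset 1 = {w_t:.4f}, Sharpe ratio = {sr_t:.3f}")
```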
Key result: The optimal or "efficient" portfolios mix the tangency portfolio of two risky assets with the risk-free asset. Each efficient portfolio has two properties:
•  it has a higher expected return than any other portfolio with the same (or smaller) risk
•  it has a smaller risk than any other portfolio with the same (or larger) expected return.
Thus we can only improve (reduce) the risk of an efficient portfolio by accepting a worse (smaller) expected return, and we can only improve (increase) the expected return of an efficient portfolio by accepting worse (higher) risk.
Note that all efficient portfolios use the same mix of the two risky assets, namely the tangency portfolio. Only the proportion allocated to the tangency portfolio and the proportion allocated to the risk-free asset vary.
5.4.1    Tangency portfolio with two risky assets
Given the importance of the tangency portfolio, you may be wondering "how do we find it?"
Again let $\mu_1$, $\mu_2$, and $\mu_f$ be the expected returns on the two risky assets and the return on the risk-free asset. Let $\sigma_1$ and $\sigma_2$ be the standard deviations of the returns on the two risky assets and let $\rho_{12}$ be the correlation between the returns on the risky assets.
Define $V_1 = \mu_1 - \mu_f$ and $V_2 = \mu_2 - \mu_f$; $V_1$ and $V_2$ are called the "excess returns." Then the tangency portfolio uses weight
$$w_T = \frac{V_1\sigma_2^2 - V_2\rho_{12}\sigma_1\sigma_2}{V_1\sigma_2^2 + V_2\sigma_1^2 - (V_1 + V_2)\rho_{12}\sigma_1\sigma_2} \qquad (5.1)$$
This formula will be derived in Section 5.6.5.
The tangency portfolio allocates a fraction $w_T$ of the investment to the first risky asset and $1 - w_T$ to the second risky asset.
Let $R_T$, $E(R_T)$, and $\sigma_T$ be the return, expected return, and standard deviation of the return on the tangency portfolio.
Example: Suppose as before that $\mu_1 = .14$, $\mu_2 = .08$, $\sigma_1 = .2$, $\sigma_2 = .15$, and $\rho_{12} = 0$. Suppose as well that $\mu_f = .06$. Then $V_1 = .14 - .06 = .08$ and $V_2 = .08 - .06 = .02$. Using (5.1) we get $w_T = .693$. Therefore,
$$E(R_T) = (.693)(.14) + (.307)(.08) = .122,$$
and
$$\sigma_T = \sqrt{(.693)^2(.2)^2 + (.307)^2(.15)^2} = .146.$$
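Formula (5.1) and the example can be checked in Python. A sketch with an illustrative function name; for these inputs the formula gives $w_T = .0018/.0026 = 9/13 \approx .692$, and the resulting $E(R_T) \approx .122$ and $\sigma_T \approx .146$ agree with the values above.

```python
import math


def tangency_portfolio(mu_1, mu_2, sigma_1, sigma_2, rho_12, mu_f):
    """Weight w_T on asset 1 from formula (5.1), plus E(R_T) and sigma_T."""
    v1, v2 = mu_1 - mu_f, mu_2 - mu_f  # excess expected returns V_1, V_2
    cov = rho_12 * sigma_1 * sigma_2
    w_t = (v1 * sigma_2 ** 2 - v2 * cov) / (
        v1 * sigma_2 ** 2 + v2 * sigma_1 ** 2 - (v1 + v2) * cov)
    mean_t = w_t * mu_1 + (1 - w_t) * mu_2
    sd_t = math.sqrt(w_t ** 2 * sigma_1 ** 2 + (1 - w_t) ** 2 * sigma_2 ** 2
                     + 2 * w_t * (1 - w_t) * cov)
    return w_t, mean_t, sd_t


w_t, mean_t, sd_t = tangency_portfolio(0.14, 0.08, 0.2, 0.15, 0.0, 0.06)
print(f"w_T = {w_t:.4f}, E(R_T) = {mean_t:.3f}, sigma_T = {sd_t:.3f}")
```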
Let $R$ be the return on the portfolio that allocates a fraction $\omega$ of the investment to the tangency portfolio and $1 - \omega$ to the risk-free asset.
Then $R = \omega R_T + (1 - \omega)\mu_f = \mu_f + \omega(R_T - \mu_f)$, so that
$$E(R) = \mu_f + \omega\{E(R_T) - \mu_f\} \quad\text{and}\quad \sigma_R = \omega\sigma_T.$$
Continuation of previous example: What is the optimal investment with $\sigma_R = .05$?
answer: The maximum expected return with $\sigma_R = .05$ mixes the tangency portfolio and the risk-free asset such that $\sigma_R = .05$. Since $\sigma_T = .146$, we have that $.05 = \sigma_R = \omega\sigma_T = .146\omega$, so that $\omega = .05/.146 = .343$ and $1 - \omega = .657$.
So 65.7% of the portfolio should be in the risk-free asset and 34.3% should be in the tangency portfolio. Thus (.343)(69.3%) = 23.7% should be in the first risky asset and (.343)(30.7%) = 10.5% should be in the second risky asset.
In summary
Asset        Allocation
risk-free       65.7%
risky 1         23.7%
risky 2         10.5%
Total           99.9%
The total is not quite 100% because of rounding errors.
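The allocation for a target standard deviation is easy to automate. A sketch that takes $w_T = .693$ and $\sigma_T = .146$ from the example as inputs (the function name is illustrative); because it does not round $\omega$ to three digits, the risk-free share comes out as about 65.8% rather than 65.7%, and the three shares sum to 100%.

```python
def allocation_for_target_sd(target_sd, w_t, sd_t):
    """Split wealth among the risk-free asset and the two risky assets so that
    the portfolio standard deviation equals target_sd (assumes target_sd <= sd_t)."""
    omega = target_sd / sd_t  # fraction held in the tangency portfolio
    return {
        "risk-free": 1 - omega,
        "risky 1": omega * w_t,
        "risky 2": omega * (1 - w_t),
    }


alloc = allocation_for_target_sd(0.05, w_t=0.693, sd_t=0.146)
for asset, share in alloc.items():
    print(f"{asset:<10} {100 * share:5.1f}%")
print(f"{'total':<10} {100 * sum(alloc.values()):5.1f}%")
```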
Now suppose that you want a 10% expected return. Compare
•  the best portfolio of only risky assets
•  the best portfolio of the risky assets and the risk-free asset
Answer:
•  (best portfolio of risky assets)
-  $.1 = w(.14) + (1 - w)(.08)$ implies that $w = 1/3$.
-  This is the only portfolio of risky assets with $E(R) = .1$, so by default it is best.
-  Then
$$\sigma_R = \sqrt{w^2(.2)^2 + (1 - w)^2(.15)^2} = \sqrt{(1/9)(.2)^2 + (4/9)(.15)^2} = .120.$$
•  (best portfolio of the two risky assets and the risk-free asset)
-  $.1 = E(R) = .06 + .062\omega = .06 + .425\sigma_R$, since $E(R_T) - \mu_f = .122 - .06 = .062$ and $\sigma_R = \omega\sigma_T$, or $\omega = \sigma_R/\sigma_T = \sigma_R/.146$.
-  This implies that $\sigma_R = .04/.425 = .094$ and $\omega = .04/.062 = .645$.
So combining the risk-free asset with the two risky assets reduces $\sigma_R$ from .120 to .094 while maintaining $E(R)$ at .1. The reduction in risk is (.120 - .094)/.120 = 22%; equivalently, the portfolio of risky assets alone has (.120 - .094)/.094 = 28% more risk.
More on the example: What is the best we can do combining the risk-free asset with only one risky asset? Assume that we still want to have $E(R) = .1$.
•  Second risky asset with the risk-free asset
-  Since $\mu_f = .06 < .1$ and $\mu_2 = .08 < .1$, no portfolio that holds only the second risky asset and the risk-free asset, in nonnegative proportions, will have an expected return of .1.
•  First risky asset with the risk-free asset
-  $.1 = w(.14) + (1 - w)(.06) = .06 + w(.08)$ implies that $w = .04/.08 = 1/2$.
-  Then $\sigma_R = w(.20) = .10$, which is greater than .094, the smallest risk with two risky assets and the risk-free asset such that $E(R) = .1$.
The minimum values of $\sigma_R$ under various combinations of available assets are given in Table 5.1.
Available Assets        Minimum $\sigma_R$
1st risky, risk-free    0.1
2nd risky, risk-free    - (not attainable)
Both riskies            0.12
All three               0.094
Table 5.1: Minimum value of $\sigma_R$ as a function of the available assets.
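All four entries of Table 5.1 follow from formulas already derived. A sketch in Python, assuming (as in the text) nonnegative weights and taking the rounded tangency values $E(R_T) = .122$ and $\sigma_T = .146$ from the example; the function names are illustrative.

```python
import math

mu_1, mu_2, sigma_1, sigma_2, mu_f = 0.14, 0.08, 0.2, 0.15, 0.06
mean_t, sd_t = 0.122, 0.146  # tangency portfolio, rounded as in the example
target = 0.10                # required expected return


def min_sd_one_risky(target, mu_i, sigma_i, mu_f):
    """One risky asset plus the risk-free asset, with 0 <= w <= 1 on the risky
    asset. Returns None when the target expected return cannot be reached."""
    w = (target - mu_f) / (mu_i - mu_f)
    return w * sigma_i if 0 <= w <= 1 else None


def sd_two_risky(target):
    """Both risky assets with rho_12 = 0: only one mix attains the target."""
    w = (target - mu_2) / (mu_1 - mu_2)
    return math.sqrt(w ** 2 * sigma_1 ** 2 + (1 - w) ** 2 * sigma_2 ** 2)


def sd_all_three(target):
    """Tangency portfolio plus the risk-free asset: sigma_R = omega * sigma_T."""
    omega = (target - mu_f) / (mean_t - mu_f)
    return omega * sd_t


table = {
    "1st risky, risk-free": min_sd_one_risky(target, mu_1, sigma_1, mu_f),
    "2nd risky, risk-free": min_sd_one_risky(target, mu_2, sigma_2, mu_f),
    "Both riskies": sd_two_risky(target),
    "All three": sd_all_three(target),
}
for assets, sd in table.items():
    shown = "-" if sd is None else f"{sd:.3f}"
    print(f"{assets:<22} {shown}")
```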
5.4.2    Effect of $\rho_{12}$
Positive correlation between the two risky assets is bad. With positive correlation, the two assets tend to move together, which increases the volatility of the portfolio. Conversely, negative correlation is good. If the assets are negatively correlated, a negative return of one tends to occur with a positive return of the other, so the volatility of the portfolio decreases. Figure 5.4 shows the efficient frontier and tangency portfolio when $\mu_1 = .14$, $\mu_2 = .09$, $\sigma_1 = .2$, $\sigma_2 = .15$, and $\mu_f = .03$. The value of $\rho_{12}$ is varied from .7 to -.7. Notice that the standard deviation of the portfolio returns decreases as $\rho_{12}$ decreases.
[Figure: four panels plotting $E(R)$ against $\sigma_R$, labeled $\rho$ = 0.7, 0.3, 0, and -0.7]
Figure 5.4: Efficient frontier and tangency portfolio when $\mu_1 = .14$, $\mu_2 = .09$, $\sigma_1 = .2$, $\sigma_2 = .15$, and $\mu_f = .03$. The value of $\rho_{12}$ is varied from .7 to -.7.
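The effect of $\rho_{12}$ on the tangency portfolio can be explored by evaluating formula (5.1) with the parameters of Figure 5.4 for several correlations between .7 and -.7. A sketch in Python with an illustrative function name; for these parameters $\sigma_T$ falls and the tangency portfolio's Sharpe ratio rises as $\rho_{12}$ decreases.

```python
import math


def tangency_summary(mu_1, mu_2, sigma_1, sigma_2, rho_12, mu_f):
    """w_T from formula (5.1), with E(R_T), sigma_T and the Sharpe ratio."""
    v1, v2 = mu_1 - mu_f, mu_2 - mu_f
    cov = rho_12 * sigma_1 * sigma_2
    w = (v1 * sigma_2 ** 2 - v2 * cov) / (
        v1 * sigma_2 ** 2 + v2 * sigma_1 ** 2 - (v1 + v2) * cov)
    mean = w * mu_1 + (1 - w) * mu_2
    sd = math.sqrt(w ** 2 * sigma_1 ** 2 + (1 - w) ** 2 * sigma_2 ** 2
                   + 2 * w * (1 - w) * cov)
    return w, mean, sd, (mean - mu_f) / sd


results = {}
for rho in (0.7, 0.3, 0.0, -0.3, -0.7):
    results[rho] = tangency_summary(0.14, 0.09, 0.2, 0.15, rho, mu_f=0.03)
    w, mean, sd, sr = results[rho]
    print(f"rho={rho:5.2f}  w_T={w:.3f}  E(R_T)={mean:.3f}  sigma_T={sd:.3f}  Sharpe={sr:.3f}")
```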
5.5    Harry Markowitz
Chapter Two of Capital Ideas: The Improbable Origins of Modern Wall Street by Peter Bernstein is titled "Fourteen Pages to Fame." The title refers to the paper "Portfolio Selection" by Harry Markowitz that was published in the Journal of Finance in 1952. This article is indeed only fourteen pages long, though it was later expanded into the book Portfolio Selection: Efficient Diversification of Investments that was published by Markowitz in 1959.
Markowitz was not primarily interested in the stock market or investing. Rather, he was drawn to the more general issue of how people make trade-offs. Investors are faced with a trade-off between risk and expected return. The maxim "nothing ventured, nothing gained" isn't quite true, but risk-free rates of return can be smaller than many investors find acceptable. Markowitz's solution to the problem of risk also can be expressed as a maxim, "don't put all your eggs in one basket." (Keynes would have agreed with Mark Twain, who said, "put all your eggs in one basket, and then watch that basket!")
Markowitz was born in 1927 and grew up in Chicago. His high school grades were not impressive, but he was intellectually curious and read a great deal on his own. At fourteen, he read Darwin's Origin of Species, and later his hero was the philosopher David Hume. The knowledge he acquired on his own got him into the University of Chicago and even exempted him from the required science courses there. This self-study may have been ideal preparation for the highly original work that came later.
After graduation, Markowitz became a research associate at the Cowles Commission and a graduate student at his alma mater. While waiting outside his advisor's office one day, he began a conversation with a stockbroker who suggested that he write his thesis on the stock market. Markowitz was somewhat surprised when his advisor later was enthusiastic about this idea.
Markowitz started to read what he could about investing. In the 1937 book The Theory of Investment Value by John Burr Williams, he found Williams's prescription for selecting stocks: one estimated the "intrinsic value" of a stock by forecasting all future dividends and calculating the "present value" of all future dividends, that is, the discounted sum of all future dividends. Williams then recommended that one put all one's capital in the stock with the highest intrinsic value.
Markowitz had enough knowledge of the world to realize that this is not how investors actually operated. He had the key insight that humans are risk-averse, and he began to explore the relationship between diversification and risk.
Interestingly, Markowitz did not recommend that expected returns be estimated from past data but rather from Williams's dividend discount model.
5.6   Risk-efficient portfolios with N risky assets
5.6.1    Efficient-set mathematics
Efficient-set mathematics generalizes our previous analysis with two risky assets to the more realistic case of many risky assets. This material is taken from Section 5.2 of Campbell, Lo, and MacKinlay.
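Assume that we have $N$ risky assets and that the return on the $i$th risky asset is $R_i$. Define
$$R = \begin{pmatrix} R_1 \\ \vdots \\ R_N \end{pmatrix}$$
to be the random vector of returns. Then
$$E(R) = \mu = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_N \end{pmatrix}.$$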
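Let $\Omega_{ij}$ be the covariance between $R_i$ and $R_j$. Also, let $\sigma_i = \sqrt{\Omega_{ii}}$ be the standard deviation of $R_i$. Define $\rho_{ij} = \Omega_{ij}/(\sigma_i\sigma_j)$ as the correlation between $R_i$ and $R_j$. Finally, let $\Omega$ be the covariance matrix of $R$, i.e.,
$$\Omega = \mathrm{cov}(R),$$
so that the $i,j$th element of $\Omega$ is $\Omega_{ij}$. Let
$$\omega = \begin{pmatrix} \omega_1 \\ \vdots \\ \omega_N \end{pmatrix}$$
be a vector of portfolio weights and let
$$\mathbf{1} = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}$$
be a column of $N$ ones. We assume that $\omega_1 + \cdots + \omega_N = \mathbf{1}^T\omega = 1$. The expected return on a portfolio with weights $\omega$ is $\sum_{i=1}^N \omega_i\mu_i = \omega^T\mu$.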
When $N = 2$, $\omega_2 = 1 - \omega_1$. Suppose there is a target value, $\mu_P$, of the expected return on the portfolio. We assume that
$$\min_{i=1,\ldots,N} \mu_i \le \mu_P \le \max_{i=1,\ldots,N} \mu_i,$$
since, without short selling, no portfolio can have an expected return higher than that of the individual asset with the highest expected return or smaller than that of the individual asset with the lowest expected return. When $N = 2$ (and $\mu_1 \ne \mu_2$), the target $\mu_P$ is achieved by only one portfolio, and its $\omega_1$ value solves
$$\mu_P = \omega_1\mu_1 + \omega_2\mu_2 = \mu_2 + \omega_1(\mu_1 - \mu_2).$$
For $N \ge 3$, there will be an infinite number of portfolios achieving the target $\mu_P$. The one with the smallest variance is called the "efficient" portfolio. Our goal is to find the efficient portfolio.
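For $N = 2$ the target pins down the weights directly. A minimal Python sketch, with a hypothetical function name and, as illustrative inputs, the means .08 and .03 of assets 1 and 2 in the course program "portfolio02.m":

```python
def two_asset_weight(mu1: float, mu2: float, mu_p: float) -> float:
    """Solve mu_P = mu_2 + omega_1 * (mu_1 - mu_2) for omega_1 (needs mu_1 != mu_2)."""
    if mu1 == mu2:
        raise ValueError("mu_1 and mu_2 must differ for a unique solution")
    return (mu_p - mu2) / (mu1 - mu2)


mu1, mu2 = 0.08, 0.03  # illustrative means
for mu_p in (0.05, 0.10):
    w1 = two_asset_weight(mu1, mu2, mu_p)
    print(f"mu_P = {mu_p:.2f}: omega_1 = {w1:.2f}, omega_2 = {1 - w1:.2f}")
```

With $\mu_P = .05$ both weights lie in $[0, 1]$. With $\mu_P = .10$, above both means, $\omega_2$ comes out negative, which is only possible with short selling (Section 5.6.2).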
By equation (2.2), the variance of the return on the portfolio with weights $\omega$ is
$$\sum_{i=1}^N \sum_{j=1}^N \omega_i\omega_j\Omega_{ij} = \omega^T\Omega\omega. \qquad (5.2)$$
Thus, given a target $\mu_P$, the efficient portfolio minimizes (5.2) subject to
$$\omega^T\mu = \mu_P \qquad (5.3)$$
and
$$\omega^T\mathbf{1} = 1. \qquad (5.4)$$
We will denote the weights of the efficient portfolio by $\omega_{\mu_P}$. To find $\omega_{\mu_P}$, form the Lagrangian
$$L = \omega^T\Omega\omega + \delta_1(\mu_P - \omega^T\mu) + \delta_2(1 - \omega^T\mathbf{1}).$$
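Then solve
$$0 = \frac{\partial}{\partial\omega}L = 2\Omega\omega - \delta_1\mu - \delta_2\mathbf{1}. \qquad (5.5)$$
Definition: Here $\frac{\partial}{\partial\omega}L$ means the gradient of $L$ with respect to $\omega$ with the other variables in $L$ held fixed.
Fact: For an $n \times n$ matrix $A$ and an $n$-dimensional vector $x$,
$$\frac{\partial}{\partial x}\, x^TAx = (A + A^T)x.$$
Because $\Omega$ is symmetric, this fact gives $\frac{\partial}{\partial\omega}\,\omega^T\Omega\omega = 2\Omega\omega$, the first term in (5.5).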
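The gradient fact can be checked numerically with central differences, which are exact for quadratics up to rounding. A minimal Python sketch; the matrix $A$ and vector $x$ are arbitrary illustrative values, and $A$ is deliberately not symmetric so that the $A + A^T$ form matters.

```python
# Central-difference check of d/dx (x'Ax) = (A + A')x.


def quad_form(A, x):
    """x' A x for a square matrix A given as a list of rows."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def analytic_grad(A, x):
    """(A + A') x, the gradient stated in the Fact."""
    n = len(x)
    return [sum((A[i][j] + A[j][i]) * x[j] for j in range(n)) for i in range(n)]


def numeric_grad(A, x, step=1e-6):
    """Central finite differences of quad_form in each coordinate."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += step
        down[i] -= step
        grad.append((quad_form(A, up) - quad_form(A, down)) / (2 * step))
    return grad


A = [[2.0, 1.0, 0.0],
     [3.0, 1.0, -1.0],
     [0.5, 0.0, 4.0]]
x = [1.0, -2.0, 0.5]
print(analytic_grad(A, x))
print(numeric_grad(A, x))
```

Both lines should agree up to rounding; by hand, $(A + A^T)x = (-3.75, -0.5, 6.5)^T$ for these values.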
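The solution to (5.5) is
$$\omega_{\mu_P} = \tfrac{1}{2}\Omega^{-1}(\delta_1\mu + \delta_2\mathbf{1}) = \Omega^{-1}(\lambda_1\mu + \lambda_2\mathbf{1}),$$
where $\lambda_1$ and $\lambda_2$ are new Lagrange multipliers: $\lambda_1 = \delta_1/2$ and $\lambda_2 = \delta_2/2$. Thus,
$$\omega_{\mu_P} = \lambda_1\Omega^{-1}\mu + \lambda_2\Omega^{-1}\mathbf{1},$$
where $\lambda_1$ and $\lambda_2$ are yet to be determined scalar quantities. We need to use the constraints to determine $\lambda_1$ and $\lambda_2$. Therefore,
$$\mu_P = \mu^T\omega_{\mu_P} = \lambda_1\mu^T\Omega^{-1}\mu + \lambda_2\mu^T\Omega^{-1}\mathbf{1}$$
and
$$1 = \mathbf{1}^T\omega_{\mu_P} = \lambda_1\mathbf{1}^T\Omega^{-1}\mu + \lambda_2\mathbf{1}^T\Omega^{-1}\mathbf{1}.$$
Define
$$A = \mu^T\Omega^{-1}\mathbf{1} = \mathbf{1}^T\Omega^{-1}\mu, \qquad B = \mu^T\Omega^{-1}\mu, \qquad C = \mathbf{1}^T\Omega^{-1}\mathbf{1}, \qquad D = BC - A^2.$$
Then
$$\mu_P = B\lambda_1 + A\lambda_2, \qquad 1 = A\lambda_1 + C\lambda_2.$$
These are two linear equations in $\lambda_1$ and $\lambda_2$; $A$, $B$, and $C$ are known quantities. The solution is
$$\lambda_1 = \frac{-A + C\mu_P}{D} \quad \text{and} \quad \lambda_2 = \frac{B - A\mu_P}{D}.$$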
It follows after some algebra that
$$\omega_{\mu_P} = g + h\mu_P, \qquad (5.6)$$
where
$$g = \frac{B\,\Omega^{-1}\mathbf{1} - A\,\Omega^{-1}\mu}{D} \qquad (5.7)$$
and
$$h = \frac{C\,\Omega^{-1}\mu - A\,\Omega^{-1}\mathbf{1}}{D}. \qquad (5.8)$$
Notice that $g$ and $h$ are fixed vectors, since they depend on the fixed vector $\mu$ and the fixed matrix $\Omega$. Also, the scalars $A$, $B$, $C$, and $D$ are functions of $\mu$ and $\Omega$, so they are also fixed. The target expected return, $\mu_P$, can be varied over the range
$$\min_i \mu_i \le \mu_P \le \max_i \mu_i.$$
As $\mu_P$ varies over this range, we get a locus $\omega_{\mu_P}$ of efficient portfolios called the "efficient frontier." We can illustrate the efficient frontier by the following algorithm:
1.  Vary $\mu_P$ along a grid. For each value of $\mu_P$ on this grid, compute $\sigma_{\mu_P}$ by:
(a)  computing $\omega_{\mu_P} = g + h\mu_P$
(b)  then computing $\sigma_{\mu_P} = \sqrt{\omega_{\mu_P}^T\Omega\,\omega_{\mu_P}}$
2.  Plot the values $(\mu_P, \sigma_{\mu_P})$.
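As a cross-check of steps 1(a) and 1(b), here is a dependency-free Python sketch. It uses the mean vector and covariance matrix of the course program "portfolio02.m", solves linear systems with a small hand-written Gaussian elimination (an implementation choice, not part of the notes), and prints the frontier instead of plotting it. The helper names are hypothetical.

```python
import math

# Mean vector and covariance matrix from portfolio02.m
MU = [0.08, 0.03, 0.05]
OMEGA = [[0.30, 0.02, 0.01],
         [0.02, 0.15, 0.03],
         [0.01, 0.03, 0.18]]


def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def frontier_gh(mu, omega):
    """Vectors g, h of (5.7)-(5.8) and the scalars A, B, C, D."""
    ones = [1.0] * len(mu)
    inv_one = solve(omega, ones)  # Omega^{-1} 1
    inv_mu = solve(omega, mu)     # Omega^{-1} mu
    A, B, C = dot(ones, inv_mu), dot(mu, inv_mu), dot(ones, inv_one)
    D = B * C - A * A
    g = [(B * a - A * m) / D for a, m in zip(inv_one, inv_mu)]
    h = [(C * m - A * a) / D for a, m in zip(inv_one, inv_mu)]
    return g, h, (A, B, C, D)


def sd(w, omega):
    """Portfolio standard deviation sqrt(w' Omega w), as in (5.2)."""
    return math.sqrt(dot(w, [dot(row, w) for row in omega]))


g, h, (A, B, C, D) = frontier_gh(MU, OMEGA)
mu_lo, mu_hi = min(MU), max(MU)
frontier = []
for k in range(11):                                      # step 1: grid of mu_P
    mu_p = mu_lo + k * (mu_hi - mu_lo) / 10
    w = [gk + hk * mu_p for gk, hk in zip(g, h)]         # step 1(a)
    frontier.append((mu_p, sd(w, OMEGA), w))             # step 1(b)

for mu_p, sigma_p, _ in frontier:                        # step 2, printed
    print(f"mu_P = {mu_p:.3f}   sigma_P = {sigma_p:.4f}")
```

A useful consistency check: multiplying (5.5) through by $\omega_{\mu_P}^T$ gives $\sigma_{\mu_P}^2 = \lambda_1\mu_P + \lambda_2 = (C\mu_P^2 - 2A\mu_P + B)/D$, so the frontier is a parabola in $(\sigma^2, \mu_P)$.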
This algorithm is implemented in the MATLAB program "portfolio02.m" on the course's web site and listed below:
% portfolio02.m: efficient frontier for N = 3 risky assets
% Input mean vector and covariance matrix of returns here
bmu = [.08 ; .03 ; .05] ;
bOmega = [ .3 .02 .01 ;
.02 .15 .03 ;
.01 .03 .18 ] ;
bone = ones(length(bmu),1) ;
short = 1 ;  % short = 1 implies extensive short selling
% short = 0 reduces the short selling but does not eliminate it
ngrid = 200 ;
if short == 1 ;
muP = linspace(-.02,.2,ngrid) ;
w = linspace(-5,7,ngrid) ;
else ;
muP = linspace(min(bmu),max(bmu),ngrid) ;
w = linspace(0,1,ngrid) ;
end ;
sigmaP = zeros(1,ngrid) ; omegaP = zeros(3,ngrid) ;
% frontiers for the two-asset pairs (1,2), (1,3), and (2,3)
mu12 = zeros(1,ngrid) ;
sigma12 = mu12 ;
mu13 = zeros(1,ngrid) ;
sigma13 = mu12 ;
mu23 = zeros(1,ngrid) ;
sigma23 = mu12 ;
% scalars A, B, C, D and vectors g, h of (5.7) and (5.8)
ibOmega = inv(bOmega) ;
A = bone'*ibOmega*bmu ;
B = bmu'*ibOmega*bmu ;
C = bone'*ibOmega*bone ;
D = B*C - A^2 ;
bg = (B*ibOmega*bone - A*ibOmega*bmu)/D ;
bh = (C*ibOmega*bmu - A*ibOmega*bone)/D ;
for i = 1:ngrid ;
omegaP(:,i) = bg + muP(i)*bh ;   % efficient weights, (5.6)
sigmaP(i) = sqrt(omegaP(:,i)'*bOmega*omegaP(:,i)) ;
mu12(i) = w(i)*bmu(1) + (1-w(i))*bmu(2) ;
sigma12(i) = sqrt(w(i)^2*bOmega(1,1) + 2*w(i)*(1-w(i))*bOmega(1,2) ...
+ (1-w(i))^2*bOmega(2,2)) ;
mu13(i) = w(i)*bmu(1) + (1-w(i))*bmu(3) ;
sigma13(i) = sqrt(w(i)^2*bOmega(1,1) + 2*w(i)*(1-w(i))*bOmega(1,3) ...
+ (1-w(i))^2*bOmega(3,3)) ;
mu23(i) = w(i)*bmu(2) + (1-w(i))*bmu(3) ;
sigma23(i) = sqrt(w(i)^2*bOmega(2,2) + 2*w(i)*(1-w(i))*bOmega(2,3) ...
+ (1-w(i))^2*bOmega(3,3)) ;
end ;
fsize = 16 ;
figure(1)
p = plot(sigmaP,muP,sigma12,mu12,'--',sigma13,mu13,'-.',sigma23,mu23,':') ;
set(p,'linewidth',6) ;
xlabel('standard deviation of return (\sigma_P)','fontsize',fsize) ;
ylabel('expected return (\mu_P)','fontsize',fsize) ;
text(sqrt(bOmega(1,1)),bmu(1),'1','fontsize',24) ;
text(sqrt(bOmega(2,2)),bmu(2),'2','fontsize',24) ;
text(sqrt(bOmega(3,3)),bmu(3),'3','fontsize',24) ;
set(gca,'fontsize',fsize) ;
if short == 0 ;
set(gca,'ylim',[.025,.085]) ;
end ;
if short == 1 ;
set(gca,'ylim',[-.02,.2]) ;
set(gca,'xlim',[.2,2]) ;
end ;
grid ;
% the !mv lines copy the figures to the author's web directory; edit or delete them
if short == 0 ;
print portfolio02.ps -depsc ;
!mv portfolio02.ps ~/public_html/or473/LectNotes/portfolio02.ps
else ;
print portfolio02SH.ps -depsc ;
!mv portfolio02SH.ps ~/public_html/or473/LectNotes/portfolio02SH.ps
end ;
figure(2)
p2 = plot(muP,omegaP(1,:),muP,omegaP(2,:),'--',muP,omegaP(3,:),'-.') ;
set(p2,'linewidth',6) ;
set(gca,'fontsize',fsize) ;
grid ;
xlabel('\mu_P','fontsize',fsize) ;
ylabel('weight','fontsize',fsize) ;
legend('w_1','w_2','w_3','Location','best') ;
if short == 0 ;
print portfolio02_wt.ps -depsc ;
!mv portfolio02_wt.ps ~/public_html/or473/LectNotes/portfolio02_wt.ps
else ;
print portfolio02_wtSH.ps -depsc ;
!mv portfolio02_wtSH.ps ~/public_html/or473/LectNotes/portfolio02_wtSH.ps
end ;
To use this program replace bmu and bOmega in the program by the vector of expected returns and covariance matrix of returns for the assets you wish to analyze. The parameter "short" should be set equal to 0 or 1. If "short" is 1, then there is extensive short selling, i.e., weights get quite negative. If "short" is 0, then the amount of short selling is small.
Figure 5.5 was produced by this program with "short" equal to 0.
[Plot: expected return versus standard deviation of return ($\sigma_P$).]
Figure 5.5: Efficient frontier (solid) plotted for N = 3 assets by the program "portfolio02.m" with the parameter "short" equal to 0. "1," "2," and "3" are the three single assets. The efficient frontiers for just two assets are dashed (1 and 2), dashed-and-dotted (1 and 3), and dotted (2 and 3).
The portfolio weights as functions of $\mu_P$ are plotted in Figure 5.6. The weights can be negative. Negative weights can be obtained by the technique of selling short, which is described in Section 5.6.2.
Figure 5.6: Weights for assets 1, 2, and 3 as functions of $\mu_P$. Note that the weights for assets 1 and 2 can be negative, so that short selling would be required.
If one wants to avoid short selling, then one must impose the additional constraints that $\omega_i \ge 0$ for $i = 1, \ldots, N$. Minimization of portfolio risk subject to $\omega^T\mu = \mu_P$, $\omega^T\mathbf{1} = 1$, and these additional nonnegativity constraints is a quadratic programming problem. (This minimization problem cannot be solved by the method of Lagrange multipliers because of the inequality constraints.) Quadratic programming algorithms are not hard to find. For example, the program "quadprog" in MATLAB's Optimization Toolbox does quadratic programming.
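The notes use MATLAB's quadprog for this. Purely as a dependency-free illustration, here is a different and much cruder method in Python that works only for small $N$; it is not what quadprog does. The idea: the optimal long-only portfolio, restricted to the assets it actually holds, must be the ordinary efficient portfolio (5.6)-(5.8) for those assets (otherwise a small move toward that portfolio would lower the variance while staying feasible). So one can try every subset of assets, solve the equality-constrained problem on it, discard solutions with negative weights, and keep the lowest-variance survivor. The function names are hypothetical, the cost grows like $2^N$, and the sketch assumes the expected returns are distinct.

```python
import math
from itertools import combinations

# Inputs from portfolio02QP.m
MU = [0.08, 0.03, 0.05]
OMEGA = [[0.30, 0.02, 0.01],
         [0.02, 0.15, 0.03],
         [0.01, 0.03, 0.18]]


def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def variance(w, omega):
    return dot(w, [dot(row, w) for row in omega])


def efficient_weights(mu, omega, mu_p):
    """Unconstrained efficient weights lambda_1 Omega^{-1} mu + lambda_2 Omega^{-1} 1."""
    ones = [1.0] * len(mu)
    inv_one, inv_mu = solve(omega, ones), solve(omega, mu)
    A, B, C = dot(ones, inv_mu), dot(mu, inv_mu), dot(ones, inv_one)
    D = B * C - A * A
    lam1, lam2 = (C * mu_p - A) / D, (B - A * mu_p) / D
    return [lam1 * m + lam2 * a for a, m in zip(inv_one, inv_mu)]


def long_only_weights(mu, omega, mu_p, tol=1e-12):
    """Minimize w'Omega w s.t. w'mu = mu_P, w'1 = 1, w >= 0, by trying every support."""
    n = len(mu)
    best, best_var = None, math.inf
    for k in range(1, n + 1):
        for support in combinations(range(n), k):
            sub_mu = [mu[i] for i in support]
            if k == 1:
                if abs(sub_mu[0] - mu_p) > tol:
                    continue          # one asset hits mu_P only if mu_i = mu_P
                sub_w = [1.0]
            else:
                sub_omega = [[omega[i][j] for j in support] for i in support]
                sub_w = efficient_weights(sub_mu, sub_omega, mu_p)
            if min(sub_w) < -tol:
                continue              # violates nonnegativity
            w = [0.0] * n
            for i, wi in zip(support, sub_w):
                w[i] = wi
            if variance(w, omega) < best_var:
                best, best_var = w, variance(w, omega)
    return best                       # None if mu_P is infeasible


for mu_p in (0.04, 0.06):
    w = long_only_weights(MU, OMEGA, mu_p)
    print(mu_p, [round(x, 4) for x in w], round(math.sqrt(variance(w, OMEGA)), 4))
```

For targets outside $[\min_i\mu_i, \max_i\mu_i]$ the function returns None, since no long-only portfolio reaches them.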
Figures 5.7 and 5.8 were produced by the program "portfolio02QP.m", which uses "quadprog" in MATLAB. Quadratic programming in MATLAB and "portfolio02QP.m" are discussed in Section 5.9.
[Plot: expected return versus standard deviation of return ($\sigma_P$); legend: "no negative wts" (solid), "unconstrained wts" (dashed).]
Figure 5.7: Efficient frontier plotted by the program "portfolio02QP.m" for N = 3 assets. "1," "2," and "3" are the three single assets. The efficient frontiers are found with and without the constraint of no negative weights. The constrained efficient frontier is computed using MATLAB's quadratic programming algorithm.
Figure 5.8: Weights for assets 1, 2, and 3 as functions of $\mu_P$. The weights for all three assets are constrained to be nonnegative.
The portfolio weights with the nonnegativity constraint are plotted as functions of $\mu_P$ in Figure 5.8. That figure was made using the program "portfolio02QP.m" listed here:
% portfolio02QP.m: efficient frontier with and without nonnegative weights
% Input mean vector and covariance matrix of returns here
bmu = [.08 ; .03 ; .05] ;
bOmega = [ .3 .02 .01 ;
.02 .15 .03 ;
.01 .03 .18 ] ;
A = [ones(1,3) ; bmu'] ;   % equality constraints: weights sum to 1, mean equals muP
ngrid = 50 ;
muP = linspace(.03,.08,ngrid)' ;
icompute = 1 ;   % set to 0 to reuse sigmaP, omegaP, ... already in the workspace
if icompute == 1 ;
sigmaP = muP ;
sigmaP2 = sigmaP ;
omegaP = zeros(3,ngrid) ;
omegaP2 = omegaP ;
for i = 1:ngrid ;
% nonnegative weights: -eye(3)*w <= 0
omegaP(:,i) = quadprog(bOmega,zeros(3,1),-eye(3),zeros(3,1),A,[1;muP(i)]) ;
% no inequality constraints
omegaP2(:,i) = quadprog(bOmega,zeros(3,1),[],[],A,[1;muP(i)]) ;
sigmaP(i) = sqrt(omegaP(:,i)'*bOmega*omegaP(:,i)) ;
sigmaP2(i) = sqrt(omegaP2(:,i)'*bOmega*omegaP2(:,i)) ;
end ;
end ;
fsize = 16 ;
figure(1)
clf
p = plot(sigmaP,muP,sigmaP2,muP,'--') ;
leg = legend('no negative wts','unconstrained wts','Location','southeast') ;
set(gca,'fontsize',fsize) ;
set(leg,'fontsize',fsize) ;
xlabel('standard deviation of return (\sigma_P)','fontsize',fsize) ;
ylabel('expected return (\mu_P)','fontsize',fsize) ;
text(sqrt(bOmega(1,1)),bmu(1),'1','fontsize',24) ;
text(sqrt(bOmega(2,2)),bmu(2),'2','fontsize',24) ;
text(sqrt(bOmega(3,3)),bmu(3),'3','fontsize',24) ;
set(gca,'ylim',[.025,.085]) ;
set(p,'linewidth',4) ; grid ;
% the !mv lines copy the figures to the author's web directory; edit or delete them
print portfolio02QP.ps -depsc ;
!mv portfolio02QP.ps ~/public_html/or473/LectNotes/portfolio02QP.ps
figure(2)
p2 = plot(muP,omegaP(1,:),muP,omegaP(2,:),'--',muP,omegaP(3,:),'-.') ;
set(p2,'linewidth',6) ;
set(gca,'fontsize',fsize) ;
grid ;
xlabel('\mu_P','fontsize',fsize) ;
ylabel('weight','fontsize',fsize) ;
legend('w_1','w_2','w_3','Location','best') ;
print portfolio02_wtQP.ps -depsc ;
!mv portfolio02_wtQP.ps ~/public_html/or473/LectNotes/portfolio02_wtQP.ps
Now suppose that we have a risk-free asset and we want to mix the risk-free asset with some efficient portfolio. One can see geometrically that there is a tangency portfolio; see Figure 5.9. The optimal portfolio always is a mixture of the risk-free asset with the tangency portfolio. This is a remarkable simplification.
[Diagram: expected return (mu) versus standard deviation (sigma). Labeled: R = the risk-free asset; T = the tangency portfolio on the efficient frontier; P = an arbitrary efficient portfolio of risky assets; an arbitrary portfolio of risky assets inside the frontier; the line of mixtures of P and R; the line through R and T, giving the best portfolios of risky and risk-free assets.]
Figure 5.9: Finding the best portfolios that combine risky and risk-free assets. R is the risk-free asset. T is the tangency portfolio. The optimal portfolios are on the line connecting R and T. The efficient frontier gives the set of optimal portfolios of risky assets.
5.6.2    Selling short
Selling short is a way to profit if a stock price goes down. To sell a stock short, one sells the stock without owning it. The stock must be borrowed from a broker or another customer of the broker. At a later point in time, one buys the stock back and returns it to the lender. This closes the short position.
Suppose a stock is selling at $25/share and you sell 100 shares short. This gives you $2,500. If the price goes down to $17/share, you can buy the 100 shares for $1,700 and close out your short position, for a profit of $800.
Suppose that you have $100 and there are two risky assets. With your money you could buy $150 worth of risky asset 1 and sell $50 short of risky asset 2. The net cost would be exactly $100. If R1 and R2 are the returns on risky assets 1 and 2, then the return on your portfolio would be
$$\tfrac{3}{2}R_1 + \left(-\tfrac{1}{2}\right)R_2.$$
Your portfolio weights are w1 = 3/2 and w2 = -1/2. Thus, you hope that risky asset 1 rises in price and risky asset 2 falls in price.
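The short-selling arithmetic of the last two paragraphs, as a small Python sketch. The function names are hypothetical, and the asset returns used at the end (a 10% gain on asset 1, a 4% loss on asset 2) are made-up values for illustration; transaction costs are ignored, as in the text.

```python
def short_sale_profit(shares: int, sell_price: float, buy_price: float) -> float:
    """Profit from selling short at sell_price and buying back at buy_price."""
    return shares * (sell_price - buy_price)


def portfolio_return(weights, returns):
    """Return of a portfolio whose weights sum to one; negative weights are short positions."""
    return sum(w * r for w, r in zip(weights, returns))


profit = short_sale_profit(100, 25.0, 17.0)   # $2,500 received, $1,700 paid back
weights = [150 / 100, -50 / 100]              # w1 = 3/2, w2 = -1/2
r = portfolio_return(weights, [0.10, -0.04])  # hypothetical asset returns
print(profit, weights, round(r, 4))
```

Here the portfolio gains 1.5(0.10) - 0.5(-0.04) = 0.17: the short position adds to the return because asset 2 fell.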
Here, as elsewhere, we have ignored transaction costs.
Figure 5.10 is the same as Figure 5.5 except that the range of values of $\mu_P$ has been expanded. Values of $\mu_P$ below $\min_i \mu_i$ and above $\max_i \mu_i$ are possible by using short selling. In principle, there is no upper limit to $\mu_P$, but in practice security exchanges place limits on the amount of stock one can sell short because selling short increases risk.
Figure 5.10: Efficient frontier (solid) plotted for N = 3 assets by the program "portfolio02.m" with the parameter "short" equal to 1. "1," "2," and "3" are the three single assets. The efficient frontiers for just two assets are dashed (1 and 2), dashed-and-dotted (1 and 3), and dotted (2 and 3). This figure is the same as Figure 5.5 except that the range of $\mu_P$ has been expanded.
5.6.3    The interior decorator fallacy
It is often thought that a stock portfolio should be tailored to the financial circumstances of a client, as an interior decorator furnishes your home to suit your tastes. For example, widows and orphans should hold conservative "income stocks," or so it is said.
Bernstein, in his book Capital Ideas, calls this the "interior decorator fallacy." Bernstein tells the story of a woman in her forties who came to him in 1961 for investment advice. She was married to a clergyman with a modest income. She had just inherited money which she wanted to invest. Bernstein recommended a portfolio that included stocks with good growth potential but low dividends, e.g., Georgia Pacific, IBM, and Gillette. The client was worried that these were too risky, but she eventually took Bernstein's advice, which turned out to be sound. Bernstein reasoned that even someone with modest means should benefit from the long-term growth potential of the "hot" stocks.
In another case, Bernstein recommended electric utilities, a conservative choice, to a young business executive who wanted a more aggressive portfolio. Again, this recommendation was at odds with conventional wisdom.
A new view, based both on mathematical theory and experience, is that there is a best portfolio (the tangency portfolio) that is the same for everyone. An individual's circumstances determine only the appropriate mix between risk-free assets and the tangency portfolio. The clergyman's wife should invest a higher percentage of her money in risk-free assets than the young business executive. In 1961, Bernstein had the right intuition, but he had not yet heard of the efficient frontier or the tangency portfolio.
5.6.4    Back to the math
Here's the mathematics behind Figure 5.9. We now remove the assumption that $\omega^T\mathbf{1} = 1$. The quantity $1 - \omega^T\mathbf{1}$ is invested in the risk-free asset. (Does it make sense to have $1 - \omega^T\mathbf{1} < 0$?) The expected return is
$$\omega^T\mu + (1 - \omega^T\mathbf{1})\mu_f, \qquad (5.9)$$
where $\mu_f$ is the return on the risk-free asset. The constraint to be satisfied is that (5.9) is equal to $\mu_P$. Thus, the Lagrangian function is
$$L = \omega^T\Omega\omega + \delta\{\mu_P - \omega^T\mu - (1 - \omega^T\mathbf{1})\mu_f\}.$$
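Here $\delta$ is a Lagrange multiplier. Since
$$0 = \frac{\partial}{\partial\omega}L = 2\Omega\omega + \delta(-\mu + \mathbf{1}\mu_f),$$
the optimal weight vector, i.e., the vector of weights that minimizes risk subject to the constraint on the expected return, is
$$\omega_{\mu_P} = \lambda\,\Omega^{-1}(\mu - \mu_f\mathbf{1}), \qquad (5.10)$$
where $\lambda = \delta/2$. To find $\lambda$, we use our constraint:
$$\omega_{\mu_P}^T\mu + (1 - \omega_{\mu_P}^T\mathbf{1})\mu_f = \mu_P. \qquad (5.11)$$
Rearranging (5.11), we get
$$\omega_{\mu_P}^T(\mu - \mu_f\mathbf{1}) = \mu_P - \mu_f. \qquad (5.12)$$
Therefore, substituting (5.10) into (5.12), we have
$$\lambda(\mu - \mu_f\mathbf{1})^T\Omega^{-1}(\mu - \mu_f\mathbf{1}) = \mu_P - \mu_f,$$
or
$$\lambda = \frac{\mu_P - \mu_f}{(\mu - \mu_f\mathbf{1})^T\Omega^{-1}(\mu - \mu_f\mathbf{1})}. \qquad (5.13)$$
Then, substituting (5.13) into (5.10),
$$\omega_{\mu_P} = c_P\,\bar\omega,$$
where
$$c_P = \frac{\mu_P - \mu_f}{(\mu - \mu_f\mathbf{1})^T\Omega^{-1}(\mu - \mu_f\mathbf{1})}$$
and
$$\bar\omega = \Omega^{-1}(\mu - \mu_f\mathbf{1}). \qquad (5.14)$$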
Note that $(\mu - \mu_f\mathbf{1})$ is the vector of "excess returns," that is, the amounts by which the expected returns on the risky assets exceed the risk-free return. The excess returns measure how much the market pays for assuming risk.
$\bar\omega$ is not quite a portfolio because these weights do not necessarily sum to one. The tangency portfolio is a scalar multiple of $\bar\omega$:
$$\omega_T = \frac{\bar\omega}{\mathbf{1}^T\bar\omega}. \qquad (5.15)$$
$c_P$ tells us how much weight to put on $\bar\omega$ and therefore on the tangency portfolio. The amount of weight to put on the risk-free asset is $1 - \omega_{\mu_P}^T\mathbf{1} = 1 - c_P(\bar\omega^T\mathbf{1})$. The weight on the tangency portfolio is $c_P(\bar\omega^T\mathbf{1})$.
Note that $\bar\omega$ and $\omega_T$ do not depend on $\mu_P$.
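Equations (5.14) and (5.15) and the split between the risk-free asset and $\bar\omega$ can be traced numerically. A minimal Python sketch, assuming the inputs of the course program "portfolio03.m" (including $\mu_f = .02$) and the portfolio P = (0, .3, .7) plotted there; the helper names are hypothetical, and a hand-written Gaussian elimination stands in for $\Omega^{-1}$.

```python
import math

# Inputs from portfolio03.m
MU = [0.08, 0.03, 0.05]
OMEGA = [[0.30, 0.02, 0.01],
         [0.02, 0.15, 0.03],
         [0.01, 0.03, 0.18]]
MU_F = 0.02  # risk-free rate


def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sharpe(w):
    """(expected return - mu_f) / standard deviation for risky weights w summing to one."""
    sd = math.sqrt(dot(w, [dot(row, w) for row in OMEGA]))
    return (dot(w, MU) - MU_F) / sd


excess = [m - MU_F for m in MU]              # mu - mu_f 1
omega_bar = solve(OMEGA, excess)             # (5.14)
total = sum(omega_bar)                       # 1' omega_bar
omega_T = [wb / total for wb in omega_bar]   # (5.15)


def optimal_mix(mu_p):
    """Risky weights c_P * omega_bar and the weight left in the risk-free asset."""
    c_p = (mu_p - MU_F) / dot(excess, omega_bar)
    risky = [c_p * wb for wb in omega_bar]
    return risky, 1 - sum(risky)


print("tangency weights:", [round(w, 3) for w in omega_T])
print("Sharpe ratio of T:", round(sharpe(omega_T), 4))
print("Sharpe ratio of P:", round(sharpe([0.0, 0.3, 0.7]), 4))
risky, risk_free = optimal_mix(0.05)
print("mu_P = .05:", [round(w, 3) for w in risky], round(risk_free, 3))
```

The tangency portfolio has the steepest line from F, so its ratio of excess return to standard deviation should exceed that of P; it equals $\sqrt{(\mu - \mu_f\mathbf{1})^T\bar\omega}$ whenever $\mathbf{1}^T\bar\omega > 0$.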
The MATLAB program "portfolio03.m" on the course web site is an extension of "portfolio02.m." portfolio03.m, which is listed below, also plots the tangency portfolio (T) and the line connecting the risk-free asset (F) with the tangency portfolio.
% portfolio03 - extension of portfolio02
% Input mean vector and covariance matrix of returns here
bmu = [.08 ; .03 ; .05] ;
bOmega = [ .3 .02 .01 ;
.02 .15 .03 ;
.01 .03 .18 ] ;
muf = .02 ;   % risk-free rate
bone = ones(length(bmu),1) ;
muP = linspace(min(bmu),max(bmu),50) ; sigmaP = zeros(1,50) ;
ibOmega = inv(bOmega) ;
A = bone'*ibOmega*bmu ;
B = bmu'*ibOmega*bmu ;
C = bone'*ibOmega*bone ;
D = B*C - A^2 ;
bg = (B*ibOmega*bone - A*ibOmega*bmu)/D ;
bh = (C*ibOmega*bmu - A*ibOmega*bone)/D ;
for i = 1:50 ;
omegaP = bg + muP(i)*bh ;
sigmaP(i) = sqrt(omegaP'*bOmega*omegaP) ;
end ;
% tangency portfolio, (5.14) and (5.15)
bomegabar = ibOmega*(bmu - muf*bone) ;
bomegaT = bomegabar/(bone'*bomegabar) ;
sigmaT = sqrt(bomegaT'*bOmega*bomegaT) ;
muT = bmu'*bomegaT ;
fsize = 16 ;
fsize2 = 28 ;
% an arbitrary portfolio P that is not on the efficient frontier
bomegaP2 = [0 ; .3 ; .7] ;
sigmaP2 = sqrt(bomegaP2'*bOmega*bomegaP2) ;
muP2 = bmu'*bomegaP2 ;
clf ;
p1 = plot(sigmaP,muP) ;
l1 = line([0,sigmaT],[muf,muT]) ;   % line from F to T
t1 = text(sigmaP2,muP2,'* P','fontsize',fsize2) ;
t2 = text(sigmaT,muT,'* T','fontsize',fsize2) ;
t3 = text(.01,muf+.006,'F','fontsize',fsize2) ;
t3B = text(0,muf,'*','fontsize',fsize2) ;
set(p1,'linewidth',2) ; set(l1,'linewidth',2) ; set(l1,'linestyle','--') ;
xlabel('standard deviation of return','fontsize',fsize) ; ylabel('expected return','fontsize',fsize) ;
print portfolio03.ps -deps ;
5.6. RISK-EFFICIENT PORTFOLIOS WITH N RISKY ASSETS            133
0.08
0.07
C 1—
CD
i_
T3 CD
~*—»
O CD Q. X CD
0.06
0.05
0.04
0.03
0.02
0           0.05          0.1          0.15          0.2          0.25          0.3          0.35          0.4
standard deviation of return
0.45          0.5
Figure 5.11: Efficient frontier and line of optimal combinations of risky and risk-free assets plotted by the program "portfolio03.m" for N = 3 assets. "P" is the portfolio with weights (0, .3, .7) that is not on the efficient frontier. "T" is the tangency portfolio and "F" is the risk-free asset.
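5.6.5    Example: N = 2
If $N = 2$, then
$$\Omega = \begin{pmatrix} \sigma_1^2 & \rho_{12}\sigma_1\sigma_2 \\ \rho_{12}\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}.$$
You should check that
$$\Omega^{-1} = \frac{1}{1 - \rho_{12}^2}\begin{pmatrix} \sigma_1^{-2} & -\rho_{12}\sigma_1^{-1}\sigma_2^{-1} \\ -\rho_{12}\sigma_1^{-1}\sigma_2^{-1} & \sigma_2^{-2} \end{pmatrix}.$$
Also,
$$\mu - \mu_f\mathbf{1} = \begin{pmatrix} \mu_1 - \mu_f \\ \mu_2 - \mu_f \end{pmatrix}.$$
Therefore,
$$\bar\omega = \Omega^{-1}(\mu - \mu_f\mathbf{1}) = \frac{1}{(1 - \rho_{12}^2)\sigma_1^2\sigma_2^2}\begin{pmatrix} \sigma_2^2(\mu_1 - \mu_f) - \rho_{12}\sigma_1\sigma_2(\mu_2 - \mu_f) \\ \sigma_1^2(\mu_2 - \mu_f) - \rho_{12}\sigma_1\sigma_2(\mu_1 - \mu_f) \end{pmatrix}.$$
Next, let $V_1 = \mu_1 - \mu_f$ and $V_2 = \mu_2 - \mu_f$. Then,
$$\mathbf{1}^T\bar\omega = \frac{V_1\sigma_2^2 + V_2\sigma_1^2 - (V_1 + V_2)\rho_{12}\sigma_1\sigma_2}{(1 - \rho_{12}^2)\sigma_1^2\sigma_2^2}.$$
It follows that
$$\omega_T = \frac{\bar\omega}{\mathbf{1}^T\bar\omega} = \frac{1}{V_1\sigma_2^2 + V_2\sigma_1^2 - (V_1 + V_2)\rho_{12}\sigma_1\sigma_2}\begin{pmatrix} V_1\sigma_2^2 - V_2\rho_{12}\sigma_1\sigma_2 \\ V_2\sigma_1^2 - V_1\rho_{12}\sigma_1\sigma_2 \end{pmatrix}.$$
Compare the first element of this vector with (5.1), the formula that gives the weight of the first of two risky assets in the tangency portfolio.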
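A quick numerical check that the closed-form $\omega_T$ for $N = 2$ agrees with normalizing $\Omega^{-1}(\mu - \mu_f\mathbf{1})$ computed from the explicit $2 \times 2$ inverse. This Python sketch assumes the parameters of Figure 5.4 and its four correlations; the function names are hypothetical.

```python
# Parameters of Figure 5.4 (rho is varied separately)
PARAMS = dict(mu1=0.14, mu2=0.09, s1=0.20, s2=0.15, mu_f=0.03)


def tangency_closed_form(mu1, mu2, s1, s2, rho, mu_f):
    """omega_T from the last display, with V_i = mu_i - mu_f."""
    v1, v2 = mu1 - mu_f, mu2 - mu_f
    cov = rho * s1 * s2
    den = v1 * s2**2 + v2 * s1**2 - (v1 + v2) * cov
    return [(v1 * s2**2 - v2 * cov) / den, (v2 * s1**2 - v1 * cov) / den]


def tangency_via_inverse(mu1, mu2, s1, s2, rho, mu_f):
    """Normalize omega_bar = Omega^{-1}(mu - mu_f 1) using the explicit 2x2 inverse."""
    cov = rho * s1 * s2
    det = s1**2 * s2**2 - cov**2          # equals (1 - rho^2) s1^2 s2^2
    inv = [[s2**2 / det, -cov / det],
           [-cov / det, s1**2 / det]]
    ex = [mu1 - mu_f, mu2 - mu_f]
    bar = [inv[0][0] * ex[0] + inv[0][1] * ex[1],
           inv[1][0] * ex[0] + inv[1][1] * ex[1]]
    total = bar[0] + bar[1]
    return [b / total for b in bar]


for rho in (0.7, 0.3, 0.0, -0.7):
    closed = tangency_closed_form(rho=rho, **PARAMS)
    matrix = tangency_via_inverse(rho=rho, **PARAMS)
    print(f"rho = {rho:+.1f}: w_1 closed form {closed[0]:.4f}, via inverse {matrix[0]:.4f}")
```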
5.7   Is the theory useful?
This theory of portfolio selection could be used if $N$ were small. We would need estimates of $\mu$ and $\Omega$. These would be obtained from recent returns data. Of course, there is no guarantee that future returns will behave like returns in the past, but this is the working assumption.
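One common way to get the estimates is the sample mean vector and the sample covariance matrix of past returns. A minimal Python sketch using the standard-library statistics module; the return data are made-up numbers for illustration, not market data, and the divisor $T - 1$ is one conventional choice.

```python
import statistics


def sample_mean_cov(returns):
    """Sample means and sample covariance matrix (divisor T - 1).

    returns: a list of T periods, each a list of N asset returns.
    """
    T, N = len(returns), len(returns[0])
    mu = [statistics.mean(r[i] for r in returns) for i in range(N)]
    cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in returns) / (T - 1)
            for j in range(N)]
           for i in range(N)]
    return mu, cov


# Hypothetical monthly returns for three assets (made-up numbers)
R = [[0.02, 0.01, -0.01],
     [-0.01, 0.00, 0.02],
     [0.03, 0.02, 0.01],
     [0.00, -0.01, 0.00]]
mu_hat, omega_hat = sample_mean_cov(R)
print("mu_hat =", [round(m, 4) for m in mu_hat])
for row in omega_hat:
    print([round(v, 6) for v in row])
```

The estimates mu_hat and omega_hat would then replace bmu and bOmega in the MATLAB programs of this chapter.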
The next section gives an example of using portfolio theory to allocate capital among the various international markets.
However, suppose that we were considering selecting a portfolio from all 500 stocks in the S&P 500 index. Or, even worse, consider all 3000 stocks in the Russell 3000 index. Ugh! There would be $(3000)(2999)/2 \approx 4.5$ million covariances to calculate. Moreover, $\Omega$ would be 3000 by 3000, and its inverse is required. However, the most serious difficulty would not be the computations. It would be data collection.
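The count of inputs grows quadratically with the number of assets. A tiny Python sketch of the arithmetic; the function name is hypothetical.

```python
def moment_count(n: int) -> dict:
    """Distinct means, variances, and covariances needed for n assets."""
    return {"means": n, "variances": n, "covariances": n * (n - 1) // 2}


for n in (3, 500, 3000):
    print(n, moment_count(n))
```

For 3000 assets this gives 4,498,500 covariances, the "4.5 million" above; even the 500 stocks of the S&P 500 need 124,750.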
Portfolio theory was an important theoretical development; Markowitz was awarded the Nobel Prize in economics for this work. However, a practical version of this theory awaited the work of Sharpe and Lintner. Sharpe, who was Markowitz's PhD student, shared the Nobel Prize with Markowitz.
Sharpe's CAPM asserts that the tangency portfolio is also the market portfolio. This is a tremendous simplification.
5.8   Example—Global Asset Allocation
This example is taken from Efficient Asset Management by Richard O. Michaud. The problem is to allocate capital to eight major classes of assets: U.S. stocks, U.S. government and corporate bonds, Euros, and the Canadian, French, German, Japanese, and U.K. equity markets. The historic data used to estimate expected returns, variances, and covariances consisted of 216 months (Jan 1978 to Dec 1995) of index total returns in U.S. dollars for all eight asset classes and for U.S. 30-day T-bills.
The efficient frontier, with all weights constrained to be non-negative, was found by quadratic programming and is shown in Figure 5.12. There are three reference portfolios. Michaud states that
The index portfolio is roughly consistent with a capitalization weighted portfolio relative to a world equity benchmark for the six equity markets. The current portfolio represent a typical U.S.-based investor's global portfolio asset allocation. ... An equal weighted portfolio is useful as a reference point.
[Plot: expected return versus annualized return standard deviation, with labeled points for France, Japan, UK, Germany, Canada, US Bonds, and Euros, and the Index, Current, and Equal Weight reference portfolios.]
Figure 5.12: Efficient frontier for the global asset allocation problem.
5.9    Quadratic programming
Quadratic programming can be used to solve problems such as minimizing
$$\tfrac{1}{2}x^THx + f^Tx$$
subject to
$$Ax \le b,$$
and
$$A_{eq}x = b_{eq}.$$
Here, for some $N$, $x$ and $f$ are $N \times 1$ vectors and $H$ is an $N \times N$ matrix. Also, $A$ is $m \times N$ and $b$ is $m \times 1$ for some $m$, while $A_{eq}$ is $n \times N$ and $b_{eq}$ is $n \times 1$ for some $n$.
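We can impose nonnegativity constraints on the weights of a portfolio by solving the minimization problem above with $x = \omega$, $H = \Omega$, $f$ equal to an $N \times 1$ vector of zeros, $A = -I$ (the $N \times N$ identity matrix), $b$ equal to an $N \times 1$ vector of zeros,
$$A_{eq} = \begin{pmatrix} \mathbf{1}^T \\ \mu^T \end{pmatrix}, \quad \text{and} \quad b_{eq} = \begin{pmatrix} 1 \\ \mu_P \end{pmatrix}.$$
The factor $\tfrac{1}{2}$ in the objective does not change the minimizer, so minimizing $\tfrac{1}{2}\omega^T\Omega\omega$ gives the same weights as minimizing $\omega^T\Omega\omega$.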
Here is the documentation for MATLAB's "quadprog" illustrating several ways that this program can be used. In our applications, e.g., in the program "portfolio02QP.m," we call the program "quadprog" with a command of the type "X=QUADPROG(H,f,A,b,Aeq,beq)". This can be seen in the listing of "portfolio02QP.m" which is given later.
QUADPROG Quadratic programming.
X=QUADPROG(H,f,A,b) solves the quadratic programming problem:
    min_x 0.5*x'*H*x + f'*x   subject to:  A*x <= b
X=QUADPROG(H,f,A,b,Aeq,beq) solves the problem above while additionally satisfying the equality constraints Aeq*x = beq.
X=QUADPROG(H,f,A,b,Aeq,beq,LB,UB) defines a set of lower and upper bounds on the design variables, X, so that the solution is in the range LB <= X <= UB. Use empty matrices for LB and UB if no bounds exist. Set LB(i) = -Inf if X(i) is unbounded below; set UB(i) = Inf if X(i) is unbounded above.
X=QUADPROG(H,f,A,b,Aeq,beq,LB,UB,X0) sets the starting point to X0.
X=QUADPROG(H,f,A,b,Aeq,beq,LB,UB,X0,OPTIONS) minimizes with the default optimization parameters replaced by values in the structure OPTIONS, an argument created with the OPTIMSET function. See OPTIMSET for details. Used options are Display, Diagnostics, TolX, TolFun, HessMult, LargeScale, MaxIter, PrecondBandWidth, TypicalX, TolPCG, and MaxPCGIter. Currently, only 'final' and 'off' are valid values for the parameter Display ('iter' is not available).
X=QUADPROG(Hinfo,f,A,b,Aeq,beq,LB,UB,X0,OPTIONS,P1,P2,...) passes the problem-dependent parameters P1,P2,... directly to the HMFUN function when OPTIMSET('HessMult',HMFUN) is set. HMFUN is provided by the user. Pass empty matrices for A, b, Aeq, beq, LB, UB, X0, OPTIONS, to use the default values.
[X,FVAL]=QUADPROG(H,f,A,b) returns the value of the objective function at X: FVAL = 0.5*X'*H*X + f'*X.
[X,FVAL,EXITFLAG] = QUADPROG(H,f,A,b) returns a string EXITFLAG that describes the exit condition of QUADPROG. If EXITFLAG is:
    > 0 then QUADPROG converged with a solution X.
    0 then the maximum number of iterations was exceeded (only occurs with large-scale method).
    < 0 then the problem is unbounded, infeasible, or QUADPROG failed to converge with a solution X.
[X,FVAL,EXITFLAG,OUTPUT] = QUADPROG(H,f,A,b) returns a structure OUTPUT with the number of iterations taken in OUTPUT.iterations, the type of algorithm used in OUTPUT.algorithm, the number of conjugate gradient iterations (if used) in OUTPUT.cgiterations, and a measure of first order optimality (if used) in OUTPUT.firstorderopt.
[X,FVAL,EXITFLAG,OUTPUT,LAMBDA]=QUADPROG(H,f,A,b) returns the set of Lagrangian multipliers LAMBDA, at the solution: LAMBDA.ineqlin for the linear inequalities A, LAMBDA.eqlin for the linear equalities Aeq, LAMBDA.lower for LB, and LAMBDA.upper for UB.
Here is the program "portfolio02QP.m":
% portfolio02QP.m: efficient frontier with and without nonnegative weights
% Input mean vector and covariance matrix of returns here
bmu = [.08 ; .03 ; .05] ;
bOmega = [ .3 .02 .01 ;
.02 .15 .03 ;
.01 .03 .18 ] ;
A = [ones(1,3) ; bmu'] ;   % equality constraints: weights sum to 1, mean equals muP
ngrid = 50 ;
muP = linspace(.03,.08,ngrid)' ;
icompute = 1 ;   % set to 0 to reuse sigmaP, omegaP, ... already in the workspace
if icompute == 1 ;
sigmaP = muP ;
sigmaP2 = sigmaP ;
omegaP = zeros(3,ngrid) ;
omegaP2 = omegaP ;
for i = 1:ngrid ;
% nonnegative weights: -eye(3)*w <= 0
omegaP(:,i) = quadprog(bOmega,zeros(3,1),-eye(3),...
zeros(3,1),A,[1;muP(i)]) ;
% no inequality constraints
omegaP2(:,i) = quadprog(bOmega,zeros(3,1),[],...
[],A,[1;muP(i)]) ;
sigmaP(i) = sqrt(omegaP(:,i)'*bOmega*omegaP(:,i)) ;
sigmaP2(i) = sqrt(omegaP2(:,i)'*bOmega*omegaP2(:,i)) ;
end ;
end ;
fsize = 16 ;
figure(1)
clf
p = plot(sigmaP,muP,sigmaP2,muP,'--') ;
leg = legend('no negative wts','unconstrained wts','Location','southeast') ;
set(gca,'fontsize',fsize) ;
set(leg,'fontsize',fsize) ;
xlabel('standard deviation of return (\sigma_P)',...
'fontsize',fsize) ;
ylabel('expected return (\mu_P)','fontsize',fsize) ;
text(sqrt(bOmega(1,1)),bmu(1),'1','fontsize',24) ;
text(sqrt(bOmega(2,2)),bmu(2),'2','fontsize',24) ;
text(sqrt(bOmega(3,3)),bmu(3),'3','fontsize',24) ;
set(gca,'ylim',[.025,.085]) ;
set(p,'linewidth',4) ; grid ;
% the !mv lines copy the figures to the author's web directory; edit or delete them
print portfolio02QP.ps -depsc ;
!mv portfolio02QP.ps ~/public_html/or473/LectNotes/portfolio02QP.ps
figure(2)
p2 = plot(muP,omegaP(1,:),muP,omegaP(2,:),'--',muP,...
omegaP(3,:),'-.') ;
set(p2,'linewidth',6) ;
set(gca,'fontsize',fsize) ;
grid ;
xlabel('\mu_P','fontsize',fsize) ;
ylabel('weight','fontsize',fsize) ;
legend('w_1','w_2','w_3','Location','best') ;
print portfolio02_wtQP.ps -depsc ;
!mv portfolio02_wtQP.ps ~/public_html/or473/LectNotes/portfolio02_wtQP.ps
Chapter 6
The Capital Asset Pricing Model: 3/26/01
6.1    Introduction to CAPM
The CAPM (capital asset pricing model) has a variety of uses:
•  It provides a theoretical justification for the widespread practice of "passive" investing known as indexing.
- Indexing means holding a diversified portfolio in which securities are held in the same relative proportions as in a broad market index such as the S&P 500. Individual investors can do this easily by holding shares in an index fund.
•  CAPM can provide estimates of expected rates of return on individual investments.
•  CAPM can establish "fair" rates of return on invested capital in regulated firms or in firms working on a cost-plus basis; what should the "plus" be?
•  CAPM starts with the question: what would be the risk premiums on securities if the following assumptions were true?
-  The market prices are "in equilibrium."
*  In particular, for each asset, supply equals demand.
-  Everyone has the same forecasts of expected returns and risks.
-  All investors choose portfolios optimally according to the principles of efficient diversification discussed in Chapter 5.
*  This implies that everyone holds the tangency portfolio of risky assets.
-  The market rewards people for assuming unavoidable risk, but there is no reward for needless risks due to inefficient portfolio selection.
*  Therefore, the risk-premium on a single security is not due to its "stand alone" risk, but rather to its contribution to the risk of the tangency portfolio.
• The various components of risk will be discussed in Section 6.4.
As in Chapter 5, "return" can either refer to one-period net returns or one-period log returns.
Suppose that there are exactly three assets with a total market value of $100 billion.
•  Stock A: $60 billion
•  Stock B: $30 billion
•  risk-free: $10 billion
In the market portfolio, Stock A and Stock B are held in a 2:1 ratio. CAPM says that in equilibrium all investors will hold Stock A and Stock B in a 2:1 ratio. Therefore, the tangency portfolio puts weight 2/3 on Stock A and 1/3 on Stock B, and all investors will have two-thirds of their allocation to risky assets in Stock A and one-third in Stock B.
Suppose there were too little of Stock A and too much of Stock B for everyone to have a 2:1 allocation. For example, suppose that there were one million shares of each stock and the price per share was $60 for Stock A and $40 for Stock B. Then the market portfolio would hold Stock A and Stock B in a 3:2 ratio ($60 million to $40 million), not 2:1. Not everyone could hold the tangency portfolio, though everyone would want to. Thus, prices would be in disequilibrium and would change. The price of Stock A would go up since the supply of Stock A is less than the demand. Similarly, the price of Stock B would go down. As these prices changed, so would expected returns, and the tangency portfolio would change. These changes in prices and expected returns would stop when the market portfolio was equal to the tangency portfolio, so that prices were in equilibrium. At least, this adjustment to equilibrium would happen under the ideal conditions of economic theory. The real world would be a little messier. The underlying message from theory is, however, correct. Prices adjust as all investors look for an efficient portfolio, and supply and demand converge to each other.
The total market is 9:1 risky to risk-free ($90 billion to $10 billion). In total, investors must hold risky to risk-free in a 9:1 ratio; they are the market.
For an individual investor, the risky:risk-free ratio will depend on that investor's risk aversion.
•  At one extreme, a portfolio that is all risk-free has a standard deviation of returns equal to 0.
•  At the other extreme, with all risky assets, the standard deviation is maximized. (This assumes no margin. If we allow negative positions in the risk-free asset, then there is no limit to the risk.)
At equilibrium, returns on risky and risk-free assets are such that aggregate demand for risk-free assets equals supply.
6.2   The capital market line (CML)
The capital market line (CML) relates the excess expected return on an efficient portfolio to its risk; "excess expected return" means the amount by which the expected return exceeds the risk-free rate of return. The CML is
$$\mu_R = \mu_f + \frac{\mu_M - \mu_f}{\sigma_M}\,\sigma_R, \qquad (6.1)$$
where $R$ is the return on a given efficient portfolio (mixture of the market portfolio and the risk-free asset), $\mu_R = E(R)$, $\mu_M = E(R_M)$, $\mu_f$ is the rate of return on the risk-free asset, $R_M$ is the return on the market portfolio, $\sigma_M$ is the standard deviation of the return on the market portfolio, and $\sigma_R$ is the standard deviation of return on the portfolio. The slope of the CML is, of course,
$$\frac{\mu_M - \mu_f}{\sigma_M},$$
which can be interpreted as the ratio of the "risk premium" to the standard deviation of the market portfolio. This is Sharpe's "reward-to-risk ratio." Equation (6.1) can be rewritten as
$$\frac{\mu_R - \mu_f}{\sigma_R} = \frac{\mu_M - \mu_f}{\sigma_M},$$
which says that the reward-to-risk ratio for any efficient portfolio equals that ratio for the market portfolio.
Example: Suppose that the risk-free rate of interest is $\mu_f = 0.06$, that the expected return on the market portfolio is $\mu_M = 0.15$, and that the risk of the market portfolio is $\sigma_M = 0.22$. Then the slope of the CML is $(0.15 - 0.06)/0.22 = 9/22 \approx 0.409$. The CML of this example is illustrated in Figure 6.1.
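The arithmetic in this example, and the CML (6.1) itself, can be checked with a short Python sketch. The helper names `cml_slope` and `cml_expected_return` are hypothetical, introduced only for illustration; the inputs are the example's values.

```python
def cml_slope(mu_f: float, mu_M: float, sigma_M: float) -> float:
    """Sharpe's reward-to-risk ratio (mu_M - mu_f) / sigma_M, the slope of the CML."""
    return (mu_M - mu_f) / sigma_M


def cml_expected_return(sigma_R: float, mu_f: float, mu_M: float, sigma_M: float) -> float:
    """Expected return mu_R of the efficient portfolio with risk sigma_R, from (6.1)."""
    return mu_f + cml_slope(mu_f, mu_M, sigma_M) * sigma_R


mu_f, mu_M, sigma_M = 0.06, 0.15, 0.22  # values from the example
slope = cml_slope(mu_f, mu_M, sigma_M)
print(f"slope = {slope:.4f}")  # 9/22 = 0.4091 to four decimals
for sigma_R in (0.0, 0.11, 0.22):
    # sigma_R = 0 is the risk-free asset F, sigma_R = sigma_M is the market portfolio M
    print(sigma_R, round(cml_expected_return(sigma_R, mu_f, mu_M, sigma_M), 4))
```

At $\sigma_R = 0$ the line returns $\mu_f$ and at $\sigma_R = \sigma_M$ it returns $\mu_M$, which are the two endpoints F and M of the line in Figure 6.1.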
The CML is easy to derive. Consider an efficient portfolio that allocates a proportion $w$ of its assets to the market portfolio and $(1 - w)$ to the risk-free asset. Then
$$R = wR_M + (1 - w)\mu_f = \mu_f + w(R_M - \mu_f).$$
Therefore,
$$\mu_R = \mu_f + w(\mu_M - \mu_f). \qquad (6.2)$$
Also, since $\mu_f$ is a constant,
$$\sigma_R = w\,\sigma_M,$$
or
$$w = \frac{\sigma_R}{\sigma_M}. \qquad (6.3)$$
Substituting (6.3) into (6.2) gives the CML.
Figure 6.1: CML when $\mu_f = 0.06$, $\mu_M = 0.15$, and $\sigma_M = 0.22$ (horizontal axis: std dev of return). All efficient portfolios are on the line connecting the risk-free asset (F) and the market portfolio (M). Therefore, the reward-to-risk ratio is the same for all efficient portfolios, including the market portfolio.
CAPM says that the optimal way to invest is to:
1.  Decide on the risk $\sigma_R$ that you can tolerate, $0 \le \sigma_R \le \sigma_M$. ($\sigma_R > \sigma_M$ is possible by borrowing money to buy risky assets.)
2.  Calculate $w = \sigma_R/\sigma_M$.
3.  Invest proportion $w$ of your investment in an index fund, i.e., a fund that tracks the market index.
4.  Invest 1 — w proportion of your investment in risk-free treasury bills, or a money-market fund that invests in T-bills.
Alternatively,
1.  Choose the reward $\mu_R - \mu_f$ that you want.
2.  Calculate
$$w = \frac{\mu_R - \mu_f}{\mu_M - \mu_f}.$$
3.  Do steps 3 and 4 as above.
One can view $w = \sigma_R/\sigma_M$ as an index of the risk aversion of the investor: the smaller the value of $w$, the more risk averse the investor. If an investor has $w$ equal to 0, then that investor is 100% in risk-free assets. Similarly, an investor with $w = 1$ is totally invested in the tangency portfolio of risky assets.
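Both recipes reduce to one division. The following Python sketch is illustrative only: the function names and the 10,000 dollars of wealth are hypothetical, and the market inputs reuse the CML example ($\mu_f = 0.06$, $\mu_M = 0.15$, $\sigma_M = 0.22$).

```python
def weight_from_risk(sigma_R, sigma_M):
    """First recipe: w = sigma_R / sigma_M (w > 1 means borrowing to buy more of the index)."""
    return sigma_R / sigma_M


def weight_from_reward(target_premium, mu_f, mu_M):
    """Second recipe: w = (mu_R - mu_f) / (mu_M - mu_f), with target_premium = mu_R - mu_f."""
    return target_premium / (mu_M - mu_f)


def split_wealth(w, wealth):
    """Dollars in the index fund (proportion w) and in T-bills or a money-market fund (1 - w)."""
    return w * wealth, (1 - w) * wealth


mu_f, mu_M, sigma_M = 0.06, 0.15, 0.22       # values from the CML example
w1 = weight_from_risk(0.11, sigma_M)         # tolerate half the market's risk
w2 = weight_from_reward(0.045, mu_f, mu_M)   # ask for half the market's risk premium
print(w1, w2, split_wealth(w1, 10_000))
```

The two recipes give the same $w$ here because a risk of $\sigma_R = 0.11$ on the CML earns exactly half the market's risk premium.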
6.3   Betas and the Security Market Line
The Security Market Line (SML) relates the excess expected return on an asset to the slope of its regression on the market portfolio.
Suppose that there are many securities indexed by $j$. Define
$\sigma_{jM}$ = covariance between the returns on the $j$th security and the market portfolio.
Also, define
$$\beta_j = \frac{\sigma_{jM}}{\sigma_M^2}.$$
It follows from regression theory that $\beta_j$ is the slope when the $j$th security's returns are regressed on the returns of the market portfolio. This fact follows from equation (2.1) for the slope of a best linear prediction equation. Another way to appreciate this fact is to suppose that we have a bivariate time series $(R_{jt}, R_{Mt})_{t=1}^{n}$ of returns on the $j$th asset and the market portfolio. Then, the estimated slope of the regression of $R_{jt}$ on $R_{Mt}$ is
$$\hat\beta_j = \frac{\sum_{t=1}^{n}(R_{jt} - \bar R_j)(R_{Mt} - \bar R_M)}{\sum_{t=1}^{n}(R_{Mt} - \bar R_M)^2},$$
which is an estimate of $\sigma_{jM}$ divided by an estimate of $\sigma_M^2$.
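As a sketch of the formula for $\hat\beta_j$, here it is in plain Python. The function name `beta_hat` and the five-month return series are hypothetical; the asset series is built as an exact linear function of the market series, so the estimated slope should come out as 1.2.

```python
def beta_hat(r_j, r_M):
    """Least-squares slope of R_jt on R_Mt: sample Cov(R_j, R_M) / sample Var(R_M)."""
    n = len(r_j)
    if n != len(r_M) or n < 2:
        raise ValueError("need two equal-length series with at least 2 observations")
    mean_j = sum(r_j) / n
    mean_M = sum(r_M) / n
    s_jM = sum((a - mean_j) * (b - mean_M) for a, b in zip(r_j, r_M))
    s_MM = sum((b - mean_M) ** 2 for b in r_M)
    # The 1/(n - 1) factors of the sample covariance and variance cancel in the ratio.
    return s_jM / s_MM


r_M = [0.01, -0.02, 0.03, 0.00, 0.015]     # hypothetical market returns
r_j = [0.002 + 1.2 * m for m in r_M]       # hypothetical asset returns with slope 1.2
print(round(beta_hat(r_j, r_M), 6))
```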
Let $\mu_j$ be the expected return on the $j$th security. Then $\mu_j - \mu_f$ is the "risk premium" (or "reward for risk" or "excess expected return") for that security. Using CAPM, it can be shown that
$$\mu_j - \mu_f = \beta_j(\mu_M - \mu_f). \qquad (6.4)$$
This equation, which is called the security market line (SML), will be derived in Section 6.5.2. In (6.4), $\beta_j$ is a variable in the linear equation, not the slope; more precisely, $\mu_j$ is a linear function of $\beta_j$ with slope $\mu_M - \mu_f$. This point is worth remembering. Otherwise, there could be some confusion, since $\beta_j$ was defined earlier as the slope of a regression model. In other words, $\beta_j$ is a slope in one context but is the independent variable in the SML.
The SML says that the risk premium of the $j$th asset is the product of its beta ($\beta_j$) and the risk premium of the market portfolio ($\mu_M - \mu_f$). $\beta_j$ measures both the riskiness of the $j$th asset and the reward for assuming that riskiness. $\beta_j$ is, therefore, a measure of how "aggressive" the $j$th asset is. By definition, the beta for the market portfolio is 1, i.e., $\beta_M = 1$. Therefore,
$\beta_j > 1 \Rightarrow$ "aggressive",
$\beta_j = 1 \Rightarrow$ "average risk",
$\beta_j < 1 \Rightarrow$ "not aggressive".
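The SML and the three labels can be sketched as follows. The function names are hypothetical, and the values $\mu_f = 0.06$ and $\mu_M = 0.15$ are borrowed from the CML example rather than estimated from data.

```python
def sml_expected_return(beta, mu_f, mu_M):
    """mu_j = mu_f + beta_j * (mu_M - mu_f), the SML (6.4) solved for mu_j."""
    return mu_f + beta * (mu_M - mu_f)


def aggressiveness(beta):
    """Compare beta_j with the market's beta of 1 (exact float comparison, for illustration)."""
    if beta > 1:
        return "aggressive"
    if beta == 1:
        return "average risk"
    return "not aggressive"


mu_f, mu_M = 0.06, 0.15  # borrowed from the CML example, not estimated
for beta in (0.5, 1.0, 1.5):
    premium = sml_expected_return(beta, mu_f, mu_M) - mu_f
    print(f"beta = {beta}: risk premium = {premium:.3f}, {aggressiveness(beta)}")
```

Doubling beta doubles the risk premium, which is what it means for $\mu_j - \mu_f$ to be linear in $\beta_j$ with zero intercept.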
Figure 6.2 illustrates the SML and an asset, J, that is not on the SML. This asset contradicts the CAPM; according to CAPM, no such asset exists.
Figure 6.2: Security market line (SML) showing that the risk premium of an asset is a linear function of the asset's beta (vertical axis: risk premium). J is a security not on the line and a contradiction to CAPM. Theory predicts that the price of J will decrease until J is on the SML.
Consider what would happen if an asset like J did exist. Investors would not want to buy it because its risk premium is too low. They would invest less in J and more in other securities. Therefore, the price of J would decline and its expected return would increase. After that increase, the asset J would be on the SML, or so the theory predicts. In other words, J is mispriced according to CAPM.
Stock (symbol)	Industry	Stock's β	Industry's β
Celanese (CZ)	Synthetics	0.13	0.86
General Mills (GIS)	Food - major, diversif	0.29	0.39
Kellogg (K)	Food - major, diversif	0.30	0.39
Procter & Gamble (PG)	Cleaning Prod	0.35	0.40
Exxon-Mobil (XOM)	Oil/gas	0.39	0.56
7-Eleven (SE)	Grocery stores	0.55	0.38
Merck (Mrk)	Major drug manuf	0.56	0.62
McDonalds (MCD)	Restaurants	0.71	0.63
McGraw-Hill (MHP)	Pub - books	0.87	0.77
Ford (F)	Auto	0.89	1.00
Aetna (AET)	Health care plans	1.11	0.98
General Motors (GM)	Major auto manuf	1.11	1.09
AT&T (T)	Long dist carrier	1.19	1.34
General Electric (GE)	Conglomerates	1.22	0.99
Genentech (DNA)	Biotech	1.43	0.69
Microsoft (MSFT)	Software applic.	1.77	1.72
Cree (Cree)	Semicond equip	2.16	2.30
Amazon (AMZN)	Net soft & serv	2.99	2.46
Doubleclick (Dclk)	Net soft & serv	4.06	2.46
6.3.1    Examples of betas
Netscape's home page has a link to stock quotes from Salomon Smith Barney. If you request a quote on a stock, you will be given a menu for choosing further information about the company. Under "profile" you will find the five-year beta of the company, its industry, and the S&P 500. Table 6.3.1 has some five-year betas that I took from the Net on February 27 and March 5, 2001. The beta for the S&P 500 is given as 1.00; why?
6.3.2    Comparison of the CML with the SML
The CML applies only to the return $R$ of an efficient portfolio. It can be rearranged so as to relate the excess expected return of that portfolio to the excess expected return of the market portfolio:
$$\mu_R - \mu_f = \left(\frac{\sigma_R}{\sigma_M}\right)(\mu_M - \mu_f). \qquad (6.5)$$
The SML applies to any asset and, like the CML, relates its excess expected return to the excess expected return of the market portfolio:
$$\mu_j - \mu_f = \beta_j(\mu_M - \mu_f). \qquad (6.6)$$
If we take an efficient portfolio and consider it as an asset, then $\mu_R$ and $\mu_j$ both denote the expected return on that portfolio/asset. Both (6.5) and (6.6) hold, so that
$$\beta_j = \frac{\sigma_R}{\sigma_M}.$$
6.4   The security characteristic line
Let $R_{jt}$ be the return at time $t$ on the $j$th asset. Similarly, let $R_{Mt}$ and $\mu_{ft}$ be the return on the market portfolio and the risk-free return at time $t$. The security characteristic line (sometimes shortened to the characteristic line) is a regression model:
$$R_{jt} = \mu_{ft} + \beta_j(R_{Mt} - \mu_{ft}) + \epsilon_{jt}, \qquad (6.7)$$
where $\epsilon_{jt}$ is $N(0, \sigma_{\epsilon_j}^2)$. It is often assumed that the $\epsilon_{jt}$'s are uncorrelated across assets, that is, that $\epsilon_{jt}$ is uncorrelated with $\epsilon_{j't}$ for $j \neq j'$. This assumption has important ramifications for risk reduction by diversification; see Section 6.4.1.
Let $\mu_j = E(R_{jt})$ and $\mu_M = E(R_{Mt})$. Taking expectations in (6.7), with a constant risk-free rate $\mu_{ft} = \mu_f$, we get
$$\mu_j = \mu_f + \beta_j(\mu_M - \mu_f),$$
which is our friend the SML again. The SML gives us information about expected returns, but not about the variance of the returns. For the latter we need the characteristic line. The characteristic line is said to be a "return-generating process" since it gives us a probability model of the returns, not just a model of their expected values.
An analogy to the distinction between the SML and the characteristic line is this. The regression line $E(Y|X) = \beta_0 + \beta_1 X$ gives us the expected value of $Y$ given $X$ but not the conditional probability distribution of $Y$ given $X$. The regression model
$$Y_t = \beta_0 + \beta_1 X_t + \epsilon_t, \quad \epsilon_t \sim N(0, \sigma^2),$$
does give us this conditional probability distribution.
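The characteristic line implies that
$$\sigma_j^2 = \beta_j^2\sigma_M^2 + \sigma_{\epsilon_j}^2,$$
that, when the $\epsilon_{jt}$ are uncorrelated across assets,
$$\sigma_{jj'} = \beta_j\beta_{j'}\sigma_M^2 \quad \text{for } j \neq j',$$
and that
$$\sigma_{Mj} = \beta_j\sigma_M^2.$$
The total risk of the $j$th asset is
$$\sigma_j = \sqrt{\beta_j^2\sigma_M^2 + \sigma_{\epsilon_j}^2}.$$
The risk has two components: $\beta_j^2\sigma_M^2$ is called the market or systematic component of risk, and $\sigma_{\epsilon_j}^2$ is called the unique, nonmarket, or unsystematic component of risk.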
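A small simulation makes the decomposition concrete. This Python sketch assumes hypothetical values $\beta_j = 1.2$, $\sigma_M = 0.05$, and $\sigma_{\epsilon_j} = 0.08$, a constant risk-free rate, and normal market excess returns with mean 0 (the mean does not affect variances). The sample variance of the simulated returns should be close to $\beta_j^2\sigma_M^2 + \sigma_{\epsilon_j}^2 = 0.01$, of which 36% is systematic.

```python
import random
import statistics


def simulate_asset(beta, sigma_M, sigma_eps, n, seed=0):
    """Simulate excess returns R*_jt = beta * R*_Mt + eps_jt.

    R*_Mt ~ N(0, sigma_M^2) and eps_jt ~ N(0, sigma_eps^2), drawn independently.
    The market mean is set to 0 because it does not affect variances.
    """
    rng = random.Random(seed)
    market = [rng.gauss(0.0, sigma_M) for _ in range(n)]
    asset = [beta * m + rng.gauss(0.0, sigma_eps) for m in market]
    return market, asset


beta, sigma_M, sigma_eps = 1.2, 0.05, 0.08  # hypothetical values
market, asset = simulate_asset(beta, sigma_M, sigma_eps, n=100_000)

systematic = beta**2 * sigma_M**2   # beta_j^2 sigma_M^2 = 0.0036
unique = sigma_eps**2               # sigma_eps_j^2 = 0.0064
print("model variance:    ", systematic + unique)
print("simulated variance:", statistics.variance(asset))
print("systematic share:  ", systematic / (systematic + unique))
```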
6.4.1    Reducing unique risk by diversification
The market component cannot be reduced by diversification, but the unique component can be reduced in this way.
Suppose that there are $N$ assets with returns $R_{1t}, \ldots, R_{Nt}$ for holding period $t$. If we form a portfolio with weights $w_1, \ldots, w_N$, then the return of the portfolio is
$$R_{Pt} = w_1R_{1t} + \cdots + w_NR_{Nt}.$$
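Let $R_{Mt}$ be the return on the market portfolio. According to the characteristic line model, $R_{jt} = \mu_{ft} + \beta_j(R_{Mt} - \mu_{ft}) + \epsilon_{jt}$, so that, since the weights sum to 1,
$$R_{Pt} = \mu_{ft} + \left(\sum_{j=1}^{N}\beta_j w_j\right)(R_{Mt} - \mu_{ft}) + \sum_{j=1}^{N} w_j\epsilon_{jt}.$$
Therefore, the portfolio beta is
$$\beta_P = \sum_{j=1}^{N} w_j\beta_j,$$
and the "epsilon" for the portfolio is
$$\epsilon_{Pt} = \sum_{j=1}^{N} w_j\epsilon_{jt}.$$
We will now assume that $\epsilon_{1t}, \ldots, \epsilon_{Nt}$ are uncorrelated. Therefore,
$$\sigma_{\epsilon_P}^2 = \sum_{j=1}^{N} w_j^2\sigma_{\epsilon_j}^2.$$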
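Example
Suppose that $w_j = 1/N$ for all $j$. Then
$$\beta_P = \frac{\sum_{j=1}^{N}\beta_j}{N}$$
and
$$\sigma_{\epsilon_P}^2 = \frac{N^{-1}\sum_{j=1}^{N}\sigma_{\epsilon_j}^2}{N} = \frac{\bar\sigma_\epsilon^2}{N},$$
where $\bar\sigma_\epsilon^2$ is the average of the $\sigma_{\epsilon_j}^2$. If $\sigma_{\epsilon_j}^2$ is a constant, say $\sigma_\epsilon^2$ for all $j$, then
$$\sigma_{\epsilon_P} = \frac{\sigma_\epsilon}{\sqrt{N}}.$$
For example, suppose that $\sigma_\epsilon$ is 5%. If $N = 20$, then $\sigma_{\epsilon_P}$ is 1.12%. If $N = 100$, then $\sigma_{\epsilon_P}$ is 0.5%. There are approximately 1600 stocks on the NYSE; if $N = 1600$, then $\sigma_{\epsilon_P} = 0.125\%$.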
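The example's numbers can be reproduced with a short Python sketch; the function names are hypothetical. The general formula $\sigma_{\epsilon_P}^2 = \sum_j w_j^2\sigma_{\epsilon_j}^2$ is implemented directly, and with equal weights and a common $\sigma_\epsilon$ it reduces to $\sigma_\epsilon/\sqrt{N}$.

```python
import math


def unique_sd(weights, sigma_eps):
    """sigma_eps_P = sqrt(sum_j w_j^2 sigma_eps_j^2), valid when the eps_jt are uncorrelated."""
    return math.sqrt(sum(w * w * s * s for w, s in zip(weights, sigma_eps)))


def equal_weight_beta(betas):
    """Portfolio beta with w_j = 1/N, i.e., the average of the betas."""
    return sum(betas) / len(betas)


sigma_eps = 0.05  # common unique standard deviation of 5%
for n in (20, 100, 1600):
    sd = unique_sd([1.0 / n] * n, [sigma_eps] * n)
    print(f"N = {n:4d}: sigma_eps_P = {100 * sd:.4f}%")  # 1.1180%, 0.5000%, 0.1250%
```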
Are the assumptions sensible?
A key assumption that allows risk to be removed by diversification is that $\epsilon_{1t}, \ldots, \epsilon_{Nt}$ are uncorrelated. This assumption implies that all correlation among the cross-section¹ of asset returns is due to a single cause and that cause is measured by the market index. For this reason, the characteristic line is a "single factor" or "single index" model with $R_{Mt}$ the "factor."
This assumption of uncorrelated $\epsilon_{jt}$ would not be valid if, say, two energy stocks were correlated over and beyond their correlation due to the market index. In this case, unique risk could not be eliminated by holding a large portfolio of all energy stocks.
6.5    Some theory
In this section we will show that $\sigma_{jM}$ quantifies the contribution of the $j$th asset to the risk of the market portfolio. Also, we will derive the SML.
6.5.1    Contributions to the market portfolio's risk
Suppose that we have $N$ risky assets and that $w_{1M}, \ldots, w_{NM}$ are the weights of the market portfolio. Since
$$R_{Mt} = \sum_{i=1}^{N} w_{iM}R_{it},$$
the covariance between the return on the $j$th asset and the return on the market portfolio is
$$\sigma_{jM} = \mathrm{Cov}\left(R_{jt}, \sum_{i=1}^{N} w_{iM}R_{it}\right) = \sum_{i=1}^{N} w_{iM}\sigma_{ij}.$$
Therefore,
$$\sigma_M^2 = \sum_{j=1}^{N}\sum_{i=1}^{N} w_{jM}w_{iM}\sigma_{ij} = \sum_{j=1}^{N} w_{jM}\sigma_{jM}. \qquad (6.8)$$
¹ "Cross-section" returns means returns across assets within a single holding period.
Equation (6.8) shows that $w_{jM}\sigma_{jM}$ is the contribution of the $j$th asset to the risk of the market portfolio.
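Equation (6.8) can be checked numerically. This Python sketch uses the covariance matrix from the program portfolio02QP.m as an illustrative $\sigma_{ij}$, together with hypothetical market weights (0.5, 0.2, 0.3); the contributions $w_{jM}\sigma_{jM}$ should add up to the double sum $\sigma_M^2$. The function names are hypothetical.

```python
def covariances_with_market(weights, cov):
    """sigma_jM = sum_i w_iM * sigma_ij for each asset j."""
    n = len(weights)
    return [sum(weights[i] * cov[i][j] for i in range(n)) for j in range(n)]


def market_variance(weights, cov):
    """Double sum sum_j sum_i w_jM * w_iM * sigma_ij, the middle expression in (6.8)."""
    n = len(weights)
    return sum(weights[j] * weights[i] * cov[i][j] for j in range(n) for i in range(n))


cov = [[0.30, 0.02, 0.01],
       [0.02, 0.15, 0.03],
       [0.01, 0.03, 0.18]]   # sigma_ij, borrowed from portfolio02QP.m
w_M = [0.5, 0.2, 0.3]        # hypothetical market weights

sigma_jM = covariances_with_market(w_M, cov)
contributions = [w * s for w, s in zip(w_M, sigma_jM)]
print([round(c, 4) for c in contributions])                                # [0.0785, 0.0098, 0.0195]
print(round(sum(contributions), 4), round(market_variance(w_M, cov), 4))   # 0.1078 0.1078
```

Asset 1 supplies most of the market's variance here, both because it has the largest weight and because it has the largest covariance with the market.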
6.5.2    Derivation of the SML
Consider a portfolio $P$ with weight $w_i$ given to the $i$th risky asset and weight $(1 - w_i)$ given to the market portfolio. The return on this portfolio is
$$R_{Pt} = w_iR_{it} + (1 - w_i)R_{Mt}.$$
The expected return is
$$\mu_P = w_i\mu_i + (1 - w_i)\mu_M,$$
and the risk is
$$\sigma_P = \sqrt{w_i^2\sigma_i^2 + (1 - w_i)^2\sigma_M^2 + 2w_i(1 - w_i)\sigma_{iM}}.$$
As we vary $w_i$ we get the locus of points in $(\sigma, \mu)$ space that is shown as a blue curve in Figure 6.3.
Key idea: The derivative of this locus of points evaluated at the market portfolio is equal to the slope of the CML. We can calculate this derivative and equate it to the slope of the CML to see what we get. The result will be the SML.
Figure 6.3: Derivation of the SML. M is the market portfolio and T is the tangency portfolio; they are equal according to the CAPM. The blue curve is the locus of portfolios combining asset i and the market portfolio. The derivative of this curve at M is equal to the slope of the CML, since this curve is tangent to the CML at M. (Curve labels in the figure: "Efficient frontier" and "portfolios of M and i.")
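We have
$$\frac{d\mu_P}{dw_i} = \mu_i - \mu_M$$
and
$$\frac{d\sigma_P}{dw_i} = \frac{1}{2}\sigma_P^{-1}\left\{2w_i\sigma_i^2 - 2(1 - w_i)\sigma_M^2 + 2(1 - 2w_i)\sigma_{iM}\right\}.$$
Therefore,
$$\frac{d\mu_P}{d\sigma_P} = \frac{d\mu_P/dw_i}{d\sigma_P/dw_i} = \frac{(\mu_i - \mu_M)\sigma_P}{w_i\sigma_i^2 - \sigma_M^2 + w_i\sigma_M^2 + \sigma_{iM} - 2w_i\sigma_{iM}}.$$
Next, since $\sigma_P = \sigma_M$ when $w_i = 0$,
$$\left.\frac{d\mu_P}{d\sigma_P}\right|_{w_i=0} = \frac{(\mu_i - \mu_M)\sigma_M}{\sigma_{iM} - \sigma_M^2}.$$
But
$$\left.\frac{d\mu_P}{d\sigma_P}\right|_{w_i=0}$$
must equal the slope of the CML, which is $(\mu_M - \mu_f)/\sigma_M$. Therefore,
$$\frac{(\mu_i - \mu_M)\sigma_M}{\sigma_{iM} - \sigma_M^2} = \frac{\mu_M - \mu_f}{\sigma_M},$$
which, after some algebra, gives us
$$\mu_i - \mu_f = \frac{\sigma_{iM}}{\sigma_M^2}(\mu_M - \mu_f) = \beta_i(\mu_M - \mu_f),$$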
which is the SML given in equation (6.4).
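The key step of the derivation, that the locus of portfolios of asset $i$ and M is tangent to the CML at M, can be checked numerically. In this Python sketch all inputs are hypothetical, and $\mu_i$ is placed on the SML so that the CAPM holds; a central finite difference in $w_i$ around $w_i = 0$ should then reproduce the CML slope $(\mu_M - \mu_f)/\sigma_M$, as should the closed-form derivative at $w_i = 0$.

```python
import math


def locus_point(w_i, mu_i, mu_M, sigma_i, sigma_M, sigma_iM):
    """Return (sigma_P, mu_P) for weight w_i on asset i and 1 - w_i on the market M."""
    mu_P = w_i * mu_i + (1 - w_i) * mu_M
    var_P = (w_i**2 * sigma_i**2 + (1 - w_i)**2 * sigma_M**2
             + 2 * w_i * (1 - w_i) * sigma_iM)
    return math.sqrt(var_P), mu_P


# Hypothetical inputs; mu_i is placed on the SML so that the CAPM holds.
mu_f, mu_M, sigma_M = 0.06, 0.15, 0.22
sigma_i, sigma_iM = 0.30, 0.03
beta_i = sigma_iM / sigma_M**2
mu_i = mu_f + beta_i * (mu_M - mu_f)

# Central finite difference of the locus at w_i = 0, i.e., at M.
h = 1e-6
s_up, m_up = locus_point(h, mu_i, mu_M, sigma_i, sigma_M, sigma_iM)
s_dn, m_dn = locus_point(-h, mu_i, mu_M, sigma_i, sigma_M, sigma_iM)
slope_numeric = (m_up - m_dn) / (s_up - s_dn)

# Closed form at w_i = 0 from the derivation, and the CML slope.
slope_formula = (mu_i - mu_M) * sigma_M / (sigma_iM - sigma_M**2)
slope_cml = (mu_M - mu_f) / sigma_M
print(slope_numeric, slope_formula, slope_cml)
```

If $\mu_i$ were moved off the SML, the first two numbers would no longer match the CML slope, which is the sense in which the tangency condition forces the SML.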
6.6   Estimation of beta and testing the CAPM
Recall the security characteristic line
$$R_{jt} = \mu_{ft} + \beta_j(R_{Mt} - \mu_{ft}) + \epsilon_{jt}. \qquad (6.9)$$
Let $R^*_{jt} = R_{jt} - \mu_{ft}$ be the excess return on the $j$th security and let $R^*_{Mt} = R_{Mt} - \mu_{ft}$ be the excess return on the market portfolio. Then (6.9) can be written as
$$R^*_{jt} = \beta_j R^*_{Mt} + \epsilon_{jt}. \qquad (6.10)$$
Equation (6.10) is a regression model without an intercept and with ßj as the slope. A more elaborate model is
$$R^*_{jt} = \alpha_j + \beta_j R^*_{Mt} + \epsilon_{jt}, \qquad (6.11)$$
which includes an intercept.
Given series $R_{jt}$, $R_{Mt}$, and $\mu_{ft}$ for $t = 1, \ldots, n$, we can calculate $R^*_{jt}$ and $R^*_{Mt}$ and regress $R^*_{jt}$ on $R^*_{Mt}$ to estimate $\alpha_j$, $\beta_j$, and $\sigma_{\epsilon_j}^2$.
By testing the null hypothesis that $\alpha_j = 0$ we are testing whether the $j$th asset is mispriced according to the CAPM.
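Fitting (6.11) is ordinary least squares with an intercept. The Python sketch below computes $\hat\alpha_j$, $\hat\beta_j$, the residual mean square (the estimate of $\sigma_{\epsilon_j}^2$, with $n - 2$ degrees of freedom), and the t statistic for $H_0: \alpha_j = 0$ from the standard formulas. The function name and the six observations are hypothetical, and turning the t statistic into a p-value (t distribution with $n - 2$ degrees of freedom) is left to a statistics package such as MINITAB.

```python
import math


def fit_characteristic_line(excess_j, excess_M):
    """OLS of R*_jt on R*_Mt with an intercept, as in (6.11)."""
    n = len(excess_j)
    if n != len(excess_M) or n < 3:
        raise ValueError("need two equal-length series with at least 3 observations")
    xbar = sum(excess_M) / n
    ybar = sum(excess_j) / n
    sxx = sum((x - xbar) ** 2 for x in excess_M)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(excess_M, excess_j))
    beta = sxy / sxx
    alpha = ybar - beta * xbar
    residuals = [y - alpha - beta * x for x, y in zip(excess_M, excess_j)]
    s2 = sum(r * r for r in residuals) / (n - 2)            # residual mean square
    se_alpha = math.sqrt(s2 * (1.0 / n + xbar**2 / sxx))
    se_beta = math.sqrt(s2 / sxx)
    t_alpha = alpha / se_alpha if se_alpha > 0 else float("nan")
    return {"alpha": alpha, "beta": beta, "s2": s2, "se_alpha": se_alpha,
            "se_beta": se_beta, "t_alpha": t_alpha, "residuals": residuals}


# Hypothetical monthly excess returns: market (x) and asset (y).
x = [0.02, -0.01, 0.03, -0.04, 0.01, 0.00]
y = [0.03, -0.02, 0.05, -0.05, 0.00, 0.01]
fit = fit_characteristic_line(y, x)
print({k: round(v, 5) for k, v in fit.items() if k != "residuals"})
```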
Here is an example done in MINITAB. The least squares line and the output from the regression command are shown below. The variable X:MS_1 is the excess return on Microsoft. Five years of monthly data, March 1996 to February 2001, were used. The raw data are in the Excel file "Datastream01.xls" on the course home page. For Microsoft, we find that
$$\hat\beta = 1.44$$
and
$$\hat\alpha = 0.012.$$
Since the standard error of the coefficient of X:S&P_1 (i.e., of $\hat\beta_j$) is 0.317, a 95% confidence interval for $\beta_j$ is $1.44 \pm (2)(0.317)$, or $(0.81, 2.07)$.² The p-value for the coefficient of X:S&P_1 is 0.000. This p-value is for testing the null hypothesis that $\beta_j = 0$, so it is not surprising that the null hypothesis is strongly rejected. We do not expect the beta of a stock to be zero.
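The interval arithmetic, with the rounded multiplier 2 and with the exact t-value 2.0017 quoted in footnote 2, is a two-line computation. The helper name is hypothetical; the unrounded coefficient 1.4411 and standard error 0.3174 are taken from the MINITAB output.

```python
def confidence_interval(estimate, std_error, multiplier):
    """estimate +/- multiplier * std_error."""
    return estimate - multiplier * std_error, estimate + multiplier * std_error


approx = confidence_interval(1.44, 0.317, 2.0)         # rounded values used in the text
exact = confidence_interval(1.4411, 0.3174, 2.0017)    # unrounded coef and SE, exact t with 58 df
print([round(v, 2) for v in approx])                    # [0.81, 2.07]
print([round(v, 3) for v in exact])
```

The two intervals differ only in the third decimal place, so using 2 as the multiplier is harmless here.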
The test that $\alpha = 0$ has a p-value of 0.441, so we can accept the null hypothesis. This implies that the data are consistent with the CAPM. Moreover,
$$\hat\sigma_\epsilon^2 = 0.01381,$$
the mean square residual error.
² Here "2" is used as an approximate t-value. The exact t-value is 2.0017. This value can be found in MINITAB. Go to the calc menu, then probability distributions, then "t." Use "inverse cumulative probability" with "noncentrality parameter" equal to 0 and "input constant" equal to .975 (for a 95% confidence interval).
Here is the MINITAB plot.
Figure 6.4: Least squares line fit to the Microsoft data. (MINITAB regression plot of X:MS_1 against X:S&P_1; fitted line X:MS_1 = 0.0118981 + 1.44111 X:S&P_1, S = 0.117516, R-Sq = 26.2%, R-Sq(adj) = 24.9%.)
Here is the MINITAB regression output.
Model with intercept:
Regression Analysis: X:MS_1 versus X:S&P_1
The regression equation is X:MS_1 = 0.0119 + 1.44 X:S&P_1
Predictor	Coef	SE Coef	T        P
Constant	0.01190	0.01532	0.78    0.441
X:S&P_1	1.4411	0.3174	4.54    0.000
S = 0.1175	R-Sq = 26.2%	R-Sq(adj) = 24.9%
Analysis of Variance
Source            DF        SS        MS       F       P
Regression         1   0.28464   0.28464   20.61   0.000
Residual Error    58   0.80098   0.01381
Total             59   1.08562
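The R-Sq, R-Sq(adj), F, and S values reported by MINITAB can all be recovered from this analysis-of-variance table. This Python sketch simply redoes that arithmetic with the sums of squares and degrees of freedom copied from the output.

```python
ss_regression, ss_residual, ss_total = 0.28464, 0.80098, 1.08562
df_regression, df_residual = 1, 58

r_squared = ss_regression / ss_total                  # share of variance due to the market
ms_residual = ss_residual / df_residual               # estimate of sigma_eps^2
f_stat = (ss_regression / df_regression) / ms_residual
s = ms_residual ** 0.5                                # residual standard deviation
r_squared_adj = 1 - ms_residual / (ss_total / (df_regression + df_residual))

print(f"R-Sq = {100 * r_squared:.1f}%")               # 26.2%
print(f"R-Sq(adj) = {100 * r_squared_adj:.1f}%")      # 24.9%
print(f"F = {f_stat:.2f}")                            # 20.61
print(f"S = {s:.4f}")                                 # 0.1175
```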
Unusual Observations
Obs   X:S&P_1   X:MS_1      Fit   SE Fit   Residual   St Resid
 49     0.129   0.1258   0.1973   0.0416    -0.0715    -0.65 X
 50    -0.044  -0.4040  -0.0518   0.0222    -0.3521    -3.05R
 58    -0.020  -0.4249  -0.0166   0.0173    -0.4083    -3.51R
 59     0.015   0.3164   0.0337   0.0154     0.2828     2.43R
R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.
Notice that the R² (R-Sq) value for the regression is 26.2%. The interpretation of R² is the percent of the variance in the excess returns on Microsoft that is due to excess returns on the market. In other words, 26.2% of the risk is due to systematic or market risk ($\beta_j^2\sigma_M^2$). The remaining 73.8% is due to unique or nonmarket risk ($\sigma_{\epsilon_j}^2$).
If we assume that $\alpha = 0$, then we can refit the model as a no-intercept model. This is done with MINITAB's regression program by NOT choosing the "fit intercept" option; the default is to choose this option, so you need to go to "options" and unchoose it. Here is the MINITAB output when fitting a no-intercept model.
Model without intercept:
Regression Analysis: X:MS_1 versus X:S&P_1
The regression equation is X:MS_1 = 1.48 X:S&P_1
Predictor      Coef   SE Coef      T       P
Noconstant
X:S&P_1      1.4755    0.3133   4.71   0.000
S = 0.1171
Analysis of Variance
Source            DF        SS        MS       F       P
Regression         1   0.30432   0.30432   22.19   0.000
Residual Error    59   0.80931   0.01372
Total             60   1.11362
Unusual Observations
Obs   X:S&P_1   X:MS_1      Fit   SE Fit   Residual   St Resid
 15     0.100   0.1040   0.1479   0.0314    -0.0439    -0.39 X
 30    -0.100  -0.0705  -0.1473   0.0313     0.0767     0.68 X
 33     0.109   0.1906   0.1606   0.0341     0.0300     0.27 X
 45     0.084  -0.0017   0.1240   0.0263    -0.1257    -1.10 X
 49     0.129   0.1258   0.1899   0.0403    -0.0640    -0.58 X
 50    -0.044  -0.4040  -0.0653   0.0139    -0.3387    -2.91R
 58    -0.020  -0.4249  -0.0292   0.0062    -0.3958    -3.38R
 59     0.015   0.3164   0.0223   0.0047     0.2941     2.51R
R denotes an observation with a large standardized residual
X denotes an observation whose X value gives it large influence.
Now $\hat\beta = 1.48$ and $\hat\sigma_\epsilon^2 = 0.0137$, which are small changes from the values for the intercept model.
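Without an intercept, the least-squares slope is $\sum_t R^*_{Mt}R^*_{jt} / \sum_t (R^*_{Mt})^2$, and the residual mean square has $n - 1$ degrees of freedom, which matches the residual DF of 59 (rather than 58) in the no-intercept output. The function name and the data in this Python sketch are hypothetical.

```python
def fit_through_origin(excess_j, excess_M):
    """OLS of R*_jt on R*_Mt with no intercept, as in (6.10).

    Returns beta_hat = sum(x*y) / sum(x*x), the residual mean square s2
    (n - 1 degrees of freedom, since one coefficient is estimated), and the residuals.
    """
    n = len(excess_j)
    if n != len(excess_M) or n < 2:
        raise ValueError("need two equal-length series with at least 2 observations")
    sxy = sum(x * y for x, y in zip(excess_M, excess_j))
    sxx = sum(x * x for x in excess_M)
    beta = sxy / sxx
    residuals = [y - beta * x for x, y in zip(excess_M, excess_j)]
    s2 = sum(r * r for r in residuals) / (n - 1)
    return beta, s2, residuals


# Hypothetical monthly excess returns: market (x) and asset (y).
x = [0.02, -0.01, 0.03, -0.04, 0.01, 0.00]
y = [0.03, -0.02, 0.05, -0.05, 0.00, 0.01]
beta, s2, res = fit_through_origin(y, x)
print(round(beta, 4), round(s2, 6))
```

Unlike the intercept model, the residuals here need not sum to zero; only their products with $R^*_{Mt}$ do.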
Regression using returns instead of excess returns
Often, as an alternative to regression using excess returns, the returns on the asset are regressed on the returns on the market portfolio.
Figure 6.5 shows the least squares line using returns instead of excess returns. The estimate of beta has changed from 1.44 to 1.38. The new value, 1.38, is well within the old confidence interval of (0.81, 2.07), showing that there is little difference between using returns and using excess returns.
Figure 6.5: Least squares line fit to the Microsoft data using returns instead of excess returns. (MINITAB regression plot of logR:MS against logR:S&P; fitted line logR:MS = 0.0185260 + 1.37935 logR:S&P, S = 0.0952255, R-Sq = 23.9%, R-Sq(adj) = 23.3%.)
6.6.1    Interpretation of alpha
If $\alpha$ is nonzero, then the security is mispriced, at least according to CAPM. If $\alpha > 0$, then the security is underpriced; the returns are too large on average. This is an indication of an asset worth purchasing. Of course, one must be careful. If we reject the null hypothesis that $\alpha = 0$, all we have shown is that the security was mispriced in the past.
Since for the Microsoft data we accepted the null hypothesis that $\alpha$ is zero, there is no evidence that Microsoft was mispriced.
6.7    Summary
The CAPM assumes that prices are in equilibrium, that everyone has the same forecasts of returns, and that everyone uses the principles of portfolio selection introduced in Chapter 5.
The CAPM assumptions imply that everyone will hold risk efficient portfolios which mix the tangency portfolio and risk-free assets. This fact implies that the market portfolio will equal the tangency portfolio. A further consequence is that the Sharpe ratio for any efficient portfolio will equal the Sharpe ratio of the market portfolio:
$$\frac{\mu_R - \mu_f}{\sigma_R} = \frac{\mu_M - \mu_f}{\sigma_M}, \qquad (6.12)$$
where $R$ is any efficient portfolio. Equation (6.12) can be rearranged to give the CML, which is
$$\mu_R = \mu_f + \frac{\mu_M - \mu_f}{\sigma_M}\,\sigma_R.$$
The CML applies only to efficient portfolios.
Another consequence of CAPM assumptions is the SML which applies to any security, say the jth, and is
$$\mu_j = \mu_f + (\mu_M - \mu_f)\beta_j.$$
Here $\beta_j$ is the "independent variable" of this linear relationship and measures the riskiness of the security. $\mu_j$ is the "dependent variable."
The security characteristic line is a model for how actual returns are generated. (The SML only describes expected returns.) The security characteristic line is
$$R_{jt} = \mu_f + \beta_j(R_{Mt} - \mu_f) + \epsilon_{jt}.$$
The variance of $\epsilon_{jt}$ is $\sigma_{\epsilon_j}^2$. The security characteristic line implies that the risk of the $j$th asset can be decomposed into market and nonmarket risks:
$$\sigma_j^2 = \beta_j^2\sigma_M^2 + \sigma_{\epsilon_j}^2.$$
If one assumes that $\epsilon_{jt}$ is uncorrelated with $\epsilon_{j't}$ for $j \neq j'$ (that is, for two different securities), then nonmarket risk can be eliminated by portfolio diversification.
Since the security characteristic line is a regression model, it can be used to estimate $\beta_j$ and $\sigma_{\epsilon_j}^2$. The R² value of the regression estimates the proportion of $\sigma_j^2$ due to market risk, i.e., it estimates $\beta_j^2\sigma_M^2/\sigma_j^2$.
Chapter 7
Pricing Options: 4/12/01
7.1    Introduction
The European call options mentioned in Chapter 1 are one example of the many derivatives now on the market. A derivative is a financial instrument whose value is derived from the value of some underlying instrument such as an interest rate, foreign exchange rate, or stock price.
A call option gives one the right to buy a certain stock at the exercise or strike price, while a put option gives one the right to sell the stock at the exercise price. An option has an exercise date, which is also called the strike date, maturity, or expiration date. American options can be exercised at any time up to their exercise date, but European options can be exercised only at their exercise date. European options are easier to analyze than American options since one does not need to consider the possibility of early exercise.
In this chapter we will discuss the main ideas behind the pricing of options. We will not actually prove the Black-Scholes formula, since that derivation requires advanced mathematics. However, I will present a heuristic derivation of that formula to give an intuitive understanding of option pricing. For lack of time, we will only study European call options in detail. However, Black-Scholes type formulas exist for other derivatives as well. We will give one example of a put option. The book Introduction to Futures and Options Markets by Hull is a nice overview of the options markets and the uses of options. Financial Calculus by Baxter and Rennie discusses the mathematics of many types of derivatives.
Why do companies purchase options and other derivatives? The answer is simple: to manage risk. In its 2000 Annual Report, the Coca-Cola Company writes
Our company uses derivative financial instruments primarily to reduce our exposure to adverse fluctuations in interest rates and foreign exchange rates and, to a lesser extent, adverse fluctuations in commodity prices and other market risks. We do not enter into derivative financial instruments for trading purposes. As a matter of policy, all our derivative positions are used to reduce risk by hedging an underlying economic exposure. Because of the high correlation between the hedging instrument and the underlying exposure, fluctuations in the value of the instruments are generally offset by reciprocal changes in the value of the underlying exposure. The derivatives we use are straightforward instruments with liquid markets.
Derivatives can and have been used to speculate, but that is not their primary purpose. The intent of this quote is clear. The Company is assuring its stockholders that it is using derivatives to manage risk, not to gamble.
7.2    Call options
Suppose that you have purchased a European call option on 100 shares of Stock A with an exercise price of $70. At the expiration date, suppose that Stock A is selling at $73. The option allows you to purchase the 100 shares for $70 and to immediately sell them for $73, with a gain of $300 on the 100 shares. Of course, the net profit for purchasing the option isn't $300, since you had to pay for the option. If the option cost $2/share, then you paid $200 for the option. Moreover, you paid the $200 up front but only got the $300 at the expiration date. Suppose that the expiration date was 3 months after the purchase date and the continuously compounded risk-free rate is 6% per annum, or 1.5% for 3 months. Then the dollar value of your net profit is
$$e^{-0.015}(300) - 200 = 95.53$$
at the time of purchase and is
$$300 - e^{0.015}(200) = 96.98$$
at the exercise date.
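The two present-value calculations can be sketched in Python as follows. The function name `net_profit` is hypothetical; continuous compounding at 6% per annum over a quarter of a year is assumed, as in the example.

```python
import math


def net_profit(payoff, premium, rate, years):
    """Net profit of an option, valued at purchase and at expiration (continuous compounding)."""
    growth = math.exp(rate * years)          # e^{0.06 * 0.25} = e^{0.015}
    at_purchase = payoff / growth - premium  # discount the payoff back to the purchase date
    at_expiration = payoff - premium * growth  # carry the premium forward to the exercise date
    return at_purchase, at_expiration


now, later = net_profit(payoff=300.0, premium=200.0, rate=0.06, years=0.25)
print(f"{now:.2f} {later:.2f}")  # 95.53 96.98
```

The two numbers differ by exactly the factor $e^{0.015}$, since each is the other moved through time at the risk-free rate.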
We will use the notation $(x)^+ = x$ if $x > 0$ and $(x)^+ = 0$ if $x \le 0$. With this notation, the value of a call at the exercise date is
$$(S_T - E)^+,$$
where $E$ is the exercise price and $S_T$ is the stock's price on the exercise date, $T$.
A call is never exercised if the strike price is greater than the price of the stock, since exercising the option would amount to buying the stock for more than it would cost on the market. If a call is not exercised, then one loses the cost of the option.
One can lose money on an option even if it is exercised, because the amount gained by exercising the option might be less than the cost of the option. In the example above, if Stock A were selling for $71 at the exercise date, then one would exercise the option and gain $100. This would be less than the $200 paid for the option. Even though exercising the option results in a loss, the loss is less than it would be if the option were not exercised. An option should always be exercised if S_T - E is positive.
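A short Python sketch can check the arithmetic of this example. The helper names and the 100-share default are illustrative choices, not notation from the notes. Here r is the continuously compounded annual rate and T is the time to expiration in years.

```python
import math

def call_payoff(s_T, strike):
    """Value of a European call at expiration: (S_T - E)_+."""
    return max(s_T - strike, 0.0)

def net_profit(s_T, strike, premium, r, T, shares=100):
    """Net profit from buying a call for `premium` per share.

    Returns (value at the purchase date, value at the expiration date):
    the payoff is discounted back, or the premium is compounded forward.
    """
    gain = shares * call_payoff(s_T, strike)
    cost = shares * premium
    at_purchase = math.exp(-r * T) * gain - cost
    at_expiration = gain - math.exp(r * T) * cost
    return at_purchase, at_expiration

print(net_profit(73, 70, 2, 0.06, 0.25))   # about (95.53, 96.98)
print(net_profit(71, 70, 2, 0.06, 0.25))   # a loss, but smaller than the $200 premium
```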
7.3   The law of one price
The "law of one price" states that if two financial instruments have exactly the same payoffs, then they will have the same price. This principle is used to price options.
CHAPTER 7. PRICING OPTIONS: 4/12/01
To value an option, one must find a portfolio or a self-financing¹ trading strategy with a known price and which has exactly the same payoffs as the option. The price of the option is then known; it must be the same as the price of the portfolio or self-financing trading strategy.
Here's a simple example of pricing by the law of one price. Suppose stock in Company A sells at $100/share. The risk-free rate of borrowing is 6% compounded annually. Consider a futures contract obliging one party to sell to the other party one share of Company A exactly one year from now at a price P. (No money changes hands now.) What is the fair market price, i.e., what should P be?
Note that this contract is not an option; the sale must take place. It would seem that P should depend on the expected price of Company A stock one year from now. However, this is not the case. Consider the following strategy. The party that, one year from now, must sell the share of Company A can borrow $100 and buy one share now; this involves no capital since the share is purchased with borrowed money. A year from now that party sells the share for P dollars and pays back $106 (principal plus interest) to the lender, who could be a third party. The profit is P - 106. The fair profit is 0 since no capital was used and there is no risk. Therefore, P should be $106.
Consider what would happen if P were not $106. You should be able to see that any other value of P besides $106 would lead to unlimited risk-free profits. As investors rushed in to take advantage of this situation, the market would immediately correct the value of P to be $106.
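The argument can be written as a minimal Python sketch. It assumes, as the example implicitly does, that one can borrow and lend at the same 6% rate, sell short, and trade without costs. The function names are hypothetical.

```python
def forward_price(spot, r, years=1):
    """No-arbitrage delivery price with annual compounding: borrow `spot`,
    buy the share now, deliver it later, and repay spot * (1 + r)**years."""
    return spot * (1 + r) ** years

def arbitrage(spot, r, P, years=1):
    """Return (strategy, riskless profit per share at delivery) when the
    contract price is P instead of the no-arbitrage price."""
    fair = forward_price(spot, r, years)
    if P > fair:
        # Agree to sell at P, borrow to buy the share now, repay `fair`.
        return "sell forward, borrow and buy the share", P - fair
    if P < fair:
        # Short the share, lend the proceeds, agree to buy at P to close the short.
        return "buy forward, short the share and lend", fair - P
    return "no arbitrage", 0.0

print(forward_price(100, 0.06))
print(arbitrage(100, 0.06, 110))
print(arbitrage(100, 0.06, 100))
```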
¹ A trading strategy is "self-financing" if it requires no investment other than the initial investment. After the initial investment, any further purchases of assets are financed by the sale of other assets or by borrowing.
7.3.1    Arbitrage
Arbitrage is the making of a guaranteed risk-free profit by trading in the market with no invested capital². Speaking informally, arbitrage is a "free lunch." The arbitrage price of a security is the price that guarantees no arbitrage opportunities. The law of one price is equivalent to stating that the market is free of arbitrage opportunities, i.e., that there are no free lunches. Arbitrage pricing is the same as pricing by the law of one price. The price of $106 that we just derived in the example of the futures contract is, therefore, the arbitrage price.
7.4   Time value of money and present value
"Time is money" is an old adage that is still true. A dollar a year from now is worth less to us than a dollar now. In finance it is essential that we be able to convert future payments to their present values, or vice versa. For example, we saw in Section 7.3 that the arbitrage-enforced future price of a stock is simply the present price converted into a "future value" by multiplying by 1 + r.
Let r be the risk-free annual interest rate. Then the "present value" of $D one year from now is $D/(1 + r) without compounding or $D exp(-r) under continuous compounding. Another way of stating this is that $D now is worth $(1 + r)D a year from now without compounding, or $exp(r)D a year from now under continuous compounding. When $D is a future cash flow, its present value is also called a discounted value, and r is the discount rate.
The distinction between simple and compound interest is not essential, since an interest rate of r without compounding is equivalent to an interest rate of r' with continuous compounding, where
\[ 1 + r = \exp(r') \]
² Investing in risk-free T-bills guarantees a positive net return but is not arbitrage, since capital is invested.
so that
\[ r = \exp(r') - 1 \quad \text{or} \quad r' = \log(1 + r). \]
We will work with both simple and compound interest, whichever is more convenient.
Examples
If r = 5%, then r' = log(1.05) = .0488, or 4.88%. If r' = 4%, then r = exp(.04) - 1 = 1.0408 - 1, or 4.08%. In general, r > r' whenever r > 0.
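Converting between the simple rate r and the continuously compounded rate r' takes one line each in Python. `math.log1p` and `math.expm1` compute log(1 + x) and exp(x) - 1 accurately even for small rates. The function names are illustrative.

```python
import math

def simple_to_continuous(r):
    """r' such that exp(r') = 1 + r."""
    return math.log1p(r)

def continuous_to_simple(r_cont):
    """r such that 1 + r = exp(r')."""
    return math.expm1(r_cont)

print(round(simple_to_continuous(0.05), 4))   # 0.0488
print(round(continuous_to_simple(0.04), 4))   # 0.0408
```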
Occasionally, we will simplify life by making the unrealistic assumption that r = 0 so that present and future values are equal. This simplifying assumption allows us to focus on other concepts besides discounting.
7.5   A simple binomial example
We will start our study of options with a very simple example. Suppose that a stock is currently selling for $80. At the end of one time period it can either have increased to $100 or decreased to $60. What is the current value of a call option that allows one to purchase one share of the stock for $80, the exercise price, after one time period?
At the end of the time period, the call option will be worth $20 ($100 - $80) if the stock has gone up and worth $0 if the stock has gone down. See Figure 7.1. However, the question is "what is the option worth now?" This question is vital since the answer is, of course, the fair market price for the option at the current time.
One might think that the current value of the option depends on the probability that the stock will go up. However, this is not true. The current value of the option depends only on the rate of risk-free borrowing. For simplicity, we will assume that this rate is 0; later we will see how to value options when the rate of interest is positive. It turns out that the value of the option is $10. How did I get this value?
[Figure 7.1 annotations: Given: exercise price = $80. Hedge ratio = 1/2: buy 1/2 share. Borrow $30. Initial value of portfolio = (1/2)(80) - 30 = $10. This must be the value of the call option.]
Figure 7.1: Example of one-step binomial option pricing. For simplicity, it is assumed that the interest rate is 0. The portfolio of 1/2 share of stock and -$30 of risk-free assets replicates the call option.
Consider the following investment strategy. Borrow $30 and buy one-half of a share of stock. The cost up front is $40 - $30 = $10, so the value now of the portfolio is $10. If after one time period the stock goes up, then the portfolio is worth 100/2 - 30 = 20 dollars. If the stock goes down, then the portfolio is worth 60/2 - 30 = 0 dollars. Thus after one time period, the portfolio's value will be exactly the same as the value of the call option, no matter which way the stock moves. By the law of one price, the value of the call option now must be the same as the value now of the portfolio, which is $10.
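The replication argument can be checked one state at a time. This Python sketch hard-codes the example's numbers with r = 0. It confirms that the portfolio and the call pay the same amount whether the stock goes up or down.

```python
# One-step example from the text, with r = 0.
s0, s_up, s_down, strike = 80.0, 100.0, 60.0, 80.0
shares, borrowed = 0.5, 30.0          # the portfolio used in the text

cost_now = shares * s0 - borrowed     # 40 - 30
for s1 in (s_up, s_down):
    portfolio = shares * s1 - borrowed      # repay the $30 loan (no interest)
    option = max(s1 - strike, 0.0)          # call payoff in this state
    print(s1, portfolio, option)

print("price of the call:", cost_now)       # 10.0
```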
Let's summarize what we have done. We have found a portfolio of the stock and the risk-free asset that replicates the call option. The current value of the portfolio is easy to calculate. Since the portfolio replicates the option, the option must have the same value as the portfolio.
Suppose we have just sold a call option. By purchasing this portfolio we have hedged the option. By "hedging" we mean that we have eliminated all risk, because the net return of selling the option and purchasing the portfolio is exactly 0 no matter what happens to the stock price.
How did I know that the portfolio should be 1/2 share of stock and -$30 in cash? I didn't use trial-and-error; that would have been tedious. Rather, I used the following logic. First, the volatility of the stock is $100 - $60 = $40, while the volatility of the option is $20 - $0 = $20. (Here "volatility" means the spread between the up and down values.) The ratio of the volatility of the option to the volatility of the stock is 1/2; this is called the hedge ratio. If the portfolio is to exactly replicate the option, then the portfolio must have exactly the same volatility as the option; this means the portfolio must have one-half a share.
Key point: The number of shares in the portfolio must equal the hedge ratio, where
\[ \text{hedge ratio} = \frac{\text{volatility of option}}{\text{volatility of stock}}. \]
If the stock goes down, the portfolio is worth $30 minus the amount borrowed. But we want the portfolio's value to equal that of the option, which is $0. Thus, the amount borrowed is $30.
Key point: We can determine the amount borrowed by equating the value of the portfolio when the stock goes down to the value of the option when the stock goes down. (Alternatively, we could equate the value of the portfolio to the value of the option when the stock goes up. This would tell us that $50 minus the amount borrowed equals $20, or that $30 must be borrowed.)
Now suppose that the interest rate is 10%. Then we borrow $30/1.1 = $27.27, so that the amount owed after one year is $30. The cost of the portfolio is 40 - 30/1.1 = $12.7273. Thus, the value of the option is $12.7273 if the risk-free rate of interest is 10%. This value is higher than the value of the option when the risk-free rate is 0 because, at a higher interest rate, the same $30 repayment supports less borrowing today, so more of the portfolio's $40 cost must be paid up front.
Here's how to value one-step binomial options for other values of the parameters. Suppose the current price is s_1 and after one time period the stock either goes up to s_3 or down to s_2. The exercise price is E. The risk-free rate of interest is r. It is assumed that s_2 < E < s_3, so the option is exercised if and only if the stock goes up.³ Then the hedge ratio is
\[ \delta = \frac{s_3 - E}{s_3 - s_2}. \qquad (7.1) \]
³ If s_2 < s_3 ≤ E, then the option will not be exercised under any circumstances. We are certain the option will be worthless, so its price must be 0. If E ≤ s_2 < s_3, then the option will always be exercised; it really isn't an option, it is a futures contract. We have already seen how to value a futures contract; that was done in Section 7.3.
This is the number of shares of stock that are purchased; the cost is δ s_1. The amount borrowed is
\[ \frac{\delta s_2}{1 + r}, \qquad (7.2) \]
and the amount that will be paid back to the lender will be δ s_2. Therefore, the price of the option is
\[ \delta\left\{s_1 - \frac{s_2}{1 + r}\right\} = \frac{s_3 - E}{s_3 - s_2}\left\{s_1 - \frac{s_2}{1 + r}\right\}. \qquad (7.3) \]
If the stock goes up, then the option is worth s_3 - E and the portfolio is worth δ s_3 - δ s_2 = δ(s_3 - s_2) = s_3 - E. If the stock goes down, both the option and the portfolio (δ s_2 - δ s_2) are worth 0. Thus, the portfolio does replicate the option.
Example
In the example analyzed before, s_1 = 80, s_3 = 100, s_2 = 60, and E = 80. Therefore,
\[ \delta = \frac{100 - 80}{100 - 60} = \frac{1}{2}. \]
The price of the option is
\[ \frac{1}{2}\left\{80 - \frac{60}{1 + r}\right\}, \]
which is $10 if r = 0 and $12.7273 if r = 0.1. The amount borrowed is
\[ \frac{\delta s_2}{1 + r} = \frac{(1/2)(60)}{1 + r} = \frac{30}{1 + r}, \]
which is $30 if r = 0 and $27.27 if r = .1.
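Equations (7.1) through (7.3) translate directly into a small Python function. The sketch assumes s_2 < E < s_3 and a one-period simple rate r, as in the text. The function name is hypothetical.

```python
def one_step_call(s1, s2, s3, E, r):
    """Replication price of a one-step binomial call.

    s1 is today's price, s2 < s3 the down and up prices, E the exercise
    price and r the one-period risk-free rate. Returns (delta, amount
    borrowed, price) from equations (7.1), (7.2) and (7.3).
    """
    if not (s2 < E < s3):
        raise ValueError("this sketch assumes s2 < E < s3")
    delta = (s3 - E) / (s3 - s2)        # hedge ratio (7.1)
    borrowed = delta * s2 / (1 + r)     # amount borrowed (7.2)
    price = delta * s1 - borrowed       # option price (7.3)
    return delta, borrowed, price

for r in (0.0, 0.1):
    print(r, one_step_call(80, 60, 100, 80, r))   # price 10.0, then about 12.7273
```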
7.6   Two-step binomial option pricing
A one-step binomial model for a stock price may be realistic for very short maturities. For longer maturities, multiple-step binomial models are needed. A multiple-step model can be analyzed by analyzing the individual steps, going backwards in time.
[Figure 7.2: panels "Stock" and "Option".]
Figure 7.2: Two-step binomial model for option pricing.
To illustrate multi-step binomial pricing, consider the two-step model of a European call option in Figure 7.2. The option matures after the second step. The stock price can either go up $10 or down $10 on each step. Assume that r = 0.
Using the pricing principles just developed and working backwards, we can fill in the question marks in Figure 7.2. See Figure 7.3. For example, at node B, the hedge ratio is δ = 1, so we need to own one share, which at this node is worth $90. Also, we need to have borrowed δ s_2/(1 + r) = (1)(80)/(1 + 0) = $80 so that our portfolio has the same value at nodes E and F as the option; that is, the portfolio should be worth $0 at node E and $20 at node F. Since at node B we have stock worth $90 and risk-free assets worth -$80, the net value of our portfolio is $10. By the same reasoning, at node C the hedge ratio is 0 and we should have no stock and no borrowing, so our portfolio is worth $0.
[Figure 7.3: panel "Option". Given: exercise price is $80.]
Figure 7.3: Pricing the option by backwards induction.
We can see in Figure 7.3 that at the end of the first step the option is worth $10 if the stock is up (node B) and $0 if it is down (node C). Applying one-step pricing at node A, at the beginning of trading the hedge ratio is 1/2, so we should own 1/2 share of stock (worth $40) and we should have borrowed δ s_2/(1 + r) = (1/2)(70) = $35. Therefore, the portfolio is worth $5 at node A, which proves that $5 is the correct price of the option.
Note the need to work backwards. We could not apply one-step pricing at node A until we had already found the value of the portfolio (and of the option) at nodes B and C.
Let's show that our trading strategy is self-financing. To do this we need to show that we invest no money other than the initial $5. Suppose that the stock is up on the first step, so we are at node B. Then our portfolio is worth $90/2 - $35 = $10. At this point we borrow another $45 and buy another half-share for $45; this is self-financing. If the stock is down on the first step, we sell the half share of stock for $35 and pay off our debt; again the step is self-financing.
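Here is a Python sketch of the backward induction in this section, together with the self-financing check, for a recombining tree. Some conventions are assumptions of the sketch and not the notes' node lettering: stock[t][k] is the price at time t after k up-moves, and r is a simple per-step rate. At each node the amount borrowed comes from matching the portfolio to the option in the down state. When the option is worth 0 there, this reduces to δ s_2/(1 + r).

```python
def backward_induction(stock, payoff, r=0.0):
    """Price a European option on a recombining binomial tree by replication.

    stock[t][k] is the stock price at time t after k up-moves and payoff(s)
    is the option's value at the final time. Returns the option values and
    the (shares, amount borrowed) held at every earlier node.
    """
    T = len(stock) - 1
    value = [None] * (T + 1)
    holdings = [None] * T
    value[T] = [payoff(s) for s in stock[T]]
    for t in range(T - 1, -1, -1):                  # work backwards in time
        value[t], holdings[t] = [], []
        for k, s in enumerate(stock[t]):
            s_dn, s_up = stock[t + 1][k], stock[t + 1][k + 1]
            f_dn, f_up = value[t + 1][k], value[t + 1][k + 1]
            delta = (f_up - f_dn) / (s_up - s_dn)        # hedge ratio
            borrowed = (delta * s_dn - f_dn) / (1 + r)   # match the down state
            value[t].append(delta * s - borrowed)
            holdings[t].append((delta, borrowed))
    return value, holdings

r = 0.0
stock = [[80.0], [70.0, 90.0], [60.0, 80.0, 100.0]]
value, holdings = backward_induction(stock, lambda s: max(s - 80.0, 0.0), r)
print(value[0][0])   # 5.0, the price at node A

# Self-financing: the time-0 portfolio, revalued at each time-1 node (stock at
# the new price, debt grown by 1 + r), pays exactly for the portfolio needed there.
d0, b0 = holdings[0][0]
for k, s in enumerate(stock[1]):
    d1, b1 = holdings[1][k]
    assert abs((d0 * s - b0 * (1 + r)) - (d1 * s - b1)) < 1e-12
```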
7.7   Arbitrage pricing by expectation
It was stated earlier that one prices an option by arbitrage, that is, the price is determined by the requirement that the market be arbitrage-free. The expected value of the option is not used to price the option. In fact, we do not even consider the probabilities that the stock moves up or down.
However, there is a remarkable result showing that arbitrage pricing can be done using expectations. More specifically, there exist probabilities of the stock moving up and down such that the arbitrage price of the option is equal to the expected value of the option according to these probabilities. Whether these are the "true" probabilities of the stock moving up or down is irrelevant. The fact is that these probabilities give the correct arbitrage price when they are used to calculate expectations.
Let "now" be time 0 and let "one step ahead" be time 1. Because of the time value of money, the present value of $D at time 1 is $D/(1 + r), where r is the interest rate. Let f(2) = 0 and f(3) = s_3 - E be the values of the option if the stock moves down or up, respectively. We will now show that there is a value of q between 0 and 1 such that the present value of the option is
\[ \frac{1}{1 + r}\{q f(3) + (1 - q) f(2)\}. \qquad (7.4) \]
The quantity in (7.4) is the present value of the expectation of the option at time 1. To appreciate this, notice that the quantity in curly brackets is the value of the option if the stock goes up times q, which is the arbitrage-determined "probability" that the stock goes up, plus the option's value if the stock goes down times (1 - q). Thus the quantity in curly brackets is the expectation of the option's value at the end of the holding period. Dividing by 1 + r converts this to a "present value."
Okay, how do we find this magical value of q? That's easy. We know that q must satisfy
\[ \frac{1}{1 + r}\{q f(3) + (1 - q) f(2)\} = \frac{s_3 - E}{s_3 - s_2}\left\{s_1 - \frac{s_2}{1 + r}\right\}, \qquad (7.5) \]
since the left-hand side of this equation is (7.4) and the right-hand side is the value of the option according to (7.3). Substituting f(2) = 0 and f(3) = s_3 - E into (7.5), we get an equation that can be solved for q to find that
\[ q = \frac{(1 + r)s_1 - s_2}{s_3 - s_2}. \qquad (7.6) \]
We want q to be between 0 and 1 so that it can be interpreted as a probability. From (7.6) one can see that 0 < q < 1 if s_2 < (1 + r)s_1 < s_3. Why should the latter hold? We will show that s_2 < (1 + r)s_1 < s_3 is required in order for the market to be arbitrage-free. If we invest s_1 in a risk-free asset at time 0, then the value of our holdings at time 1 will be (1 + r)s_1. If we invest s_1 in the stock, then the value of our holdings at time 1 will be either s_2 or s_3. If s_2 < (1 + r)s_1 < s_3 were not true, then there would be an arbitrage opportunity. For example, if (1 + r)s_1 < s_2 < s_3, then we could borrow at the risk-free rate and invest the borrowed money in the stock with a guaranteed profit; at time 1 we would pay back (1 + r)s_1 and receive at least s_2, which is greater than (1 + r)s_1.
Exercise: How would we make a guaranteed profit if s_2 < s_3 < (1 + r)s_1?
Answer: Sell the stock short and invest the s_1 dollars in the risk-free asset. At the end of the holding period (maturity), receive (1 + r)s_1 from the risk-free investment and buy the stock for at most s_3 < (1 + r)s_1.
Thus, the requirement that the market be arbitrage-free ensures that 0 < q < 1.
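The risk-neutral probability (7.6) and the pricing formula (7.4) look like this in Python. The sketch refuses to return q when s_2 < (1 + r)s_1 < s_3 fails, because the market would then admit an arbitrage. For the example of Section 7.5 with r = 0.1, it reproduces the replication price of $12.7273. The function names are hypothetical.

```python
def risk_neutral_q(s1, s2, s3, r):
    """Risk-neutral probability of an up move, equation (7.6).

    Raises ValueError unless s2 < (1 + r) s1 < s3, the no-arbitrage condition.
    """
    if not (s2 < (1 + r) * s1 < s3):
        raise ValueError("arbitrage opportunity: need s2 < (1 + r) s1 < s3")
    return ((1 + r) * s1 - s2) / (s3 - s2)

def call_by_expectation(s1, s2, s3, E, r):
    """Discounted risk-neutral expectation of the call payoff, (7.4)."""
    q = risk_neutral_q(s1, s2, s3, r)
    f2, f3 = max(s2 - E, 0.0), max(s3 - E, 0.0)   # down and up values
    return (q * f3 + (1 - q) * f2) / (1 + r)

print(risk_neutral_q(80, 60, 100, 0.1))           # about 0.7
print(call_by_expectation(80, 60, 100, 80, 0.1))  # about 12.7273
```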
Figure 7.4: Two-step non-recombinant tree. The q(j) is the risk-neutral probability at node j of the stock moving upward.
7.8   A general binomial tree model
The material in this section follows Chapter 2 of Financial Calculus by Baxter and Rennie. Consider a possibly non-recombinant⁴ tree, as seen in Figure 7.4.
Assume that:
• At the jth node the stock is worth s_j and the option is worth f(j).
⁴ The tree would be recombinant if the stock prices at nodes 5 and 6 were equal, so that these two nodes could be combined.
•  The jth node leads to either the (2j + 1)th node or the (2j)th node after one time "tick."
•  The actual time between ticks is δt.
•  Interest is compounded continuously at a fixed rate r, so that B_0 dollars now is worth exp(r n δt)B_0 dollars after n time ticks. (Or, B_0 dollars after n ticks is worth exp(-r n δt)B_0 dollars now.)
Then at node j:
•  The value of the option is
\[ f(j) = \exp(-r\,\delta t)\{q_j f(2j + 1) + (1 - q_j) f(2j)\}. \]
•  The arbitrage-determined q_j is
\[ q_j = \frac{\exp(r\,\delta t)\,s_j - s_{2j}}{s_{2j+1} - s_{2j}}. \qquad (7.7) \]
•  The number of shares of stock to hold is
\[ \phi_j = \frac{f(2j + 1) - f(2j)}{s_{2j+1} - s_{2j}} = \text{hedge ratio}. \]
•  Denote the amount of capital to hold in the risk-free asset by ψ_j; typically ψ_j is negative because money has been borrowed. Since the portfolio replicates the option, at node j the option's value, which is f(j), must equal the portfolio's value, which is φ_j s_j + ψ_j. Therefore,
\[ \psi_j = f(j) - \phi_j s_j \qquad (7.8) \]
(ψ_j increases in value to exp(r δt){f(j) - φ_j s_j} after one more time tick).
Expectations for paths along the tree are computed using the q_j's. The probability of any path is just the product of all the probabilities along the path.
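The node-by-node recipe above can be sketched in Python with the node numbering of Figure 7.4, where node j leads to nodes 2j and 2j + 1. The list-based tree representation is an assumption of this sketch. The usage example prices the $80 call on the tree of Section 7.6. The test then takes a hypothetical non-recombinant tree with r > 0 and checks that (φ_j, ψ_j) replicates the option at both successor nodes.

```python
import math

def price_tree(s, payoff, r, dt):
    """Backward induction on a binary tree numbered as in Figure 7.4.

    s[j] is the stock price at node j (s[0] is unused); node j leads to node
    2j (down) or node 2j + 1 (up). Nodes without children in the list are
    terminal, where the option is worth payoff(s[j]). Returns dicts f (option
    value), q (risk-neutral up-probability), phi (shares), psi (risk-free).
    """
    n = len(s) - 1
    growth = math.exp(r * dt)                  # one tick of continuous compounding
    f, q, phi, psi = {}, {}, {}, {}
    for j in range(n, 0, -1):                  # children are handled before parents
        if 2 * j + 1 > n:
            f[j] = payoff(s[j])                # terminal node
            continue
        dn, up = 2 * j, 2 * j + 1
        q[j] = (growth * s[j] - s[dn]) / (s[up] - s[dn])          # (7.7)
        f[j] = (q[j] * f[up] + (1 - q[j]) * f[dn]) / growth
        phi[j] = (f[up] - f[dn]) / (s[up] - s[dn])                # hedge ratio
        psi[j] = f[j] - phi[j] * s[j]                             # (7.8)
    return f, q, phi, psi

# Tree of Section 7.6 (r = 0) and the call with exercise price $80.
f, q, phi, psi = price_tree([None, 80, 70, 90, 60, 80, 80, 100],
                            lambda x: max(x - 80, 0), r=0.0, dt=1.0)
print(f[1], q)   # 5.0, with every q_j equal to 0.5
```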
An example
The tree for the example of Section 7.6 is shown in Figure 7.5. Because r = 0 is assumed and because the stock moves either up or down the same amount ($10), the q_j are all equal to 1/2.⁵
The probability of each full path from node 1 to one of nodes 4, 5, 6, or 7 is 1/4.
Given the values of the option at nodes 4, 5, 6, and 7, it is easy to compute the expectations of the option's value at other nodes. These expectations are shown in magenta in Figure 7.5.
The path probabilities are independent of the exercise price, since they depend only on the prices of stock at the nodes and on r. Therefore, it is easy to price options with other exercise prices.
Exercise
Assuming the same stock price process as in Figure 7.5, price the call option with an exercise price of $70.
Answer: Given this exercise price, it is clear that the option is worth $0, $10, $10, and $30 at nodes 4, 5, 6, and 7, respectively. Then we can use expectation to find that the option is worth $5 and $20 at nodes 2 and 3, respectively. Therefore, the option's value at node 1 is $12.50; this is the price of the option.
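A path's probability is the product of the q_j's along it, so the exercise can also be done by enumerating the four full paths. This Python sketch relies on every q_j being 1/2 in this example. In a general tree, each step would use its own node's q_j.

```python
from itertools import product

s0, step, q = 80.0, 10.0, 0.5   # Section 7.6 tree: r = 0, moves of +/- $10

def price_by_paths(strike, n_steps=2):
    """Sum over full paths of (path probability) x (call payoff); r = 0."""
    price = 0.0
    for moves in product((+1, -1), repeat=n_steps):
        prob = 1.0
        for m in moves:                       # multiply the q's along the path
            prob *= q if m > 0 else 1 - q
        s_T = s0 + step * sum(moves)
        price += prob * max(s_T - strike, 0.0)
    return price                              # no discounting because r = 0

print(price_by_paths(80.0), price_by_paths(70.0))   # 5.0 12.5
```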
7.9   Martingales
A martingale is a probability model for a fair game, that is, a game where the expected changes in one's fortune are always zero. More formally, a stochastic process Y_0, Y_1, Y_2, ... is a martingale if
\[ E(Y_{t+1} \mid Y_t) = Y_t \quad \text{for all } t. \]
⁵ It follows from (7.7) that whenever r = 0 and the up moves and down moves are of equal length, q_j = 1/2 for all j.
Figure 7.5: Two-step example with pricing by probabilities. Red is node number. Blue is value of the stock. Magenta is value of the option. Path probabilities are in dark green. The exercise price is $80.
Let P_t, t = 0, 1, ..., be the price of the stock at the end of the tth step in a binomial model. Then P_t^* := exp(-r t δt)P_t is the discounted price process.
Key fact: Under the {q_j} probabilities, the discounted price process P_t^* is a martingale.
To see that P_t^* is a martingale, we calculate:
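\[
\begin{aligned}
E(P_{t+1} \mid P_t = s_j) &= q_j s_{2j+1} + (1 - q_j) s_{2j} \\
&= s_{2j} + q_j (s_{2j+1} - s_{2j}) \\
&= s_{2j} + \{\exp(r\,\delta t) s_j - s_{2j}\} = \exp(r\,\delta t) s_j.
\end{aligned}
\]
This holds for all values of s_j. Therefore,
\[ E(P_{t+1} \mid P_t) = \exp(r\,\delta t) P_t, \]
so that
\[ E\{\exp(-r(t + 1)\delta t) P_{t+1} \mid P_t\} = \exp(-r t\,\delta t) P_t, \]
or
\[ E(P_{t+1}^* \mid P_t^*) = P_t^*. \]
This shows that P_t^* is a martingale.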
Any set of path probabilities, {pj}, is called a measure of the process. The measure {qj} is called the martingale measure or the risk-neutral measure. We will also call {qj} the risk-neutral path probabilities.
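The martingale property can be checked numerically. The sketch below uses a hypothetical multiplicative tree: up factor u = 1.10, down factor d = 0.92, r = 0.05, δt = 0.25. None of these values come from the notes. For such a tree, (7.7) gives the same q_j at every node.

```python
import math

# Hypothetical one-tick parameters.
r, dt = 0.05, 0.25
u, d = 1.10, 0.92
growth = math.exp(r * dt)
q = (growth - d) / (u - d)     # (7.7) when s_{2j+1} = u s_j and s_{2j} = d s_j

def discounted(price, t):
    """P*_t = exp(-r t dt) P_t."""
    return math.exp(-r * t * dt) * price

def expected_discounted_next(price, t):
    """E(P*_{t+1} | P_t = price) under the q-probabilities."""
    return q * discounted(u * price, t + 1) + (1 - q) * discounted(d * price, t + 1)

for t, price in [(0, 100.0), (1, 110.0), (2, 101.2)]:
    print(t, discounted(price, t), expected_discounted_next(price, t))
```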
7.9.1    The risk-neutral world
If all investors were risk-neutral, that is, indifferent to risk, then there would be no risk premiums and all expected asset prices would rise at the risk-free rate. Therefore, all discounted asset prices, with discounting at the risk-free rate, would be martingales.
We know that we do not live in such a risk-neutral world, but there is a general principle that expectations taken with respect to a risk-neutral model give correct, i.e., arbitrage-free, prices of options and other financial instruments.
Example
In Section 7.3 it was argued that if a stock is selling at $100/share and the risk-free interest rate is 6%, then the correct future delivery price of a share one year from now is $106. We can now calculate this value using the risk-neutral measure: in the risk-neutral world, the expected stock price will increase to exactly $106 one year from now.
7.10   Trees to random walks to Brownian motion
7.10.1    Getting more realistic
Binomial trees are useful because they illustrate several important concepts, in particular:
•  arbitrage pricing
•  self-financing trading strategies
•  hedging
•  computation of arbitrage prices by expectations with respect to an appropriate set of probabilities called the risk-neutral measure
However, binomial trees are not realistic, because stock prices are continuous, or at least approximately continuous. This lack of realism can be alleviated by increasing the number of steps. In fact, one can increase the number of steps without limit to derive the Black-Scholes model and formula. That is the goal of Section 7.11. The present section will get us closer to that goal.
7.10.2   A three-step binomial tree
Figure 7.6 is a three-step tree where at each step the stock price either goes up $10 or down $10. Assume that the risk-free rate is r = 0.
Now consider the price of the stock, call it P_t, at time t, where t = 0, 1, 2, 3. Using the risk-neutral path probabilities, which are each 1/2 in this example, P_t is a stochastic process, that is, a process that evolves randomly in time. In fact, since P_{t+1} equals P_t ± $10, this process is a random walk. We have
\[ P_t = P_0 + (\$10)\{2(W_1 + \cdots + W_t) - t\}, \qquad (7.9) \]
where W_1, W_2, W_3 are independent and each W_t equals 0 or 1, each with probability 1/2. If W_t is 1, then 2W_t - 1 = 1 and the price jumps up $10 on the tth step. If W_t is 0, then 2W_t - 1 = -1 and the price jumps down $10.
The random sum W_1 + ... + W_t is Binomial(t, 1/2) distributed and so has a mean of t/2 and variance equal to t/4.
The value of the call option is
\[ E\{(P_3 - E)_+\}, \qquad (7.10) \]
where x_+ equals x if x > 0 and equals 0 otherwise. The expectation in (7.10) is with respect to the risk-neutral probabilities. Since W_1 + W_2 + W_3 is Binomial(3, 1/2), it equals 0, 1, 2, or 3 with probabilities 1/8, 3/8, 3/8, and 1/8, respectively. Therefore,
\[ E\{(P_3 - E)_+\} = \frac{1}{8}\Big[\{P_0 - 30 - E + (20)(0)\}_+ + 3\{P_0 - 30 - E + (20)(1)\}_+ + 3\{P_0 - 30 - E + (20)(2)\}_+ + \{P_0 - 30 - E + (20)(3)\}_+\Big]. \]
[Figure 7.6: three-step tree plotted against time; all q's are 1/2; exercise price = $80; option value at time 0 = 21.25.]
Figure 7.6: Three-step example of pricing a European call option by probabilities. Red is node number. Blue is value of the stock. Magenta is value of the option. The risk-neutral probabilities of an up move are not shown, but they are all equal to 1/2. The exercise price is $80. The risk-free rate is r = 0.
Examples
If P_0 = 100 and E = 80, then P_0 - 30 - E = -10 and
\[ E\{(P_3 - E)_+\} = \frac{1}{8}\{(-10 + 0)_+ + 3(-10 + 20)_+ + 3(-10 + 40)_+ + (-10 + 60)_+\} = \frac{1}{8}(0 + 30 + 90 + 50) = \frac{170}{8} = 21.25, \]
as seen in Figure 7.6.
Similarly, if P_0 = 100 and E = 100, then P_0 - 30 - E = -30 and
\[ E\{(P_3 - E)_+\} = \frac{1}{8}\{(-30 + 0)_+ + 3(-30 + 20)_+ + 3(-30 + 40)_+ + (-30 + 60)_+\} = \frac{1}{8}(0 + 0 + 30 + 30) = \frac{60}{8} = 7.5. \]
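The binomial counting in (7.10) is easy to automate. This Python sketch reproduces the two examples. The step size and number of steps are parameters, so other cases can be tried. `math.comb` requires Python 3.8 or later.

```python
from math import comb

def walk_call(p0, strike, step=10.0, n=3):
    """E{(P_n - E)_+} under the risk-neutral measure for the r = 0 walk
    P_n = P_0 + step * {2(W_1 + ... + W_n) - n}, where W_1 + ... + W_n
    is Binomial(n, 1/2)."""
    total = 0.0
    for k in range(n + 1):
        prob = comb(n, k) / 2 ** n           # P(W_1 + ... + W_n = k)
        p_n = p0 + step * (2 * k - n)
        total += prob * max(p_n - strike, 0.0)
    return total

print(walk_call(100, 80), walk_call(100, 100))   # 21.25 7.5
```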
7.10.3    More time steps
Let's consider a call option with maturity date equal to 1. Take the time interval [0, 1] and divide it into n steps, each of length 1/n. Suppose that the stock price goes up or down σ/√n at each step. Then the price after m steps (0 ≤ m ≤ n), when t = m/n, is
\[ P_{m/n} = P_0 + \frac{\sigma}{\sqrt{n}}\{2(W_1 + \cdots + W_m) - m\}. \qquad (7.11) \]
Since W_1 + ... + W_m is Binomial(m, 1/2) distributed, it follows that
\[ E(P_t \mid P_0) = P_0 \qquad (7.12) \]
and
\[ \operatorname{Var}(P_t \mid P_0) = \frac{4\sigma^2}{n}\cdot\frac{m}{4} = \frac{m}{n}\sigma^2 = t\sigma^2, \qquad (7.13) \]
and, in particular,
\[ \operatorname{Var}(P_1 \mid P_0) = \sigma^2. \qquad (7.14) \]
Moreover, by the central limit theorem, as n → ∞, P_1 converges in distribution to a N(P_0, σ²) random variable.
Let E be the exercise price. Remember that the value of an option is the expectation, with respect to the risk-neutral measure, of the present value of the option at expiration. Therefore, in the limit as the number of steps goes to ∞, the price of the option converges to
\[ E\{(P_0 + \sigma Z - E)_+\}, \qquad (7.15) \]
where Z is N(0, 1), so that P_1 = P_0 + σZ is N(P_0, σ²).
For a fixed value of n, P_t is a discrete-time stochastic process, since t = 0, 1/n, 2/n, ..., (n - 1)/n, 1. In fact, as we saw before, for any finite value of n, P_t is a random walk. However, in the limit as n → ∞, P_t becomes a continuous-time stochastic process. This limit process is called Brownian motion. In other words, the continuous-time limit of random walks is Brownian motion.
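The convergence to (7.15) can be seen numerically. The limit is computed from a closed form, E{(P_0 + σZ - E)_+} = (P_0 - E)Φ(d) + σφ(d) with d = (P_0 - E)/σ. This is a standard normal-integral identity, not something derived in these notes. The binomial probabilities go through `math.lgamma` to avoid huge integers. The inputs P_0 = 100, E = 100, σ = 10 are arbitrary.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def random_walk_call(p0, strike, sigma, n):
    """Risk-neutral (r = 0) call price when P_1 follows (7.11) with m = n,
    i.e. P_1 = P_0 + (sigma / sqrt(n)) (2S - n) with S ~ Binomial(n, 1/2)."""
    total = 0.0
    for k in range(n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1)
                   - math.lgamma(n - k + 1) - n * math.log(2.0))
        p1 = p0 + sigma / math.sqrt(n) * (2 * k - n)
        total += math.exp(log_pmf) * max(p1 - strike, 0.0)
    return total

def limit_price(p0, strike, sigma):
    """Closed form of (7.15): E{(P_0 + sigma Z - E)_+}, Z ~ N(0, 1)."""
    d = (p0 - strike) / sigma
    return (p0 - strike) * norm_cdf(d) + sigma * norm_pdf(d)

for n in (4, 40, 400, 3000):
    print(n, random_walk_call(100.0, 100.0, 10.0, n))
print("limit", limit_price(100.0, 100.0, 10.0))
```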
7.10.4    Properties of Brownian motion
We have seen that Brownian motion is a continuous-time stochastic process that is the limit of discrete-time random walk processes. A Brownian motion process, Bt, starting at 0, i.e., with B0 = 0, has the following mathematical properties:
1.  E(B_t) = 0 for all t.
2.  Var(B_t) = tσ² for all t. Here σ is the volatility of B_t.
3.  Changes over non-overlapping increments are independent. More precisely, if t_1 < t_2 ≤ t_3 < t_4, then B_{t_2} - B_{t_1} and B_{t_4} - B_{t_3} are independent.
4.  B_t is normally distributed for any t.
If B_0 is not zero, then each of these properties holds for the process B_t - B_0, which is the change in B_t from time 0 to t. All of these properties but the last are shared by random walks with mean-zero steps.
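A quick simulation of the random walk (7.11) with σ = 1 illustrates properties 1 through 3. This is a Monte Carlo sketch with an arbitrary seed, step count, and number of paths. The printed sample moments are therefore only approximately 0, 1, and 0.

```python
import random

def simulate_walk(n_steps, n_paths, sigma=1.0, seed=12345):
    """Simulate P_t - P_0 for the random walk (7.11) on [0, 1].

    Each step is +sigma/sqrt(n_steps) or -sigma/sqrt(n_steps) with
    probability 1/2. Returns the values at t = 1/2 and at t = 1.
    """
    rng = random.Random(seed)
    step = sigma / n_steps ** 0.5
    at_half, at_one = [], []
    for _ in range(n_paths):
        x = 0.0
        for i in range(1, n_steps + 1):
            x += step if rng.random() < 0.5 else -step
            if i == n_steps // 2:
                at_half.append(x)
        at_one.append(x)
    return at_half, at_one

at_half, at_one = simulate_walk(n_steps=64, n_paths=10_000)
m = len(at_one)
mean_one = sum(at_one) / m                                 # property 1: near 0
var_one = sum((x - mean_one) ** 2 for x in at_one) / m     # property 2: near t * sigma^2 = 1
second_inc = [b - a for a, b in zip(at_half, at_one)]      # change over [1/2, 1]
cov_inc = sum(a * b for a, b in zip(at_half, second_inc)) / m   # property 3: near 0
print(mean_one, var_one, cov_inc)
```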
7.11    Geometric Brownian motion
Random walks are not realistic models for stock prices, since a random walk can go negative. Therefore, (7.15) is close to but not quite the correct price of the option. To get the correct price we need to make our model more realistic. We saw in Chapter 3 that geometric random walks are much better than random walks as models for stock prices since geometric random walks are always non-negative.
We will now introduce a binomial tree model that is a geometric random walk. We do this by making the steps proportional to the current stock price. Thus, if s is the stock price at the current node, then the price at the next node is
\[ s\exp(\mu/n \pm \sigma/\sqrt{n}) = (s_{\text{up}}, s_{\text{down}}). \]
Notice that the log of the stock price is a random walk since
\[ (\log(s_{\text{up}}), \log(s_{\text{down}})) = \log(s) + \frac{\mu}{n} \pm \frac{\sigma}{\sqrt{n}}. \]
Therefore, the stock price process is a geometric random walk. There is a drift if μ ≠ 0, but we will see that the amount of drift is irrelevant. We could have set the drift equal to 0, but we didn't, in order to show later that the drift does NOT affect the option's price.
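The risk-neutral probability of an up jump is
\[ q = \frac{s\exp(r/n) - s_{\text{down}}}{s_{\text{up}} - s_{\text{down}}} = \frac{\exp(r/n) - \exp(\mu/n - \sigma/\sqrt{n})}{\exp(\mu/n + \sigma/\sqrt{n}) - \exp(\mu/n - \sigma/\sqrt{n})} \approx \frac{1}{2}\left(1 - \frac{\mu - r + \sigma^2/2}{\sigma\sqrt{n}}\right). \]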
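To see how quickly q approaches 1/2, one can compare the exact ratio with its large-n approximation. The parameter values μ = 0.1, r = 0.05, σ = 0.3 in this Python sketch are hypothetical.

```python
import math

def q_exact(mu, r, sigma, n):
    """Risk-neutral up-probability for the geometric tree with n steps per unit time."""
    up = math.exp(mu / n + sigma / math.sqrt(n))
    down = math.exp(mu / n - sigma / math.sqrt(n))
    return (math.exp(r / n) - down) / (up - down)

def q_approx(mu, r, sigma, n):
    """Large-n approximation (1/2){1 - (mu - r + sigma^2/2) / (sigma sqrt(n))}."""
    return 0.5 * (1 - (mu - r + sigma ** 2 / 2) / (sigma * math.sqrt(n)))

for n in (10, 100, 1000, 10_000):
    print(n, q_exact(0.1, 0.05, 0.3, n), q_approx(0.1, 0.05, 0.3, n))
```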
Then
\[ P_t = P_{m/n} = P_0\exp\left(\mu t + \frac{\sigma}{\sqrt{n}}\sum_{i=1}^{m}(2W_i - 1)\right), \]
where, as before, W_i is either 0 or 1 (so 2W_i - 1 = ±1).
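Using risk-neutral probabilities, we have
\[ E\left\{\frac{\sigma}{\sqrt{n}}\sum_{i=1}^{m}(2W_i - 1)\right\} = \frac{\sigma m}{\sqrt{n}}(2q - 1) \to \left(r - \mu - \frac{\sigma^2}{2}\right)t \]
and
\[ \operatorname{Var}\left\{\frac{\sigma}{\sqrt{n}}\sum_{i=1}^{m}(2W_i - 1)\right\} = \frac{4\sigma^2 m}{n}q(1 - q) \to \sigma^2 t, \]
since q → 1/2 as n → ∞.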
Therefore, in the risk-neutral world
\[ P_t \approx P_0\exp\{(r - \sigma^2/2)t + \sigma B_t\}, \qquad (7.16) \]
where B_t is standard Brownian motion (that is, with σ = 1 in the properties of Section 7.10.4) and 0 ≤ t ≤ 1. Time could easily be extended beyond 1 by adding more steps. We will assume that this has been done.
Notice that (7.16) does NOT depend on μ, only on r and σ. The reason is that in the risk-neutral world, the expected values of all assets increase at rate r. The rate of increase in the real world is μ, but this is irrelevant for risk-neutral calculations. Remember that risk-neutral expectations DO give the correct option price in the real world even if they do not correctly describe real-world probability distributions.
If E is the exercise or strike price and T is the expiration date of a European call option, then the value of the option at maturity is
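\[ \left[P_0\exp\{(r - \sigma^2/2)T + \sigma B_T\} - E\right]_+. \qquad (7.17) \]
Since B_T ~ N(0, T), we can write B_T = √T Z, where Z ~ N(0, 1). The discounted value of (7.17) is
\[ \left[P_0\exp\left(-\frac{\sigma^2 T}{2} + \sigma\sqrt{T}Z\right) - \exp(-rT)E\right]_+. \qquad (7.18) \]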
We will again use the principle that the price of an option is the risk-neutral expectation of the option's discounted value at expiration. By this principle, the call's price at time t = 0 is the expectation of (7.18). Therefore,
\[ C = \int_{-\infty}^{\infty}\left[P_0\exp\left(-\frac{\sigma^2 T}{2} + \sigma\sqrt{T}z\right) - \exp(-rT)E\right]_+ \phi(z)\,dz, \qquad (7.19) \]
where φ is the N(0, 1) pdf (probability density function).
Computing this integral is not easy, but it can be done. The result is the famous Black-Scholes formula. Let S_0 be the current stock price (we have switched notation from P_0), let E be the exercise price, let r be the continuously compounded risk-free interest rate, let σ be the volatility, and let T be the expiration date of a call option. Then, by evaluating the integral in (7.19), it can be shown that

$$C = \Phi(d_1) S_0 - \Phi(d_2) E \exp(-rT),$$

where Φ is the standard normal CDF,

$$d_1 = \frac{\log(S_0/E) + (r + \sigma^2/2) T}{\sigma\sqrt{T}} \quad \text{and} \quad d_2 = d_1 - \sigma\sqrt{T}.$$
Example
Here's a numerical example. Suppose that S_0 = 100, E = 90, σ = .4, r = .1, and T = .25. Then

$$d_1 = \frac{\log(100/90) + \{.1 + (.4)^2/2\}(.25)}{.4\sqrt{.25}} = .7518$$

and

$$d_2 = d_1 - .4\sqrt{.25} = .5518.$$

Then Φ(d_1) = .7739 and Φ(d_2) = .7095. Also, exp(−rT) = exp{−(.1)(.25)} = .9753. Therefore,

$$C = (100)(.7739) - (90)(.9753)(.7095) = 15.1.$$
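The formula is straightforward to evaluate in code. Below is a minimal Python sketch, using the standard library's `statistics.NormalDist` for Φ, that reproduces the numerical example above. The function name `bs_call` is illustrative; r, σ, and T must be expressed in consistent time units.

```python
import math
from statistics import NormalDist

def bs_call(s0, strike, r, sigma, T):
    """Black-Scholes price of a European call on a non-dividend-paying stock.

    r and sigma must be in the same time unit as T
    (e.g. per year with T in years, or per trading day with T in days).
    """
    N = NormalDist().cdf  # standard normal CDF, Phi
    d1 = (math.log(s0 / strike) + (r + sigma ** 2 / 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return N(d1) * s0 - N(d2) * strike * math.exp(-r * T)

# The numerical example from the text: S0 = 100, E = 90, sigma = .4, r = .1, T = .25
price = bs_call(100, 90, 0.1, 0.4, 0.25)
print(round(price, 1))  # prints 15.1
```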
7.12   Using the Black-Scholes formula
7.12.1    How does the option price depend on the inputs?
Figure 7.7 shows the variation in the price of a call option as the parameters change. The baseline values of the parameters are S_0 = 100, E = 100 exp(rT), T = .25, r = .06, and σ = .1. The exercise price E and initial price S_0 have been chosen so that if S_0 were invested at the risk-free rate, it would grow to E by the expiration time. In each of the subplots in Figure 7.7, one of the parameters is varied while the others are held at baseline.
One sees that the price of the call increases with σ. This makes sense since E = S_0 exp(rT) = E*(S_T) in this example, where E*(S_T) is the risk-neutral expectation of S_T. Thus the option is at the money in expectation, and S_T is roughly as likely to be in the money as out of the money. As σ increases, the likelihood that S_T is considerably larger than E also increases, while the payoff when S_T < E stays at 0, so the expected payoff rises. As an extreme case, suppose that σ = 0. Then in the risk-neutral world S_T = exp(rT) S_0 = E, the option is at the money at expiration, and so its value is 0.
The value at maturity is (S_T − E)_+, so we expect that the price of the call will increase as S_0 increases and decrease as E increases. This is exactly the behavior seen in Figure 7.7. Also, note that the price of the call increases as either r or T increases.
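The direction of each dependence can be checked by sweeping one parameter at a time around the baseline of Figure 7.7, with E held at its baseline value 100 exp(.015) as in the figure. This is a sketch; the grids of values are arbitrary choices.

```python
import math
from statistics import NormalDist

def bs_call(s0, strike, r, sigma, T):
    """Black-Scholes European call price (no dividends)."""
    N = NormalDist().cdf
    d1 = (math.log(s0 / strike) + (r + sigma ** 2 / 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return N(d1) * s0 - N(d2) * strike * math.exp(-r * T)

# Baseline values used for Figure 7.7.
BASE = dict(s0=100.0, r=0.06, sigma=0.1, T=0.25)
BASE["strike"] = 100.0 * math.exp(BASE["r"] * BASE["T"])  # E = 100 exp(rT), then held fixed

def price_with(**changes):
    """Call price with one or more parameters moved away from the baseline."""
    p = {**BASE, **changes}
    return bs_call(p["s0"], p["strike"], p["r"], p["sigma"], p["T"])

def is_increasing(values):
    return all(a < b for a, b in zip(values, values[1:]))

sweeps = {
    "sigma": [0.05, 0.10, 0.15, 0.20],
    "s0": [90.0, 100.0, 110.0],
    "strike": [90.0, 100.0, 110.0],
    "r": [0.02, 0.06, 0.10],
    "T": [0.1, 0.25, 0.5],
}
for name, grid in sweeps.items():
    prices = [price_with(**{name: x}) for x in grid]
    trend = "increasing" if is_increasing(prices) else "not increasing"
    print(name, [round(p, 2) for p in prices], trend)
```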
7.12.2    An example — GE
Table 7.1 gives the exercise price E, month of expiration, and price of call options on GE on February 13, 2001. This information was taken from The Wall Street Journal, February 14. Traded options are generally American rather than European, and that is true of the options in Table 7.1. However, under the Black-Scholes theory it can be proved that the price of an American call option is identical to the price of a European call option.6 See Section 7.12.3 for discussion of this point. Since an American call has the same price as a European call, we can use the Black-Scholes formula for European call options to price the options in Table 7.1. We will compare the Black-Scholes prices with the actual market prices.

Figure 7.7: Price of a call as a function of volatility (σ), exercise price (E), initial price (S_0), risk-free rate (r), and expiration date (T). Baseline values of the parameters are S_0 = 100, E = 100 exp(rT), T = .25, r = .06, and σ = .1. In each subplot, all parameters except the one on the horizontal axis are fixed at baseline.
Only the month of maturity is listed in a newspaper. However, maturities (days until expiration) can be determined as follows. An option expires at 10:59 pm Central Time on the Saturday after the third Friday in the month of expiration (Hull, 1995, page 180). February 16, 2001 was the third Friday of its month, so on February 13 an option with a February expiration date had three trading days (and four calendar days) until expiration. Since stocks have returns only on trading days, T = 3 for options expiring in February. Similarly, on February 13 an option expiring in March had T = 23 trading days until expiration. Since there are 253 trading days per year, there are 253/12 ≈ 21 trading days per month. For June, I used T = 23 + (3)(21) = 86, and for September I used T = 23 + (6)(21) = 149.
GE closed at $47.16 on February 13, so we use S_0 = 47.16.
On February 13, the 3-month T-bill rate was 4.91%. Thus, the daily rate of return on T-bills would be r = 0.0491/253 = .00019407, assuming that a T-bill has a return only on the 253 trading days per year; see Section 7.12.4.
I used two values of σ. The first, 0.0176, was based on daily GE returns from December 1999 to December 2000. The second, 0.025, was chosen to give prices somewhat similar to the actual market prices.
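The calculations behind the Black-Scholes columns of Table 7.1 can be sketched as follows, using the daily interest rate and trading-day maturities described above, so that σ, r, and T are all per trading day. This is an illustrative Python reconstruction, not the author's MATLAB code, and the last digit may differ from the table because of rounding.

```python
import math
from statistics import NormalDist

def bs_call(s0, strike, r, sigma, T):
    """Black-Scholes European call price; here r, sigma, T are all per trading day."""
    N = NormalDist().cdf
    d1 = (math.log(s0 / strike) + (r + sigma ** 2 / 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return N(d1) * s0 - N(d2) * strike * math.exp(-r * T)

S0 = 47.16                  # GE close, February 13, 2001
r_daily = 0.0491 / 253      # 3-month T-bill rate spread over 253 trading days
days_per_month = 21         # roughly 253 / 12
T_days = {
    "Feb": 3,
    "Mar": 23,
    "Jun": 23 + 3 * days_per_month,   # 86
    "Sep": 23 + 6 * days_per_month,   # 149
}

# (exercise price, expiration month, market price) from Table 7.1
options = [(35, "Sep", 14.90), (40, "Sep", 10.80), (42.50, "Mar", 5.30),
           (45, "Feb", 2.40), (45, "Mar", 3.40), (50, "Feb", 0.10),
           (50, "Mar", 0.90), (50, "Sep", 4.70), (55, "Mar", 0.20), (55, "Jun", 1.30)]

for strike, month, market in options:
    T = T_days[month]
    low, high = (bs_call(S0, strike, r_daily, s, T) for s in (0.0176, 0.025))
    print(f"E={strike:6.2f} {month} T={T:3d}  market={market:5.2f}  "
          f"BS(.0176)={low:5.2f}  BS(.025)={high:5.2f}")
```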
7.12.3    Early exercise of calls is never optimal
It can be proved that early exercise of an American call option is never optimal. The reason is that at any time before the expiration date, the price of the option will be higher than the value of the option if exercised. Therefore, it is always better to sell the option rather than to exercise it early.

6 However, American and European put options will in general have different prices.

E	Month of expiration	T (trading days)	Actual price	B&S price, σ = .0176	B&S price, σ = .025	Implied volatility
35	Sep	149	14.90	13.40	14.03	.0320
40	Sep	149	10.80	9.22	10.37	.0275
42.50	Mar	23	5.30	5.03	5.38	.0235
45	Feb	3	2.40	2.22	2.32	.0290
45	Mar	23	3.40	3.00	3.57	.0228
50	Feb	3	0.10	0.016	0.09	.0258
50	Mar	23	0.90	0.64	1.23	.0209
50	Sep	149	4.70	3.42	5.12	.0232
55	Mar	23	0.20	0.06	0.28	.0223
55	Jun	86	1.30	0.92	2.00	.0204
Table 7.1: Actual prices and prices determined by the Black-Scholes formula for options on February 13, 2001. E is the exercise price. T is the maturity in trading days.
To see empirical evidence of this principle, consider the first option in Table 7.1. The strike price is $35 and the closing price of GE was $47.16. Thus, if the option had been exercised at the close of the market, the option holder would have gained 47.16 − 35 = $12.16. However, the option was selling on the market for $14.90 that day. Thus, one would gain 14.90 − 12.16 = $2.74 more by selling the option rather than exercising it.
Similarly, the other options in Table 7.1 are worth more if sold than if exercised. The second option is worth 47.16 − 40 = $7.16 if exercised but $10.80 if sold. The third option is worth 47.16 − 42.50 = $4.66 if exercised but $5.30 if sold.
Since it is never optimal to exercise an American call option early, the ability to exercise an American call early is not worth anything. This is why American calls are equal in value to European calls with the same exercise price and expiration date.
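The sell-versus-exercise comparison is simple arithmetic. The sketch below repeats it for every row of Table 7.1, taking the exercise value as (S_0 − E)_+, since an out-of-the-money call would not be exercised.

```python
S0 = 47.16  # GE close on February 13, 2001

# (exercise price E, market price of the call) for each row of Table 7.1
table = [(35, 14.90), (40, 10.80), (42.50, 5.30), (45, 2.40), (45, 3.40),
         (50, 0.10), (50, 0.90), (50, 4.70), (55, 0.20), (55, 1.30)]

for strike, market in table:
    exercise_value = max(S0 - strike, 0.0)   # payoff from exercising now
    premium = market - exercise_value        # extra gained by selling instead
    print(f"E={strike:6.2f}: exercise ${exercise_value:5.2f}, "
          f"sell ${market:5.2f}, selling gains ${premium:5.2f} more")
```

In every row the market price exceeds the exercise value, as the argument above predicts.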
7.12.4    Are there returns on non-trading days?
We have assumed that there are no returns on non-trading days. For T-bills, this assumption is justified by the way we calculated the daily interest rate: we took the daily rate to be the annual rate divided by 253 on trading days and 0 on non-trading days. If instead we took the daily rate to be the annual rate divided by 365 on every calendar day, then the interest on T-bills over a year would be the same, and over a quarter it would be nearly the same.
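A quick arithmetic check of the two day-count conventions, using simple (uncompounded) interest. The quarter lengths of 63 trading days and 91 calendar days are illustrative assumptions, not figures from the notes.

```python
annual_rate = 0.0491  # 3-month T-bill rate on February 13, 2001

# Convention used in the text: interest accrues only on the 253 trading days.
per_trading_day = annual_rate / 253
# Alternative convention: interest accrues on all 365 calendar days.
per_calendar_day = annual_rate / 365

# Over a full year, both conventions give the annual rate.
year_trading = 253 * per_trading_day
year_calendar = 365 * per_calendar_day

# Over one quarter, assuming 63 trading days and 91 calendar days.
quarter_trading = 63 * per_trading_day
quarter_calendar = 91 * per_calendar_day

print(year_trading, year_calendar)
print(round(quarter_trading, 5), round(quarter_calendar, 5))
```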
A stock price is unchanged over a non-trading day. However, the efficient market theory says that stock prices change due to new information. Thus, we might expect that there is a return on a stock over a weekend or holiday but that it is not realized until the market reopens. If this were true, then returns from Friday to Monday would be more volatile than returns over a single trading day. Empirical evidence fails to find such an effect.
A reason why returns over weekends are not overly volatile might be that there is little business news over a weekend. However, this does not seem to be the explanation. In 1968, the NYSE was closed on a series of Wednesdays. Of course, other businesses were open on these Wednesdays, so there was the usual amount of business news on the Wednesdays when the NYSE was closed. For this reason, one would expect increased volatility in Tuesday-to-Thursday price changes in weeks with a Wednesday closing compared to, say, Tuesday-to-Wednesday price changes in weeks without a Wednesday closing. However, no such effect has been detected (French and Roll, 1986).
Trading appears to generate volatility by itself. Traders react to each other, and stock prices react both to trading "noise" and to new information. Short-term volatility might be mostly due to noise trading.
7.12.5    Implied volatility
Given the exercise price, current price, and maturity of an option and given the risk-free rate, there is some value of σ that makes the price determined by the Black-Scholes formula equal to the current market price. This value of σ is called the implied volatility. One might think of implied volatility as the amount of volatility the market believes to exist currently.
How does one determine the implied volatility? The Black-Scholes formula gives price as a function of σ with all other parameters held fixed. What we need is the inverse of this function, that is, σ as a function of the option price. The inverse function exists, because the call price is strictly increasing in σ, but there is no explicit formula for it. However, using interpolation one can invert the Black-Scholes formula numerically to get σ as a function of price. Figure 7.8 shows how this could be done for the third option in Table 7.1. The implied volatility in Figure 7.8 is 0.0235 and was determined by MATLAB's interpolation function, interp1.m. The implied volatilities of the other options in Table 7.1 were determined in the same manner.
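The grid-and-interpolate inversion described above can be sketched in Python. The notes used MATLAB's interp1.m; this is a re-implementation of the same idea, not that code. The σ grid limits and spacing are arbitrary assumptions. Because the call price is increasing in σ, only one grid interval brackets the market price, and σ is interpolated linearly inside it.

```python
import math
from statistics import NormalDist

def bs_call(s0, strike, r, sigma, T):
    """Black-Scholes European call price (no dividends)."""
    N = NormalDist().cdf
    d1 = (math.log(s0 / strike) + (r + sigma ** 2 / 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return N(d1) * s0 - N(d2) * strike * math.exp(-r * T)

def implied_vol(market_price, s0, strike, r, T, lo=0.005, hi=0.10, steps=2000):
    """Invert the Black-Scholes formula by linear interpolation on a grid of sigma values."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    prices = [bs_call(s0, strike, r, s, T) for s in grid]
    for k in range(steps):
        p_a, p_b = prices[k], prices[k + 1]
        if p_a <= market_price <= p_b:
            if p_b == p_a:                   # flat segment: any point in it will do
                return grid[k]
            weight = (market_price - p_a) / (p_b - p_a)
            return grid[k] + weight * (grid[k + 1] - grid[k])
    raise ValueError("market price is outside the range spanned by the sigma grid")

# Third option of Table 7.1: E = 42.50, expiring in March (T = 23 trading days), price $5.30
iv = implied_vol(5.30, s0=47.16, strike=42.50, r=0.0491 / 253, T=23)
print(round(iv, 4))
```

A bisection or Newton search on σ would serve equally well; the grid version is shown because it mirrors the interpolation approach in the text.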
Notice that the implied volatilities are substantially higher than 0.0176, the average volatility over the previous year. However, there is evidence that the volatility of GE was increasing at the end of 2000; see the estimated volatility in Figure 3.3. In that figure, volatility is estimated from December 15, 1999 to December 15, 2000. Volatility is highest at the end of this period and shows some sign of continuing to increase. The estimated volatility on December 15, 2000 was 0.023, which is similar to the implied volatilities in Table 7.1. It would be worthwhile to re-estimate volatility with data from December 15, 2000 to February 13, 2001, since the implied volatilities in Table 7.1 may be similar to the observed volatility in early 2001.
The implied volatilities also vary somewhat among themselves. One reason for this variation is that the option prices and the closing price of GE stock are not concurrent. Rather, each price is for the last trade of the day for that option or for the stock. This lack of concurrence introduces some error into pricing by the Black-Scholes formula and therefore into the implied volatilities. Another problem with these prices is that the Black-Scholes formula assumes that the stock pays no dividends, but GE does pay dividends.7

Figure 7.8: Calculating the volatility implied by the option with an exercise price of $42.50 expiring in March 2001. The price was $5.30 on February 13, 2001. The blue curve is the price given by the Black-Scholes formula as a function of σ (horizontal axis, roughly 0.015 to 0.03). The horizontal line is drawn where the price is $5.30. This line intersects the curve at σ = .0235. This value of σ is the volatility implied by the option's price.
7.13    Puts
Recall that a put option gives one the right to sell a certain number of shares of a certain stock at the exercise price. The pricing of puts is similar to the pricing of calls, but as we will see in this section, there are some differences.
7.13.1    Pricing puts by binomial trees
Put options can be priced by binomial trees in the same way that call options are priced. Figure 7.9 shows a two-step binomial tree where the stock price starts at $100 and increases or decreases by 20% at each step. Assume that the interest rate is 5% compounded continuously and that the strike price of the put is $110.
In this example, European and American puts do NOT have the same price at all nodes. We will start with a European put and then see how an American put differs.
At each step,

$$q = \frac{\exp(.05) - .8}{1.2 - .8} = .6282.$$
The value of a put after two steps is (110 − S)_+, where S is the price of the stock after two steps. Thus the put is worth $46, $14, and $0 at nodes 4, 5, and 6, respectively. Therefore, the price of the option at node 3 is

$$e^{-.05}\{(q)(0) + (1 - q)(14)\} = e^{-.05}\{(.6282)(0) + (.3718)(14)\} = 4.95.$$

The price of the option at node 2 is

$$e^{-.05}\{(q)(14) + (1 - q)(46)\} = 24.63.$$

Finally, the price of the put at node 1 is

$$e^{-.05}\{(q)(4.95) + (1 - q)(24.63)\} = 11.67.$$

7 Modifications of the formula to accommodate dividend payments are possible, but we will not pursue that topic here.

Figure 7.9: Pricing a put option (exercise price $110, r = 5%). The stock price is in blue and the price of a European put option is in magenta. The price of an American put option is shown in black with parentheses when it differs from the price of a European put.

Now consider an American option. At nodes 4, 5, and 6 we have reached the expiration time, so the American option has the same value as the European option.

At node 3 the European option is worth $4.95. At this node, should we exercise the American option early? Clearly not, since the strike price ($110) is less than the stock price ($120). Since early exercise is suboptimal at node 3, the American option is equivalent to the European option at this node and both options are worth $4.95.

At node 2 the European option is worth $24.63. The American option can be exercised to earn $110 − $80 = $30. Therefore, the American option should be exercised early, since early exercise earns $30 while holding the option is worth only $24.63. Thus, at node 2 the European option is worth $24.63 but the American option is worth $30.

At node 1, the American option is worth

$$e^{-.05}\{(q)(4.95) + (1 - q)(30)\} = 13.57,$$

which is more than $11.67, the value of the European option at node 1. The American option should NOT be exercised early at node 1 since that would earn only $10. However, the American option is worth more than the European option at node 1 because the American option can be exercised early at node 2 should the stock move down from node 1.
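The backward induction above, including the early-exercise check for the American put, can be sketched in Python. The defaults are the values of this example (S_0 = 100, 20% moves, 5% per step compounded continuously, strike $110); the function name and the node indexing (j = number of up moves) are illustrative choices.

```python
import math

def two_step_put(s0=100.0, up=1.2, down=0.8, r=0.05, strike=110.0, american=False):
    """Price a put on a two-step binomial tree (r is continuously compounded per step)."""
    q = (math.exp(r) - down) / (up - down)   # risk-neutral probability of an up move
    disc = math.exp(-r)                      # one-step discount factor

    # Payoffs after two steps, indexed by the number of up moves j = 0, 1, 2.
    values = [max(strike - s0 * up ** j * down ** (2 - j), 0.0) for j in range(3)]

    # Step back through the tree one period at a time.
    for step in (1, 0):
        new_values = []
        for j in range(step + 1):
            hold = disc * (q * values[j + 1] + (1 - q) * values[j])  # value of holding
            if american:
                stock = s0 * up ** j * down ** (step - j)
                hold = max(hold, strike - stock)                     # allow early exercise
            new_values.append(hold)
        values = new_values
    return values[0]

european = two_step_put()
american = two_step_put(american=True)
print(round(european, 2), round(american, 2))  # prints 11.67 13.57
```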
7.13.2    Why are puts different than calls?
We saw that in the Black-Scholes model, where changes in price are proportional to the current price, it is never optimal to exercise an American call early. Puts are different: in the Black-Scholes model, early exercise of a put may be optimal. In the binomial example just given, price changes are proportional to current prices as in the Black-Scholes model, and early exercise of the put was optimal at node 2.
So why are puts different than calls? The basic idea is this. A put increases in value as the stock price decreases. As the stock price decreases, the size of further price changes also decreases. At some point we are in the range of diminishing returns. We expect further decreases in the stock price to be so small that the put will increase in value at less than the risk-free rate. Therefore, it is better to exercise the option and invest the profits in a risk-free asset.
With calls, everything is reversed. A call increases in value as the stock price increases. As the stock price increases, so does the size of future price changes. The expected returns on the call (expectations are with respect to the risk-neutral measure, of course) are greater than the risk-free rate of return.
7.13.3    Put-call parity
It is possible, of course, to derive the Black-Scholes formula for a European put option by the same reasoning used to price a call. However, this work can be avoided since there is a simple formula relating the price of a European put to that of a call:
$$P = C + e^{-rT}E - S_0, \qquad (7.20)$$
where P and C are the prices of a put and of a call, both with expiration date T and exercise price E. Here, the stock price is S_0 and r is the continuously compounded risk-free rate. Thus, the price of a put is simply the price of the call plus e^{−rT}E − S_0.
Equation (7.20) is derived by a simple arbitrage argument. Consider two portfolios. The first portfolio holds one call and Ee^{−rT} dollars in the risk-free asset. Its payoff at time T is E, the value of the risk-free asset, plus the value of the call, which is (S_T − E)_+. Therefore, its payoff is E if S_T < E and S_T if S_T ≥ E. In other words, the payoff is either E or S_T, whichever is larger.
The second portfolio holds a put and one share of stock. Its payoff at time T is S_T if S_T ≥ E, so that the put is not exercised. If S_T < E, then the put is exercised and the stock is sold for a payoff of E. Thus, the payoff is E or S_T, whichever is larger, which is the same payoff as the first portfolio.
Since the two portfolios have the same payoff for all values of St, their initial values at time 0 must be equal to avoid arbitrage. Thus,
$$C + e^{-rT}E = P + S_0,$$

which can be rearranged to yield equation (7.20).8
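Put-call parity gives the European put price from the call price. As a cross-check, the sketch below compares (7.20) with the standard closed-form Black-Scholes put formula, P = Φ(−d_2)Ee^{−rT} − Φ(−d_1)S_0, which is not derived in these notes; the two should agree up to floating-point rounding. The parameter values are those of the earlier numerical call example.

```python
import math
from statistics import NormalDist

N = NormalDist().cdf

def bs_d1_d2(s0, strike, r, sigma, T):
    d1 = (math.log(s0 / strike) + (r + sigma ** 2 / 2) * T) / (sigma * math.sqrt(T))
    return d1, d1 - sigma * math.sqrt(T)

def bs_call(s0, strike, r, sigma, T):
    d1, d2 = bs_d1_d2(s0, strike, r, sigma, T)
    return N(d1) * s0 - N(d2) * strike * math.exp(-r * T)

def put_from_parity(s0, strike, r, sigma, T):
    """European put price from equation (7.20): P = C + exp(-rT) E - S0."""
    return bs_call(s0, strike, r, sigma, T) + math.exp(-r * T) * strike - s0

def bs_put_direct(s0, strike, r, sigma, T):
    """Standard closed-form Black-Scholes put, used here only as a cross-check."""
    d1, d2 = bs_d1_d2(s0, strike, r, sigma, T)
    return N(-d2) * strike * math.exp(-r * T) - N(-d1) * s0

args = (100, 90, 0.1, 0.4, 0.25)   # S0, E, r, sigma, T from the call example
print(round(put_from_parity(*args), 4), round(bs_put_direct(*args), 4))
```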
Relationship (7.20) holds only for European options. American calls have the same price as European calls, so the right-hand side of (7.20) is the same for European and American options. American puts are worth more than European puts, so the left-hand side of (7.20) is larger for American than for European puts. Thus, (7.20) becomes

$$P > C + e^{-rT}E - S_0 \qquad (7.21)$$

for American options, and clearly (7.21) does not tell us the price of an American put.
8 As usual in these notes, we are assuming that the stock pays no dividend, at least not during the lifetime of the two options. If there are dividends, then a simple adjustment of formula (7.20) is needed, because the two portfolios will no longer have exactly the same payoff: the second portfolio, which holds the stock, will receive a dividend and so receive a higher payoff than the first portfolio, which does not hold the stock.
7.14   The evolution of option prices
As time passes, the price of an option changes with the changing stock price and the decreasing amount of time until the expiration date. We will assume that r and σ are constant, though in the real financial world these could change too. The Black-Scholes formula remains in effect and can be used to update the price of an option. Suppose that t = 0 is when the option was written and t = T is the expiration date. Consider a time point t such that 0 < t < T. Then the Black-Scholes formula can be used with S_0 in the formula set equal to S_t and T in the formula set equal to T − t.
Figure 7.10 illustrates the evolution of option prices for two simulations of the geometric Brownian motion process of the stock price. Here T = 1, σ = .1, r = .06, S_0 = 100, and E = 100 for both the put and the call. In one case the call was in the money at expiration, while in the other it was the put that was in the money.
Notice that around t = .18 the stock price is around 110 in the red simulation but the put is still worth something, since there is still plenty of time for the price to go down. Around t = 1 the stock price of the blue simulation is around 110 but the value of the put is essentially 0; now there is too little time for the put to go in the money (the risk-neutral probability is not 0, but almost 0).
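The updating rule above can be sketched by simulating one geometric Brownian motion path and re-evaluating the Black-Scholes call and put prices at each time point, with S_t in place of S_0 and T − t in place of T. The parameters are those of Figure 7.10. The seed, the number of time steps, and the drift used to simulate the path (set equal to r here) are assumptions, and the put is obtained from the call by put-call parity (7.20).

```python
import math
import random
from statistics import NormalDist

N = NormalDist().cdf

def bs_prices(s, strike, r, sigma, tau):
    """Black-Scholes (call, put) prices with time to expiration tau = T - t."""
    if tau <= 0:                                   # at expiration the price is the payoff
        return max(s - strike, 0.0), max(strike - s, 0.0)
    d1 = (math.log(s / strike) + (r + sigma ** 2 / 2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    call = N(d1) * s - N(d2) * strike * math.exp(-r * tau)
    put = call + math.exp(-r * tau) * strike - s   # put-call parity (7.20)
    return call, put

def simulate_path(s0, mu, sigma, T, n_steps, rng):
    """Geometric Brownian motion sampled at n_steps + 1 equally spaced times."""
    dt = T / n_steps
    path = [s0]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        path.append(path[-1] * math.exp((mu - sigma ** 2 / 2) * dt + sigma * math.sqrt(dt) * z))
    return path

T, sigma, r, s0, strike, n_steps = 1.0, 0.1, 0.06, 100.0, 100.0, 250
rng = random.Random(2001)
path = simulate_path(s0, mu=0.06, sigma=sigma, T=T, n_steps=n_steps, rng=rng)
times = [T * k / n_steps for k in range(n_steps + 1)]
evolution = [bs_prices(s, strike, r, sigma, T - t) for s, t in zip(path, times)]
for t, s, (c, p) in list(zip(times, path, evolution))[::50]:
    print(f"t={t:.2f}  S={s:7.2f}  call={c:6.2f}  put={p:6.2f}")
```

Because the drift of the simulated path never enters the pricing formula, changing `mu` changes the path but not how each price is computed from (S_t, T − t).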
7.15   Intrinsic value and time value
The intrinsic value of a call is (S_0 − E)_+, the payoff one would obtain from immediate exercise of the option (which would be possible only for an American option). The intrinsic value is always less than the price, so immediate exercise is never optimal. The difference between the price and the intrinsic value is called the time value of the option. Time value has two components. The first is a volatility component. The stock price could drop between now and the expiration date; by waiting until the last moment, one can avoid exercising the option when S_T < E. The second component is the time value of money. If you do exercise the option, it is best to wait until time T so that you delay payment of the exercise price.

Figure 7.10: Evolution of option prices. The stock price is a geometric Brownian motion. Two independent simulations of the stock price are shown and color coded; the panels plot the stock price and the option prices against time. Here T = 1, σ = .1, r = .06, S_0 = 100, and E = 100 for both the put and the call. In the blue simulation the call is in the money at the expiration date; in the red simulation the put is.

Figure 7.11: Price (for European or American option), intrinsic value, and adjusted intrinsic value of a call option. The intrinsic value is the payoff if one exercises early. Here E = 100, T = .25, r = 0.06, and σ = 0.1.
The adjusted intrinsic value is (S_0 − e^{−rT}E)_+. The difference between the price and the adjusted intrinsic value is the volatility component of the time value of the option. As S_0 → ∞, the price converges to the adjusted intrinsic value and the volatility component converges to 0. The reason is that as S_0 → ∞ it becomes certain that the option will be in the money at the expiration date.
Figure 7.11 shows the price, intrinsic value, and adjusted intrinsic value of a call option when S_0 = 100, E = 100, T = .25, r = 0.06, and σ = 0.1.
Figure 7.12: Price (for European option), intrinsic value, and adjusted intrinsic value of a put option. The intrinsic value is the payoff if one exercises early. The price of an American put would be at least as large as both the price of the European put and the intrinsic value. Here S_0 = 100, E = 100, T = .25, r = 0.06, and σ = 0.1.
The intrinsic value of a put is (E − S_0)_+, which again is the payoff one would obtain from immediate exercise of the option, if that is possible (American option). The intrinsic value is sometimes greater than the price, in which case immediate exercise is optimal.
The adjusted intrinsic value is (e^{−rT}E − S_0)_+. As S_0 → 0, the likelihood that the option will be in the money at the expiration date increases to 1 and the price converges to the adjusted intrinsic value.
Figure 7.12 shows the price, intrinsic value, and adjusted intrinsic value of a put option when S_0 = 100, E = 100, T = .25, r = 0.06, and σ = 0.1.
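The relationships among price, intrinsic value, and adjusted intrinsic value can be tabulated directly. The sketch below uses the parameters of Figures 7.11 and 7.12, computes the European put through put-call parity (7.20), and prints a few values of S_0 (the grid is an arbitrary choice). The call price should always be at least the adjusted intrinsic value, while for the put the intrinsic value exceeds the European price when S_0 is well below E.

```python
import math
from statistics import NormalDist

N = NormalDist().cdf
E, T, r, sigma = 100.0, 0.25, 0.06, 0.1   # parameters of Figures 7.11 and 7.12

def bs_call(s0):
    d1 = (math.log(s0 / E) + (r + sigma ** 2 / 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return N(d1) * s0 - N(d2) * E * math.exp(-r * T)

def bs_put(s0):
    return bs_call(s0) + math.exp(-r * T) * E - s0   # put-call parity (7.20)

def call_values(s0):
    """(price, intrinsic value, adjusted intrinsic value) of the call."""
    return bs_call(s0), max(s0 - E, 0.0), max(s0 - math.exp(-r * T) * E, 0.0)

def put_values(s0):
    """(European price, intrinsic value, adjusted intrinsic value) of the put."""
    return bs_put(s0), max(E - s0, 0.0), max(math.exp(-r * T) * E - s0, 0.0)

for s0 in (85.0, 95.0, 100.0, 105.0, 115.0):
    c, c_int, c_adj = call_values(s0)
    p, p_int, p_adj = put_values(s0)
    print(f"S0={s0:6.1f}  call {c:6.2f} (intrinsic {c_int:5.2f}, adjusted {c_adj:5.2f})  "
          f"put {p:6.2f} (intrinsic {p_int:5.2f}, adjusted {p_adj:5.2f})")
```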
7.16   Black, Scholes, and Merton
This section is based on chapter 11 of Bernstein's (1992) book Capital Ideas. Fischer Black graduated in 1959 from Harvard with a degree in physics. In 1964 he received a PhD in applied mathematics from Harvard, where he studied operations research, computer design, and artificial intelligence. He never took a course in either finance or economics.
Finding his doctoral studies a bit too abstract, he went to work at Arthur D. Little, where he became acquainted with the CAPM. He found this subject so fascinating that he moved into finance. At ADL, Black tried to apply the CAPM to the pricing of warrants, which are much like options. Bernstein (1992) quotes Black as recalling
I applied the Capital Asset Pricing Model to every moment in a warrant's life, for every possible stock price and warrant value .... I stared at the differential equation for many, many months. I made hundreds of silly mistakes that led me down blind alleys. Nothing worked ...
[The calculations revealed that] the warrant value did not depend on the stock's expected return, or on any other asset's expected return. That fascinated me. ... Then Myron Scholes and I started working together.
Scholes received a bachelor's degree from McMaster University in Ontario in 1962, earned a doctorate in finance from Chicago, and then took a teaching job at MIT. When Scholes met Black, he too was working intensely on warrant pricing by the CAPM. Realizing that they were working on the same problem, they began a collaboration that proved to be very fruitful.
Black and Scholes came to understand that the expected return on a stock or option had no effect on what the current price of the option should be.
With this insight and building on the CAPM, they arrived at the option equation and derived the formula for the option price.
In 1970, Scholes described his work with Black on options pricing to Robert C. Merton. Merton had studied engineering mathematics at Columbia and then Caltech. He developed an interest in economics and planned to study that subject in graduate school. His lack of formal training in economics put off many graduate schools, but MIT offered him a fellowship, and there he worked under the direction of Paul Samuelson.
Merton developed the "intertemporal capital asset pricing model," which converted the CAPM from a static model describing the market for a single discrete holding period to a model for finance in continuous time. Merton realized that Ito's stochastic calculus was a goldmine for someone working on finance theory in continuous time. In the preface to his book Continuous-Time Finance, Merton has written
The mathematics of the continuous-time model contains some of the most beautiful applications of probability and optimization theory. But, of course, not all that is beautiful in science need also be practical. And surely, not all that is practical in science is beautiful. Here we have both.
Merton developed a much more elegant derivation of the Black-Scholes formula, a derivation based on an arbitrage argument. Black has said "A key part of the options paper I wrote with Myron Scholes was the arbitrage argument for deriving the formula. Bob gave us that argument. It should probably be called the Black-Merton-Scholes paper."
In 1997, Merton shared the Nobel Prize in Economics with Scholes. Sadly, Black had died at a young age and could not share the prize, since the Nobel Prize cannot be awarded posthumously. Merton has been called "the Isaac Newton of modern finance."
7.17    Summary
•  An option gives the holder the right but not the obligation to do something, for example, to purchase a certain amount of a certain stock at a fixed price within a certain time frame.
•  A call option gives one the right to purchase (call in) a stock. A put gives one the right to sell (put away) a stock.
•  European options can be exercised only at their expiration date. American options can be exercised on or before their expiration date.
•  Arbitrage is making a guaranteed profit without investing capital.
•  Arbitrage pricing means determining the unique price of a financial instrument that guarantees that the market is free of arbitrage opportunities.
•  Options can be priced by arbitrage using binomial trees.
•  The "measure" of a binomial tree model or other stochastic process model gives the set of path probabilities of that model.
•  There exists a risk-neutral measure such that expected prices calculated with respect to this measure are equal to arbitrage-determined prices.
•  In a binomial tree model with price changes proportional to the current price, as the number of steps increases the limit process is a geometric Brownian motion and the price of the option in the limit is given by the Black-Scholes formula.
•  To price an option by the Black-Scholes formula, one needs an estimate of the stock price's volatility. This can be obtained from historical data. Conversely, the implied volatility of a stock is the volatility which makes the actual market price equal to the price given by the Black-Scholes formula.
•  Within the Black-Scholes model, the early exercise of calls is never optimal but the early exercise of puts is sometimes optimal. Therefore, European and American calls have equal prices, but American puts are generally worth more than European puts.
•  Put-call parity is the relationship

$$P = C + e^{-rT}E - S_0$$

between P, the price of a European put, and C, the price of a European call. It is assumed that both have exercise price E and expiration date T. S_0 is the price of the stock.
7.18   References
Baxter, M., and Rennie, A. (1998), Financial Calculus: An Introduction to Derivative Pricing, Cambridge University Press.
French, K. R., and Roll, R. (1986), "Stock return variances: the arrival of information and the reaction of traders," Journal of Financial Economics, 17, 5-26.
Hull, J. C. (1995), Introduction to Futures and Options Markets, Prentice Hall, Englewood Cliffs, NJ.
Merton, R. C. (1992), Continuous-Time Finance, revised ed., Blackwell, Cambridge, MA, and Oxford, UK.
Chapter 8
GARCH models: 4/24/01
8.1    Introduction
Despite the popularity of ARMA models, they have a significant limitation, namely, that they assume a constant volatility. In finance, where correct specification of volatility is of the utmost importance, this can be a severe limitation. In this chapter we look at time series models that have randomly varying volatility.
ARMA models are used to model the conditional expectation of the current observation, Y_t, of a process given the past observations. ARMA models do this by writing Y_t as a linear function of the past plus a white noise term. ARMA models also allow us to predict future observations given the past and present. The prediction of Y_{t+1} given Y_t, Y_{t−1}, ... is simply the conditional expectation of Y_{t+1} given Y_t, Y_{t−1}, ....
However, ARMA models have rather boring conditional variances: the conditional variance of Y_t given the past is always a constant. What does this mean for, say, modeling stock returns? Suppose we have noticed that recent daily returns have been unusually volatile. We might suppose that tomorrow's return will also be more variable than usual. However, if we are modeling returns as an ARMA process, we cannot capture this type of behavior because the conditional variance is constant. So we need better time series models if we want to model the nonconstant volatility often seen in financial time series.
In this chapter we will study models of nonconstant volatility. ARCH is an acronym meaning AutoRegressive Conditional Heteroscedasticity.1 In ARCH models the conditional variance has a structure very similar to the structure of the conditional expectation in an AR model. We will first study the ARCH(1) model, which is similar to an AR(1) model. Then we will look at ARCH(p) models, which are analogous to AR(p) models. Finally, we will look at GARCH (Generalized ARCH) models, which model conditional variances much like the conditional expectation of an ARMA model.
8.2    Modeling conditional means and variances
Before looking at GARCH models, we will study some general principles on how one models non-constant variance.
The general form for the regression of Y_t on X_{1,t}, ..., X_{p,t} is

$$Y_t = f(X_{1,t}, \ldots, X_{p,t}) + \epsilon_t, \qquad (8.1)$$

where ε_t is independent of X_{1,t}, ..., X_{p,t}, has expectation equal to 0, and has a constant variance σ². The function f is the conditional expectation of Y_t given X_{1,t}, ..., X_{p,t}. To appreciate this fact, notice that if we take the conditional (given the X_{i,t} values) expectation of (8.1), f(X_{1,t}, ..., X_{p,t}) is treated as a constant and the conditional expectation of ε_t is 0. Moreover, the conditional variance is simply the variance of ε_t, that is, σ². Frequently, f is linear, so that

$$f(X_{1,t}, \ldots, X_{p,t}) = \beta_0 + \beta_1 X_{1,t} + \cdots + \beta_p X_{p,t}.$$

Principle: To model the conditional mean of Y_t given X_{1,t}, ..., X_{p,t}, write Y_t as the conditional mean plus white noise.
¹ Heteroscedasticity is a fancy way of saying non-constant variance. Homoscedasticity means constant variance. Alternate spellings are heteroskedasticity and homoskedasticity.
Equation (8.1) can be modified to allow a nonconstant conditional variance. Let $\sigma^2(X_{1,t}, \ldots, X_{p,t})$ be the conditional variance of $Y_t$ given $X_{1,t}, \ldots, X_{p,t}$. Then the model
$$Y_t = f(X_{1,t}, \ldots, X_{p,t}) + \sigma(X_{1,t}, \ldots, X_{p,t})\,\epsilon_t \qquad (8.2)$$
gives the correct conditional mean and variance, provided $\epsilon_t$ has mean 0 and variance 1.
Principle: To allow a nonconstant conditional variance in the model, multiply the white noise term by the conditional standard deviation. This product is added to the conditional mean as in the previous principle.
The function $\sigma(X_{1,t}, \ldots, X_{p,t})$ must be non-negative since it is a standard deviation. If the function $\sigma(\cdot)$ is linear, then its coefficients must be constrained to ensure non-negativity. Modeling nonconstant conditional variances in regression is treated in depth in the book by Carroll and Ruppert (1988). Models for conditional variances are often called "variance function models." The GARCH models of this chapter are a special class of variance function models.
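The two principles can be tried out on simulated data. The Python/NumPy sketch below is our own illustration, not from the notes. It implements model (8.2) with a single covariate, a linear conditional mean $1 + 2x$, and a hypothetical conditional standard deviation $\sigma(x) = 0.5 + x$. The function names, the parameter values and the seed are assumptions chosen for illustration only.

```python
import numpy as np


def cond_sd(x):
    """Hypothetical conditional standard deviation sigma(x) = 0.5 + x (non-negative for x >= 0)."""
    return 0.5 + x


def simulate_variance_function_model(n, beta0, beta1, sd_fn, rng):
    """Simulate Y = beta0 + beta1 * X + sd_fn(X) * eps, i.e. model (8.2) with one covariate."""
    x = rng.uniform(0.0, 4.0, size=n)        # covariate values
    eps = rng.standard_normal(n)            # white noise with mean 0 and variance 1
    y = beta0 + beta1 * x + sd_fn(x) * eps  # conditional mean + conditional sd * noise
    return x, y


rng = np.random.default_rng(0)
x, y = simulate_variance_function_model(100_000, 1.0, 2.0, cond_sd, rng)
resid = y - (1.0 + 2.0 * x)                 # deviations from the true conditional mean

# The spread of the deviations grows with x, as sigma(x) prescribes.
sd_low = resid[x < 1.0].std()               # theory: about 1.04 for x in [0, 1)
sd_high = resid[x > 3.0].std()              # theory: about 4.01 for x in (3, 4)
print(f"sd for x < 1: {sd_low:.2f}, sd for x > 3: {sd_high:.2f}")
```

Fitting such a model would require estimating $\sigma(\cdot)$ as well as $f$, which is what the GARCH fitting procedures later in this chapter do for time series.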
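8.3   ARCH(1) processes
Let $\epsilon_1, \epsilon_2, \ldots$ be Gaussian white noise with unit variance, that is, let this process be independent N(0,1). Then
$$E(\epsilon_t \mid \epsilon_{t-1}, \ldots) = 0,$$
and
$$\mathrm{Var}(\epsilon_t \mid \epsilon_{t-1}, \ldots) = 1. \qquad (8.3)$$
Property (8.3) is called conditional homoscedasticity. The process $a_t$ is an ARCH(1) process if
$$a_t = \epsilon_t \sqrt{\alpha_0 + \alpha_1 a_{t-1}^2}. \qquad (8.4)$$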
We require that $\alpha_0 > 0$ and $\alpha_1 \ge 0$ so that the quantity under the square root, which is a conditional variance, is always positive. It is also required that $\alpha_1 < 1$ in order for $a_t$ to be stationary with a finite variance. If $\alpha_1 = 1$ then $a_t$ is stationary but its variance is $\infty$; see below. Equation (8.4) is somewhat like an AR(1) but in $a_t^2$, not $a_t$, and the ARCH(1) model induces an ACF in $a_t^2$ that is like an AR(1)'s ACF.
Define
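$$\sigma_t^2 = \mathrm{Var}(a_t \mid a_{t-1}, \ldots)$$
to be the conditional variance of $a_t$ given past values. Since $\epsilon_t$ is independent of $a_{t-1}, a_{t-2}, \ldots$, $E(\epsilon_t) = 0$, and $\mathrm{Var}(\epsilon_t) = 1$,
$$E(a_t \mid a_{t-1}, \ldots) = 0, \qquad (8.5)$$
and
$$\sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2. \qquad (8.6)$$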
Understanding equation (8.6) is crucial to understanding how GARCH processes work. This equation shows that if $a_{t-1}$ has an unusually large deviation from its expectation of 0, so that $a_{t-1}^2$ is large, then the conditional variance of $a_t$ is larger than usual. Therefore, $a_t$ is also expected to have an unusually large deviation from its mean. This volatility will propagate since $a_t$ having a large deviation makes $\sigma_{t+1}^2$ large so that $a_{t+1}$ will tend to be large. Similarly, if $a_{t-1}$ is unusually small, then $\sigma_t^2$ will be small, and $a_t$ is expected to also be small, etc. Because of this behavior, unusual volatility in $a_t$ tends to persist, though not forever. The conditional variance tends to revert to the unconditional variance provided that $\alpha_1 < 1$ so that the process is stationary with a finite variance.
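The unconditional, i.e., marginal, variance of $a_t$, denoted by $\gamma_a(0)$, is obtained by taking expectations in (8.6) and using $E(\sigma_t^2) = E(a_t^2) = E(a_{t-1}^2) = \gamma_a(0)$, which gives us
$$\gamma_a(0) = \alpha_0 + \alpha_1 \gamma_a(0).$$
This equation has a positive solution if $\alpha_1 < 1$:
$$\gamma_a(0) = \alpha_0/(1 - \alpha_1).$$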
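As a quick numerical check of $\gamma_a(0) = \alpha_0/(1-\alpha_1)$, the following Python sketch simulates a long ARCH(1) path and compares its sample variance with the formula. It is our own illustration and assumes NumPy. The values $\alpha_0 = 1$ and $\alpha_1 = 0.5$, the burn-in length and the seed are arbitrary choices. $\alpha_1 = 0.5$ satisfies $3\alpha_1^2 < 1$, so $a_t^2$ has a finite variance and the sample variance settles down quickly.

```python
import numpy as np


def arch1_uncond_var(alpha0, alpha1):
    """Marginal variance gamma_a(0) = alpha0 / (1 - alpha1) of ARCH(1); infinite if alpha1 >= 1."""
    return alpha0 / (1.0 - alpha1) if alpha1 < 1 else float("inf")


def simulate_arch1(n, alpha0, alpha1, rng, burn_in=1000):
    """Simulate equation (8.4), starting at a_0 = 0 and discarding a burn-in period."""
    eps = rng.standard_normal(n + burn_in)
    a = np.zeros(n + burn_in)
    for t in range(1, n + burn_in):
        a[t] = eps[t] * np.sqrt(alpha0 + alpha1 * a[t - 1] ** 2)
    return a[burn_in:]


rng = np.random.default_rng(1)
a = simulate_arch1(200_000, alpha0=1.0, alpha1=0.5, rng=rng)
print("theory:", arch1_uncond_var(1.0, 0.5), " sample:", round(float(a.var()), 3))
print("alpha0 = 1, alpha1 = .95:", arch1_uncond_var(1.0, 0.95))
```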
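If $\alpha_1 \ge 1$ then $\gamma_a(0)$ is infinite. It turns out that $a_t$ is stationary nonetheless, provided $\alpha_1$ is not too large (for Gaussian noise, $\alpha_1$ below about 3.56). The integrated GARCH model (I-GARCH) has $\alpha_1 = 1$ and is discussed in Section 8.10.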
Straightforward calculations using (8.5) show that the ACF of $a_t$ is
$$\rho_a(h) = 0 \quad \text{if } h \neq 0.$$
In fact, any process such that the conditional expectation of the present observation given the past is constant is an uncorrelated process. In introductory statistics courses, it is often mentioned that independence implies zero correlation but not vice versa. A process, such as the GARCH processes, where the conditional mean is constant but the conditional variance is nonconstant is a good example of a process that is uncorrelated but not independent. The dependence of the conditional variance on the past is the reason the process is not independent. The fact that the conditional mean does not depend on the past is the reason that the process is uncorrelated.
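Although $a_t$ is uncorrelated just like the white noise process $\epsilon_t$, the process $a_t^2$ has a more interesting ACF. If $3\alpha_1^2 < 1$, so that $a_t^2$ has a finite variance (for Gaussian $\epsilon_t$), then
$$\rho_{a^2}(h) = \alpha_1^{|h|} \quad \forall\, h.$$
If $\alpha_1$ is larger, then $a_t^2$ has an infinite variance (and even an infinite mean if $\alpha_1 \ge 1$), so of course it does not have an ACF.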
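Both ACF results can be checked by simulation. The Python sketch below is our own illustration, with the hypothetical choices $\alpha_0 = 1$, $\alpha_1 = 0.3$ and a fixed seed. It estimates the ACFs of $a_t$ and $a_t^2$. $\alpha_1$ is kept small so that $a_t^2$ has enough finite moments for its sample ACF to be reliable.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 0, ..., max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: len(x) - h], x[h:]) / denom for h in range(max_lag + 1)])


def simulate_arch1(n, alpha0, alpha1, rng, burn_in=1000):
    """Simulate equation (8.4), starting at a_0 = 0 and discarding a burn-in period."""
    eps = rng.standard_normal(n + burn_in)
    a = np.zeros(n + burn_in)
    for t in range(1, n + burn_in):
        a[t] = eps[t] * np.sqrt(alpha0 + alpha1 * a[t - 1] ** 2)
    return a[burn_in:]


alpha1 = 0.3                      # 3 * alpha1**2 < 1, so a_t^2 has a finite variance
rng = np.random.default_rng(2)
a = simulate_arch1(100_000, 1.0, alpha1, rng)

acf_a = sample_acf(a, 3)          # should be near 0 at lags 1, 2, 3
acf_a2 = sample_acf(a ** 2, 3)    # should be near alpha1 ** h
print("ACF of a_t:      ", np.round(acf_a[1:], 3))
print("ACF of a_t^2:    ", np.round(acf_a2[1:], 3))
print("alpha1 ** h:     ", np.round(alpha1 ** np.arange(1, 4), 3))
```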
8.3.1    Example
A simulated ARCH(1) process is shown in Figure 8.1. The top-left panel shows the independent white noise process, $\epsilon_t$. The top-right panel shows $\sigma_t = \sqrt{1 + .95\, a_{t-1}^2}$, the conditional standard deviation process. The bottom-left panel shows $a_t = \sigma_t \epsilon_t$, the ARCH(1) process. As discussed in the next section, an ARCH(1) process can be used as the noise term of an AR(1) process. This is shown in the bottom-right panel. The AR(1) parameters are $\mu = .1$ and $\phi = .8$.
The variance of $a_t$ is $\gamma_a(0) = 1/(1 - .95) = 20$, so the standard deviation is $\sqrt{20} = 4.47$.
[Figure 8.1 panels: White noise; Conditional std dev; ARCH(1); AR(1)/ARCH(1)]
Figure 8.1: Simulation of 60 observations from an ARCH(1) process and an AR(1)/ARCH(1) process. The parameters are $\alpha_0 = 1$, $\alpha_1 = .95$, $\mu = .1$, and $\phi = .8$.
The processes were all started at 0 and simulated for 70 observations. The first 10 observations were treated as a burn-in period, during which the process was converging to its stationary distribution. In the figure, only the last 60 observations are plotted.
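The simulation recipe just described can be sketched in Python with NumPy. This is our own illustration of equation (8.4) and the AR(1) recursion $u_t - \mu = \phi(u_{t-1} - \mu) + a_t$ of Section 8.4, not the program the author used. The function name and the seed are hypothetical. Because the original random numbers are unknown, the output will not match Figure 8.1 point for point.

```python
import numpy as np


def simulate_ar1_arch1(n_keep=60, burn_in=10, alpha0=1.0, alpha1=0.95,
                       mu=0.1, phi=0.8, seed=None):
    """Simulate the processes of Figure 8.1: all start at 0 and the first
    burn_in values are discarded."""
    rng = np.random.default_rng(seed)
    n = n_keep + burn_in                  # 70 observations in total by default
    eps = rng.standard_normal(n)          # white noise epsilon_t
    sigma = np.zeros(n)                   # conditional standard deviation sigma_t
    a = np.zeros(n)                       # ARCH(1) process a_t
    u = np.zeros(n)                       # AR(1)/ARCH(1) process u_t
    for t in range(1, n):
        sigma[t] = np.sqrt(alpha0 + alpha1 * a[t - 1] ** 2)
        a[t] = sigma[t] * eps[t]
        u[t] = mu + phi * (u[t - 1] - mu) + a[t]
    keep = slice(burn_in, n)              # drop the burn-in period
    return eps[keep], sigma[keep], a[keep], u[keep]


eps, sigma, a, u = simulate_ar1_arch1(seed=2001)
print("largest conditional sd:", round(float(sigma.max()), 2))
```

Setting `n_keep=600` gives the longer simulation of Figure 8.2.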
The white noise process in the top-left panel is normally distributed and has a standard deviation of 1, so it will be less than 2 in absolute value about 95% of the time. Notice that just before $t = 10$, the process is a little less than $-2$, which is a somewhat large deviation from the mean of 0. This deviation causes the conditional standard deviation ($\sigma_t$) shown in the top-right panel to increase, and this increase persists for about 10 observations though it slowly decays. The result is that the ARCH(1) process exhibits more volatility than usual when $t$ is between 10 and 15.
Figure 8.2 shows a simulation of 600 observations from the same processes as in Figure 8.1. A normal probability plot of $a_t$ is also included. Notice that this ARCH(1) process exhibits extreme non-normality. This is typical of ARCH processes. Conditionally they are normal with a nonconstant variance, but their marginal distribution is non-normal with a constant variance.
8.4   The AR(1)/ARCH(1) model
As we have seen, an AR(1) has a nonconstant conditional mean but a constant conditional variance, while an ARCH(1) process is just the opposite. If we think that both the conditional mean and variance of a process will depend on the past, then we need the features of both the AR and ARCH models. Thus, we will combine the two models. In this section we start simple and combine an AR(1) model with an ARCH(1) model.
[Figure 8.2 panels: White noise; Conditional std dev; ARCH(1); normal plot of ARCH(1); AR(1)/ARCH(1); horizontal axes run from 0 to 600]
Figure 8.2: Simulation of 600 observations from an ARCH(1) process and an AR(1)/ARCH(1) process. The parameters are $\alpha_0 = 1$, $\alpha_1 = .95$, $\mu = .1$, and $\phi = .8$.
Let $a_t$ be an ARCH(1) process and suppose that
$$u_t - \mu = \phi(u_{t-1} - \mu) + a_t.$$
The process $u_t$ looks like an AR(1) process, except that the noise term is not independent white noise but rather an ARCH(1) process.
Although $a_t$ is not independent white noise, we saw in the last section that it is an uncorrelated process; $a_t$ has the same ACF as independent white noise. Therefore, $u_t$ has the same ACF as an AR(1) process:
$$\rho_u(h) = \phi^{|h|} \quad \forall\, h.$$
Moreover, $a_t^2$ has the ARCH(1) ACF:
$$\rho_{a^2}(h) = \alpha_1^{|h|} \quad \forall\, h.$$
We need to assume that both $|\phi| < 1$ and $\alpha_1 < 1$ in order for $u_t$ to be stationary with a finite variance. Of course, $\alpha_0 > 0$ and $\alpha_1 \ge 0$ are also assumed.
The process $u_t$ is such that its conditional mean and variance, given the past, are both nonconstant, so a wide variety of real time series can be modeled.
Example
A simulation of an AR(1)/ARCH(1) process is shown in the bottom-right panel of Figure 8.1. Notice that when the ARCH(1) noise term in the bottom-left panel is more volatile, the AR(1)/ARCH(1) process moves more rapidly.
8.5   ARCH(q) models
As before, let $\epsilon_t$ be Gaussian white noise with unit variance. Then $a_t$ is an ARCH(q) process if
$$a_t = \sigma_t \epsilon_t$$
where
$$\sigma_t = \sqrt{\alpha_0 + \sum_{i=1}^{q} \alpha_i a_{t-i}^2}$$
is the conditional standard deviation of $a_t$ given the past values of this process. Like an ARCH(1) process, an ARCH(q) process is uncorrelated and has a constant mean (both conditional and unconditional) and a constant unconditional variance, but its conditional variance is nonconstant. In fact, the ACF of $a_t^2$ is the same as the ACF of an AR(q) process.
8.6   GARCH(p, q) models
The GARCH(p, q) model is
$$a_t = \epsilon_t \sigma_t$$
where
$$\sigma_t = \sqrt{\alpha_0 + \sum_{i=1}^{q} \alpha_i a_{t-i}^2 + \sum_{j=1}^{p} \beta_j \sigma_{t-j}^2}.$$
The process $a_t$ is uncorrelated with a stationary mean and variance, and $a_t^2$ has an ACF like an ARMA process.
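The GARCH(p, q) recursion is easy to code directly. Below is a minimal NumPy sketch of our own. `garch_recursion` is a hypothetical helper, and the choice to take pre-sample values as 0 is an assumption. The helper turns a white-noise path into $a_t$ and $\sigma_t^2$. With an empty `betas` list it reduces to the ARCH(q) model of Section 8.5.

```python
import numpy as np


def garch_recursion(eps, alpha0, alphas, betas):
    """Turn white noise eps into a GARCH(p, q) process a_t = eps_t * sigma_t with
        sigma_t^2 = alpha0 + sum_{i=1}^{q} alphas[i-1] * a_{t-i}^2
                           + sum_{j=1}^{p} betas[j-1] * sigma_{t-j}^2.
    Pre-sample values of a_t^2 and sigma_t^2 are taken to be 0 (an assumed
    start-up rule), so early values should be discarded as a burn-in.
    """
    q, p = len(alphas), len(betas)
    n = len(eps)
    a = np.zeros(n)
    sigma2 = np.zeros(n)
    for t in range(n):
        s2 = alpha0
        for i in range(1, min(q, t) + 1):      # ARCH terms: lagged squared values
            s2 += alphas[i - 1] * a[t - i] ** 2
        for j in range(1, min(p, t) + 1):      # GARCH terms: lagged conditional variances
            s2 += betas[j - 1] * sigma2[t - j]
        sigma2[t] = s2
        a[t] = eps[t] * np.sqrt(s2)
    return a, sigma2


# Parameters of Figure 8.3: alpha0 = 1, alpha1 = .08, beta1 = .9 (burn-in length assumed).
rng = np.random.default_rng(3)
a, sigma2 = garch_recursion(rng.standard_normal(700), 1.0, [0.08], [0.9])
a, sigma2 = a[100:], sigma2[100:]
print("mean conditional sd:", round(float(np.sqrt(sigma2).mean()), 2))
```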
A very general time series model lets $a_t$ be GARCH($p_G$, $q_G$) and uses $a_t$ as the noise term in an ARIMA($p_A$, $d$, $q_A$) model.²
GARCH models include ARCH models as a special case, and we will use the term "GARCH" to refer to both ARCH and GARCH models.
Figure 8.3 is a simulation from a GARCH(1,1) process and from an AR(1)/GARCH(1,1) process. The GARCH parameters are $\alpha_0 = 1$, $\alpha_1 = .08$, and $\beta_1 = .9$. The large value of $\beta_1$ gives the conditional standard deviation process a long-term memory. Notice that the conditional standard deviation is less "bursty" than for an ARCH(1) process.
² We use subscripts on p and q to distinguish between the GARCH (G) and ARIMA (A) parameters.
[Figure 8.3 panels: White noise; Conditional std dev; GARCH(1,1); AR(1)/GARCH(1,1); horizontal axes run from 0 to 600]
Figure 8.3: Simulation of GARCH(1,1) and AR(1)/GARCH(1,1) processes. The parameters are $\alpha_0 = 1$, $\alpha_1 = .08$, $\beta_1 = .9$, and $\phi = .8$.
8.7   Heavy-tailed distributions
Researchers have long noticed that stock returns have "heavy-tailed" or "outlier-prone" probability distributions. This means that they have more extreme outliers than expected from a normal distribution. The reason for the outliers may be that the conditional variance is not constant. In fact, GARCH processes exhibit heavy tails. Therefore, when we use GARCH models in finance we can model both the conditional heteroscedasticity and the heavy-tailed distributions of financial market data.
To understand how a nonconstant variance induces outliers, we look at a simple case. Consider a distribution which is 90% N(0,1) and 10% N(0,25). This is an example of a "normal mixture" distribution. The variance of this distribution is (.9)(1) + (.1)(25) = 3.4, so its standard deviation is 1.844. This distribution is MUCH different from a N(0,3.4) distribution, even though both distributions have the same mean (0) and variance (3.4). To appreciate this, look at Figure 8.4.
You can see in the top-left panel that the two densities look quite different. The normal density looks much more dispersed than the normal mixture, but we know that they actually have the same variance. What's happening? Look at the detail of the right tails in the top-right panel. The normal mixture density is much higher than the normal density when $x$ (the variable on the horizontal axis) is greater than 6. This is the "outlier" region (along with $x < -6$). The normal mixture has more outliers, and they come from the 10% of the population with a variance of 25. Outliers have a powerful effect on the variance, and this small fraction of outliers inflates the variance from 1.0 (the variance of 90% of the population) to 3.4.
[Figure 8.4 panels: Densities (normal and normal mix); Densities - detail (right tails); Normal plot - normal; Normal plot - normal mix]
Figure 8.4: Comparison of normal and heavy-tailed distributions.
Let's see how much more probability the normal mixture distribution has in the outlier range $|x| > 6$ compared to the normal distribution.³ For a $N(0, \sigma^2)$ random variable $X$,
$$P\{|X| > x\} = 2\{1 - \Phi(x/\sigma)\}.$$
³ There is nothing special about "6" to define the boundary of the outlier range. I just needed a specific number to make numerical comparisons. Clearly, $|x| > 7$ or $|x| > 8$, say, would have been just as appropriate as outlier ranges.
Therefore, for the normal distribution with variance 3.4,
$$P\{|X| > 6\} = 2\{1 - \Phi(6/\sqrt{3.4})\} = .0011.$$
For the normal mixture population which has variance 1 with probability .9 and variance 25 with probability .1 we have that
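$$P\{|X| > 6\} = 2\{.9(1 - \Phi(6)) + .1(1 - \Phi(6/5))\} = (.9)(0) + (.1)(.23) = .023.$$
The ratio of these two probabilities is about 20 (about 21 if the rounded values .023 and .0011 are used), so the normal mixture distribution is roughly 20 times more likely to be in this outlier range than the normal distribution.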
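These tail probabilities are easy to check numerically. The sketch below is our own and uses only Python's standard library. It relies on the identity $2\{1 - \Phi(z)\} = \mathrm{erfc}(z/\sqrt{2})$; the helper name `prob_abs_exceeds` is hypothetical. As a further measure of tail weight, it also computes the kurtosis of the mixture, using $E(X^4) = 3\sigma^4$ for each $N(0, \sigma^2)$ component.

```python
from math import erfc, sqrt


def prob_abs_exceeds(x, sd):
    """P(|X| > x) for X ~ N(0, sd^2), computed as erfc(x / (sd * sqrt(2)))."""
    return erfc(x / (sd * sqrt(2.0)))


# Normal distribution with the same variance as the mixture.
p_normal = prob_abs_exceeds(6.0, sqrt(3.4))

# 90% N(0, 1) and 10% N(0, 25): average the component tail probabilities.
p_mix = 0.9 * prob_abs_exceeds(6.0, 1.0) + 0.1 * prob_abs_exceeds(6.0, 5.0)

# Kurtosis E(X^4) / Var(X)^2; a normal distribution has kurtosis 3.
mix_var = 0.9 * 1.0 + 0.1 * 25.0
mix_kurt = (0.9 * 3 * 1.0 ** 2 + 0.1 * 3 * 25.0 ** 2) / mix_var ** 2

print(f"P(|X| > 6): normal {p_normal:.4f}, mixture {p_mix:.4f}, ratio {p_mix / p_normal:.1f}")
print(f"mixture variance {mix_var:.1f}, mixture kurtosis {mix_kurt:.1f}")
```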
Normal probability plots of samples of size 200 from the normal and the normal mixture distributions are shown in the bottom panels. Notice how the outliers in the normal mixture sample give the data a nonlinear, almost S-shaped, pattern. The deviation of the normal sample from linearity is small and is due entirely to randomness.
In this example, the variance is conditional upon which component of the mixture an observation comes from. The conditional variance is 1 with probability .9 and 25 with probability .1. Because the conditional variance is discrete, in fact with only two possible values, the example was easy to analyze. The marginal distribution of a GARCH process is also a normal mixture, but with a continuous distribution of components corresponding to the continuous distribution of the conditional variance. Although GARCH processes are more complex than the simple model in this section, the same theme applies: conditional heteroscedasticity induces heavy-tailed marginal distributions even though the conditional distributions are light-tailed normal distributions.
8.8   Comparison of ARMA and GARCH processes
Table 8.8 compares Gaussian white noise, ARMA, GARCH, and ARMA/GARCH processes according to various properties: conditional means, conditional variances, conditional distributions, marginal means, marginal variances, and marginal distributions.

Property            Gaussian WN   ARMA        GARCH          ARMA/GARCH
Cond. mean          constant      non-const   0              non-const
Cond. var           constant      constant    non-const      non-const
Cond. dist'n        normal        normal      normal         normal
Marg. mean & var.   constant      constant    constant       constant
Marg. dist'n        normal        normal      heavy-tailed   heavy-tailed
All of the processes are stationary, so their marginal means and variances are constant. Gaussian white noise is the "baseline" process. Because it is an independent process, its conditional distributions are the same as its marginal distribution. Thus, its conditional means and variances are constant and both its conditional and marginal distributions are normal. Gaussian white noise is the "driver" or "source of randomness" behind all the other processes. Therefore, they all have normal conditional distributions just like Gaussian white noise.
8.9    Fitting GARCH models
A time series was simulated using the same program that generated the data in Figure 8.1, the only difference being that 300 observations were generated rather than only 60 as in the figure. The data were saved as "garch02.dat" and analyzed with SAS using the following program.
Listing of the SAS program for the simulated data
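options linesize = 65 ;
/* Read the simulated series, one value per line */
data arch ;
infile 'C:\courses\or473\sas\garch02.dat' ;
input y ;
run ;
title 'Simulated ARCH(1)/AR(1) data' ;
/* AR(1) model (nlag = 1) with ARCH(1) errors (garch=(q=1)); */
/* archtest requests the Q and LM tests for ARCH effects     */
proc autoreg data = arch ;
model y =/nlag = 1  archtest garch=(q=1);
run ;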
This program uses the "autoreg" procedure, which fits regression models with AR errors; here the model contains only an intercept. Since nlag = 1, an AR(1) model is being fit. However, the noise is not modeled as independent white noise. Rather, an ARCH(1) model is used because of the specification "garch=(q=1)" in the "model" statement below the "proc autoreg" statement. More complex GARCH models can be fit using, for example, "garch=(p=2,q=1)". The specification "archtest" requests tests of ARCH effects, that is, tests of the null hypothesis of conditional homoscedasticity versus the alternative of conditional heteroscedasticity.
The output from this SAS program is listed below. The tests of conditional homoscedasticity all reject with p-values below .0001. The estimate of $\phi$ is $-.8226$ in SAS's notation (SAS writes the AR model with the opposite sign), which is $+.8226$ in our notation. This is close to the true value of 0.8.
The estimates of the ARCH parameters are $\hat\alpha_0 = 1.12$ and $\hat\alpha_1 = .70$. The true values are $\alpha_0 = 1$ and $\alpha_1 = .95$. The standard errors of the ARCH parameters are rather large. This is a general phenomenon; time series usually have less information about variance parameters than about the parameters specifying the conditional expectation. An approximate 95% confidence interval for $\alpha_1$ is
$$.70 \pm (2)(0.117) = (.466, .934),$$
which does not quite include the true parameter, 0.95. This could have just been bad luck, though it may indicate that $\hat\alpha_1$ is downward biased. The confidence interval is based on the assumption of unbiasedness and is not valid if there is a sizeable bias.
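The interval arithmetic can be reproduced in a few lines of Python, along with a consistency check on the output. This sketch is our own and is not part of the SAS workflow. It uses the unrounded estimates from the ARCH0 and ARCH1 rows of the output listing below, so the endpoints differ from (.466, .934) in the third decimal. It also checks that the "Uncond Var" reported by SAS agrees with $\hat\alpha_0/(1 - \hat\alpha_1)$, which we assume is how that figure is computed.

```python
# ARCH0 and ARCH1 rows of the SAS output for the simulated data.
alpha0_hat = 1.1241
alpha1_hat, se_alpha1 = 0.6985, 0.1167
alpha1_true = 0.95

# Approximate 95% confidence interval: estimate +/- 2 standard errors.
lower, upper = alpha1_hat - 2 * se_alpha1, alpha1_hat + 2 * se_alpha1
print(f"approximate 95% CI for alpha1: ({lower:.3f}, {upper:.3f})")
print("covers the true value 0.95:", lower <= alpha1_true <= upper)

# Implied marginal variance alpha0 / (1 - alpha1); compare "Uncond Var" in the output.
print(f"implied unconditional variance: {alpha0_hat / (1 - alpha1_hat):.4f}")
```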
Listing of the SAS output for the simulated data

Simulated ARCH(1)/AR(1) data                                       1
13:01 Wednesday, April 4, 2001

The AUTOREG Procedure

Dependent Variable    y

Ordinary Least Squares Estimates

SSE                 2693.22931    DFE
MSE                    9.00746    Root MSE            3
SBC                 1515.48103    AIC                 1511
Regress R-Square        0.0000    Total R-Square
Durbin-Watson           0.4373

Q and LM Tests for ARCH Disturbances

Order           Q    Pr > Q          LM    Pr > LM
    1    119.7578    <.0001    118.6797     <.0001
    2    137.9967    <.0001    129.8491     <.0001
    3    140.5454    <.0001    131.4911     <.0001
    4    140.6837    <.0001    132.1098     <.0001
    5    140.6925    <.0001    132.3810     <.0001
    6    140.7476    <.0001    132.7534     <.0001
    7    141.0173    <.0001    132.7543     <.0001
    8    141.5401    <.0001    132.8874     <.0001
    9    142.1243    <.0001    132.8879     <.0001
   10    142.6266    <.0001    132.9226     <.0001
   11    142.7506    <.0001    133.0153     <.0001
   12    142.7508    <.0001    133.0155     <.0001

                               Standard               Approx
Variable     DF    Estimate       Error    t Value    Pr > |t|
Intercept     1      0.8910      0.1733       5.14

Estimates of Autocorrelations

Lag    Covariance    Correlation
  0        8.9774       1.000000
  1        7.0075       0.780567

Preliminary MSE    3.5076

Estimates of Autoregressive Parameters

                     Standard
Lag    Coefficient      Error    t Value
  1      -0.780567   0.036209     -21.56

Algorithm converged.

GARCH Estimates

SSE               1056.42037    Observations            300
MSE                  3.52140    Uncond Var       3.72785257
Log Likelihood    -549.43844    Total R-Square       0.6077
SBC               1121.69201    AIC              1106.87688
Normality Test        1.5134    Pr > ChiSq           0.4692

                               Standard               Approx
Variable     DF    Estimate       Error    t Value    Pr > |t|
Intercept     1      0.4810      0.3910       1.23      0.2187
AR1           1     -0.8226      0.0266     -30.92      <.0001
ARCH0         1      1.1241      0.1729       6.50      <.0001
ARCH1         1      0.6985      0.1167       5.98      <.0001
8.9.1    Example: S&P 500 returns
This example is Example 10.5 in Pindyck and Rubinfeld (1998). The data are monthly from 1960 to 1996. The variables are the S&P 500 index (FSPCOM), the return on the S&P 500 (RETURNSP), the dividend yield on the S&P 500 index (FSDXP), the 3-month T-bill rate (R3), the change in the 3-month T-bill rate (DR3), the wholesale price index (PW), and the rate of wholesale price inflation (GPW). In this analysis, only RETURNSP, DR3, and GPW are used.
It is expected that variation in stock returns is in part caused by changes in interest rates and changes in the rate of inflation. Therefore, a regression model where RETURNSP is regressed on DR3 and GPW is used. Regression models that regress returns on macroeconomic variables in this way are sometimes called "factor models"; see Bodie, Kane, and Marcus (1999). Figure 8.5 shows the residuals from this regression. The residuals represent the part of the S&P 500 returns that cannot be explained by changes in interest rates and the inflation rate. In the figure, there is some sign of nonconstant volatility. Also, there is no reason to assume that the residuals are uncorrelated, as is assumed in a standard regression model. At the very least, this assumption should be checked. If the data contradict the assumption, then a model with correlated errors should be used.
[Figure 8.5: residuals plotted against year, 1960 to 1996]
Figure 8.5: Residuals when the S&P 500 returns are regressed against the change in the 3-month T-bill rates and the rate of inflation.
An analysis more appropriate for this data set is to use a regression model to specify the conditional expectation of RETURNSP given DR3 and GPW, but not to assume a "standard" regression model with errors (which the residuals estimate) that are independent white noise. Rather we will assume the model
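$$\mathrm{RETURNSP}_t = \gamma_0 + \gamma_1 \mathrm{DR3}_t + \gamma_2 \mathrm{GPW}_t + u_t \qquad (8.7)$$
where $u_t$ is an AR(1)/GARCH(1,1) process.⁴ Therefore,
$$u_t = \phi_1 u_{t-1} + a_t$$
where $a_t$ is a GARCH(1,1) process:
$$a_t = \epsilon_t \sigma_t,$$
where
$$\sigma_t^2 = \alpha_0 + \alpha_1 a_{t-1}^2 + \beta_1 \sigma_{t-1}^2.$$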
Below is a listing of the SAS program used to fit this model. The regression model with AR(1)/GARCH(1,1) errors is specified by the command:
proc autoreg ;
model returnsp = DR3 gpw/nlag = 1  archtest garch=(p=1,q=1);
In this command,
•  the statement "returnsp = DR3 gpw " specifies the regression model, that is, that "returnsp" is the dependent variable and "DR3" and "gpw" are the independent variables.
•  "nlag = 1" specifies the AR(1) structure.
•  "garch=(p=l,q=l)" specifies the GARCH(1,1) structure.
•  "archtest" specifies that tests of conditional heteroscedasticity be performed.
⁴ We denote the regression coefficients by gamma rather than by beta, as is standard, because beta is used for parameters in the GARCH model for $a_t$.
Listing of the SAS program
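options linesize = 65 ;
/* Read the monthly data and form the change in the T-bill rate */
data arch ;
infile 'C:\courses\or473\data\pindyckl05.dat' ;
input month year RETURNSP FSPCOM FSDXP R3 PW GPW;
DR3 = dif(R3) ;   /* DR3 = R3 - lag(R3); missing for the first month */
run ;
title 'S&P 500 monthly data from Pindyck & Rubinfeld, Ex 10.5' ;
title2 'AR(1)/GARCH(1,1) model' ;
/* Returns alone: AR(1) mean with GARCH(1,1) errors */
proc autoreg data = arch ;
model returnsp =/nlag = 1  archtest garch=(p=1,q=1);
run ;
title2 'Regression model with AR(1)/GARCH(1,1)' ;
/* Returns regressed on DR3 and GPW with AR(1)/GARCH(1,1) errors */
proc autoreg data = arch ;
model returnsp = DR3 gpw/nlag = 1  archtest garch=(p=1,q=1);
run ;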
The SAS output is listed below.
From examination of the output, the following conclusions can be reached:
•  The p-values of the Q and LM tests are all very small, at most .0007. Therefore, the errors in the regression model exhibit conditional heteroscedasticity.
•  Ordinary least squares estimates of the regression parameters are:
                                Standard                Approx
Variable      DF   Estimate        Error   t Value   Pr > |t|
Intercept      1     0.0120     0.001755      6.86     <.0001
DR3            1    -0.8293       0.3061     -2.71     0.0070
GPW            1    -0.8550       0.2349     -3.64     0.0003
• Using residuals from the OLS estimates, the estimated residual autocorrelations are:
Estimates of Autocorrelations
Lag    Covariance    Correlation
  0       0.00108       1.000000
  1      0.000253       0.234934
•  Also, using OLS residuals, the estimated AR parameter is:
Estimates of Autoregressive Parameters
                      Standard
Lag    Coefficient       Error    t Value
  1      -0.234934    0.046929      -5.01
•  Assuming AR(1)/GARCH(1,1) errors, the estimated parameters of the regression are:
                                Standard                Approx
Variable      DF   Estimate        Error   t Value   Pr > |t|
Intercept      1     0.0125     0.001875      6.66     <.0001
DR3            1    -1.0665       0.3282     -3.25     0.0012
GPW            1    -0.7239       0.1992     -3.63     0.0003
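-  Notice that these differ slightly from the OLS estimates.
-  Since all p-values are small, both independent variables are significant.
-  However, the regression R-square from the OLS fit is only 0.0551, so the regression has little predictive value; even the Total R-Square of the AR(1)/GARCH(1,1) fit, which also uses the lagged error, is only 0.1058.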
The estimated AR and GARCH parameters are:
                                Standard                Approx
Variable      DF   Estimate        Error   t Value   Pr > |t|
AR1            1    -0.2016       0.0603     -3.34     0.0008
ARCH0          1   0.000147    0.0000688      2.14     0.0320
ARCH1          1     0.1337       0.0404      3.31     0.0009
GARCH1         1     0.7254       0.0918      7.91     <.0001
-  The estimate of $\phi$ is $-.2016$ in SAS's notation but $+.2016$ in our notation. Thus, there is a positive association between current and lagged errors, that is, between returns and lagged returns once the effects of DR3 and GPW are removed.
-  Since all p-values are small, all GARCH parameters are significant.
-  The GARCH1 estimate (0.7254) is much larger than the ARCH1 estimate (0.1337), and their sum, 0.8591, is fairly close to 1; this implies that the conditional variance will exhibit reasonably long persistence of volatility.
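The persistence of volatility can be quantified from these estimates. In a GARCH(1,1) model, the expected deviation of the future conditional variance from the unconditional variance $\alpha_0/(1-\alpha_1-\beta_1)$ shrinks by a factor of $\alpha_1 + \beta_1$ each period. Therefore $\log(0.5)/\log(\alpha_1+\beta_1)$ is the "half-life" of a volatility shock. The sketch below is our own and is not from the notes. It applies these standard GARCH(1,1) formulas to the rounded estimates. The unconditional variance comes out close to the "Uncond Var" of 0.00104656 in the SAS output listing below; the small difference is due to rounding.

```python
from math import log, sqrt

# ARCH0, ARCH1 and GARCH1 estimates from the AR(1)/GARCH(1,1) regression fit.
alpha0, alpha1, beta1 = 0.000147, 0.1337, 0.7254

persistence = alpha1 + beta1                   # alpha1 + beta1 < 1: finite variance
uncond_var = alpha0 / (1.0 - persistence)      # marginal variance of a_t
half_life = log(0.5) / log(persistence)        # months for a volatility shock to decay by half

print(f"alpha1 + beta1 = {persistence:.4f}")
print(f"unconditional variance = {uncond_var:.6f} (sd {sqrt(uncond_var):.4f})")
print(f"half-life of a volatility shock = {half_life:.1f} months")
```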
Listing of SAS output

S&P 500 monthly data from Pindyck & Rubinfeld, Ex 10.5     1
Regression model with AR(1)/GARCH(1,1)
17:04 Tuesday, April 10, 2001

The AUTOREG Procedure

Dependent Variable    RETURNSP

Ordinary Least Squares Estimates

SSE                 0.46677572    DFE                        430
MSE                    0.00109    Root MSE               0.03295
SBC                 -1711.5219    AIC                 -1723.7341
Regress R-Square        0.0551    Total R-Square          0.0551
Durbin-Watson           1.5203

Q and LM Tests for ARCH Disturbances

Order          Q    Pr > Q         LM    Pr > LM
    1    26.8804    <.0001    26.5159     <.0001
    2    27.1508    <.0001    27.1519     <.0001
    3    28.2188    <.0001    28.4391     <.0001
    4    28.6957    <.0001    28.4660     <.0001
    5    33.4112    <.0001    32.6168     <.0001
    6    34.0892    <.0001    32.6962     <.0001
    7    34.4187    <.0001    32.9617     <.0001
    8    34.6542    <.0001    32.9636     <.0001
    9    35.2228    <.0001    33.3330     0.0001
   10    35.3047    0.0001    33.4174     0.0002
   11    35.8274    0.0002    33.9440     0.0004
   12    36.0142    0.0003    33.9507     0.0007

                               Standard               Approx
Variable     DF    Estimate       Error    t Value    Pr > |t|
Intercept     1      0.0120    0.001755       6.86      <.0001
DR3           1     -0.8293      0.3061      -2.71      0.0070
GPW           1     -0.8550      0.2349      -3.64      0.0003

Estimates of Autocorrelations

Lag    Covariance    Correlation
  0       0.00108       1.000000
  1      0.000253       0.234934

Preliminary MSE    0.00102

Estimates of Autoregressive Parameters

                     Standard
Lag    Coefficient      Error    t Value
  1      -0.234934   0.046929      -5.01

Algorithm converged.

GARCH Estimates

SSE               0.44176656    Observations            433
MSE                  0.00102    Uncond Var       0.00104656
Log Likelihood    889.071523    Total R-Square       0.1058
SBC               -1735.6479    AIC               -1764.143
Normality Test       43.0751    Pr > ChiSq           <.0001

                               Standard               Approx
Variable     DF    Estimate       Error    t Value    Pr > |t|
Intercept     1      0.0125    0.001875       6.66      <.0001
DR3           1     -1.0665      0.3282      -3.25      0.0012
GPW           1     -0.7239      0.1992      -3.63      0.0003
AR1           1     -0.2016      0.0603      -3.34      0.0008
ARCH0         1    0.000147   0.0000688       2.14      0.0320
ARCH1         1      0.1337      0.0404       3.31      0.0009
GARCH1        1      0.7254      0.0918       7.91      <.0001
8.10   I-GARCH models
I-GARCH or integrated GARCH processes were designed to model data that have persistent changes in volatility. A GARCH(p, q) process is stationary with a finite variance if
$$\sum_{i=1}^{q} \alpha_i + \sum_{i=1}^{p} \beta_i < 1.$$
A GARCH(p, q) process is called an I-GARCH process if
$$\sum_{i=1}^{q} \alpha_i + \sum_{i=1}^{p} \beta_i = 1.$$
I-GARCH processes are either non-stationary or have an infinite variance.
Infinite variance implies heavy tails, though a distribution can be heavy-tailed with a finite variance. To appreciate what an infinite-variance process can look like, we will do some simulation. Figure 8.6 shows 40,000 observations of ARCH(1) processes with $\alpha_1 = .9$, 1, and 1.8. The same white noise process is used in each of the ARCH(1) processes. All three ARCH(1) processes are stationary, but only the one with $\alpha_1 = .9$ has a finite variance. The second process is an I-GARCH process (actually I-ARCH, since there are no $\beta$ terms, i.e., $p = 0$). The third process has $\alpha_1 > 1$ and so is more extreme than an I-GARCH process. Notice how all three processes do revert to their conditional mean of 0. The larger the value of $\alpha_1$, the more the volatility comes in sharp bursts. The processes with $\alpha_1 = .9$ and $\alpha_1 = 1$ look similar; there is no sudden change in behavior when the variance becomes infinite. The process with $\alpha_1 = .9$ already has heavy tails despite having a finite variance. Increasing $\alpha_1$ from 0.9 to 1 does not increase the tail weight dramatically.
Normal plots of the simulated data in Figure 8.6 are shown in Figure 8.7. Clearly, the larger the value of $\alpha_1$, the heavier the tails of the marginal distribution.
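A simulation like the one in Figure 8.6 can be sketched in Python. The notes do not give $\alpha_0$ or the random numbers, so $\alpha_0 = 1$ and the seed below are assumptions. The same noise path drives all three processes, so at every $t$, $|a_t|$ is nondecreasing in $\alpha_1$ (by induction on $t$ using (8.4)). This makes the heavier tails for larger $\alpha_1$ easy to see.

```python
import numpy as np


def arch1_from_noise(eps, alpha0, alpha1):
    """ARCH(1) process (8.4) driven by a given white-noise path, started at a_0 = 0."""
    a = np.zeros(len(eps))
    for t in range(1, len(eps)):
        a[t] = eps[t] * np.sqrt(alpha0 + alpha1 * a[t - 1] ** 2)
    return a


rng = np.random.default_rng(4)
eps = rng.standard_normal(40_000)                  # one noise path shared by all three
paths = {alpha1: arch1_from_noise(eps, 1.0, alpha1) for alpha1 in (0.9, 1.0, 1.8)}

for alpha1, a in paths.items():
    q999 = np.quantile(np.abs(a), 0.999)           # a simple measure of tail weight
    print(f"alpha1 = {alpha1}: max |a_t| = {np.abs(a).max():.3g}, "
          f"99.9% quantile of |a_t| = {q999:.3g}")
```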
[Figure 8.6 panels: $\alpha_1$ = .9, 1, and 1.8; 40,000 observations each]
Figure 8.6: Simulated ARCH(1) processes with $\alpha_1$ = .9, 1, and 1.8.
[Figure 8.7: normal probability plots of the simulated data in Figure 8.6; the figure is not recoverable from the source]
None of the processes in Figure 8.6 show much persistence of higher volatility. To model persistence of higher volatility, one needs an I-GARCH(p, q) process with $p \ge 1$, that is, with at least one $\beta$ term. Figure 8.8 shows simulations from I-GARCH(1,1) processes. Since $\alpha_1 + \beta_1 = 1$ for these processes, $\beta_1 = 1 - \alpha_1$, and the process is completely specified by $\alpha_0$ and $\alpha_1$. In this figure, $\alpha_0$ is fixed at 1 and $\alpha_1$ is varied. Notice that the conditional variance is very bursty when $\alpha_1 = .95$. When $\alpha_1 = .05$, the conditional standard deviation looks somewhat like a random walk.
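One way to see why the conditional variance of an I-GARCH(1,1) process can wander like a random walk is to substitute $\beta_1 = 1 - \alpha_1$ and $a_t^2 = \sigma_t^2 \epsilon_t^2$ into the GARCH(1,1) recursion. This derivation is our own, not from the notes:
$$\begin{aligned}
\sigma_{t+1}^2 &= \alpha_0 + \alpha_1 a_t^2 + (1 - \alpha_1)\sigma_t^2 \\
&= \alpha_0 + \alpha_1 \sigma_t^2 \epsilon_t^2 + \sigma_t^2 - \alpha_1 \sigma_t^2 \\
&= \sigma_t^2 + \alpha_0 + \alpha_1 \sigma_t^2 (\epsilon_t^2 - 1).
\end{aligned}$$
Since $E(\epsilon_t^2 - 1) = 0$, the conditional variance behaves like a random walk with drift $\alpha_0$. For Gaussian noise, the increments have conditional standard deviation $\sqrt{2}\,\alpha_1 \sigma_t^2$. These increments are small when $\alpha_1 = .05$, which fits the smooth random-walk look. They are large and right-skewed when $\alpha_1 = .95$, since $\epsilon_t^2$ is $\chi^2_1$, which fits the bursts.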
I-GARCH processes can be fit by SAS by adding the specification "type = integrated" into the program, e.g., for the previous example with S&P 500 returns:
proc autoreg ;
model returnsp =/nlag = 1 garch=(p=1,q=1,type=integrated) ;
run ;
For this example, the I-GARCH(1,1) model seems to fit worse than a GARCH(1,1) model according to AIC; see Section 8.13.
8.10.1   What does it mean to have an infinite variance?
A random variable need not have a finite variance. Moreover, its expectation need not exist at all. To appreciate these facts, let $X$ be a random variable with density $f_X$. The expectation of $X$ is
$$E(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx,$$
provided that this integral is defined. If
$$\int_{-\infty}^{0} x f_X(x)\,dx = -\infty \tag{8.8}$$
and
$$\int_{0}^{\infty} x f_X(x)\,dx = \infty, \tag{8.9}$$
8.10. I-GARCH MODELS
[Figure 8.8: Simulations of I-GARCH(1,1) processes with $\alpha_1$ = 0.95, 0.4, 0.2, and 0.05. For each value of $\alpha_1$, panels show the conditional standard deviation and the simulated GARCH(1,1) process.]
then the expectation is, formally, $-\infty + \infty$, which is not defined. If the integrals on the left-hand sides of (8.8) and (8.9) are both finite, then $E(X)$ exists and equals the sum of these two integrals.
Exercise
Suppose that $f_X(x) = 1/4$ if $|x| < 1$ and $f_X(x) = 1/(4x^2)$ if $|x| \ge 1$. Show that
$$\int_{-\infty}^{\infty} f_X(x)\,dx = 1,$$
so that $f_X$ really is a density, but that
$$\int_{-\infty}^{0} x f_X(x)\,dx = -\infty$$
and
$$\int_{0}^{\infty} x f_X(x)\,dx = \infty.$$
One consequence of the expectation not existing is this. Suppose we have a sample of iid random variables with density $f_X$. The law of large numbers says that the sample mean will converge to $E(X)$ as the sample size goes to infinity. However, the law of large numbers holds only if $E(X)$ is defined. Otherwise, there is no point to which the sample mean can converge, and it will just wander without converging.
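The wandering sample mean can be reproduced with the density in the exercise above, taking the constant in $f_X$ to be 1/4 (the value for which $f_X$ integrates to 1). A minimal Python sketch with hypothetical function names: it samples by inverting the CDF, which is $F(x) = -1/(4x)$ for $x \le -1$, $(x+2)/4$ for $-1 < x < 1$, and $1 - 1/(4x)$ for $x \ge 1$, and then records the running sample mean. Rerunning with other seeds should show running means that do not settle down.

```python
import random


def inverse_cdf(u):
    """Quantile function of f(x) = 1/4 for |x| < 1 and 1/(4 x^2) for |x| >= 1."""
    if u < 0.25:
        return -1.0 / (4.0 * u)        # left tail, x <= -1
    if u <= 0.75:
        return 4.0 * u - 2.0           # flat middle piece, -1 <= x <= 1
    return 1.0 / (4.0 * (1.0 - u))     # right tail, x >= 1


def draw(rng):
    """One draw from f by inversion; u is kept in the open interval (0, 1)."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return inverse_cdf(u)


def running_means(xs):
    """Sample mean of the first t observations, for t = 1, ..., len(xs)."""
    total, out = 0.0, []
    for t, x in enumerate(xs, start=1):
        total += x
        out.append(total / t)
    return out


rng = random.Random(3)
means = running_means([draw(rng) for _ in range(100_000)])
print([round(means[t - 1], 3) for t in (10, 100, 1_000, 10_000, 100_000)])
```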
Figure 8.9 shows the sample mean of the first $t$ observations plotted against $t$ for the data in Figure 8.6. The sample mean appears to converge to 0 when $\alpha_1$ = .9 or 1, but when $\alpha_1$ = 1.8 it is unclear what the sample mean is doing. The sample mean decays towards 0 when the process is not in a high-volatility period, but can shoot up or down during a burst of volatility.
[Figure 8.9: Sample means of simulated ARCH(1) processes with $\alpha_1$ = .9, 1, and 1.8.]
Now suppose that the expectation of $X$ exists and equals $\mu_X$. Then the variance of $X$ equals
$$\int_{-\infty}^{\infty} (x - \mu_X)^2 f_X(x)\,dx.$$
If this integral is $+\infty$, then the variance is infinite.
The law of large numbers also implies that the sample variance will converge to the variance of $X$ as the sample size increases. If the variance of $X$ is infinite, then the sample variance will converge to infinity.
Figure 8.10 shows the sample variance of the first $t$ observations plotted against $t$ for the data in Figure 8.6.
In the top panel, the sample variance should be converging to $10 = (1 - \alpha_1)^{-1}$. Maybe it is converging to 10, but it is hard to tell even with 40,000 observations. In the middle and bottom panels the variance is infinite, so the sample variance will converge to infinity. This convergence does appear to be happening in the bottom panel, but it is hard to see in the middle panel. Of course, in the middle panel the value of $\alpha_1$ is on the borderline between finite and infinite variance, and the infinite variance may take a very long time to have its effect.
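A running sample variance like the one in Figure 8.10 can be computed in one pass with Welford's updating formula. The sketch below is an illustration, not the code behind the figure; it repeats the ARCH(1) recursion under the assumptions $\alpha_0 = 1$ and $a_0 = 0$, and all function names and the seed are hypothetical.

```python
import math
import random


def simulate_arch1(alpha0, alpha1, eps):
    """ARCH(1) path a_t = sigma_t * eps_t, sigma_t^2 = alpha0 + alpha1 * a_{t-1}^2, a_0 = 0."""
    a_prev, path = 0.0, []
    for e in eps:
        a_prev = math.sqrt(alpha0 + alpha1 * a_prev ** 2) * e
        path.append(a_prev)
    return path


def running_variances(xs):
    """Sample variance (divisor t - 1) of the first t observations, by Welford's method."""
    mean, m2, out = 0.0, 0.0, []
    for t, x in enumerate(xs, start=1):
        delta = x - mean
        mean += delta / t
        m2 += delta * (x - mean)       # running sum of squared deviations
        out.append(m2 / (t - 1) if t > 1 else float("nan"))
    return out


rng = random.Random(1)
eps = [rng.gauss(0.0, 1.0) for _ in range(40_000)]
for alpha1 in (0.9, 1.0, 1.8):
    v = running_variances(simulate_arch1(1.0, alpha1, eps))
    # With alpha0 = 1, the limit is 1 / (1 - .9) = 10 for alpha1 = .9; otherwise infinite.
    print(alpha1, [round(v[t - 1], 1) for t in (1_000, 10_000, 40_000)])
```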
8.11    GARCH-M processes
We have seen that one can fit regression models with AR/GARCH errors. In fact, we have done that with the S&P 500 data. In some examples, it makes sense to use the conditional standard deviation as one of the regression variables. For example, when the dependent variable is a return, we might expect that higher conditional variability will cause higher returns, because the market demands a higher risk premium for higher risk.
[Figure 8.10: Sample variances of simulated ARCH(1) processes with $\alpha_1$ = .9, 1, and 1.8.]

Models where the conditional standard deviation is a regression variable are called GARCH-in-mean, or GARCH-M, models. They have the form
$$Y_t = \mathbf{x}_t^{T}\boldsymbol{\beta} + \delta\sigma_t + a_t,$$
where $a_t$ is a GARCH process with conditional standard deviation $\sigma_t$.
GARCH-M models can be fit in SAS by adding the keyword "mean" to the GARCH specification, e.g.,
proc autoreg ;
model returnsp =/nlag = 1 garch=(p=1,q=1,mean);
run ;
or for I-GARCH-M
proc autoreg ;
model returnsp =/nlag = 1 garch=(p=1,q=1,mean,type=integrated);
run ;
For the S&P 500 returns data, a GARCH(1,1)-M model was fit in SAS. The estimate of $\delta$ was .5150 with a standard error of .3695. This gives a t-value of 1.39 and a p-value of .1633. Since the p-value is reasonably large, we cannot reject the null hypothesis that $\delta = 0$. Therefore, we see no strong evidence that there are higher returns during times of higher volatility. The volatility of the S&P 500 is market risk, so this finding is a bit surprising. It may be that the effect is small (the estimate of $\delta$ is positive, after all) and cannot be detected with certainty. The AIC criterion does select the GARCH-M model over the plain GARCH model, though by a small margin; see Section 8.13.
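The t-value and p-value quoted above follow from the estimate and its standard error. A quick Python check, assuming a two-sided test based on the normal approximation (SAS labels these p-values "Approx"); the helper name `wald_test` is hypothetical. Differences in the last digit come from rounding of the reported estimate and standard error.

```python
import math


def wald_test(estimate, std_error):
    """t-value and two-sided p-value using the standard normal approximation."""
    t = estimate / std_error
    p = math.erfc(abs(t) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|t|))
    return t, p


# GARCH(1,1)-M estimate of delta and its standard error, as reported above.
t, p = wald_test(0.5150, 0.3695)
print(f"t = {t:.2f}, p = {p:.4f}")
```

The same helper applied to the DELTA row of the E-GARCH-M listing in Section 8.13 (0.5067 and 0.3511) gives a t-value near 1.44 and a p-value near .149.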
8.12   E-GARCH
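The exponential GARCH, or E-GARCH, model is
$$\log(\sigma_t) = \alpha_0 + \sum_{i=1}^{q} \alpha_i\, g(\epsilon_{t-i}) + \sum_{i=1}^{p} \beta_i \log(\sigma_{t-i}),$$
where
$$g(\epsilon_t) = \theta\epsilon_t + \gamma\{|\epsilon_t| - E(|\epsilon_t|)\}$$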
and $\epsilon_t = a_t/\sigma_t$. Since $\log(\sigma_t)$ can be negative, there are no constraints on the parameters.
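Notice that
$$g(\epsilon_t) = -\gamma E(|\epsilon_t|) + (\gamma + \theta)|\epsilon_t| \quad \text{if } \epsilon_t > 0,$$
and
$$g(\epsilon_t) = -\gamma E(|\epsilon_t|) + (\gamma - \theta)|\epsilon_t| \quad \text{if } \epsilon_t < 0.$$
If $\epsilon_t$ is standard normal, it is a good calculus exercise to show that $E(|\epsilon_t|) = \sqrt{2/\pi} = .7979$.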
Typically, $-1 < \theta < 0$, so that (when $\gamma = 1$) $0 < \gamma + \theta < \gamma - \theta$. For example, $\theta = -.7$ in the S&P 500 example; see below. The function $g$ with $\theta = -.7$ is plotted in the top left panel of Figure 8.11. Notice that $g(\epsilon_t)$ is negative if $|\epsilon_t|$ is close to zero; small values of noise decrease $\sigma_t$. If $|\epsilon_t|$ is large, then $\sigma_t$ increases. With a negative value of $\theta$, $\sigma_t$ increases more rapidly as a function of $|\epsilon_t|$ when $\epsilon_t$ is negative than when $\epsilon_t$ is positive.
In finance, the "leverage effect" predicts that an asset's price will become more volatile when its price decreases. This is the type of behavior obtained when $\theta < 0$. The ability to accommodate leverage effects was the reason that the E-GARCH model was introduced by Daniel Nelson.
The function $g$ for several other values of $\theta$ is also shown in Figure 8.11. When $\theta = 0$ (top right), the function is symmetric about 0. The bottom right panel, where $\theta = -1$, shows an extreme case where $g(\epsilon_t)$ is negative for all positive $\epsilon_t$.
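The function $g$ and the value of $E(|\epsilon_t|)$ can be checked numerically. A minimal sketch, assuming standard normal $\epsilon_t$ and $\gamma = 1$ (the value SAS uses, as described next); the name `g` follows the text, and `normal_abs_mean` is a hypothetical helper that approximates $E(|\epsilon_t|)$ by a midpoint rule.

```python
import math

E_ABS_EPS = math.sqrt(2.0 / math.pi)   # E(|eps_t|) for standard normal eps_t


def g(eps, theta, gamma=1.0):
    """E-GARCH function g(eps) = theta * eps + gamma * (|eps| - E|eps|)."""
    return theta * eps + gamma * (abs(eps) - E_ABS_EPS)


def normal_abs_mean(lo=-10.0, hi=10.0, n=200_000):
    """Midpoint-rule approximation of E|eps| = integral of |x| * phi(x) dx."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += abs(x) * math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return total * h


print(round(normal_abs_mean(), 4), round(E_ABS_EPS, 4))
for theta in (-0.7, 0.0, 0.7, -1.0):             # the four panels of Figure 8.11
    print(theta, [round(g(x, theta), 3) for x in (-2.0, -0.5, 0.5, 2.0)])
```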
SAS fits the E-GARCH model with $\gamma$ fixed at 1 and $\theta$ estimated. The E-GARCH model is specified by using "type=exp" as in
proc autoreg ;
model returnsp =/nlag = 1 garch=(p=1,q=1,mean,type=exp);
run ;
This command specifies both a GARCH-in-mean effect and the E-GARCH model. Omitting "mean" removes the GARCH-in-mean effect.
[Figure 8.11: The g function for the S&P 500 data ($\theta = -0.7$, top left panel) and for $\theta = 0$ (top right), $\theta = 0.7$ (bottom left), and $\theta = -1$ (bottom right).]
8.13   Back to the S&P 500 example
SAS can fit six different AR(1)/GARCH(1,1) models, since SAS allows "type" to be "integrated," "exp," or "nonneg." The last is the default and specifies a GARCH model with non-negativity constraints. Moreover, for each of these three types we can specify whether or not a GARCH-in-mean effect is included. Table 8.1 contains the AIC statistics for the six models. The models are ordered from best-fitting to worst-fitting according to AIC; remember that a smaller AIC is better.
It seems that the E-GARCH-M model is best, though the E-GARCH model fits nearly as well. The E-GARCH-M model will be used in the remaining discussion. To see if more AR or GARCH parameters would improve the fit, AR(2), E-GARCH(1,2)-M, E-GARCH(2,1)-M, and E-GARCH(2,2)-M models were tried, but none of these lowered AIC or had all parameters significant at p = .1. Thus, AR(1)/E-GARCH(1,1) appears to be a good fit to the noise, and adding a GARCH-in-mean term to the regression model seems reasonable although it does not improve the fit very much.

Model	AIC	ΔAIC
E-GARCH-M	-1783.9	0
E-GARCH	-1783.1	0.8
GARCH-M	-1764.6	19.3
GARCH	-1764.1	19.8
I-GARCH-M	-1758.0	25.9
I-GARCH	-1756.4	27.5
Table 8.1: AIC statistics for six AR(1)/GARCH(1,1) models fit to the S&P 500 returns data. ΔAIC is the change in AIC between a given model and E-GARCH-M.
The fit to this model is in the SAS output listed below.
Listing of SAS output for the E-GARCH-M model:

S&P 500 monthly data from Pindyck & Rubinfeld, Ex 10.5
Regression model with AR(1)/E-GARCH(1,1)-M
11:52 Sunday, April 15, 2001

The AUTOREG Procedure

Estimates of Autoregressive Parameters

                          Standard
Lag     Coefficient          Error    t Value
  1       -0.234934       0.046929      -5.01

Algorithm converged.

Exponential GARCH Estimates

SSE               0.44211939    Observations            433
MSE                  0.00102    Uncond Var
Log Likelihood    900.962569    Total R-Square       0.1050
SBC               -1747.2885    AIC              -1783.9251
Normality Test       24.9607    Pr > ChiSq           <.0001

                                 Standard                Approx
Variable     DF    Estimate         Error    t Value    Pr > |t|
Intercept     1   -0.003791        0.0102      -0.37      0.7095
DR3           1     -1.2062        0.3044      -3.96      <.0001
GPW           1     -0.6456        0.2153      -3.00      0.0027
AR1           1     -0.2376        0.0592      -4.01      <.0001
EARCH0        1     -1.2400        0.4251      -2.92      0.0035
EARCH1        1      0.2520        0.0691       3.65      0.0003
EGARCH1       1      0.8220        0.0606      13.55      <.0001
THETA         1     -0.6940        0.2646      -2.62      0.0087
DELTA         1      0.5067        0.3511       1.44      0.1490
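As a consistency check on the listing, the AIC and SBC values can be recovered from the log likelihood. A minimal sketch, assuming the standard definitions AIC = -2 log L + 2k and SBC = -2 log L + k log n, with k = 9 (the number of rows in the parameter table) and n = 433 observations; under these assumptions the reported values are reproduced to the printed precision. The helper name is hypothetical.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """AIC = -2 log L + 2k and SBC = -2 log L + k log(n)."""
    aic = -2.0 * log_likelihood + 2.0 * n_params
    sbc = -2.0 * log_likelihood + n_params * math.log(n_obs)
    return aic, sbc


# Log likelihood and sample size from the listing; k = 9 estimated parameters
# (Intercept, DR3, GPW, AR1, EARCH0, EARCH1, EGARCH1, THETA, DELTA).
aic, sbc = information_criteria(900.962569, 9, 433)
print(f"AIC = {aic:.4f}, SBC = {sbc:.4f}")
```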
8.14    The GARCH zoo
There are many more types of GARCH models than the few mentioned so far. I've discussed only the most widely used models that can be fit in SAS. The number of models seems limited only by the number of letters in the alphabet, not the imagination of econometricians! Here's a sample of other GARCH models mentioned in Bollerslev, Engle, and Nelson (1994):
•  QARCH = quadratic ARCH
•  TARCH = threshold ARCH
•  STARCH = structural ARCH
•  SWARCH = switching ARCH
•  QTARCH = qualitative threshold ARCH
•  vector ARCH
•  diagonal ARCH
•  factor ARCH
8.15    Applications of GARCH in finance
GARCH models were developed by econometricians working with business and finance data, and their applications to finance have been extensive. The review paper by Bollerslev, Engle, and Nelson (1994) lists hundreds of references.
Finance models such as the CAPM and the Black-Scholes model for option pricing assume a constant conditional variance. When this assumption is false, use of these models can lead to serious errors. Therefore, generalization of finance models to include GARCH errors has been a hot topic. See Bollerslev, Engle, and Wooldridge (1988) and Duan (1996a, 1996b) for some examples of finance models with conditional heteroscedasticity.
Rossi (1996) is a collection of papers, many reprinted from finance journals, on modeling stock market volatility with GARCH models.
8.16    Summary
•  The marginal, or unconditional, distribution of a stationary process is the distribution of an observation from the process given no information about the previous or future observations
-  by stationarity the marginal distribution must be constant
-  in particular, the marginal mean and variance are constant
•  Besides the marginal distribution, we are interested in the conditional distribution of the next observation given the current information set of present and past values of the process, and perhaps of other processes
•  For ARMA processes the conditional mean is non-constant but the conditional variance is constant
•  The constant conditional variance of ARMA processes makes them unsuitable for modeling the volatility of financial markets
•  GARCH processes have non-constant conditional variance and were developed to model volatility
•  GARCH processes can be used as the "noise" term of an ARMA process
-  ARMA/GARCH processes have both a non-constant conditional mean and a non-constant conditional variance
-  GARCH and ARMA/GARCH processes can be estimated by maximum likelihood.
-  Proc Autoreg in SAS fits AR/GARCH models
•  The simple ARCH(q) models have bursts of volatility but cannot model persistent volatility
•  The generalized ARCH (GARCH) models can model persistent volatility
•  The marginal distribution of a GARCH process has heavier tails than the normal distribution.
-  heavy tails = outlier prone
-  in fact, for certain parameter values a GARCH process will have an infinite variance, which is an extreme case of heavy tails
*  I-GARCH (integrated GARCH) models are examples of GARCH models with infinite variance
•  If the marginal variance is infinite, then the sample variance will converge to infinity as the sample size increases
•  For extremely heavy tails, the marginal expectation may not exist
-  then there exists no point to which the sample mean can converge
*  the sample mean will wander aimlessly
•  ARMA/GARCH processes can be used as the noise term in regression models
-  SAS's Proc Autoreg can use an AR/GARCH noise term in a regression model
•  The GARCH-M models use the conditional standard deviation as an independent variable in the regression
•  The "leverage effect" occurs when a negative return (drop in price) increases the volatility of future returns because the denominator of those returns is smaller.
•  E-GARCH models were designed to capture the leverage effect
-  in an E-GARCH model, the log of the conditional standard deviation is modeled as an ARMA process, but with the white noise process $\epsilon_t$ replaced by another white noise process $g(\epsilon_t)$
-  there is no need for non-negativity constraints on the parameters, such as those in an ordinary GARCH model, since the log standard deviation can be negative
-  the parameter $\theta$ in an E-GARCH model determines the leverage effects
*  $\theta < 0$ => leverage effects
*  $\theta = 0$ => no leverage
*  $\theta > 0$ => positive returns increase volatility (this would be the opposite of the leverage effect and is not expected to happen in practice)
•  In the S&P 500 example we found that
-  returns are negatively associated with changes in interest rates (an increase in interest rates decreases returns)
-  returns are negatively associated with changes in wholesale prices
-  returns are positively associated with returns lagged one month ($\phi = -.2376$ is negative in the SAS output, but SAS's definition of $\phi$ is the negative of ours; our $\phi$ is +.2376)
-  there are leverage effects, since an E-GARCH model fits better than a GARCH model and $\theta = -.7$
-  there is slight evidence of a GARCH-in-mean effect, that is, there is some reason to believe that there is a risk premium
•  There is a wide variety of other GARCH models in the literature, but the ones discussed here, ARCH(q), GARCH(p, q), E-GARCH, GARCH-M, and I-GARCH, are probably enough to know about since they can model a wide variety of data types
-  the models discussed in these notes are the ones that can be fit by SAS
• There is a large and growing literature on financial models with returns following GARCH processes
8.17   References
Bodie, Z., Kane, A., and Marcus, A. (1999), Investments, Irwin/McGraw-Hill, Boston.
Bollerslev, T. (1986), Generalized autoregressive conditional heteroskedasticity, J. of Econometrics, 31, 307-327.
Bollerslev, T., Engle, R.F., and Nelson, D.B. (1994), ARCH models, in Handbook of Econometrics, Volume IV, Engle, R.F., and McFadden, D.L., editors, Elsevier.
Bollerslev, T., Chou, R.Y., and Kroner, K.F. (1992), ARCH modeling in finance, J. of Econometrics, 52, 5-59.
Bollerslev, T., and Engle, R.F. (1993), Common persistence in conditional variances, Econometrica, 61, 167-186.
Bollerslev, T., Engle, R.F., and Wooldridge, J.M. (1988), A capital asset pricing model with time-varying covariances, J. of Political Economy, 96, 116-131.
Carroll, R.J., and Ruppert, D. (1988), Transformation and Weighting in Regression, Chapman & Hall, New York.
Duan, J-C. (1996a). A unified theory of option pricing under stochastic volatility — from GARCH to diffusion, manuscript, (available at http://www.bm.ust.hk/ fina/staff/jcduan.html)
Duan, J-C. (1996b). Term structure and bond option pricing under GARCH, manuscript, (available at http://www.bm.ust.hk/ fina/staff/jcduan. html)
Enders, W. (1995), Applied Econometric Time Series, Wiley, New York.
Engle, R.F. (1982), Autoregressive conditional heteroskedasticity with estimates of variance of U.K. inflation, Econometrica, 50, 987-1008.
Nelson, D.B. (1989), Modelling stock market volatility changes, ASA 1989 Proceedings of the Business and Economics Statistics Section, pp. 93-98. [Reprinted in Rossi (1996)]
Pindyck, R.S., and Rubinfeld, D.L. (1998), Econometric Models and Economic Forecasts, Irwin/McGraw-Hill, Boston.
Rossi, P.E. (1996), Modelling Stock Market Volatility, Academic Press, San Diego.
SAS Institute (1993), SAS/ETS User's Guide, Version 6, 2nd Edition, SAS Institute, Cary, NC.
Chapter 9
Fixed Income Securities: 4/30/01
9.1    Introduction
Corporations finance their operations by selling stock and bonds. Owning a share of stock means partial ownership of the company. You share in both the profits and losses of the company, so nothing is guaranteed.
Owning a bond is different. When you buy a bond you are loaning money to the corporation. The corporation is obligated to pay back the principal and to pay interest as stipulated by the bond. You receive a fixed stream of income, unless the corporation defaults on the bond. For this reason, bonds are called "fixed-income" securities.
It might appear that bonds are risk-free, almost stodgy. This is not the case. Many bonds are long-term, e.g., 20 or 30 years. Even if the corporation stays solvent or if you buy a US Treasury bond where default is virtually impossible, your income from the bond is guaranteed only if you keep the bond to maturity. If you sell the bond before maturity, your return will depend on changes in the price of the bond due to changes in interest rates.
The interest rate of your bond is fixed, but in the market interest rates fluctuate. Therefore, the market value of your bond fluctuates too. For example, if you buy a bond paying 5% and the rate of interest increases to 6%, then your bond is inferior to new bonds offering 6%. Consequently, the price of your bond will decrease. If you sold the bond, you would lose money. So much for a "fixed income" stream!
If you have ever bought a CD, which really is a bond that you buy from a bank or credit union, you will have noticed that the interest rate depends on the maturity of the CD. This is a general phenomenon. For example, on March 28, 2001, the interest rate of Treasury bills¹ was 4.23% for 3-month bills. The yields on Treasurys were 4.41%, 5.01%, and 5.46% for 2-, 10-, and 30-year maturities, respectively. The term structure of interest rates describes how rates of interest change with the maturity of bonds.
In this chapter we will study how bond prices fluctuate due to interest rate changes. We will also study how the term structure of interest rates can be determined.
9.2   Zero coupon bonds
Zero-coupon bonds, also called pure discount bonds, pay no principal or interest until maturity. A "zero" has a par value, which is the payment made to the bondholder at maturity. The zero sells for less than par, which is the reason it is a "discount bond."
For example, consider a 20-year zero with a par value of $1000 and 6% interest compounded annually. The price is the present value of $1000 with discounting annually at 6%. That is, the price is
$$\frac{\$1000}{(1.06)^{20}} = \$311.80.$$
¹ Treasury bills have maturities of one year or less, Treasury notes have maturities from one to ten years, and Treasury bonds have maturities from 10 to 30 years.
If the interest is 6% but compounded every six months, then the price is
$$\frac{\$1000}{(1.03)^{40}} = \$306.56,$$
and if the interest is 6% compounded continuously, then the price is
$$\frac{\$1000}{\exp\{(.06)(20)\}} = \$301.19.$$
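The three prices can be reproduced with a short Python function. This is a sketch; the name `zero_price` and its arguments are hypothetical, and continuous compounding is selected by passing no compounding frequency.

```python
import math


def zero_price(par, rate, years, periods_per_year=None):
    """Price of a zero-coupon bond.

    periods_per_year=None means continuous compounding; otherwise the annual
    rate is split evenly across periods_per_year compounding periods.
    """
    if periods_per_year is None:
        return par * math.exp(-rate * years)
    per_period = rate / periods_per_year
    return par / (1.0 + per_period) ** (periods_per_year * years)


print(round(zero_price(1000, 0.06, 20, 1), 2))   # annual compounding
print(round(zero_price(1000, 0.06, 20, 2), 2))   # semi-annual compounding
print(round(zero_price(1000, 0.06, 20), 2))      # continuous compounding
```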
9.2.1    Price and returns fluctuate with the interest rate
For concreteness, assume semi-annual compounding. Suppose you just bought the zero for $306.56 and then six months later the interest rate increased to 7%. The price would now be
$$\frac{\$1000}{(1.035)^{39}} = \$261.41,$$
so your investment would drop by ($306.56 - $261.41) = $45.15. You will still get your $1000 if you keep the bond for 20 years, but if you sell it now you will lose $45.15. This is a return of
$$\frac{-45.15}{306.56} = -14.73\%$$
for a half-year, or -29.46% per year. And the interest rate only changed from 6% to 7%!
If the interest rate dropped to 5% after six months, then your bond would be worth
$$\frac{\$1000}{(1.025)^{39}} = \$381.74.$$
This would be an annual rate of return of
$$2\left(\frac{381.74 - 306.56}{306.56}\right) = 49.05\%.$$
If the interest rate remained unchanged at 6%, then the price of the bond would be
$$\frac{\$1000}{(1.03)^{39}} = \$315.75.$$
The annual rate of return would be
$$2\left(\frac{315.75 - 306.56}{306.56}\right) = 6\%.$$
Thus, if the interest rate does not change, you can earn a 6% annual rate of return by selling the bond before maturity. If the interest rate does change, however, the 6% annual rate of return is guaranteed only if you keep the bond until maturity.
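The three scenarios can be compared side by side. A minimal Python sketch that annualizes a half-year return by doubling it, as in the text; the function name is hypothetical. Because the prices are not rounded to the cent before dividing, the 7% case may differ from the text in the last digit.

```python
def zero_price(par, rate_per_half_year, half_years):
    """Price of a zero-coupon bond with semi-annual compounding."""
    return par / (1.0 + rate_per_half_year) ** half_years


price_now = zero_price(1000, 0.03, 40)        # 20-year zero bought at 6%/year
annual_returns = {}
for new_rate in (0.07, 0.05, 0.06):
    # Six months later 39 half-years remain, discounted at the new annual rate / 2.
    price_later = zero_price(1000, new_rate / 2, 39)
    annual_returns[new_rate] = 2 * (price_later - price_now) / price_now
    print(f"{new_rate:.0%}: price {price_later:.2f}, annual return {annual_returns[new_rate]:.2%}")
```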
9.3   Coupon bonds
Coupon bonds make regular interest payments.² Coupon bonds generally sell at par when issued. At maturity, one receives the principal and the final interest payment.
As an example, consider a 20-year coupon bond with a par value of $1000 and 6% annual interest with semi-annual coupon payments. Each coupon payment will be $30. Thus, the bondholder receives 40 payments of $30, one every six months, plus a principal payment of $1000 after 20 years. One can check that the present value of all payments, with discounting at the 6% annual rate (3% semi-annual), equals $1000:
$$\sum_{t=1}^{40} \frac{30}{(1.03)^t} + \frac{1000}{(1.03)^{40}} = 1000.$$
After six months, if the interest rate is unchanged, then the bond (including the first coupon payment, which is now due) is worth
$$\sum_{t=0}^{39} \frac{30}{(1.03)^t} + \frac{1000}{(1.03)^{39}} = (1.03)\left\{\sum_{t=1}^{40} \frac{30}{(1.03)^t} + \frac{1000}{(1.03)^{40}}\right\} = 1030,$$
which is a 6% annual return, as expected. If the interest rate increases to 7%, then after six months the bond (plus the interest due) is only worth
$$\sum_{t=0}^{39} \frac{30}{(1.035)^t} + \frac{1000}{(1.035)^{39}} = (1.035)\left\{\sum_{t=1}^{40} \frac{30}{(1.035)^t} + \frac{1000}{(1.035)^{40}}\right\} = 924.49.$$
² At one time actual coupons were attached to the bond, one coupon for each interest payment. When a payment was due, its coupon could be clipped off and sent to the issuing company for payment.
This is an annual return of
$$2\left(\frac{924.49 - 1000}{1000}\right) = -15.10\%.$$
If the interest rate drops to 5% after six months, then the investment is worth
$$\sum_{t=0}^{39} \frac{30}{(1.025)^t} + \frac{1000}{(1.025)^{39}} = (1.025)\left\{\sum_{t=1}^{40} \frac{30}{(1.025)^t} + \frac{1000}{(1.025)^{40}}\right\} = 1153.6, \tag{9.1}$$
and the annual return is
$$2\left(\frac{1153.6 - 1000}{1000}\right) = 30.72\%.$$
Some general formulas
Let's derive some useful formulas. If a bond with a par value of PAR matures in T years and makes semi-annual payments of C, and the discount rate (rate of interest) is r per half-year, then the value of the bond when it is issued is
$$\sum_{t=1}^{2T} \frac{C}{(1+r)^t} + \frac{\mathrm{PAR}}{(1+r)^{2T}} = \frac{C}{r}\left\{1 - (1+r)^{-2T}\right\} + \frac{\mathrm{PAR}}{(1+r)^{2T}} = \frac{C}{r} + \left\{\mathrm{PAR} - \frac{C}{r}\right\}(1+r)^{-2T}. \tag{9.2}$$
If $C = \mathrm{PAR} \times r$, then the value of the bond when issued is PAR. The value six months later is $(1 + r)$ times the value in equation (9.2). The MATLAB function "bondvalue.m" computes (9.2). The call to this function is
bondvalue(c,T,r,par).
For example, if the coupon is C = 30, if T = 30, and if after six months r = 3.1%/half-year (or 6.2%/year), then the bond is worth
$$(1.031)\left[\frac{30}{.031}\left\{1 - (1.031)^{-60}\right\} + \frac{1000}{(1.031)^{60}}\right] = 1003.1.$$
This value was computed by MATLAB with the call
1.031*bondvalue(30,30,.031,1000).
Similarly, (9.1) was computed with the MATLAB call
1.025*bondvalue(30,20,.025,1000).
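The source of bondvalue.m is not reproduced in these notes. The Python function below is a hypothetical counterpart that evaluates (9.2) with the same argument order as the call bondvalue(c,T,r,par); a direct sum of the discounted cash flows is included only as a check of the closed form.

```python
def bond_value(c, T, r, par):
    """Value at issue from (9.2): coupon c per half-year, T years, rate r per half-year."""
    return c / r + (par - c / r) * (1.0 + r) ** (-2 * T)


def bond_value_by_sum(c, T, r, par):
    """The same value, summing the 2T discounted coupons and the discounted par."""
    coupons = sum(c / (1.0 + r) ** t for t in range(1, 2 * T + 1))
    return coupons + par / (1.0 + r) ** (2 * T)


print(round(1.031 * bond_value(30, 30, 0.031, 1000), 1))   # six months after issue, r = .031
print(round(1.025 * bond_value(30, 20, 0.025, 1000), 2))   # the value in (9.1)
print(abs(bond_value(40, 30, 0.0324, 1000) - bond_value_by_sum(40, 30, 0.0324, 1000)))
```

Multiplying by $(1 + r)$ follows the rule stated above for the value six months after issue.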
Derivation of (9.2)
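The summation formula for a finite geometric series is
$$\sum_{i=0}^{T} x^i = \frac{1 - x^{T+1}}{1 - x}, \tag{9.3}$$
provided that $x \ne 1$. Therefore, applying (9.3) with $x = (1+r)^{-1}$,
$$\sum_{t=1}^{2T} \frac{C}{(1+r)^t} = \frac{C}{1+r}\sum_{t=0}^{2T-1} \left(\frac{1}{1+r}\right)^t = \frac{C\{1 - (1+r)^{-2T}\}}{(1+r)\{1 - (1+r)^{-1}\}} \tag{9.4}$$
$$= \frac{C}{r}\left\{1 - (1+r)^{-2T}\right\}. \tag{9.5}$$
Adding $\mathrm{PAR}(1+r)^{-2T}$ to (9.5) gives (9.2).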
9.4   Yield to maturity
Suppose a bond with T = 30 and C = 40 is selling for $1200, $200 above par. If the bond were selling at par, then the interest rate would be .04/half-year (= .08/year). The 4%/half-year rate is called the coupon rate.
But the bond is not selling at par. If you purchase the bond at $1200, you will make less than 8% per year interest. There are two reasons why the rate of interest is less than 8%. First, the coupon payments are $40, or 40/1200 = 3.333%/half-year of the $1200 investment; 3.333% is called the current yield. Second, at maturity you only get back $1000 of the $1200 investment. The current yield overestimates the return since it does not account for this loss of capital.
The yield to maturity is a measure of the average rate of return, including the loss (or gain) of capital because the bond was purchased above (or below) par. For this bond, the yield to maturity is the value of r that solves
$$1200 = \frac{40}{r} + \left\{1000 - \frac{40}{r}\right\}(1+r)^{-60}. \tag{9.6}$$
The right-hand side of (9.6) is (9.2) with C = 40, T = 30, and PAR = 1000. It is easy to solve equation (9.6) numerically. The MATLAB program yield.m does the following:
•  computes the bond price for each r value on a grid
•  graphs bond price versus r (this is not necessary but it's fun to see the graph)
•  interpolates to find the value of r when bond value equals 1200
One finds that the yield to maturity is 0.0324. Figure 9.1 shows the graph of bond price versus r and shows that r = .0324 maps to a bond price of $1200.
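A yield can also be found by bisection instead of the grid-and-interpolate approach of yield.m; this swaps in a different root-finding technique and is only a sketch, with hypothetical function names. It relies on the price (9.2) being a decreasing function of r.

```python
def bond_value(c, T, r, par):
    """Bond price from (9.2): coupon c per half-year, T years, rate r per half-year."""
    return c / r + (par - c / r) * (1.0 + r) ** (-2 * T)


def yield_to_maturity(price, c, T, par, lo=1e-6, hi=1.0, tol=1e-10):
    """Solve bond_value(c, T, r, par) = price for r by bisection.

    The price is decreasing in r, so a price above the target means r is too small.
    """
    if not bond_value(c, T, hi, par) <= price <= bond_value(c, T, lo, par):
        raise ValueError("price is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bond_value(c, T, mid, par) > price:
            lo = mid      # price too high, so the yield must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(round(yield_to_maturity(1200, 40, 30, 1000), 4))   # bond selling above par
print(round(yield_to_maturity(900, 40, 30, 1000), 4))    # bond selling below par
```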
The yield to maturity of .0324 is less than the current yield of 0.0333 which is less than the coupon rate of 40/1000 = .04. (All three rates are rates per half-year.) Thus, we see that
•  coupon rate > current yield
-  since the bond sells above par
•  current yield > yield to maturity
-  since yield to maturity accounts for the loss of capital when at the maturity date you only get back $1000 of the $1200 investment
Whenever, as in this example, the bond is selling above par, we have
coupon rate > current yield > yield to maturity              (9.7)
Everything is reversed if the bond is selling below par. For example, if the price of the bond were only $900, then
•  the yield to maturity would be 0.0448 (as before, this value can be determined by "yield.m" using interpolation)
•  the current yield would be 40/900 = 0.0444
•  the coupon rate would still be 40/1000 = .04
Therefore we would have
coupon rate < current yield < yield to maturity,              (9.8)
which is just the opposite of (9.7).

[Figure 9.1: Bond price versus the interest rate r (par = 1000, coupon payment = 40, T = 30), and determining by interpolation the yield to maturity when the price equals $1200.]
9.4.1    Spot rates
The yield to maturity of a zero coupon bond of maturity n years is called the n-year spot rate.
A coupon bond is a bundle of zero coupon bonds, one for each coupon payment and a final one for the principal payment. The component zeros have different maturity dates and therefore different spot rates. The yield to maturity of the coupon bond is a complex "average" of the spot rates of the zeros in this bundle.
9.5   Term structure
On January 26, 2001, the Ithaca Journal stated that the 1-year T-bill rate was 4.83% and the 30-year Treasury bond rate was 6.11%. This is typical; short- and long-term rates usually do differ. Such differences can be seen in Figure 10.5 of Campbell et al. or Figure 15.7 of Bodie, Kane, and Marcus (1999).
Often short-term rates are lower than long-term rates. This makes sense since long-term bonds are riskier. Long-term bond prices fluctuate more with interest rate changes, and these bonds are often sold before maturity. In contrast, a 90-day or even 1-year T-bill is often kept to maturity and so is really a risk-free "fixed income security." However, during periods of very high short-term rates, the short-term rates may be higher than the long-term rates. The reason is that the market believes that rates will return to historic levels, and no one will commit to the high interest rate for, say, 20 or 30 years.
The term structure of interest rates is a description of how, at a given time, yield to maturity depends on maturity. Term structure for all maturities up to n years can be described by any one of the following:
•  prices of zero coupon bonds of maturities 1-year, 2-years, ..., n-years, denoted here by P(1), P(2), ..., P(n)
•  spot rates (yields to maturity of zero coupon bonds) of maturities 1-year, 2-years, ..., n-years, denoted by $y_1, \ldots, y_n$
•  forward rates $r_1, \ldots, r_n$
As will be seen below, each of the sets
•  $\{P(1), \ldots, P(n)\}$,
•  $\{y_1, \ldots, y_n\}$, and
•  $\{r_1, \ldots, r_n\}$
can be computed from either of the other two sets. For example, (9.10) gives $\{P(1), \ldots, P(n)\}$ in terms of $\{r_1, \ldots, r_n\}$. Also, equations (9.11) and (9.12) give $\{y_1, \ldots, y_n\}$ in terms of $\{P(1), \ldots, P(n)\}$ or $\{r_1, \ldots, r_n\}$, respectively.
Term structure can be described by breaking down the time interval between the present time and the maturity time of a bond into short time segments with a constant interest rate within each segment, but with interest rates varying between segments. For example, a 3-year loan can be considered as three consecutive 1-year loans.
Example:
As an illustration, suppose that the three 1-year loans have the forward interest rates listed in Table 9.1.
Year (i)	Interest rate ($r_i$)
1	6%
2	7%
3	8%
Table 9.1: Forward interest rate example
Using the forward rates in Table 9.1, we see that a par $1000 1-year zero would sell for
$$\frac{1000}{1 + r_1} = \frac{1000}{1.06} = \$943.40 = P(1).$$
A par $1000 2-year zero would sell for
$$\frac{1000}{(1 + r_1)(1 + r_2)} = \frac{1000}{(1.06)(1.07)} = \$881.68 = P(2).$$
A par $1000 3-year zero would sell for
$$\frac{1000}{(1 + r_1)(1 + r_2)(1 + r_3)} = \frac{1000}{(1.06)(1.07)(1.08)} = \$816.37 = P(3).$$
The general formula for the present value of $1 paid n periods from now is
$$\frac{1}{(1 + r_1)(1 + r_2)\cdots(1 + r_n)}. \tag{9.9}$$
Here $r_i$ is the forward interest rate during the $i$th period. By "forward rate" we mean the interest rate for that period that is agreed upon now.
Letting P(n) be the price of an n-year par $1000 zero coupon bond, P(n) is $1000 times the discount factor in (9.9), that is,
$$P(n) = \frac{1000}{(1+r_1)\cdots(1+r_n)}. \tag{9.10}$$
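The discounting in (9.9) and (9.10) is a running product, so all of P(1), ..., P(n) can be computed in one pass over the forward rates. Below is a minimal Python sketch, assuming annual periods and par $1000 as in the example; the function name `zero_prices_from_forwards` is illustrative, not from the text.

```python
def zero_prices_from_forwards(forward_rates, par=1000.0):
    """Return [P(1), ..., P(n)] from annual forward rates r_1, ..., r_n via (9.10)."""
    prices = []
    growth = 1.0  # running product (1 + r_1)(1 + r_2)...(1 + r_k)
    for r in forward_rates:
        growth *= 1.0 + r
        prices.append(par / growth)
    return prices


# Forward rates from Table 9.1
prices = zero_prices_from_forwards([0.06, 0.07, 0.08])
print([round(p, 2) for p in prices])  # [943.4, 881.68, 816.37]
```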
Back to the example
Let's look at the yields to maturity. For a 1-year zero, the yield to maturity y_1 solves
$$\frac{1000}{1+y_1} = 943.40,$$
which implies that y_1 = .06. Nothing surprising here, since r_1 = .06! For a 2-year zero, the yield to maturity is the y_2 that solves
$$\frac{1000}{(1+y_2)^2} = 881.68.$$
Thus,
$$y_2 = \sqrt{\frac{1000}{881.68}} - 1 = .0650.$$
Since 1000/(1 + y_2)^2 and 1000/{(1 + r_1)(1 + r_2)} both equal P(2), y_2 is also given by
$$y_2 = \sqrt{(1+r_1)(1+r_2)} - 1 = \sqrt{(1.06)(1.07)} - 1 = .0650.$$
For a 3-year zero, the yield to maturity y_3 solves
$$\frac{1000}{(1+y_3)^3} = 816.37.$$
Also,
$$y_3 = \{(1+r_1)(1+r_2)(1+r_3)\}^{1/3} - 1 = \{(1.06)(1.07)(1.08)\}^{1/3} - 1 = .0700,$$
or, more precisely, .069969. Thus, 1 + y_3 is the geometric average of 1.06, 1.07, and 1.08 and is approximately equal to their arithmetic average.
Recall that P(n) is the price of a par $1000 n-year zero coupon bond. The general formulas for the yield to maturity y_n of an n-year zero are
$$y_n = \left\{\frac{1000}{P(n)}\right\}^{1/n} - 1 \tag{9.11}$$
and
$$y_n = \{(1+r_1)\cdots(1+r_n)\}^{1/n} - 1. \tag{9.12}$$
Equations (9.11) and (9.12) give the yields to maturity in terms of the bond prices and forward rates, respectively. Also,
$$P(n) = \frac{1000}{(1+y_n)^n}, \tag{9.13}$$
which gives P(n) in terms of the yield to maturity.
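Formulas (9.11), (9.12), and (9.13) translate directly into code. The following sketch assumes par $1000 and annual compounding as above; the function names are hypothetical. It also illustrates the remark that 1 + y_n is the geometric mean of the 1 + r_i.

```python
import math


def yield_from_price(price, n, par=1000.0):
    """Yield to maturity of an n-year zero with the given price, equation (9.11)."""
    return (par / price) ** (1.0 / n) - 1.0


def yield_from_forwards(forward_rates):
    """Yield to maturity of an n-year zero from r_1, ..., r_n, equation (9.12).

    1 + y_n is the geometric mean of 1 + r_1, ..., 1 + r_n.
    """
    n = len(forward_rates)
    growth = math.prod(1.0 + r for r in forward_rates)
    return growth ** (1.0 / n) - 1.0


def price_from_yield(y, n, par=1000.0):
    """Price of an n-year zero with yield to maturity y, equation (9.13)."""
    return par / (1.0 + y) ** n


y3 = yield_from_forwards([0.06, 0.07, 0.08])  # forward rates of Table 9.1
print(round(y3, 6))                        # 0.069969
print(round(price_from_yield(y3, 3), 2))   # 816.37
```

Going from a price to a yield with (9.11) and back with (9.13) returns the original price, which is a convenient sanity check.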
As mentioned before, interest rates for future years are called forward rates. A forward contract is an agreement to buy or sell an asset at some fixed future date at a fixed price. Since r_2, r_3, ... are rates at future dates that are fixed now when a long-term bond is purchased, they are forward rates.
Maturity    Price
1 year      $920
2 years     $830
3 years     $760
Table 9.2: Bond price example
The general formula for determining forward rates from yields to maturity is
$$r_1 = y_1 \tag{9.14}$$
and
$$r_n = \frac{(1+y_n)^n}{(1+y_{n-1})^{n-1}} - 1, \qquad n > 1. \tag{9.15}$$
Now suppose that we observe only bond prices. Can we calculate yields to maturity and forward rates? The answer is "yes," using (9.11) and then (9.14) and (9.15).
Example:
Suppose that 1-, 2-, and 3-year par $1000 zeros are priced as in Table 9.2. Then using (9.11), the yields to maturity are:
$$y_1 = \frac{1000}{920} - 1 = .087,$$
$$y_2 = \left\{\frac{1000}{830}\right\}^{1/2} - 1 = .0976,$$
$$y_3 = \left\{\frac{1000}{760}\right\}^{1/3} - 1 = .096.$$
Then, using (9.14) and (9.15),
$$r_1 = y_1 = .087,$$
$$r_2 = \frac{(1+y_2)^2}{1+y_1} - 1 = \frac{(1.0976)^2}{1.087} - 1 = .108,$$
and
$$r_3 = \frac{(1+y_3)^3}{(1+y_2)^2} - 1 = \frac{(1.096)^3}{(1.0976)^2} - 1 = .093.$$
(Without rounding the yields, one gets r_3 = 830/760 - 1 = .0921.)
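The two-step route used in this example (prices to yields by (9.11), then yields to forward rates by (9.14) and (9.15)) can be sketched in Python as follows. The function names are illustrative assumptions, and par $1000 is assumed as in Table 9.2.

```python
def yields_from_prices(prices, par=1000.0):
    """[y_1, ..., y_n] from zero prices [P(1), ..., P(n)], equation (9.11)."""
    return [(par / p) ** (1.0 / n) - 1.0 for n, p in enumerate(prices, start=1)]


def forwards_from_yields(yields):
    """[r_1, ..., r_n] from yields [y_1, ..., y_n], equations (9.14) and (9.15)."""
    forwards = [yields[0]]  # (9.14): r_1 = y_1
    for n in range(2, len(yields) + 1):
        y_n, y_prev = yields[n - 1], yields[n - 2]
        # (9.15): r_n = (1 + y_n)^n / (1 + y_{n-1})^(n-1) - 1
        forwards.append((1.0 + y_n) ** n / (1.0 + y_prev) ** (n - 1) - 1.0)
    return forwards


prices = [920.0, 830.0, 760.0]  # Table 9.2
ys = yields_from_prices(prices)
rs = forwards_from_yields(ys)
print([round(y, 4) for y in ys])  # [0.087, 0.0976, 0.0958]
print([round(r, 4) for r in rs])  # [0.087, 0.1084, 0.0921]
```

Because the yields are carried at full precision here, r_3 comes out as .0921 rather than the .093 obtained from the rounded yields.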
The formula for finding r_n from the prices of zero coupon bonds is
$$r_n = \frac{P(n-1)}{P(n)} - 1, \tag{9.16}$$
which can be derived from
$$P(n) = \frac{1000}{(1+r_1)(1+r_2)\cdots(1+r_n)}$$
and
$$P(n-1) = \frac{1000}{(1+r_1)(1+r_2)\cdots(1+r_{n-1})},$$
since dividing the second equation by the first gives P(n-1)/P(n) = 1 + r_n.
To calculate r_1 using (9.16), we need P(0), the price of a 0-year bond, but P(0) is simply the par value. (Trivially, a bond that must be paid back immediately is worth exactly its par value.)
Example
Using the prices in Table 9.2 and (9.16),
$$r_1 = \frac{1000}{920} - 1 = .0870,$$
$$r_2 = \frac{920}{830} - 1 = .1084,$$
and
$$r_3 = \frac{830}{760} - 1 = .0921.$$
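Equation (9.16) gives the forward rates in a single step, with P(0) set to the par value as explained above. A minimal sketch, assuming par $1000; `forwards_from_prices` is a hypothetical name.

```python
def forwards_from_prices(prices, par=1000.0):
    """[r_1, ..., r_n] from zero prices via (9.16): r_n = P(n-1)/P(n) - 1.

    P(0) is taken to be the par value.
    """
    previous = [par] + list(prices[:-1])  # P(0), P(1), ..., P(n-1)
    return [p_prev / p - 1.0 for p_prev, p in zip(previous, prices)]


print([round(r, 4) for r in forwards_from_prices([920.0, 830.0, 760.0])])
# [0.087, 0.1084, 0.0921]
```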
9.6    Continuous compounding
Now we will assume continuous compounding with forward rates r_1, ..., r_n. We will see that the use of continuously compounded rates simplifies the
relationships between the forward rates, the yields to maturity, and the prices of zero coupon bonds.
If P(n) is the price of a $1000 par n-year zero coupon bond, then
$$P(n) = \frac{1000}{\exp(r_1 + r_2 + \cdots + r_n)}. \tag{9.17}$$
Therefore,
$$\frac{P(n-1)}{P(n)} = \frac{\exp(r_1 + \cdots + r_n)}{\exp(r_1 + \cdots + r_{n-1})} = \exp(r_n), \tag{9.18}$$
and
$$r_n = \log\left\{\frac{P(n-1)}{P(n)}\right\}.$$
The yield to maturity of an n-year zero coupon bond solves the equation
$$P(n) = \frac{1000}{\exp(n y_n)},$$
and is easily seen to be
$$y_n = \frac{r_1 + \cdots + r_n}{n}.$$
Therefore, {r_1, ..., r_n} is easily found from {y_1, ..., y_n} by the relationship
$$r_1 = y_1,$$
and
$$r_n = n y_n - (n-1) y_{n-1} \quad \text{for } n > 1.$$
Example
Consider prices similar to those in Table 9.2 but for par $1 zeros, say P(1) = .930, P(2) = .850, and P(3) = .760. Then
$$r_1 = \log\left(\frac{1}{.930}\right) = .0726,$$
$$r_2 = \log\left(\frac{.930}{.850}\right) = .0899,$$
and
$$r_3 = \log\left(\frac{.850}{.760}\right) = .1119.$$
Also,
$$y_1 = r_1 = .0726, \qquad y_2 = \frac{r_1 + r_2}{2} = .0813,$$
and
$$y_3 = \frac{r_1 + r_2 + r_3}{3} = .0915.$$
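With continuous compounding, each conversion is a logarithm, a running average, or a difference. The sketch below assumes par $1 zeros as in this example; all function names are hypothetical. Converting forward rates to yields and back recovers the forward rates, which illustrates that the two sets carry the same information.

```python
import math


def continuous_forwards_from_prices(prices, par=1.0):
    """r_n = log{P(n-1)/P(n)} with P(0) = par, from (9.18)."""
    previous = [par] + list(prices[:-1])
    return [math.log(p_prev / p) for p_prev, p in zip(previous, prices)]


def continuous_yields_from_forwards(forwards):
    """y_n = (r_1 + ... + r_n)/n, the running average of the forward rates."""
    yields, total = [], 0.0
    for n, r in enumerate(forwards, start=1):
        total += r
        yields.append(total / n)
    return yields


def continuous_forwards_from_yields(yields):
    """r_1 = y_1 and r_n = n*y_n - (n-1)*y_{n-1} for n > 1."""
    return [yields[0]] + [
        n * yields[n - 1] - (n - 1) * yields[n - 2] for n in range(2, len(yields) + 1)
    ]


prices = [0.930, 0.850, 0.760]  # par $1 zero prices from the example
rs = continuous_forwards_from_prices(prices)
ys = continuous_yields_from_forwards(rs)
print([round(r, 4) for r in rs])  # [0.0726, 0.0899, 0.1119]
print([round(y, 4) for y in ys])  # [0.0726, 0.0813, 0.0915]
```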
9.6.1    Continuous forward rates
So far, we have assumed that forward interest rates vary from year to year but are constant within each year. This assumption is, of course, unrealistic. The forward rates should be modeled as a function varying continuously in time, rather than as a step function that is constant for one year at a time. It is also unrealistic to assume a fixed starting time after which interest rates change once a year. In fact, bonds are issued repeatedly, and bonds of many maturities are on the market.
To specify the term structure in a realistic way, we will assume that there is a function r(t) called the forward rate function such that the price of a zero coupon bond of maturity T and with par equal to 1 is given by
$$P(T) = \exp\left\{-\int_0^T r(t)\,dt\right\}. \tag{9.19}$$
Formula (9.19) is a generalization of formula (9.17). To appreciate this, suppose that
$$r(t) = r_k \quad \text{for } k-1 < t \le k.$$
With this piecewise constant r, for integer T,
$$\int_0^T r(t)\,dt = r_1 + r_2 + \cdots + r_T,$$
so that
$$\exp\left\{-\int_0^T r(t)\,dt\right\} = \exp\{-(r_1 + \cdots + r_T)\},$$
and therefore (9.17), rescaled from par $1000 to par 1, agrees with (9.19).
The yield to maturity of a bond with maturity date T is
$$y_T = \frac{1}{T}\int_0^T r(t)\,dt. \tag{9.20}$$
Think of (9.20) as the average of r(t) over the interval 0 < t < T.
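Formulas (9.19) and (9.20) require integrating r(t). The sketch below uses the composite midpoint rule, a standard quadrature choice made here for illustration; it is not the spline method of Jarrow, Ruppert, and Yu (2001). Both forward curves are hypothetical: the step curve uses rates rounded from log(1/.930), log(.930/.850), and log(.850/.760), and the smooth curve is invented purely for demonstration.

```python
import math


def zero_price_from_curve(forward_curve, T, steps_per_year=1000):
    """P(T) = exp{-integral_0^T r(t) dt} for par 1, as in (9.19).

    The integral is approximated with the composite midpoint rule.
    """
    m = max(1, round(T * steps_per_year))  # number of subintervals
    h = T / m
    integral = h * sum(forward_curve((i + 0.5) * h) for i in range(m))
    return math.exp(-integral)


def yield_from_curve(forward_curve, T, steps_per_year=1000):
    """y_T = (1/T) * integral_0^T r(t) dt, equation (9.20)."""
    return -math.log(zero_price_from_curve(forward_curve, T, steps_per_year)) / T


STEP_RATES = [0.0726, 0.0899, 0.1119]  # hypothetical r_1, r_2, r_3


def step_curve(t):
    """Piecewise-constant r(t) = r_k for k - 1 < t <= k (defined for 0 < t <= 3)."""
    return STEP_RATES[math.ceil(t) - 1]


def smooth_curve(t):
    """A purely hypothetical smooth forward curve rising from 6% toward 8%."""
    return 0.08 - 0.02 * math.exp(-t / 3.0)


# Integer years fall on grid points, so the midpoint rule integrates the step
# curve exactly (up to rounding), reproducing exp{-(r_1 + r_2 + r_3)} as in (9.17).
print(zero_price_from_curve(step_curve, 3), math.exp(-sum(STEP_RATES)))
print(yield_from_curve(smooth_curve, 10))
```

For a smooth r(t), the midpoint rule's error shrinks like the square of the step size, so 1000 steps per year is far more than enough here.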
Jarrow, Ruppert, and Yu (2001) estimate r(t) in (9.19) under the assumption that r(t) is a member of a flexible class of functions called splines. Figure 9.2 shows estimated forward rate curves for US Treasury bonds. Figure 9.3 shows estimated forward rate curves for AT&T bonds. These two figures come from Jarrow, Ruppert, and Yu (2001).
[Plot: estimated forward rate curves versus time to maturity]
Figure 9.2: Spline estimates of forward rates of US Treasury bonds.
[Plot: estimated forward rate curves versus time to maturity]
Figure 9.3: Spline estimates of forward rates of AT&T bonds.
9.7    Summary
9.7.1    Introduction
•  buying a bond = making a loan to the company
-  the corporation is obligated to pay back the principal and interest (unless it defaults)
-  you receive a fixed stream of income
-  bonds are called "fixed-income" securities
-  for a long-term bond, your income is guaranteed only if you keep the bond to maturity
9.7.2   Zero coupon bonds
•  Zero-coupon bonds pay no principal or interest until maturity
•  zero-coupon bond = pure discount bond
•  par value is the payment made to the bond holder at maturity
•  a zero sells for less than par
•  Example: 20-year zero
-  par value of $1000
-  interest 6% compounded every six months ⇒ price is
$$\frac{\$1000}{(1.03)^{40}} = \$306.56.$$
9.7.3    Risk due to interest rate changes
• bond prices fluctuate with the interest rate
-  Example: assume semi-annual compounding
-  you just bought the zero for $306.56
* six months later the interest rate increased to 7%
-  price would now be
$$\frac{\$1000}{(1.035)^{39}} = \$261.41$$
-  investment would drop by ($306.56 - $261.41) = $45.15
-  return of
$$\frac{-45.15}{306.56} = -14.73\%$$
for a half-year, or -29.46% per year
-  however, if the interest rate remains unchanged, then the bond is worth
$$\frac{\$1000}{(1.03)^{39}} = (1.03)(\$306.56)$$
*  3%/half-year return
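The interest-rate-risk example can be reproduced in a few lines. This Python sketch assumes semi-annual compounding at half the annual rate, as in the text; `zero_price_semiannual` is a hypothetical helper name.

```python
def zero_price_semiannual(par, annual_rate, half_years):
    """Price of a zero paying `par` after `half_years` periods,
    discounting at annual_rate / 2 per half-year."""
    return par / (1.0 + annual_rate / 2.0) ** half_years


bought = zero_price_semiannual(1000, 0.06, 40)      # 20-year zero at 6%
after_rise = zero_price_semiannual(1000, 0.07, 39)  # six months later, rate is 7%
unchanged = zero_price_semiannual(1000, 0.06, 39)   # six months later, rate still 6%

print(round(bought, 2), round(after_rise, 2))  # 306.56 261.41
print(f"{after_rise / bought - 1:.2%}")        # -14.73%
print(f"{unchanged / bought - 1:.2%}")         # 3.00%
```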
9.7.4   Coupon bonds
•  coupon bonds make regular interest payments
•  consider a 20-year coupon bond with a par value of $1000 and 6% annual interest with semi-annual coupon payments
-  coupon payment will be $30
-  bond holder receives 40 payments of $30
* plus a principal payment of $1000 after 20 years
-  present value of all payments, with discounting at the 6% annual rate (3% semi-annual), equals $1000:
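$$\sum_{t=1}^{40} \frac{30}{(1.03)^t} + \frac{1000}{(1.03)^{40}} = 1000.$$
•  General formula:
-  Notation
*  PAR = par value
*  C = coupon payment
*  T = maturity (in years)
*  r = interest rate per half-year
-  bond price =
$$\sum_{t=1}^{2T} \frac{C}{(1+r)^t} + \frac{PAR}{(1+r)^{2T}} = \frac{C}{r} + \left\{PAR - \frac{C}{r}\right\}(1+r)^{-2T}.$$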
•  Yield to maturity
-  Example: a bond with T = 30 and C = 40 is selling for $1200
-  if the bond were selling at par ⇒ interest rate = .04/half-year (= .08/year)
*  4%/half-year rate = coupon rate
•  but the bond is not selling at par ⇒ if you purchase it at $1200, you will make less than 8% per year
•  two problems
-  coupon payments are $40 or 40/1200 = 3.333%/half-year of the $1200 investment
*  3.333% is called the current yield
-  at maturity you only get back $1000 of the $1200 investment
-  yield to maturity = the average rate of return, i.e., the half-year rate r at which the bond price formula equals the $1200 purchase price
•  Spot rates
-  The yield to maturity of a zero coupon bond of maturity n years is called the n year spot rate.
-  A coupon bond is a bundle of zeros, each with a different maturity and therefore a different spot rate
*  the yield to maturity of a coupon bond is a complex "average" of these different spot rates
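The bond price formula and the yield to maturity from the $1200 example can be sketched in Python. The closed form is checked against the direct sum of discounted payments, and the yield is found by bisection, which works because the price decreases as r increases. The function names, the bisection bracket, and par $1000 for the $1200 bond are assumptions made for illustration.

```python
def bond_price(par, coupon, years, r):
    """Closed form C/r + (PAR - C/r)(1 + r)^(-2T); r is the rate per half-year."""
    return coupon / r + (par - coupon / r) * (1.0 + r) ** (-2 * years)


def bond_price_by_sum(par, coupon, years, r):
    """Direct sum of the 2T discounted coupons plus the discounted par value."""
    periods = 2 * years
    coupons = sum(coupon / (1.0 + r) ** t for t in range(1, periods + 1))
    return coupons + par / (1.0 + r) ** periods


def yield_to_maturity(price, par, coupon, years, lo=1e-6, hi=1.0, tol=1e-12):
    """Half-year rate r with bond_price(par, coupon, years, r) == price, by bisection.

    The bracket [lo, hi] is an assumption that covers ordinary bonds.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bond_price(par, coupon, years, mid) > price:
            lo = mid  # price too high at this rate, so the yield is larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


# The 20-year, 6% bond from the text, priced at a 3% half-year rate
print(round(bond_price_by_sum(1000, 30, 20, 0.03), 2))  # 1000.0

# The $1200 bond with T = 30 and C = 40 (par $1000 assumed)
r = yield_to_maturity(1200, 1000, 40, 30)
print(f"yield to maturity: {r:.4%} per half-year, {2 * r:.2%} per year")
```

The yield comes out between 3.2% and 3.3% per half-year, below both the 4% coupon rate and the 3.333% current yield, consistent with the two problems listed above.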
9.7.5   Term structure of interest rates
•  term structure is a description of how, at a given time, yield to maturity depends on maturity
•  term structure for all maturities up to n years can be described by any one of the following sets:
-  prices of zero coupon bonds of maturities 1-year, 2-years, ..., n-years, denoted here by P(1), P(2), ..., P(n)
-  spot rates (yields to maturity of zero coupon bonds) of maturities 1-year, 2-years, ..., n-years, denoted by y_1, ..., y_n
-  forward rates r_1, ..., r_n
•  each of the above sets can be computed from either of the other sets.
9.7.6    Continuous compounding
•  continuous compounding simplifies the relationships between
-  forward rates
-  yields to maturity of zeros (spot rates)
-  prices of zeros
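•  prices from forward rates:
$$P(1) = \frac{1000}{\exp(r_1)}, \qquad P(2) = \frac{1000}{\exp(r_1)\exp(r_2)},$$
etc., so that
$$P(n) = \frac{1000}{\exp(r_1 + r_2 + \cdots + r_n)}.$$
•  forward rates from prices:
$$\frac{P(n-1)}{P(n)} = \frac{\exp(r_1 + \cdots + r_n)}{\exp(r_1 + \cdots + r_{n-1})} = \exp(r_n),$$
so that
$$r_n = \log\left\{\frac{P(n-1)}{P(n)}\right\}.$$
•  yield to maturity y_n solves
$$P(n) = \frac{1000}{\exp(n y_n)},$$
so that
$$y_n = \frac{r_1 + \cdots + r_n}{n}.$$
•  {r_1, ..., r_n} is easily found from {y_1, ..., y_n} by:
$$r_1 = y_1$$
and
$$r_n = n y_n - (n-1) y_{n-1} \quad \text{for } n > 1.$$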
9.8   References
Jarrow, R., Ruppert, D., and Yu, Yan. (2001), Estimating the term structure of corporate debt with a semiparametric penalized spline model, manuscript. Available at http://www.orie.cornell.edu/~davidr/papers/
Chapter 10
Behavioral finance: 5/1/01
10.1    Introduction
•  behavioral finance is an alternative to the EMH
•  this material is taken mostly from the 2000 book by Andrei Shleifer of Harvard:
-  Inefficient Markets: An Introduction To Behavioral Finance
•  EMH has been the central tenet of finance for almost 30 years
•  power of the EMH assumption is remarkable
•  EMH started in the 1960's
-  immediate success, both theoretically and empirically
-  early empirical work gave overwhelming support to the EMH
-  the EMH was invented at Chicago, and Chicago became a world center of research in finance
-  Jensen (1978) "no other proposition in economics ... has more solid empirical support"
•  verdict is changing
-  efficiency of arbitrage is much weaker than expected
-  true arbitrage possibilities are rare
-  near arbitrage is riskier than expected
-  "Markets can remain irrational longer than you can remain solvent" (John Maynard Keynes)
* quoted by Roger Lowenstein in When Genius Failed: The Rise and Fall of Long-Term Capital Management
10.2   Defense of EMH
•  three lines of defense of the EMH:
-  investors are rational
-  trading of irrational investors is random and their trades cancel each other
-  even if a "herd" of irrational investors trade in similar ways, rational arbitrageurs will eliminate their influence on market price
•  each of these defenses is weaker than had been thought
•  rational investing = "value a security by its fundamental value"
-  "fundamental value" = net present worth of all future cash flows
•  rational investing ⇒ prices are (geometric) random walks
•  but prices being random walks (or nearly so) does not imply rational investing
•  there is good evidence that irrational trading is correlated
-  look at the internet stock bubble
•  initial tests of the semi-strong form of efficiency supported that theory
-  event studies showed that the market reacted immediately to news and then stopped reacting
*  so there is a reaction to news, as the EMH predicts
*  also no reaction to stale news, again as EMH predicts
-  Scholes (1972) found little reaction to "non news"
*  block sales had little effect on prices
10.3    Challenges to the EMH
•  it is difficult to maintain that all investors are rational.
-  many investors react to irrelevant information
-  Black calls them noise traders
•  investors act irrationally when they
-  fail to diversify
-  purchase actively and expensively managed mutual funds
-  churn their portfolios
•  investors do not look at final levels of wealth when assessing risky situations ("prospect theory")
•  there is a serious "loss aversion"
•  people do not follow Bayes' rule for evaluating new information
-  too much attention is paid to recent history
•  overreaction is commonplace
•  these deviations from fully rational behavior are not random
•  moreover, noise traders will follow each other's mistakes
•  thus, noise trading will be correlated across investors
•  managers of funds are themselves human and will make these errors too
•  managers also have their own types of errors
-  buying portfolios excessively close to a benchmark
-  buying the same stocks as other fund managers (so as not to look bad)
-  window dressing — adding stocks to the portfolio that have been performing well recently
-  on average, pension and mutual fund managers underperform passive investment strategies
* these managers might be noise traders too
10.4   Can arbitrageurs save the day?
•  the last defense of the EMH depends on arbitrage
•  even if investor sentiment is correlated and noise traders create incorrectly priced assets
-  arbitrageurs are expected to take the other side of these traders and drive prices back to fundamental values
•  a fundamental assumption of behavioral finance is that real-world arbitrage is risky and limited
•  arbitrage depends on the existence of "close substitutes" for assets whose prices have been driven to incorrect levels by noise traders
•  many securities do not have true substitutes
•  often there are no riskless hedges for arbitrageurs
•  mispricing can get even worse, as the managers of LTCM learned
-  this is called noise trader risk
10.5   What do the data say?
•  Shiller (1981), "Do stock prices move too much to be justified by subsequent changes in dividends?":
-  market prices are too volatile
-  more volatile than explained by a model where prices are expected net present values
-  this work has been criticized by Merton, who said that Shiller did not correctly specify fundamental value
•  De Bondt and Thaler (1985), "Does the stock market overreact?":
-  frequently cited and reprinted paper
-  work done at Cornell
-  compare extreme winners and losers
-  find strong evidence of overreaction
-  for every year starting at 1933 they formed portfolios of the best performing stocks over the previous three years
*  "winner portfolios"
-  they also formed portfolios of the worst performing stocks
*  "loser portfolios"
-  then examined returns on these portfolios over the next five years
*  losers consistently outperformed winners
-  difference is difficult to explain as due to differences in risk, at least according to standard models such as CAPM
-  De Bondt and Thaler claim that investors overreact
*  extreme losers are too cheap
*  so they bounce back
-  the opposite is true of extreme winners
•  historically, small stocks have earned higher returns than large stocks
-  no evidence that the difference is due to higher risk
-  superior returns of small stocks have been concentrated in January
-  small firm effect and January effect seem to have disappeared over the last 15 years
•  market to book value is a measure of "cheapness"
-  high market to book value firms are "growth" stocks
*  they tend to underperform
*  also they tend to be riskier, especially in severe down markets
•  October 19, 1987: the Dow Jones index dropped 22.6%
-  there was no apparent news that day
•  Cutler et al. (1991) looked at the 50 largest one-day market changes
-  many came on days with no major news announcements
•  Roll (1988) tried to predict the share of return variation that could be explained by
-  economic influences
-  returns on other stocks in the same industry
-  public firm-specific news
•  Roll's findings:
-  R² = .35 for monthly data
-  R² = .2 for daily data
•  Roll's study also shows that there are no "close substitutes" for stocks
-  this lack of close substitutes limits arbitrage
•  stocks rise if the company is put on the S&P 500 index
-  this is reaction to "non news"
-  America Online rose 18% when included on the S&P
•  In summary, there is now considerable evidence against the EMH
-  This evidence was not found during early testing of the EMH
-  Researchers needed to know what to look for
10.6   References
Cutler, D., Poterba, J., and Summers, L. (1991), Speculative dynamics, Review of Economic Studies, 53, 1839-1885.
De Bondt, W., and Thaler, R. (1985), Does the stock market overreact?, J. of Finance, 40, 793-805.
Jensen, M. (1978), Some anomalous evidence regarding market efficiency, J. of Financial Economics, 6, 95-101.
Roll, R. (1988), R², J. of Finance, 43, 541-566.
Shiller, R. (1981), Do stock prices move too much to be justified by subsequent changes in dividends?, American Economic Review, 71, 421-436.
Shleifer, Andrei (2000), Inefficient Markets: An Introduction to Behavioral Finance, Oxford University Press, Oxford.