Causality in Economics
Topics on Instrumental Variable Regression Techniques
Alex Klein
April 2016

Motivation
► Causality is a crucial issue in economics (maybe more than in other social sciences)
► Economic data are mostly non-experimental, as opposed to data from experiments such as laboratory experiments or randomized controlled trials
► Estimation techniques have been developed over the past 70 or so years to estimate the causal effect of variables on the outcome of interest
► The development of 'instrumental variable estimation techniques' is an attempt to account for causality in non-experimental data

Basic set-up I
► Consider a basic regression $y = x_1\beta_1 + \dots + x_K\beta_K + u$
► The key condition for consistency of the OLS estimator is that the error term is uncorrelated with each of the regressors: $\mathrm{cov}(x_j, u) = 0$, $j = 1, \dots, K$
► A sufficient condition for $\mathrm{cov}(x_j, u) = 0$ is $E(u \mid x_j) = 0$
► An explanatory variable is endogenous if it is correlated with the error term, which can be caused by
  1. omitted variables
  2. measurement error
  3. simultaneity

Basic set-up I
► $\hat\beta^{OLS} = (X'X)^{-1}X'y = (X'X)^{-1}X'X\beta + (X'X)^{-1}X'u = \beta + (X'X)^{-1}X'u$
► $\hat\beta^{OLS} = \beta + (N^{-1}X'X)^{-1} N^{-1}X'u$: rescaling by $N$ allows a law of large numbers to be applied to $X'X$ and $X'u$
► $\mathrm{plim}\,\hat\beta^{OLS} = \beta + (\mathrm{plim}\, N^{-1}X'X)^{-1} (\mathrm{plim}\, N^{-1}X'u)$ (Slutsky's theorem)
► OLS is consistent if $\mathrm{plim}\, N^{-1}X'u = 0$
► a necessary condition for the above equality to hold is that $E[X'u] = 0$

Instrumental Variable Regression I
► To obtain consistent estimates of $\beta$ when $\mathrm{cov}(x_j, u) \neq 0$, we need to find a variable, call it $z_j$, which satisfies two conditions:
  1. Instrument relevance: $\mathrm{cov}(z_j, x_j) \neq 0$
  2.
Instrument exogeneity: $\mathrm{cov}(z_j, u) = 0$
► failure of the first condition leads to the weak instrumental variable problem, but we can deal with it (somehow)
► failure of the second condition is fatal: we can't interpret the estimated relationship as causal (only as a sophisticated correlation)

Instrumental Variable Regression I
► we will deal with a single-equation model
► the number of instruments can be the same as the number of endogenous variables (just-identified model) or larger (overidentified model)
► just-identified model: $\hat\beta^{IV} = (Z'X)^{-1}Z'y = \beta + (Z'X)^{-1}Z'u = \beta + (N^{-1}Z'X)^{-1} N^{-1}Z'u$
► consistency of the IV estimator requires $\mathrm{plim}\, N^{-1}Z'u = 0$ and $\mathrm{plim}\, N^{-1}Z'X \neq 0$
► variance of $\hat\beta^{IV}$: $\hat V(\hat\beta^{IV}) = (Z'X)^{-1} Z'\hat\Omega Z (X'Z)^{-1}$, where $\hat\Omega = \mathrm{Diag}(\hat u_i^2)$
► though consistent, IV estimators exhibit an efficiency loss relative to OLS

Instrumental Variable Regression I
► the over-identified model requires the Two-Stage Least Squares estimator (TSLS/2SLS): $\hat\beta^{2SLS} = [X'Z(Z'Z)^{-1}Z'X]^{-1} [X'Z(Z'Z)^{-1}Z'y]$
► in the just-identified model 2SLS = IV
► Stage 1: obtain predicted values of $X$ from a regression of $X$ on $Z$: $\hat X = Z(Z'Z)^{-1}Z'X$
► Stage 2: run OLS of $y$ on the predicted values $\hat X$
► again, 2SLS causes an efficiency loss relative to OLS, but it is the efficient estimator in the class of all instrumental variable estimators using instruments linear in $z$

Instrumental Variable Regression I
► Even though 2SLS is a consistent estimator when instruments satisfy the conditions of relevance and exogeneity, it is biased in finite samples
► In fact, we must rely on large-sample analysis to derive the properties of 2SLS (the mean of just-identified 2SLS does not even exist)
► When instruments are weak, 2SLS is biased even in very large samples
► Consider the 'degree of inconsistency': there is some, though very mild, correlation between the instruments and the error term
► When instruments are weak, the degree of inconsistency increases

Instrumental Variable Regression I
► consider a simple model with one endogenous variable: $y_{1i} = \alpha_1 + \beta_1 y_{2i} + \epsilon_i$ and $y_{2i} = \alpha_2 + \beta_2 z_i + \mu_i$
► assume that
$\mathrm{Var}(\epsilon_i) = 1$ and $\mathrm{Var}(\mu_i) = 1$, so that $\mathrm{cov}(\epsilon_i, \mu_i) = \rho$, where $\rho$ is the correlation coefficient
► if we assume that $z_i$ is exogenous, then $\rho$ measures the degree to which $y_{2i}$ is correlated with $\epsilon_i$
► Hahn and Hausman (2005) showed that in this simple case, the finite-sample bias of 2SLS in the overidentified case is, to a second-degree approximation, $E[\hat\beta^{2SLS}] - \beta \approx \frac{l\,\rho\,(1 - R^2)}{n R^2}$
► $l$ is the number of instruments, $n$ is the sample size, and $R^2$ is the $R^2$ from the first-stage regression of $y_{2i}$ on $z_i$, which measures the strength of the instruments

Instrumental Variable Regression I
► the bias of 2SLS in finite samples is toward inconsistent OLS
► a fundamental question arises: if a consistent 2SLS estimator is biased in finite samples toward inconsistent OLS, is the 2SLS bias smaller or larger than that of OLS?
► Hahn and Hausman (2005) offer the following equation: $\frac{\mathrm{Bias}(\hat\beta^{2SLS})}{\mathrm{Bias}(\hat\beta^{OLS})} \approx \frac{l}{n R^2}$
► as long as the denominator is larger than the numerator, the 2SLS bias is smaller than the OLS bias
► ceteris paribus, the bias of 2SLS grows with the number of instruments
► weak instruments (low $R^2$) increase the bias of 2SLS toward inconsistent OLS!

Instrumental Variable Regression I
► weak instruments and 'mild inconsistency': $\mathrm{plim}\,\hat\beta^{2SLS} = \beta + \frac{\mathrm{cov}(Z,u)}{\mathrm{cov}(Z,X)} = \beta + \frac{\sigma_u}{\sigma_x} \frac{\mathrm{corr}(Z,u)}{\mathrm{corr}(Z,X)}$
► relative inconsistency of 2SLS: $\frac{\mathrm{plim}\,\hat\beta^{2SLS} - \beta}{\mathrm{plim}\,\hat\beta^{OLS} - \beta} = \frac{\mathrm{corr}(\hat X,u)}{\mathrm{corr}(X,u)} \cdot \frac{1}{R_p^2}$, where $\hat X$ denotes the first-stage fitted values and $R_p^2$ the partial $R^2$ of the first stage
► if instruments are weak and moderately correlated with the error term (mildly endogenous), the instrumental variable estimator is even more inconsistent than OLS

Instrumental Variable Regression I
► unless we have a perfect natural experiment with a perfectly exogenous instrument, a weak instrument is more harmful than running a simple OLS, even when the correlation between the instrument and the error term is very small
► this result is due to Bound, Jaeger and Baker (1995) and has not received much attention in the literature
► the literature on weak instruments assumes that instruments satisfy the exogeneity assumption and that the only problem is their weak correlation with the endogenous variables

Weak Instruments
► how to detect it:
  1.
Shea's partial $R^2$ from the first-stage regression
  2. F-statistic from the first-stage regression
► logic of the $R^2$ from the first-stage regression: consider $y = \beta_1 x_1 + \beta_2 x_2 + u$, where $x_1$ is endogenous and $x_2$ exogenous, and let $z$ be a vector of instruments (which includes $x_2$)
► we need a measure of the correlation between $z$ and $x_1$ which purges out $x_2$
► an $R^2$ measure adjusted for the presence of $x_2$ was proposed by Bound, Jaeger, and Baker (1995)
► an $R^2$ measure adjusted for the presence of $x_2$ and of other endogenous variables was proposed by Shea (1997)

Weak Instruments
► F-statistic from the first-stage regression: the test statistic does not follow the standard F-distribution
► Stock and Yogo (2005) offer critical values which depend on the number of instruments and endogenous variables
► Null hypothesis: the instruments are weak, in the sense that the bias of 2SLS exceeds some percentage of the bias of OLS; a first-stage F-statistic above the critical value rejects it
► for example, for one endogenous variable and three instruments, and a maximal 2SLS bias of 10% of the OLS bias, the critical value of the F-statistic is 9.08

Weak Instruments - Solution(s) I
► alternative estimators to 2SLS which exhibit better properties in the presence of weak instruments
► test statistics which are robust to the weak-instrument problem

An Example - Housing Expenditures I
► the model allows for household fixed effects:
  $d_{it} = 1(\pi' x_{it} + \eta_i - u_{it} > 0)$
  $y_{0it} = \beta_0' x_{it} + \alpha_{0i} + \epsilon_{0it}$ if $d_{it} = 0$
  $y_{1it} = \beta_1' x_{it} + \alpha_{1i} + \epsilon_{1it}$ if $d_{it} = 1$
► the selection variable $d_{it}$ is a choice between owning a property ($d_{it} = 1$) and renting a property ($d_{it} = 0$)
► $x_{it}$ is a vector of explanatory variables (total expenditures, square of total expenditures, prices, household characteristics)
► $y_{0it}$ and $y_{1it}$ are the budget shares spent on housing by renters and owners, respectively
► $\alpha_{0i}$, $\alpha_{1i}$, $\eta_i$ are unobservable household-specific time-invariant effects

An Example - Housing Expenditures I
► $x_{it}$ is decomposed into $x_{ai}$ (log of total expenditures, square of total expenditures), $x_{bi}$ (log of household income, square of household income), $x_{ci}$ (prices, household characteristics); $x_{di}$ are
exclusion restrictions
► the selection equation includes $x_{bi}$ and $x_{di}$, the budget equation $x_{ai}$ and $x_{ci}$
► taking the difference between periods $t$ and $\tau$ yields:
  $y_{pit} - y_{pi\tau} = \beta_{pa}'(x_{ait} - x_{ai\tau}) + \beta_{pc}'(x_{cit} - x_{ci\tau}) + (\epsilon_{pit} - \epsilon_{pi\tau})$ if $d_{it} = d_{i\tau} = p$, $p = 0, 1$
  $d_{is} = 1(\pi_b' x_{bis} + \pi_d' x_{dis} + \eta_i - u_{is} > 0)$, $s = t, \tau$

An Example - Housing Expenditures I
► we can rewrite the above equation as
  $y_{pit} - y_{pi\tau} = \beta_{pa}'(x_{ait} - x_{ai\tau}) + \beta_{pc}'(x_{cit} - x_{ci\tau}) + g_{pt\tau}(x_{bit}, x_{bi\tau}, x_{dit}, x_{di\tau}) + \xi_{pit\tau}$
► the function $g_{pt\tau}$, $p = 0, 1$, is given by
  $g_{pt\tau}(x_{bit}, x_{bi\tau}, x_{dit}, x_{di\tau}) = E(\epsilon_{pit} - \epsilon_{pi\tau} \mid x_{bit}, x_{bi\tau}, x_{dit}, x_{di\tau}, d_{it} = d_{i\tau} = p)$
► and $\xi_{pit\tau}$ satisfies
  $E(\xi_{pit\tau} \mid x_{bit}, x_{bi\tau}, x_{dit}, x_{di\tau}, d_{it} = d_{i\tau} = p) = 0$, $p = 0, 1$

An Example - Housing Expenditures I
► we can assume no sample selection after differencing, i.e. $g_{pt\tau} = 0$, $p = 0, 1$, which is equivalent to saying that $\eta_i - u_{it}$ is independent of $\epsilon_{0it}$ and $\epsilon_{1it}$ for all $t$
► in other words, a possible selection effect on budget shares operates only through the correlation between $\alpha_{pi}$ and $(\eta_i - u_{it})$
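
The differencing argument can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the slides: a single budget-equation regressor, a single selection covariate `w` standing in for the selection-equation variables, standard normal shocks, and arbitrary coefficient values. The function name `simulate_differencing` and all parameters are hypothetical. The selection shock is drawn independently of the budget-share error, so selection can matter only through the household effects.

```python
import random


def simulate_differencing(n=40000, beta=0.5, seed=0):
    """Return (differenced slope, levels slope) for households owning in both periods."""
    rng = random.Random(seed)
    dx, dy = [], []   # within-household differences between the two periods
    xs, ys = [], []   # pooled levels for the same households
    for _ in range(n):
        eta = rng.gauss(0.0, 1.0)                # household effect in the tenure equation
        alpha = eta + 0.5 * rng.gauss(0.0, 1.0)  # budget-share effect, correlated with eta
        x, y, own = [], [], []
        for _period in range(2):                 # the two periods t and tau
            w = rng.gauss(0.0, 1.0)              # selection covariate (assumed)
            x_s = 0.7 * w + rng.gauss(0.0, 1.0)  # budget regressor, correlated with w
            u = rng.gauss(0.0, 1.0)              # selection shock, independent of eps
            eps = rng.gauss(0.0, 1.0)            # budget-share error
            own.append(w + eta - u > 0)          # True = owner in this period
            x.append(x_s)
            y.append(beta * x_s + alpha + eps)
        if own[0] and own[1]:                    # keep households with d_it = d_itau = 1
            dx.append(x[0] - x[1])
            dy.append(y[0] - y[1])
            xs.extend(x)
            ys.extend(y)
    # OLS without intercept on differences: alpha drops out of y_t - y_tau.
    slope_diff = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)
    # OLS with intercept on pooled levels for the same selected households.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope_levels = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
                    / sum((a - mx) ** 2 for a in xs))
    return slope_diff, slope_levels


if __name__ == "__main__":
    b_diff, b_levels = simulate_differencing()
    print(f"differenced OLS: {b_diff:.3f}, levels OLS: {b_levels:.3f}")
```

In this design the differenced slope should be close to `beta`, while the levels slope should fall below it: among selected owners a low `w` has to be offset by a high `eta`, and hence a high `alpha`, which is exactly the channel that differencing removes.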