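Instrumental variables
Lukáš Lafférs
Matej Bel University, Dept. of Mathematics
MUNI Brno, 12.11.2021

Causal graph
[Figure: causal graph with nodes D, Y, U, Z]
- Y is the outcome
- D is a variable of interest (treatment)
- Z is an instrument
- U is an unobserved variable
In this situation, it is not possible to (non-parametrically) identify the causal effect of D on Y. Things are not completely hopeless though.

Homogeneous treatment effects
Let us simplify it a little bit. We will assume:
- homogeneity of the effect
- linearity of the functional forms
Thus, we will assume a lot... But it makes it possible to proceed in a rather straightforward manner.

[Figure: causal graph with nodes D, Y, U, ε]
The true relationship is
$$Y_i = \alpha + \delta D_i + \underbrace{\gamma U_i + \varepsilon_i}_{=\eta_i}$$
But $U_i$ is unobserved, so we can only work with
$$Y_i = \alpha + \delta D_i + \eta_i$$
The regression of Y on D then gives
$$\hat{\delta} = \frac{\mathrm{Cov}(Y,D)}{\mathrm{Var}(D)} = \frac{E[YD] - E[Y]E[D]}{\mathrm{Var}(D)} = \frac{E[\alpha D_i + \delta D_i^2 + \gamma U_i D_i + \varepsilon_i D_i] - E[\alpha + \delta D_i + \gamma U_i + \varepsilon_i]\,E[D_i]}{\mathrm{Var}(D)} = \delta + \gamma\,\frac{\mathrm{Cov}(U,D)}{\mathrm{Var}(D)}$$
where the last step uses $\mathrm{Cov}(\varepsilon, D) = 0$. The second term is the bias caused by the unobserved U.

New variable Z
[Figure: causal graph with nodes D, Y, U, Z, ε]
Notice: no Z → U or Z → Y.
$$\mathrm{Cov}(Y,Z) = \mathrm{Cov}(\alpha + \delta D + \gamma U + \varepsilon,\, Z) = E[(\alpha + \delta D + \gamma U + \varepsilon)Z] - E[\alpha + \delta D + \gamma U + \varepsilon]\,E[Z] = \delta\,\mathrm{Cov}(D,Z) + \gamma\,\underbrace{\mathrm{Cov}(U,Z)}_{=0} + \underbrace{\mathrm{Cov}(\varepsilon,Z)}_{=0}$$
$$\Longrightarrow\quad \delta = \frac{\mathrm{Cov}(Y,Z)}{\mathrm{Cov}(D,Z)}$$

Exclusion restriction
There are no arrows
- Z → U
- Z → Y
This is called an exclusion restriction. Z provides us with the much needed exogenous source of variation.

The regression coefficient
$$\delta = \frac{\mathrm{Cov}(Y,Z)}{\mathrm{Cov}(D,Z)}$$
can be estimated by
$$\hat{\delta} = \frac{\widehat{\mathrm{Cov}}(Y,Z)}{\widehat{\mathrm{Cov}}(D,Z)} = \frac{\frac{1}{n}\sum_{i=1}^{n}(Z_i - \bar{Z})(Y_i - \bar{Y})}{\frac{1}{n}\sum_{i=1}^{n}(Z_i - \bar{Z})(D_i - \bar{D})}$$
If we assume
$$Y_i = \alpha + \delta D_i + \eta_i$$
$$D_i = \beta_0 + \beta_Z Z_i + \upsilon_i$$
then, substituting $Y_i = \alpha + \delta D_i + \eta_i$ into the numerator,
$$\hat{\delta} = \frac{\frac{1}{n}\sum_{i=1}^{n}(Z_i - \bar{Z})\,Y_i}{\frac{1}{n}\sum_{i=1}^{n}(Z_i - \bar{Z})\,D_i} = \delta + \frac{\frac{1}{n}\sum_{i=1}^{n}(Z_i - \bar{Z})\,\eta_i}{\frac{1}{n}\sum_{i=1}^{n}(Z_i - \bar{Z})\,D_i}$$
and the numerator of the last fraction converges in probability to 0.

Substituting the first stage into the outcome equation:
$$Y_i = \alpha + \delta\,\underbrace{(\beta_0 + \beta_Z Z_i + \upsilon_i)}_{=D_i} + \eta_i = \alpha_0 + \underbrace{\delta\,\beta_Z}_{=\alpha_Z} Z_i + \omega_i \qquad \text{(reduced form eq.)}$$
$$D_i = \beta_0 + \beta_Z Z_i + \upsilon_i \qquad \text{(first stage eq.)}$$
with $\alpha_0 = \alpha + \delta\beta_0$ and $\omega_i = \delta\upsilon_i + \eta_i$.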
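The covariance-ratio estimator above can be checked numerically. The following is a minimal simulation sketch, not part of the original slides: all parameter values, the seed, and the helper `cov` are hypothetical choices made only for illustration. It generates data from the linear model with an unobserved U, then compares the naive regression slope Cov(Y,D)/Var(D) with the ratio Cov(Y,Z)/Cov(D,Z), and also with the ratio of the reduced-form slope to the first-stage slope.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical parameter values, chosen only for illustration.
alpha, delta, gamma = 1.0, 2.0, 1.5
beta_0, beta_Z = 0.5, 0.8

U = rng.normal(size=n)      # unobserved confounder
Z = rng.normal(size=n)      # instrument, independent of U and eps
eps = rng.normal(size=n)

# Treatment depends on the instrument and on the confounder U.
D = beta_0 + beta_Z * Z + U + rng.normal(size=n)
# Outcome follows the linear model with homogeneous effect delta.
Y = alpha + delta * D + gamma * U + eps


def cov(a, b):
    """Sample covariance with 1/n normalisation."""
    return np.mean((a - a.mean()) * (b - b.mean()))


delta_ols = cov(Y, D) / cov(D, D)     # biased by gamma * Cov(U, D) / Var(D)
delta_iv = cov(Y, Z) / cov(D, Z)      # covariance-ratio IV estimator

alpha_Z_hat = cov(Y, Z) / cov(Z, Z)   # reduced-form slope
beta_Z_hat = cov(D, Z) / cov(Z, Z)    # first-stage slope
ratio = alpha_Z_hat / beta_Z_hat      # algebraically equal to delta_iv

print(f"true delta: {delta}")
print(f"OLS slope:  {delta_ols:.3f}")
print(f"IV ratio:   {delta_iv:.3f}")
print(f"alpha_Z / beta_Z: {ratio:.3f}")
```

In this setup the instrument is exogenous by construction, so the IV ratio should land close to the true delta while the OLS slope is pushed upward by the confounder. With real data the exogeneity of Z cannot be checked this way.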
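Take a closer look at $\hat{\delta}$
$$\hat{\delta} = \frac{\widehat{\mathrm{Cov}}(Y,Z)}{\widehat{\mathrm{Cov}}(D,Z)} = \frac{\widehat{\mathrm{Cov}}(Y,Z)\,/\,\widehat{\mathrm{Var}}(Z)}{\widehat{\mathrm{Cov}}(D,Z)\,/\,\widehat{\mathrm{Var}}(Z)} = \frac{\hat{\alpha}_Z}{\hat{\beta}_Z}$$

Two-stage least squares
$$\hat{\delta} = \frac{\widehat{\mathrm{Cov}}(Y,Z)}{\widehat{\mathrm{Cov}}(D,Z)} = \frac{\hat{\beta}_Z\,\widehat{\mathrm{Cov}}(Y,Z)}{\hat{\beta}_Z\,\widehat{\mathrm{Cov}}(D,Z)} = \frac{\widehat{\mathrm{Cov}}(Y,\hat{\beta}_Z Z)}{\hat{\beta}_Z^2\,\widehat{\mathrm{Var}}(Z)} = \frac{\widehat{\mathrm{Cov}}(Y,\hat{\beta}_Z Z)}{\widehat{\mathrm{Var}}(\hat{\beta}_Z Z)} = \cdots = \frac{\widehat{\mathrm{Cov}}(Y,\hat{D})}{\widehat{\mathrm{Var}}(\hat{D})}$$
where $\hat{D} = \hat{\beta}_0 + \hat{\beta}_Z Z$. The third equality uses $\widehat{\mathrm{Cov}}(D,Z) = \hat{\beta}_Z\,\widehat{\mathrm{Var}}(Z)$.
This suggests the following two-stage strategy:
- Step 1: Estimate $(\hat{\beta}_0, \hat{\beta}_Z)$ from $D_i = \beta_0 + \beta_Z Z_i + \upsilon_i$ and obtain $\hat{D} = \hat{\beta}_0 + \hat{\beta}_Z Z$.
- Step 2: Plug in $\hat{D}$ and estimate $(\hat{\alpha}, \hat{\delta})$ from $Y_i = \alpha + \delta \hat{D}_i + \eta_i$.
The resulting regression coefficient $\hat{\delta}$ is identical to $\hat{\alpha}_Z / \hat{\beta}_Z$.

Additional covariates?
[Figure: causal graph with nodes D, Y, U, Z, X]
It is important to close all these paths (D ← X → Y) too.

Wald estimator
In the case of a binary instrument and no covariates, the IV estimator is
$$\hat{\delta}_{IV} = \frac{\hat{E}[Y \mid Z = 1] - \hat{E}[Y \mid Z = 0]}{\hat{E}[D \mid Z = 1] - \hat{E}[D \mid Z = 0]}$$

Additional covariates?
[Figure: causal graph with nodes D, Y, U, Z, X, ε, υ]
$$Y_i = \alpha + \delta D_i + \delta_X X_i + \underbrace{\delta_U U_i + \varepsilon_i}_{=\eta_i}$$
$$\phantom{Y_i} = \alpha_0 + \alpha_Z Z_i + \alpha_X X_i + \omega_i \qquad \text{(reduced form eq.)}$$
$$D_i = \beta_0 + \beta_Z Z_i + \beta_X X_i + \upsilon_i \qquad \text{(first stage eq.)}$$
(Here $\delta_U$ plays the role of $\gamma$ in the model without covariates.)
- Step 1: Estimate $(\hat{\beta}_0, \hat{\beta}_Z, \hat{\beta}_X)$ from $D_i = \beta_0 + \beta_Z Z_i + \beta_X X_i + \upsilon_i$ and obtain $\hat{D} = \hat{\beta}_0 + \hat{\beta}_Z Z + \hat{\beta}_X X$.
- Step 2: Plug in $\hat{D}$ and estimate $(\hat{\alpha}, \hat{\delta}, \hat{\delta}_X)$ from $Y_i = \alpha + \delta \hat{D}_i + \delta_X X_i + \eta_i$.

Instrument
There are two qualities that the instrument needs to have:
- Validity - the instrument Z has no direct effect on Y. It only operates via D. Z needs to be uncorrelated with $\eta_i$ and therefore with both $U_i$ and $\varepsilon_i$.
- Relevance - Z is correlated with D.

Where are we now:
- So far, we were unable to non-parametrically identify the ATE. We could not close the paths going via the confounder U.
- By simplifying a lot, we can at least identify and estimate the regression coefficient $\delta$ within a linear model.
- This is a ratio of coefficients from two regressions, OR we can look at it as a two-stage estimator.
- That is all great as long as the linear model is correct and the effects are homogeneous.
Let us see it in action.

Example: children and labor supply
We wish to understand the causal link between family size and labor supply. Do parents of bigger families work more?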
A large literature has found a negative correlation between family size and female labor supply. How can the causal effect be estimated? Clearly, family size is not "randomly assigned".
Angrist, Joshua, and William Evans. "Children and Their Parents' Labor Supply: Evidence from Exogenous Variation in Family Size." American Economic Review 88.3 (1998): 450-77.

Where do we find a proper instrument that would provide exogenous variation in family size?
- Parents have a preference for mixed genders.
- The gender "assignment" itself is as good as random.
- Parents with these kids {(♀,♀),(♂,♂)} are more likely to have another one than parents with {(♀,♂),(♂,♀)} kids.
- Exogenous variation in the probability of having a third child!

Gender of the first kid does not predict the probability of having the second child.
[Table 3 from Angrist and Evans (1998)]
Gender composition predicts the probability of having a third child.
[Table 3 from Angrist and Evans (1998)]

Ordinary least squares estimator (for comparison purposes)
$$Y_i = \alpha + \delta D_i + \delta_X X_i + \eta_i$$
Instrumental variable estimation
$$Y_i = \alpha + \delta D_i + \delta_X X_i + \eta_i$$
$$D_i = \beta_0 + \beta_Z Z_i + \beta_X X_i + \upsilon_i$$
Y is one of these:
- worked
- weeks worked
- hours/week
- log family income
- non-wife income
D is an indicator of having more than 2 children.
X consists of: age, age at first birth, black indicator, hispanic indicator, boy 1st indicator, boy 2nd indicator.
Z is one of these:
- same sex
- two boys, two girls (as separate instruments)

No covariates - Wald estimates
[Table 5 from Angrist and Evans (1998)]
Instrument is relevant
[Table 6 from Angrist and Evans (1998)]
With covariates
The magnitude of the effect is smaller than under OLS.
[Table 7 from Angrist and Evans (1998)]

Mechanics
- Step 1: Estimate $(\hat{\beta}_0, \hat{\beta}_Z, \hat{\beta}_X)$ from $D_i = \beta_0 + \beta_Z Z_i + \beta_X X_i + \upsilon_i$ and obtain $\hat{D} = \hat{\beta}_0 + \hat{\beta}_Z Z + \hat{\beta}_X X$.
- Step 2: Plug in $\hat{D}$ and estimate $(\hat{\alpha}, \hat{\delta}, \hat{\delta}_X)$ from $Y_i = \alpha + \delta \hat{D}_i + \delta_X X_i + \eta_i$.
This can be translated as:
- Step 1: Regress D on all sources of exogenous variation (Z and X).
- Step 2: Regress Y on the
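predicted values $\hat{D}$ of D and the exogenous variables X (not the instruments!).

Mechanics (it is a simple projection)
$$\mathbf{X} = [1, X, D], \qquad \mathbf{Z} = [1, X, Z], \qquad y_i = \mathbf{X}_i \beta + e_i$$
$$\hat{\beta}_{OLS} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T y$$
is consistent for $\beta$ when $E(\mathbf{X}^T e) = 0$. With an endogenous D this condition fails, which is why we project on $\mathbf{Z}$ instead.
Regress all the columns of $\mathbf{X}$ onto $\mathbf{Z}$ to obtain $\hat{\mathbf{X}}$:
$$\hat{\mathbf{X}} = \mathbf{Z}(\mathbf{Z}^T \mathbf{Z})^{-1} \mathbf{Z}^T \mathbf{X} = P_Z \mathbf{X}$$
(note that projecting the covariate columns X on $\mathbf{Z}$ returns the same X, because X is among the columns of $\mathbf{Z}$!)
Regress y on $\hat{\mathbf{X}}$:
$$\hat{\beta}_{IV} = (\hat{\mathbf{X}}^T \hat{\mathbf{X}})^{-1} \hat{\mathbf{X}}^T y = \big((P_Z \mathbf{X})^T P_Z \mathbf{X}\big)^{-1} (P_Z \mathbf{X})^T y = (\mathbf{X}^T P_Z \mathbf{X})^{-1} \mathbf{X}^T P_Z\, y$$
using $P_Z^T P_Z = P_Z$.

Careful with the standard errors
The second-stage regression does not give you the correct standard errors, because its residuals are computed with $\hat{D}$ in place of D. Notice that the IV estimator has the form of a weighted least squares estimator:
$$\hat{\beta}_{IV} = (\hat{\mathbf{X}}^T \hat{\mathbf{X}})^{-1} \hat{\mathbf{X}}^T y = (\mathbf{X}^T P_Z \mathbf{X})^{-1} \mathbf{X}^T P_Z\, y$$
and thus $\hat{\sigma}^2 (\mathbf{X}^T P_Z \mathbf{X})^{-1}$ is a consistent estimator of the covariance matrix of $\hat{\beta}_{IV}$ under homoscedasticity, where $\hat{\sigma}^2$ is computed from the residuals $y - \mathbf{X}\hat{\beta}_{IV}$ (original $\mathbf{X}$, not $\hat{\mathbf{X}}$).

Weak instruments
[Figure: causal graph with nodes D, Y, U, Z, X]
We relied on the fact that there exists this connection: Z → D. But what if the link is only weak?

So what if the correlation is very small?
$$\hat{\delta} = \frac{\widehat{\mathrm{Cov}}(Y,Z)}{\underbrace{\widehat{\mathrm{Cov}}(D,Z)}_{\text{very small}}} = \frac{\widehat{\mathrm{Cov}}(Y,Z)\,/\,\widehat{\mathrm{Var}}(Z)}{\widehat{\mathrm{Cov}}(D,Z)\,/\,\widehat{\mathrm{Var}}(Z)} = \frac{\hat{\alpha}_Z}{\hat{\beta}_Z}$$
Then $\hat{\beta}_Z$ is close to zero and imprecisely estimated relative to its size, and this leads to an imprecise estimator $\hat{\delta}$.
$$\hat{\delta} = \delta + \frac{\overbrace{\frac{1}{n}\sum_{i=1}^{n}(Z_i - \bar{Z})\,\eta_i}^{\to_P\, 0\ ???}}{\underbrace{\frac{1}{n}\sum_{i=1}^{n}(Z_i - \bar{Z})\,D_i}_{\text{very small}}}$$
Even a tiny deviation from exogeneity, $\mathrm{Cov}(Z,\eta) = 0$, may severely bias our estimator(!) This is a huge deal.
Bound, John, David A. Jaeger, and Regina M. Baker. "Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variable is weak." Journal of the American Statistical Association 90.430 (1995): 443-450.

Luckily, we can check whether we have this problem simply by looking at the first stage. A common rule of thumb is that the F-statistic from the first-stage regression should be at least 10.
There is a large strand of literature on weak instruments, many weak instruments etc.
Older survey: Stock, James H., Jonathan H.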
Wright, and Motohiro Yogo. "A survey of weak instruments and weak identification in generalized method of moments." Journal of Business & Economic Statistics 20.4 (2002): 518-529.
Newer survey: Andrews, Isaiah, James H. Stock, and Liyang Sun. "Weak instruments in instrumental variables regression: Theory and practice." Annual Review of Economics 11 (2019): 727-753.
Statistical inference: Staiger, Douglas O., and James H. Stock. "Instrumental variables regression with weak instruments." (1994).

Heterogeneous effects
https://www.nobelprize.org/uploads/2021/10/fig4 ek en 21 LATE.pdf
A natural question to ask is the following: Do all people have the same effect from the treatment? If not, who are the people who benefit from the treatment?

Interpretation
We now drop the linearity assumption and consider a binary treatment and a binary instrument.
- Every individual i may have her own treatment effect $\delta_i = Y_i(1) - Y_i(0)$.
- Every individual i may also react differently to the instrument in terms of treatment, $D_i(1) - D_i(0)$.
Z - randomly offered training
D - actual training
Y - outcome
- always-taker: $D_i(1) = 1$ and $D_i(0) = 1$
- complier: $D_i(1) = 1$ and $D_i(0) = 0$
- defier: $D_i(1) = 0$ and $D_i(0) = 1$
- never-taker: $D_i(1) = 0$ and $D_i(0) = 0$

Denote by $Y_i(d,z)$ the potential outcome under $D_i = d$ and $Z_i = z$. If
- the instrument is independent of potential outcomes: $(Y_i(D_i(1),1),\, Y_i(D_i(0),0),\, D_i(1),\, D_i(0)) \perp\!\!\!\perp Z_i$
- Exclusion restriction: $Y_i(d) \equiv Y_i(d,1) = Y_i(d,0)$
- Relevance restriction: $E[D_i(1) - D_i(0)] \neq 0$
- Monotonicity: $D_i(1) \geq D_i(0)$
- Stable Unit Treatment Value Assumption: there is no interaction between individuals and there is no hidden variation in the treatment
then
$$\delta_{IV} = \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{E[D \mid Z = 1] - E[D \mid Z = 0]} = \underbrace{E[Y(1) - Y(0) \mid D(1) > D(0)]}_{\text{Local average treatment effect}}$$
Imbens, G. W. and Angrist, J. D. (1994). Identification and Estimation of Local Average Treatment Effects. Econometrica.

Proof
$$E[Y \mid Z = 1] \overset{\text{exclusion}}{=} E[Y(0) + (Y(1) - Y(0))\,D \mid Z = 1] \overset{\text{ind.}}{=} E[Y(0) + (Y(1) - Y(0))\,D(1)]$$
and also
$$E[Y \mid Z = 0] = E[Y(0) + (Y(1) - Y(0))\,D(0)]$$
so
$$E[Y \mid Z = 1] - E[Y \mid Z = 0] = E[(Y(1) - Y(0))(D(1) - D(0))] \overset{\text{mono}}{=} E[Y(1) - Y(0) \mid D(1) > D(0)]\; P(D(1) > D(0))$$
Similarly
$$E[D \mid Z = 1] - E[D \mid Z = 0] = E[D(1) - D(0)] = P(D(1) > D(0))$$
Dividing the first difference by the second gives the result.

Effects on the compliers
- The LATE interpretation is specific to the instrument.
- No restrictions were placed on the homogeneity of the effects.
- No linearity was assumed.

Extensions:
- Further discussions: Angrist, J. D., Imbens, G. W. and Rubin, D. B. (1996). Identification of Causal Effects Using Instrumental Variables. Journal of the American Statistical Association.
- Multiple valued treatment: Angrist, Joshua D., and Guido W. Imbens. "Two-stage least squares estimation of average causal effects in models with variable treatment intensity." Journal of the American Statistical Association 90.430 (1995): 431-442.
- Non-parametric LATE with covariates: Frölich, Markus. "Nonparametric IV estimation of local average treatment effects with covariates." Journal of Econometrics 139.1 (2007): 35-75.

Further applications
- Returns to schooling - Quarter of birth instrument (Angrist and Krueger, 1991)
- Returns to schooling - Nearby college instrument (Card, 1995)
- Returns to schooling - Different instruments (Ichino and Winter-Ebmer, 1999)
- Classroom size - Legislative rule as instrument (Angrist and Lavy, 1999)
- Effect of military service on labor market outcomes - Draft lottery instrument (Angrist, 1990)
- Impact of institutions on economic growth - Mortality instrument (Acemoglu, Johnson and Robinson, 2001), Comment (Albouy, 2012), Reply (AJR, 2012)
- Impact of economic conditions on prob. of a conflict - rainfall instrument (Miguel, Satyanath and Sergenti, 2004)
- Demand for fish - Weather as an IV (Angrist, Graddy and Imbens, 2000)
- Childbearing on labor supply - twin births as a natural experiment (Jacobsen, Pearce and Rosenbloom,
1999) and (Black, Devereux and Salvanes, 2015)
- Using economic theory to estimate supply and demand curves using variation in a single tax rate(!) (Zoutman, Gavrilova and Hopland, 2018)
- Parental Meth Abuse and Foster Care - use supply shock on meth market as instrument (Cunningham and Finlay, 2013)

Measurement error
Suppose that the true regressor $X^*$ is measured with error, so that we only observe $X_i = X_i^* + u_i$, while the outcome depends on $X_i^*$:
$$Y_i = \beta_0 + \beta_X X_i^* + \varepsilon_i$$
Regressing Y on the observed X gives
$$\hat{\beta}_X = \frac{\widehat{\mathrm{Cov}}(X,Y)}{\widehat{\mathrm{Var}}(X)} = \frac{\widehat{\mathrm{Cov}}(X^* + u,\ \beta_0 + \beta_X X^* + \varepsilon)}{\widehat{\mathrm{Var}}(X^* + u)} \to_P \beta_X\, \frac{\sigma^2_{X^*}}{\sigma^2_{X^*} + \sigma^2_u}$$
which is attenuated (shrunk towards zero) even if $u_i$ is uncorrelated with both $X_i^*$ and $\varepsilon_i$.

AJR 2001
- Institutions - with more secure property rights people will invest more in physical and human capital. Also includes an independent judiciary, equal access to education and ensuring civil liberties.
- Do institutions matter? Well, they do: North/South Korea, West/East Germany.
- Different colonization policies: extractive (Congo) vs strong property rights (Australia, Canada, USA).
- Higher mortality made it more difficult to set up settlements with strong property rights.
Settler mortality → Settlements → Early institutions → Current institutions → Current performance
Reduced form

AJR 2001
- Exclusion restriction: mortality more than 100 years ago has no direct impact on GDP per capita today (apart from the channel via institutions). Why? Mortality was mainly due to malaria and yellow fever.
- Insensitive to outliers (USA, Canada, NZ, Australia).
- Africa dummy and distance to equator insignificant.
- Results robust to different covariates added: identity of main colonizer, climate, religion, geography, natural resources, current disease. (In DAG language: closing all the backdoor paths.)
[Table 1 in AJR 2001]
Model: AJR 2001
[Fig 3 in AJR 2001]
IV estimates: [Table 4 in AJR 2001]
First stage: [Table 4 in AJR 2001]
OLS: [Table 4 in AJR 2001]
This is compatible with an attenuation bias explanation.
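The attenuation-bias argument above can be illustrated with a small simulation. This is only a sketch under assumptions that are not in the slides: the coefficient values, the variances, the seed, and in particular the variable `Z` (a hypothetical instrument that shifts the true regressor and is unrelated to the measurement error and to the outcome noise) are invented for illustration. It compares the regression slope on the mismeasured regressor with the attenuation formula, and with the IV ratio Cov(Y,Z)/Cov(X,Z).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical true coefficients, chosen only for illustration.
beta_0, beta_X = 0.5, 2.0

Z = rng.normal(size=n)            # hypothetical instrument, Var(Z) = 1
X_star = Z + rng.normal(size=n)   # true regressor, Var(X*) = 2
u = rng.normal(size=n)            # classical measurement error, Var(u) = 1
eps = rng.normal(size=n)          # outcome noise

X = X_star + u                        # what we actually observe
Y = beta_0 + beta_X * X_star + eps    # outcome depends on the true regressor


def cov(a, b):
    """Sample covariance with 1/n normalisation."""
    return np.mean((a - a.mean()) * (b - b.mean()))


beta_ols = cov(X, Y) / cov(X, X)      # slope on the mismeasured regressor
beta_iv = cov(Y, Z) / cov(X, Z)       # IV ratio using the hypothetical Z

# Attenuation factor sigma^2_{X*} / (sigma^2_{X*} + sigma^2_u) for this design.
attenuation = 2.0 / (2.0 + 1.0)

print(f"true beta_X:             {beta_X}")
print(f"OLS slope:               {beta_ols:.3f}")
print(f"beta_X * attenuation:    {beta_X * attenuation:.3f}")
print(f"IV ratio:                {beta_iv:.3f}")
```

Because the measurement error is uncorrelated with the instrument in this design, the IV ratio is not attenuated, which is one way an IV estimate can come out larger in magnitude than the corresponding OLS estimate.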
Example: Meth, Parents and Foster Care (Cunningham and Finlay, 2013)
- effect of drug abuse on parenting
- In 1994 - regulation on ephedrine → more difficult to produce meth
[Fig 3 from Cunningham and Finlay (2013)]
[Fig 4 from Cunningham and Finlay (2013)]
[Fig 5 from Cunningham and Finlay (2013)]
Notation:
- s - state
- t - specific month
- $\gamma_s, \delta_s$ - state fixed effects
- $\phi_s, \lambda_t$ - month fixed effects
- $t_{st}, \omega_{st}$ - state specific linear time trends
- $X_{st}$ - log of state population of whites aged 0-19, 15-49, cigarette tax, state unemployment rate, log of alcohol treatment cases for whites
[Part of Table 3 from Cunningham and Finlay (2013)]

Overidentifying restrictions test
Z may be multidimensional. The two-stage least squares procedure can still be used. Say we have 2 instruments:
- Under instrument exogeneity, both of them are fine and hence $\hat{\beta}_{IV1}$ should be similar to $\hat{\beta}_{IV2}$.
- Under exogeneity, both $Z_1$ and $Z_2$ should have zero coefficients in a regression of the residuals (computed using the original X and $\hat{\beta}_{IV}$) on the instruments.
- The F-statistic that jointly tests this, multiplied by m, is called the J-statistic, $J \sim \chi^2_q$, where m is the number of instruments, k is the number of endogenous variables and $q = m - k$ is the number of over-identifying restrictions.
- See the Sargan row in the summary table of ivreg.

Wrap up
- The IV approach makes use of quasi-experimental variation in the treatment that is induced by the instrument.
- The IV provides this exogenous variation.
- The IV needs to be strong enough, otherwise the estimates are sensitive to even small violations of exogeneity.
- Under the monotonicity condition, the results inform us only about a specific subpopulation (compliers).

(*) More on IVs
Testable implications on IVs
- Balke and Pearl (1997) for binary Y - based on linear programming
- Huber and Mellace (2015) - under LATE assumptions
- Kitagawa (2021) extends the Balke and Pearl (1997) results to continuous Y
- Zhang,
Tian and Bareinboim (2021) - general algorithm for identification of distributions of counterfactual outcomes
[Fig 1 in Balke and Pearl (1997)]
If Y, D, Z are discrete, we have that
$$\max_d \sum_y \max_z P(y, d \mid z) \leq 1$$
Furthermore, $ATE = E[Y(1) - Y(0)]$ is bounded.

Thank you for your attention!

References
Imbens, Guido W., and Joshua D. Angrist. "Identification and Estimation of Local Average Treatment Effects." Econometrica 62.2 (1994): 467-475.
Angrist, Joshua D., and Alan B. Krueger. "Does compulsory school attendance affect schooling and earnings?" The Quarterly Journal of Economics 106.4 (1991): 979-1014.
Angrist, Joshua D. "Lifetime earnings and the Vietnam era draft lottery: evidence from social security administrative records." The American Economic Review (1990): 313-336.
Card, David. "Using geographic variation in college proximity to estimate the return to schooling." (1993).
Acemoglu, Daron, Simon Johnson, and James A. Robinson. "The colonial origins of comparative development: An empirical investigation." American Economic Review 91.5 (2001): 1369-1401.
Miguel, Edward, Shanker Satyanath, and Ernest Sergenti. "Economic shocks and civil conflict: An instrumental variables approach." Journal of Political Economy 112.4 (2004): 725-753.
Ichino, Andrea, and Rudolf Winter-Ebmer. "Lower and upper bounds of returns to schooling: An exercise in IV estimation with different instruments." European Economic Review 43.4-6 (1999): 889-901.
Bound, John, David A. Jaeger, and Regina M. Baker. "Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variable is weak." Journal of the American Statistical Association 90.430 (1995): 443-450.
Stock, James H., Jonathan H. Wright, and Motohiro Yogo. "A survey of weak instruments and weak identification in generalized method of moments." Journal of Business & Economic Statistics 20.4 (2002): 518-529.
Andrews, Isaiah, James H. Stock, and Liyang Sun.
"Weak instruments in instrumental variables regression: Theory and practice." Annual Review of Economics 11 (2019): 727-753.
Staiger, Douglas O., and James H. Stock. "Instrumental variables regression with weak instruments." (1994).
Frölich, Markus. "Nonparametric IV estimation of local average treatment effects with covariates." Journal of Econometrics 139.1 (2007): 35-75.
Angrist, Joshua D., and Guido W. Imbens. "Two-stage least squares estimation of average causal effects in models with variable treatment intensity." Journal of the American Statistical Association 90.430 (1995): 431-442.
Angrist, J. D., Imbens, G. W. and Rubin, D. B. (1996). Identification of Causal Effects Using Instrumental Variables. Journal of the American Statistical Association.
Jacobsen, Joyce P., James Wishart Pearce III, and Joshua L. Rosenbloom. "The effects of childbearing on married women's labor supply and earnings: using twin births as a natural experiment." Journal of Human Resources (1999): 449-474.
Angrist, Joshua D., Kathryn Graddy, and Guido W. Imbens. "The interpretation of instrumental variables estimators in simultaneous equations models with an application to the demand for fish." The Review of Economic Studies 67.3 (2000): 499-527.
Zoutman, Floris T., Evelina Gavrilova, and Arnt O. Hopland. "Estimating both supply and demand elasticities using variation in a single tax rate." Econometrica 86.2 (2018): 763-771.
Cunningham, Scott, and Keith Finlay. "Parental substance use and foster care: Evidence from two methamphetamine supply shocks." Economic Inquiry 51.1 (2013): 764-782.
Overidentification test, the very first paper: Sargan, John D. "The estimation of economic relationships using instrumental variables." Econometrica: Journal of the Econometric Society (1958): 393-415.
Balke, Alexander, and Judea Pearl. "Bounds on treatment effects from studies with imperfect compliance." Journal of the American Statistical Association 92.439 (1997): 1171-1176.
Huber, Martin, and Giovanni Mellace.
"Testing instrument validity for LATE identification based on inequality moment constraints." Review of Economics and Statistics 97.2 (2015): 398-411.
Kitagawa, Toru. "The identification region of the potential outcome distributions under instrument independence." Journal of Econometrics (2021).
Zhang, Junzhe, Jin Tian, and Elias Bareinboim. "Partial Counterfactual Identification from Observational and Experimental Data." arXiv preprint arXiv:2110.05690 (2021).