Econometrics - Lecture 5
Endogeneity, Instrumental Variables, IV Estimator

Hackl, Econometrics, Lecture 5, Dec 16, 2011

Contents
- OLS Estimator Revisited
- Cases of Regressors Correlated with Error Term
- Instrumental Variables (IV) Estimator: The Concept
- IV Estimator: The Method
- Calculation of the IV Estimator
- An Example
- The GIV Estimator
- Some Tests

OLS Estimator

Linear model for yi
    yi = xi'β + εi, i = 1, …, N (or y = Xβ + ε)
given observations xik, k = 1, …, K, of the regressor variables, and error term εi

OLS estimator
    b = (Σi xi xi')⁻¹ Σi xi yi = (X'X)⁻¹X'y
From
    b = (Σi xi xi')⁻¹ Σi xi yi
      = (Σi xi xi')⁻¹ Σi xi xi'β + (Σi xi xi')⁻¹ Σi xi εi
      = β + (Σi xi xi')⁻¹ Σi xi εi = β + (X'X)⁻¹X'ε
follows
    E{b} = β + (Σi xi xi')⁻¹ E{Σi xi εi} = β + (X'X)⁻¹ E{X'ε}

OLS Estimator, cont'd

1. OLS estimator b is unbiased if
   - (A1) E{ε} = 0
   - E{Σi xi εi} = E{X'ε} = 0; this is fulfilled if (A7) or a stronger assumption is true:
     - (A2) {xi, i = 1, …, N} and {εi, i = 1, …, N} are independent; this is the strongest assumption
     - (A10) E{ε|X} = 0, i.e., X is uninformative about E{εi} for all i (ε is conditional mean independent of X); is implied by (A2)
     - (A8) xi and εi are independent for all i (no contemporaneous dependence); is less strong than (A2) and (A10)
     - (A7) E{xi εi} = 0 for all i (no contemporaneous correlation); is even less strong than (A8)
2. OLS estimator b is consistent for β if
   - (A8) xi and εi are independent for all i
   - (A6) (1/N) Σi xi xi' has as limit (N → ∞) a nonsingular matrix Σxx
   (A8) can be replaced by (A7) [E{xi εi} = 0 for all i, no contemporaneous correlation]
3. OLS estimator b is asymptotically normally distributed if (A6), (A8) and
   - (A11) εi ~ IID(0, σ²)
   are true;
   - for large N, b follows approximately the normal distribution
         b ~a N{β, σ²(Σi xi xi')⁻¹}
   - Use White and
     Newey-West estimators for V{b} in case of heteroskedasticity and autocorrelation of error terms, respectively

Assumption (A7): E{xi εi} = 0 for all i

- Implication of (A7): for all i, each of the regressors is uncorrelated with the current error term (no contemporaneous correlation)
- The stronger assumptions (A2), (A10), and (A8) have the same consequences
- (A7) guarantees unbiasedness and consistency of the OLS estimator
- In reality, (A7) is not always true: alternative estimation procedures are then required to ensure consistency and unbiasedness
- Examples of situations with E{xi εi} ≠ 0:
  - Regressors with measurement errors
  - Regression on the lagged dependent variable with autocorrelated error terms (dynamic regression)
  - Endogeneity of regressors
  - Simultaneity

Regressor with Measurement Error

      yi = β1 + β2wi + vi
  with white noise vi, V{vi} = σv², and E{vi|wi} = 0; conditional expectation of yi given wi: E{yi|wi} = β1 + β2wi
- Example: wi: household income, yi: household savings
- Measurement process: the reported household income xi may deviate from the household income wi
      xi = wi + ui
  where ui is (i) white noise with V{ui} = σu², (ii) independent of vi, and (iii) independent of wi
- The model to be analyzed is
      yi = β1 + β2xi + εi with εi = vi - β2ui
- E{xi εi} = -β2σu² ≠ 0: the requirement for consistency and unbiasedness is violated
- xi and εi are negatively (positively) correlated if β2 > 0 (β2 < 0)

Measurement Error, cont'd

- Inconsistency of b2
      plim b2 = β2 + E{xi εi} / V{xi} = β2 (1 - σu² / V{xi})
  b2 is biased towards zero; for β2 > 0, β2 is underestimated
- Inconsistency of b1
      plim (b1 - β1) = - plim (b2 - β2)
      E{xi}
  given E{xi} > 0 for the reported income: β1 is overestimated; the inconsistency "carries over"
- The model does not correspond to the conditional expectation of yi given xi:
      E{yi|xi} = β1 + β2xi - β2 E{ui|xi} ≠ β1 + β2xi
  as E{ui|xi} ≠ 0

Dynamic Regression

- Allows modelling dynamic effects of changes of x on y:
      yt = β1 + β2xt + β3yt-1 + εt
- OLS estimators are consistent if E{xt εt} = 0 and E{yt-1 εt} = 0
- AR(1) model for εt:
      εt = ρεt-1 + vt
  with vt white noise with variance σv²
- From yt = β1 + β2xt + β3yt-1 + ρεt-1 + vt and E{εt-1 εt} = ρσv²(1 - ρ²)⁻¹ follows
      E{yt-1 εt} = β3 E{yt-2 εt} + ρσv²(1 - ρ²)⁻¹
  i.e., yt-1 is correlated with εt
- OLS estimators are not consistent
- The model does not correspond to the conditional expectation of yt given the regressors xt and yt-1:
      E{yt|xt, yt-1} = β1 + β2xt + β3yt-1 + E{εt|xt, yt-1}

Omission of Relevant Regressors

- Two models:
      yi = xi'β + zi'γ + εi   (A)
      yi = xi'β + vi          (B)
- True model (A), fitted model (B)
- OLS estimates bB of β from (B)
      bB = β + (Σi xi xi')⁻¹ Σi xi zi'γ + (Σi xi xi')⁻¹ Σi xi εi
- Omitted variable bias: E{(Σi xi xi')⁻¹ Σi xi zi'}γ = E{(X'X)⁻¹X'Z}γ
- No bias if (a) γ = 0 or if (b) the variables in xi and zi are uncorrelated (orthogonal)
- OLS estimators are biased if relevant regressors are omitted that are non-orthogonal, i.e., correlated with the regressors in xi

Unobserved Heterogeneity

- Example: wage equation with yi: log wage, x1i: personal characteristics, x2i: years of schooling, ui: abilities (unobservable)
      yi = x1i'β1 + x2iβ2 + uiγ + vi
- Model for analysis (unobserved ui absorbed in the error term)
      yi = xi'β + εi
  with xi = (x1i', x2i)', β = (β1', β2)', εi = uiγ + vi
- Given E{xi vi} = 0
      plim b = β + Σxx⁻¹ E{xi ui} γ
- OLS estimators b are inconsistent if xi and ui are correlated and γ ≠ 0, e.g., if higher abilities induce more years at school: the estimator for β2 might be too large, hence effects of years at school etc.
  are overestimated: "ability bias"
- Unobserved heterogeneity: observational units differ in aspects other than the observable ones

Endogenous Regressors

- Regressors that are correlated with the error term, E{X'ε} ≠ 0, are called endogenous
- Endogeneity bias
- Relevant for many economic applications
- OLS estimators b = β + (X'X)⁻¹X'ε
  - E{b} ≠ β, b is biased; the bias E{(X'X)⁻¹X'ε} is difficult to assess
  - plim b = β + Σxx⁻¹q with q = plim(N⁻¹X'ε)
- For q = 0 (regressors and error term asymptotically uncorrelated), the OLS estimators b are consistent even in case of endogenous regressors
- For q ≠ 0 (error term and at least one regressor asymptotically correlated): plim b ≠ β, the OLS estimators b are not consistent
- Exogenous regressors: regressors that are uncorrelated with the error term, i.e., all non-endogenous regressors

Consumption Function

- AWM data base, 1970:1-2003:4
- C: private consumption (PCR), growth rate p.y.
- Y: disposable income of households (PYR), growth rate p.y.
      Ct = β1 + β2Yt + εt   (A)
  β2: marginal propensity to consume, 0 < β2 < 1
- OLS estimates:
      Ĉt = 0.011 + 0.718 Yt
  with t = 15.55, R² = 0.65, DW = 0.50
- It: per capita investment (exogenous, E{It εt} = 0)
      Yt = Ct + It   (B)
- Both Yt and Ct are endogenous: E{Ct εt} = E{Yt εt} = σε²(1 - β2)⁻¹
- The regressor Yt has an impact on Ct; at the same time, Ct has an impact on Yt

Simultaneous Equation Models

- Illustrated by the preceding consumption function:
  - Variables Yt and Ct are simultaneously determined by equations (A) and (B)
  - Equations (A) and (B) are the structural equations or the structural form of the simultaneous equation model that describes both Yt and Ct
  - The coefficients β1 and β2 are behavioral parameters
- Reduced form of the model: one equation for each of the endogenous variables Ct and Yt, with only the exogenous variable It as regressor
- The OLS estimators of the structural equation (A) are biased and inconsistent

Consumption Function, cont'd

- Reduced form of the model (obtained by solving (A) and (B) for Ct and Yt):
      Ct = β1/(1 - β2) + [β2/(1 - β2)] It + εt/(1 - β2)
      Yt = β1/(1 - β2) + [1/(1 - β2)] It + εt/(1 - β2)
- OLS estimator b2 from (A) is inconsistent; E{Yt εt} ≠ 0
      plim b2 = β2 + Cov{Yt, εt} / V{Yt} = β2 + (1 - β2) σε²(V{It} + σε²)⁻¹
  for 0 < β2 < 1, b2 overestimates β2
- The OLS estimator b1 is also inconsistent

An Alternative Estimator

- Model
      yi = β1 + β2xi + εi
  with E{εi xi} ≠ 0, i.e., endogenous regressor xi: OLS estimators are biased and inconsistent
- Instrumental variable zi satisfying
  1. Exogeneity: E{εi zi} = 0: zi is uncorrelated with the error term
  2. Relevance: Cov{xi, zi} ≠ 0: zi is correlated with the endogenous regressor
- Transformation of the model equation
      Cov{yi, zi} = β2
      Cov{xi, zi} + Cov{εi, zi}
  gives, as Cov{εi, zi} = 0,
      β2 = Cov{yi, zi} / Cov{xi, zi}

IV Estimator for β2

- Substitution of sample moments for covariances gives the instrumental variables (IV) estimator
      β̂2,IV = Σi (zi - z̄)(yi - ȳ) / Σi (zi - z̄)(xi - x̄)
- Consistent estimator for β2 given that the instrumental variable zi is valid, i.e., it is
  - Exogenous, i.e., E{εi zi} = 0
  - Relevant, i.e., Cov{xi, zi} ≠ 0
- Typically, nothing can be said about the bias of an IV estimator; small sample properties are unknown
- Coincides with the OLS estimator for zi = xi

Consumption Function, cont'd

- Alternative model: Ct = β1 + β2Yt-1 + εt
- Yt-1 and εt are certainly uncorrelated; this avoids the risk of inconsistency due to correlated Yt and εt
- Yt-1 is certainly highly correlated with Yt and is almost as good a regressor as Yt
- Fitted model:
      Ĉt = 0.012 + 0.660 Yt-1
  with t = 12.86, R² = 0.56, DW = 0.79 (instead of Ĉt = 0.011 + 0.718 Yt with t = 15.55, R² = 0.65, DW = 0.50)
- The deterioration of t-statistic and R² is the price for the improvement of the estimator

IV Estimator: The Concept

- Alternative to the OLS estimator
- Avoids inconsistency in case of endogenous regressors
- Idea of the IV estimator:
  - Replace regressors which are correlated with the error terms by regressors
    - which are uncorrelated with the error terms
    - which are (highly) correlated with the regressors that are to be replaced
  - and use OLS estimation
- The hope is that the IV estimator is consistent (and less biased than the OLS estimator)
- Price: deteriorated model fit as measured by, e.g., t-statistic, R²

IV Estimator: General Case

- The model is
      yi = xi'β + εi
  with V{εi} =
  σε² and
      E{εi xi} ≠ 0
- At least one component of xi is correlated with the error term
- The vector of instruments zi (with the same dimension as xi) fulfills
      E{εi zi} = 0
- IV estimator based on the instruments zi
      β̂IV = (Σi zi xi')⁻¹ Σi zi yi

IV Estimator: General Case, cont'd

- The (asymptotic) covariance matrix of β̂IV is given by
      V{β̂IV} = σ² [(Σi xi zi')(Σi zi zi')⁻¹(Σi zi xi')]⁻¹
- In the estimated covariance matrix, σ² is substituted by, e.g.,
      σ̂² = (N - K)⁻¹ Σi (yi - xi'β̂IV)²
  which is based on the IV residuals
- The asymptotic distribution of IV estimators, given IID(0, σε²) error terms, leads to the approximate distribution
      β̂IV ~a N(β, V̂{β̂IV})
  with the estimated covariance matrix V̂{β̂IV}

Derivation of the IV Estimator

- The model is
      yi = xi'β + εi = x0i'β0 + βKxKi + εi
  with x0i = (x1i, …, xK-1,i)' containing the first K-1 components of xi, and E{εi x0i} = 0
- The K-th component is endogenous: E{εi xKi} ≠ 0
- The instrumental variable zKi fulfills
      E{εi zKi} = 0
- Moment conditions: K conditions to be satisfied by the coefficients, the K-th condition with zKi instead of xKi:
      E{εi x0i} = E{(yi - x0i'β0 - βKxKi) x0i} = 0   (K-1 conditions)
      E{εi zKi} = E{(yi - x0i'β0 - βKxKi) zKi} = 0
- The number of conditions, and of corresponding linear equations, equals the number of coefficients to be estimated

Derivation of the IV Estimator, cont'd

- The system of linear equations for the K coefficients β can be uniquely solved: the coefficients β are said "to be identified"
- To derive the IV estimators from the moment conditions, the expectations are replaced by sample averages
      (1/N) Σi (yi - x0i'β̂0 - β̂KxKi) x0i = 0
      (1/N) Σi (yi - x0i'β̂0 - β̂KxKi) zKi = 0
- The solution of the linear equation system, with zi' = (x0i', zKi), is
      β̂IV = (Σi zi xi')⁻¹ Σi zi yi
- Identification requires that the KxK matrix Σi zi xi' is finite and invertible; the instrument zKi is relevant when this is fulfilled

Calculation of the IV Estimator

- The model in matrix notation:
      y = Xβ + ε
- The IV estimator
      β̂IV = (Z'X)⁻¹Z'y
  with zi obtained from xi by substituting instrumental variable(s) for all endogenous regressors
- Calculation in two steps:
  1. Regression of the explanatory variables x1, …, xK (including the endogenous ones) on the columns of Z: fitted values
         X̂ = Z(Z'Z)⁻¹Z'X
  2. Regression of y on the fitted explanatory variables:
         β̂IV = (X̂'X̂)⁻¹X̂'y

Calculation of the IV Estimator, cont'd

- Remarks:
  - The KxK matrix Z'X = Σi zi xi' is required to be finite and invertible
  - From
        (X̂'X̂)⁻¹X̂'y = [X'Z(Z'Z)⁻¹Z'X]⁻¹ X'Z(Z'Z)⁻¹Z'y = (Z'X)⁻¹Z'Z(X'Z)⁻¹ X'Z(Z'Z)⁻¹Z'y = (Z'X)⁻¹Z'y
    it is obvious that the estimator obtained in the second step is the IV estimator
  - However, the estimator obtained in the second step is more general; see below
- In GRETL: the menu sequence "Model > Instrumental variables > Two-Stage Least Squares…" leads to the specification window with boxes (i) for the independent variables and (ii) for the instruments

Choice of Instrumental Variables

- Instrumental variables are required to be
  - exogenous, i.e., uncorrelated with the error terms
  - relevant, i.e., correlated with the endogenous regressors
- Instruments
  - must be based on subject matter arguments, e.g., arguments from economic theory
  - should be explained and motivated
  - must show a significant effect in explaining an endogenous regressor
- The choice of instruments is often not easy
- Regression of endogenous variables on instruments
  - Best linear approximation of the endogenous variables
  - Economic interpretation is of no importance or interest

Example: Returns to Schooling

- Human capital earnings function:
      wi = β1 + β2Si + β3Ei + β4Ei² + εi
  with wi: log of individual earnings, Si: years of schooling, Ei: years of experience (Ei = agei - Si - 6)
- Empirically, more education implies higher income
- Question: Is this effect causal?
  - If yes, one more year at school increases the wage by about 100·β2 percent
  - Otherwise, abilities may cause higher income and also more years at school; more years at school do not increase the wage
- Issue of substantial attention in the literature

Returns to Schooling

- Wage equation: besides Si and Ei, additional explanatory variables like gender, regional, and racial dummies
- Model for analysis:
      wi = β1 + zi'γ + β2Si + β3Ei + β4Ei² + εi
  zi: observable variables besides Ei, Si
- zi is assumed to be exogenous, i.e., E{zi εi} = 0
- Si may be endogenous, i.e., E{Si εi} ≠ 0
  - Ability bias: unobservable factors like intelligence, family background, etc. enable more schooling and higher earnings
  - Measurement error in measuring schooling
  - Etc.
- With Si, also Ei = agei - Si - 6 and Ei² are endogenous
- OLS estimators may be inconsistent

Returns to Schooling: Data

- Verbeek's data set "schooling"
- National Longitudinal Survey of Young Men (Card, 1995)
- Data from 3010 males, survey 1976
- Individual characteristics, incl. experience, race, region, family background etc.
- Human capital function
      log(wagei) = β1 + β2 edi + β3 expi + β4 expi² + εi
  with edi: years of schooling (Si), expi: years of experience (Ei)
- Further explanatory variables: black: dummy for Afro-American, smsa: dummy for living in a metropolitan area, south: dummy for living in the south

OLS Estimation

OLS estimated wage function: output from GRETL

  Model 2: OLS, using observations 1-3010
  Dependent variable: l_WAGE76

               coefficient    std. error    t-ratio   p-value
    ------------------------------------------------------------
    const       4.73366       0.0676026      70.02    0.0000     ***
    ED76        0.0740090     0.00350544     21.11    2.28e-092  ***
    EXP76       0.0835958     0.00664779     12.57    2.22e-035  ***
    EXP762     -0.00224088    0.000317840    -7.050   2.21e-012  ***
    BLACK      -0.189632      0.0176266     -10.76    1.64e-026  ***
    SMSA76      0.161423      0.0155733      10.37    9.27e-025  ***
    SOUTH76    -0.124862      0.0151182      -8.259   2.18e-016  ***

    Mean dependent var    6.261832    S.D. dependent var    0.443798
    Sum squared resid     420.4760    S.E. of regression    0.374191
    R-squared             0.290505    Adjusted R-squared    0.289088
    F(6, 3003)            204.9318    P-value(F)            1.5e-219
    Log-likelihood       -1308.702    Akaike criterion      2631.403
    Schwarz criterion     2673.471    Hannan-Quinn          2646.532

Instruments for Si, Ei, Ei²

- Potential instrumental variables
  - Factors which affect schooling but are uncorrelated with the error terms, in particular with unobserved abilities that determine the wage
- For years of schooling (Si)
  - Costs of schooling, e.g., distance to school (lived near college), number of siblings
  - Parents' education
  - Quarter of birth
- For years of experience (Ei, Ei²): age is a natural candidate

Step 1 of IV Estimation

Model for schooling (ed76), gives predicted values ed76_h; output from GRETL

  Model 3: OLS, using observations 1-3010
  Dependent variable: ED76

               coefficient    std. error    t-ratio   p-value
    ------------------------------------------------------------
    const      -1.81870       4.28974        -0.4240  0.6716
    AGE76       1.05881       0.300843        3.519   0.0004     ***
    sq_AGE76   -0.0187266     0.00522162     -3.586   0.0003     ***
    BLACK      -1.46842       0.115245      -12.74    2.96e-036  ***
    SMSA76      0.841142      0.105841        7.947   2.67e-015  ***
    SOUTH76    -0.429925      0.102575       -4.191   2.85e-05   ***
    NEARC4A     0.441082      0.0966588       4.563   5.24e-06   ***

    Mean dependent var    13.26346    S.D. dependent var    2.676913
    Sum squared resid     18941.85    S.E. of regression    2.511502
    R-squared             0.121520    Adjusted R-squared    0.119765
    F(6, 3003)            69.23419    P-value(F)            5.49e-81
    Log-likelihood       -7039.353    Akaike criterion      14092.71
    Schwarz criterion     14134.77    Hannan-Quinn          14107.83

Step 2 of IV Estimation

Wage equation, estimated by IV with instruments age, age2, and nearc4a

  Model 4: OLS, using observations 1-3010
  Dependent variable: l_WAGE76

               coefficient    std. error    t-ratio   p-value
    ------------------------------------------------------------
    const       3.69771       0.435332        8.494   3.09e-017  ***
    ED76_h      0.164248      0.036887        4.453   8.79e-06   ***
    EXP76_h     0.044588      0.022502        1.981   0.0476     **
    EXP762_h   -0.000195      0.001152       -0.169   0.8655
    BLACK      -0.057333      0.056772       -1.010   0.3126
    SMSA76      0.079372      0.037116        2.138   0.0326     **
    SOUTH76    -0.083698      0.022985       -3.641   0.0003     ***

    Mean dependent var    6.261832    S.D. dependent var    0.443798
    Sum squared resid     446.8056    S.E. of regression    0.385728
    R-squared             0.246078    Adjusted R-squared    0.244572
    F(6, 3003)            163.3618    P-value(F)            4.4e-180
    Log-likelihood       -1516.471    Akaike criterion      3046.943
    Schwarz criterion     3089.011    Hannan-Quinn          3062.072

GRETL's TSLS Estimation

Wage equation, estimated by IV: output from GRETL

  Model 8: TSLS, using observations 1-3010
  Dependent variable: l_WAGE76
  Instrumented: ED76 EXP76 EXP762
  Instruments: const AGE76 sq_AGE76 BLACK SMSA76 SOUTH76 NEARC4A

               coefficient    std. error    t-ratio   p-value
    ------------------------------------------------------------
    const       3.69771       0.495136        7.468   8.14e-014  ***
    ED76        0.164248      0.0419547       3.915   9.04e-05   ***
    EXP76       0.0445878     0.0255932       1.742   0.0815     *
    EXP762     -0.00019526    0.0013110      -0.1489  0.8816
    BLACK      -0.0573333     0.0645713      -0.8879  0.3746
    SMSA76      0.0793715     0.0422150       1.880   0.0601     *
    SOUTH76    -0.0836975     0.0261426      -3.202   0.0014     ***

    Mean dependent var    6.261832    S.D. dependent var    0.443798
    Sum squared resid     577.9991    S.E. of regression    0.438718
    R-squared             0.195884    Adjusted R-squared    0.194277
    F(6, 3003)            126.2821    P-value(F)            8.9e-143

Returns to Schooling: Summary of Estimates

Estimated regression coefficients (first row) and t-statistics (second row)

               OLS        IV 1)      TSLS 1)    IV (M.V.)
    ed76       0.0740     0.1642     0.1642     0.1329
               21.11      4.45       3.92       2.59
    exp76      0.0836     0.0446     0.0446     0.0560
               12.57      1.98       1.74       2.15
    exp762    -0.0022    -0.0002    -0.0002    -0.0008
               -7.05     -0.17      -0.15      -0.59
    black     -0.1896    -0.0573    -0.0573    -0.1031
              -10.76     -1.01      -0.89      -1.33

  1) The model differs from that used by Verbeek

Some Comments

- Instrumental variables (age, age2, nearc4a)
  - are relevant, i.e., have explanatory power for ed76, exp76, exp762
  - Whether they are exogenous, i.e., uncorrelated with the error terms, is not answered
  - Test for exogeneity of regressors: Wu-Hausman test
- Estimates of the ed76-coefficient:
  - IV estimate (column IV (M.V.)): 0.13, i.e., 13% higher wage for one additional year of schooling; nearly double the OLS estimate (0.07); not in line with the "ability bias" argument!
  - s.e. of IV estimate (0.04) much higher than s.e.
    of OLS estimate (0.004)
  - Loss of efficiency especially in case of weak instruments: R² of the model for ed76: 0.12; Corr{ed76, ed76_h} = 0.35

From OLS to IV Estimation

- Linear model yi = xi'β + εi
- OLS estimator: solution b of the K normal equations
      (1/N) Σi (yi - xi'b) xi = 0
- Corresponding moment conditions
      E{εi xi} = E{(yi - xi'β) xi} = 0
- IV estimator given R instrumental variables zi which may overlap with xi: based on the R moment conditions
      E{εi zi} = E{(yi - xi'β) zi} = 0
- IV estimator: solution of the corresponding sample moment conditions
      (1/N) Σi (yi - xi'β̂IV) zi = 0

Number of Instruments

- Moment conditions
      E{εi zi} = E{(yi - xi'β) zi} = 0
  one equation for each component of zi
- zi possibly overlapping with xi
- General case: R moment conditions
- Substitution of expectations by sample averages gives R equations
      (1/N) Σi (yi - xi'β̂IV) zi = 0
1. R = K: one unique solution, the IV estimator; identified model
2. R < K: infinite number of solutions, not enough instruments for a unique solution; under-identified or not identified model

The GIV Estimator

3. R > K: more instruments than necessary for identification; over-identified model
- For R > K, in general, no unique solution of all R sample moment conditions can be obtained; instead:
  - the weighted quadratic form in the sample moments
        QN(β) = [(1/N) Σi (yi - xi'β) zi]' WN [(1/N) Σi (yi - xi'β) zi]
    with an RxR positive definite weighting matrix WN is minimized
  - this gives the generalized instrumental variable (GIV) estimator
        β̂IV = (X'Z WN Z'X)⁻¹ X'Z WN Z'y

The GIV Estimator, cont'd

- The weighting matrix WN
  - Different weighting matrices result in different consistent GIV estimators with different covariance matrices
  - For R = K, the matrix Z'X is square and invertible; the IV estimator is (Z'X)⁻¹Z'y for any WN
  - Optimal choice for WN?

GIV and TSLS Estimator

- Optimal weighting matrix: WNopt = [(1/N) Z'Z]⁻¹; corresponds to the most efficient IV estimator
      β̂IV = [X'Z(Z'Z)⁻¹Z'X]⁻¹ X'Z(Z'Z)⁻¹Z'y
- If the error terms are heteroskedastic or autocorrelated, the optimal weighting matrix has to be adapted
- Regression of each regressor, i.e., each column of X, on Z results in
      X̂ = Z(Z'Z)⁻¹Z'X and β̂IV = (X̂'X̂)⁻¹X̂'y
- This explains why the GIV estimator is also called the "two stage least squares" (TSLS) estimator:
  1. First step: regress each column of X on Z
  2. Second step: regress y on the predictions of X

GIV Estimator and Properties

- The GIV estimator is consistent
- The asymptotic distribution of the GIV estimator, given IID(0, σε²) error terms, leads to the approximate distribution
      β̂IV ~a N(β, V̂{β̂IV})
- The (asymptotic) covariance matrix of β̂IV is given by
      V{β̂IV} = σ² [X'Z(Z'Z)⁻¹Z'X]⁻¹
- In the estimated covariance matrix, σ² is substituted by, e.g.,
      σ̂² = (N - K)⁻¹ Σi (yi - xi'β̂IV)²
  the estimate based on the IV residuals

Some Tests

- For testing
  - Endogeneity of regressors: Wu-Hausman test or Durbin-Wu-Hausman test
  - Validity (exogeneity) of potential instrumental variables: over-identifying restrictions test or Sargan test
  - Weak instruments: Cragg-Donald test

Wu-Hausman Test

- For testing whether one or more regressors are endogenous (correlated with the error term)
- Based on the assumption that the instrumental variables are valid; i.e., given that E{εi zi} = 0, the null hypothesis E{εi xi} = 0 can be tested
- The idea of the test:
  - Under the null hypothesis, both the OLS and the IV estimator are consistent; they should differ by sampling errors only
  - Rejection of the null hypothesis indicates inconsistency of the OLS estimator

Wu-Hausman Test, cont'd

- Based on the (squared) difference between OLS and IV estimators
- Added variable interpretation of the Wu-Hausman test: checks whether the residuals vi from the reduced form equation of the potentially endogenous regressors contribute to explaining yi in
      yi = x1i'β1 + x2iβ2 + viγ + εi
  - vi: residuals from the reduced form equation for x2 (with predicted values x̂2: x2 = x̂2 + v)
  - H0: γ = 0; corresponds to: x2 is exogenous
- For testing H0: use of the
  - t-test, if γ has one component, i.e., x2 is just one regressor
  - F-test, if more than one regressor is tested for exogeneity

Wu-Hausman Test, cont'd

- Remarks
  - The test requires valid instruments
  - The test has little power if instruments are weak or invalid
  - The test can be used to test whether additional instruments are valid

Sargan Test

- For testing whether the instruments are valid
- The validity of the instruments zi requires that all moment conditions are fulfilled; for
  the R-vector zi, the R sums
      (1/N) Σi ei zi
  must be close to zero
- Test statistic
      ξ = N QN(β̂IV) = (Σi ei zi)' (σ̂² Σi zi zi')⁻¹ (Σi ei zi)
  has under the null hypothesis an asymptotic Chi-squared distribution with R-K df
- Calculation of ξ: ξ = N Re² using Re² from the auxiliary regression of the IV residuals ei = yi - xi'β̂IV on the instruments zi

Sargan Test, cont'd

- Remarks
  - Only R-K of the R moment conditions are "free"; in case of an identified model (R = K), all R sample moment conditions are fulfilled exactly
  - The test is also called over-identifying restrictions test
  - Rejection implies: the joint validity of all moment conditions and hence of all instruments is not acceptable
  - The Sargan test gives no indication of which instruments are invalid
- Test whether a subset of R-R1 instruments is valid; R1 (>K) instruments are out of doubt:
  - Calculate ξ for all R moment conditions
  - Calculate ξ1 for the R1 moment conditions
  - Under H0, ξ - ξ1 has a Chi-squared distribution with R-R1 df

Cragg-Donald Test

- Weak (only marginally valid) instruments:
  - Biased estimates
  - Inconsistent IV estimates
  - Inappropriate large-sample approximations to the finite-sample distributions even for large N
- Definition of weak instruments: estimates are biased to an extent that is unacceptably large
- Null hypothesis: instruments are weak, i.e., can lead to an asymptotic relative bias greater than some value b

Your Homework

1. Use the data set "schooling" of Verbeek for the following analyses based on the wage equation
       log(wage76) = b1 + b2 ed76 + b3 exp76 + b4 exp762
                     + b5 black + b6 smsa76 + b7 south76 + b8 nearc4 + e
   a. Estimate the reduced form for ed76, including daded and momed, (i) with and (ii) without nearc4; assess the validity of the potential instruments; what do the correlation coefficients indicate?
   b. Estimate the wage equation, using the instruments age, age2, daded, and momed, (i) with and (ii) without nearc4; interpret the results including the test for validity and the Sargan test.
   c. Compare the estimates for b2 (i) from the model in b., (ii) from the model with instruments age, age2, and nearc4, (iii) from the GRETL Instrumental variables (Two-Stage Least Squares…) procedure, and (iv) with the OLS estimates.

Your Homework, cont'd

2. For the model for consumption and income (slide "Consumption Function" ff.):
   a. Show that both Ct and Yt are endogenous:
          E{Ct εt} = E{Yt εt} = σε²(1 - β2)⁻¹
   b. Derive the reduced form of the model
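
Illustration: simulating the consumption model

For the consumption model of exercise 2, the inconsistency of OLS and the consistency of the IV estimator can be checked numerically. The following Python sketch is not part of the lecture: the parameter values (β1 = 0.5, β2 = 0.6, σε = 1, normally distributed It with mean 3 and standard deviation 2), the sample size, the seed, and all variable names are assumptions chosen only for illustration. It generates data from equations (A) and (B), uses It as instrument for Yt, and compares the OLS slope with the probability limit β2 + (1 - β2) σε²(V{It} + σε²)⁻¹ stated in the lecture.

```python
import numpy as np

# Assumed parameter values, for illustration only
rng = np.random.default_rng(0)
N = 200_000
beta1, beta2 = 0.5, 0.6          # behavioral parameters of equation (A)
sigma_eps, sigma_I = 1.0, 2.0    # std. dev. of error term and of investment

I = rng.normal(3.0, sigma_I, N)      # exogenous investment I_t
eps = rng.normal(0.0, sigma_eps, N)  # error term of (A), independent of I_t

# Solve (A) C = beta1 + beta2*Y + eps and (B) Y = C + I for Y, then get C
Y = (beta1 + I + eps) / (1.0 - beta2)
C = beta1 + beta2 * Y + eps

X = np.column_stack([np.ones(N), Y])  # regressors: constant and endogenous Y
Z = np.column_stack([np.ones(N), I])  # instruments: constant and I (R = K = 2)

# OLS estimator b = (X'X)^-1 X'y
b_ols = np.linalg.solve(X.T @ X, X.T @ C)

# IV estimator (Z'X)^-1 Z'y
b_iv = np.linalg.solve(Z.T @ X, Z.T @ C)

# Two-step calculation: fitted values of X from Z, then OLS of C on them
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
b_2sls = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ C)

# Probability limit of the OLS slope according to the lecture
plim_ols = beta2 + (1.0 - beta2) * sigma_eps**2 / (sigma_I**2 + sigma_eps**2)

print("true beta2      :", beta2)
print("OLS slope       :", b_ols[1])
print("plim of OLS     :", plim_ols)
print("IV slope        :", b_iv[1])
print("two-step slope  :", b_2sls[1])
```

Under these assumptions, the OLS slope should settle near its probability limit above β2, while the IV slope should be close to β2. Because R = K here, the two-step calculation reproduces (Z'X)⁻¹Z'y up to rounding error.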