Econometrics - Lecture 5
Endogeneity, Instrumental Variables, IV Estimator

Hackl, Econometrics, Lecture 5, Nov 27, 2015

Contents
- OLS Estimator Revisited
- Cases of Regressors Correlated with Error Term
- Instrumental Variables (IV) Estimator: The Concept
- IV Estimator: The Method
- Calculation of the IV Estimator
- An Example
- The GIV Estimator
- Some Tests

OLS Estimator

Linear model for $y_i$:
  $y_i = x_i'\beta + \varepsilon_i$, $i = 1, \dots, N$ (or $y = X\beta + \varepsilon$)
given observations $x_{ik}$, $k = 1, \dots, K$, of the regressor variables, and error term $\varepsilon_i$

OLS estimator:
  $b = (\sum_i x_i x_i')^{-1} \sum_i x_i y_i = (X'X)^{-1}X'y$

From
  $b = (\sum_i x_i x_i')^{-1} \sum_i x_i y_i = (\sum_i x_i x_i')^{-1} \sum_i x_i x_i'\beta + (\sum_i x_i x_i')^{-1} \sum_i x_i \varepsilon_i = \beta + (\sum_i x_i x_i')^{-1} \sum_i x_i \varepsilon_i = \beta + (X'X)^{-1}X'\varepsilon$
follows
  $E\{b\} = \beta + E\{(X'X)^{-1}X'\varepsilon\}$
which, for given (non-stochastic) $X$, equals
  $\beta + (\sum_i x_i x_i')^{-1} E\{\sum_i x_i \varepsilon_i\} = \beta + (X'X)^{-1} E\{X'\varepsilon\}$

OLS Estimator, cont'd

1. OLS estimator $b$ is unbiased if
- (A1) $E\{\varepsilon\} = 0$
- $E\{\sum_i x_i \varepsilon_i\} = E\{X'\varepsilon\} = 0$; this is fulfilled if (A7) or a stronger assumption is true:
  - (A2) $\{x_i, i = 1, \dots, N\}$ and $\{\varepsilon_i, i = 1, \dots, N\}$ are independent; this is the strongest assumption
  - (A10) $E\{\varepsilon|X\} = 0$, i.e., $X$ is uninformative about $E\{\varepsilon_i\}$ for all $i$ ($\varepsilon$ is conditional mean independent of $X$); implied by (A2)
  - (A8) $x_i$ and $\varepsilon_i$ are independent for all $i$ (no contemporaneous dependence); less strong than (A2) and (A10)
  - (A7) $E\{x_i \varepsilon_i\} = 0$ for all $i$ (no contemporaneous correlation); even less strong than (A8)
- Note: with stochastic regressors, exact unbiasedness requires (A10) or (A2); (A7) and (A8) alone ensure consistency

OLS Estimator, cont'd

2. OLS estimator $b$ is consistent for $\beta$ if
- (A8) $x_i$ and $\varepsilon_i$ are independent for all $i$
- (A6) $(1/N)\sum_i x_i x_i'$ has as limit ($N \to \infty$) a nonsingular matrix $\Sigma_{xx}$
(A8) can be replaced by (A7) [$E\{x_i \varepsilon_i\} = 0$ for all $i$, no contemporaneous correlation]

3. OLS estimator $b$ is asymptotically normally distributed if (A6), (A8) and
- (A11) $\varepsilon_i \sim IID(0, \sigma^2)$
are true;
- for large $N$, $b$ follows approximately the normal distribution
  $b \overset{a}{\sim} N\{\beta, \sigma^2 (\sum_i x_i x_i')^{-1}\}$
- Use White and
Newey-West estimators for $V\{b\}$ in case of heteroskedasticity and autocorrelation of the error terms, respectively

Assumption (A7): $E\{x_i \varepsilon_i\} = 0$ for all $i$
- Implication of (A7): for all $i$, each of the regressors is uncorrelated with the current error term (no contemporaneous correlation)
- The stronger assumptions (A2), (A10), (A8) have the same consequences
- (A7) guarantees consistency of the OLS estimator; exact unbiasedness with stochastic regressors requires (A10) or a stronger assumption
- In reality, (A7) is not always true: alternative estimation procedures are then required to ensure consistency
- Examples of situations with $E\{x_i \varepsilon_i\} \neq 0$:
  - Regressors with measurement errors
  - Regression on the lagged dependent variable with autocorrelated error terms (dynamic regression)
  - Unobserved heterogeneity
  - Endogeneity of regressors, simultaneity

Regressor with Measurement Error

  $y_i = \beta_1 + \beta_2 w_i + v_i$
- with white noise $v_i$, $V\{v_i\} = \sigma_v^2$, and $E\{v_i|w_i\} = 0$; conditional expectation of $y_i$ given $w_i$: $E\{y_i|w_i\} = \beta_1 + \beta_2 w_i$
- Example: $w_i$: household income, $y_i$: household savings
- Measurement process: the reported household income $x_i$ may deviate from the household income $w_i$:
  $x_i = w_i + u_i$
  where $u_i$ is (i) white noise with $V\{u_i\} = \sigma_u^2$, (ii) independent of $v_i$, and (iii) independent of $w_i$
- The model to be analyzed is
  $y_i = \beta_1 + \beta_2 x_i + \varepsilon_i$ with $\varepsilon_i = v_i - \beta_2 u_i$
- $E\{x_i \varepsilon_i\} = -\beta_2 \sigma_u^2 \neq 0$: the requirement for consistency and unbiasedness is violated
- $x_i$ and $\varepsilon_i$ are negatively (positively) correlated if $\beta_2 > 0$ ($\beta_2 < 0$)

Consequences of Measurement Errors
- Inconsistency of $b_2$:
  $plim\, b_2 = \beta_2 + E\{x_i \varepsilon_i\} / V\{x_i\} = \beta_2 - \beta_2 \sigma_u^2 / (\sigma_w^2 + \sigma_u^2) = \beta_2 (1 - \sigma_u^2 / (\sigma_w^2 + \sigma_u^2))$
  with $\sigma_w^2 = V\{w_i\}$, so that $V\{x_i\} = \sigma_w^2 + \sigma_u^2$; for $\beta_2 > 0$, $\beta_2$ is underestimated (the estimate is biased towards zero)
- Inconsistency of
$b_1$:
  $plim\,(b_1 - \beta_1) = -plim\,(b_2 - \beta_2)\, E\{x_i\}$
  given $E\{x_i\} > 0$ for the reported income (and $\beta_2 > 0$): $\beta_1$ is overestimated; the inconsistency "carries over"
- The model does not correspond to the conditional expectation of $y_i$ given $x_i$:
  $E\{y_i|x_i\} = \beta_1 + \beta_2 x_i - \beta_2 E\{u_i|x_i\} \neq \beta_1 + \beta_2 x_i$
  as $E\{u_i|x_i\} \neq 0$

Dynamic Regression
- Allows modelling dynamic effects of changes of $x$ on $y$:
  $y_t = \beta_1 + \beta_2 x_t + \beta_3 y_{t-1} + \varepsilon_t$
- OLS estimators are consistent if $E\{x_t \varepsilon_t\} = 0$ and $E\{y_{t-1} \varepsilon_t\} = 0$
- AR(1) model for $\varepsilon_t$:
  $\varepsilon_t = \rho \varepsilon_{t-1} + v_t$
  with $v_t$ white noise with variance $\sigma_v^2$
- From $y_t = \beta_1 + \beta_2 x_t + \beta_3 y_{t-1} + \rho \varepsilon_{t-1} + v_t$ follows (with $x_{t-1}$ uncorrelated with $\varepsilon_t$, and using $E\{\varepsilon_{t-1} \varepsilon_t\} = \rho \sigma_v^2 (1 - \rho^2)^{-1}$)
  $E\{y_{t-1} \varepsilon_t\} = \beta_3 E\{y_{t-2} \varepsilon_t\} + \rho \sigma_v^2 (1 - \rho^2)^{-1}$
  i.e., for $\rho \neq 0$, $y_{t-1}$ is correlated with $\varepsilon_t$
- OLS estimators are not consistent
- The model does not correspond to the conditional expectation of $y_t$ given the regressors $x_t$ and $y_{t-1}$:
  $E\{y_t|x_t, y_{t-1}\} = \beta_1 + \beta_2 x_t + \beta_3 y_{t-1} + E\{\varepsilon_t|x_t, y_{t-1}\}$

Omission of Relevant Regressors
- Two models:
  $y_i = x_i'\beta + z_i'\gamma + \varepsilon_i$   (A)
  $y_i = x_i'\beta + v_i$   (B)
- True model (A), fitted model (B)
- OLS estimates $b_B$ of $\beta$ from (B):
  $b_B = (\sum_i x_i x_i')^{-1} \sum_i x_i y_i = \beta + (\sum_i x_i x_i')^{-1} \sum_i x_i z_i'\gamma + (\sum_i x_i x_i')^{-1} \sum_i x_i \varepsilon_i$
- Omitted variable bias: $E\{(\sum_i x_i x_i')^{-1} \sum_i x_i z_i'\}\gamma = E\{(X'X)^{-1} X'Z\}\gamma$
- No bias if (a) $\gamma = 0$, i.e., model (B) is correct, or if (b) the variables in $x_i$ and $z_i$ are uncorrelated (orthogonal)
- OLS estimators are biased if relevant regressors are omitted that are non-orthogonal to, i.e., correlated with, the regressors in $x_i$

Unobserved Heterogeneity
- Example: wage equation with $y_i$: log wage, $x_{1i}$: personal characteristics, $x_{2i}$: years of schooling, $u_i$: abilities (unobservable)
  $y_i = x_{1i}'\beta_1 + x_{2i}\beta_2 + u_i\gamma + v_i$
- Model for analysis (unobserved $u_i$ covered in the error term):
  $y_i = x_i'\beta + \varepsilon_i$
  with $x_i = (x_{1i}', x_{2i})'$, $\beta = (\beta_1', \beta_2)'$, $\varepsilon_i = u_i\gamma + v_i$
- Given $E\{x_i v_i\} = 0$:
  $plim\, b = \beta + \Sigma_{xx}^{-1} E\{x_i u_i\}\gamma$
- OLS estimators $b$ are inconsistent if $x_i$ and $u_i$ are correlated and $\gamma \neq 0$, e.g., if higher abilities induce more years at school: $\beta_2$ might be overestimated,
hence effects of years at school etc. are overestimated: "ability bias"
- Unobserved heterogeneity: observational units differ in aspects other than the observable ones

Endogenous Regressors
- Regressors in $X$ which are correlated with the error term, $E\{X'\varepsilon\} \neq 0$, are called endogenous
- Endogeneity bias
- Relevant for many economic applications
- OLS estimators $b = \beta + (X'X)^{-1}X'\varepsilon$
  - $E\{b\} \neq \beta$, $b$ is biased; the bias $E\{(X'X)^{-1}X'\varepsilon\}$ is difficult to assess
  - $plim\, b = \beta + \Sigma_{xx}^{-1} q$ with $q = plim\,(N^{-1}X'\varepsilon)$
- For $q = 0$ (regressors and error term asymptotically uncorrelated), the OLS estimators $b$ are consistent also in case of endogenous regressors
- For $q \neq 0$ (error term and at least one regressor asymptotically correlated): $plim\, b \neq \beta$, the OLS estimators $b$ are not consistent
- Exogenous regressors: regressors that are uncorrelated with the error term, i.e., all non-endogenous regressors

Consumption Function
- AWM data base, 1970:1-2003:4
- $C$: private consumption (PCR), growth rate p.y.
- $Y$: disposable income of households (PYR), growth rate p.y.
- $C_t = \beta_1 + \beta_2 Y_t + \varepsilon_t$   (A)
  $\beta_2$: marginal propensity to consume, $0 < \beta_2 < 1$
- OLS estimates:
  $\hat C_t = 0.011 + 0.718\, Y_t$
  with $t = 15.55$, $R^2 = 0.65$, $DW = 0.50$
- $I_t$: per capita investment (exogenous, $E\{I_t \varepsilon_t\} = 0$)
  $Y_t = C_t + I_t$   (B)
- Both $Y_t$ and $C_t$ are endogenous: $E\{C_t \varepsilon_t\} = E\{Y_t \varepsilon_t\} = \sigma_\varepsilon^2 (1 - \beta_2)^{-1}$
- The regressor $Y_t$ has an impact on $C_t$; at the same time $C_t$ has an impact on $Y_t$

Simultaneous Equation Models
- Illustrated by the preceding consumption function:
  - Variables $Y_t$ and $C_t$ are simultaneously determined by equations (A) and (B)
  - Equations (A) and (B) are the structural equations or the structural form of the simultaneous equation model that describes both $Y_t$ and $C_t$
  - The coefficients $\beta_1$ and $\beta_2$ are behavioral parameters
- Reduced form of the model: one equation for each of the endogenous variables $C_t$ and $Y_t$, with only the exogenous variable $I_t$ as regressor
- The OLS estimators of the coefficients in (A) are biased and inconsistent

Consumption Function, cont'd
- Reduced form of the model (obtained by solving (A) and (B) for $C_t$ and $Y_t$):
  $C_t = \frac{\beta_1}{1 - \beta_2} + \frac{\beta_2}{1 - \beta_2} I_t + \frac{1}{1 - \beta_2} \varepsilon_t$
  $Y_t = \frac{\beta_1}{1 - \beta_2} + \frac{1}{1 - \beta_2} I_t + \frac{1}{1 - \beta_2} \varepsilon_t$
- The OLS estimator $b_2$ from (A) is inconsistent; $E\{Y_t \varepsilon_t\} \neq 0$
  $plim\, b_2 = \beta_2 + Cov\{Y_t, \varepsilon_t\} / V\{Y_t\} = \beta_2 + (1 - \beta_2) \sigma_\varepsilon^2 (V\{I_t\} + \sigma_\varepsilon^2)^{-1}$
  for $0 < \beta_2 < 1$, $b_2$ overestimates $\beta_2$
- The OLS estimator $b_1$ is also inconsistent

An Alternative Estimator
- Model
  $y_i = \beta_1 + \beta_2 x_i + \varepsilon_i$
  with $E\{\varepsilon_i x_i\} \neq 0$, i.e., endogenous regressor $x_i$: the OLS estimators are biased and inconsistent
- Instrumental variable $z_i$ satisfying
  1. Exogeneity: $E\{\varepsilon_i z_i\} = 0$: $z_i$ is uncorrelated with the error term
  2. Relevance: $Cov\{x_i, z_i\} \neq 0$: $z_i$ is correlated with the endogenous regressor
- Transformation of the model equation:
  $Cov\{y_i, z_i\} = \beta_2$
  $Cov\{x_i, z_i\} + Cov\{\varepsilon_i, z_i\}$
  gives, as $Cov\{\varepsilon_i, z_i\} = 0$,
  $\beta_2 = Cov\{y_i, z_i\} / Cov\{x_i, z_i\}$

IV Estimator for $\beta_2$
- Substitution of sample moments for the covariances gives the instrumental variables (IV) estimator
  $\hat\beta_{2,IV} = \frac{\sum_i (z_i - \bar z)(y_i - \bar y)}{\sum_i (z_i - \bar z)(x_i - \bar x)}$
- Consistent estimator for $\beta_2$ given that the instrumental variable $z_i$ is valid, i.e., it is
  - Exogenous, i.e., $E\{\varepsilon_i z_i\} = 0$
  - Relevant, i.e., $Cov\{x_i, z_i\} \neq 0$
- Typically, nothing can be said about the bias of an IV estimator; small sample properties are unknown
- Coincides with the OLS estimator for $z_i = x_i$

Consumption Function, cont'd
- Alternative model: $C_t = \beta_1 + \beta_2 Y_{t-1} + \varepsilon_t$
- $Y_{t-1}$ and $\varepsilon_t$ are certainly uncorrelated; this avoids the risk of inconsistency due to correlated $Y_t$ and $\varepsilon_t$
- $Y_{t-1}$ is certainly highly correlated with $Y_t$ and is almost as good a regressor as $Y_t$
- Fitted model:
  $\hat C = 0.012 + 0.660\, Y_{-1}$
  with $t = 12.86$, $R^2 = 0.56$, $DW = 0.79$ (instead of $\hat C = 0.011 + 0.718\, Y$ with $t = 15.55$, $R^2 = 0.65$, $DW = 0.50$)
- The deterioration of the t-statistic and of $R^2$ is the price for the improvement of the estimator

IV Estimator: The Concept
- Alternative to the OLS estimator
- Avoids inconsistency in case of endogenous regressors
- Idea of the IV estimator:
  - Replace regressors which are correlated with the error terms by regressors which are
    - uncorrelated with the error terms
    - (highly) correlated with the regressors that are to be replaced
  - and use OLS estimation
- The hope is that the IV estimator is consistent (and less biased than the OLS estimator)
- Price: deteriorated model fit as measured by, e.g., t-statistic, $R^2$

IV Estimator: General Case
- The model is
  $y_i = x_i'\beta + \varepsilon_i$
  with $V\{\varepsilon_i\} = \sigma_\varepsilon^2$ and
  $E\{\varepsilon_i x_i\} \neq 0$
- At least one component of $x_i$ is correlated with the error term
- The vector of instruments $z_i$ (with the same dimension as $x_i$) fulfils
  $E\{\varepsilon_i z_i\} = 0$
  $Cov\{x_i, z_i\} \neq 0$
- IV estimator based on the instruments $z_i$:
  $\hat\beta_{IV} = (\sum_i z_i x_i')^{-1} \sum_i z_i y_i$

IV Estimator: General Case, cont'd
- The (asymptotic) covariance matrix of the IV estimator is given by
  $V\{\hat\beta_{IV}\} = \sigma^2 [(\sum_i x_i z_i')(\sum_i z_i z_i')^{-1}(\sum_i z_i x_i')]^{-1}$
- In the estimated covariance matrix $\hat V\{\hat\beta_{IV}\}$, $\sigma^2$ is substituted by
  $\hat\sigma^2 = \frac{1}{N-K} \sum_i (y_i - x_i'\hat\beta_{IV})^2$
  which is based on the IV residuals
- The asymptotic distribution of IV estimators, given $IID(0, \sigma_\varepsilon^2)$ error terms, leads to the approximate distribution
  $\hat\beta_{IV} \overset{a}{\sim} N(\beta, \hat V\{\hat\beta_{IV}\})$
  with the estimated covariance matrix $\hat V\{\hat\beta_{IV}\}$

Derivation of the IV Estimator
- The model is
  $y_i = x_i'\beta + \varepsilon_i = x_{0i}'\beta_0 + \beta_K x_{Ki} + \varepsilon_i$
  with $x_{0i} = (x_{1i}, \dots, x_{K-1,i})'$ containing the first $K-1$ components of $x_i$, and $E\{\varepsilon_i x_{0i}\} = 0$
- The K-th component is endogenous: $E\{\varepsilon_i x_{Ki}\} \neq 0$
- The instrumental variable $z_{Ki}$ fulfils
  $E\{\varepsilon_i z_{Ki}\} = 0$
- Moment conditions: $K$ conditions to be satisfied by the coefficients, the K-th condition with $z_{Ki}$ instead of $x_{Ki}$:
  $E\{\varepsilon_i x_{0i}\} = E\{(y_i - x_{0i}'\beta_0 - \beta_K x_{Ki})\, x_{0i}\} = 0$   ($K-1$ conditions)
  $E\{\varepsilon_i z_{Ki}\} = E\{(y_i - x_{0i}'\beta_0 - \beta_K x_{Ki})\, z_{Ki}\} = 0$
- The number of conditions, and of corresponding linear equations, equals the number of coefficients to be estimated

Derivation of the IV Estimator, cont'd
- The system of linear equations for the $K$ coefficients $\beta$ can be uniquely solved for $\beta$: the coefficients $\beta$ are said "to be identified"
- To derive the IV estimators from the moment conditions, the expectations are replaced by sample averages:
  $\frac{1}{N} \sum_i (y_i - x_{0i}'\hat\beta_0 - \hat\beta_K x_{Ki})\, x_{0i} = 0$
  $\frac{1}{N} \sum_i (y_i - x_{0i}'\hat\beta_0 - \hat\beta_K x_{Ki})\, z_{Ki} = 0$
- The solution of the linear equation system, with $z_i' = (x_{0i}', z_{Ki})$, is
  $\hat\beta_{IV} = (\sum_i z_i x_i')^{-1} \sum_i z_i y_i$
- Identification requires that the $K \times K$ matrix $\sum_i z_i x_i'$ is finite and invertible; the instrument $z_{Ki}$ is relevant when this is fulfilled

Calculation of the IV Estimator
- The model in matrix notation:
  $y = X\beta + \varepsilon$
- The IV estimator:
  $\hat\beta_{IV} = (Z'X)^{-1} Z'y$
  with $z_i$ obtained from $x_i$ by substituting instrumental variable(s) for all endogenous regressors
- Calculation in two steps:
  1. Reduced form: regression of the explanatory variables $x_1, \dots, x_K$, including the endogenous ones, on the columns of $Z$: fitted values $\hat X = Z(Z'Z)^{-1}Z'X$
  2. Regression of $y$ on the fitted explanatory variables: $\hat\beta_{IV} = (\hat X'\hat X)^{-1} \hat X'y$

Calculation of the IV Estimator, cont'd
- Remarks:
  - The $K \times K$ matrix $Z'X = \sum_i z_i x_i'$ is required to be finite and invertible
  - From
    $(\hat X'\hat X)^{-1} \hat X'y = (X'Z(Z'Z)^{-1}Z'X)^{-1} X'Z(Z'Z)^{-1}Z'y = (Z'X)^{-1}Z'y$
    it is obvious that the estimator obtained in the second step is the IV estimator
  - However, the estimator obtained in the second step is more general; see below
- In GRETL: the sequence "Model > Instrumental variables > Two-Stage Least Squares…" leads to the specification window with boxes (i) for the independent variables and (ii) for the instruments

Choice of Instrumental Variables
- Instrumental variables are required to be
  - exogenous, i.e., uncorrelated with the error terms
  - relevant, i.e., correlated with the endogenous regressors
- Instruments
  - must be based on subject matter arguments, e.g., arguments from economic theory
  - should be explained and motivated
  - must show a significant effect in explaining an endogenous regressor
- The choice of instruments is often not easy
- Regression of endogenous variables on instruments
  - Best linear approximation of the endogenous variables
  - Economic interpretation is not of importance and interest

Example: Returns to Schooling
- Human capital earnings function:
  $w_i = \beta_1 + \beta_2 S_i + \beta_3 E_i + \beta_4 E_i^2 + \varepsilon_i$
  with $w_i$: log of individual earnings, $S_i$: years of schooling, $E_i$: years of experience ($E_i = age_i - S_i - 6$)
- Empirically, more education implies higher income
- Question: is this effect causal?
  - If yes, one more year at school increases the wage by $\beta_2$ (Theory A)
  - Alternatively, personal abilities of an individual cause higher income and also more years at school; more years at school do not increase the wage (Theory B)
- Issue of substantial attention in the literature

Returns to Schooling
- Wage equation: besides $S_i$ and $E_i$, additional explanatory variables like gender, regional, and racial dummies
- Model for analysis:
  $w_i = \beta_1 + z_i'\gamma + \beta_2 S_i + \beta_3 E_i + \beta_4 E_i^2 + \varepsilon_i$
  $z_i$: observable variables besides $E_i$, $S_i$
- $z_i$ is assumed to be exogenous, i.e., $E\{z_i \varepsilon_i\} = 0$
- $S_i$ may be endogenous, i.e., $E\{S_i \varepsilon_i\} \neq 0$
  - Ability bias: unobservable factors like intelligence, family background, etc. enable more schooling and higher earnings
  - Measurement error in measuring schooling
  - Etc.
- With $S_i$, also $E_i = age_i - S_i - 6$ and $E_i^2$ are endogenous
- OLS estimators may be inconsistent

Returns to Schooling: Data
- Verbeek's data set "schooling"
- National Longitudinal Survey of Young Men (Card, 1995)
- Data from 3010 males, survey 1976
- Individual characteristics, incl. experience, race, region, family background etc.
- Human capital function
  $\log(wage_i) = \beta_1 + \beta_2\, ed_i + \beta_3\, exp_i + \beta_4\, exp_i^2 + \varepsilon_i$
  with $ed_i$: years of schooling ($S_i$), $exp_i$: years of experience ($E_i$)
- Variables: wage76 (wage in 1976, raw, cents p.h.), ed76 (years at school in 1976), exp76 (experience in 1976), exp762 (exp76 squared)
- Further explanatory variables: black: dummy for African-American, smsa: dummy for living in a metropolitan area, south: dummy for living in the south

OLS Estimation

OLS estimated wage function: output from GRETL

  Model 2: OLS, using observations 1-3010
  Dependent variable: l_WAGE76

             coefficient    std. error    t-ratio   p-value
  ------------------------------------------------------------
  const       4.73366       0.0676026     70.02     0.0000     ***
  ED76        0.0740090     0.00350544    21.11     2.28e-092  ***
  EXP76       0.0835958     0.00664779    12.57     2.22e-035  ***
  EXP762     -0.00224088    0.000317840   -7.050    2.21e-012  ***
  BLACK      -0.189632      0.0176266    -10.76     1.64e-026  ***
  SMSA76      0.161423      0.0155733     10.37     9.27e-025  ***
  SOUTH76    -0.124862      0.0151182     -8.259    2.18e-016  ***

  Mean dependent var   6.261832   S.D. dependent var   0.443798
  Sum squared resid    420.4760   S.E. of regression   0.374191
  R-squared            0.290505   Adjusted R-squared   0.289088
  F(6, 3003)           204.9318   P-value(F)           1.5e-219
  Log-likelihood      -1308.702   Akaike criterion     2631.403
  Schwarz criterion    2673.471   Hannan-Quinn         2646.532

Instruments for $S_i$, $E_i$, $E_i^2$
- Potential instrumental variables
  - Factors which affect schooling but are uncorrelated with the error terms, in particular with unobserved abilities that determine the wage
- For years of schooling ($S_i$)
  - Costs of schooling, e.g., distance to school (lived near college), number of siblings
  - Parents' education
  - Quarter of birth
- For years of experience ($E_i$, $E_i^2$): age is a natural candidate

Step 1 of IV Estimation

Reduced form for schooling (ed76), gives predicted values ed76_h

  Model 3: OLS, using observations 1-3010
  Dependent variable: ED76

             coefficient    std. error    t-ratio   p-value
  ------------------------------------------------------------
  const      -1.81870       4.28974       -0.4240   0.6716
  AGE76       1.05881       0.300843       3.519    0.0004     ***
  sq_AGE76   -0.0187266     0.00522162    -3.586    0.0003     ***
  BLACK      -1.46842       0.115245     -12.74     2.96e-036  ***
  SMSA76      0.841142      0.105841       7.947    2.67e-015  ***
  SOUTH76    -0.429925      0.102575      -4.191    2.85e-05   ***
  NEARC4A     0.441082      0.0966588      4.563    5.24e-06   ***

  Mean dependent var   13.26346   S.D. dependent var   2.676913
  Sum squared resid    18941.85   S.E. of regression   2.511502
  R-squared            0.121520   Adjusted R-squared   0.119765
  F(6, 3003)           69.23419   P-value(F)           5.49e-81
  Log-likelihood      -7039.353   Akaike criterion     14092.71
  Schwarz criterion    14134.77   Hannan-Quinn         14107.83

Step 2 of IV Estimation

Wage equation, estimated by IV with instruments age, age², and nearc4a

  Model 4: OLS, using observations 1-3010
  Dependent variable: l_WAGE76

             coefficient    std. error    t-ratio   p-value
  ------------------------------------------------------------
  const       3.69771       0.435332       8.494    3.09e-017  ***
  ED76_h      0.164248      0.036887       4.453    8.79e-06   ***
  EXP76_h     0.044588      0.022502       1.981    0.0476     **
  EXP762_h   -0.000195      0.001152      -0.169    0.8655
  BLACK      -0.057333      0.056772      -1.010    0.3126
  SMSA76      0.079372      0.037116       2.138    0.0326     **
  SOUTH76    -0.083698      0.022985      -3.641    0.0003     ***

  Mean dependent var   6.261832   S.D. dependent var   0.443798
  Sum squared resid    446.8056   S.E. of regression   0.385728
  R-squared            0.246078   Adjusted R-squared   0.244572
  F(6, 3003)           163.3618   P-value(F)           4.4e-180
  Log-likelihood      -1516.471   Akaike criterion     3046.943
  Schwarz criterion    3089.011   Hannan-Quinn         3062.072

GRETL's TSLS Estimation

Wage equation, estimated by IV

  Model 8: TSLS, using observations 1-3010
  Dependent variable: l_WAGE76
  Instrumented: ED76 EXP76 EXP762
  Instruments: const AGE76 sq_AGE76 BLACK SMSA76 SOUTH76 NEARC4A

             coefficient    std. error    t-ratio   p-value
  ------------------------------------------------------------
  const       3.69771       0.495136       7.468    8.14e-014  ***
  ED76        0.164248      0.0419547      3.915    9.04e-05   ***
  EXP76       0.0445878     0.0255932      1.742    0.0815     *
  EXP762     -0.00019526    0.0013110     -0.1489   0.8816
  BLACK      -0.0573333     0.0645713     -0.8879   0.3746
  SMSA76      0.0793715     0.0422150      1.880    0.0601     *
  SOUTH76    -0.0836975     0.0261426     -3.202    0.0014     ***

  Mean dependent var   6.261832   S.D. dependent var   0.443798
  Sum squared resid    577.9991   S.E. of regression   0.438718
  R-squared            0.195884   Adjusted R-squared   0.194277
  F(6, 3003)           126.2821   P-value(F)           8.9e-143

Returns to Schooling: Summary of Estimates

Estimated regression coefficients (first row) and t-statistics (second row)

              OLS        IV 1)      TSLS 1)    IV (M.V.)
  ed76        0.0740     0.1642     0.1642     0.1329
              21.11      4.45       3.92       2.59
  exp76       0.0836     0.0446     0.0446     0.0560
              12.57      1.98       1.74       2.15
  exp762     -0.0022    -0.0002    -0.0002    -0.0008
             -7.05      -0.17      -0.15      -0.59
  black      -0.1896    -0.0573    -0.0573    -0.1031
             -10.76     -1.01      -0.89      -1.33

  1) The model differs from that used by Verbeek

Some Comments
- Instrumental variables (age, age², nearc4a)
  - are relevant, i.e., have explanatory power for ed76, exp76, exp762
  - Whether they are exogenous, i.e., uncorrelated with the error terms, is not answered
  - Test for exogeneity of regressors: Wu-Hausman test
- Estimates of the ed76 coefficient:
  - IV estimate: 0.13, i.e., approximately 13% higher wage for one additional year of schooling; nearly double the OLS estimate (0.07); not in line with the "ability bias" argument!
  - s.e. of the IV estimate (0.04) is much higher than the s.e. of the OLS estimate (0.004)
  - Loss of efficiency especially in case of weak instruments: $R^2$ of the model for ed76: 0.12; Corr{ed76, ed76_h} = 0.35

From OLS to IV Estimation
- Linear model $y_i = x_i'\beta + \varepsilon_i$
- OLS estimator: solution of the $K$ normal equations
  $\frac{1}{N} \sum_i (y_i - x_i'b)\, x_i = 0$
- Corresponding moment conditions
  $E\{\varepsilon_i x_i\} = E\{(y_i - x_i'\beta)\, x_i\} = 0$
- IV estimator given $R$ instrumental variables $z_i$ which may overlap with $x_i$: based on the $R$ moment conditions
  $E\{\varepsilon_i z_i\} = E\{(y_i - x_i'\beta)\, z_i\} = 0$
- IV estimator: solution of the corresponding sample moment conditions
  $\frac{1}{N} \sum_i (y_i - x_i'\hat\beta_{IV})\, z_i = 0$

Number of Instruments
- Moment conditions
  $E\{\varepsilon_i z_i\} = E\{(y_i - x_i'\beta)\, z_i\} = 0$
  one equation for each component of $z_i$
- $z_i$ possibly overlapping with $x_i$
- General case: $R$ moment conditions
- Substitution of expectations by sample averages gives $R$ equations
  $\frac{1}{N} \sum_i (y_i - x_i'\hat\beta_{IV})\, z_i = 0$
1. $R = K$: one unique solution, the IV estimator; identified model
2. $R < K$: infinite number of solutions, not enough instruments for a unique solution; under-identified or not identified model

The GIV Estimator
3. $R > K$: more instruments than necessary for identification; over-identified model
- For $R > K$, in general, no unique solution of all $R$ sample moment conditions can be obtained; instead:
  - the weighted quadratic form in the sample moments
    $Q_N(\beta) = [\frac{1}{N} \sum_i (y_i - x_i'\beta)\, z_i]'\, W_N\, [\frac{1}{N} \sum_i (y_i - x_i'\beta)\, z_i]$
    with an $R \times R$ positive definite weighting matrix $W_N$ is minimized
  - this gives the generalized instrumental variable (GIV) estimator
    $\hat\beta_{IV} = (X'Z\, W_N\, Z'X)^{-1} X'Z\, W_N\, Z'y$

The weighting matrix $W_N$
- $W_N$: positive definite, order $R \times R$
- Different weighting matrices result in different consistent GIV estimators with different covariance matrices
- For $R = K$, the matrix $Z'X$ is square and invertible; the IV estimator is $(Z'X)^{-1}Z'y$ for any $W_N$
- Optimal choice for $W_N$?

GIV and TSLS Estimator
- Optimal weighting matrix: $W_N^{opt} = [\frac{1}{N} Z'Z]^{-1}$; corresponds to the most efficient IV estimator
  $\hat\beta_{IV} = (X'Z(Z'Z)^{-1}Z'X)^{-1} X'Z(Z'Z)^{-1}Z'y$
- If the error terms are heteroskedastic or autocorrelated, the optimal weighting matrix has to be adapted
- Regression of each regressor, i.e., each column of $X$, on $Z$ results in
  $\hat X = Z(Z'Z)^{-1}Z'X$ and $\hat\beta_{IV} = (\hat X'\hat X)^{-1} \hat X'y$
- This explains why the GIV estimator is also called the "two stage least squares" (TSLS) estimator:
  1. First step: regress each column of $X$ on $Z$
  2. Second step: regress $y$ on the predictions of $X$

GIV Estimator and Properties
- The GIV estimator is consistent
- The asymptotic distribution of the GIV estimator, given $IID(0, \sigma_\varepsilon^2)$ error terms, leads to
  $\hat\beta_{IV} \overset{a}{\sim} N(\beta, \sigma^2 (X'Z(Z'Z)^{-1}Z'X)^{-1})$
  which is used as approximate distribution in case of finite $N$
- The (asymptotic) covariance matrix of the GIV estimator is given by
  $V\{\hat\beta_{IV}\} = \sigma^2 (X'Z(Z'Z)^{-1}Z'X)^{-1}$
- In the estimated covariance matrix, $\sigma^2$ is substituted by
  $\hat\sigma^2 = \frac{1}{N-K} \sum_i (y_i - x_i'\hat\beta_{IV})^2$
  the estimate based on the IV residuals (asymptotically, dividing by $N$ is equivalent)

Some Tests
- For testing
  - Endogeneity of regressors: Wu-Hausman test, also called Durbin-Wu-Hausman test; in GRETL: Hausman test
  - Validity (exogeneity) of potential instrumental variables: over-identifying restrictions test or Sargan test
  - Weak instruments, i.e., only weak correlation between endogenous regressor and instrument: Cragg-Donald test

Wu-Hausman Test
- For testing whether one or more regressors are endogenous (correlated with the error term)
- Based on the assumption that the instrumental variables are valid; i.e., given that $E\{\varepsilon_i z_i\} = 0$, the null hypothesis $E\{\varepsilon_i x_i\} = 0$ can be tested
- The idea of the test:
  - Under the null hypothesis, both the OLS and the IV estimator are consistent; they should differ by sampling errors only
  - Rejection of the null hypothesis indicates inconsistency of the OLS estimator

Wu-Hausman Test, cont'd
- Based on the squared difference between the OLS and IV estimators
- Added variable interpretation of the Wu-Hausman test: checks whether the residuals $v_i$ from the reduced form equation of the potentially endogenous regressors contribute to explaining $y_i$ in
  $y_i = x_{1i}'b_1 + x_{2i} b_2 + v_i \gamma + \varepsilon_i$
  - $x_2$: potentially endogenous regressors
  - $v_i$: residuals from the reduced form equation for $x_2$ (predicted values for $x_2$: $x_2 - v$)
- $H_0$: $\gamma = 0$; corresponds to: $x_2$ is exogenous
- For testing $H_0$: use of
  - the t-test, if $\gamma$ has one component, i.e., $x_2$ is just one regressor
  - the F-test, if more than one regressor is tested for exogeneity

Wu-Hausman Test, cont'd
- Remarks
  - The test requires valid instruments
  - The test has little power if the instruments are weak or invalid
  - The test can be used to test whether additional instruments are valid
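
The added variable form of the Wu-Hausman test described above can be illustrated with a small simulation. The following is a minimal sketch in Python with NumPy, not part of the lecture: the data-generating process, all parameter values, and the helper names `ols` and `wu_hausman_added_variable` are assumptions chosen for illustration, with one endogenous regressor `x2`, one instrument `z`, and a constant as the only exogenous regressor.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients, residuals and conventional standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ (X.T @ y)
    resid = y - X @ b
    s2 = resid @ resid / (X.shape[0] - X.shape[1])
    return b, resid, np.sqrt(s2 * np.diag(XtX_inv))

def wu_hausman_added_variable(y, x2, z):
    """Coefficients of the augmented regression y = b1 + b2*x2 + gamma*v + e
    and the t-statistic for gamma, where v is the residual of the
    reduced form regression of x2 on a constant and the instrument z."""
    const = np.ones_like(y)
    Z = np.column_stack([const, z])          # instruments incl. constant
    _, v, _ = ols(Z, x2)                     # reduced form residuals
    X_aug = np.column_stack([const, x2, v])  # regressors plus added variable
    b, _, se = ols(X_aug, y)
    return b, b[2] / se[2]

# Simulated data; every number here is an illustrative assumption
rng = np.random.default_rng(0)
N = 2000
z = rng.normal(size=N)                       # instrument
v0 = rng.normal(size=N)                      # part of x2 not explained by z
x2 = z + v0                                  # regressor
eps = 0.8 * v0 + 0.6 * rng.normal(size=N)    # error correlated with x2
y = 1.0 + 2.0 * x2 + eps

b_aug, t_gamma = wu_hausman_added_variable(y, x2, z)
print("augmented regression coefficients:", b_aug)
print("t-statistic for gamma:", t_gamma)
```

Because the simulated error term is built to be correlated with `x2`, the t-statistic for $\gamma$ should be far from zero, so that exogeneity of `x2` is rejected. In this just-identified setting the coefficients on the constant and on `x2` in the augmented regression coincide numerically with the IV estimates $(Z'X)^{-1}Z'y$.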

Sargan Test
- For testing whether the instruments are valid
- The validity of the instruments $z_i$ requires that all moment conditions are fulfilled; for the R-vector $z_i$, the $R$ sums
  $\frac{1}{N} \sum_i e_i z_i$, with IV residuals $e_i = y_i - x_i'\hat\beta_{IV}$,
  must be close to zero
- Test statistic
  $\xi = N Q_N(\hat\beta_{IV}) = (\sum_i e_i z_i)' (\hat\sigma^2 \sum_i z_i z_i')^{-1} (\sum_i e_i z_i)$
  has, under the null hypothesis, an asymptotic Chi-squared distribution with $R-K$ df
- Calculation of $\xi$: $\xi = N R_e^2$ using $R_e^2$ from the auxiliary regression of the IV residuals $e_i = y_i - x_i'\hat\beta_{IV}$ on the instruments $z_i$

Sargan Test, cont'd
- Remarks
  - Only $R-K$ of the $R$ moment conditions are "free"; in case of an identified model ($R = K$), all $R$ moment conditions are fulfilled
  - The test is also called over-identifying restrictions test
  - Rejection implies: the joint validity of all moment conditions and hence of all instruments is not acceptable
  - The Sargan test gives no indication of which instruments are invalid
- Test whether a subset of $R-R_1$ instruments is valid; $R_1$ ($\geq K$) instruments are out of doubt:
  - Calculate $\xi$ for all $R$ moment conditions
  - Calculate $\xi_1$ for the $R_1$ moment conditions
  - Under $H_0$, $\xi - \xi_1$ has a Chi-squared distribution with $R-R_1$ df

Cragg-Donald Test
- Weak (only marginally relevant) instruments, i.e., only weak correlation between endogenous regressor and instrument:
  - Biased IV estimates
  - Inconsistent IV estimates
  - Inappropriate large-sample approximations to the finite-sample distributions even for large $N$
- Definition of weak instruments: estimates are biased to an extent that is unacceptably large
- Null hypothesis: instruments are weak, i.e., can lead to an asymptotic relative bias greater than some value $b$

Your Homework
1. Use the data set "schooling" of Verbeek for the following analyses based on the wage equation
  log(wage76) = b1 + b2 ed76 + b3 exp76 + b4 exp762 + b5 black + b6 momed + e
  a) Estimate the reduced form for ed76, including smsa66, sinmom14, south66, and mar76; assess the validity of the potential
instruments; what do the correlation coefficients indicate?
  b) Estimate, by means of the GRETL Instrumental variables (Two-Stage Least Squares …) procedure, the wage equation, using the instruments black, momed, sinmom14, smsa66, south76, and mar76; interpret the results including the Hausman test and the Sargan test.
  c) Compare the estimates for b2 (i) from the model in b), (ii) from the model with instruments black, momed, smsa66, south76, and age76, and (iii) with the OLS estimates.

Your Homework, cont'd
2. For the model for consumption and income (slides "Consumption Function" ff.):
  a. Show that both $y_t$ and $x_t$ are endogenous:
     $E\{y_t \varepsilon_t\} = E\{x_t \varepsilon_t\} = \sigma_\varepsilon^2 (1 - \beta_2)^{-1}$
  b. Derive the reduced form of the model
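
The model for consumption and income referred to in exercise 2 can also be explored numerically. The following minimal sketch in Python with NumPy simulates the structural equations (A) $C_t = \beta_1 + \beta_2 Y_t + \varepsilon_t$ and (B) $Y_t = C_t + I_t$ and compares the OLS slope with the simple IV slope that uses $I_t$ as instrument. All parameter values, the normal distributions, and the helper `cov` are assumptions made for illustration; they are not taken from the AWM data.

```python
import numpy as np

# Illustrative parameter values (assumptions, not estimates from the AWM data)
beta1, beta2 = 1.0, 0.6
sigma_eps = 0.5
N = 100_000

rng = np.random.default_rng(1)
I = rng.normal(loc=2.0, scale=1.0, size=N)   # exogenous investment, V{I} = 1
eps = rng.normal(scale=sigma_eps, size=N)    # error term of equation (A)

# Solving (A) and (B) for Y gives Y = (beta1 + I + eps) / (1 - beta2)
Y = (beta1 + I + eps) / (1.0 - beta2)
C = beta1 + beta2 * Y + eps                  # structural equation (A)

def cov(a, b):
    """Sample covariance of two one-dimensional arrays."""
    return np.cov(a, b)[0, 1]

b2_ols = cov(Y, C) / cov(Y, Y)               # OLS slope of C on Y
b2_iv = cov(I, C) / cov(I, Y)                # IV slope with instrument I

# Probability limit of the OLS slope:
# beta2 + (1 - beta2) * sigma_eps^2 / (V{I} + sigma_eps^2), with V{I} = 1
plim_ols = beta2 + (1.0 - beta2) * sigma_eps**2 / (1.0 + sigma_eps**2)

print("OLS slope:", b2_ols, " plim of OLS slope:", plim_ols)
print("IV slope:", b2_iv, " true beta2:", beta2)
```

With a sample this large, the OLS slope should lie close to its probability limit, which exceeds $\beta_2$ for $0 < \beta_2 < 1$, while the IV slope should lie close to $\beta_2$.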