Econometrics - Lecture 5
Endogeneity, Instrumental Variables, IV Estimator


Contents
- OLS Estimator Revisited
- Cases of Regressors Correlated with Error Term
- Instrumental Variables (IV) Estimator: The Concept
- IV Estimator: The Method
- Calculation of the IV Estimator
- An Example
- The GIV Estimator
- Some Tests
Dec 16, 2011
Hackl,  Econometrics, Lecture 5

OLS Estimator
Linear model for yi
yi = xi'β + εi, i = 1, …, N  (or y = Xβ + ε)
given observations xik, k = 1, …, K, of the regressor variables, and error term εi
OLS estimator
b = (Σi xi xi')⁻¹ Σi xi yi = (X'X)⁻¹X'y
From
b = (Σi xi xi')⁻¹ Σi xi yi = (Σi xi xi')⁻¹ Σi xi xi'β + (Σi xi xi')⁻¹ Σi xi εi
   = β + (Σi xi xi')⁻¹ Σi xi εi = β + (X'X)⁻¹X'ε
follows
E{b} = β + E{(Σi xi xi')⁻¹ Σi xi εi} = β + E{(X'X)⁻¹X'ε}
which, for given X, equals β + (Σi xi xi')⁻¹ E{Σi xi εi} = β + (X'X)⁻¹ E{X'ε}
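The decomposition b = β + (X'X)⁻¹X'ε can be checked numerically. The following is a minimal sketch with simulated data; the sample size, the coefficient values, and the helper name `ols` are illustrative assumptions and not part of the lecture.

```python
import numpy as np

def ols(X, y):
    """OLS estimator b = (X'X)^-1 X'y, computed by solving the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Simulated data (all values are assumptions chosen for illustration)
rng = np.random.default_rng(0)
N = 500
beta = np.array([1.0, 2.0])
X = np.column_stack([np.ones(N), rng.normal(size=N)])  # constant and one regressor
eps = rng.normal(size=N)
y = X @ beta + eps

b = ols(X, y)
# Sampling error of b according to the decomposition b = beta + (X'X)^-1 X'eps
b_decomposed = beta + ols(X, eps)
print(b, b_decomposed)
```

Both vectors agree up to rounding, which shows that the sampling error of b is driven entirely by X'ε.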

OLS Estimator, cont'd
1. OLS estimator b is unbiased if
- (A1) E{ε} = 0
- E{Σi xi εi} = E{X'ε} = 0; this is fulfilled if (A7) or a stronger assumption is true:
  - (A2) {xi, i = 1, …, N} and {εi, i = 1, …, N} are independent; this is the strongest assumption
  - (A10) E{ε|X} = 0, i.e., X is uninformative about E{εi} for all i (ε is conditional mean independent of X); implied by (A2)
  - (A8) xi and εi are independent for all i (no contemporaneous dependence); less strong than (A2) and (A10)
  - (A7) E{xi εi} = 0 for all i (no contemporaneous correlation); even less strong than (A8)

OLS Estimator, cont'd
2. OLS estimator b is consistent for β if
- (A8) xi and εi are independent for all i
- (A6) (1/N) Σi xi xi' has as limit (N → ∞) a nonsingular matrix Σxx
(A8) can be replaced by (A7) [E{xi εi} = 0 for all i, no contemporaneous correlation]
3. OLS estimator b is asymptotically normally distributed if (A6), (A8) and
- (A11) εi ~ IID(0, σ²)
are true;
- for large N, b follows approximately the normal distribution
   b ~a N{β, σ²(Σi xi xi')⁻¹}
- Use the White and Newey-West estimators for V{b} in case of heteroskedasticity and autocorrelation of the error terms, respectively

Assumption (A7): E{xi εi} = 0 for all i
- Implication of (A7): for all i, each of the regressors is uncorrelated with the current error term (no contemporaneous correlation)
- Stronger assumptions, i.e., (A2), (A10), (A8), have the same consequences
- (A7) guarantees unbiasedness and consistency of the OLS estimator
- In reality, (A7) is not always true: alternative estimation procedures are then required to obtain consistent estimators
- Examples of situations with E{xi εi} ≠ 0:
  - Regressors with measurement errors
  - Regression on the lagged dependent variable with autocorrelated error terms (dynamic regression)
  - Endogeneity of regressors
  - Simultaneity


Regressor with Measurement Error
- yi = β1 + β2wi + vi
- with white noise vi, V{vi} = σv², and E{vi|wi} = 0; conditional expectation of yi given wi:
  E{yi|wi} = β1 + β2wi
- Example: wi: household income, yi: household savings
- Measurement process: the reported household income xi may deviate from the household income wi
  xi = wi + ui
  where ui is (i) white noise with V{ui} = σu², (ii) independent of vi, and (iii) independent of wi
- The model to be analyzed is
  yi = β1 + β2xi + εi  with εi = vi - β2ui
- E{xi εi} = -β2σu² ≠ 0: the requirement for consistency and unbiasedness is violated
- xi and εi are negatively (positively) correlated if β2 > 0 (β2 < 0)

Measurement Error, cont'd
- Inconsistency of b2
  plim b2 = β2 + E{xi εi} / V{xi}
          = β2 - β2σu² / (σw² + σu²) = β2 [1 - σu² / (σw² + σu²)]
  with σw² = V{wi}; β2 is underestimated in absolute value
- Inconsistency of b1
  plim (b1 - β1) = - plim (b2 - β2) E{xi}
  given E{xi} > 0 for the reported income and β2 > 0: β1 is overestimated; the inconsistency "carries over"
- The model does not correspond to the conditional expectation of yi given xi:
  E{yi|xi} = β1 + β2xi - β2 E{ui|xi} ≠ β1 + β2xi
  as E{ui|xi} ≠ 0
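With E{xi εi} = -β2σu² and V{xi} = V{wi} + σu², the probability limit of b2 is β2[1 - σu²/(V{wi} + σu²)]. The following simulation is a minimal sketch of this attenuation effect; all parameter values (β1 = 1, β2 = 0.5, E{wi} = 10, the standard deviations, N) are assumptions chosen for illustration.

```python
import numpy as np

# Illustrative parameter values (assumptions, not taken from the lecture)
beta1, beta2 = 1.0, 0.5
sigma_w, sigma_u, sigma_v = 2.0, 1.0, 1.0
mean_w = 10.0
N = 200_000

rng = np.random.default_rng(1)
w = mean_w + sigma_w * rng.normal(size=N)  # true income
u = sigma_u * rng.normal(size=N)           # measurement error
v = sigma_v * rng.normal(size=N)           # error term of the true model
y = beta1 + beta2 * w + v
x = w + u                                   # reported income

# OLS of y on a constant and x
b2 = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
b1 = y.mean() - b2 * x.mean()

# Probability limits implied by the formulas of this slide
plim_b2 = beta2 * (1.0 - sigma_u**2 / (sigma_w**2 + sigma_u**2))
plim_b1 = beta1 - (plim_b2 - beta2) * mean_w
print(b2, plim_b2, b1, plim_b1)
```

Under these assumptions the slope estimate settles near 0.4 instead of 0.5, and the intercept is overestimated accordingly.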

Dynamic Regression
- Allows modelling dynamic effects of changes of x on y:
  yt = β1 + β2xt + β3yt-1 + εt
- OLS estimators are consistent if E{xt εt} = 0 and E{yt-1 εt} = 0
- AR(1) model for εt:
  εt = ρεt-1 + vt
  vt white noise with variance σv²
- From yt = β1 + β2xt + β3yt-1 + ρεt-1 + vt follows
  E{yt-1 εt} = β3 E{yt-2 εt} + ρσv²(1 - ρ²)⁻¹
  i.e., yt-1 is correlated with εt if ρ ≠ 0
- OLS estimators are not consistent
- The model does not correspond to the conditional expectation of yt given the regressors xt and yt-1:
  E{yt|xt, yt-1} = β1 + β2xt + β3yt-1 + E{εt|xt, yt-1}

Omission of Relevant Regressors
- Two models:
  yi = xi'β + zi'γ + εi  (A)
  yi = xi'β + vi  (B)
- True model (A), fitted model (B)
- OLS estimates bB of β from (B)
  bB = (Σi xi xi')⁻¹ Σi xi yi = β + (Σi xi xi')⁻¹ Σi xi zi'γ + (Σi xi xi')⁻¹ Σi xi εi
- Omitted variable bias: E{(Σi xi xi')⁻¹ Σi xi zi'}γ = E{(X'X)⁻¹X'Z}γ
- No bias if (a) γ = 0 or if (b) the variables in xi and zi are uncorrelated (orthogonal)
- OLS estimators are biased if relevant regressors are omitted that are non-orthogonal to, i.e., correlated with, the regressors in xi
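The omitted variable term (X'X)⁻¹X'Z γ also has an exact in-sample counterpart: the estimates from the short model (B) equal the estimates of β from the full model (A) plus (X'X)⁻¹X'Z times the estimated γ. A minimal sketch with simulated data; the data-generating values and the helper name `ols` are illustrative assumptions.

```python
import numpy as np

def ols(X, y):
    """Solve the normal equations X'X b = X'y (y may have several columns)."""
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(2)
N = 1_000
x1 = rng.normal(size=N)
z1 = 0.6 * x1 + rng.normal(size=N)        # omitted regressor, correlated with x1
X = np.column_stack([np.ones(N), x1])
Z = z1.reshape(-1, 1)
beta, gamma = np.array([1.0, 2.0]), np.array([1.5])
y = X @ beta + Z @ gamma + rng.normal(size=N)

coef_A = ols(np.column_stack([X, Z]), y)  # full model (A)
b_A, g_A = coef_A[:2], coef_A[2:]
b_B = ols(X, y)                           # short model (B), z omitted

# In-sample identity: b_B = b_A + (X'X)^-1 X'Z g_A
b_B_identity = b_A + ols(X, Z) @ g_A
print(b_A, b_B, b_B_identity)
```

With the assumed values, the slope from model (B) is shifted upwards by roughly 0.6 · 1.5 = 0.9.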

Unobserved Heterogeneity
- Example: wage equation with yi: log wage, x1i: personal characteristics, x2i: years of schooling, ui: abilities (unobservable)
  yi = x1i'β1 + x2iβ2 + uiγ + vi
- Model for analysis (the unobserved ui is absorbed in the error term)
  yi = xi'β + εi
  with xi = (x1i', x2i)', β = (β1', β2)', εi = uiγ + vi
- Given E{xi vi} = 0
  plim b = β + Σxx⁻¹ E{xi ui} γ
- OLS estimators b are inconsistent if xi and ui are correlated and γ ≠ 0, e.g., if higher abilities induce more years at school: β2 is then likely to be overestimated, i.e., the effect of years at school is overestimated ("ability bias")
- Unobserved heterogeneity: observational units differ in aspects other than the observable ones

Endogenous Regressors
- Regressors that are correlated with the error term, E{X'ε} ≠ 0, are called endogenous
- Endogeneity bias
- Relevant for many economic applications
- OLS estimator b = β + (X'X)⁻¹X'ε
  - E{b} ≠ β, b is biased; the bias E{(X'X)⁻¹X'ε} is difficult to assess
  - plim b = β + Σxx⁻¹q  with q = plim(N⁻¹X'ε)
- For q = 0 (regressors and error term asymptotically uncorrelated), the OLS estimators b are consistent even in the case of endogenous regressors
- For q ≠ 0 (error term and at least one regressor asymptotically correlated): plim b ≠ β, the OLS estimators b are not consistent
- Exogenous regressors: regressors that are uncorrelated with the error term, i.e., all non-endogenous regressors

Consumption Function
- AWM database, 1970:1-2003:4
- C: private consumption (PCR), growth rate p.y.
- Y: disposable income of households (PYR), growth rate p.y.
  Ct = β1 + β2Yt + εt  (A)
  β2: marginal propensity to consume, 0 < β2 < 1
- OLS estimates:
  Ĉt = 0.011 + 0.718 Yt
  with t = 15.55, R² = 0.65, DW = 0.50
- It: per capita investment (exogenous, E{It εt} = 0)
  Yt = Ct + It  (B)
- Both Yt and Ct are endogenous: E{Ct εt} = E{Yt εt} = σε²(1 - β2)⁻¹
- The regressor Yt has an impact on Ct; at the same time Ct has an impact on Yt

Simultaneous Equation Models
- Illustrated by the preceding consumption function:
  - Variables Yt and Ct are simultaneously determined by equations (A) and (B)
  - Equations (A) and (B) are the structural equations or the structural form of the simultaneous equation model that describes both Yt and Ct
  - The coefficients β1 and β2 are behavioral parameters
- Reduced form of the model: one equation for each of the endogenous variables Ct and Yt, with only the exogenous variable It as regressor
- The OLS estimators of the structural equation (A) are biased and inconsistent

Consumption Function, cont'd
- Reduced form of the model:
  Yt = β1/(1 - β2) + 1/(1 - β2) It + 1/(1 - β2) εt
  Ct = β1/(1 - β2) + β2/(1 - β2) It + 1/(1 - β2) εt
- OLS estimator b2 from (A) is inconsistent; E{Yt εt} ≠ 0
  plim b2 = β2 + Cov{Yt, εt} / V{Yt} = β2 + (1 - β2) σε²(V{It} + σε²)⁻¹
  for 0 < β2 < 1, b2 overestimates β2
- The OLS estimator b1 is also inconsistent
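The simultaneity bias plim b2 = β2 + (1 - β2)σε²/(V{It} + σε²) can be illustrated by simulating the two structural equations (A) and (B). This is a minimal sketch; the parameter values (β2 = 0.6, the variances, N) are assumptions chosen for illustration and are unrelated to the AWM data.

```python
import numpy as np

beta1, beta2 = 0.01, 0.6        # assumed structural parameters
sigma_eps, sigma_I = 1.0, 2.0   # assumed standard deviations of eps and I
N = 200_000

rng = np.random.default_rng(3)
I = 1.0 + sigma_I * rng.normal(size=N)   # exogenous investment
eps = sigma_eps * rng.normal(size=N)
Y = (beta1 + I + eps) / (1.0 - beta2)    # solution of (A) and (B) for Y
C = Y - I                                # identity (B): Y = C + I

# OLS slope of C on Y and its probability limit
b2 = np.cov(Y, C)[0, 1] / np.var(Y, ddof=1)
plim_b2 = beta2 + (1.0 - beta2) * sigma_eps**2 / (sigma_I**2 + sigma_eps**2)
print(b2, plim_b2)
```

Under these assumptions OLS settles near 0.68 although the marginal propensity to consume is 0.6.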


An Alternative Estimator
- Model
  yi = β1 + β2xi + εi
  with E{εi xi} ≠ 0, i.e., endogenous regressor xi: OLS estimators are biased and inconsistent
- Instrumental variable zi satisfying
  1. Exogeneity: E{εi zi} = 0: zi is uncorrelated with the error term
  2. Relevance: Cov{xi, zi} ≠ 0: zi is correlated with the endogenous regressor
- Transformation of the model equation
  Cov{yi, zi} = β2 Cov{xi, zi} + Cov{εi, zi}
  gives, as Cov{εi, zi} = 0,
  β2 = Cov{yi, zi} / Cov{xi, zi}

IV Estimator for β2
- Substitution of sample moments for covariances gives the instrumental variables (IV) estimator
  β̂2,IV = Σi (zi - z̄)(yi - ȳ) / Σi (zi - z̄)(xi - x̄)
- Consistent estimator for β2, given that the instrumental variable zi is valid, i.e., it is
  - Exogenous, i.e., E{εi zi} = 0
  - Relevant, i.e., Cov{xi, zi} ≠ 0
- Typically, nothing can be said about the bias of an IV estimator; small sample properties are unknown
- Coincides with the OLS estimator for zi = xi
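The ratio of sample covariances of (z, y) and (z, x) is easy to try out on simulated data. The following is a minimal sketch; the data-generating process (instrument z, common shock eta that makes x endogenous) and the helper name `iv_slope` are assumptions made for illustration.

```python
import numpy as np

def iv_slope(y, x, z):
    """Simple IV estimator: sample Cov(z, y) / sample Cov(z, x)."""
    return np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

rng = np.random.default_rng(4)
N = 100_000
beta1, beta2 = 1.0, 2.0
z = rng.normal(size=N)                 # instrument, independent of eps
eta = rng.normal(size=N)
x = z + eta                            # relevance: Cov{x, z} = 1
eps = 0.8 * eta + rng.normal(size=N)   # endogeneity: Cov{x, eps} = 0.8
y = beta1 + beta2 * x + eps

b2_ols = iv_slope(y, x, x)   # with z = x the IV estimator is the OLS slope
b2_iv = iv_slope(y, x, z)
print(b2_ols, b2_iv)
```

In this setup the OLS slope converges to β2 + 0.8/2 = 2.4, whereas the IV slope is close to β2 = 2.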

Consumption Function, cont'd
- Alternative model: Ct = β1 + β2Yt-1 + εt
- Yt-1 and εt are certainly uncorrelated; avoids the risk of inconsistency due to correlated Yt and εt
- Yt-1 is certainly highly correlated with Yt and is almost as good a regressor as Yt
- Fitted model:
  Ĉ = 0.012 + 0.660 Y-1
  with t = 12.86, R² = 0.56, DW = 0.79 (instead of Ĉ = 0.011 + 0.718 Y with t = 15.55, R² = 0.65, DW = 0.50)
- The deterioration of the t-statistic and R² is the price for the improvement of the estimator

IV Estimator: The Concept
- Alternative to the OLS estimator
- Avoids inconsistency in the case of endogenous regressors
- Idea of the IV estimator:
  - Replace regressors which are correlated with the error terms by regressors
    - which are uncorrelated with the error terms
    - which are (highly) correlated with the regressors that are to be replaced
  - and use OLS estimation
- The hope is that the IV estimator is consistent (and less biased than the OLS estimator)
- Price: deteriorated model fit as measured by, e.g., t-statistic, R²


IV Estimator: General Case
- The model is
  yi = xi'β + εi
  with V{εi} = σε² and
  E{εi xi} ≠ 0
- At least one component of xi is correlated with the error term
- The vector of instruments zi (with the same dimension as xi) fulfills
  E{εi zi} = 0
- IV estimator based on the instruments zi
  β̂IV = (Σi zi xi')⁻¹ Σi zi yi

IV Estimator: General Case, cont'd
- The (asymptotic) covariance matrix of β̂IV is given by
  V{β̂IV} = σ² [(Σi xi zi')(Σi zi zi')⁻¹(Σi zi xi')]⁻¹
- In the estimated covariance matrix, σ² is substituted by
  σ̂² = (1/N) Σi (yi - xi'β̂IV)²
  which is based on the IV residuals
- The asymptotic distribution of IV estimators, given IID(0, σε²) error terms, leads to the approximate distribution
  β̂IV ~a N(β, V̂{β̂IV})
  with the estimated covariance matrix
  V̂{β̂IV} = σ̂² [(Σi xi zi')(Σi zi zi')⁻¹(Σi zi xi')]⁻¹

Derivation of the IV Estimator
- The model is
  yi = xi'β + εi = x0i'β0 + βKxKi + εi
  with x0i = (x1i, …, xK-1,i)' containing the first K-1 components of xi, and E{εi x0i} = 0
- The K-th component is endogenous: E{εi xKi} ≠ 0
- The instrumental variable zKi fulfills
  E{εi zKi} = 0
- Moment conditions: K conditions to be satisfied by the coefficients, the K-th condition with zKi instead of xKi:
  E{εi x0i} = E{(yi - x0i'β0 - βKxKi) x0i} = 0  (K-1 conditions)
  E{εi zKi} = E{(yi - x0i'β0 - βKxKi) zKi} = 0
- The number of conditions (and of the corresponding linear equations) equals the number of coefficients to be estimated

Derivation of the IV Estimator, cont'd
- The system of linear equations for the K coefficients β can be uniquely solved for β: the coefficients β are said "to be identified"
- To derive the IV estimators from the moment conditions, the expectations are replaced by sample averages
  (1/N) Σi (yi - x0i'β̂0,IV - β̂K,IV xKi) x0i = 0
  (1/N) Σi (yi - x0i'β̂0,IV - β̂K,IV xKi) zKi = 0
- The solution of the linear equation system, with zi' = (x0i', zKi), is
  β̂IV = (Σi zi xi')⁻¹ Σi zi yi
- Identification requires that the KxK matrix Σi zi xi' is finite and invertible; the instrument zKi is relevant when this is fulfilled


Calculation of the IV Estimator
- The model in matrix notation
  y = Xβ + ε
- The IV estimator
  β̂IV = (Σi zi xi')⁻¹ Σi zi yi = (Z'X)⁻¹Z'y
  with zi obtained from xi by substituting instrumental variable(s) for all endogenous regressors
- Calculation in two steps:
  1. Regression of the explanatory variables x1, …, xK (including the endogenous ones) on the columns of Z: fitted values
     X̂ = Z(Z'Z)⁻¹Z'X
  2. Regression of y on the fitted explanatory variables:
     β̂IV = (X̂'X̂)⁻¹X̂'y

Calculation of the IV Estimator, cont'd
- Remarks:
  - The KxK matrix Z'X = Σi zi xi' is required to be finite and invertible
  - From
    (X̂'X̂)⁻¹X̂'y = (X'Z(Z'Z)⁻¹Z'X)⁻¹X'Z(Z'Z)⁻¹Z'y = (Z'X)⁻¹(Z'Z)(X'Z)⁻¹X'Z(Z'Z)⁻¹Z'y = (Z'X)⁻¹Z'y
    it is obvious that the estimator obtained in the second step is the IV estimator
  - However, the estimator obtained in the second step is more general; see below
- In GRETL: The sequence of buttons "Model > Instrumental variables > Two-Stage Least Squares…" leads to the specification window with boxes (i) for the independent variables and (ii) for the instruments
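The two-step calculation can be sketched in a few lines of NumPy. The example below is an illustration under assumptions: the simulated data, the function name `iv_two_step`, and the division of the residual sum of squares by N - K (instead of N) are choices made here, not prescriptions of the lecture. Besides the coefficients, it computes standard errors once from the IV residuals y - Xb and once from the second-step residuals y - X̂b.

```python
import numpy as np

def iv_two_step(y, X, Z):
    """Two-step IV estimate with IV-based and second-step standard errors."""
    N, K = X.shape
    # Step 1: fitted values X_hat = Z (Z'Z)^-1 Z'X
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    # Step 2: OLS of y on X_hat
    XhXh_inv = np.linalg.inv(X_hat.T @ X_hat)
    b = XhXh_inv @ X_hat.T @ y
    e_iv = y - X @ b          # IV residuals: original regressors
    e_step2 = y - X_hat @ b   # residuals reported by a plain second-step OLS
    se_iv = np.sqrt(e_iv @ e_iv / (N - K) * np.diag(XhXh_inv))
    se_step2 = np.sqrt(e_step2 @ e_step2 / (N - K) * np.diag(XhXh_inv))
    return b, se_iv, se_step2

# Simulated data with one endogenous regressor x and one instrument z (assumed values)
rng = np.random.default_rng(5)
N = 5_000
z = rng.normal(size=N)
eta = rng.normal(size=N)
x = z + eta
eps = 0.8 * eta + rng.normal(size=N)
y = 1.0 + 2.0 * x + eps
X = np.column_stack([np.ones(N), x])
Z = np.column_stack([np.ones(N), z])

b, se_iv, se_step2 = iv_two_step(y, X, Z)
b_direct = np.linalg.solve(Z.T @ X, Z.T @ y)   # (Z'X)^-1 Z'y
print(b, b_direct, se_iv, se_step2)
```

The coefficients of the two routes coincide. The two sets of standard errors differ by a common factor, because a plain second-step OLS regression bases its residual variance on y - X̂b rather than on the IV residuals; a manual second-step regression therefore generally does not reproduce the standard errors of a TSLS routine.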

Choice of Instrumental Variables
- Instrumental variables are required to be
  - exogenous, i.e., uncorrelated with the error terms
  - relevant, i.e., correlated with the endogenous regressors
- Instruments
  - must be based on subject matter arguments, e.g., arguments from economic theory
  - should be explained and motivated
  - must show a significant effect in explaining an endogenous regressor
- The choice of instruments is often not easy
- Regression of endogenous variables on instruments
  - Best linear approximation of endogenous variables
  - Economic interpretation is not of importance or interest


Example: Returns to Schooling
- Human capital earnings function:
  wi = β1 + β2Si + β3Ei + β4Ei² + εi
  with wi: log of individual earnings, Si: years of schooling, Ei: years of experience (Ei = agei - Si - 6)
- Empirically, more education implies higher income
- Question: Is this effect causal?
  - If yes, one more year at school increases the log wage by β2, i.e., the wage by about 100·β2 percent
  - Otherwise, abilities may cause both higher income and more years at school; more years at school as such do not increase the wage
- Issue of substantial attention in the literature

Returns to Schooling
- Wage equation: besides Si and Ei, additional explanatory variables like gender, regional, and racial dummies
- Model for analysis:
  wi = β1 + zi'γ + β2Si + β3Ei + β4Ei² + εi
  zi: observable variables besides Ei, Si
- zi is assumed to be exogenous, i.e., E{zi εi} = 0
- Si may be endogenous, i.e., E{Si εi} ≠ 0
  - Ability bias: unobservable factors like intelligence, family background, etc. lead to more schooling and higher earnings
  - Measurement error in measuring schooling
  - Etc.
- With Si, also Ei = agei - Si - 6 and Ei² are endogenous
- OLS estimators may be inconsistent

Returns to Schooling: Data
- Verbeek's data set "schooling"
- National Longitudinal Survey of Young Men (Card, 1995)
- Data from 3010 males, survey 1976
- Individual characteristics, incl. experience, race, region, family background, etc.
- Human capital function
  log(wagei) = β1 + β2 edi + β3 expi + β4 expi² + εi
  with edi: years of schooling (Si), expi: years of experience (Ei)
- Further explanatory variables: black: dummy for African-American, smsa: dummy for living in a metropolitan area, south: dummy for living in the south

OLS Estimation
OLS estimated wage function: Output from GRETL
Model 2: OLS, using observations 1-3010
Dependent variable: l_WAGE76
             coefficient    std. error     t-ratio   p-value
  -------------------------------------------------------------
  const       4.73366       0.0676026       70.02    0.0000    ***
  ED76        0.0740090     0.00350544      21.11    2.28e-092 ***
  EXP76       0.0835958     0.00664779      12.57    2.22e-035 ***
  EXP762     -0.00224088    0.000317840     -7.050   2.21e-012 ***
  BLACK      -0.189632      0.0176266      -10.76    1.64e-026 ***
  SMSA76      0.161423      0.0155733       10.37    9.27e-025 ***
  SOUTH76    -0.124862      0.0151182       -8.259   2.18e-016 ***
Mean dependent var   6.261832   S.D. dependent var   0.443798
Sum squared resid    420.4760   S.E. of regression   0.374191
R-squared            0.290505   Adjusted R-squared   0.289088
F(6, 3003)           204.9318   P-value(F)           1.5e-219
Log-likelihood      -1308.702   Akaike criterion     2631.403
Schwarz criterion    2673.471   Hannan-Quinn         2646.532

Instruments for Si, Ei, Ei²
- Potential instrumental variables
  - Factors which affect schooling but are uncorrelated with the error terms, in particular with unobserved abilities that determine the wage
- For years of schooling (Si)
  - Costs of schooling, e.g., distance to school (lived near college), number of siblings
  - Parents' education
  - Quarter of birth
- For years of experience (Ei, Ei²): age is a natural candidate

Step 1 of IV Estimation
Model for schooling (ED76), which gives the predicted values ED76_h: Output from GRETL
Model 3: OLS, using observations 1-3010
Dependent variable: ED76
             coefficient    std. error     t-ratio   p-value
  -------------------------------------------------------------
  const      -1.81870       4.28974         -0.4240  0.6716
  AGE76       1.05881       0.300843         3.519   0.0004    ***
  sq_AGE76   -0.0187266     0.00522162      -3.586   0.0003    ***
  BLACK      -1.46842       0.115245       -12.74    2.96e-036 ***
  SMSA76      0.841142      0.105841         7.947   2.67e-015 ***
  SOUTH76    -0.429925      0.102575        -4.191   2.85e-05  ***
  NEARC4A     0.441082      0.0966588        4.563   5.24e-06  ***
Mean dependent var   13.26346   S.D. dependent var   2.676913
Sum squared resid    18941.85   S.E. of regression   2.511502
R-squared            0.121520   Adjusted R-squared   0.119765
F(6, 3003)           69.23419   P-value(F)           5.49e-81
Log-likelihood      -7039.353   Akaike criterion     14092.71
Schwarz criterion    14134.77   Hannan-Quinn         14107.83

Step 2 of IV Estimation
Wage equation, estimated by IV (second-step OLS on the fitted values) with instruments age, age², and nearc4a: Output from GRETL
Model 4: OLS, using observations 1-3010
Dependent variable: l_WAGE76
             coefficient    std. error     t-ratio   p-value
  -------------------------------------------------------------
  const       3.69771       0.435332         8.494   3.09e-017 ***
  ED76_h      0.164248      0.036887         4.453   8.79e-06  ***
  EXP76_h     0.044588      0.022502         1.981   0.0476    **
  EXP762_h   -0.000195      0.001152        -0.169   0.8655
  BLACK      -0.057333      0.056772        -1.010   0.3126
  SMSA76      0.079372      0.037116         2.138   0.0326    **
  SOUTH76    -0.083698      0.022985        -3.641   0.0003    ***
Mean dependent var   6.261832   S.D. dependent var   0.443798
Sum squared resid    446.8056   S.E. of regression   0.385728
R-squared            0.246078   Adjusted R-squared   0.244572
F(6, 3003)           163.3618   P-value(F)           4.4e-180
Log-likelihood      -1516.471   Akaike criterion     3046.943
Schwarz criterion    3089.011   Hannan-Quinn         3062.072

GRETL's TSLS Estimation
Wage equation, estimated by IV: Output from GRETL
Model 8: TSLS, using observations 1-3010
Dependent variable: l_WAGE76
Instrumented: ED76 EXP76 EXP762
Instruments: const AGE76 sq_AGE76 BLACK SMSA76 SOUTH76 NEARC4A
             coefficient    std. error     t-ratio   p-value
  -------------------------------------------------------------
  const       3.69771       0.495136         7.468   8.14e-014 ***
  ED76        0.164248      0.0419547        3.915   9.04e-05  ***
  EXP76       0.0445878     0.0255932        1.742   0.0815    *
  EXP762     -0.00019526    0.0013110       -0.1489  0.8816
  BLACK      -0.0573333     0.0645713       -0.8879  0.3746
  SMSA76      0.0793715     0.0422150        1.880   0.0601    *
  SOUTH76    -0.0836975     0.0261426       -3.202   0.0014    ***
Mean dependent var   6.261832   S.D. dependent var   0.443798
Sum squared resid    577.9991   S.E. of regression   0.438718
R-squared            0.195884   Adjusted R-squared   0.194277
F(6, 3003)           126.2821   P-value(F)           8.9e-143

Returns to Schooling: Summary of Estimates
Estimated regression coefficients (first row) and t-statistics (second row)

            OLS        IV 1)      TSLS 1)    IV (M.V.)
  ed76      0.0740     0.1642     0.1642     0.1329
           21.11       4.45       3.92       2.59
  exp76     0.0836     0.0446     0.0446     0.0560
           12.57       1.98       1.74       2.15
  exp762   -0.0022    -0.0002    -0.0002    -0.0008
           -7.05      -0.17      -0.15      -0.59
  black    -0.1896    -0.0573    -0.0573    -0.1031
          -10.76      -1.01      -0.89      -1.33

1) The model differs from that used by Verbeek

Some Comments
- Instrumental variables (age, age², nearc4a)
  - are relevant, i.e., have explanatory power for ed76, exp76, exp762
  - Whether they are exogenous, i.e., uncorrelated with the error terms, is not answered
  - Test for exogeneity of regressors: Wu-Hausman test
- Estimates of the ed76 coefficient:
  - IV estimate: 0.13, i.e., 13% higher wage for one additional year of schooling; nearly double the OLS estimate (0.07); not in line with the "ability bias" argument!
  - The s.e. of the IV estimate (0.04) is much higher than the s.e. of the OLS estimate (0.004)
  - Loss of efficiency especially in the case of weak instruments: R² of the model for ed76: 0.12; Corr{ed76, ed76_h} = 0.35


From OLS to IV Estimation
- Linear model yi = xi'β + εi
- OLS estimator b: solution of the K normal equations
  (1/N) Σi (yi - xi'b) xi = 0
- Corresponding moment conditions
  E{εi xi} = E{(yi - xi'β) xi} = 0
- IV estimator given R instrumental variables zi, which may overlap with xi: based on the R moment conditions
  E{εi zi} = E{(yi - xi'β) zi} = 0
- IV estimator: solution of the corresponding sample moment conditions
  (1/N) Σi (yi - xi'β̂IV) zi = 0

Number of Instruments
- Moment conditions
  E{εi zi} = E{(yi - xi'β) zi} = 0
  one equation for each component of zi
- zi possibly overlapping with xi
- General case: R moment conditions
- Substitution of expectations by sample averages gives R equations
  (1/N) Σi (yi - xi'β̂IV) zi = 0
1. R = K: one unique solution, the IV estimator; identified model
   β̂IV = (Σi zi xi')⁻¹ Σi zi yi = (Z'X)⁻¹Z'y
2. R < K: infinite number of solutions, not enough instruments for a unique solution; under-identified or not identified model

The GIV Estimator
3. R > K: more instruments than necessary for identification; over-identified model
- For R > K, in general, no unique solution of all R sample moment conditions can be obtained; instead:
  - the weighted quadratic form in the sample moments
    QN(β) = [(1/N) Σi (yi - xi'β) zi]' WN [(1/N) Σi (yi - xi'β) zi]
    with an RxR positive definite weighting matrix WN is minimized
  - this gives the generalized instrumental variable (GIV) estimator
    β̂IV = (X'Z WN Z'X)⁻¹ X'Z WN Z'y

The GIV Estimator, cont'd
- The weighting matrix WN
  - Different weighting matrices result in different consistent GIV estimators with different covariance matrices
  - For R = K, the matrix Z'X is square and invertible; the IV estimator is (Z'X)⁻¹Z'y for any WN
- Optimal choice for WN?

GIV and TSLS Estimator
- Optimal weighting matrix: WNopt = [(1/N)(Z'Z)]⁻¹; corresponds to the most efficient IV estimator
  β̂IV = (X'Z(Z'Z)⁻¹Z'X)⁻¹ X'Z(Z'Z)⁻¹Z'y
- If the error terms are heteroskedastic or autocorrelated, the optimal weighting matrix has to be adapted
- Regression of each regressor, i.e., each column of X, on Z results in X̂ = Z(Z'Z)⁻¹Z'X and
  β̂IV = (X̂'X̂)⁻¹X̂'y
- This explains why the GIV estimator is also called the "two stage least squares" (TSLS) estimator:
  1. First step: regress each column of X on Z
  2. Second step: regress y on the predictions of X
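The relation between the GIV estimator (X'Z WN Z'X)⁻¹X'Z WN Z'y and the two-stage calculation can be verified numerically. The sketch below uses simulated data with R = 3 instruments (including the constant) and K = 2 regressors; the data-generating process and the function names `giv` and `tsls` are assumptions made for illustration.

```python
import numpy as np

def giv(y, X, Z, W):
    """GIV estimator (X'Z W Z'X)^-1 X'Z W Z'y for an RxR weighting matrix W."""
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ y))

def tsls(y, X, Z):
    """Two-stage least squares: regress X on Z, then y on the fitted X."""
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    return np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)

rng = np.random.default_rng(6)
N = 5_000
z1, z2, eta = rng.normal(size=(3, N))
x = z1 + 0.5 * z2 + eta                 # endogenous regressor
eps = 0.8 * eta + rng.normal(size=N)
y = 1.0 + 2.0 * x + eps
X = np.column_stack([np.ones(N), x])          # K = 2
Z = np.column_stack([np.ones(N), z1, z2])     # R = 3 > K

W_opt = np.linalg.inv(Z.T @ Z / N)            # optimal weighting matrix
b_giv_opt = giv(y, X, Z, W_opt)
b_tsls = tsls(y, X, Z)

# Just-identified case (R = K): the weighting matrix is irrelevant
Z_just = Z[:, :2]
b_iv = np.linalg.solve(Z_just.T @ X, Z_just.T @ y)
b_just_any_W = giv(y, X, Z_just, np.diag([1.0, 7.0]))
print(b_giv_opt, b_tsls, b_iv, b_just_any_W)
```

With the optimal weighting matrix the GIV estimate coincides with the TSLS estimate, and for R = K an arbitrary positive definite weighting matrix reproduces (Z'X)⁻¹Z'y.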

GIV Estimator and Properties
- The GIV estimator is consistent
- The asymptotic distribution of the GIV estimator, given IID(0, σε²) error terms, leads to the approximate distribution
  β̂IV ~a N(β, V̂{β̂IV})
- The (asymptotic) covariance matrix of β̂IV is given by
  V{β̂IV} = σ² (X'Z(Z'Z)⁻¹Z'X)⁻¹
- In the estimated covariance matrix, σ² is substituted by
  σ̂² = (1/N) Σi (yi - xi'β̂IV)²
  the estimate based on the IV residuals


Some Tests
- For testing
  - Endogeneity of regressors: Wu-Hausman test or Durbin-Wu-Hausman test
  - Validity (exogeneity) of potential instrumental variables: over-identifying restrictions test or Sargan test
  - Weak instruments: Cragg-Donald test

Wu-Hausman Test
- For testing whether one or more regressors are endogenous (correlated with the error term)
- Based on the assumption that the instrumental variables are valid; i.e., given that E{εi zi} = 0, the null hypothesis E{εi xi} = 0 can be tested
- The idea of the test:
  - Under the null hypothesis, both the OLS and the IV estimator are consistent; they should differ by sampling errors only
  - Rejection of the null hypothesis indicates inconsistency of the OLS estimator

Wu-Hausman Test, cont'd
- Based on the (squared) difference between the OLS and IV estimators
- Added variable interpretation of the Wu-Hausman test: checks whether the residuals vi from the reduced form equation of the potentially endogenous regressors contribute to explaining yi:
  yi = x1i'β1 + x2iβ2 + viγ + εi
  - vi: residuals from the reduced form equation for x2 (predicted values for x2: x2 - v)
  - H0: γ = 0; corresponds to: x2 is exogenous
- For testing H0: use of
  - t-test, if γ has one component, i.e., x2 is just one regressor
  - F-test, if more than one regressor is tested for exogeneity
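The added-variable form of the test amounts to two OLS regressions. The following is a minimal sketch for one potentially endogenous regressor x2 and two instruments; the simulated data and the helper name `ols_fit` are assumptions for illustration, and conventional OLS standard errors are used for the t-statistic.

```python
import numpy as np

def ols_fit(X, y):
    """Return OLS coefficients and their conventional standard errors."""
    N, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    s2 = e @ e / (N - K)
    return b, np.sqrt(s2 * np.diag(XtX_inv))

rng = np.random.default_rng(7)
N = 2_000
z1, z2, eta = rng.normal(size=(3, N))
x2 = z1 + z2 + eta                      # regressor under test
eps = 0.8 * eta + rng.normal(size=N)    # x2 is endogenous by construction
y = 1.0 + 2.0 * x2 + eps
const = np.ones(N)
Z = np.column_stack([const, z1, z2])

# Reduced form for x2: regress x2 on the instruments and keep the residuals v
pi_hat, _ = ols_fit(Z, x2)
v = x2 - Z @ pi_hat

# Augmented regression of y on (const, x2, v); t-test of H0: gamma = 0
coef, se = ols_fit(np.column_stack([const, x2, v]), y)
t_gamma = coef[2] / se[2]

# For comparison: TSLS estimate based on the fitted values of x2
X_hat = np.column_stack([const, Z @ pi_hat])
b_tsls = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)
print(t_gamma, coef[:2], b_tsls)
```

In this setup the t-statistic for γ is large, so exogeneity of x2 is rejected. As a by-product, the coefficients of the constant and of x2 in the augmented regression reproduce the TSLS estimates.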

Wu-Hausman Test, cont'd
- Remarks
  - Test requires valid instruments
  - Test has little power if instruments are weak or invalid
  - Test can be used to test whether additional instruments are valid

Sargan Test
- For testing whether the instruments are valid
- The validity of the instruments zi requires that all moment conditions are fulfilled; for the R-vector zi, the R sums
  (1/N) Σi ei zi, with the IV residuals ei = yi - xi'β̂IV,
  must be close to zero
- Test statistic
  ξ = (Σi ei zi)' (σ̂² Σi zi zi')⁻¹ (Σi ei zi)
  has under the null hypothesis an asymptotic Chi-squared distribution with R-K df
- Calculation of ξ: ξ = N Re², using Re² from the auxiliary regression of the IV residuals ei = yi - xi'β̂IV on the instruments zi
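The N·Re² calculation can be sketched as follows. The simulated data, the function names, and the use of σ̂² = (1/N) Σi ei² in the quadratic-form version are assumptions made for illustration; one data set uses valid instruments, the other one lets the instrument z2 enter the error term.

```python
import numpy as np

def tsls(y, X, Z):
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    return np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)

def sargan(y, X, Z):
    """Return (N * R^2, quadratic-form statistic, degrees of freedom R - K)."""
    N = len(y)
    e = y - X @ tsls(y, X, Z)                        # IV residuals
    e_fit = Z @ np.linalg.solve(Z.T @ Z, Z.T @ e)    # fit of regressing e on Z
    # Uncentered R^2; equals the centered R^2 here because X and Z contain a constant
    r2 = (e_fit @ e_fit) / (e @ e)
    sigma2 = e @ e / N
    Ze = Z.T @ e
    xi_quad = Ze @ np.linalg.solve(sigma2 * (Z.T @ Z), Ze)
    return N * r2, xi_quad, Z.shape[1] - X.shape[1]

rng = np.random.default_rng(8)
N = 2_000
z1, z2, eta, noise = rng.normal(size=(4, N))
x = z1 + z2 + eta
X = np.column_stack([np.ones(N), x])          # K = 2
Z = np.column_stack([np.ones(N), z1, z2])     # R = 3

eps_valid = 0.8 * eta + noise                 # uncorrelated with z1, z2
eps_invalid = eps_valid + 0.5 * z2            # z2 violates exogeneity
xi_valid, xi_valid_quad, df = sargan(1.0 + 2.0 * x + eps_valid, X, Z)
xi_invalid, _, _ = sargan(1.0 + 2.0 * x + eps_invalid, X, Z)
print(xi_valid, xi_invalid, df)
```

Both ways of computing ξ give the same number. With the invalid instrument, ξ is far above the usual critical values of the Chi-squared distribution with R - K = 1 df (3.84 at the 5% level).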

Sargan Test, cont'd
- Remarks
  - Only R-K of the R moment conditions are "free"; in the case of an identified model (R = K), all R sample moment conditions are fulfilled exactly
  - The test is also called over-identifying restrictions test
  - Rejection implies: the joint validity of all moment conditions, and hence of all instruments, is not acceptable
  - The Sargan test gives no indication of which instruments are invalid
- Test whether a subset of R-R1 instruments is valid; R1 (≥ K) instruments are beyond doubt:
  - Calculate ξ for all R moment conditions
  - Calculate ξ1 for the R1 moment conditions
  - Under H0, ξ - ξ1 has a Chi-squared distribution with R-R1 df

Cragg-Donald Test
- Weak (only marginally relevant) instruments:
  - Biased estimates
  - Inconsistent IV estimates
  - Inappropriate large-sample approximations to the finite-sample distributions even for large N
- Definition of weak instruments: estimates are biased to an extent that is unacceptably large
- Null hypothesis: instruments are weak, i.e., can lead to an asymptotic relative bias greater than some value b

Your Homework
1. Use the data set "schooling" of Verbeek for the following analyses based on the wage equation
   log(wage76) = b1 + b2 ed76 + b3 exp76 + b4 exp762
               + b5 black + b6 smsa76 + b7 south76 + b8 nearc4 + e
   a. Estimate the reduced form for ed76, including daded and momed, (i) with and (ii) without nearc4; assess the validity of the potential instruments; what do the correlation coefficients indicate?
   b. Estimate the wage equation, using the instruments age, age², daded, and momed, (i) with and (ii) without nearc4; interpret the results, including the test for validity and the Sargan test.
   c. Compare the estimates for b2 (i) from the model in b., (ii) from the model with instruments age, age², and nearc4, (iii) from the GRETL Instrumental variables (Two-Stage Least Squares …) procedure, and (iv) from OLS estimation.

Your Homework, cont'd
2. For the model for consumption and income (slides "Consumption Function" ff):
   a. Show that both Ct and Yt are endogenous:
      E{Ct εt} = E{Yt εt} = σε²(1 - β2)⁻¹
   b. Derive the reduced form of the model