Econometrics 2 - Lecture 6
Models Based on Panel Data

April 29, 2016 Hackl, Econometrics 2, Lecture 6

Contents

- Panel Data
- Pooling Independent Cross-sectional Data
- Panel Data: Pooled OLS Estimation
- Panel Data Models
- Fixed Effects Model
- Fixed Effects Model: More Estimators
- Random Effects Model
- Analysis of Panel Data Models
- Panel Data in GRETL

Example: Individual Wages

- Verbeek's data set "males"
- Sample of
  - 545 full-time working males, end of schooling in 1980
  - from each person: yearly data collection from 1980 through 1987
- Variables
  - wage: log of hourly wage (in USD)
  - school: years of schooling
  - exper: age - 6 - school
  - dummies for union membership, married, black, Hispanic, public sector
  - others

Types of Data

- Populations of interest: individuals, households, companies, countries
- Types of observations
  - Cross-sectional data: observations of all units of a population, or of a (representative) subset, at one specific point in time; e.g., wages in 1980
  - Time series data: series of observations on units of the population over a period of time; e.g., wages of a worker in 1980 through 1987
  - Panel data (longitudinal data): repeated observations of (the same) population units collected over a number of periods; a data set with both a cross-sectional and a time series aspect; multi-dimensional data
- Cross-sectional and time series data are one-dimensional special cases of panel data
- Pooled independent cross-sections are (only) similar to panel data

Data in GRETL

- Three types of data structure
- Cross-sectional data: matrix of observations, variables in the columns, each row holding the set of variables observed for one unit
- Time series data: matrix of observations, each column a time series, rows corresponding to observation periods (annual, quarterly, etc.)
- Panel data: matrix of observations with a special data structure
  - Stacked time series: each column one variable, with stacked time series corresponding to cross-sectional units
  - Stacked cross sections: each column one variable, with stacked cross sections corresponding to observation periods
  - Use of index variables: index variables defined for units and observation periods

Stacked Data: Examples

Stacked time series (index variables: unit, year)

| obs | unit | year | x1 | x2 |
|---|---|---|---|---|
| 1:1 | 1 | 2009 | 1.197 | 252 |
| 1:2 | 1 | 2010 | 1.369 | 269 |
| 1:3 | 1 | 2011 | 1.675 | 275 |
| ... | ... | ... | ... | ... |
| 2:1 | 2 | 2009 | 1.220 | 198 |
| 2:2 | 2 | 2010 | 1.397 | 212 |
| 2:3 | 2 | 2011 | 1.569 | 275 |
| ... | ... | ... | ... | ... |

Stacked cross sections

| obs | unit | year | x1 | x2 |
|---|---|---|---|---|
| 1:1 | 1 | 2009 | 1.197 | 252 |
| 2:1 | 2 | 2009 | 1.220 | 198 |
| 3:1 | 3 | 2009 | 1.173 | 167 |
| ... | ... | ... | ... | ... |
| 1:2 | 1 | 2010 | 1.369 | 269 |
| 2:2 | 2 | 2010 | 1.397 | 212 |
| 3:2 | 3 | 2010 | 1.358 | 201 |
| ... | ... | ... | ... | ... |

Panel Data Files

- Files with one record per observation
  - T records for each cross-sectional unit (individual, company, country, etc.)
  - Stacked time series or stacked cross sections
  - Allows easy differencing
  - Time-constant variable: the same value on each record
- Files with one record per unit
  - Each record contains all observations for all T periods
  - Time-constant variables are stored only once

Panel Data Files: Examples

Verbeek's data set "males", stacked time series

| unit | year | wage | school | black | … |
|---|---|---|---|---|---|
| 1 | 1980 | 1.197 | 14 | 0 | … |
| … | … | … | … | … | … |
| 1 | 1987 | 1.669 | 14 | 0 | … |
| 2 | 1980 | 1.676 | 13 | 0 | … |
| ... | ... | ... | ... | … | … |

One record per unit

| unit | wage80 | ... | wage87 | school | black | … |
|---|---|---|---|---|---|---|
| 1 | 1.197 | … | 1.669 | 14 | 0 | … |
| 2 | 1.676 | … | 1.820 | 13 | 0 | … |
| 3 | 1.516 | … | 2.873 | 12 | 1 | … |
| ... | ... | ... | … | … | … | … |

Stacked cross sections

| unit | year | wage | school | black | … |
|---|---|---|---|---|---|
| 1 | 1980 | 1.197 | 14 | 0 | … |
| … | … | … | … | … | … |
| 545 | 1980 | 1.131 | 9 | 0 | … |
| 1 | 1981 | 1.676 | 14 | 0 | … |
| ... | ... | ... | ... | … | … |
| 545 | 1981 | 1.312 | 9 | 0 | … |
| … | … | … | … | … | … |

Panel Data

- Typically data at the micro-economic level (individuals, households, firms), but also at the macro-economic level (e.g., countries)
- Notation
  - N: number of cross-sectional units
  - T: number of time periods
- Types of panel data
  - Large T, small N: "long and narrow"
  - Small T, large N: "short and wide"
  - Large T, large N: "long and wide"
- Example: data set "males" is a short (T = 8) and wide (N = 545) panel ($N \gg T$)

Panel Data: Some Examples

- Data set "males": wages and related variables
  - short and wide panel (N = 545, T = 8)
  - rich in information (~40 variables)
- Grunfeld investment data: investments in plant and equipment by
  - N = 10 firms
  - for each of T = 20 yearly observations for 1935-1954
- Penn World Table: purchasing power parity and national income accounts for
  - N = 189 countries/territories
  - for some or all of the years 1950-2011 (T ≤ 62)

Use of Panel Data

- Econometric models for describing the behaviour of cross-sectional units over time
- Panel data models
  - Allow controlling for individual differences, comparing behaviour, analysing dynamic adjustment, measuring effects of policy changes
  - More realistic models than cross-sectional and time-series models
  - Allow more detailed or sophisticated research questions
- Methodological implications
  - Dependence of sample units in the time dimension
  - Some variables might be time-constant (e.g., variable school in "males", population size in the Penn World Table data set)
  - Missing values

Example: Wages and Experience

- Data set "males"
- Independent random samples for 1980 and 1987
- N80 = N87 = 100
- Variables: wage (log of hourly wage), exper (age - 6 - years of schooling)

| | | 1980 full set | 1980 sample | 1987 full set | 1987 sample |
|---|---|---|---|---|---|
| wage | mean | 1.39 | 1.37 | 1.87 | 1.89 |
| | st.dev. | 0.558 | 0.598 | 0.467 | 0.475 |
| exper | mean | 3.01 | 2.96 | 10.02 | 9.99 |
| | st.dev. | 1.65 | 1.29 | 1.65 | 1.85 |
| exp(wage) | | 4.01 | 3.94 | 6.49 | 6.62 |

Pooling of Samples

- Independent random samples:
  - Pooling gives an independently pooled cross section
  - OLS estimates with higher precision, tests with higher power
  - Requires
    - the same distributional properties of the sampled variables
    - the same relation between the variables in the samples

Example: Wage and Experience

- Some wage equations (in the original slides, coefficients with p < 0.05 are set in bold):
  - 1980 data: wage = 1.315 + 0.026*exper, R2 = 0.006
  - 1987 data: wage = 2.441 - 0.057*exper, R2 = 0.041
  - pooled 1980 and 1987 data: wage = 1.289 + 0.052*exper, R2 = 0.128
  - pooled data with dummy d87: wage = 1.441 - 0.016*exper + 0.583*d87, R2 = 0.177
  - pooled sample with dummy d87 and interaction: wage = 1.315 + 0.026*exper + 1.126*d87 - 0.083*d87*exper
- d87: dummy for observations from 1987

Wage Equations

Dependent variable: wage (log of hourly wage). In the original slides, coefficients with p < 0.05 are set in bold.

| | | 1980 | 1987 | 80+87 | 80+87 | 80+87 |
|---|---|---|---|---|---|---|
| Interc. | coeff | 1.315 | 2.441 | 1.289 | 1.441 | 1.315 |
| | s.e. | 0.050 | 0.120 | 0.031 | 0.036 | 0.045 |
| exper | coeff | 0.026 | -0.057 | 0.052 | -0.016 | 0.026 |
| | s.e. | 0.014 | 0.012 | 0.004 | 0.009 | 0.013 |
| d87 | coeff | | | | 0.583 | 1.126 |
| | s.e. | | | | 0.073 | 0.141 |
| d87*exper | coeff | | | | | -0.083 |
| | s.e. | | | | | 0.019 |
| R2 (%) | | 0.6 | 4.1 | 12.8 | 17.7 | 19.2 |

Pooled Independent Cross-sectional Data

- Pooling of two independent cross-sectional samples:
  $y_{it} = \beta_1 + \beta_2 x_{it} + \varepsilon_{it}$ for $i = 1,\dots,N$ (units), $t = 1,2$ (time points)
- Implicit assumption: identical $\beta_1$, $\beta_2$ for $i = 1,\dots,N$, $t = 1,2$
- OLS estimation requires
  - homoskedastic and uncorrelated $\varepsilon_{it}$:
    $E\{\varepsilon_{it}\} = 0$, $\mathrm{Var}\{\varepsilon_{it}\} = \sigma^2$ for $i = 1,\dots,N$, $t = 1,2$;
    $\mathrm{Cov}\{\varepsilon_{i1}, \varepsilon_{j2}\} = 0$ for all $i, j$ with $i \neq j$
  - exogenous $x_{it}$
- For the analysis of panel data, a more realistic model is often needed, taking into consideration
  - changing coefficients
  - correlated error terms
  - endogenous regressors

Model with Time Dummy

- Model for pooled independent cross-sectional data in the presence of changes:
  - Dummy variable $d$: indicator for $t = 2$ ($d_t = 0$ for $t = 1$, $d_t = 1$ for $t = 2$)
  - $y_{it} = \beta_1 + \beta_2 x_{it} + \beta_3 d_t + \beta_4 d_t x_{it} + \varepsilon_{it}$
  - allows changes (from $t = 1$ to $t = 2$)
    - of the intercept from $\beta_1$ to $\beta_1 + \beta_3$
    - of the coefficient of $x$ from $\beta_2$ to $\beta_2 + \beta_4$
- Tests for constancy over time of (1) the intercept or (2) the intercept and the slope (cf. Chow test):
  $H_0^{(1)}: \beta_3 = 0$ or $H_0^{(2)}: \beta_3 = \beta_4 = 0$
- Testing for constancy of $\sigma^2$ over time works similarly
- Generalization to more than two time periods

Example: Wages and Experience

- Wage equation:
  $wage_{it} = \beta_1 + \beta_2\, exper_{it} + \beta_3 d_t + \varepsilon_{it}$
- Wages might also depend on other variables; omitted variables are covered by the error term
  - black: time-constant variable, omission may cause autocorrelation of the error terms; similarly other time-constant factors like hisp
  - mar (married): time-constant for some, but not all, units; similarly rural, union, ne (living in the north east), etc.; omission may cause autocorrelation
  - school: omission may cause endogeneity of exper; Corr(school, exper) = -0.34
- Unobserved and unobservable variables can have similar effects, e.g., parental background, attitudes, etc.
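The model with time dummy and interaction discussed above can be checked numerically. The following is a minimal sketch on simulated data, not on the "males" sample: the helper `ols`, the function `sample`, and all parameter values are assumptions made for illustration only. It shows that the fully interacted pooled regression reproduces the two separate per-period regressions, with period-2 intercept $\beta_1 + \beta_3$ and period-2 slope $\beta_2 + \beta_4$. The slide estimates follow the same pattern: 1.315 + 1.126 = 2.441 and 0.026 - 0.083 = -0.057.

```python
import random


def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[a] * row[b] for row in X) for b in range(k)]
         + [sum(row[a] * yi for row, yi in zip(X, y))]
         for a in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[r][k] for r in range(k)]


def sample(n, intercept, slope, mean_x):
    """Hypothetical cross section: y = intercept + slope * x + noise."""
    xs = [random.gauss(mean_x, 1.5) for _ in range(n)]
    ys = [intercept + slope * x + random.gauss(0.0, 0.5) for x in xs]
    return xs, ys


random.seed(0)
x1, y1 = sample(100, 1.3, 0.03, 3.0)     # independent sample, period t = 1
x2, y2 = sample(100, 2.4, -0.06, 10.0)   # independent sample, period t = 2

# Separate regressions per period
b_t1 = ols([[1.0, x] for x in x1], y1)
b_t2 = ols([[1.0, x] for x in x2], y2)

# Pooled regression with columns: 1, x, d, d*x  (d = 1 for period 2)
X = [[1.0, x, 0.0, 0.0] for x in x1] + [[1.0, x, 1.0, x] for x in x2]
b = ols(X, y1 + y2)

print("period 1:", b_t1, "pooled:", [b[0], b[1]])
print("period 2:", b_t2, "pooled:", [b[0] + b[2], b[1] + b[3]])
```

Dropping the interaction column (or both the dummy and the interaction) gives the restricted models whose fit is compared with this one in the Chow-type tests of $\beta_3 = 0$ or $\beta_3 = \beta_4 = 0$.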
Problems with Sample Pooling

- The analysis of the data $(y_{it}, x_{it})$, $i = 1,\dots,N$, $t = 1,2$, by OLS estimation of the parameters of the model
  $y_{it} = \beta_1 + \beta_2 x_{it} + \varepsilon_{it}$
  (or extensions based on a year dummy for $t = 2$) may not fulfil the usual requirements
- The independence assumption across time may be unrealistic
  - Main reason: effects of non-measured and non-measurable variables are only covered by the error terms
- Exogeneity of the regressors may be unrealistic
- Consequences: OLS estimates are
  - biased and inconsistent
  - not efficient
- Panel data models allow more adequate analyses

Models for Panel Data

- Model for y, based on panel data from N cross-sectional units and T periods:
  $y_{it} = \beta_0 + x_{it}'\beta_1 + \varepsilon_{it}$
  - $i = 1,\dots,N$: sample unit
  - $t = 1,\dots,T$: time period of the sample
  - $x_{it}$ and $\beta_1$: K-vectors
- $\beta_0$ and $\beta_1$: intercept and K regression coefficients; assumed to be identical for all units and all time periods
- $\varepsilon_{it}$: represents unobserved factors that may affect $y_{it}$
  - The assumption that the $\varepsilon_{it}$ are uncorrelated over time is not realistic, because they refer to the same unit or individual
  - Standard errors of the OLS estimates are misleading; OLS estimation is not efficient relative to estimators that exploit the dependence structure of $\varepsilon_{it}$ over time

Random Effects Model

- Starting point is again the model
  $y_{it} = \beta_0 + x_{it}'\beta_1 + \varepsilon_{it}$
  with composite error $\varepsilon_{it} = \alpha_i + u_{it}$
- Specification for the error terms:
  - $u_{it} \sim \mathrm{IID}(0, \sigma_u^2)$; homoskedastic, uncorrelated over time
  - $\alpha_i \sim \mathrm{IID}(0, \sigma_a^2)$; represents all unit-specific, time-constant factors; correlation of the error terms over time only via the $\alpha_i$
  - $\alpha_i$ and $u_{it}$ are assumed to be mutually independent and independent of $x_{js}$ for all $j$ and $s$; in particular, $\alpha_i$ and $x_{it}$ must be uncorrelated
- Random effects (RE) model:
  $y_{it} = \beta_0 + x_{it}'\beta_1 + \alpha_i + u_{it}$
- Unbiased and consistent ($N \to \infty$) estimation of $\beta_0$ and $\beta_1$
- Efficient estimation of $\beta_0$ and $\beta_1$ takes the error covariance structure into account: GLS estimation

Fixed Effects Model

- The general model
  $y_{it} = \beta_0 + x_{it}'\beta_1 + \varepsilon_{it}$
- Specification for the error terms: two components, $\varepsilon_{it} = \alpha_i + u_{it}$
  - $\alpha_i$: fixed, unit-specific, time-constant factors, also called unobserved (individual) heterogeneity
  - $u_{it} \sim \mathrm{IID}(0, \sigma_u^2)$; homoskedastic, uncorrelated over time; represents unobserved factors that change over time, also called idiosyncratic or time-varying error
  - $\varepsilon_{it}$: also called composite error
- Fixed effects (FE) model:
  $y_{it} = \sum_j \alpha_j d_{ij} + x_{it}'\beta_1 + u_{it}$
  - $d_{ij}$: dummy variable for unit $j$: $d_{ij} = 1$ if $i = j$, otherwise $d_{ij} = 0$
- Overall intercept $\beta_0$ omitted; unit-specific intercepts $\alpha_i$

Examples for Fixed and Random Effects

- Grunfeld investment data: investment model
  $I_{it} = \alpha_i + \beta_1 F_{it} + \beta_2 C_{it} + u_{it}$
  - with $F_{it}$: market value, $C_{it}$: value of the stock of plant and equipment, both of firm $i$ at the end of year $t-1$
  - N = 10 firms, T = 20 yearly observations
  - Fixed effects $\alpha_i$ allow for firm-specific, time-constant factors
- Wage equation
  $wage_{it} = \beta_1 + \beta_2\, exper_{it} + \beta_3\, exper^2_{it} + \beta_4\, school_{it} + \beta_5\, union_{it} + \beta_6\, mar_{it} + \beta_7\, black_{it} + \beta_8\, rural_{it} + \alpha_i + u_{it}$
  with composite error $\varepsilon_{it} = \alpha_i + u_{it}$
  - $\alpha_i$: unit-specific parameter for each of the 545 units
  - Time-constant factors $\alpha_i$: stochastic variables with identical distribution
  - Regressors are uncorrelated with $u_{it}$

Fixed Effects (FE) Model

- Model for y, based on panel data for T periods:
  $y_{it} = \alpha_i + x_{it}'\beta + u_{it}$, $u_{it} \sim \mathrm{IID}(0, \sigma_u^2)$
  - $i = 1,\dots,N$: sample unit
  - $t = 1,\dots,T$: time period of the sample
- $\alpha_i$: fixed parameter, represents all unit-specific, time-constant factors, unobserved (individual) heterogeneity
- $x_{it}$: K-vector, all K components are assumed to be independent of all $u_{it}$; strictly exogenous
- Regression model with dummies $d_{ij} = 1$ for $i = j$ and 0 otherwise:
  $y_{it} = \sum_j \alpha_j d_{ij} + x_{it}'\beta + u_{it}$
- Number of coefficients ($\alpha_1,\dots,\alpha_N$ and $\beta$): N + K
- Main interest: estimators for $\beta$

FE Model Parameters: Estimation

- FE model with dummies $d_{ij} = 1$ for $i = j$ and 0 otherwise:
  $y_{it} = \sum_j \alpha_j d_{ij} + x_{it}'\beta + u_{it}$
  Number of coefficients: N + K
- Various estimation procedures
  - Least squares dummy variable (LSDV) estimator
  - Within or fixed effects estimator
  - First-difference estimator
- A special case
  - Differences-in-differences (DD or DID or D-in-D) estimator

Least Squares Dummy Variable (LSDV) Estimator

- Estimation procedure for the N + K parameters $\beta$ and $\alpha_i$ of the FE model
  $y_{it} = \sum_j \alpha_j d_{ij} + x_{it}'\beta + u_{it}$
- OLS estimation of $\alpha_1,\dots,\alpha_N$ and $\beta$
- NT observations for estimating N + K coefficients
- Numerically costly, not attractive
- Estimates for $\alpha_i$ usually not of interest
- Fixed effects and first-difference estimators are more attractive

Example: Data Set "males"

- Panel data set
  - Number of cross-sectional units N = 545
  - Number of time periods T = 8
- Number of parameters in an FE model:
  - $\alpha_i$, $i = 1,\dots,545$: unit-specific fixed parameters
  - $\beta_k$, $k = 1,\dots,K$: coefficients of the regressors
- For the model
  $wage_{it} = \beta_1 + \beta_2\, exper_{it} + \beta_3\, exper^2_{it} + \beta_4\, school_{it} + \beta_5\, union_{it} + \beta_6\, mar_{it} + \beta_7\, black_{it} + \beta_8\, rural_{it} + \varepsilon_{it}$
  553 coefficients need to be estimated on the basis of 4360 observations

Fixed Effects Estimation

- "Within transformation": transforms $y_{it}$ into the time-demeaned $\ddot{y}_{it}$ by subtracting the average $\bar{y}_i = (\sum_t y_{it})/T$:
  $\ddot{y}_{it} = y_{it} - \bar{y}_i$;
  analogously $\ddot{x}_{it}$ and $\ddot{u}_{it}$, for $i = 1,\dots,N$, $t = 1,\dots,T$
- Subtracting from $y_{it} = \alpha_i + x_{it}'\beta + u_{it}$ the model in averages,
  $\bar{y}_i = \alpha_i + \bar{x}_i'\beta + \bar{u}_i$,
  with averages $\bar{x}_i$ and $\bar{u}_i$, gives the model in time-demeaned variables
  $\ddot{y}_{it} = \ddot{x}_{it}'\beta + \ddot{u}_{it}$
- Pooled OLS estimator $b_{FE}$ for $\beta$
- $b_{FE}$: "fixed effects estimator", also called "within estimator"
- Uses the time variation in y and x within each cross-sectional unit; explains deviations of $y_{it}$ from $\bar{y}_i$ (not of $\bar{y}_i$ from $\bar{y}_j$!)
- GRETL: Model > Panel > Fixed or random effects ...

The Fixed Effects Estimator

- FE model
  $y_{it} = \alpha_i + x_{it}'\beta + u_{it}$, $u_{it} \sim \mathrm{IID}(0, \sigma_u^2)$;
  the $x_{it}$ are assumed to be independent of all $u_{it}$
- Estimation of $\beta$ from the model in time-demeaned variables
  $\ddot{y}_{it} = \ddot{x}_{it}'\beta + \ddot{u}_{it}$
  gives
  $b_{FE} = \left(\sum_i \sum_t \ddot{x}_{it}\ddot{x}_{it}'\right)^{-1} \sum_i \sum_t \ddot{x}_{it}\ddot{y}_{it}$
- Time-demeaning differences away the time-constant factors $\alpha_i$
- Under the assumption that the $x_{it}$ are independent of all $u_{js}$, i.e., for all units and periods, $b_{FE}$ is unbiased and consistent
- $b_{FE}$ coincides with the LSDV estimator

Wage Equations

Dependent variable: wage (log of hourly wage)

| | | Pooled 80+87 | FE 80+87 | FE 80+87 | FE 80+87 | FE 80...87 |
|---|---|---|---|---|---|---|
| Interc. | coeff | 1.289 | 1.285 | 1.432 | 1.307 | 1.237 |
| | s.e. | 0.031 | 0.031 | 0.036 | 0.045 | 0.016 |
| exper | coeff | 0.052 | 0.053 | -0.013 | 0.029 | 0.063 |
| | s.e. | 0.004 | 0.004 | 0.009 | 0.013 | 0.002 |
| d87 | coeff | | | 0.564 | 1.107 | |
| | s.e. | | | 0.073 | 0.141 | |
| d87*exper | coeff | | | | -0.083 | |
| | s.e. | | | | 0.019 | |
| adjR2 (%) | | 12.8 | 13.7 | 18.1 | 19.5 | 55.6 |

Properties of Fixed Effects Estimator

$b_{FE} = \left(\sum_i \sum_t \ddot{x}_{it}\ddot{x}_{it}'\right)^{-1} \sum_i \sum_t \ddot{x}_{it}\ddot{y}_{it}$

- Unbiased if all $x_{it}$ are independent of all $u_{it}$
- Normally distributed if normality of the $u_{it}$ is assumed
- Consistent (for $N \to \infty$) if the $x_{it}$ are strictly exogenous, i.e., $E\{x_{it} u_{is}\} = 0$ for all $s$, $t$
- Asymptotically normally distributed
- Covariance matrix
  $V\{b_{FE}\} = \sigma_u^2 \left(\sum_i \sum_t \ddot{x}_{it}\ddot{x}_{it}'\right)^{-1}$
- Estimated covariance matrix: substitute $\sigma_u^2$ by
  $s_u^2 = \left(\sum_i \sum_t \tilde{u}_{it}^2\right) / [N(T-1)]$
  with the residuals $\tilde{u}_{it} = \ddot{y}_{it} - \ddot{x}_{it}' b_{FE}$
- Attention! The standard OLS estimate of the covariance matrix (from the regression in time-demeaned variables) underestimates the true values, because it divides by NT - K instead of N(T-1)

Estimator for $\alpha_i$

- Time-constant factors $\alpha_i$, $i = 1,\dots,N$
- Estimates based on the fixed effects estimator $b_{FE}$:
  $a_i = \bar{y}_i - \bar{x}_i' b_{FE}$
  with the averages over time $\bar{y}_i$ and $\bar{x}_i$ for the i-th unit
- Consistent (for $T \to \infty$) if the $x_{it}$ are strictly exogenous
- Potentially interesting aspects of the estimates $a_i$
  - Distribution of the $a_i$, $i = 1,\dots,N$
  - Value of $a_i$ for a unit $i$ of special interest

Wage Equations, 1980-1987

Dependent variable: wage (log of hourly wage)

| | F.E. | OLS |
|---|---|---|
| Intercept | 1.072 | 1.177 |
| exper | 0.118*** | 0.115*** |
| exper2 | -0.004*** | -0.006*** |
| mar | 0.047*** | 0.186*** |
| rural | 0.051* | -0.181*** |
| adjR2 (%) | 56.33 | 9.30 |

The First-Difference Estimator

Elimination of the time-constant factors $\alpha_i$ by differencing:

$\Delta y_{it} = y_{it} - y_{i,t-1} = \Delta x_{it}'\beta + \Delta u_{it}$

with $\Delta x_{it}$ and $\Delta u_{it}$ defined analogously to $\Delta y_{it} = y_{it} - y_{i,t-1}$

First-difference estimator: OLS estimation

$b_{FD} = \left(\sum_i \sum_t \Delta x_{it}\Delta x_{it}'\right)^{-1} \sum_i \sum_t \Delta x_{it}\Delta y_{it}$

Properties

- Consistent (for $N \to \infty$) under slightly weaker conditions than $b_{FE}$
- Slightly less efficient than $b_{FE}$ due to the serial correlation of the $\Delta u_{it}$
- For T = 2, $b_{FD}$ and $b_{FE}$ coincide

Wage Differences 1980 - 1987

- Effect of ethnicity
- wage (log of hourly wage): from 1.419 (1980) to 1.892 (1987)
- i.e., an increase of the hourly wage from USD 4.13 (1980) to USD 6.63 (1987), i.e., by 60.5%
- Does the wage increase depend on ethnicity?
- Dummy $black_i = 1$ if the i-th person is African-American, $black_i = 0$ otherwise
- Model for wage:
  $wage_{it} = \mu_t + \alpha_i + u_{it}$, $i = 1,\dots,N$, $t = 1980, 1987$
- $\alpha_i$: time-constant factors, e.g., schooling, rural, industry, etc.
- Model for the differences with $\mu_0 = \mu_{1987} - \mu_{1980}$, extended by the term $\delta\, black_i$ so that the increase may differ by ethnicity:
  $\Delta wage_i = \mu_0 + \delta\, black_i + \Delta u_i$
- At least the intercept changes from 1980 to 1987

Wage Differences, cont'd

- Increase of wage (log of hourly wage):
  $\Delta wage_i = \mu_0 + \delta\, black_i + \Delta u_i$
- OLS estimation gives (N = 545, 63 African-Americans)

| | $\mu_0$ | $\delta$ | adj R2 |
|---|---|---|---|
| Estimate | 0.491 | -0.154 | 0.47 |
| Std.err. | 0.027 | 0.081 | |

- Increase in wage (log of hourly wage) and in hourly wages

| | black = 0 ($\mu_0$) | black = 1 ($\mu_0 + \delta$) | all |
|---|---|---|---|
| Increase in wage (average) | 0.491 | 0.337 | 0.473 |
| Ratio of hourly wages | 1.634 | 1.401 | 1.605 |
| Increase of hourly wages (%) | 63.4 | 40.1 | 60.5 |

Differences-in-Differences Estimator

Natural experiment or quasi-experiment:

- Exogenous event or treatment, e.g., a training, a new law, a change in operating conditions
- Treatment group, control group
- Assignment to the groups is not random (unlike in a true experiment)
- Data: before treatment, after treatment

Assessment of the treatment based on the response variable y

- Compare y of the treatment group with y of the control group
- Compare y before and after treatment
- Panel data allow both comparisons at once

Differences-in-Differences Estimator, cont'd

Model for the response $y_{it}$ of unit $i$ ($= 1,\dots,N$) before ($t = 1$) and after ($t = 2$) the treatment:

$y_{it} = \delta r_{it} + \mu_t + \alpha_i + u_{it}$

- dummy $r_{it} = 1$ if the i-th unit receives the treatment in $t$, $r_{it} = 0$ otherwise
- $\delta$: treatment effect, the parameter in focus
- $\alpha_i$: time-constant factors of the i-th unit
- $\mu_t$: time-specific fixed effects

Model in differences (differencing away the time-constant factors), with $r_i = r_{i2}$ and $r_{i1} = 0$:

$\Delta y_i = y_{i2} - y_{i1} = \delta r_i + \mu_0 + v_i$

with

- $v_i = u_{i2} - u_{i1}$: error term
- $\mu_0 = \mu_2 - \mu_1$: the change in the time-specific fixed effects

Estimator of Treatment Effect

Effect of the treatment (event) by comparing units

- with and without treatment
- before and after treatment

Model for the panel data $y_{it}$:

$y_{it} = \delta r_{it} + \mu_t + \alpha_i + u_{it}$, $i = 1,\dots,N$, $t = 1$ (before), 2 (after event)

Differences-in-differences (DD or DID or D-in-D) estimator of the treatment effect $\delta$:

$d_{DD} = \Delta\bar{y}_{treated} - \Delta\bar{y}_{untreated}$

- $\Delta\bar{y}_{treated}$: average difference $y_{i2} - y_{i1}$ of the treatment group units
- $\Delta\bar{y}_{untreated}$: average difference $y_{i2} - y_{i1}$ of the control group units
- The treatment effect $\delta$ is measured as the difference between the changes of y with and without treatment
- Allows for correlation between the time-constant factors $\alpha_i$ and $r_{it}$

Random Effects Model

Model

$y_{it} = \beta_0 + x_{it}'\beta + \alpha_i + u_{it}$, $u_{it} \sim \mathrm{IID}(0, \sigma_u^2)$

- Time-constant factors $\alpha_i$: stochastic variables, independently and identically distributed over all units, $\alpha_i \sim \mathrm{IID}(0, \sigma_a^2)$; they induce correlation of the composite error over time
- Attention! More information about $\alpha_i$ is assumed than in the fixed effects model
- $\alpha_i + u_{it}$: error term with two components
  - Unit-specific component $\alpha_i$, time-constant
  - Remainder $u_{it}$, assumed to be uncorrelated over time
- $\alpha_i$, $u_{it}$: mutually independent, independent of $x_{js}$ for all $j$ and $s$
- OLS estimators for $\beta_0$ and $\beta$ are unbiased and consistent, but not efficient (see next slide)

Remember the GLS Estimator

Model

$y = X\beta + \varepsilon$ with $E\{\varepsilon|X\} = 0$, $V\{\varepsilon|X\} = \sigma^2\Psi$

GLS estimator

$b_{GLS} = (X'\Psi^{-1}X)^{-1} X'\Psi^{-1}y$ with $V\{b_{GLS}\} = \sigma^2 (X'\Psi^{-1}X)^{-1}$

GLS Estimator

- $\alpha_i \iota_T + u_i$: T-vector of the error terms for the i-th unit, with the T-vector $\iota_T = (1,\dots,1)'$
- $\Omega = \mathrm{Var}\{\alpha_i \iota_T + u_i\}$: covariance matrix of $\alpha_i \iota_T + u_i$,
  $\Omega = \sigma_a^2 \iota_T\iota_T' + \sigma_u^2 I_T$
- Inverted covariance matrix for the data from the i-th unit:
  $\Omega^{-1} = \sigma_u^{-2}\left[I_T - \frac{\sigma_a^2}{\sigma_u^2 + T\sigma_a^2}\,\iota_T\iota_T'\right] = \sigma_u^{-2}\left[\left(I_T - \tfrac{1}{T}\iota_T\iota_T'\right) + \psi\,\tfrac{1}{T}\iota_T\iota_T'\right]$
  with $\psi = \sigma_u^2/(\sigma_u^2 + T\sigma_a^2)$
- $\iota_T\iota_T'/T$: transforms into averages; e.g., $(\iota_T\iota_T')(y_{i1},\dots,y_{iT})'/T = \bar{y}_i \iota_T$
- $I_T - \iota_T\iota_T'/T$: transforms into deviations from the average

GLS estimator

$b_{GLS} = \left[\sum_i\sum_t \ddot{x}_{it}\ddot{x}_{it}' + \psi T \sum_i (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})'\right]^{-1} \left[\sum_i\sum_t \ddot{x}_{it}\ddot{y}_{it} + \psi T \sum_i (\bar{x}_i - \bar{x})(\bar{y}_i - \bar{y})\right]$

with

- deviations from the average $\ddot{y}_{it} = y_{it} - \bar{y}_i$, analogously $\ddot{x}_{it}$
- averages $\bar{y}_i$ over all $t$, analogously $\bar{x}_i$
- average $\bar{y}$ over all $t$ and $i$, analogously $\bar{x}$

GLS Estimator, cont'd

For the GLS estimator $b_{GLS}$ given above, with the average $\bar{y}$ over all $i$ and $t$ and $\bar{x}$ analogously:

- $\psi = 0$: $b_{GLS}$ coincides with $b_{FE} = \left(\sum_i\sum_t \ddot{x}_{it}\ddot{x}_{it}'\right)^{-1}\sum_i\sum_t \ddot{x}_{it}\ddot{y}_{it}$
- for growing T, $\psi \to 0$: $b_{GLS}$ and $b_{FE}$ are equivalent for large T
- $\psi = 1$ ($\sigma_a^2 = 0$): $b_{GLS}$ coincides with the OLS estimators for $\beta_0$ and $\beta$

Between Estimator

Model for the individual means $\bar{y}_i$ and $\bar{x}_i$:

$\bar{y}_i = \beta_0 + \bar{x}_i'\beta + \alpha_i + \bar{u}_i$, $i = 1,\dots,N$

The OLS estimator

$b_B = \left[\sum_i (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})'\right]^{-1} \sum_i (\bar{x}_i - \bar{x})(\bar{y}_i - \bar{y})$

is called the between estimator

- Consistent if the $x_{it}$ are strictly exogenous and uncorrelated with $\alpha_i$
- Describes the relation between the units, discarding the time series information in the data
- The variance of the regression error terms $\alpha_i + \bar{u}_i$ is $\sigma_B^2 = \sigma_a^2 + (1/T)\sigma_u^2$

GLS Estimator: A Linear Combination

The GLS estimator $b_{GLS}$ can be written as

$b_{GLS} = \Delta b_B + (I_K - \Delta) b_{FE}$

i.e., as a matrix-weighted average of the between estimator $b_B$ and the within estimator $b_{FE}$

- $\Delta$: (K x K) weighting matrix, proportional to the inverse of $\mathrm{Var}\{b_B\}$
  - The more accurate $b_B$, the more weight $b_B$ has in $b_{GLS}$
  - $b_{GLS}$: optimal combination of $b_B$ and $b_{FE}$, more efficient than either $b_B$ or $b_{FE}$

GLS Estimator: Properties

- Unbiased if the $x_{it}$ are independent of all $\alpha_i$ and $u_{it}$
- Consistent for N or T or both tending to infinity if
  - $E\{\bar{x}_i \alpha_i\} = 0$
  - $E\{\ddot{x}_{it} u_{it}\} = 0$, $E\{\bar{x}_i u_{it}\} = 0$
  - These conditions are also required for the consistency of $b_B$
- More efficient than the between estimator $b_B$ and the within estimator $b_{FE}$; also more efficient than the OLS estimator
- OLS estimator: also a linear combination of the between estimator $b_B$ and the within estimator $b_{FE}$, but not the efficient one

Random Effects Estimator

Calculation of $b_{GLS}$ from the transformed model

$y_{it} - \vartheta\bar{y}_i = \beta_0(1 - \vartheta) + (x_{it} - \vartheta\bar{x}_i)'\beta + v_{it}$

with $\vartheta = 1 - \psi^{1/2}$, $\psi = \sigma_u^2/(\sigma_u^2 + T\sigma_a^2)$

- quasi-demeaned $y_{it} - \vartheta\bar{y}_i$ and $x_{it} - \vartheta\bar{x}_i$
- $v_{it} \sim \mathrm{IID}(0, \sigma_v^2)$ over units and time

With estimated variance components: feasible GLS or EGLS or Balestra-Nerlove estimator

Balestra-Nerlove Estimator

The model

$y_{it} - \vartheta\bar{y}_i = \beta_0(1 - \vartheta) + (x_{it} - \vartheta\bar{x}_i)'\beta + v_{it}$, $v_{it} \sim \mathrm{IID}(0, \sigma_v^2)$

with $\vartheta = 1 - \psi^{1/2}$ fulfils the Gauss-Markov conditions

Two-step estimator:

1. Step 1: the transformation parameter $\psi$ is calculated from (method by Swamy & Arora)
   - within estimation: $s_u^2 = \left(\sum_i\sum_t \tilde{u}_{it}^2\right)/[N(T-1)]$
   - between estimation: $s_B^2 = (1/N)\sum_i (\bar{y}_i - b_{0B} - \bar{x}_i' b_B)^2$, which estimates $\sigma_a^2 + (1/T)\sigma_u^2$
   - $s_a^2 = s_B^2 - (1/T)s_u^2$
2. Step 2:
   - Calculation of $d = 1 - [s_u^2/(s_u^2 + T s_a^2)]^{1/2}$ as the estimate of the parameter $\vartheta$
   - Transformation of $y_{it}$ and $x_{it}$ into $y_{it} - d\bar{y}_i$ and $x_{it} - d\bar{x}_i$
   - OLS estimation gives the random effects estimator $b_{RE}$ for $\beta$

Random Effects Estimator $b_{RE}$: Properties

EGLS estimator of $\beta$ from

$y_{it} - \vartheta\bar{y}_i = \beta_0(1 - \vartheta) + (x_{it} - \vartheta\bar{x}_i)'\beta + v_{it}$

- Covariance matrix
  $\mathrm{Var}\{b_{RE}\} = \sigma_u^2\left[\sum_i\sum_t \ddot{x}_{it}\ddot{x}_{it}' + \psi T\sum_i(\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})'\right]^{-1}$
- More efficient than the within estimator $b_{FE}$ (if $\psi > 0$)
- Asymptotically normally distributed under weak conditions

Wage Equations, 1980-1987

Dependent variable: wage (log of hourly wage)

| | Between | Fixed Effects | Random Effects | Pooled OLS |
|---|---|---|---|---|
| Intercept | 0.511 | 1.053 | -0.079 | 0.049 |
| school | 0.089*** | -- | 0.100*** | 0.095*** |
| exper | -0.032 | 0.118*** | 0.111*** | 0.087*** |
| exper2 | 0.004 | -0.004*** | -0.004*** | -0.003*** |
| union | 0.262*** | 0.082*** | 0.109*** | 0.179*** |
| mar | 0.184*** | 0.045** | 0.064*** | 0.126*** |
| black | -0.141*** | -- | -0.149*** | -0.150*** |
| rural | 0.188*** | 0.049* | -0.026 | -0.138*** |
| adjR2 (%) | 23.7 | 56.5 | -- | 19.6 |
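The relations between the within, between, OLS and GLS estimators stated in the slides above can be illustrated for a single regressor (K = 1), where all matrices reduce to scalars. The following is a minimal sketch on a simulated balanced panel; the data-generating values, the helper names (`b_gls`, `weight`) and the choice of $\psi = 0.3$ are assumptions for illustration, not part of the lecture. It checks that $b_{GLS}$ equals $b_{FE}$ for $\psi = 0$, equals the pooled OLS slope for $\psi = 1$, and is a weighted average of $b_B$ and $b_{FE}$ in between.

```python
import random

random.seed(1)
N, T = 50, 4

# Hypothetical balanced panel: y_it = 1 + 0.5 * x_it + alpha_i + u_it,
# with alpha_i drawn independently of x (the random effects assumption).
x, y = [], []
for i in range(N):
    alpha_i = random.gauss(0.0, 1.0)
    level_i = random.gauss(0.0, 1.0)          # creates between-unit variation in x
    xi = [level_i + random.gauss(0.0, 1.0) for _ in range(T)]
    yi = [1.0 + 0.5 * v + alpha_i + random.gauss(0.0, 0.5) for v in xi]
    x.append(xi)
    y.append(yi)


def mean(values):
    return sum(values) / len(values)


xbar_i = [mean(r) for r in x]                 # unit averages over time
ybar_i = [mean(r) for r in y]
xbar, ybar = mean(xbar_i), mean(ybar_i)       # overall averages (balanced panel)

# Within (time-demeaned) and between sums of squares and cross products
Wxx = sum((x[i][t] - xbar_i[i]) ** 2 for i in range(N) for t in range(T))
Wxy = sum((x[i][t] - xbar_i[i]) * (y[i][t] - ybar_i[i])
          for i in range(N) for t in range(T))
Bxx = sum((a - xbar) ** 2 for a in xbar_i)
Bxy = sum((a - xbar) * (b - ybar) for a, b in zip(xbar_i, ybar_i))

b_fe = Wxy / Wxx                              # within (fixed effects) estimator
b_b = Bxy / Bxx                               # between estimator

# Pooled OLS slope from overall deviations
Txx = sum((x[i][t] - xbar) ** 2 for i in range(N) for t in range(T))
Txy = sum((x[i][t] - xbar) * (y[i][t] - ybar)
          for i in range(N) for t in range(T))
b_ols = Txy / Txx


def b_gls(psi):
    """Scalar version of the GLS formula for a given psi."""
    return (Wxy + psi * T * Bxy) / (Wxx + psi * T * Bxx)


def weight(psi):
    """Weight of the between estimator in b_gls(psi)."""
    return psi * T * Bxx / (Wxx + psi * T * Bxx)


d = weight(0.3)
print("b_FE =", b_fe, " b_B =", b_b, " b_OLS =", b_ols)
print("b_GLS(0.3) =", b_gls(0.3), " weight on b_B =", d)
```

In practice $\psi$ is unknown and is replaced by an estimate, as in the two-step procedure described above; the sketch treats it as given to keep the algebra visible.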
Summary of Estimators April 29, 2016 Hackl, Econometrics 2, Lecture 6 57 nBetween estimator nFixed effects (within) estimator nCombined estimators qOLS estimator qRandom effects (EGLS) estimator nFirst-difference estimator q Estimator Consistent, if Between bB xit strictly exog, xit and αi uncorr Fixed effects bFE xit strictly exog OLS b xit and αi uncorr, xit and uit contemp. uncorr Random effects bRE conditions for bB and bFE are met First-difference bFD E{xit – xi,t-1,uit – ui,t-1} = 0 Fixed Effects or Random Effects? April 29, 2016 Hackl, Econometrics 2, Lecture 6 58 Random effects model E{yit | xit} = xit'β nLarge values N; of interest: population characteristics (β), not characteristics of individual units (αi) nMore efficient estimation of β, given adequate specification of the time-constant model characteristics Fixed effects model E{yit | xit,αi} = xit'β + αi nOf interest: besides population characteristics (β), also characteristics of individual units (αi), e.g., of countries or companies; rather small values N nLarge values of N, if xit and αi correlated: estimator bFE are consistent n Diagnostic Tools April 29, 2016 Hackl, Econometrics 2, Lecture 6 59 nTest of common intercept of all units qApplied to pooled OLS estimation: Rejection indicates preference for fixed or random effects model qApplied to fixed effects estimation: Non-rejection indicates preference for pooled OLS estimation nHausman test (of correlation between xit and αi); H0: xit and αi are uncorrelated qNull-hypothesis implies that GLS estimates are consistent qRejection indicates preference for fixed effects model nTest of non-constant variance σa2, Breusch-Pagan test; H0: σa2 = 0 pRejection indicates preference for fixed or random effects model pNon-rejection indicates preference for pooled OLS estimation n Hausman Test April 29, 2016 Hackl, Econometrics 2, Lecture 6 60 Tests of correlation between xit and αi H0: xit and αi are uncorrelated Test statistic: ξH = (bFE - bRE)' [Ṽ{bFE} - 
Ṽ{bRE}]-1 (bFE - bRE) with estimated covariance matrices Ṽ{bFE} and Ṽ{bRE} nbRE: consistent if xit and αi are uncorrelated nbFE: consistent also if xit and αi are correlated Under H0: plim(bFE - bRE) = 0 nξH asymptotically chi-squared distributed with K d.f. nK: dimension of xit and β Hausman test may indicate also other types of misspecification q Robust Inference April 29, 2016 Hackl, Econometrics 2, Lecture 6 61 Consequences of heteroskedasticity and autocorrelation of the error terms: nStandard errors and related tests are incorrect nInefficiency of estimators Robust covariance matrix for estimator b of β from yit = xit'β + εit b = (ΣiΣt xitxit')-1 ΣiΣt xityit nAdjustment of covariance matrix similar to Newey-West: assuming uncorrelated error terms for different units (E{εit εjs} = 0 for all i ≠ j) V{b} = (ΣiΣt xitxit')-1 ΣiΣtΣs eiteis xitxis' (ΣiΣt xitxit')-1 eit: OLS residuals nCorrects for heteroskedasticity and autocorrelation within units nCalled panel-robust estimate of the covariance matrix Analogous variants of the Newey-West estimator for robust covariance matrices of random effects and fixed effects estimators Testing for Autocorrelation and Heteroskedasticity April 29, 2016 Hackl, Econometrics 2, Lecture 6 62 Tests for heteroskedasticity and autocorrelation in random effects model error terms nComputationally cumbersome Tests based on fixed effects model residuals nEasier to conduct nApplicable for testing in both fixed and random effects case Test for Autocorrelation April 29, 2016 Hackl, Econometrics 2, Lecture 6 63 Durbin-Watson test for autocorrelation in the fixed effects model nError term uit = ρui,t-1 + vit qSame autocorrelation coefficient ρ for all units qvit iid across time and units nTest of H0: ρ = 0 against ρ > 0 nAdaptation of Durbin-Watson statistic n nTables with critical limits dU and dL for K, T, and N; e.g., Verbeek’s Table 10.1 Test for Heteroskedasticity April 29, 2016 Hackl, Econometrics 2, Lecture 6 64 Breusch-Pagan test for 
heteroskedasticity of fixed effects model error terms
- $V\{u_{it}\} = \sigma^2 h(z_{it}'\gamma)$; unknown function h(.) with h(0) = 1; $z_{it}$ is a J-vector
- $H_0$: $\gamma = 0$, i.e., homoskedastic $u_{it}$
- Auxiliary regression of the squared residuals on an intercept and the regressors $z_{it}$
- Test statistic: N(T-1) times $R^2$ of the auxiliary regression
- Chi-squared distribution with J d.f. under $H_0$

Wage Equations, 1980-1987

Fixed effects estimation, standard and HAC standard errors
q: ratio of HAC s.e. to s.e.

             Coeff.    s.e.      HAC s.e.   q
Intercept     1.053    0.0276    0.0384     1.39
exper         0.118    0.0084    0.0108     1.29
exper2       -0.004    0.0006    0.0007     1.17
union         0.082    0.0193    0.0227     1.18
mar           0.045    0.0183    0.0210     1.15
rural         0.049    0.0290    0.0391     1.35

Goodness-of-Fit

Goodness-of-fit measures for panel data models differ from the measures for OLS-estimated regression models
- Focus may be on the within or the between variation in the data
- The usual $R^2$ measure relates to OLS-estimated models
Definition of goodness-of-fit measures: squared correlation coefficients between actual and fitted values
- $R^2_{within}$: squared correlation between actual and fitted $y_{it}$, both in deviations from their individual means; maximized by the within estimator
- $R^2_{between}$: based upon the individual averages of actual and fitted $y_{it}$; maximized by the between estimator
- $R^2_{overall}$: squared correlation between actual and fitted $y_{it}$; maximized by OLS
Corresponds to the decomposition
$\frac{1}{TN} \sum_i \sum_t (y_{it} - \bar{y})^2 = \frac{1}{TN} \sum_i \sum_t (y_{it} - \bar{y}_i)^2 + \frac{1}{N} \sum_i (\bar{y}_i - \bar{y})^2$

Goodness-of-Fit, cont'd

Fixed effects estimator $b_{FE}$
- Explains the within variation
- Maximizes $R^2_{within}$
$R^2_{within}(b_{FE}) = corr^2\{\hat{y}_{it}^{FE} - \hat{y}_{i}^{FE}, y_{it} - \bar{y}_i\}$
Between estimator $b_B$
- Explains the between variation
- Maximizes $R^2_{between}$
$R^2_{between}(b_B) = corr^2\{\hat{y}_{i}^{B}, \bar{y}_i\}$

Wage Equations, 1980-1987

Dependent variable: wage (log of hourly wage)

                  Between     F.E.        R.E.        OLS
Intercept          0.511       1.053      -0.079       0.049
school             0.089***    --          0.100***    0.095***
exper             -0.032       0.118***    0.111***    0.087***
exper2             0.004      -0.004***   -0.004***   -0.003***
union              0.262***    0.082***    0.109***    0.179***
mar                0.184***    0.045**     0.064***    0.126***
black             -0.141***    --         -0.149***   -0.150***
rural              0.188***    0.049*     -0.026      -0.138***
overall R2 (%)    16.07        5.66       18.42       19.70

Extensions of Panel Data Models

Dynamic linear models
$y_{it} = x_{it}'\beta + \gamma y_{i,t-1} + \alpha_i + u_{it}$, $u_{it} \sim IID(0, \sigma_u^2)$
- Fixed or random effects $\alpha_i$
- Complication due to the dependence between $y_{i,t-1}$ and $\alpha_i$
- GMM estimation
Unit roots and cointegration
- Panel data unit root tests
- Panel data cointegration tests
Models for limited dependent variables
- Binary choice models
- Tobit models
Incomplete panels, pseudo panels

Panel Data and GRETL

Estimation of panel models
Pooled OLS
- Model > Ordinary Least Squares …
- Special diagnostics in the output window: Tests > Panel diagnostics
Fixed and random effects models
- Model > Panel > Fixed or random effects…
- The output provides diagnostic tests
  - Fixed effects model: test for a common intercept of all units
  - Random effects model: Breusch-Pagan test, Hausman test
Further estimation procedures
- Between estimator
- Dynamic panel models
- Instrumental variable panel procedure

Your Homework

1. Use Verbeek's data set MALES, which contains panel data for 545 full-time working males over the period 1980-1987.
Estimate a wage equation that explains the individual log wages by years of schooling, years of experience and its square, and dummy variables for union membership, being married, being black, and working in the public sector. Use (i) pooled OLS, (ii) the between estimator, (iii) the within estimator, and (iv) the random effects estimator. Compare the resulting models.
2.
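
As a warm-up for item 1 of the homework, the estimators summarized in this lecture can be compared on simulated data. The following is only a minimal sketch under stated assumptions, not the GRETL procedure and not the MALES data: the panel is artificial (N = 200, T = 5, one regressor, true slope 2), the regressor is deliberately correlated with the individual effect, and the helper names `ols` and `transformed` are hypothetical. With a single regressor (K = 1) all covariance matrices are scalars, so the Hausman statistic reduces to a squared difference divided by a variance difference, to be compared with the chi-squared critical value for 1 d.f. (3.84 at the 5% level).

```python
import math
import random
from statistics import mean

random.seed(0)
N, T, BETA = 200, 5, 2.0   # units, periods, true slope (all assumed values)

# Artificial panel: x_it is correlated with the individual effect alpha_i.
panel = []                 # one (x values, y values) pair per unit
for _ in range(N):
    alpha = random.gauss(0, 1)
    xs = [0.8 * alpha + random.gauss(0, 1) for _ in range(T)]
    ys = [1.0 + BETA * x + alpha + random.gauss(0, 1) for x in xs]
    panel.append((xs, ys))

def ols(x, y):
    """Regression of y on x with intercept: slope, centered S_xx, residual SS."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    rss = sum(((b - my) - slope * (a - mx)) ** 2 for a, b in zip(x, y))
    return slope, sxx, rss

def transformed(theta):
    """Stack x_it - theta * xbar_i and y_it - theta * ybar_i over all units."""
    x, y = [], []
    for xs, ys in panel:
        mx, my = mean(xs), mean(ys)
        x += [a - theta * mx for a in xs]
        y += [b - theta * my for b in ys]
    return x, y

# theta = 0 gives pooled OLS, theta = 1 gives the within (fixed effects) estimator.
b_ols, _, _ = ols(*transformed(0.0))
b_fe, sxx_fe, rss_fe = ols(*transformed(1.0))

# Between estimator: regression on the individual means.
b_b, _, rss_b = ols([mean(xs) for xs, _ in panel],
                    [mean(ys) for _, ys in panel])

# First-difference estimator: regression of differences, no intercept.
dx = [xs[t] - xs[t - 1] for xs, _ in panel for t in range(1, T)]
dy = [ys[t] - ys[t - 1] for _, ys in panel for t in range(1, T)]
b_fd = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)

# Variance components from the within and between residuals, then
# random effects (EGLS) via quasi-demeaning with the estimated theta.
s2_u = rss_fe / (N * (T - 1) - 1)
s2_b = rss_b / (N - 2)
s2_a = max(s2_b - s2_u / T, 0.0)
theta = 1.0 - math.sqrt(s2_u / (s2_u + T * s2_a))
b_re, sxx_re, _ = ols(*transformed(theta))

# Hausman statistic for K = 1 (scalar version of the quadratic form).
v_fe, v_re = s2_u / sxx_fe, s2_u / sxx_re
xi_h = (b_fe - b_re) ** 2 / (v_fe - v_re)

for name, value in [("between", b_b), ("fixed effects", b_fe), ("OLS", b_ols),
                    ("random effects", b_re), ("first difference", b_fd)]:
    print(f"{name:17s} {value:.3f}")
print(f"Hausman statistic {xi_h:.1f} (1 d.f.)")
```

Because the simulated regressor is correlated with the individual effect, only the within and first-difference estimators should be close to the true slope, and the Hausman statistic should lie far above the critical value. Setting the factor 0.8 to zero removes the correlation, so that all five estimators become consistent.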