Econometrics - Lecture 3
Regression Models: Interpretation and Comparison
Hackl, Econometrics, Lecture 3, Nov 5, 2010

Contents
- The Linear Model: Interpretation
- Selection of Regressors
- Specification of the Functional Form

Economic Models
- Describe economic relationships (not only a set of observations) and have an economic interpretation
- Linear regression model:
  $y_i = \beta_1 + \beta_2 x_{i2} + \dots + \beta_K x_{iK} + \varepsilon_i = x_i'\beta + \varepsilon_i$
- Variables $Y, X_2, \dots, X_K$: observable
- Observations: $y_i, x_{i2}, \dots, x_{iK}$, $i = 1, \dots, N$
- Error term $\varepsilon_i$ (disturbance term): contains all influences that are not included explicitly in the model; unobservable
- Assumption (A1), i.e., $E\{\varepsilon_i \mid X\} = 0$ or $E\{\varepsilon_i \mid x_i\} = 0$, gives
  - $E\{y_i \mid x_i\} = x_i'\beta$
  - the model describes the expected value of $y_i$ given $x_i$

Example
- Wage equation
  $\text{wage}_i = \beta_1 + \beta_2\,\text{male}_i + \beta_3\,\text{school}_i + \beta_4\,\text{exper}_i + \varepsilon_i$
- Answers questions like:
  - What is the expected hourly wage of a female with 12 years of education and 10 years of experience?
- Wage equation fitted to all 3294 observations:
  $\text{wage}_i = -3.38 + 1.34\,\text{male}_i + 0.64\,\text{school}_i + 0.12\,\text{exper}_i$
  - Expected hourly wage of a female with 12 years of education and 10 years of experience: 5.50 USD

Regression Coefficients
- Linear regression model:
  $y_i = \beta_1 + \beta_2 x_{i2} + \dots + \beta_K x_{iK} + \varepsilon_i = x_i'\beta + \varepsilon_i$
- Coefficient $\beta_k$ measures the change of the expected value of Y if $X_k$ changes by one unit:
  $\Delta E\{y_i \mid x_i\} / \Delta x_{ik} = \beta_k$ for $\Delta x_{ik} = 1$
- For continuous regressors:
  $\partial E\{y_i \mid x_i\} / \partial x_{ik} = \beta_k$
  is the marginal effect of changing $X_k$ on Y
- Ceteris paribus condition: measuring the effect on Y of a one-unit change in $X_k$ by $\beta_k$ implies
  - knowledge of which other $X_j$, $j \neq k$, are in the model
  - that all other $X_j$, $j \neq k$, remain unchanged

Example
- Wage equation
  $\text{wage}_i = \beta_1 + \beta_2\,\text{male}_i + \beta_3\,\text{school}_i + \beta_4\,\text{exper}_i + \varepsilon_i$
  - $\beta_3$ measures the impact of one additional year at school upon a person's wage, keeping gender and years of experience fixed
- Wage equation fitted to all 3294 observations:
  $\text{wage}_i = -3.38 + 1.34\,\text{male}_i + 0.64\,\text{school}_i + 0.12\,\text{exper}_i$
- One extra year at school, e.g., at the university, raises the hourly wage by 64 cents; a 4-year study raises it by 2.56 USD
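The numbers quoted for the fitted wage equation can be checked by evaluating it directly. The following is a minimal sketch; the coefficients are those of the fitted equation above, while the dictionary `COEF` and the function name `expected_wage` are illustrative choices, not part of the lecture.

```python
# Coefficients of the fitted wage equation (3294 observations), as quoted above.
COEF = {"const": -3.38, "male": 1.34, "school": 0.64, "exper": 0.12}


def expected_wage(male, school, exper):
    """Expected hourly wage (USD) implied by the fitted linear wage equation."""
    return (COEF["const"] + COEF["male"] * male
            + COEF["school"] * school + COEF["exper"] * exper)


# Female (male = 0) with 12 years of schooling and 10 years of experience.
base = expected_wage(male=0, school=12, exper=10)
# Ceteris paribus changes: only schooling varies, gender and experience are fixed.
plus_one = expected_wage(male=0, school=13, exper=10)
plus_four = expected_wage(male=0, school=16, exper=10)

print(f"{base:.2f}")              # 5.50
print(f"{plus_one - base:.2f}")   # 0.64
print(f"{plus_four - base:.2f}")  # 2.56
```

Because the model is linear, the differences do not depend on the chosen values of gender and experience; they only require that these values are the same in both evaluations.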
- These effects apply to people who are otherwise identical (same gender, same experience)

Regression Coefficients, cont'd
- The marginal effect of a changing regressor may depend on other variables
- Example: wage equation
  $\text{wage}_i = \beta_1 + \beta_2\,\text{male}_i + \beta_3\,\text{age}_i + \beta_4\,\text{age}_i^2 + \varepsilon_i$
  - the impact of changing age depends on age:
    $\partial E\{\text{wage}_i \mid x_i\} / \partial \text{age}_i = \beta_3 + 2\beta_4\,\text{age}_i$
- The wage equation may contain $\beta_3\,\text{age}_i + \beta_4\,\text{age}_i\,\text{male}_i$: the marginal effect of age then depends upon gender

Elasticities
- Elasticity: measures the relative change in the dependent variable Y due to a relative change in $X_k$
- For a linear regression, the elasticity of Y with respect to $X_k$ is
  $\frac{\partial E\{y_i \mid x_i\}}{\partial x_{ik}} \cdot \frac{x_{ik}}{E\{y_i \mid x_i\}} = \frac{x_{ik}\beta_k}{x_i'\beta}$
- For a loglinear model
  $\log y_i = (\log x_i)'\beta + \varepsilon_i$ with $(\log x_i)' = (1, \log x_{i2}, \dots, \log x_{iK})$
  the elasticities are the coefficients $\beta$

Elasticities, cont'd
- This follows from
  $E\{\log y_i \mid x_i\} = (\log x_i)'\beta$
  and
  $\frac{\partial E\{y_i \mid x_i\}}{\partial x_{ik}} \cdot \frac{x_{ik}}{E\{y_i \mid x_i\}} \approx \frac{\partial E\{\log y_i \mid x_i\}}{\partial \log x_{ik}} = \beta_k$

Semi-Elasticities
- Semi-elasticity: measures the relative change in the dependent variable Y due to a one-unit change in $X_k$
- For the regression
  $\log y_i = x_i'\beta + \varepsilon_i$
  the semi-elasticity of Y with respect to $X_k$ is
  $\frac{\partial E\{y_i \mid x_i\}}{\partial x_{ik}} \cdot \frac{1}{E\{y_i \mid x_i\}} \approx \frac{\partial E\{\log y_i \mid x_i\}}{\partial x_{ik}} = \beta_k$
- $\beta_k$ measures the relative change in Y due to a change in $X_k$ by one unit

Example
- Wage equation, fitted to all 3294 observations:
  $\log(\text{wage}_i) = 1.09 + 0.20\,\text{male}_i + 0.19 \log(\text{exper}_i)$
- The coefficient of $\text{male}_i$ measures the semi-elasticity of wages with respect to gender. The wage differential between males ($\text{male}_i = 1$) and females is obtained from $w_f = \exp\{1.09 + 0.19 \log(\text{exper}_i)\}$ and $w_m = w_f \exp\{0.20\} = 1.22\, w_f$; the wage differential is 0.22 or 22%, i.e., approximately the coefficient 0.20 (for small x, $\exp\{x\} = \sum_k x^k / k! \approx 1 + x$)
- The coefficient of $\log(\text{exper}_i)$ measures the elasticity of wages with respect to experience: 10% more experience results in a 1.9% higher wage

Selection of Regressors
- Specification errors:
  - Omission of a relevant variable
  - Inclusion of an irrelevant variable
- Questions:
  - What are the consequences?
  - How to avoid specification errors?
  - How to detect a committed specification error?

Example: Income and Consumption
[Figure: PCR (private consumption, real, in bn EUR) and PYR (household's disposable income, real, in bn EUR); 1970:1-2003:4; basis: 1995; source: AWM database]

Income and Consumption: Growth Rates
[Figure: PCR_D4 (private consumption, real, growth rate) and PYR_D4 (household's disposable income, real, growth rate); 1970:1-2003:4; basis: 1995; source: AWM database]

Consumption Function
- C: private consumption, real, growth rate (PCR_D4)
- Y: household's disposable income, real, growth rate (PYR_D4)
- T: trend ($T_i = i/1000$)
- Consumption function with trend $T_i = i/1000$:
  $C_i = \beta_1 + \beta_2 Y_i + \beta_3 T_i + \varepsilon_i$

Consumption Function, cont'd
OLS estimated consumption function: output from GRETL

  Dependent variable: PCR_D4

               coefficient    std. error    t-ratio    p-value
  --------------------------------------------------------------
  const         0,0162489     0,00187868     8,649     1,76e-014 ***
  PYR_D4        0,707963      0,0424086     16,69      4,94e-034 ***
  T            -0,0682847     0,0188182     -3,629     0,0004    ***

  Mean dependent var    0,024911     S.D. dependent var    0,015222
  Sum squared resid     0,007726     S.E. of regression    0,007739
  R-squared             0,745445     Adjusted R-squared    0,741498
  F(2, 129)             188,8830     P-value (F)           4,71e-39
  Log-likelihood        455,9302     Akaike criterion     -905,8603
  Schwarz criterion    -897,2119     Hannan-Quinn         -902,3460
  rho                   0,701126     Durbin-Watson         0,601668

Consequences
- Consequences of specification errors:
  - Omission of a relevant variable
  - Inclusion of an irrelevant variable

Misspecification: Omitted Regressor
- Two models, with J-vector $z_i$:
  $y_i = x_i'\beta + z_i'\gamma + \varepsilon_i$ (A)
  $y_i = x_i'\beta + v_i$ (B)
- The OLS estimate $b_B$ of $\beta$ from (B) can be written with $y_i$ from (A):
  $b_B = \left(\sum_i x_i x_i'\right)^{-1} \sum_i x_i y_i = \beta + \left(\sum_i x_i x_i'\right)^{-1} \sum_i x_i z_i'\gamma + \left(\sum_i x_i x_i'\right)^{-1} \sum_i x_i \varepsilon_i$
- If (A) is the true model but (B) is specified, i.e., relevant regressors $z_i$ are omitted, $b_B$ is biased by
  $E\left\{\left(\sum_i x_i x_i'\right)^{-1} \sum_i x_i z_i'\right\} \gamma$
  (omitted variable bias)
- No bias if (a) $\gamma = 0$ or if (b) the variables in $x_i$ and $z_i$ are orthogonal

Misspecification: Irrelevant Regressor
- Two models:
  $y_i = x_i'\beta + z_i'\gamma + \varepsilon_i$ (A)
  $y_i = x_i'\beta + v_i$ (B)
- If (B) is the true model but (A) is specified, i.e., the model contains irrelevant regressors $z_i$, the OLS estimates $b_A$
  - are unbiased
  - have higher variances and standard errors than the OLS estimate $b_B$ obtained from fitting model (B)

Specification Search
- General-to-specific modeling:
  1. List all potential regressors, based on, e.g.,
     - economic theory
     - empirical results
     - availability of data
  2. Specify the most general model: include all potential regressors
  3. Iteratively test which variables have to be dropped, and re-estimate
  4. Stop if no more variables have to be dropped
- The procedure is known as the LSE (London School of Economics) method
- Alternatively, one can start with a small model and add variables as long as they contribute to explaining Y

Specification Search, cont'd
- Alternative procedures
  - Specific-to-general modeling: start with a small model and add variables as long as they contribute to explaining Y
  - Stepwise regression
- Specification search can be subsumed under data mining

Practice of Specification Search
- Applied research
  - Starts with a specification that is plausible in terms of economic theory
  - Tests whether imposed restrictions are correct
    - Tests for omitted regressors
    - Tests for autocorrelation of residuals
    - Tests for heteroskedasticity
  - Tests whether further restrictions need to be imposed
    - Tests for irrelevant regressors
- Obstacles for good specification
  - Complexity of economic theory
  - Limited availability of data

Regressor Selection Criteria
- Criteria for adding and deleting regressors
  - t-statistic, F-statistic
  - Adjusted $R^2$
  - Information criteria: penalty for increasing number of regressors
    - Akaike's Information Criterion
      $AIC = \log\left(\frac{1}{N}\sum_i e_i^2\right) + \frac{2K}{N}$
    - Schwarz's Bayesian Information Criterion
      $BIC = \log\left(\frac{1}{N}\sum_i e_i^2\right) + \frac{K}{N}\log N$
    - the model with the smaller BIC (or AIC) is preferred
- The corresponding probabilities for type I and type II errors can hardly be assessed

Individual Wages
- Are school and exper relevant regressors in
  $\text{wage}_i = \beta_1 + \beta_2\,\text{male}_i + \beta_3\,\text{school}_i + \beta_4\,\text{exper}_i + \varepsilon_i$
  or shall they be omitted?
- t-test: p-values are 4.62E-80 (school) and 1.59E-7 (exper)
- F-test: F = [(0.1326-0.0317)/2]/[(1-0.1326)/(3294-4)] = 191.24, with p-value 2.68E-79
- adj $R^2$: 0.1318 for the wider model, much higher than 0.0315 for the smaller model
- AIC: the wider model (AIC = 16690.18) is preferable; for the smaller model: AIC = 17048.46
- BIC: the wider model (BIC = 16714.58) is preferable; for the smaller model: BIC = 17060.66
- All criteria suggest the wider model

Individual Wages, cont'd
- OLS estimated smaller wage equation (Table 2.1, Verbeek), with AIC = 17048.46, BIC = 17060.66

Individual Wages, cont'd
- OLS estimated wider wage equation (Table 2.2, Verbeek), with AIC = 16690.18, BIC = 16714.58

The AIC Criterion
- Various versions in the literature
- Verbeek, also Greene:
  $AIC = \log\left(\frac{1}{N}\sum_i e_i^2\right) + \frac{2K}{N}$
- Akaike's original formula is
  $AIC = -2\,\ell(b) + 2K$
  with the log-likelihood function (normally distributed errors, evaluated at the estimates)
  $\ell(b) = -\frac{N}{2}\left[1 + \log(2\pi) + \log\left(\frac{1}{N}\sum_i e_i^2\right)\right]$
- GRETL: $AIC = -2\,\ell(b) + 2K$; this agrees with the GRETL output for the consumption function, where $-2 \cdot 455.9302 + 2 \cdot 3 \approx -905.86$

Nested Models: Comparison
- Model (B) from the slide "Misspecification: Omitted Regressor" is nested in model (A); (A) extends (B) by J additional regressors
- Do the J added regressors contribute to explaining Y?
- F-test (t-test when J = 1) for testing H0: the coefficients of the added regressors are zero
  $F = \frac{(R_A^2 - R_B^2)/J}{(1 - R_A^2)/(N - K)}$
  - $R_B^2$ and $R_A^2$ are the $R^2$ of the models without (B) and with (A) the J additional regressors, respectively; K is the number of regressors in (A)
- Comparison of adjusted $R^2$: adj $R_A^2$ > adj $R_B^2$ is equivalent to F > 1
- Information criteria: choose the model with the smaller value of the information criterion

Comparison of Non-nested Models
- Non-nested models:
  A: $y_i = x_i'\beta + \varepsilon_i$
  B: $y_i = z_i'\gamma + v_i$
  with components in $z_i$ that are not in $x_i$
- Non-nested or encompassing F-test: compares, by F-tests, artificially nested models
  $y_i = x_i'\beta + z_{2i}'\delta_B + \varepsilon_i^*$ with $z_{2i}$: regressors from $z_i$ not in $x_i$
  $y_i = z_i'\gamma + x_{2i}'\delta_A + v_i^*$ with $x_{2i}$: regressors from $x_i$ not in $z_i$
  - Test the validity of model A by testing H0: $\delta_B = 0$
  - Analogously, test the validity of model B by testing H0: $\delta_A = 0$
  - Possible results: A or B is valid, both models are valid, none is valid
- Other procedures: J-test, PE-test

Individual Wages
- Which of the models is adequate?
  $\log(\text{wage}_i) = 0.119 + 0.260\,\text{male}_i + 0.115\,\text{school}_i$ (A), adj $R^2$ = 0.121, BIC = 5824.90
  $\log(\text{wage}_i) = 0.119 + 0.064\,\text{age}_i$ (B), adj $R^2$ = 0.069, BIC = 6004.60
- The artificially nested model is
  $\log(\text{wage}_i) = -0.472 + 0.243\,\text{male}_i + 0.088\,\text{school}_i + 0.035\,\text{age}_i$
- Test of model validity
  - model A: t-test for age, p-value 5.79E-15; model A is not adequate
  - model B: F-test for male and school; model B is not adequate

Comparison of Non-nested Models: J-Test
- Non-nested models:
  A: $y_i = x_i'\beta + \varepsilon_i$
  B: $y_i = z_i'\gamma + v_i$
  with components of $z_i$ that are not in $x_i$
- Combined model
  $y_i = (1 - \delta)\, x_i'\beta + \delta\, z_i'\gamma + u_i$
  where $\delta$ indicates model adequacy
- Transformed model
  $y_i = x_i'\beta^* + \delta\, z_i'c + u_i = x_i'\beta^* + \delta\, \hat{y}_{iB} + u_i^*$
  with OLS estimate c for $\gamma$ and predicted values $\hat{y}_{iB} = z_i'c$ obtained from fitting model B; $\beta^* = (1 - \delta)\beta$
- J-test for validity of model A by testing H0: $\delta = 0$
- Less computational effort than the encompassing F-test

Individual Wages
- Which of the models is adequate?
  $\log(\text{wage}_i) = 0.119 + 0.260\,\text{male}_i + 0.115\,\text{school}_i$ (A), adj $R^2$ = 0.121, BIC = 5824.90
  $\log(\text{wage}_i) = 0.119 + 0.064\,\text{age}_i$ (B), adj $R^2$ = 0.069, BIC = 6004.60
- Test of model validity by means of the J-test
  - Extend model B to
    $\log(\text{wage}_i) = -0.587 + 0.034\,\text{age}_i + 0.826\,\hat{y}_{iA}$
    with values $\hat{y}_{iA}$ predicted for $\log(\text{wage}_i)$ from model A
  - t-test for the coefficient of $\hat{y}_{iA}$: t = 15.96, p-value 2.65E-55
  - Model B is not a valid model

Linear vs. Loglinear Model
- Choice between linear and loglinear functional form
  $y_i = x_i'\beta + \varepsilon_i$ (A)
  $\log y_i = (\log x_i)'\beta + v_i$ (B)
- On the basis of economic interpretation: are effects additive or multiplicative?
- Log-transformation stabilizes the variance, particularly if the dependent variable has a skewed distribution (wages, income, production, firm size, sales, ...)
- Loglinear models are easily interpretable in terms of elasticities

Linear vs. Loglinear Model: The PE-Test
- Choice between linear and loglinear functional form
- Estimate both models
  $y_i = x_i'\beta + \varepsilon_i$ (A)
  $\log y_i = (\log x_i)'\beta + v_i$ (B)
  and calculate the fitted values $\hat{y}_i$ (from model A) and $\log \tilde{y}_i$ (from model B)
- Test $\delta_{LIN} = 0$ in
  $y_i = x_i'\beta + \delta_{LIN}(\log \hat{y}_i - \log \tilde{y}_i) + u_i$
  not rejecting $\delta_{LIN} = 0$ favors model A
- Test $\delta_{LOG} = 0$ in
  $\log y_i = (\log x_i)'\beta + \delta_{LOG}(\hat{y}_i - \exp\{\log \tilde{y}_i\}) + u_i$
  not rejecting $\delta_{LOG} = 0$ favors model B
- If both null hypotheses are rejected: find a more adequate model

Individual Wages
- Test of validity of models by means of the PE-test
- The fitted models are (with l_x for log(x))
  $\text{wage}_i = -2.046 + 1.406\,\text{male}_i + 0.608\,\text{school}_i$ (A)
  $\text{l\_wage}_i = 0.119 + 0.260\,\text{male}_i + 0.115\,\text{l\_school}_i$ (B)
- With x_f denoting the predicted value of x: d_lg = log(wage_f) - l_wage_f, d_ln = wage_f - exp(l_wage_f)
- Test of model validity, model A:
  $\text{wage}_i = -1.708 + 1.379\,\text{male}_i + 0.637\,\text{school}_i - 4.731\,\text{d\_lg}_i$
  with p-value 0.013 for d_lg; validity in doubt
- Test of model validity, model B:
  $\text{l\_wage}_i = -1.132 + 0.240\,\text{male}_i + 1.008\,\text{l\_school}_i + 0.171\,\text{d\_ln}_i$
  with p-value 0.076 for d_ln; model B to be preferred

The PE-Test
- Choice between linear and loglinear functional form
- The auxiliary regressions are estimated for testing purposes only
  - If the linear model is not rejected: accept the linear model
  - If the loglinear model is not rejected: accept the loglinear model
  - If both are rejected, neither model is appropriate; a more general model should be considered
- In case of the Individual Wages example:
  - Linear model: the coefficient of d_lg is -4.731, p-value 0.013: the model is rejected at the 5% level
  - Loglinear model: the coefficient of d_ln is 0.171, p-value 0.076: the model is not rejected at the 5% level

Non-linear Functional Forms
- Model specification
  $y_i = g(x_i, \beta) + \varepsilon_i$
  instead of $y_i = x_i'\beta + \varepsilon_i$: violation of linearity
- Non-linearity in regressors (but linear in parameters)
  - Powers of regressors
  - Interactions of regressors
  - The OLS technique still works; t-test, F-test for specification check
- Non-linearity in regression coefficients, e.g.,
  - $g(x_i, \beta) = \beta_1 x_{i1}^{\beta_2} x_{i2}^{\beta_3}$
    logarithmic transformation: $\log g(x_i, \beta) = \log \beta_1 + \beta_2 \log x_{i1} + \beta_3 \log x_{i2}$
  - $g(x_i, \beta) = \beta_1 + \beta_2 x_i^{\beta_3}$
    non-linear least squares estimation, numerical procedures
- Various test procedures, e.g., RESET test, Chow test

Individual Wages: Effect of Gender
- The effect of gender may depend on the education level
  - Separate models for males and females
  - Interaction terms between dummies for education level and male
- Example: Belgian Household Panel, 1994 (N = 1472)
  - Five education levels
  - Model with education dummies
  - Model with interaction terms between education dummies and gender dummy
  - F-statistic for interaction terms:
    F(5, 1460) = {(0.4032-0.3976)/5}/{(1-0.4032)/(1472-12)} = 2.74
    with a p-value of 0.018

Wages: Education Dummies
- Model with education dummies: Verbeek, Table 3.11

Wages: Interactions with Gender
- Wage equation with interactions educ*male

Wages: Effect of Gender
- Wage equation with interaction educ*male

RESET Test
- Test of the linear model $E\{y_i \mid x_i\} = x_i'\beta$ against misspecification of the functional form:
  - Null hypothesis: the linear model is the correct functional form
  - Test of H0: RESET test (Regression Equation Specification Error Test)
- Test idea: under H0, non-linear functions of $\hat{y}_i$, the fitted values from the linear model, e.g., $\hat{y}_i^2, \hat{y}_i^3, \dots$, do not improve the model fit
- Test procedure: the linear model is extended by adding $\hat{y}_i^2, \hat{y}_i^3, \dots$
- F-test to decide whether powers of fitted values like $\hat{y}_i^2, \hat{y}_i^3, \dots$ contribute as additional regressors to explaining Y
- Power Q of fitted values: typical choice is Q = 2 or Q = 3

Individual Wages: RESET Test
- The fitted models are (with l_x for log(x))
  $\text{wage}_i = -2.046 + 1.406\,\text{male}_i + 0.608\,\text{school}_i$ (A)
  $\text{l\_wage}_i = 0.119 + 0.260\,\text{male}_i + 0.115\,\text{l\_school}_i$ (B)
- Test of specification of the functional form with Q = 2
  - Model A: test statistic F(2, 3288) = 10.23, p-value = 3.723e-005
  - Model B: test statistic F(2, 3288) = 4.52, p-value = 0.011
- For both models the adequacy of the functional form is in doubt

Structural Break: Chow Test
- In a time-series context, the coefficients of a model may change due to a major policy change or shock, e.g., the oil price shock
- Modeling a process with structural break
  $E\{y_i \mid x_i\} = x_i'\beta + g_i x_i'\gamma$
  with dummy variable $g_i = 0$ before the break, $g_i = 1$ after the break
- Regressors $x_i$; coefficients $\beta$ before, $\beta + \gamma$ after the break
- Null hypothesis: no structural break, $\gamma = 0$
- Test procedure: fit the extended model, F- (or t-) test of $\gamma = 0$
  $f = \frac{(S_r - S_u)/K}{S_u/(N - 2K)}$
  with $S_r$ ($S_u$): sum of squared residuals of the restricted (unrestricted) model
- Known as the Chow test for structural break or structural change

Chow Test: The Practice
- The test procedure is performed in the following steps
  1. Fit the restricted model: $S_r$
  2. Fit the extended model: $S_u$
  3. Calculate f and the p-value from the F-distribution with K and N-2K d.f.
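The steps of the Chow test can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: one regressor plus an intercept (so K = 2), made-up data, and the hypothetical helper names `ssr_simple_ols` and `chow_f`. The unrestricted sum of squares is obtained by fitting the two regimes separately, which yields the same S_u as the extended model with the dummy interactions. The p-value step is omitted; it would be read from the F-distribution with K and N-2K degrees of freedom.

```python
def ssr_simple_ols(x, y):
    """Sum of squared residuals from regressing y on a constant and x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


def chow_f(x, y, break_index, k=2):
    """Chow statistic f = [(Sr - Su)/K] / [Su/(N - 2K)] for a known break point."""
    n = len(x)
    # Step 1: restricted model, one coefficient vector for the whole sample.
    s_r = ssr_simple_ols(x, y)
    # Step 2: extended model; with all coefficients allowed to shift, its SSR
    # equals the sum of the SSRs of the two sub-sample regressions.
    s_u = (ssr_simple_ols(x[:break_index], y[:break_index])
           + ssr_simple_ols(x[break_index:], y[break_index:]))
    # Step 3: the test statistic.
    return ((s_r - s_u) / k) / (s_u / (n - 2 * k))


# Made-up illustration data: 20 periods, candidate break after period 10.
noise = [0.3, -0.1, 0.2, -0.4, 0.1, -0.2, 0.4, -0.3, 0.0, 0.1] * 2
x = [float(t) for t in range(20)]
y_stable = [1.0 + 0.5 * xi + e for xi, e in zip(x, noise)]
y_break = [(1.0 + 0.5 * xi if t < 10 else 4.0 + 1.0 * xi) + e
           for t, (xi, e) in enumerate(zip(x, noise))]

f_stable = chow_f(x, y_stable, break_index=10)
f_break = chow_f(x, y_break, break_index=10)
print(f_stable, f_break)
```

The statistic stays small for the series generated with constant coefficients and becomes very large for the series whose intercept and slope shift at the break point.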
- The Chow test requires knowledge of the break point

Your Homework
1. Show that the OLS estimator for $\beta$ from $y_i = x_i'\beta + z_i'\gamma + \varepsilon_i$ can be written as (a) $b = (X'X)^{-1}X'(y - Zc)$ with estimator c for $\gamma$, or as (b) $b = (X'M_Z X)^{-1}X'M_Z y$ with residual generating matrix $M_Z = I - Z(Z'Z)^{-1}Z'$.
2. Use the data set "wages" of Verbeek for the following analyses:
   a. Estimate the model where the log hourly wages are explained by male, age and educ, with age = school + exper + 6; interpret the results.
   b. Repeat the analysis after adding four dummy variables for the educational levels 2 through 5 instead of the variable educ; compare the models by using (a) the non-nested F-test and (b) the J-test; interpret the results.
   c. Use the PE-test to decide whether the model in b. (where log hourly wages are explained) or the same model but with levels of hourly wages as explained variable is to be preferred; interpret the result.
   d. Repeat a. with the interaction age*educ as added regressor; interpret the result.
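Before proving the two representations in homework item 1, it can help to check them numerically. The sketch below is an illustration only and does not replace the proof: the data are simulated, and the helper names `solve`, `ols` and `residuals` are hypothetical. It uses plain Python to stay self-contained.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(cols, y):
    """OLS coefficients of y on regressors given as a list of columns."""
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    return solve(xtx, xty)


def residuals(cols, y):
    """Residuals of y after regressing on cols (the residual maker applied to y)."""
    coef = ols(cols, y)
    return [yi - sum(c * col[i] for c, col in zip(coef, cols))
            for i, yi in enumerate(y)]


# Simulated data (an assumption for illustration): x_i = (1, x2_i), scalar z_i.
random.seed(0)
n = 50
const = [1.0] * n
x2 = [random.gauss(0, 1) for _ in range(n)]
z = [0.5 * v + random.gauss(0, 1) for v in x2]
y = [1.0 + 2.0 * a + 1.5 * c + random.gauss(0, 1) for a, c in zip(x2, z)]

x_cols, z_cols = [const, x2], [z]
full = ols(x_cols + z_cols, y)       # regression of y on x and z
b_full, c_hat = full[:2], full[2:]   # estimates b for beta, c for gamma

# (a) b = (X'X)^{-1} X'(y - Zc)
b_a = ols(x_cols, [yi - c_hat[0] * zi for yi, zi in zip(y, z)])

# (b) b = (X'Mz X)^{-1} X'Mz y: regress Mz y on the columns of Mz X
b_b = ols([residuals(z_cols, col) for col in x_cols], residuals(z_cols, y))

print(b_full, b_a, b_b)
```

All three coefficient vectors should agree up to rounding error, which is what the identities in item 1 assert.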