Econometrics - Lecture 3
Regression Models: Interpretation and Comparison
Hackl, Econometrics, Lecture 3, Nov 18, 2011

Contents
- The Linear Model: Interpretation
- Selection of Regressors
- Specification of the Functional Form

Economic Models
- Describe economic relationships (not only a set of observations) and have an economic interpretation
- Linear regression model:
  $y_i = \beta_1 + \beta_2 x_{i2} + \dots + \beta_K x_{iK} + \varepsilon_i = x_i'\beta + \varepsilon_i$
- Variables $Y, X_2, \dots, X_K$: observable
- Observations: $y_i, x_{i2}, \dots, x_{iK}$, $i = 1, \dots, N$
- Error term $\varepsilon_i$ (disturbance term): contains all influences that are not included explicitly in the model; unobservable
- Assumption (A1), i.e., $E\{\varepsilon_i \mid X\} = 0$ or $E\{\varepsilon_i \mid x_i\} = 0$, gives
  $E\{y_i \mid x_i\} = x_i'\beta$
  - the model describes the expected value of $y_i$ given $x_i$ (conditional expectation)

Example
- Wage equation
  $wage_i = \beta_1 + \beta_2\, male_i + \beta_3\, school_i + \beta_4\, exper_i + \varepsilon_i$
  answers questions like:
  - What is the expected hourly wage of a female with 12 years of education and 10 years of experience?
- Wage equation fitted to all 3294 observations:
  $\widehat{wage}_i = -3.38 + 1.34\, male_i + 0.64\, school_i + 0.12\, exper_i$
  - Expected hourly wage of a female with 12 years of education and 10 years of experience: $-3.38 + 0.64 \cdot 12 + 0.12 \cdot 10 = 5.50$ USD

Regression Coefficients
- Linear regression model:
  $y_i = \beta_1 + \beta_2 x_{i2} + \dots + \beta_K x_{iK} + \varepsilon_i = x_i'\beta + \varepsilon_i$
- Coefficient $\beta_k$ measures the change of $Y$ if $X_k$ changes by one unit:
  $\Delta E\{y_i \mid x_i\} / \Delta x_{ik} = \beta_k$ for $\Delta x_{ik} = 1$
- For continuous regressors:
  $\partial E\{y_i \mid x_i\} / \partial x_{ik} = \beta_k$
  Marginal effect of changing $X_k$ on $Y$
- Ceteris paribus condition: measuring the effect on $Y$ of a one-unit change in $X_k$ by $\beta_k$ implies
  - knowledge of which other $X_j$, $j \neq k$, are in the model
  - that all other $X_j$, $j \neq k$, remain unchanged

Example
- Wage equation
  $wage_i = \beta_1 + \beta_2\, male_i + \beta_3\, school_i + \beta_4\, exper_i + \varepsilon_i$
  $\beta_3$ measures the impact of one additional year at school upon a person's wage, keeping gender and years of experience fixed:
  $\partial E\{wage_i \mid male_i, school_i, exper_i\} / \partial school_i = \beta_3$
- Wage equation fitted to all 3294 observations:
  $\widehat{wage}_i = -3.38 + 1.34\, male_i + 0.64\, school_i + 0.12\, exper_i$
- One extra year at school, e.g., at the university, raises the expected hourly wage by 64 cents; a 4-year study raises it by $4 \cdot 0.64 = 2.56$ USD
- This is true for otherwise identical people (same gender and experience)

Regression Coefficients, cont'd
- The marginal effect of a changing regressor may depend on other variables
- Examples
  - Wage equation: $wage_i = \beta_1 + \beta_2\, male_i + \beta_3\, age_i + \beta_4\, age_i^2 + \varepsilon_i$
    the impact of changing age depends on age:
    $\partial E\{wage_i \mid male_i, age_i\} / \partial age_i = \beta_3 + 2 \beta_4\, age_i$
  - Wage equation may contain $\beta_3\, age_i + \beta_4\, age_i\, male_i$: the marginal effect of age, $\beta_3 + \beta_4\, male_i$, depends upon gender

Elasticities
- Elasticity: measures the relative change in the dependent variable $Y$ due to a relative change in $X_k$
- For a linear regression, the elasticity of $Y$ with respect to $X_k$ is
  $\frac{\partial E\{y_i \mid x_i\} / E\{y_i \mid x_i\}}{\partial x_{ik} / x_{ik}} = \frac{x_{ik}}{x_i'\beta}\, \beta_k$
- For a loglinear model
  $\log y_i = (\log x_i)'\beta + \varepsilon_i$ with $(\log x_i)' = (1, \log x_{i2}, \dots, \log x_{iK})$
  the elasticities are the coefficients $\beta$

Elasticities, cont'd
- This follows from
  $\frac{\partial \log y_i}{\partial x_{ik}} = \frac{1}{y_i} \frac{\partial y_i}{\partial x_{ik}}$
  and
  $\frac{\partial \log y_i}{\partial \log x_{ik}} = \frac{\partial \log y_i}{\partial x_{ik}}\, x_{ik} = \frac{\partial y_i / y_i}{\partial x_{ik} / x_{ik}}$

Semi-Elasticities
- Semi-elasticity: measures the relative change in the dependent variable $Y$ due to a one-unit change in $X_k$
- In the linear regression for $\log y_i$,
  $\log y_i = x_i'\beta + \varepsilon_i$
  the elasticity of $Y$ with respect to $X_k$ is
  $\frac{\partial E\{y_i \mid x_i\} / E\{y_i \mid x_i\}}{\partial x_{ik} / x_{ik}} \approx x_{ik}\, \beta_k$
  $\beta_k$ measures the relative change in $Y$ due to a change in $X_k$ by one unit

Example
- Wage equation, fitted to all 3294 observations:
  $\log(wage_i) = 1.09 + 0.20\, male_i + 0.19 \log(exper_i)$
- The coefficient of $male_i$ measures the semi-elasticity of wages with respect to gender: the wage differential between males ($male_i = 1$) and females is obtained from $w_f = \exp\{1.09 + 0.19 \log(exper_i)\}$ and $w_m = w_f \exp\{0.20\} = 1.22\, w_f$; the wage differential is 0.22 or 22%, i.e., approximately the coefficient 0.20 (1)
- The coefficient of $\log(exper_i)$ measures the elasticity of wages with respect to experience: 10% more experience results in a 1.9% higher wage
____________________
(1) For small $x$, $\exp\{x\} = \sum_k x^k / k! \approx 1 + x$

Selection of Regressors
- Specification errors:
  - Omission of a relevant variable
  - Inclusion of an irrelevant variable
- Questions:
  - What are the consequences of a specification error?
  - How to avoid specification errors?
  - How to detect an erroneous specification?

Example: Income and Consumption
[Figure: PCR (private consumption, real, in bn. EUR) and PYR (households' disposable income, real, in bn. EUR), 1970:1-2003:4; basis: 1995; source: AWM-Database]

Income and Consumption
[Figure: PCR and PYR, same series, period and source as above]

Income and Consumption: Growth Rates
[Figure: PCR_D4 (private consumption, real, yearly growth rate) and PYR_D4 (households' disposable income, real, yearly growth rate), 1970:1-2003:4; basis: 1995; source: AWM-Database]

Consumption Function
- C: private consumption, real, yearly growth rate (PCR_D4)
- Y: households' disposable income, real, yearly growth rate (PYR_D4)
- T: trend ($T_i = i/1000$)
- Consumption function with trend $T_i = i/1000$:
  $C_i = \beta_1 + \beta_2 Y_i + \beta_3 T_i + \varepsilon_i$

Consumption Function, cont'd
OLS estimated consumption function: output from GRETL

  Dependent variable: PCR_D4

              coefficient   std. error   t-ratio   p-value
  ------------------------------------------------------------
  const        0,0162489    0,00187868     8,649   1,76e-014 ***
  PYR_D4       0,707963     0,0424086     16,69    4,94e-034 ***
  T           -0,0682847    0,0188182     -3,629   0,0004    ***

  Mean dependent var    0,024911   S.D. dependent var    0,015222
  Sum squared resid     0,007726   S.E. of regression    0,007739
  R-squared             0,745445   Adjusted R-squared    0,741498
  F(2, 129)             188,8830   P-value (F)           4,71e-39
  Log-likelihood        455,9302   Akaike criterion     -905,8603
  Schwarz criterion    -897,2119   Hannan-Quinn         -902,3460
  rho                   0,701126   Durbin-Watson         0,601668

Consequences
- Consequences of specification errors:
  - Omission of a relevant variable
  - Inclusion of an irrelevant variable

Misspecification: Omitted Regressor
- Two models, with $J$-vector $z_i$:
  $y_i = x_i'\beta + z_i'\gamma + \varepsilon_i$  (A)
  $y_i = x_i'\beta + v_i$  (B)
- The OLS estimate $b_B$ of $\beta$ from (B) can be written with $y_i$ from (A):
  $b_B = \left(\sum_i x_i x_i'\right)^{-1} \sum_i x_i y_i = \beta + \left(\sum_i x_i x_i'\right)^{-1} \sum_i x_i z_i'\gamma + \left(\sum_i x_i x_i'\right)^{-1} \sum_i x_i \varepsilon_i$
- If (A) is the true model but (B) is specified, i.e., relevant regressors $z_i$ are omitted, $b_B$ is biased by
  $E\left\{\left(\sum_i x_i x_i'\right)^{-1} \sum_i x_i z_i'\right\} \gamma$
- This is the omitted variable bias
- No bias if (a) $\gamma = 0$ or if (b) the variables in $x_i$ and $z_i$ are orthogonal

Misspecification: Irrelevant Regressor
- Two models:
  $y_i = x_i'\beta + z_i'\gamma + \varepsilon_i$  (A)
  $y_i = x_i'\beta + v_i$  (B)
- If (B) is the true model but (A) is specified, i.e., the model contains irrelevant regressors $z_i$, the OLS estimates $b_A$
  - are unbiased
  - have higher variances and standard errors than the OLS estimates $b_B$ obtained from fitting model (B)

Specification Search
- General-to-specific modeling:
  1. List all potential regressors, based on, e.g.,
     - economic theory
     - empirical research
     - availability of data
  2. Specify the most general model: include all potential regressors
  3. Iteratively test which variables have to be dropped, and re-estimate
  4. Stop if no more variables have to be dropped
- The procedure is known as the LSE (London School of Economics) method

Specification Search, cont'd
- Alternative procedures
  - Specific-to-general modeling: start with a small model and add variables as long as they contribute to explaining Y
  - Stepwise regression
- Specification search can be subsumed under data mining
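The omitted variable bias from the slide "Misspecification: Omitted Regressor" can be illustrated with a small simulation. The following is a minimal sketch, not part of the lecture: it assumes a single regressor x, a single omitted regressor z, and illustrative parameter values (`beta`, `gamma`, `rho` and the helper names are hypothetical). Data are generated from model (A), model (B) is fitted, and the resulting slope is compared with β plus the sample analogue of the bias term.

```python
import random


def ols_slope(x, y):
    """Slope from the OLS regression of y on a constant and x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    return s_xy / s_xx


def simulate(n=20000, beta=1.0, gamma=2.0, rho=0.5, seed=0):
    """Fit model (B) to data generated from model (A); all values are illustrative."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # z is correlated with x, so omitting it biases the slope of x
    z = [rho * xi + rng.gauss(0.0, 1.0) for xi in x]
    # true model (A): y = beta * x + gamma * z + eps
    y = [beta * xi + gamma * zi + rng.gauss(0.0, 1.0) for xi, zi in zip(x, z)]
    b_short = ols_slope(x, y)                   # model (B): z omitted
    predicted = beta + gamma * ols_slope(x, z)  # beta + sample analogue of the bias term
    return b_short, predicted


if __name__ == "__main__":
    b_short, predicted = simulate()
    print(f"slope with z omitted: {b_short:.3f}")
    print(f"beta + bias term:     {predicted:.3f}")
```

With these settings the slope from the short regression should lie close to β + γρ = 2 rather than the true β = 1; setting `gamma=0` or `rho=0` corresponds to the two no-bias cases (a) and (b).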
Practice of Specification Search
- Applied research
  - Starts with a specification that is plausible in terms of economic theory
  - Tests whether imposed restrictions are correct
    - Tests for omitted regressors
    - Tests for autocorrelation of residuals
    - Tests for heteroskedasticity
  - Tests whether further restrictions need to be imposed
    - Tests for irrelevant regressors
- Obstacles for good specification
  - Complexity of economic theory
  - Limited availability of data

Regressor Selection Criteria
- Criteria for adding and deleting regressors
  - t-statistic, F-statistic
  - Adjusted $R^2$
  - Information criteria: penalty for increasing number of regressors
    - Akaike's Information Criterion
      $AIC = \log \frac{1}{N} \sum_i e_i^2 + \frac{2K}{N}$
    - Schwarz's Bayesian Information Criterion
      $BIC = \log \frac{1}{N} \sum_i e_i^2 + \frac{K}{N} \log N$
    with OLS residuals $e_i$; the model with the smaller BIC (or AIC) is preferred
- The corresponding probabilities for type I and type II errors can hardly be assessed

Individual Wages
- Are school and exper relevant regressors in
  $wage_i = \beta_1 + \beta_2\, male_i + \beta_3\, school_i + \beta_4\, exper_i + \varepsilon_i$
  or shall they be omitted?
- t-test: p-values are 4.62E-80 (school) and 1.59E-7 (exper)
- F-test: F = [(0.1326-0.0317)/2]/[(1-0.1326)/(3294-4)] = 191.24, with p-value 2.68E-79
- adj $R^2$: 0.1318 for the wider model, much higher than 0.0315 for the smaller model
- AIC: the wider model (AIC = 16690.2) is preferable; for the smaller model: AIC = 17048.5
- BIC: the wider model (BIC = 16714.6) is preferable; for the smaller model: BIC = 17060.7
- All criteria suggest the wider model

Individual Wages, cont'd
- OLS estimated smaller wage equation (Table 2.1, Verbeek)
  with AIC = 17048.46, BIC = 17060.66

Individual Wages, cont'd
- OLS estimated wider wage equation (Table 2.2, Verbeek)
  with AIC = 16690.18, BIC = 16714.58

The AIC Criterion
- Various versions in literature
- Verbeek, also Greene:
  $AIC = \log \frac{1}{N} \sum_i e_i^2 + \frac{2K}{N}$
- Akaike's original formula is
  $AIC = -2\, \ell(b) + 2K$
  with the log-likelihood function
  $\ell(b) = -\frac{N}{2} \left(1 + \log(2\pi) + \log \frac{1}{N} \sum_i e_i^2\right)$
- GRETL: reports $AIC = -2\, \ell(b) + 2K$; e.g., for the consumption function, $-2 \cdot 455.9302 + 2 \cdot 3 \approx -905.86$

Nested Models: Comparison
- Model (B) from the slide "Misspecification: Omitted Regressor" is nested in model (A); (A) is extended by J additional regressors
- Do the J added regressors contribute to explaining Y?
- F-test (t-test when J = 1) for testing H0: coefficients of added regressors are zero
  $F = \frac{(R_A^2 - R_B^2)/J}{(1 - R_A^2)/(N - K)}$
  - $R_B^2$ and $R_A^2$ are the $R^2$ of the models without (B) and with (A) the J additional regressors, respectively; $K$ is the number of regressors in model (A)
- Comparison of adjusted $R^2$: adj $R_A^2$ > adj $R_B^2$ is equivalent to F > 1
- Information criteria: choose the model with the smaller value of the information criterion

Comparison of Non-nested Models
- Non-nested models:
  A: $y_i = x_i'\beta + \varepsilon_i$,  B: $y_i = z_i'\gamma + v_i$
  with components in $z_i$ that are not in $x_i$
- Non-nested or encompassing F-test: compares artificially nested models by F-tests
  $y_i = x_i'\beta + z_{2i}'\delta_B + \varepsilon_i^*$ with $z_{2i}$: regressors from $z_i$ not in $x_i$
  $y_i = z_i'\gamma + x_{2i}'\delta_A + v_i^*$ with $x_{2i}$: regressors from $x_i$ not in $z_i$
  - Test validity of model A by testing H0: $\delta_B = 0$
  - Analogously, test validity of model B by testing H0: $\delta_A = 0$
  - Possible results: A or B is valid, both models are valid, none is valid
- Other procedures: J-test, PE-test

Individual Wages
- Which of the models is adequate?
  $\log(wage_i) = 0.119 + 0.260\, male_i + 0.115\, school_i$  (A)
  adj $R^2$ = 0.121, BIC = 5824.90
  $\log(wage_i) = 0.119 + 0.064\, age_i$  (B)
  adj $R^2$ = 0.069, BIC = 6004.60
- The artificially nested model is
  $\log(wage_i) = -0.472 + 0.243\, male_i + 0.088\, school_i + 0.035\, age_i$
- Test of model validity
  - model A: t-test for age, p-value 5.79E-15; model A is not adequate
  - model B: F-test for male and school; model B is not adequate

Comparison of Non-nested Models: J-Test
- Non-nested models:
  A: $y_i = x_i'\beta + \varepsilon_i$,  B: $y_i = z_i'\gamma + v_i$
  with components of $z_i$ that are not in $x_i$
- Combined model
  $y_i = (1 - \delta)\, x_i'\beta + \delta\, z_i'\gamma + u_i$
  $\delta$ indicates model adequacy
- Transformed model
  $y_i = x_i'\beta^* + \delta\, z_i'c + u_i = x_i'\beta^* + \delta\, \hat{y}_{iB} + u_i^*$
  with OLS estimate $c$ for $\gamma$ and predicted values $\hat{y}_{iB}$ obtained from fitting model B; $\beta^* = (1 - \delta)\beta$
- J-test for validity of model A by testing H0: $\delta = 0$
- Less computational effort than the encompassing F-test

Individual Wages
- Which of the models is adequate?
  $\log(wage_i) = 0.119 + 0.260\, male_i + 0.115\, school_i$  (A)
  adj $R^2$ = 0.121, BIC = 5824.90
  $\log(wage_i) = 0.119 + 0.064\, age_i$  (B)
  adj $R^2$ = 0.069, BIC = 6004.60
- Test of model validity by means of the J-test
- Extend model B to
  $\log(wage_i) = -0.587 + 0.034\, age_i + 0.826\, \hat{y}_{iA}$
  with values $\hat{y}_{iA}$ predicted for $\log(wage_i)$ from model A
- Test of model validity: t-test for the coefficient of $\hat{y}_{iA}$, t = 15.96, p-value 2.65E-55
- Model B is not a valid model

Linear vs. Loglinear Model
- Choice between linear and loglinear functional form
  $y_i = x_i'\beta + \varepsilon_i$  (A)
  $\log y_i = (\log x_i)'\beta + v_i$  (B)
- On the basis of economic interpretation: are effects additive or multiplicative?
- Log-transformation stabilizes variance, particularly if the dependent variable has a skewed distribution (wages, income, production, firm size, sales, ...)
- Loglinear models are easily interpretable in terms of elasticities

Linear vs. Loglinear Model: The PE-Test
- Choice between linear and loglinear functional form
- Estimate both models
  $y_i = x_i'\beta + \varepsilon_i$  (A)
  $\log y_i = (\log x_i)'\beta + v_i$  (B)
  and calculate the fitted values $\hat{y}_i$ (from model A) and $\log \tilde{y}_i$ (from model B)
- Test $\delta_{LIN} = 0$ in
  $y_i = x_i'\beta + \delta_{LIN} (\log \hat{y}_i - \log \tilde{y}_i) + u_i$
  not rejecting $\delta_{LIN} = 0$ favors model A
- Test $\delta_{LOG} = 0$ in
  $\log y_i = (\log x_i)'\beta + \delta_{LOG} (\hat{y}_i - \exp\{\log \tilde{y}_i\}) + u_i$
  not rejecting $\delta_{LOG} = 0$ favors model B
- If both null hypotheses are rejected: find a more adequate model

Individual Wages
- Test of validity of models by means of the PE-test
- The fitted models are (with l_x for log(x))
  wage_i = -2.046 + 1.406 male_i + 0.608 school_i  (A)
  l_wage_i = 0.119 + 0.260 male_i + 0.115 l_school_i  (B)
- x_f: predicted value of x; d_log = log(wage_f) - l_wage_f, d_lin = wage_f - exp(l_wage_f)
- Test of model validity, model A:
  wage_i = -1.708 + 1.379 male_i + 0.637 school_i - 4.731 d_log_i
  with p-value 0.013 for d_log; validity in doubt
- Test of model validity, model B:
  l_wage_i = -1.132 + 0.240 male_i + 1.008 l_school_i + 0.171 d_lin_i
  with p-value 0.076 for d_lin; model B to be preferred

The PE-Test
- Choice between linear and loglinear functional form
- The auxiliary regressions are estimated for testing purposes
  - If the linear model is not rejected: accept the linear model
  - If the loglinear model is not rejected: accept the loglinear model
  - If both are rejected, neither model is appropriate; a more general model should be considered
- In case of the Individual Wages example:
  - Linear model: the coefficient of d_log is -4.731, p-value 0.013: the model is rejected
  - Loglinear model: the coefficient of d_lin is 0.171, p-value 0.076: the model is not rejected

Non-linear Functional Forms
- Model specification
  $y_i = g(x_i, \beta) + \varepsilon_i$
  instead of $y_i = x_i'\beta + \varepsilon_i$: violation of linearity
- Non-linearity in regressors (but linear in parameters)
  - Powers of regressors
  - Interactions of regressors
  OLS technique still works; t-test, F-test for specification check
- Non-linearity in regression coefficients, e.g.,
  - $g(x_i, \beta) = \beta_1 x_{i1}^{\beta_2} x_{i2}^{\beta_3}$
    logarithmic transformation: $\log g(x_i, \beta) = \log \beta_1 + \beta_2 \log x_{i1} + \beta_3 \log x_{i2}$
  - $g(x_i, \beta) = \beta_1 + \beta_2 x_i^{\beta_3}$
    non-linear least squares estimation, numerical procedures
- Various test procedures, e.g., RESET test, Chow test

Individual Wages: Effect of Gender
- The effect of gender may depend on the education level
  - Separate models for males and females
  - Interaction terms between dummies for education level and male
- Example: Belgian Household Panel, 1994 (N = 1472)
  - Five education levels
  - Model with education dummies
  - Model with interaction terms between education dummies and gender dummy
  - F-statistic for interaction terms:
    F(5, 1460) = {(0.4032-0.3976)/5}/{(1-0.4032)/(1472-12)} = 2.74
    with a p-value of 0.018

Wages: Model with Education Dummies
- Model with education dummies: Verbeek, Table 3.11

Wages: Model with Gender Interactions
- Wage equation with interactions educ*male

RESET Test
- Test of the linear model $E\{y_i \mid x_i\} = x_i'\beta$ against misspecification of the functional form:
  - Null hypothesis: the linear model is the correct functional form
  - Test of H0: RESET test (Regression Equation Specification Error Test)
- Test idea: non-linear functions of $\hat{y}_i$, the fitted values from the linear model, e.g., $\hat{y}_i^2, \hat{y}_i^3, \dots$, do not improve the model fit under H0
- Test procedure: the linear model is extended by adding $\hat{y}_i^2, \hat{y}_i^3, \dots$
- F-test to decide whether powers of fitted values like $\hat{y}_i^2, \hat{y}_i^3, \dots$ contribute as additional regressors to explaining Y
- Maximal power Q of fitted values: typical choice is Q = 2 or Q = 3

Individual Wages: RESET Test
- The fitted models are (with l_x for log(x))
  wage_i = -2.046 + 1.406 male_i + 0.608 school_i  (A)
  l_wage_i = 0.119 + 0.260 male_i + 0.115 l_school_i  (B)
- Test of specification of the functional form with Q = 3
  - Model A: test statistic F(2, 3288) = 10.23, p-value = 3.723e-005
  - Model B: test statistic F(2, 3288) = 4.52, p-value = 0.011
- For both models the adequacy of the functional form is in doubt

Structural Break: Chow Test
- In a time-series context, coefficients of a model may change due to a major policy change, e.g., the oil price shock
- Modeling a process with structural break
  $E\{y_i \mid x_i\} = x_i'\beta + g_i x_i'\gamma$
  with dummy variable $g_i = 0$ before the break, $g_i = 1$ after the break
- Regressors $x_i$, coefficients $\beta$ before, $\beta + \gamma$ after the break
- Null hypothesis: no structural break, $\gamma = 0$
- Test procedure: fitting the extended model, F- (or t-) test of $\gamma = 0$
  $F = \frac{(S_r - S_u)/K}{S_u/(N - 2K)}$
  with $S_r$ ($S_u$): sum of squared residuals of the (un)restricted model
- Chow test for structural break or structural change

Chow Test: The Practice
- The test procedure is performed in the following steps
  - Fit the restricted model: $S_r$
  - Fit the extended model: $S_u$
  - Calculate F and the p-value from the F-distribution with K and N-2K d.f.
- Needs knowledge of the break point

Your Homework
1. Show that the OLS estimator b for β from $y_i = x_i'\beta + z_i'\delta + \varepsilon_i$ can be written as (a) $b = (X'X)^{-1} X'(y - Zd)$ with estimator d for δ, or as (b) $b = (X'M_z X)^{-1} X'M_z y$ with residual generating matrix $M_z = I - Z(Z'Z)^{-1}Z'$.
2. Use the data set "wages" of Verbeek for the following analyses:
   a. Estimate the model where the log hourly wages are explained by male, age and school with age = school + exper + 6; interpret the results.
   b. Repeat the analysis after adding four dummy variables for the educational levels 2 through 4 instead of the variable school; compare the models by using (a) the non-nested F-test and (b) the J-test; interpret the results.
   c. Use the PE-test to decide whether the model in b. (where log hourly wages are explained) or the same model but with levels of hourly wages as explained variable is to be preferred; interpret the result.
   d. Repeat a. with the interaction age*educ as added regressor; interpret the result.
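Before attempting the proof in exercise 1, the two representations can be checked numerically. The sketch below is only a sanity check under simplifying assumptions, not the requested derivation: X and Z each consist of a single column, there is no constant, the data are simulated, and all names (`three_ways`, `residual_on`, the coefficient values 1.5 and -0.8) are hypothetical.

```python
import random


def dot(u, v):
    """Inner product u'v of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))


def residual_on(z, v):
    """M_z v for a single regressor z: v minus its projection on z."""
    coef = dot(z, v) / dot(z, z)
    return [vi - coef * zi for vi, zi in zip(v, z)]


def three_ways(n=200, seed=1):
    """Return b from the full regression and from representations (a) and (b)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = [0.6 * xi + rng.gauss(0.0, 1.0) for xi in x]
    y = [1.5 * xi - 0.8 * zi + rng.gauss(0.0, 1.0) for xi, zi in zip(x, z)]

    # Full OLS of y on (x, z): solve the 2x2 normal equations by Cramer's rule
    xx, xz, zz = dot(x, x), dot(x, z), dot(z, z)
    xy, zy = dot(x, y), dot(z, y)
    det = xx * zz - xz * xz
    b = (xy * zz - xz * zy) / det
    d = (xx * zy - xz * xy) / det

    # (a) b = (X'X)^{-1} X'(y - Z d)
    b_a = dot(x, [yi - d * zi for yi, zi in zip(y, z)]) / xx
    # (b) b = (X'M_z X)^{-1} X'M_z y
    b_b = dot(x, residual_on(z, y)) / dot(x, residual_on(z, x))
    return b, b_a, b_b


if __name__ == "__main__":
    b, b_a, b_b = three_ways()
    print(f"full OLS: {b:.6f}  (a): {b_a:.6f}  (b): {b_b:.6f}")
```

The three values should agree up to floating-point error for any seed, which is what the exercise asks to establish in general.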