Introductory Econometrics
Multicollinearity and Heteroskedasticity
Suggested Solution by Hieu Nguyen
Fall 2024

1. We estimate a linear regression model for the years 1972 to 1991:

   $$y_t = \beta_0 + \beta_1 x_{t1} + \beta_2 x_{t2} + \epsilon_t, \tag{1}$$

   where the $\epsilon_t$ are normally and independently distributed, but we suspect that the error term is heteroskedastic, with a variance that depends on $x_{t1}$. We estimate the following regression, where $e_t$ are the residuals from regression (1):

   $$e_t^2 = \delta_0 + \delta_1 x_{t1} + u_t. \tag{2}$$

   We find that $R^2$ for regression (2) is 0.201. Use these results to test for the presence of heteroskedasticity.

   Extract from the statistical table of the $\chi^2$ distribution (critical values by area under the right-hand tail):

   d.f.    0.05     0.025    0.01
   1       3.841    5.024    6.635
   2       5.991    7.378    9.210
   3       7.815    9.348    11.345
   4       9.488    11.143   13.277

2. Use the data file htv_selected.gdt to estimate the returns to education in the 'wage equation.'

   (a) Estimate the baseline model of the impact of education and experience on wages:

       $$\ln(\text{wage}_i) = \beta_0 + \beta_1 \text{educ}_i + \beta_2 \text{exper}_i + \epsilon_i.$$

       Interpret the estimated coefficient $\hat{\beta}_1$.

   (b) Re-estimate the model using robust standard errors and comment on the differences.

   (c) Test for heteroskedasticity in the model in part (a). Is it necessary to use robust standard errors in this case?

   (d) Perform RESET (specification test) and discuss the results.

   (e) Generate the variable $\text{exper}^2$. Why do we include this variable in the model, and what is the expected sign of its coefficient?

   (f) Estimate the model with a quadratic specification (polynomial functional form) of experience:

       $$\ln(\text{wage}_i) = \beta_0 + \beta_1 \text{educ}_i + \beta_2 \text{exper}_i + \beta_3 \text{exper}_i^2 + u_i.$$

       Comment on how and why the estimated coefficient $\hat{\beta}_2$ changed with respect to part (a). Did the estimated coefficient $\hat{\beta}_1$ change as well? Why or why not? Compare $R^2$ and $R^2_{\text{adj}}$ with the previous specification. Perform RESET again.

   (g) Find $\partial \ln(\text{wage}) / \partial \text{exper}$, which describes the marginal effect of a one-year increase in work experience on the wage. Compare the result with the marginal effect from the estimated model without $\text{exper}^2$.
   (h) Do you believe that the coefficient $\beta_1$ is correctly estimated? Is there any issue that could create a bias in this equation? If yes, how would you solve this problem? What is the expected sign of this bias?

   (i) In the dataset, there are two proxies for the inherent abilities and skills of the observed individuals, abil1 and abil2. Estimate the model with just one of them. Is there an impact on the coefficient $\hat{\beta}_1$? Does this suggest that there was likely a problem with bias in the model from part (f)? Estimate the model with both proxies and discuss the differences and potential multicollinearity. Which Classical Assumption might be violated in this case? How do we check this assumption?

   (j) Include in the model from part (f) the education of the mother and of the father of the observed individuals:

       $$\ln(\text{wage}_i) = \beta_0 + \beta_1 \text{educ}_i + \beta_2 \text{exper}_i + \beta_3 \text{exper}_i^2 + \beta_4 \text{motheduc}_i + \beta_5 \text{fatheduc}_i + v_i.$$

       i. What is the idea behind including these variables in the model?
       ii. Is there an impact on the estimated coefficient $\hat{\beta}_1$? Does this suggest that there was likely a problem with bias in the model from part (f)? Comment on the sign of this bias.
       iii. Are motheduc and fatheduc each individually significant? Are they jointly significant? Check for potential multicollinearity.
       iv. What happens if you exclude one of these variables from the regression? Which one would you keep?

   (k) Compare the final models from parts (i) and (j). Which is the better model (based on the dataset at hand)? Try RESET again to potentially support your answer.
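For Problem 1, one common way to carry out the test is the LM form of the Breusch-Pagan test, in which the statistic is the number of observations times the $R^2$ of the auxiliary regression of squared residuals. The sketch below assumes this LM form, assumes annual data so that 1972 to 1991 gives 20 observations, and uses one degree of freedom because the auxiliary regression has a single slope regressor. The function and variable names are illustrative, and the critical values are the 5% and 1% entries for 1 d.f. from the table in Problem 1.

```python
def lm_statistic(n_obs: int, r_squared: float) -> float:
    """LM statistic n * R^2 from the auxiliary regression of squared residuals."""
    return n_obs * r_squared


# Assumption: annual observations, so 1972..1991 inclusive gives 20 data points.
N_OBS = 1991 - 1972 + 1

# R^2 of the auxiliary regression, as reported in Problem 1.
R2_AUX = 0.201

# Right-tail chi-squared critical values for 1 d.f. (5% and 1% levels),
# taken from the table in Problem 1.
CRITICAL_CHI2_DF1 = {0.05: 3.841, 0.01: 6.635}

lm = lm_statistic(N_OBS, R2_AUX)

# Reject the null of homoskedasticity when LM exceeds the critical value.
decisions = {alpha: lm > crit for alpha, crit in CRITICAL_CHI2_DF1.items()}

for alpha, reject in decisions.items():
    verdict = "reject" if reject else "do not reject"
    print(f"alpha = {alpha}: LM = {lm:.3f}, "
          f"critical value = {CRITICAL_CHI2_DF1[alpha]}, {verdict} homoskedasticity")
```

Under these assumptions the statistic is about 4.02, which exceeds 3.841 but not 6.635, so homoskedasticity would be rejected at the 5% level but not at the 1% level.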