Exercise 6

Problem 1

The file stockton96.gdt contains 940 observations on home sales in Stockton, CA in 1996.

a) Use least squares to estimate a linear equation that relates house price PRICE to the size of the house in square feet SQFT and the age of the house in years AGE. Interpret all the estimates.

    ols price const age sqft

b) Suppose that you own two houses. One has 1400 square feet; the other has 1800 square feet. Both are 20 years old. What price do you estimate you will get for each house?

c) Test the hypothesis that the size and the age of the house are important determinants of its price (separately as well as jointly).

Both coefficients carry three stars in the Gretl output, so each is individually significant at the 1% level. They are also jointly significant according to the F-statistic in the same output.

d) Using the Breusch-Pagan test for heteroskedasticity, test whether the model satisfies the homoskedasticity assumption by using the command for the BP test in Gretl.

The software can run the test for you:

    modtest --breusch-pagan

[Gretl output]

According to the test, the LM statistic is very large (148) and the p-value is extremely small, so we reject H[0] that there is no heteroskedasticity.

You can also run the test manually, and it is important to be able to do so, because the built-in BP test uses all the explanatory variables in the regression at the same time. If you are asked to test for heteroskedasticity related to just one variable in a multivariate regression, the standard BP command will not do it (at least I could not find an appropriate command in Gretl; Stata has one). The manual procedure has several steps:

Step 1. Run the original regression.

    ols price const age sqft

Step 2. Generate the residuals and their squares.

    series resid=$uhat
    genr sq_resid=resid^2

Step 3. Run a regression of the squared residuals on the explanatory variable(s) of interest.

    ols sq_resid sqft age const

[Gretl output]

Step 4. Derive the LM test statistic by taking R^2 from the regression in step 3 and multiplying it by the number of observations. In this case, LM = 0.0337*940 = 31.68. This differs from the 148 reported above because n*R^2 is the Koenker form of the test, while modtest --breusch-pagan reports the original Breusch-Pagan form by default; adding the --robust option should give the n*R^2 version.

Step 5. Find the critical value in the chi-square table. With 2 degrees of freedom (two regressors in step 3) the 1% critical value is 9.21. Since 31.68 > 9.21, we again reject H[0].

e) Use the White test to test for heteroskedasticity.

The software can run the test for you. Don't forget to re-run the original regression before doing the test.

    modtest --white

[Gretl output]

According to the test, the LM statistic is large (35.25) and the p-value is extremely small, so we reject H[0] that there is no heteroskedasticity.

Manual version:

Step 1. Run the original regression.

    ols price const age sqft

Step 2. Generate the residuals and their squares.

    series resid=$uhat
    genr sq_resid=resid^2

Step 3. Generate the squares and the interaction term of the explanatory variables.

    genr sq_sqft=sqft^2
    genr sq_age=age^2
    genr agesqft=sqft*age

Step 4. Run a regression of the squared residuals on the explanatory variables, their squares and the interaction term.

    ols sq_resid const sqft age sq_sqft sq_age agesqft

[Gretl output]

Step 5. Derive the LM test statistic by taking R^2 from the regression in step 4 and multiplying it by the number of observations. In this case, LM = 0.03749*940 = 35.24, which matches the software version up to rounding.

Step 6. Find the critical value in the chi-square table. With 5 degrees of freedom (five regressors in step 4) the 1% critical value is 15.09. Since 35.24 > 15.09, we again reject H[0].

f) What do you conclude regarding the heteroskedasticity? Does your conclusion depend on the choice of test? Discuss also the drawbacks of the BP and White tests.
There is heteroskedasticity. Both the BP test and the White test reject H[0], so the conclusion does not depend on the choice of test.

A weakness of the BP test is that it assumes the heteroskedasticity is a linear function of the independent variables. Failing to find evidence of heteroskedasticity with the BP test does not rule out a nonlinear relationship between the independent variable(s) and the error variance.

A weakness of the White test is that with many variables, the number of regressors in the auxiliary regression (the original variables, their squares and all the interactions) can become very large.

g) Test the hypothesis that the size and the age of the house are important determinants of its price (separately as well as jointly). Hint: choose appropriate standard errors. Does your conclusion differ from part (c)?

    ols price const age sqft --robust

Compare the robust and non-robust standard errors and parameters. The parameter estimates did not change, while the standard errors increased. Still, the conclusions have not changed: the variables remain individually significant, and jointly significant based on the F-statistic.

Problem 2

Using the data in cps4_small.gdt, estimate the following wage equation with least squares and heteroskedasticity-robust standard errors:

    ln(WAGE) = β1 + β2*EDUC + β3*EXPER + β4*EXPER^2 + β5*(EXPER*EDUC) + e

(a) Report the results.

    genr exper2=exper^2
    genr experedu=exper*educ
    genr lnwage=ln(wage)
    ols lnwage educ exper exper2 experedu const --robust

[Gretl output]

(b) Add MARRIED to the equation and re-estimate. Holding education and experience constant, do married workers get higher wages? Using a 5% significance level, test a null hypothesis that wages of married workers are less than or equal to those of unmarried workers against the alternative that wages of married workers are higher.

    ols lnwage educ exper exper2 experedu married const --robust

[Gretl output]

The null and alternative hypotheses for testing whether married workers get higher wages are:

    H[0]: the coefficient on MARRIED is less than or equal to zero
    H[1]: the coefficient on MARRIED is greater than zero

The test value is 1.188; the one-tailed critical value at the 5% level of significance is 1.646. Since the test value is less than the critical value, we do not reject the null hypothesis at the 5% level. We conclude that there is insufficient evidence to show that wages of married workers are greater than those of unmarried workers.

(c) Plot the residuals from part (a) against the two values of MARRIED. Is there evidence of heteroskedasticity?

Re-run the regression from part (a) first, so that $uhat holds its residuals rather than those from part (b).

    series uhat=$uhat
    genr sq_uhat=uhat^2
    gnuplot uhat married

[Plot: uhat against married]

The residual plot suggests that the variance of wages for married workers is greater than that for unmarried workers. Thus, there is evidence of heteroskedasticity.

It probably makes more sense to plot the squared residuals against MARRIED, because the variance is a squared quantity. The plot above nevertheless shows how the dispersion of the data cloud changes with the explanatory variable.

    gnuplot sq_uhat married

[Plot: sq_uhat against married]

In this plot the fitted line is not horizontal, which again points to heteroskedasticity.

(d) Plot the least squares residuals against EDUC and against EXPER. What do they suggest?

[Plot: residuals against EDUC]

[Plot: residuals against EXPER]

Both residual plots exhibit a pattern in which the absolute magnitudes of the residuals tend to increase as the values of EDUC and EXPER increase, although for EXPER the increase is not very pronounced. Thus, the plots suggest there is heteroskedasticity with the variance dependent on EDUC and possibly EXPER. Again, it would be better to plot the squared residuals against the explanatory variables.

(e) Test for heteroskedasticity using a Breusch-Pagan test where the variance depends on EDUC, EXPER and MARRIED. What do you conclude at a 5% significance level?
The question asks about all the variables from the original regression rather than a subset of them. (The squares and interaction terms are separate regressors, but they are built from these same variables, so how to treat them depends on how you read the question.) We can therefore let the software compute the test automatically:

    modtest --breusch-pagan

[Gretl output]

The null hypothesis H[0] is that the error variance is constant. The alternative H[1] is that the error variance depends on one or more of EXPER, EDUC or MARRIED. The value of the test statistic is 26.1 with a p-value of 0.000085, which is far below 0.05, so we reject the null hypothesis and conclude that heteroskedasticity exists.

Feel free to try the manual method yourself, as well as the White test (done manually, setting up all the squares and interactions will be laborious).
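The manual method for part (e) can be sketched as a short Gretl script that follows the same steps as in Problem 1(d). This is only a sketch under stated assumptions, not the official solution: it assumes cps4_small.gdt can be found by Gretl, that the variables are named wage, educ, exper and married as in the commands above, and the series names uhat_b and sq_uhat_b are made up for illustration. It also restricts the auxiliary regression to EDUC, EXPER and MARRIED, so its n*R^2 statistic has 3 degrees of freedom and need not equal the 26.1 reported by modtest.

```hansl
# Manual Breusch-Pagan test (n*R^2 form) for Problem 2(e): a sketch
open cps4_small.gdt

# Rebuild the regressors of the wage equation
series lnwage = log(wage)
series exper2 = exper^2
series experedu = exper*educ

# Step 1: wage equation including MARRIED
ols lnwage const educ exper exper2 experedu married

# Step 2: residuals and their squares (hypothetical series names)
series uhat_b = $uhat
series sq_uhat_b = uhat_b^2

# Step 3: auxiliary regression on the variables named in the question
ols sq_uhat_b const educ exper married

# Step 4: LM = n * R^2 from the auxiliary regression
scalar LM = $nobs * $rsq

# Step 5: compare with the chi-square(3) critical value at the 5% level
scalar crit = critical(X, 3, 0.05)
scalar pv = pvalue(X, 3, LM)
printf "LM = %.3f, critical value = %.3f, p-value = %.6f\n", LM, crit, pv
```

If LM exceeds the critical value (equivalently, if the p-value is below 0.05), H[0] of constant error variance is rejected. To mimic the White test instead, the squares and cross-products of the regressors would be added to the auxiliary regression in step 3 and the degrees of freedom adjusted to match.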