Exercise 6


Problem 1

The file stockton96.gdt contains 940 observations on home sales in Stockton, CA in 1996.


a)     Use least squares to estimate a linear equation that relates house price PRICE to the size
of the house in square feet SQFT and the age of the house in years AGE. Interpret all the
estimates.

ols price const age sqft

b)     Suppose that you own two houses. One has 1400 square feet; the other has 1800 square feet.
Both are 20 years old. What price do you estimate you will get for each house?
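One way to get the two predictions is to plug the values into the fitted equation using Gretl's coefficient accessors. This is only a minimal sketch: it assumes stockton96.gdt is in the working directory and that the series are named price, sqft and age as in the command above. The scalar names p1400 and p1800 are illustrative.

```hansl
# assumption: stockton96.gdt is in the current working directory
open stockton96.gdt
ols price const age sqft --quiet

# predicted price = b_const + b_age*AGE + b_sqft*SQFT, with AGE fixed at 20
scalar p1400 = $coeff(const) + $coeff(age)*20 + $coeff(sqft)*1400
scalar p1800 = $coeff(const) + $coeff(age)*20 + $coeff(sqft)*1800

printf "1400 sq ft, 20 years old: %.2f\n", p1400
printf "1800 sq ft, 20 years old: %.2f\n", p1800
```

Because age is held fixed, the two predictions differ by exactly 400 times the SQFT coefficient.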


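c)     Test the hypothesis that the size and the age of the house are important determinants of its
price (separately as well as jointly).

Both coefficients have three stars in the Gretl output, so each is individually significant at the
1% level. The F-statistic reported in the same output shows that they are also jointly significant.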
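The joint test can also be requested explicitly rather than read off the regression output. The sketch below uses the omit command with the --test-only option, which reports the test for dropping both variables without replacing the current model. It assumes the data file is in the working directory.

```hansl
# assumption: stockton96.gdt is in the current working directory
open stockton96.gdt
ols price const age sqft

# joint test of H0: the coefficients on age and sqft are both zero;
# --test-only leaves the estimated model in place
omit age sqft --test-only
```

Since age and sqft are the only slope variables, this test should agree with the overall F-statistic printed with the regression.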

d)     Using the Breusch-Pagan test for heteroscedasticity, test whether the model satisfies the
homoscedasticity assumption by using the command for the BP test in Gretl.

The simplest option is to let the software run the test, using the command

modtest --breusch-pagan

According to the test, the LM test statistic is very large (148) and the p-value is extremely
small. We therefore reject the H[0] hypothesis that there is no heteroskedasticity.

You can also do the test manually, and it is important to be able to do so, because the BP test in
the software tests for heteroskedasticity with respect to all the explanatory variables in your
regression at the same time. If you are asked to test for heteroskedasticity with respect to just
one variable in a multivariate regression, the standard BP command will not do it (at least I could
not find an appropriate command in Gretl; Stata has one). The manual version takes several steps:

Step 1. Run original regression

ols price const age sqft

Step 2. Save the residuals and generate their squares

series resid=$uhat

genr sq_resid=resid^2

Step 3. Run regression of squared residuals on the explanatory variable(s) of interest

ols sq_resid sqft age const




Step 4. Derive LM test statistic by taking R^2 from the regression in step 3 and multiplying it by
the number of observations

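In this case, LM = 0.0337*940 = 31.68. This is not the 148 reported by modtest above: by default
Gretl reports the original Breusch-Pagan statistic (half the explained sum of squares from a
regression of the scaled squared residuals), whereas n*R^2 is Koenker's robust variant, which
modtest --breusch-pagan reports when the --robust option is added.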

Step 5. Find the critical value in the χ² distribution table. With 2 degrees of freedom (one for
each explanatory variable in step 3) it is 9.21 at the 1% significance level. Since 31.68 > 9.21,
we again reject H[0].
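The manual steps above can be collected into one short script. This is a sketch rather than the only way to do it: it assumes the data file is in the working directory, relies on the $trsq accessor (n times R^2 from the last model) and on the critical() and pvalue() functions, and the scalar names LM, k, crit and p_lm are illustrative.

```hansl
# assumption: stockton96.gdt is in the current working directory
open stockton96.gdt

# Steps 1-2: original regression and squared residuals
ols price const age sqft --quiet
series sq_resid = $uhat^2

# Step 3: auxiliary regression; list only the variable(s) you want to test
ols sq_resid const sqft age --quiet

# Steps 4-5: LM = n*R^2, compared with the chi-square critical value
scalar LM = $trsq
scalar k = 2                          # number of slope terms in the auxiliary regression
scalar crit = critical(X, k, 0.01)    # upper 1% point of chi-square(k)
scalar p_lm = pvalue(X, k, LM)
printf "LM = %.2f, critical value = %.2f, p-value = %.4g\n", LM, crit, p_lm
```

To test for heteroskedasticity with respect to a single variable, keep only that variable in the auxiliary regression and set k to 1.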

e)     Use the White test to test for heteroskedasticity.


Again, the simplest option is to let the software run the test, using the command

modtest --white

Don't forget to re-run the original regression before doing the test, because modtest applies to
the last model estimated.


According to the test, the LM test statistic is large (35.25) and the p-value is extremely
small. We therefore reject the H[0] hypothesis that there is no heteroskedasticity.


Manual version:


Step 1. Run original regression

ols price const age sqft

Step 2. Save the residuals and generate their squares

series resid=$uhat

genr sq_resid=resid^2

Step 3. Generate squares and interaction terms of the explanatory variables

genr sq_sqft=sqft^2

genr sq_age=age^2

genr agesqft=sqft*age

Step 4. Run regression of squared residuals on the explanatory variable(s), their squared terms and
the interaction terms


ols sq_resid const sqft age sq_sqft sq_age agesqft

Step 5. Derive LM test statistic by taking R^2 from the regression in step 4 and multiplying it by
the number of observations

In this case, LM = 0.03749*940 = 35.24 (just like in the software version, up to rounding).

Step 6. Find the critical value in the χ² distribution table. With 5 degrees of freedom (two
variables, two squares and one interaction term) it is 15.09 at the 1% significance level. Since
35.24 > 15.09, we again reject H[0].
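The manual White test can likewise be written as a single script. As before, this is a sketch that assumes the data file is in the working directory. It uses the $trsq accessor for n*R^2 and critical() for the χ² critical value, and the scalar names LM and crit are illustrative.

```hansl
# assumption: stockton96.gdt is in the current working directory
open stockton96.gdt

# Steps 1-2: original regression and squared residuals
ols price const age sqft --quiet
series sq_resid = $uhat^2

# Step 3: squares and interaction of the explanatory variables
series sq_sqft = sqft^2
series sq_age = age^2
series agesqft = sqft*age

# Step 4: auxiliary regression with 5 slope terms
ols sq_resid const sqft age sq_sqft sq_age agesqft --quiet

# Steps 5-6: LM = n*R^2 against the chi-square(5) critical value at 1%
scalar LM = $trsq
scalar crit = critical(X, 5, 0.01)
printf "White LM = %.2f, critical value = %.2f\n", LM, crit
```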


f)       What do you conclude regarding the heteroskedasticity? Does your conclusion depend on the
choice of a specific test? Discuss also drawbacks of the BP and White tests.

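There is heteroskedasticity. Both the BP test and the White test reject H[0] at the 1% level, so
the conclusion does not depend on which test is used.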

A weakness of the BP test is that it assumes the heteroskedasticity is a linear function of the
independent variables. Failing to find evidence of heteroskedasticity with the BP test does not
rule out a nonlinear relationship between the independent variable(s) and the error variance.

A weakness of the White test is that with many explanatory variables, the number of regressors in
the auxiliary regression (original variables, squared variables and interactions) can become quite
high, which uses up degrees of freedom.

g)     Test the hypothesis that the size and the age of the house are important determinants of its
price (separately as well as jointly). Hint: choose appropriate standard errors. Does your
conclusion differ from part (c)?

ols price const age sqft --robust

Compare the robust and non-robust standard errors and coefficient estimates. You can see that the
coefficients did not change, while the standard errors increased. Still, the conclusions from part
(c) have not changed, based on the F-statistic.
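To make the comparison easier, the two sets of standard errors can be put side by side. The sketch below is one possible way to do it. It assumes the data file is in the working directory, and the matrix names b, se_ols, se_rob and cmp are illustrative.

```hansl
# assumption: stockton96.gdt is in the current working directory
open stockton96.gdt

# conventional standard errors
ols price const age sqft --quiet
matrix b = $coeff
matrix se_ols = $stderr

# heteroskedasticity-robust standard errors (same coefficients)
ols price const age sqft --robust --quiet
matrix se_rob = $stderr

# columns: coefficient, conventional s.e., robust s.e.
# rows follow the regressor order: const, age, sqft
matrix cmp = b ~ se_ols ~ se_rob
print cmp

# joint test on the robust model; it should use the robust covariance matrix
omit age sqft --test-only
```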

Problem 2


Using the data in cps4_small.gdt, estimate the following wage equation with least squares and
heteroskedasticity-robust standard errors:

ln(WAGE) = β1 + β2*EDUC + β3*EXPER + β4*EXPER^2 + β5*(EXPER*EDUC) + e


(a)    Report the results.

genr exper2=exper^2

genr experedu=exper*educ

genr lnwage=ln(wage)

ols lnwage educ exper exper2 experedu const --robust


(b)   Add MARRIED to the equation and re-estimate. Holding education and experience constant, do
married workers get higher wages? Using a 5% significance level, test a null hypothesis that wages
of married workers are less than or equal to those of unmarried workers against the alternative
that wages of married workers are higher.

The null and alternative hypotheses for testing whether married workers get higher wages are
H[0]: β_M ≤ 0 against H[1]: β_M > 0, where β_M denotes the coefficient on MARRIED.


The value of the t-statistic is 1.188, and the one-tailed critical value at the 5% level of
significance is 1.646. Since the test value is less than the critical value, we do not reject the
null hypothesis at the 5% level. We conclude that there is insufficient evidence to show that wages
of married workers are greater than those of unmarried workers.
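The regression and the one-tailed test for part (b) can be sketched as follows. The script assumes cps4_small.gdt is in the working directory and that the series are named wage, educ, exper and married. The scalar names tstat and tcrit are illustrative.

```hansl
# assumption: cps4_small.gdt is in the current working directory
open cps4_small.gdt
series lnwage = log(wage)
series exper2 = exper^2
series experedu = exper*educ

# wage equation with MARRIED added, robust standard errors
ols lnwage const educ exper exper2 experedu married --robust

# one-tailed test of H0: coefficient on married <= 0 against H1: > 0
scalar tstat = $coeff(married) / $stderr(married)
scalar tcrit = critical(t, $df, 0.05)    # upper 5% point of t with $df degrees of freedom
printf "t = %.3f, critical value = %.3f\n", tstat, tcrit
```

H[0] is rejected only if tstat is larger than tcrit.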

(c)    Plot the residuals from part (a) against the two values of MARRIED. Is there evidence of
heteroskedasticity?


series uhat=$uhat

genr sq_uhat=uhat^2

gnuplot uhat married


The residual plot suggests that the residual variance for married workers is greater than that for
unmarried workers. Thus, there is evidence of heteroskedasticity.

It probably makes more sense to plot the squared residuals against the married variable, because
the variance is the expected value of the squared error. However, the figure above still shows how
the dispersion of the data cloud changes with the explanatory variable. In the plot of squared
residuals, the fitted line is not horizontal, which again points to heteroskedasticity.


gnuplot sq_uhat married




(d) Plot the least squares residuals against EDUC and against EXPER. What do they suggest?



Both residual plots exhibit a pattern in which the absolute magnitudes of the residuals tend to
increase as the values of EDUC and EXPER increase, although for EXPER the increase is not very
pronounced. Thus, the plots suggest there is heteroskedasticity with the variance dependent on EDUC
and possibly EXPER. Again, it would be better to plot the squared residuals against the explanatory
variables.
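The plots for part (d), including the squared-residual versions, can be produced along these lines. This is a sketch that assumes cps4_small.gdt is in the working directory. The series names res_a and sq_res_a are illustrative and refer to the residuals of the part (a) model.

```hansl
# assumption: cps4_small.gdt is in the current working directory
open cps4_small.gdt
series lnwage = log(wage)
series exper2 = exper^2
series experedu = exper*educ

# part (a) model; residuals and their squares
ols lnwage const educ exper exper2 experedu --robust --quiet
series res_a = $uhat
series sq_res_a = res_a^2

# residuals against EDUC and EXPER
gnuplot res_a educ --output=display
gnuplot res_a exper --output=display

# squared residuals against the same variables
gnuplot sq_res_a educ --output=display
gnuplot sq_res_a exper --output=display
```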


(e) Test for heteroskedasticity using a Breusch-Pagan test where the variance depends on EDUC,
EXPER and MARRIED. What do you conclude at a 5% significance level?

Since this question asks for a test on the variables from the original regression rather than a
subset of them, we can let the software calculate the test automatically. (The squares and
interaction terms are separate regressors derived from these variables, so whether they belong in
the test depends on how you understand the question.)

modtest --breusch-pagan



The null hypothesis H[0] is that the error variance is constant (homoskedasticity). The alternative
H[1] is that it is not constant, implying that the error variance depends on one or more of EXPER,
EDUC or MARRIED. The value of the test statistic is 26.1, with a p-value of 0.000085. Since the
p-value is below 0.05, we reject the null hypothesis and conclude that heteroskedasticity exists.
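If the question is read as asking for a variance function in EDUC, EXPER and MARRIED only, the manual n*R^2 version of the test can be sketched as below. It assumes cps4_small.gdt is in the working directory. The scalar names LM, crit and p_lm are illustrative, and the result need not equal the modtest figure, which is based on all regressors of the model.

```hansl
# assumption: cps4_small.gdt is in the current working directory
open cps4_small.gdt
series lnwage = log(wage)
series exper2 = exper^2
series experedu = exper*educ

# model from part (b) and its squared residuals
ols lnwage const educ exper exper2 experedu married --quiet
series sq_uhat = $uhat^2

# auxiliary regression on the three variables named in the question
ols sq_uhat const educ exper married --quiet

# LM = n*R^2 against the chi-square(3) critical value at 5% (about 7.81)
scalar LM = $trsq
scalar crit = critical(X, 3, 0.05)
scalar p_lm = pvalue(X, 3, LM)
printf "LM = %.2f, critical value = %.2f, p-value = %.4g\n", LM, crit, p_lm
```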


Feel free to try the manual method yourself, as well as the White test (doing the White test
manually is tedious here because of all the squares and interactions).