Introductory Econometrics
Multiple Hypothesis Testing
Suggested Solution by Hieu Nguyen, Fall 2024

1. The file wage.csv contains a cross-sectional dataset on 526 working individuals in the US for the year 1976. Using this labor market data, estimate a simple model describing the impact of years of education and work experience on hourly wage (in USD per hour):

    wage = β0 + β1·educ + β2·exper + ϵ.

(a) Import the data into Gretl from the .csv file. Carry out a basic inspection of the data (display values, inspect visually, compute descriptive statistics).
(b) Comment on the expected signs of the coefficients β1 and β2 first, and then estimate the model.
(c) Evaluate the statistical significance of β1 and β2 based on the Gretl output.
(d) How much of the variation in wage across these 526 individuals is explained by educ and exper? Explain.
(e) Estimate the model without exper as well, and compare R² and adjusted R². Which model is better? Why?
(f) Test the following hypotheses formally at the 5% significance level:
    (i) Education has a significant impact on wages.
    (ii) Work experience has a significantly positive impact on wages.
    (iii) The regression is overall significant.
(g) Set up a 90% confidence interval for β2 (and a 99% confidence interval for β1).
(h) How would the estimated coefficients, standard errors, and t-statistics differ if we transformed wage into monthly income and exper into decades? Explain.

Solution:
(a) To open the data (on macOS): File > Open data > User file > select *.csv as the file type > find wage.csv in your directory and select it > No > Close. On the lab computers (Windows): File > Open data > Import > text/CSV > comma (,) > find wage.csv in your directory and select it > No > close the Gretl info window. Alternatively, you can drag and drop the file directly into the Gretl window. For a basic data inspection, right-click a specific variable or use the View option in the Gretl menu.
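The same inspection can be done outside Gretl, for example in Python with pandas. A minimal sketch follows; the small inline sample is synthetic (hypothetical values standing in for wage.csv, which would normally be loaded with pd.read_csv("wage.csv")):

```python
# A minimal data-inspection sketch mirroring Gretl's "display values" and
# "summary statistics" options. The sample below is synthetic, not wage.csv.
import pandas as pd

data = pd.DataFrame({
    "wage":  [3.10, 3.24, 5.30, 6.00, 11.25],   # hourly wage, USD/hour
    "educ":  [11, 12, 12, 16, 18],              # years of education
    "exper": [2, 22, 2, 44, 7],                 # years of work experience
})

print(data)             # display values
print(data.describe())  # descriptive statistics: mean, std, min, quartiles, max
```

With the real file, `data = pd.read_csv("wage.csv")` replaces the inline DataFrame and the rest is unchanged.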
(b) Before estimation, we state our expectations about the signs of the coefficients (the intuition behind the 'wage equation'). Then follow the path in the Gretl menu: Model > Ordinary Least Squares > select wage as the dependent variable > select the independent variables > OK:

             Coefficient    Std. Error    t-ratio    p-value
    const     -3.39054      0.766566      -4.4230    0.0000
    educ       0.644272     0.0538061     11.9740    0.0000
    exper      0.0700954    0.0109776      6.3853    0.0000

(c) Both estimated regression coefficients have the expected signs. Moreover, using the Gretl-default two-sided t-test of H0: βi = 0 vs HA: βi ≠ 0 at the 5% significance level (critical value t_{523, 0.975} = 1.96, or the rule of thumb with 3) and the t-statistics (t-ratios) from the Gretl output, we strongly reject H0 for both coefficients, which are therefore statistically significant. Judging by the p-values, both coefficients would have been statistically significant even at the 1% significance level, as the *** markers in the Gretl output indicate.

(d) R² = 0.225, i.e., 22.5% of the variation in wage is explained by the variation in educ and exper; the remaining 77.5% of the variation in wage is left to other variables not included in the model.

(e) • Model with exper: R² = 0.225, adjusted R² = 0.222.
    • Model without exper: R² = 0.165, adjusted R² = 0.163.
The model with exper is better and will be used in the further analysis because:
    (a) both RHS variables have a sound theoretical economic motivation for inclusion;
    (b) both estimated regression coefficients have the expected signs and are statistically significant at the usual significance levels (individually, based on t-tests, as well as jointly, based on the F-test);
    (c) it explains more of the variation in the dependent variable based on adjusted R² (and, in fact, also based on R²).

(f) (i) This is an example of a two-sided t-test (because the focus is only on significance, not on the direction of the impact):

    H0: β1 = 0 vs HA: β1 ≠ 0  ⟹  t_{β1} = β̂1 / s.e.(β̂1) ~ t_{n−k−1}.

We simply compute from the regression output: t_{β1} = 0.644 / 0.054 ≈ 11.9.
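This t-ratio, together with the other two and the exact critical value, can be reproduced outside Gretl. A small sketch (scipy assumed available), using the coefficients and standard errors from the output in (b):

```python
# Reproduce the t-ratios and the two-sided 5% test from the reported output.
from scipy import stats

n, k = 526, 2                      # sample size, number of regressors
df = n - k - 1                     # 523 degrees of freedom

coefs = {"const": (-3.39054, 0.766566),
         "educ":  (0.644272, 0.0538061),
         "exper": (0.0700954, 0.0109776)}

crit = stats.t.ppf(0.975, df)      # two-sided 5% critical value, ~1.96

t_ratios = {name: b / se for name, (b, se) in coefs.items()}
rejected = {name: abs(t) > crit for name, t in t_ratios.items()}
print(t_ratios)
print(rejected)                    # H0: beta_i = 0 rejected for all three
```

The computed t-ratios match the Gretl column to rounding, and all three exceed the critical value in absolute terms.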
The critical value for the two-sided t-test is t_{n−k−1, 0.975} = t_{523, 0.975} = 1.96. We reject H0 if |t| > 1.96; otherwise we do not reject H0. Hence we reject H0 for the coefficient β1, which is thus statistically significant at the given significance level.

(ii) This is an example of a one-sided t-test (because the focus is also on the direction of the impact):

    H0: β2 ≤ 0 vs HA: β2 > 0  ⟹  t_{β2} = β̂2 / s.e.(β̂2) ~ t_{n−k−1}.

We simply compute from the regression output: t_{β2} = 0.070 / 0.011 ≈ 6.4. The critical value for the one-sided t-test is t_{n−k−1, 0.95} = t_{523, 0.95} = 1.645. We reject H0 if |t| > 1.645 and t has the sign implied by HA; otherwise we do not reject H0. Hence we reject H0 for the coefficient β2, which is thus statistically significantly positive at the given significance level.

(iii) Here we test the overall significance of the regression, i.e., we test the complete set of two joint hypotheses using an F-test:

    H0: β1 = 0 and β2 = 0 vs HA: β1 ≠ 0 or β2 ≠ 0.

We also need to estimate the restricted model wage = β0 + ϵ and compute, from the two regression outputs, the F-statistic (formula from lecture #5 slides):

    F = [(RSS_R − RSS_U)/J] / [RSS_U/(n−k−1)] = [(7160.4 − 5548.2)/2] / [5548.2/523] = 806.1 / 10.6 ≈ 76.05 ~ F_{J, n−k−1}.

The critical value for the F-test is F_{J, n−k−1, 0.95} = F_{2, 523, 0.95} ≈ 3. We reject H0 if F > 3; otherwise we do not reject H0. Hence we reject the joint H0 in favor of HA at the given significance level: the regression is overall statistically significant.

(g) Since β̂ / s.e.(β̂) ~ t_{n−k−1}, we derive the 90% confidence interval for β2 as:

    β̂2 ± t_{n−k−1, 1−α/2 = 0.95} · s.e.(β̂2) = 0.070 ± 1.645 · 0.011 = [0.052, 0.088].

Hence we are 90% confident that β2 ∈ [0.052, 0.088]. Similarly,

    β̂1 ± t_{n−k−1, 0.995} · s.e.(β̂1) = 0.644 ± 2.576 · 0.054 = [0.505, 0.783],

so we are 99% confident that β1 ∈ [0.505, 0.783].

(h) This is, in fact, just a linear transformation (multiplication/scaling) of the data by a constant; see seminar #4, exercise 2.
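The effect of such a scaling can be checked numerically on simulated data before summarizing it. A minimal sketch (the sample and coefficients below are synthetic, not the wage.csv estimates), converting hourly wage to monthly income (·20·8) and exper from years to decades (/10):

```python
# Verify how OLS coefficients rescale under linear transformations of the data.
import numpy as np

rng = np.random.default_rng(0)
n = 200
educ = rng.uniform(8, 18, n)               # synthetic years of education
exper = rng.uniform(0, 40, n)              # synthetic years of experience
wage = -3.4 + 0.64 * educ + 0.07 * exper + rng.normal(0, 3, n)

X = np.column_stack([np.ones(n), educ, exper])
beta = np.linalg.lstsq(X, wage, rcond=None)[0]

# Transform: hourly wage -> monthly income (20 days * 8 hours),
# exper in years -> exper in decades.
wage_m = wage * 20 * 8
X2 = np.column_stack([np.ones(n), educ, exper / 10])
beta2 = np.linalg.lstsq(X2, wage_m, rcond=None)[0]

print(beta2 / beta)   # expect [160, 160, 1600]
```

The refit reproduces the scaling rules exactly: the intercept and the educ coefficient are multiplied by 160, and the exper coefficient by 1600.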
Assuming 20 workdays per month and 8 work hours per day, the impact of the data transformation can be summarized as follows:
• β̂0 after the transformation = β̂0 · 20 · 8;
• β̂1 after the transformation = β̂1 · 20 · 8;
• β̂2 after the transformation = β̂2 · 20 · 8 · 10;
• the respective standard errors are scaled by the same factors as their coefficients;
• the t-statistics are not affected.

2. Answer the following questions about data on the sales prices of houses in the UK. The variables in this study are:
• PRICEi: sales price of house i;
• ASSESSi: assessed price of house i;
• LOTSIZEi: size of the lot (in square feet) for house i;
• BDRMSi: number of bedrooms in house i;
• BATHi: number of bathrooms in house i;
• OCEANi: a variable equal to 1 if house i is located within 10 miles of the ocean, 0 otherwise;
• URBANi: a variable equal to 1 if house i is located in an area classified as urban, 0 otherwise;
• LAKEi: a variable equal to 1 if house i is located within 10 miles of a lake, 0 otherwise;
• INTERCEPT: intercept in the model.
Table 1 lists the estimated coefficients, with standard errors in parentheses below them.
(a) Using the reported regressions, could you test whether the value of a house near water differs from the value of a house away from water at the 5% significance level, controlling for assessed value, lot size, and the number of bedrooms? If so, perform the test. If not, explain what results you would need to do the test.
(b) Could you test whether bathrooms change the house value, controlling for assessed value, lot size, and the number of bedrooms, at the 5% significance level? If so, perform the test. If not, explain what results you would need to do the test.
(c) Can you test whether the assessed value and the number of bedrooms are jointly significant, controlling for lot size? If yes, perform the test at the 5% significance level. If not, explain what you would need to perform this test.
(d) Could you test whether all 7 of the listed variables (excluding the intercept) are jointly significant at the 5% significance level? Be sure to state any assumptions you are making.

Table 1: Results of regressions. Dependent variable: PRICEi; n = 238; standard errors in parentheses. Columns (1)-(7) are seven separate specifications; for each variable, the estimates below are listed in the order of the columns in which that variable appears (variables missing from a column were excluded from that specification).

  ASSESSi:    0.90 (0.03), 0.90 (0.03), 0.91 (0.03), 0.90 (0.03), 0.89 (0.03), 0.90 (0.03), 0.90 (0.03)
  LOTSIZEi:   0.0035 (0.00002), 0.00059 (0.00002), 0.00059 (0.00002), 0.00057 (0.00002), 0.00058 (0.00002), 0.00059 (0.00002), 0.00060 (0.00002)
  BDRMSi:     11.5 (2.32), 9.74 (3.11), 7.65 (3.29), 8.74 (3.54), 10.43 (3.77)   [in five specifications]
  BATHi:      3.57 (2.24), 3.78 (1.11)   [in two specifications]
  OCEANi:     15.6 (11.43), 14.32 (5.21), 16.76 (4.32), 15.32 (4.98), 14.56 (7.01)   [in five specifications]
  URBANi:     9.54 (8.99), 10.29 (5.43), 12.32 (5.22)   [in three specifications]
  LAKEi:      11.36 (4.28), 12.87 (8.32), 11.98 (6.43)   [in three specifications]
  INTERCEPT:  261.9 (11.98), -38.91 (6.78), -40.30 (7.32), -43.21 (6.99), -36.54 (5.87), -42.37 (7.22), -38.44 (9.43)
  RSS:        145.69, 142.99, 136.66, 134.54, 135.38, 135.22, 136.54
  R²:         0.143, 0.159, 0.196, 0.209, 0.204, 0.205, 0.197

Solution:
(a) Here we test the joint significance of two coefficients, i.e., we test this (incomplete) set of two joint hypotheses using an F-test:

    H0: βOCEAN = 0 and βLAKE = 0 vs HA: βOCEAN ≠ 0 or βLAKE ≠ 0.

We have J = 2 (the number of restrictions), n = 238 (the sample size), and k = 5 (the number of independent variables in the unrestricted model):

    F = [(RSS_R − RSS_U)/J] / [RSS_U/(n−k−1)] ~ F_{J, n−k−1}.

Unfortunately, while we have the unrestricted model, column (6), Table 1 does not report the corresponding restricted model (PRICE regressed on ASSESS, LOTSIZE, and BDRMS only). We therefore cannot compute the F-statistic and cannot decide whether to reject H0. To perform the test, we would need the regression output of the restricted model, which gives RSS_R; we could then compute the F-statistic, find the F critical value, and compare the two.

(b) H0: βBATH = 0 vs HA: βBATH ≠ 0  ⟹  t_{βBATH} = β̂BATH / s.e.(β̂BATH) ~ t_{n−k−1}.
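For illustration, had the restricted output been reported, the test in (a) would be mechanical. A sketch follows, where rss_r = 140.0 is a purely hypothetical placeholder for the unreported restricted RSS (RSS_U = 135.22 is model (6) from Table 1):

```python
# Sketch of the joint F-test for OCEAN and LAKE, given both RSS values.
from scipy import stats

def f_test(rss_r, rss_u, J, n, k, alpha=0.05):
    """Return the F-statistic, the critical value, and the rejection decision."""
    F = ((rss_r - rss_u) / J) / (rss_u / (n - k - 1))
    crit = stats.f.ppf(1 - alpha, J, n - k - 1)
    return F, crit, F > crit

rss_u = 135.22   # model (6) in Table 1 (the unrestricted model)
rss_r = 140.0    # HYPOTHETICAL: the restricted RSS is not reported in Table 1
F, crit, reject = f_test(rss_r, rss_u, J=2, n=238, k=5)
print(F, crit, reject)
```

With the hypothetical RSS_R the decision would be to reject H0; with the real restricted output, only the rss_r value changes.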
This is a standard two-sided t-test; however, we cannot conduct it because Table 1 does not contain the model with only the 4 mentioned explanatory variables (ASSESS, LOTSIZE, BDRMS, BATH).

(c) Again, we test the joint significance of two coefficients, i.e., this (incomplete) set of two joint hypotheses:

    H0: βASSESS = 0 and βBDRMS = 0 vs HA: βASSESS ≠ 0 or βBDRMS ≠ 0,

using

    F = [(RSS_R − RSS_U)/J] / [RSS_U/(n−k−1)] ~ F_{J, n−k−1}, with critical value F_{2, 234, 0.95} ≈ 3.

Unfortunately, Table 1 reports neither the unrestricted model (PRICE on ASSESS, LOTSIZE, and BDRMS) nor the restricted model (PRICE on LOTSIZE only), so we have neither RSS_U nor RSS_R and cannot compute the F-statistic or decide whether to reject the null hypothesis. To perform the test, we would need the regression outputs of both models to obtain RSS_R and RSS_U, compute the F-statistic, and compare it with the F critical value.

(d) This is another example of testing the overall significance of the regression, because we consider the complete set of all 7 variables. Although Table 1 does not include the restricted model PRICEi = β0 + ϵi, we can use the fact that a regression on a constant only has R² = 0. The unrestricted model is (4), the restricted model is PRICE = β0 + ϵ, and the hypotheses are H0: all slope coefficients equal 0 vs HA: at least one β ≠ 0. Then

    F = [(R²_U − R²_R)/J] / [(1 − R²_U)/(n−k−1)] = [(0.209 − 0)/7] / [(1 − 0.209)/(238 − 8)] ≈ 8.7 > F_{7, 230, 0.95} ≈ 2.05.

Hence we reject the joint H0 in favor of HA at the given significance level and conclude that all 7 of the listed variables are jointly significant. The assumption under which we can compute the F-statistic from R²s instead of RSSs is that TSS_U = TSS_R, i.e., that the Total Sum of Squares is the same in the unrestricted and the restricted model.
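The R²-form calculation in (d) can be verified directly. A short sketch with the Table 1 numbers (scipy assumed available for the exact critical value):

```python
# Overall-significance F-test computed from R^2 (valid when TSS_U = TSS_R).
from scipy import stats

r2_u, r2_r = 0.209, 0.0   # model (4) vs the intercept-only model
n, k = 238, 7
J = 7                      # all slope coefficients restricted to zero

F = ((r2_u - r2_r) / J) / ((1 - r2_u) / (n - k - 1))
crit = stats.f.ppf(0.95, J, n - k - 1)
print(F, crit)             # F ~ 8.7, critical value ~ 2.05
```

Since F exceeds the critical value by a wide margin, the rejection of the joint H0 is clear-cut.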
Since TSS = Σ_{i=1}^{n} (yi − ȳ)², we can safely assume that this condition is fulfilled in our case: both models use the same dependent variable (PRICE), so we have the same observations yi and hence the same ȳ.
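This equivalence can also be demonstrated numerically: on any dataset, the RSS form and the R² form of the F-statistic coincide when both models share the same dependent variable. A sketch on synthetic data:

```python
# Numerical check that the R^2 form and the RSS form of the F-statistic agree
# when TSS_U = TSS_R (same dependent variable in both models).
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 0.3 * x2 + rng.normal(size=n)   # synthetic data

def rss(X, y):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return resid @ resid

tss = np.sum((y - y.mean()) ** 2)
X_u = np.column_stack([np.ones(n), x1, x2])   # unrestricted: intercept, x1, x2
X_r = np.ones((n, 1))                         # restricted: intercept only

rss_u, rss_r = rss(X_u, y), rss(X_r, y)
r2_u, r2_r = 1 - rss_u / tss, 1 - rss_r / tss   # r2_r is 0 here

J, k = 2, 2
F_rss = ((rss_r - rss_u) / J) / (rss_u / (n - k - 1))
F_r2 = ((r2_u - r2_r) / J) / ((1 - r2_u) / (n - k - 1))
print(F_rss, F_r2)    # identical, since TSS cancels out of the R^2 form
```

Algebraically, R²_R − R²_U = (RSS_U − RSS_R)/TSS and 1 − R²_U = RSS_U/TSS, so the common TSS cancels and the two forms are the same statistic.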