Econometrics
F-Test
Omitted Variables
Nonlinear specifications and dummy variables
Anna Donina
Lecture 5
TESTING MULTIPLE HYPOTHESES REVISITED
• Suppose we have a model
yi = β0 + β1xi1 + β2xi2 + β3xi3 + εi
• Suppose we want to test multiple linear hypotheses in
this model
• For example, we want to see if the following restrictions
on coefficients hold jointly:
β1 + β2 = 1 and β3 = 0
• We cannot use a t-test in this case (a t-test can be used only
for one hypothesis at a time)
• We will use an F-test
RESTRICTED VS. UNRESTRICTED MODEL
• We can reformulate the model by plugging in the restrictions as if
they were true (model under H0)
• We call this model the restricted model, as opposed to the unrestricted
model
• The unrestricted model is
yi = β0 + β1xi1 + β2xi2 + β3xi3 + εi
• Substituting β2 = 1 − β1 and β3 = 0, the restricted model can be
derived to have the following form:
$$y_i^* = \beta_0 + \beta_1 x_i^* + \varepsilon_i,$$
where $y_i^* = y_i - x_{i2}$ and $x_i^* = x_{i1} - x_{i2}$
IDEA OF THE F-TEST
• If the restrictions are true, then the restricted model fits the
data in the same way as the unrestricted model
▪ residuals are nearly the same
• If the restrictions are false, then the restricted model fits the
data poorly
▪ residuals from the restricted model are much larger than
those from the unrestricted model
• The idea is thus to compare the residuals from the two
models
IDEA OF THE F-TEST
How to compare residuals in the two models?
▪ Calculate the sum of squared residuals in the two models
▪ Test if the difference between the two sums is equal to zero
(statistically)
▪ H0: the difference is zero (residuals in the two models are
the same, restrictions hold)
▪ HA: the difference is positive (residuals in the restricted
model are bigger, restrictions do not hold)
F-TEST
The test statistic is defined as
$$F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)} \sim F_{q,\,n-k-1} \quad \text{under } H_0,$$
where:
SSR_r . . . sum of squared residuals from the restricted model
SSR_ur . . . sum of squared residuals from the unrestricted model
q . . . number of restrictions
n . . . number of observations
k . . . number of explanatory variables in the unrestricted model
(slope coefficients, not counting the intercept)
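A minimal sketch of the F-test for the restrictions β1 + β2 = 1 and β3 = 0, assuming NumPy is available. The data are simulated for illustration only, and the helper `ssr` is a hypothetical name, not part of the lecture.

```python
import numpy as np

def ssr(y, X):
    """Sum of squared OLS residuals from regressing y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

# Simulated data (assumption): H0 holds, since 0.4 + 0.6 = 1 and beta3 = 0.
rng = np.random.default_rng(0)
n, k, q = 200, 3, 2          # observations, regressors, restrictions
x1, x2, x3 = rng.normal(size=(3, n))
y = 2.0 + 0.4 * x1 + 0.6 * x2 + 0.0 * x3 + rng.normal(size=n)

ones = np.ones(n)
# Unrestricted model: y on a constant, x1, x2, x3
ssr_ur = ssr(y, np.column_stack([ones, x1, x2, x3]))
# Restricted model: y* = y - x2 on a constant and x* = x1 - x2
ssr_r = ssr(y - x2, np.column_stack([ones, x1 - x2]))

f_stat = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))
print(f"SSR_r = {ssr_r:.2f}, SSR_ur = {ssr_ur:.2f}, F = {f_stat:.3f}")
```

The statistic is then compared with a critical value of the F distribution with q and n − k − 1 degrees of freedom; if SciPy is installed, `scipy.stats.f.sf(f_stat, q, n - k - 1)` gives the p-value.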
GOODNESS OF FIT MEASURE
• We know that education and experience have a significant
influence on wages
• But how important are they in determining wages?
• How much of the difference in wages between people is
explained by differences in education and in experience?
• How well does variation in the independent variable(s)
explain variation in the dependent variable?
• These are the questions answered by the goodness of fit
measure - R²
TOTAL AND EXPLAINED VARIATION
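Total variation in the dependent variable:
$$\sum_{i=1}^{n} (y_i - \bar{y})^2$$
Predicted value of the dependent variable = part that is
explained by independent variables:
$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$$
(case of regression line - for simplicity of notation)
Explained variation in the dependent variable:
$$\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$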
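GOODNESS OF FIT - R²
Denote:
$$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 \quad \text{(total sum of squares)}$$
$$SSE = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \quad \text{(explained sum of squares)}$$
Define the measure of the goodness of fit:
$$R^2 = \frac{SSE}{SST} = \frac{\text{Explained variation in } y}{\text{Total variation in } y}$$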
GOODNESS OF FIT - R²
In models with an intercept estimated by OLS: 0 ≤ R² ≤ 1
• R² tells us what percentage of the total variation in the
dependent variable is explained by the variation in
the independent variable(s)
▪ R² = 0.3 means that the independent variables can explain
30% of the variation in the dependent variable
• Higher R² means better fit of the regression model (not
necessarily a better model!)
DECOMPOSING THE VARIANCE
For models with intercept, R² can be rewritten using the
decomposition of variance.
Variance decomposition:
$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} e_i^2$$
VARIANCE DECOMPOSITION AND R²
Variance decomposition: SST = SSE + SSR
Intuition: total variation can be divided between the
explained variation and the unexplained variation
▪ the observed value $y_i$ is the sum of the estimated (explained) value $\hat{y}_i$ and
the residual $e_i$ (unexplained part)
We can rewrite R²:
$$R^2 = \frac{SSE}{SST} = \frac{SST - SSR}{SST} = 1 - \frac{SSR}{SST}$$
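The decomposition can be checked numerically. The sketch below assumes NumPy and uses made-up data; it fits a regression line with an intercept and confirms that both expressions for R² agree.

```python
import numpy as np

# Illustrative data (made up for this sketch)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1, 5.8])

X = np.column_stack([np.ones_like(x), x])       # intercept + one regressor
beta, *_ = np.linalg.lstsq(X, y, rcond=None)    # OLS coefficients
y_hat = X @ beta                                # explained part
e = y - y_hat                                   # residuals (unexplained part)

sst = float(np.sum((y - y.mean()) ** 2))        # total variation
sse = float(np.sum((y_hat - y.mean()) ** 2))    # explained variation
ssr = float(np.sum(e ** 2))                     # sum of squared residuals

r2_explained = sse / sst
r2_residual = 1 - ssr / sst
print(f"SST = {sst:.4f}, SSE + SSR = {sse + ssr:.4f}, R2 = {r2_explained:.4f}")
```

Dropping the column of ones from `X` generally breaks the equality SST = SSE + SSR, which is why the rewriting is stated for models with an intercept.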
ADJUSTED R²
• The sum of squared residuals (SSR) never increases (and typically
decreases) when additional explanatory variables are introduced in the
model, whereas the total sum of squares (SST) remains the
same
▪ $R^2 = 1 - \frac{SSR}{SST}$ therefore increases (or at least does not fall) if we add explanatory variables
▪ Models with more variables automatically have a better fit.
• To deal with this problem, we define the adjusted R²:
$$R^2_{adj} = 1 - \frac{SSR/(n-k-1)}{SST/(n-1)} \le R^2$$
(k is the number of explanatory variables, not counting the intercept)
• This measure introduces a “punishment” for including more
explanatory variables
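A short sketch of the penalty at work. The numbers are hypothetical and `adjusted_r2` is an illustrative helper name: adding a third regressor lowers SSR slightly, so R² rises while adjusted R² falls.

```python
def adjusted_r2(ssr, sst, n, k):
    """Adjusted R^2 for n observations and k explanatory variables
    (the intercept is not counted in k)."""
    return 1 - (ssr / (n - k - 1)) / (sst / (n - 1))

# Hypothetical numbers: the extra regressor reduces SSR only from 40.0 to 39.5
sst, n = 100.0, 30
r2_small = 1 - 40.0 / sst                    # model with k = 2
r2_big = 1 - 39.5 / sst                      # model with k = 3
adj_small = adjusted_r2(40.0, sst, n, 2)
adj_big = adjusted_r2(39.5, sst, n, 3)

print(f"R2:     {r2_small:.4f} -> {r2_big:.4f}")
print(f"adj R2: {adj_small:.4f} -> {adj_big:.4f}")
```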
OMITTED VARIABLES
We omit a variable when we
▪ forget to include it
▪ do not have data for it
This misspecification results in
▪ not having the coefficient for this variable
▪ biasing estimated coefficients of other variables in the
equation → omitted variable bias
OMITTED VARIABLES
• Where does the omitted variable bias come from?
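• True model:
$$y_i = \beta x_i + \gamma z_i + u_i$$
• Model as it looks when we omit variable z:
$$y_i = \beta x_i + \tilde{u}_i$$
implying
$$\tilde{u}_i = \gamma z_i + u_i$$
• We assume that $\mathrm{Cov}(\tilde{u}_i, x_i) = 0$, but (using $\mathrm{Cov}(u_i, x_i) = 0$):
$$\mathrm{Cov}(\tilde{u}_i, x_i) = \mathrm{Cov}(\gamma z_i + u_i, x_i) = \gamma\,\mathrm{Cov}(z_i, x_i) \neq 0$$
whenever γ ≠ 0 and z is correlated with x
• The classical assumption is violated ⇒
biased (and inconsistent) estimate!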
OMITTED VARIABLES
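For the model with the omitted variable, the expected value of the OLS
estimate of β is β + γα, so the bias equals γα:
▪ Coefficients β and γ are from the true model
$$y_i = \beta x_i + \gamma z_i + u_i$$
▪ Coefficient α is from a regression of z on x, i.e.
$$z_i = \alpha x_i + e_i$$
The bias is zero if γ = 0 or α = 0 (not likely to happen)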
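The bias formula has an exact in-sample counterpart, sketched below under the assumption that NumPy is available. The data are simulated and the coefficient values are hypothetical. The estimate from the regression that omits z equals the estimate of β from the full regression plus the estimate of γ times the estimated α.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
beta, gamma = 1.0, 2.0                    # hypothetical true coefficients
x = rng.normal(size=n)
z = 0.5 * x + rng.normal(size=n)          # z is correlated with x (alpha = 0.5)
y = beta * x + gamma * z + rng.normal(size=n)

# Full regression (no intercept, as in the model above): y on x and z
b_long, g_long = np.linalg.lstsq(np.column_stack([x, z]), y, rcond=None)[0]
# Regression with z omitted: y on x only
b_short = (x @ y) / (x @ x)
# Auxiliary regression: z on x
alpha_hat = (x @ z) / (x @ x)

print(f"b_short = {b_short:.4f}")
print(f"b_long + g_long * alpha_hat = {b_long + g_long * alpha_hat:.4f}")
```

With γ and α both positive in this simulation, the coefficient on x is overstated when z is left out.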
OMITTED VARIABLES
Intuitive explanation:
▪ if we leave out an important variable from the regression
(γ ≠ 0), coefficients of other variables are biased unless the
omitted variable is uncorrelated with all included
independent variables (α = 0)
▪ the included variables pick up some of the effect of the
omitted variable (if they are correlated), and the
coefficients of included variables thus change causing the
bias
Example: what would happen if you estimated a
production function with capital only and omitted labor?
OMITTED VARIABLES
Example: estimating the demand for chicken meat in the US
Yt . . . per capita chicken consumption
PCt . . . price of chicken
PBt . . . price of beef
YDt . . . per capita disposable income
OMITTED VARIABLES
When we omit the price of beef:
R² = 0.895, n = 44
Compare to the true model:
R² = 0.986, n = 44
We observe positive bias in the coefficient of PC (was it expected?)
OMITTED VARIABLES
Determining the direction of bias: bias = γ · α
▪ γ is the coefficient of the omitted variable in the true model; its
sign reflects the relationship between the omitted variable and
the dependent variable (the price of beef and chicken
consumption)
▪ γ is likely to be positive
▪ α is the coefficient from regressing the omitted variable
on the included independent variable; its sign reflects their
correlation (the price of beef and the price of chicken)
▪ α is likely to be positive
Conclusion: Bias in the coefficient of the price of chicken is
likely to be positive if we omit the price of beef from the
equation.
OMITTED VARIABLES
• In reality, we usually do not have the true model to
compare with
▪ Because we do not know what the true model is
▪ Because we do not have data for some important variable
• We can often recognize the bias if we obtain some
unexpected results
• We can prevent omitting variables by relying on the
theory
• If we cannot prevent omitting variables, we can at least
determine in what way this biases our estimates
IRRELEVANT VARIABLES
A second type of specification error is including a variable
that does not belong to the model
This misspecification
▪ Does not cause bias
▪ But it increases the variance of the estimated coefficients
of the included variables
IRRELEVANT VARIABLES
• True model:
$$y_i = \beta x_i + u_i \tag{1}$$
• Model as it looks when we add irrelevant z:
$$y_i = \beta x_i + \gamma z_i + \tilde{u}_i \tag{2}$$
• We can represent the error term as $\tilde{u}_i = u_i - \gamma z_i$
• But since γ = 0 in the true model, we have $\tilde{u}_i = u_i$
and there is no bias
SUMMARY OF THE THEORY
Bias – efficiency trade-off:
            Omitted variable    Irrelevant variable
Bias        Yes*                No
Variance    Decreases*          Increases*
* As long as we have correlation between x and z
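The variance side of the trade-off can be illustrated without simulating y at all. Assuming homoskedastic errors with variance σ², the variance of the estimated coefficient on x is σ² times the corresponding diagonal element of (X'X)⁻¹. The sketch below (NumPy assumed, data simulated for illustration) compares that factor with and without an irrelevant but correlated z.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = rng.normal(size=n)
z = 0.8 * x + 0.6 * rng.normal(size=n)    # irrelevant for y, but correlated with x

# Var(beta_hat) = sigma^2 * [(X'X)^{-1}] entry for x; compare the factor only
var_factor_short = 1.0 / (x @ x)                            # y on x only
X_long = np.column_stack([x, z])
var_factor_long = np.linalg.inv(X_long.T @ X_long)[0, 0]    # y on x and z

print(f"variance inflation from adding z: {var_factor_long / var_factor_short:.2f}")
```

If z were exactly uncorrelated with x in the sample, the two factors would coincide, which matches the footnote to the table.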
FOUR IMPORTANT SPECIFICATION CRITERIA
Does a variable belong to the equation?
1. Theory: Is the variable’s place in the equation
unambiguous and theoretically sound? Does intuition tell
you it should be included?
2. t-test: Is the variable’s estimated coefficient significant in
the expected direction?
3. R²: Does the overall fit of the equation improve (enough)
when the variable is added to the equation?
4. Bias: Do other variables’ coefficients change significantly
when the variable is added to the equation?
FOUR IMPORTANT SPECIFICATION CRITERIA
• If all conditions hold, the variable belongs in the equation
• If none of them holds, the variable is irrelevant and can be
safely excluded
• If the criteria give contradictory answers, most importance
should be attributed to theoretical justification
▪ Therefore, if theory (intuition) says that a variable belongs in
the equation, we include it (even though its coefficient
might be insignificant!).
NONLINEAR SPECIFICATION
We will discuss different specifications:
▪ nonlinear in dependent and independent variables
and their interpretation
We will define the notion of a dummy variable and we
will show its different uses in linear regression models
NONLINEAR SPECIFICATION
There is not always a linear relationship between dependent
variable and explanatory variables
▪ The use of OLS requires that the equation be linear in
coefficients
▪ However, there is a wide variety of functional forms that are
linear in coefficients while being nonlinear in variables!
We have to choose carefully the functional form of the
relationship between the dependent variable and each
explanatory variable
▪ The choice of a functional form should be based on the
underlying economic theory and/or intuition
▪ Do we expect a curve instead of a straight line? Does the effect
of a variable peak at some point and then start to decline?
LINEAR FORM
y = β0 + β1x1 + β2x2 + ε
• Assumes that the effect of the explanatory variable on the
dependent variable is constant:
$$\frac{\partial y}{\partial x_k} = \beta_k, \quad k = 1, 2$$
• Interpretation: if xk increases by 1 unit (in which xk is
measured), then y will change by βk units (in which y is
measured)
• Linear form is used as the default functional form until strong
evidence that it is inappropriate is found
LOG-LOG FORM
ln y = β0 + β1 ln x1 + β2 ln x2 + ε
• Assumes that the elasticity of the dependent variable
with respect to the explanatory variable is constant:
$$\frac{\partial \ln y}{\partial \ln x_k} = \frac{\partial y / y}{\partial x_k / x_k} = \beta_k, \quad k = 1, 2$$
• Interpretation: if xk increases by 1 percent, then y will
change by βk percent
• Before using a double-log model, make sure that
there are no negative or zero observations in the
data set
EXAMPLE
• Estimating the production function of the Indian sugar industry:
$$\ln Q = 2.70 + \underset{(0.14)}{0.59} \ln L + \underset{(0.17)}{0.33} \ln K$$
Q . . . output
L . . . labor
K . . . capital employed
Interpretation: if we increase the amount of labor by 1%, the
production of sugar will increase by 0.59%, ceteris paribus.
Ceteris paribus is a Latin phrase meaning ’other things being
equal’.
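The "1% gives βk%" reading is a first-order approximation. A small sketch using the labor coefficient 0.59 from the example above shows that it is very accurate for small changes and less so for large ones. The function name is illustrative.

```python
beta_L = 0.59   # estimated labor elasticity from the example above

def pct_change_in_y(beta, pct_change_in_x):
    """Exact % change in y implied by ln y = ... + beta * ln x,
    holding the other regressors fixed."""
    return ((1 + pct_change_in_x / 100) ** beta - 1) * 100

small = pct_change_in_y(beta_L, 1)     # 1% more labor
large = pct_change_in_y(beta_L, 50)    # 50% more labor

print(f"1% more labor:  {small:.3f}% more output (approximation: 0.59%)")
print(f"50% more labor: {large:.2f}% more output (approximation: 29.5%)")
```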
LOG-LINEAR FORMS
Linear-log form:
y = β0 + β1 ln x1 + β2 ln x2 + ε
▪ Interpretation: if xk increases by 1 percent, then y will
change by approximately (βk/100) units (k = 1,2)
Log-linear form:
ln y = β0 + β1x1 + β2x2 + ε
▪ Interpretation: if xk increases by 1 unit, then y will change
by approximately (βk · 100) percent (k = 1,2)
EXAMPLES OF LOG LINEAR FORMS
Estimating demand for chicken meat:
Y . . . annual chicken consumption (kg.)
PC . . . price of chicken
PB . . . price of beef
YD . . . annual disposable income
Interpretation: An increase in the annual disposable income by
1% increases chicken consumption by 0.12 kg per year, ceteris
paribus.
EXAMPLES OF LOG LINEAR FORMS
Estimating the influence of education and experience on wages:
wage . . . annual wage (USD)
educ . . . years of education
exper . . . years of experience
Interpretation: An increase in education by one year increases
annual wage by 9.8%, ceteris paribus. An increase in experience
by one year increases annual wage by 1%, ceteris paribus.
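In a log-linear model, 100·βk is an approximation to the percentage change; the exact figure is 100·(exp(βk) − 1). The sketch below assumes coefficients of 0.098 on educ and 0.010 on exper, which are inferred from the interpretation above rather than read from the estimated equation.

```python
import math

def exact_pct_effect(beta, delta_x=1.0):
    """Exact % change in y when x rises by delta_x in ln y = ... + beta * x."""
    return (math.exp(beta * delta_x) - 1) * 100

# Coefficients inferred from the stated interpretation (assumption)
educ_effect = exact_pct_effect(0.098)    # approximation used above: 9.8%
exper_effect = exact_pct_effect(0.010)   # approximation used above: 1%

print(f"one more year of education:  {educ_effect:.2f}%")
print(f"one more year of experience: {exper_effect:.3f}%")
```

The gap between the approximate and the exact effect grows with the size of the coefficient, so it matters more for education than for experience here.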
POLYNOMIAL FORM
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \varepsilon$$
• To determine the effect of x1 on y, we need to calculate the
derivative:
$$\frac{\partial y}{\partial x_1} = \beta_1 + 2 \beta_2 x_1$$
• Clearly, the effect of x1 on y is not constant, but changes
with the level of x1
• We might also have higher order polynomials, e.g.:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_1^3 + \beta_4 x_1^4 + \varepsilon$$
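A small sketch of the marginal effect in the quadratic form. The coefficient values are hypothetical and the function names are illustrative; with β1 > 0 and β2 < 0 the effect shrinks as x1 grows and reaches zero at x1 = −β1/(2β2).

```python
def marginal_effect(b1, b2, x1):
    """dy/dx1 for y = b0 + b1*x1 + b2*x1**2 + e."""
    return b1 + 2 * b2 * x1

def turning_point(b1, b2):
    """Value of x1 at which the marginal effect is zero (requires b2 != 0)."""
    return -b1 / (2 * b2)

b1, b2 = 4.0, -0.25     # hypothetical coefficients, not from the lecture
effects = [marginal_effect(b1, b2, x1) for x1 in (0, 4, 8)]
peak = turning_point(b1, b2)

print(effects)   # marginal effect at x1 = 0, 4, 8
print(peak)      # x1 where y stops increasing
```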
EXAMPLE OF POLYNOMIAL FORM
• The impact of the number of hours of studying on
the grade from Econometrics:
• To determine the effect of hours on grade, calculate
the derivative:
▪ Decreasing returns to hours of studying: more hours
imply a higher grade, but the positive effect of an additional
hour of studying decreases with more hours
CHOICE OF CORRECT FUNCTIONAL FORM
• The functional form has to be correctly specified in
order to avoid biased and inconsistent estimates
➢ Remember that one of the OLS assumptions is that
the model is correctly specified
• Ideally: the specification is given by underlying theory of
the equation
• In reality: underlying theory does not give precise
functional form
• In most cases, either linear form is adequate, or common
sense will point out an easy choice from among the
alternatives
CHOICE OF CORRECT FUNCTIONAL FORM
Nonlinearity of explanatory variables
▪ often approximated by polynomial form
▪ missing higher powers of a variable can be detected as
omitted variables
Nonlinearity of dependent variable
▪ harder to detect based on statistical fit of the regression:
R² is not comparable across models where y is
transformed
▪ dependent variables are often transformed to log-form in
order to make their distribution closer to the normal
distribution
DUMMY VARIABLES
Dummy variable - takes on the values of 0 or 1, depending
on a qualitative attribute
Examples of dummy variables:
INTERCEPT DUMMY
• Dummy variable included in a regression alone (not
interacted with other variables) is an intercept dummy
• It changes the intercept for the subset of data defined by a
dummy variable condition:
yi = β0 + β1Di + β2xi + εi
where Di = 1 if the i-th observation meets a particular condition
and Di = 0 otherwise
We have
yi = (β0 + β1) + β2xi + εi if Di = 1
yi = β0 + β2xi + εi if Di = 0
INTERCEPT DUMMY
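[Figure: two parallel lines in the (X, Y) plane, both with slope β2; the line for Di = 0 has intercept β0 and the line for Di = 1 has intercept β0 + β1]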
EXAMPLE
• Estimating the determinants of wages:
• Interpretation of the dummy variable M: men earn on
average $2.156 per hour more than women, ceteris
paribus
SLOPE DUMMY
• If a dummy variable is interacted with another variable
(x), it is a slope dummy.
• It changes the relationship between x and y for a subset of
data defined by a dummy variable condition:
yi = β0 + β1xi + β2(xi ·Di) + εi
We have
yi = β0 + (β1 + β2)xi + εi if Di = 1
yi = β0 + β1xi + εi if Di = 0
SLOPE DUMMY
[Figure: two lines in the (X, Y) plane with common intercept β0; the line for Di = 0 has slope β1 and the line for Di = 1 has slope β1 + β2]
EXAMPLE
Estimating the determinants of wages:
Interpretation: men gain on average 17 cents per hour more
than women for each additional year of education, ceteris
paribus
SLOPE AND INTERCEPT DUMMIES
• Allow for both a different slope and a different intercept for two subsets of
data distinguished by a qualitative condition:
yi = β0 + β1Di + β2xi + β3(xi ·Di) + εi
where
Di = 1 if the i-th observation meets a particular condition,
Di = 0 otherwise
We have
yi = (β0 + β1) + (β2 + β3)xi + εi if Di = 1
yi = β0 + β2xi + εi if Di = 0
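A minimal sketch of this specification, assuming NumPy. The data are made up and generated without noise, so OLS recovers the hypothetical coefficients exactly and the two group-specific lines can be read off directly.

```python
import numpy as np

# Noise-free illustrative data; coefficient values are hypothetical
x = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0])
D = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
b0, b1, b2, b3 = 1.0, 2.0, 0.5, 0.3
y = b0 + b1 * D + b2 * x + b3 * (x * D)

# Regressors: constant, intercept dummy, x, slope dummy (x interacted with D)
X = np.column_stack([np.ones_like(x), D, x, x * D])
est, *_ = np.linalg.lstsq(X, y, rcond=None)

intercept_D0, slope_D0 = est[0], est[2]                     # group with D = 0
intercept_D1, slope_D1 = est[0] + est[1], est[2] + est[3]   # group with D = 1

print(f"D = 0: intercept {intercept_D0:.2f}, slope {slope_D0:.2f}")
print(f"D = 1: intercept {intercept_D1:.2f}, slope {slope_D1:.2f}")
```

Dropping the `x * D` column gives the intercept-dummy model, and dropping the `D` column gives the slope-dummy model.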
SLOPE AND INTERCEPT DUMMIES
[Figure: two lines in the (X, Y) plane; the line for Di = 0 has intercept β0 and slope β2, the line for Di = 1 has intercept β0 + β1 and slope β2 + β3]
DUMMY VARIABLES - MULTIPLE CATEGORIES
• What if a variable defines three or more qualitative
attributes?
• Example: level of education - elementary school, high
school, and college
• Define and use a set of dummy variables:
• Should we also include a third dummy in the regression,
which is equal to 1 for people with elementary education?
▪ No, unless we exclude the intercept!
▪ Using full set of dummies leads to perfect multicollinearity
(dummy variable trap)
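The dummy variable trap can be seen from the rank of the regressor matrix. The sketch below assumes NumPy; the six observations and the variable names are hypothetical. With an intercept and all three education dummies, the dummies sum to the constant column, so the matrix loses full column rank.

```python
import numpy as np

# Hypothetical education levels for six people
level = np.array(["elementary", "high", "college", "high", "elementary", "college"])
d_elem = (level == "elementary").astype(float)
d_high = (level == "high").astype(float)
d_coll = (level == "college").astype(float)
const = np.ones(len(level))

X_trap = np.column_stack([const, d_elem, d_high, d_coll])   # intercept + full set
X_ok = np.column_stack([const, d_high, d_coll])             # elementary is the base
X_no_const = np.column_stack([d_elem, d_high, d_coll])      # full set, no intercept

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)
rank_no_const = np.linalg.matrix_rank(X_no_const)

print(rank_trap, "of", X_trap.shape[1], "columns")      # rank deficient
print(rank_ok, "of", X_ok.shape[1], "columns")          # full column rank
print(rank_no_const, "of", X_no_const.shape[1], "columns")
```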
SUMMARY
• We revisited the F-test and talked about omitted variables
• We discussed different nonlinear specifications of a
regression equation and their interpretation
• We defined the concept of a dummy variable and we
showed its use
❖ Further readings:
Studenmund, Chapter 7
Wooldridge, Chapters 6 & 7