LECTURE 3
INTRODUCTION TO LINEAR REGRESSION ANALYSIS II
Introduction to Econometrics
October 4, 2016

REVISION: THE PREVIOUS LECTURE

Desired properties of an estimator:
An estimator is unbiased if the mean of its sampling distribution is equal to the value of the parameter it is estimating
An estimator is consistent if it converges to the value of the true parameter as the sample size increases
An estimator is efficient if the variance of its sampling distribution is the smallest possible

REVISION: THE PREVIOUS LECTURE

We explained the principle of the OLS estimator: minimizing the sum of squared differences between the observations and the regression line

    y_i = \beta_0 + \beta_1 x_i + \varepsilon_i

We found the formulae for the estimates:

    \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x}_n)(y_i - \bar{y}_n)}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2} , \qquad \hat{\beta}_0 = \bar{y}_n - \hat{\beta}_1 \bar{x}_n

REVISION: THE PREVIOUS LECTURE

We explained that the stochastic error term must be present in a regression equation because of:
1. omission of many minor influences (unavailable data)
2. measurement error
3. possibly incorrect functional form
4. stochastic character of unpredictable human behavior
Remember that all of these factors are included in the error term and may alter its properties
The properties of the error term determine the properties of the estimates

WARM-UP EXERCISE

You receive a unique dataset that includes the wages of all citizens of Brno as well as their experience (number of years spent working). Naturally, you are curious about the effect of experience on wages. You run an OLS regression of monthly wage in CZK on the number of years of experience and obtain the following results:

    \widehat{wage}_i = 14450 + 1135 \cdot exper_i

1. Interpret the meaning of the coefficient of exper_i.
2. Use the estimates to predict the average wage of a person with 1, 5, 20, and 40 years of experience.
3. Do the predicted wages seem realistic? Explain your answer.

ON TODAY'S LECTURE

We will derive the estimation formula for multivariate OLS
We will list the assumptions about the error term and the explanatory variables that are required in classical regression models
We will show that under these assumptions, OLS is the best estimator available for regression models
The rest of the course will mostly deal, in one way or another, with the question of what to do when one of the classical assumptions is not met
Readings: Studenmund - chapter 4; Wooldridge - chapters 5, 8, 9, 12

ORDINARY LEAST SQUARES WITH SEVERAL EXPLANATORY VARIABLES

Usually, there is more than one explanatory variable in a regression model
Multivariate model with k explanatory variables:

    y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_k x_{ik} + \varepsilon_i

For observations 1, 2, ..., n, we have:

    y_1 = \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} + \ldots + \beta_k x_{1k} + \varepsilon_1
    y_2 = \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \ldots + \beta_k x_{2k} + \varepsilon_2
    \vdots
    y_n = \beta_0 + \beta_1 x_{n1} + \beta_2 x_{n2} + \ldots + \beta_k x_{nk} + \varepsilon_n

MATRIX NOTATION

We can write in matrix form:

    \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} =
    \begin{pmatrix}
      1 & x_{11} & x_{12} & \cdots & x_{1k} \\
      1 & x_{21} & x_{22} & \cdots & x_{2k} \\
      \vdots & \vdots & \vdots & & \vdots \\
      1 & x_{n1} & x_{n2} & \cdots & x_{nk}
    \end{pmatrix}
    \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix} +
    \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}

or in simplified notation: y = X\beta + \varepsilon

OLS - DERIVATION UNDER MATRIX NOTATION

We have to find

    \hat{\beta} = \arg\min_{\beta} \, (y - X\beta)'(y - X\beta) = \arg\min_{\beta} \left[ y'y - y'X\beta - \beta'X'y + \beta'X'X\beta \right]

First order condition:

    \frac{\partial}{\partial \beta} : \; -X'y - X'y + X'X\hat{\beta} + (X'X)'\hat{\beta} = 0 \quad\Longrightarrow\quad X'X\hat{\beta} = X'y

This gives us

    \hat{\beta} = (X'X)^{-1} X'y
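To make the matrix formula concrete, here is a minimal numerical sketch in Python (an addition for illustration, not part of the original slides): it simulates a small dataset, computes \hat{\beta} = (X'X)^{-1}X'y directly, and cross-checks the result against numpy's built-in least-squares routine. The simulated model, coefficient values, and variable names are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate a model with k = 2 explanatory variables:
# y = 1.0 + 2.0*x1 - 0.5*x2 + eps  (illustrative values)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
eps = rng.normal(scale=1.0, size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + eps

# Design matrix X: a column of ones (the intercept) plus the regressors
X = np.column_stack([np.ones(n), x1, x2])

# OLS estimate: beta_hat = (X'X)^{-1} X'y
# (solving the normal equations is numerically safer than forming the inverse)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print("beta_hat:", beta_hat)  # should be close to [1.0, 2.0, -0.5]

# Cross-check with numpy's built-in least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print("lstsq:   ", beta_lstsq)
```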
MEANING OF REGRESSION COEFFICIENT

Consider the multivariate model

    Q = \beta_0 + \beta_1 P + \beta_2 P_s + \beta_3 Y + \varepsilon

estimated as

    \hat{Q} = 31.50 - 0.73 P + 0.11 P_s + 0.23 Y

Q ... quantity demanded
P ... commodity's price
P_s ... price of a substitute
Y ... disposable income

The meaning of \beta_1 is the impact of a one-unit increase in P on the dependent variable Q, holding constant the other included independent variables P_s and Y
When the price increases by 1 unit (and the price of the substitute good and income remain the same), quantity demanded decreases by 0.73 units

EXERCISE

Remember the unique dataset that includes the wages of all citizens of Brno as well as their experience (number of years spent working). Because you realize that wages may not depend linearly on experience, you add the additional variable exper_i^2 to your model and obtain the following results:

    \widehat{wage}_i = 14450 + 1160 \cdot exper_i - 25 \cdot exper_i^2

1. What is the overall impact of increasing the number of years of experience by 1 year?
2. Use the estimates to determine the average wage of a person with 1, 5, 20, and 40 years of experience.
3. Do the predicted wages seem realistic now? Explain your answer.

THE CLASSICAL ASSUMPTIONS

1. The regression model is linear in the coefficients, is correctly specified, and has an additive error term
2. The error term has a zero population mean
3. Observations of the error term are uncorrelated with each other
4. The error term has a constant variance
5. All explanatory variables are uncorrelated with the error term
6. No explanatory variable is a perfect linear function of any other explanatory variable(s)
7. The error term is normally distributed

GRAPHICAL REPRESENTATION

[Figure: observations of Y plotted against X]

1. LINEARITY IN COEFFICIENTS

The regression model is linear in the coefficients, is correctly specified, and has an additive error term.
Linearity in variables is not required
Example: the production function Y = A K^{\beta_1} L^{\beta_2}, for which we suppose A = e^{\beta_0 + \varepsilon}, can be transformed so that

    \ln Y = \beta_0 + \beta_1 \ln K + \beta_2 \ln L + \varepsilon

and the linearity in coefficients is restored
Note that it is the linearity in coefficients that allows us to rewrite the general regression model in matrix form

EXERCISE

Which of the following models is/are linear?

    y = \beta_0 + \beta_1 x + \varepsilon
    \ln y = \beta_0 + \beta_1 \ln x + \beta_2 \sqrt{z} + \varepsilon
    y = x^{\beta_1} + \varepsilon

EXERCISE

Which of the following models is/are linear?

    y = \beta_0 + \beta_1 x + \varepsilon — is a linear model
    \ln y = \beta_0 + \beta_1 \ln x + \beta_2 \sqrt{z} + \varepsilon — is a linear model
    y = x^{\beta_1} + \varepsilon — is NOT a linear model

Regression models are linear in parameters, but they do not need to be linear in variables

2. ZERO MEAN OF THE ERROR TERM

The error term has a zero population mean.
Notation: E[\varepsilon_i] = 0 or E[\varepsilon] = 0
Idea: observations are distributed around the regression line, and the average of the deviations is zero
In fact, the mean of \varepsilon_i is forced to be zero by the existence of the intercept (\beta_0) in the equation
Hence, this assumption is satisfied as long as an intercept is included in the equation

3. ERRORS UNCORRELATED WITH EACH OTHER

Observations of the error term are uncorrelated with each other.
If there is a systematic correlation between one observation of the error term and another (serial correlation), it is more difficult for OLS to get precise estimates of the coefficients of the explanatory variables
Technically: the OLS estimate will be consistent, but not efficient (see the simulation sketch below)
This often happens in time-series data, where a random shock in one time period affects the random shock in another time period
We will solve this problem using the Generalized Least Squares (GLS) estimator

GRAPHICAL REPRESENTATION

[Figure: Y against X, estimated model vs. true model]
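The simulation sketch below (an addition for illustration, not part of the slides) shows what serial correlation does in practice: with an AR(1) regressor and AR(1) errors, the OLS slope estimate is still centered on the true value, but the classical standard-error formula understates the true sampling spread. The AR(1) design, the coefficient 0.5, and all sample sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_once(n=200, rho=0.8):
    """One time series with an AR(1) regressor and AR(1) errors."""
    x = np.zeros(n)
    e = np.zeros(n)
    u = rng.normal(size=n)
    v = rng.normal(size=n)
    for t in range(1, n):
        x[t] = rho * x[t - 1] + u[t]
        e[t] = rho * e[t - 1] + v[t]
    y = 1.0 + 0.5 * x + e                       # true beta1 = 0.5
    X = np.column_stack([np.ones(n), x])
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - 2)                # classical estimate of sigma^2
    se_classical = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta_hat[1], se_classical

draws = [simulate_once() for _ in range(2000)]
b1 = np.array([d[0] for d in draws])
se = np.array([d[1] for d in draws])

print("mean of beta1_hat:      ", b1.mean())  # close to 0.5: still unbiased
print("true sampling std dev:  ", b1.std())   # actual spread across samples
print("avg classical std error:", se.mean())  # understates the true spread
```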
4. CONSTANT VARIANCE OF THE ERROR TERM

The error term has a constant variance.
This property is called homoskedasticity; if it is not satisfied, we talk about heteroskedasticity
It states that each observation of the error term is drawn from a distribution with the same variance and thus varies in the same manner around the regression line
If the error term is heteroskedastic, it is more difficult for OLS to get precise estimates of the coefficients of the explanatory variables
Technically: the OLS estimate will be consistent, but not efficient

4. CONSTANT VARIANCE OF THE ERROR TERM

Heteroskedasticity is often present in cross-sectional data
Example: analysis of household consumption patterns
The variance of the consumption of certain goods might be greater for higher-income households, because these have more discretionary income than lower-income households
We will solve this problem using Huber-White robust standard errors

GRAPHICAL REPRESENTATION

[Figure: Y against X, true model vs. estimated model]

3. NO CORRELATION + 4. HOMOSKEDASTICITY

Notation:
no correlation: corr(\varepsilon_i, \varepsilon_j) = 0 \Rightarrow E[\varepsilon_i \varepsilon_j] = 0 for each i \neq j
homoskedasticity: E[\varepsilon_i^2] = \sigma^2 for each i
Matrix notation:

    Var[\varepsilon] = \begin{pmatrix}
      \sigma^2 & 0 & 0 & \cdots & 0 \\
      0 & \sigma^2 & 0 & \cdots & 0 \\
      0 & 0 & \sigma^2 & \cdots & 0 \\
      \vdots & \vdots & & \ddots & \vdots \\
      0 & 0 & 0 & \cdots & \sigma^2
    \end{pmatrix} = \sigma^2 I

5. VARIABLES UNCORRELATED WITH THE ERROR TERM

All explanatory variables are uncorrelated with the error term.
Notation: E[x_i \varepsilon_i] = 0 or E[X'\varepsilon] = 0
If an explanatory variable and the error term were correlated with each other, the OLS estimates would be likely to attribute to x some of the variation in y that actually came from the error term
Example: analysis of household consumption patterns
Households with lower incomes may report higher consumption (because of shame)
This creates a negative correlation between X and the error term (measurement error is higher for lower incomes)
It leads to biased and inconsistent estimates
We will solve this problem using the instrumental variables (IV) approach

GRAPHICAL REPRESENTATION

[Figure: Y against X, true model vs. estimated model]

6. LINEARLY INDEPENDENT VARIABLES

No explanatory variable is a perfect linear function of any other explanatory variable(s).
If this condition does not hold, we talk about (multi)collinearity
Multicollinearity can be perfect or imperfect
Perfect multicollinearity: one explanatory variable is an exact linear function of one or more other explanatory variables
In this case, OLS is incapable of distinguishing one variable from the other
Technical consequence: (X'X)^{-1} does not exist, so OLS estimation cannot be conducted
Example: we include dummy variables for men and for women together with the intercept (illustrated in the sketch after the exercise below)

6. LINEARLY INDEPENDENT VARIABLES

Imperfect multicollinearity: there is a linear relationship between the variables, but there is some error in that relationship
Example: we include two variables that both proxy for individual health status
Consequences of multicollinearity:
Estimated coefficients remain unbiased
But the standard errors of the estimates are inflated, making variables appear insignificant even though they might be significant
Solution: drop one of the variables

EXERCISE

Which of the following pairs of independent variables would violate the assumption of no multicollinearity? (That is, which pairs of variables are perfect linear functions of each other?)
right shoe size and left shoe size (of students in the class)
consumption and disposable income (in the United States over the last 30 years)
X_i and 2X_i
X_i and X_i^2
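The dummy-variable example from Assumption 6 lends itself to a short numerical check. The sketch below (an addition for illustration, not part of the slides; the data are simulated and all names are assumptions) shows that including dummies for both men and women alongside an intercept makes X rank-deficient, so X'X cannot be inverted, and that dropping one dummy restores full rank.

```python
import numpy as np

rng = np.random.default_rng(1)

# The dummy-variable trap: an intercept plus a dummy for men AND a dummy
# for women. Since male + female = 1 = the intercept column, X has rank
# 2 instead of 3, X'X is singular, and (X'X)^{-1} does not exist.
n = 100
male = rng.integers(0, 2, size=n)   # 1 if male, 0 otherwise
female = 1 - male                   # 1 if female, 0 otherwise
X = np.column_stack([np.ones(n), male, female])

print("columns:", X.shape[1], "rank:", np.linalg.matrix_rank(X))  # rank 2 < 3

try:
    np.linalg.inv(X.T @ X)
except np.linalg.LinAlgError as err:
    print("inversion fails:", err)  # "Singular matrix"

# Dropping one dummy removes the exact linear dependence
X_ok = np.column_stack([np.ones(n), male])
print("after dropping 'female', rank:", np.linalg.matrix_rank(X_ok))
```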
7. NORMALITY OF THE ERROR TERM

The error term is normally distributed.
This assumption is optional, but it is usually invoked
Normality of the error term is inherited by the estimate \hat{\beta}
Knowing the distribution of the estimate allows us to find its confidence intervals and to test hypotheses about the coefficients

PROPERTIES OF THE OLS ESTIMATE

The OLS estimate is defined by the formula

    \hat{\beta} = (X'X)^{-1} X'y , \quad \text{where} \quad y = X\beta + \varepsilon

Hence, it depends on the random variable \varepsilon, and thus \hat{\beta} is a random variable itself
The properties of \hat{\beta} are based on the properties of \varepsilon

GAUSS-MARKOV THEOREM

Given Classical Assumptions 1.-6., the OLS estimator of \beta is the minimum variance estimator from among the set of all linear unbiased estimators of \beta.
Assumption 7., normality, is not needed for this theorem
The theorem is also stated as "OLS is BLUE", where BLUE stands for "Best Linear Unbiased Estimator"
It means that:
OLS is linear: \hat{\beta} = (X'X)^{-1}X'y = Ly
OLS is unbiased (see next slide)
OLS has the minimum variance of all linear unbiased estimators (it is efficient)

EXPECTED VALUE OF THE OLS ESTIMATE

We show:

    \hat{\beta} = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + \varepsilon)
                = \underbrace{(X'X)^{-1}X'X}_{I}\,\beta + (X'X)^{-1}X'\varepsilon
                = \beta + (X'X)^{-1}X'\varepsilon

    E[\hat{\beta}] = E\left[\beta + (X'X)^{-1}X'\varepsilon\right]
                   = \beta + (X'X)^{-1}X'\,\underbrace{E[\varepsilon]}_{0}
                   = \beta

Since E[\hat{\beta}] = \beta, OLS is unbiased

VARIANCE OF THE OLS ESTIMATE

We show:

    \hat{\beta} = (X'X)^{-1}X'y = \beta + (X'X)^{-1}X'\varepsilon

    Var[\hat{\beta}] = Var\left[\beta + (X'X)^{-1}X'\varepsilon\right]
                     = \underbrace{Var[\beta]}_{0} + Var\left[(X'X)^{-1}X'\varepsilon\right]
                     = (X'X)^{-1}X' \cdot \underbrace{Var[\varepsilon]}_{\sigma^2 I} \cdot X(X'X)^{-1}
                     = \sigma^2 (X'X)^{-1}X'X(X'X)^{-1}
                     = \sigma^2 (X'X)^{-1}

NORMALITY OF THE OLS ESTIMATE

When we assume that \varepsilon_i \sim N(0, \sigma^2), we can see that

    \hat{\beta} = (X'X)^{-1}X'y = \beta + (X'X)^{-1}X'\varepsilon

is also normally distributed (it is a linear combination of normally distributed variables)
Hence, we say that \hat{\beta} is jointly normal:

    \hat{\beta} \sim N\left(\beta, \; \sigma^2 (X'X)^{-1}\right)

This will help us to test hypotheses about regression coefficients (see next lecture)
Note that the normality of errors is not required for large samples, because \hat{\beta} is asymptotically normal anyway

CONSISTENCY OF THE OLS ESTIMATE

When no explanatory variable is correlated with the error term (Assumption 5.), the OLS estimate is consistent:

    E[X'\varepsilon] = 0 \;\Longrightarrow\; \hat{\beta} \xrightarrow{\;n \to \infty\;} \beta

In other words: as the number of observations increases, the estimate converges to the true value of the coefficient
Consistency is the most important property of any estimate!

CONSISTENCY OF THE OLS ESTIMATE

As long as the OLS estimate of \beta is consistent, the residuals are consistent estimates of the error term
If we have consistent estimates of the error term, we can test whether it satisfies the classical assumptions
Moreover, possible deviations from the classical model can be corrected
As a consequence, the assumption of zero correlation between the explanatory variables and the error term, E[X'\varepsilon] = 0, is the most important one to satisfy in regression models

SUMMARY

We expressed the multivariate OLS model in matrix notation, y = X\beta + \varepsilon, and we found the formula of the estimate:

    \hat{\beta} = (X'X)^{-1}X'y

We listed the classical assumptions of regression models:
model linear in parameters
explanatory variables linearly independent
(normally distributed) error term with zero mean and constant variance, no autocorrelation
no correlation between the error term and the explanatory variables

We showed that if these assumptions hold, the OLS estimate is:
consistent (if no correlation between X and \varepsilon)
unbiased (if no correlation between X and \varepsilon)
efficient (if homoskedasticity and no autocorrelation of \varepsilon)
normally distributed (if \varepsilon normally distributed)
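To close, here is a small Monte Carlo sketch (an addition for illustration, not part of the slides) that numerically checks two of the results derived above: with a fixed design matrix, the average of \hat{\beta} across replications approaches \beta (unbiasedness), and the empirical variance of the estimates approaches \sigma^2 (X'X)^{-1}. The design, coefficient values, and replication count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(7)

# Fixed design: keep X constant across replications so that the
# theoretical covariance sigma^2 (X'X)^{-1} is a fixed matrix
n, sigma = 100, 2.0
beta = np.array([1.0, 0.5, -1.5])
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
XtX_inv = np.linalg.inv(X.T @ X)

reps = 5000
estimates = np.empty((reps, 3))
for r in range(reps):
    eps = rng.normal(scale=sigma, size=n)   # fresh error draw each replication
    y = X @ beta + eps
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ y)

print("true beta:        ", beta)
print("mean of estimates:", estimates.mean(axis=0))        # ~ beta (unbiased)
print("theoretical var:  ", np.diag(sigma**2 * XtX_inv))
print("empirical var:    ", estimates.var(axis=0))         # ~ theoretical
```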