LECTURE 2
Introduction to Econometrics
INTRODUCTION TO LINEAR
REGRESSION ANALYSIS I.
September 27, 2016
1 / 33
PREVIOUS LECTURE...
Introduction, organization, review of statistical
background
random variables
mean, variance, standard deviation
covariance, correlation, independence
statistical distributions
standardized random variables
LECTURE 2.
Introduction to simple linear regression analysis
Sampling and estimation
OLS principle
Readings:
Studenmund, A. H., Using Econometrics: A Practical
Guide, Chapters 1, 2.1, 17.2, 17.3
Wooldridge, J. M., Introductory Econometrics: A Modern
Approach, Chapters 2.1, 2.2
WARM-UP EXERCISE
The heights of U.S. females between age 25 and 34 are
approximately normally distributed with a mean of 66
inches and a standard deviation of 2.5 inches.
What fraction of the U.S. female population in this age bracket
is taller than 70 inches, the height of the average adult U.S.
male of this age?
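One way to check an answer to the warm-up exercise is to standardize the threshold and evaluate the standard normal distribution function. The sketch below uses only the Python standard library; the helper name `normal_cdf` is introduced here for illustration and is not part of the lecture.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mean, sd = 66.0, 2.5       # inches, taken from the exercise
threshold = 70.0

z = (threshold - mean) / sd              # standardized threshold
fraction_taller = 1.0 - normal_cdf(z)    # P(X > 70) = P(Z > z)

print(f"z = {z:.2f}, fraction taller than 70 inches = {fraction_taller:.4f}")
```

The standardized threshold is z = 1.6, so roughly 5.5% of this population is taller than 70 inches.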
SAMPLING
Population: the entire group of items that interests us
Sample: the part of the population that we actually
observe
Statistical inference: use of the sample to draw conclusions
about the characteristics of the population from which the
sample came
Examples: medical experiments, opinion polls
RANDOM SAMPLING VS SELECTION BIAS
Correct statistical inference can be performed only on a
random sample, i.e. a sample that reflects the true
distribution of the population
Biased sample: any sample that differs systematically
from the population that it is intended to represent
Selection bias: occurs when the selection of the sample
systematically excludes or underrepresents certain groups
Example: opinion poll about tuition payments among
undergraduate students vs all citizens
Self-selection bias: occurs when we examine data for a
group of people who have chosen to be in that group
Example: accident records of people who buy collision
insurance
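A small numerical illustration of the tuition-poll example may help. All figures below are invented for illustration only: a hypothetical population of 1,000 citizens, of whom 100 are students. The sketch compares the share supporting tuition payments in the whole population with the share a students-only poll would measure.

```python
# Hypothetical population: 1 = supports tuition payments, 0 = opposes.
# Every number here is made up purely to illustrate selection bias.
students = [1] * 10 + [0] * 90          # 10% support among 100 students
non_students = [1] * 540 + [0] * 360    # 60% support among 900 other citizens
population = students + non_students

def share(group):
    """Fraction of the group that supports tuition payments."""
    return sum(group) / len(group)

population_share = share(population)    # the parameter of interest
student_only_share = share(students)    # what a students-only poll measures

print(f"all citizens: {population_share:.2f}, students only: {student_only_share:.2f}")
```

In this made-up setting the students-only poll gives 0.10 while the population value is 0.55: the sample differs systematically from the population it is meant to represent.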
EXERCISE 2
American Express and the French tourist office sponsored
a survey that found that most visitors to France do not
consider the French to be especially unfriendly.
The sample consisted of 1,000 Americans who have visited
France more than once for pleasure over the past two
years.
Is this survey unbiased?
ESTIMATION
Parameter: a true characteristic of the distribution of a
variable, whose value is unknown, but can be estimated
Example: population mean E[X]
Estimator: a sample statistic that is used to estimate the
value of the parameter
Example: sample mean $\bar{X}_n$
Note that the estimator is a random variable (it has a
probability distribution, mean, variance,...)
Estimate: the specific value of the estimator that is
obtained on a specific sample
PROPERTIES OF AN ESTIMATOR
An estimator is unbiased if the mean of its distribution is
equal to the value of the parameter it is estimating
An estimator is consistent if it converges to the value of
the true parameter as the sample size increases
An estimator is efficient if the variance of its sampling
distribution is the smallest possible
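These properties can be illustrated by simulation for the sample mean. The sketch below assumes a normal population with mean 5 and standard deviation 2 (both values chosen arbitrarily for illustration), draws many samples of each size, and reports the average and the variance of the resulting estimates.

```python
import random
import statistics

random.seed(0)
TRUE_MEAN, TRUE_SD = 5.0, 2.0    # assumed population parameters (illustrative)

def sample_mean(n):
    """Sample mean of n independent draws from the assumed population."""
    return statistics.fmean(random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n))

def simulate(n, replications=500):
    """Mean and variance of the sample mean across repeated samples of size n."""
    estimates = [sample_mean(n) for _ in range(replications)]
    return statistics.fmean(estimates), statistics.pvariance(estimates)

results = {n: simulate(n) for n in (10, 100, 1000)}
for n, (avg, var) in results.items():
    print(f"n={n:5d}  mean of estimates={avg:.3f}  variance of estimates={var:.4f}")
```

For every sample size the estimates should average close to the true mean (unbiasedness), and their variance should shrink as n grows, which is what consistency of the sample mean looks like in practice.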
EXERCISE 3
The Slovak Ministry of Labor and Social Affairs aimed to
evaluate the impact of some of its re-qualification courses
for newly unemployed workers.
For this purpose, the Ministry tracked workers who lost
their jobs in October 2015 and went through a three-month
re-qualification program.
The Ministry found that 90% of the workers who finished the
course found a new job within 6 months of finishing the
course.
The Ministry concluded that the re-qualification program
was successful.
Was the evaluation unbiased?
ECONOMETRIC MODELS
An econometric model is an estimable formulation of a
theoretical relationship
Theory says: Q = f(P, Ps, Y)
Q . . . quantity demanded
P . . . commodity’s price
Ps . . . price of substitute good
Y . . . disposable income
We simplify: Q = β0 + β1P + β2Ps + β3Y
We estimate: Q = 31.50 − 0.73P + 0.11Ps + 0.23Y
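Once the coefficients are estimated, the equation can be used to compute a predicted quantity for given values of the explanatory variables. The sketch below evaluates the estimated equation from the slide; the input values (and the function name `predicted_quantity`) are hypothetical, since the slide does not state units or data.

```python
def predicted_quantity(price, substitute_price, income):
    """Evaluate the estimated demand equation from the slide."""
    return 31.50 - 0.73 * price + 0.11 * substitute_price + 0.23 * income

# Hypothetical inputs, chosen only to show how the equation is used.
base = predicted_quantity(price=10.0, substitute_price=8.0, income=100.0)
print(f"predicted quantity demanded: {base:.2f}")
```

With these made-up inputs the predicted quantity is 48.08.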
ECONOMETRIC MODELS
Modern econometrics deals with many different, even very
general, models
In this course we will cover only linear regression
models
We will see how these models are estimated by
Ordinary Least Squares (OLS)
Generalized Least Squares (GLS)
We will perform estimation on different types of data
DATA USED IN ECONOMETRICS
cross-section: sample of units (e.g. firms, individuals)
taken at a given point in time
repeated cross-section: several independent samples of units
(e.g. firms, individuals) taken at different points in time
time series: observations of variable(s) at different points
in time
panel data: time series for each cross-sectional unit in the
data set
DATA USED IN ECONOMETRICS - EXAMPLES
Country’s macroeconomic indicators (GDP, inflation rate,
net exports, etc.) month by month
Data about firms’ employees or financial indicators as of
the end of the year
Records of bank clients who were given a loan
Annual social security or tax records of individual workers
STEPS OF AN ECONOMETRIC ANALYSIS
1. Formulation of an economic model (rigorous or intuitive)
2. Formulation of an econometric model based on the
economic model
3. Collection of data
4. Estimation of the econometric model
5. Interpretation of results
EXAMPLE - ECONOMIC MODEL
Denote:
p . . . price of the good
c . . . firm’s average cost per unit of output
q(p) . . . demand for the firm’s output
Firm’s profit:
π = q(p) · (p − c)
Demand for good:
q(p) = a − b · p
Derive:
$$ q = \frac{a}{2} - \frac{b}{2} \cdot c $$
We call q the dependent variable and c the explanatory variable
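The step from the profit function to the expression for q is not spelled out on the slide. A sketch of one way to obtain it, assuming the firm chooses the price p to maximize profit and that b > 0 (so the second-order condition −2b < 0 holds):

```latex
\begin{align*}
\pi(p) &= q(p)\,(p - c) = (a - b p)(p - c) \\
\frac{d\pi}{dp} &= a - 2 b p + b c = 0
  \quad\Longrightarrow\quad p^{*} = \frac{a + b c}{2 b} \\
q(p^{*}) &= a - b\,p^{*} = a - \frac{a + b c}{2} = \frac{a}{2} - \frac{b}{2}\, c
\end{align*}
```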
EXAMPLE - ECONOMETRIC MODEL
Write the relationship in a simple linear form
q = β0 + β1c
(keeping in mind that $\beta_0 = \frac{a}{2}$ and $\beta_1 = -\frac{b}{2}$)
There are other (unpredictable) things that influence firms’
sales ⇒ add a disturbance term
q = β0 + β1c + ε
Goal: find the values of the parameters β1 (slope) and β0 (intercept)
EXAMPLE - DATA
Ideally: investigate all firms in the economy
In practice: investigate a sample of firms
We need a random (unbiased) sample of firms
Collect data:
Firm    1    2    3    4    5    6
q      15   32   52   14   37   27
c     294  247  153  350  173  218
EXAMPLE - DATA
[Figure: scatter plot of Output (axis range 10 to 50) against Average cost (axis range 150 to 350)]
EXAMPLE - ESTIMATION
[Figure: Output (axis range 10 to 50) plotted against Average cost (axis range 150 to 350)]
OLS method:
Make the fit as good as possible
⇓
Make the misfit as small as possible
⇓
Minimize the (vertical) distance between the data points and
the regression line
⇓
Minimize the sum of squared deviations
TERMINOLOGY
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ . . . regression line
$y_i$ . . . dependent/explained variable (i-th observation)
$x_i$ . . . independent/explanatory variable (i-th observation)
$\varepsilon_i$ . . . random error term/disturbance (of i-th observation)
$\beta_0$ . . . intercept parameter ($\hat{\beta}_0$ . . . estimate of this parameter)
$\beta_1$ . . . slope parameter ($\hat{\beta}_1$ . . . estimate of this parameter)
ORDINARY LEAST SQUARES
OLS = fitting the regression line by minimizing the sum of
squared vertical distances between the regression line and the
observed points
[Figure: Output (axis range 10 to 50) plotted against Average cost (axis range 150 to 350)]
ORDINARY LEAST SQUARES - PRINCIPLE
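Take the squared differences between the observed point $y_i$
and the regression line $\beta_0 + \beta_1 x_i$:
$$ (y_i - \beta_0 - \beta_1 x_i)^2 $$
Sum them over all n observations:
$$ \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 $$
Find $\hat{\beta}_0$ and $\hat{\beta}_1$ such that they minimize this sum:
$$ \hat{\beta}_0, \hat{\beta}_1 = \operatorname*{argmin}_{\beta_0, \beta_1} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 $$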
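The objective can be made concrete by evaluating the sum of squared differences for a few candidate lines. The sketch below uses the six firms from the example data; the function name `sum_of_squares` and the two candidate lines are chosen here for illustration only.

```python
def sum_of_squares(beta0, beta1, x, y):
    """Sum of squared vertical differences between the points and the line."""
    return sum((yi - beta0 - beta1 * xi) ** 2 for xi, yi in zip(x, y))

cost = [294, 247, 153, 350, 173, 218]    # c, average cost (example data)
output = [15, 32, 52, 14, 37, 27]        # q, output (example data)

# Candidate 1: a horizontal line at the mean output (slope zero).
flat_line = sum_of_squares(29.5, 0.0, cost, output)
# Candidate 2: a hand-picked downward-sloping line (not the OLS solution).
sloped_line = sum_of_squares(70.0, -0.17, cost, output)

print(f"flat line: {flat_line:.1f}, sloped line: {sloped_line:.1f}")
```

The downward-sloping candidate gives a much smaller sum than the flat line; OLS picks the pair of coefficients for which this sum is smallest among all possible lines.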
ORDINARY LEAST SQUARES - DERIVATION
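$$ \hat{\beta}_0, \hat{\beta}_1 = \operatorname*{argmin}_{\beta_0, \beta_1} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 $$
FOC:
$$ \frac{\partial}{\partial \beta_0}: \quad -2 \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right) = 0 $$
$$ \frac{\partial}{\partial \beta_1}: \quad -2 \sum_{i=1}^{n} x_i \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right) = 0 $$
We express (in the lecture):
$$ \hat{\beta}_0 = \bar{y}_n - \hat{\beta}_1 \bar{x}_n \qquad \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x}_n)(y_i - \bar{y}_n)}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2} $$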
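The algebra between the first-order conditions and the final formulas is left to the lecture. One possible route is sketched below; it assumes that the $x_i$ are not all equal, so that the denominator is positive, and it uses the facts that $\sum_i (y_i - \bar{y}_n) = 0$ and $\sum_i (x_i - \bar{x}_n) = 0$.

```latex
\begin{align*}
\sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right) = 0
  &\;\Longrightarrow\; \hat{\beta}_0 = \bar{y}_n - \hat{\beta}_1 \bar{x}_n \\
\sum_{i=1}^{n} x_i \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right) = 0
  &\;\Longrightarrow\; \sum_{i=1}^{n} x_i (y_i - \bar{y}_n)
     = \hat{\beta}_1 \sum_{i=1}^{n} x_i (x_i - \bar{x}_n) \\
\sum_{i=1}^{n} x_i (y_i - \bar{y}_n)
  &= \sum_{i=1}^{n} (x_i - \bar{x}_n)(y_i - \bar{y}_n),
  \qquad
  \sum_{i=1}^{n} x_i (x_i - \bar{x}_n) = \sum_{i=1}^{n} (x_i - \bar{x}_n)^2 \\
\hat{\beta}_1
  &= \frac{\sum_{i=1}^{n} (x_i - \bar{x}_n)(y_i - \bar{y}_n)}
          {\sum_{i=1}^{n} (x_i - \bar{x}_n)^2}
\end{align*}
```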
RESIDUAL
A residual is the vertical distance between an observed
point and the estimated regression line
OLS minimizes the sum of squares of all residuals
The residual is the difference between the observed value $y_i$ and the
fitted value $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$
We define:
$$ e_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i $$
The residual $e_i$ (observed) is not the same as the disturbance $\varepsilon_i$
(unobserved)!
The residual is an estimate of the disturbance: $e_i = \hat{\varepsilon}_i$
RESIDUAL VS. DISTURBANCE
[Figure: Output (axis range 10 to 50) plotted against Average cost (axis range 150 to 350), showing the true relationship, the estimated relationship, a disturbance, and a residual]
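The distinction can be seen in a simulation, where (unlike in real data) the true relationship is known. The sketch below assumes hypothetical true parameters and normally distributed disturbances, generates data, fits the line with the OLS formulas, and compares each disturbance with the corresponding residual. The helper `ols` and all numerical settings are illustrative assumptions.

```python
import random

random.seed(1)
TRUE_BETA0, TRUE_BETA1 = 70.0, -0.18    # assumed "true" parameters (hypothetical)

x = [150 + 20 * i for i in range(11)]                  # 150, 170, ..., 350
disturbances = [random.gauss(0.0, 5.0) for _ in x]     # unobservable in practice
y = [TRUE_BETA0 + TRUE_BETA1 * xi + eps for xi, eps in zip(x, disturbances)]

def ols(x, y):
    """Intercept and slope from the simple-regression OLS formulas."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope

beta0_hat, beta1_hat = ols(x, y)
residuals = [yi - beta0_hat - beta1_hat * xi for xi, yi in zip(x, y)]

for eps, e in zip(disturbances, residuals):
    print(f"disturbance {eps:7.2f}   residual {e:7.2f}")
print("sum of residuals:", round(sum(residuals), 8))
```

The residuals are measured from the estimated line and the disturbances from the true line, so the two columns differ. The residuals sum to zero by the first-order condition for the intercept, whereas the disturbances need not.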
GETTING BACK TO THE EXAMPLE
We have the economic model
$$ q = \frac{a}{2} - \frac{b}{2} \cdot c $$
We estimate
$$ q_i = \beta_0 + \beta_1 c_i + \varepsilon_i $$
(keeping in mind that $\beta_0 = \frac{a}{2}$ and $\beta_1 = -\frac{b}{2}$)
using the data:
Firm    1    2    3    4    5    6
q      15   32   52   14   37   27
c     294  247  153  350  173  218
GETTING BACK TO THE EXAMPLE
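When we plug the data into the formulas:
$$ \hat{\beta}_1 = \frac{\sum_{i=1}^{6} (c_i - \bar{c})(q_i - \bar{q})}{\sum_{i=1}^{6} (c_i - \bar{c})^2} = -0.177 $$
$$ \hat{\beta}_0 = \bar{q} - \hat{\beta}_1 \bar{c} = 71.74 $$
The estimated equation is
$$ \hat{q} = 71.74 - 0.177c $$
and so
$$ \hat{a} = 2\hat{\beta}_0 = 143.48 \quad \text{and} \quad \hat{b} = -2\hat{\beta}_1 = 0.354 $$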
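The numbers above can be reproduced directly from the six observations. The sketch below applies the OLS formulas in plain Python; variable names such as `beta1_hat` are chosen here for readability.

```python
cost = [294, 247, 153, 350, 173, 218]    # c, average cost per unit of output
output = [15, 32, 52, 14, 37, 27]        # q, output

n = len(cost)
c_bar = sum(cost) / n
q_bar = sum(output) / n

# Slope: sum of cross-deviations divided by sum of squared deviations of c.
beta1_hat = (sum((c - c_bar) * (q - q_bar) for c, q in zip(cost, output))
             / sum((c - c_bar) ** 2 for c in cost))
# Intercept: the fitted line passes through the point of sample means.
beta0_hat = q_bar - beta1_hat * c_bar

# Implied estimates of the demand parameters a and b.
a_hat = 2 * beta0_hat
b_hat = -2 * beta1_hat

print(f"beta1_hat = {beta1_hat:.3f}, beta0_hat = {beta0_hat:.2f}")
print(f"a_hat = {a_hat:.2f}, b_hat = {b_hat:.3f}")
```

The slope and intercept round to −0.177 and 71.74. Because the slide doubles the already rounded estimates, its values for a and b can differ in the last digit from the ones computed here without intermediate rounding.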
MEANING OF REGRESSION COEFFICIENT
Consider the model
q = β0 + β1c
estimated as $\hat{q} = 71.74 - 0.177c$
q . . . demand for the firm’s output
c . . . firm’s average cost per unit of output
β1 measures the impact of a one-unit increase in c on
the dependent variable q
When average cost increases by 1 unit, the estimated quantity
demanded decreases by 0.177 units
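The interpretation can be checked by comparing predictions at two cost levels. In the sketch below the starting cost of 200 is an arbitrary illustrative value; with a linear model the change in the prediction does not depend on where we start.

```python
def predicted_output(avg_cost):
    """Prediction from the estimated equation q = 71.74 - 0.177 c."""
    return 71.74 - 0.177 * avg_cost

change_one_unit = predicted_output(201.0) - predicted_output(200.0)
change_ten_units = predicted_output(210.0) - predicted_output(200.0)

print(f"{change_one_unit:.3f} {change_ten_units:.2f}")
```

A one-unit increase in average cost changes the prediction by −0.177, and a ten-unit increase by −1.77, i.e. ten times the slope.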
BEHIND THE ERROR TERM
The stochastic error term must be present in a regression
equation because of:
1. omission of many minor influences (unavailable data)
2. measurement error
3. possibly incorrect functional form
4. stochastic character of unpredictable human behavior
Remember that all of these factors are included in the error
term and may alter its properties
The properties of the error term determine the properties
of the estimates
SUMMARY
We have learned that an econometric analysis consists of
1. definition of the model
2. estimation
3. interpretation
We have explained the principle of OLS: minimizing the
sum of squared differences between the observations and
the regression line
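We have derived the formulas for the estimates:
$$ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x}_n)(y_i - \bar{y}_n)}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2} $$
$$ \hat{\beta}_0 = \bar{y}_n - \hat{\beta}_1 \bar{x}_n $$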
WHAT’S NEXT
In the next lectures, we will
derive estimation formulas for multivariate models
specify properties of the OLS estimator