Econometrics - Lecture 1
Introduction to Linear Regression


Contents
nOrganizational Issues
nSome History of Econometrics
nAn Introduction to Linear Regression
qOLS as an algebraic tool
qThe Linear Regression Model
qSmall Sample Properties of OLS estimator
nIntroduction to GRETL
Sept 24, 2010
Hackl, Econometrics

Organizational Issues
nAims of the course
nUnderstanding of econometric concepts and principles
nIntroduction to commonly used econometric tools and techniques
nUse of econometric tools for analyzing economic data: specification of adequate models,
identification of appropriate econometric methods, interpretation of results
nUse of GRETL

Organizational Issues, cont’d
nLiterature
nCourse textbook
nMarno Verbeek, A Guide to Modern Econometrics, 3rd Ed., Wiley, 2008
nSuggestions for further reading
nP. Kennedy, A Guide to Econometrics, 6th Ed., Blackwell, 2008
nW.H. Greene, Econometric Analysis. 6th Ed., Pearson International, 2008

Organizational Issues, cont’d
nPrerequisites
nLinear algebra: linear equations, matrices, vectors (basic operations and properties)
nDescriptive statistics: measures of central tendency, measures of dispersion, measures of
association, histogram, frequency tables, scatter plot, quantile
nTheory of probability: probability and its properties, random variables and distribution functions
in one and several dimensions, moments, convergence of random variables, limit theorems, law of
large numbers
nMathematical statistics: point estimation, confidence intervals, hypothesis testing, p-value,
significance level

Organizational Issues, cont’d
nTeaching and learning method
nCourse in six blocks
nClass discussion, written homework (computer exercises, GRETL) submitted by groups of 3-5 students, presentations of homework by participants
nFinal exam
nAssessment of student work
nFor grading, the written homework, presentation of homework in class and a final written exam will
be of relevance
nWeights: homework 70 %, final written exam 30 %
nPresentation of homework in class: students must be prepared to be called at random


Empirical Economics Prior to the 1930s
nThe situation in the early 1930s:
nTheoretical economics aims at “operationally meaningful theorems”; “operational” means purely logical mathematical deduction
nEconomic theories or laws are seen as deterministic relations; no inference from data as part of
economic analysis
nIgnorance of the stochastic nature of economic concepts
nUse of statistical methods for
qmeasuring theoretical coefficients, e.g., demand elasticities,
qrepresenting business cycles
nData: limited availability; time-series on agricultural commodities, foreign trade

Early Institutions
nApplied demand analysis: US Bureau of Agricultural Economics
nStatistical analysis of business cycles: H.L.Moore (Columbia University): Fourier periodogram; W.M.Persons et al. (Harvard): business cycle forecasting; US National Bureau of Economic Research (NBER)
nCowles Commission for Research in Economics
qFounded 1932 by A.Cowles: determinants of stock market prices?
qFormalization of econometrics, development of econometric methodology
qR.Frisch, G.Tintner; European refugees
qJ.Marschak (head 1943-55) recruited people like T.C.Koopmans, T.M.Haavelmo, T.W.Anderson,
L.R.Klein
qInterests shifted to theoretical and mathematical economics after 1950

Early Actors
nR.Frisch (Oslo Institute of Economic Research): econometric project, 1930-35; T.Haavelmo, O.Reiersøl
nJ.Tinbergen (Dutch Central Bureau of Statistics, Netherlands Economic Institute; League of Nations, Geneva): macro-econometric model of Dutch economy, ~1935; T.C.Koopmans, H.Theil
nAustrian Institute for Trade Cycle Research: O. Morgenstern (head), A.Wald, G.Tintner
nEconometric Society, founded 1930 by R.Frisch et al.
qFacilitates exchange of scholars from Europe and US
qCovers econometrics and mathematical statistics

First Steps
nR.Frisch, J.Tinbergen:
qMacro-economic modeling based on time-series, ~ 1935
qAiming at measuring parameters, e.g., demand elasticities
qAware of problems due to quality of data
qNobel Memorial Prize in Economic Sciences jointly in 1969 (“for having developed and applied
dynamic models for the analysis of economic processes”)
nT.Haavelmo
q“The Probability Approach in Econometrics”: PhD thesis (1944)
qEconometrics as a tool for testing economic theories
qStates assumptions needed for building and testing econometric models
qNobel Memorial Prize in Economic Sciences in 1989 (“for his clarification of the probability theory foundations of econometrics and his analyses of simultaneous economic structures”)

First Steps, Cont’d
nCowles Commission
qMethodology for macro-economic modeling based on Haavelmo’s approach
qCowles Commission monographs by G.Tintner, T.C.Koopmans, et al.

The Haavelmo Revolution
nIntroduction of probabilistic concepts in economics
qObvious deficiencies of traditional approach: Residuals, measurement errors, omitted variables;
stochastic time-series data
qAdvances in probability theory in the early 1930s
qFisher's likelihood function approach
nHaavelmo's ideas
qCritical view of Tinbergen's macro-econometric models
qThorough adoption of probability theory in econometrics
qConversion of deterministic economic models into stochastic structural equations
nHaavelmo's “The Probability Approach in Econometrics”
qWhy is the probability approach indispensable?
qModeling procedure based on ML and hypothesis testing

Haavelmo’s Arguments for the Probabilistic Approach
nEconomic variables in economic theory and econometric models
q“Observational” vs. “theoretical” vs. “true” variables
qModels have to take into account inaccurately measured data and passive observations
nUnrealistic assumption of permanence of economic laws
qCeteris paribus assumption
qEconomic time-series data
qSimplifying economic theories
qSelection of economic variables and relations out of the whole system of fundamental laws

Cowles Commission Methodology
nAssumptions underlying macro-econometric modeling and testing of economic theories
nTime series model
n $Y_t = \alpha X_t + \beta W_t + u_{1t}$,
n $X_t = \gamma Y_t + \delta Z_t + u_{2t}$
1. Specification of the model equation(s) includes the choice of variables; functional form is (approximately) linear
2. Time-invariant model equation(s): the model parameters α, …, δ are independent of time t
3. Parameters α, …, δ are structurally invariant, i.e., invariant with respect to changes in the variables
4. Causal ordering (exogeneity, endogeneity) of variables is known
5. Statistical tests can falsify but not verify a model

Classical Econometrics and More
n“Golden age” of econometrics till ~1970
qMulti-equation models for analyses and forecasting
qGrowing computing power
qDevelopment of econometric tools
nScepticism
qPoor forecasting performance
qDubious results due to
nWrong specifications
nImperfect estimation methods
nTime-series econometrics: non-stationarity of economic time-series
qConsequences of non-stationarity: misleading t-, DW-statistics, R²
qNon-stationarity: needs new models (ARIMA, VAR, VEC); Box & Jenkins (1970: ARIMA-models), Granger
& Newbold (1974, spurious regression), Dickey-Fuller (1979, unit-root tests)
Model                 Year    Equations
Tinbergen             1936    24
Klein                 1950    6
Klein & Goldberger    1955    20
Brookings             1965    160
Brookings Mark II     1972    ~200

Econometrics …
n… consists of the application of statistical data and techniques to mathematical formulations of
economic theory. It serves to test the hypotheses of economic theory and to estimate the implied
interrelationships. (Tinbergen, 1952)
n… is the interaction of economic theory, observed data and statistical methods. It is the
interaction of these three that makes econometrics interesting, challenging, and, perhaps,
difficult. (Verbeek, 2008)
n… is a methodological science with the elements
qeconomic theory
qmathematical language
qstatistical methods
qsoftware

The Course
n1. Introduction to linear regression (Verbeek, Ch. 2): the linear regression model, OLS method,
properties of OLS estimators
n2. Introduction to linear regression (Verbeek, Ch. 2): goodness of fit, hypothesis testing, multicollinearity
n3. Interpreting and comparing regression models (Verbeek, Ch. 3): interpretation of the fitted model, selection of regressors, testing the functional form
n4. Heteroskedasticity and autocorrelation (Verbeek, Ch. 4): causes, consequences, testing,
alternatives for inference
n5. Endogeneity, instrumental variables and GMM (Verbeek, Ch. 5): the IV estimator, the generalized
instrumental variables estimator, the generalized method of moments (GMM)
n6. The practice of econometric modeling

The Next Course
nUnivariate and multivariate time series models:  ARMA-, ARCH-, GARCH-models, VAR-, VEC-models
nModels for panel data
nModels with limited dependent variables: binary choice, count data


Linear Regression
Y: explained variable
X: explanatory or regressor variable
The linear regression model describes the data-generating process of Y under the condition X
Simple linear regression model:
 $Y = \alpha + \beta X$
β: coefficient of X
α: intercept
Multiple linear regression model:
 $Y = \beta_1 + \beta_2 X_2 + \dots + \beta_K X_K$

Example: Individual Wages
nSample (US National Longitudinal Survey, 1987)
nN = 3294 individuals (1569 females)
nVariable list
qWAGE: wage (in 1980 $) per hour  (p.h.)
qMALE: gender (1 if male, 0 otherwise)
qEXPER: experience in years
qSCHOOL: years of schooling
qAGE: age in years
nPossible questions
qEffect of gender on wage p.h.: Average wage p.h.: $6.31 for males, $5.15 for females
qEffects of education, of experience, of interactions, etc. on wage p.h.

Individual Wages, cont’d
qWage per hour vs. Years of schooling

Fitting a Model to Data
nChoice of values b1, b2 for the model parameters β1, β2 of Y = β1 + β2 X, given the observations (yi, xi), i = 1, …, N
nPrinciple of (Ordinary) Least Squares or OLS:
n $(b_1, b_2) = \arg\min_{\beta_1, \beta_2} S(\beta_1, \beta_2)$
nObjective function: sum of the squared deviations
n $S(\beta_1, \beta_2) = \sum_i [y_i - (\beta_1 + \beta_2 x_i)]^2 = \sum_i \varepsilon_i^2$
nDeviation between observation and fitted value: $\varepsilon_i = y_i - (\beta_1 + \beta_2 x_i)$

Observations and Fitted Regression Line
nSimple linear regression: Fitted line and observation points (Verbeek, Figure 2.1)

OLS-Estimators
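nEquating the partial derivatives of S(β1, β2) to zero gives the normal equations
n $\sum_i [y_i - (b_1 + b_2 x_i)] = 0$, $\sum_i x_i [y_i - (b_1 + b_2 x_i)] = 0$
nOLS-estimators b1 and b2 result in
n $b_2 = s_{xy} / s_x^2$, $b_1 = \bar{y} - b_2 \bar{x}$
nwith mean values $\bar{x} = \frac{1}{N} \sum_i x_i$ and $\bar{y} = \frac{1}{N} \sum_i y_i$
nand second moments $s_{xy} = \frac{1}{N} \sum_i (x_i - \bar{x})(y_i - \bar{y})$ and $s_x^2 = \frac{1}{N} \sum_i (x_i - \bar{x})^2$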
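A minimal numerical sketch of the simple-regression case, assuming the standard closed-form solution of the normal equations (slope = s_xy / s_x², intercept = ȳ − slope · x̄). The five data points are hypothetical toy values, not taken from the lecture.

```python
import numpy as np

# Hypothetical toy data (an assumption for illustration only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Mean values and second moments (normalised by N).
x_bar, y_bar = x.mean(), y.mean()
s_xy = np.mean((x - x_bar) * (y - y_bar))
s_xx = np.mean((x - x_bar) ** 2)

# Closed-form OLS estimates for slope and intercept.
b2 = s_xy / s_xx
b1 = y_bar - b2 * x_bar

# Residuals; the two normal equations say that they sum to zero
# and are orthogonal to x.
residuals = y - (b1 + b2 * x)
print("b1 =", b1, "b2 =", b2)
print("sum of residuals:", residuals.sum())
print("sum of x * residuals:", (x * residuals).sum())
```

Both printed sums should be zero up to floating-point rounding, which is exactly what the two normal equations require.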

Individual Wages, cont’d
nSample (US National Longitudinal Survey, 1987): wage per hour, gender, experience, years of
schooling; N = 3294 individuals (1569 females)
nAverage wage p.h.: $6.31 for males, $5.15 for females
nModel:
n wagei = β1 + β2 malei + εi
nmalei: male dummy, has value 1 if individual is male, otherwise value 0
nOLS-estimation gives
n wagei = 5.15 + 1.17*malei
nCompare with averages: the intercept is the average wage of females, intercept plus slope the average wage of males (up to rounding)
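The link between a dummy regression and group averages can be checked numerically. The sketch below uses synthetic data (an assumption; it is not the National Longitudinal Survey sample, and the coefficients 5.0 and 1.2 are arbitrary) to show that regressing on a constant and a 0/1 dummy reproduces the two group means.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data, NOT the NLS sample: a 0/1 dummy and a wage-like outcome.
male = rng.integers(0, 2, size=500)
wage = 5.0 + 1.2 * male + rng.normal(0.0, 1.0, size=500)

# Regress wage on a constant and the dummy.
X = np.column_stack([np.ones_like(wage), male])
b, *_ = np.linalg.lstsq(X, wage, rcond=None)

# Group averages for comparison.
mean_f = wage[male == 0].mean()
mean_m = wage[male == 1].mean()

print("intercept b1:", b[0], " mean of dummy = 0 group:", mean_f)
print("b1 + b2:     ", b[0] + b[1], " mean of dummy = 1 group:", mean_m)
```

The intercept coincides with the mean of the group coded 0, and intercept plus slope with the mean of the group coded 1.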

Individual Wages, cont’d
nOLS estimated wage equation (Table 2.1, Verbeek)
n wagei = 5.15 + 1.17*malei
n estimated wage p.h. for males: 6.313
n        for females: 5.150

OLS-Estimators: General Case
nModel for Y contains K-1 explanatory variables
n $Y = \beta_1 + \beta_2 X_2 + \dots + \beta_K X_K = x'\beta$
nwith $x = (1, X_2, \dots, X_K)'$ and $\beta = (\beta_1, \beta_2, \dots, \beta_K)'$
nObservations: $(y_i, x_i) = (y_i, (1, x_{i2}, \dots, x_{iK})')$, i = 1, …, N
nOLS-estimates $b = (b_1, b_2, \dots, b_K)'$ are obtained by minimizing the objective function with respect to the βk's
n $S(\beta) = \sum_i (y_i - x_i'\beta)^2$
nthis results in
n $-2 \sum_i x_i (y_i - x_i'b) = 0$

OLS-Estimators: General Case, cont’d
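nor
n $\left(\sum_i x_i x_i'\right) b = \sum_i x_i y_i$
nthe so-called normal equations, a system of K linear equations for the components of b
nGiven that the symmetric K×K-matrix $\sum_i x_i x_i'$ has full rank K and is hence invertible, the OLS-estimators are
n $b = \left(\sum_i x_i x_i'\right)^{-1} \sum_i x_i y_i$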
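The normal equations in summation form can be sketched as follows: accumulate the K×K matrix Σ xi xi' and the K-vector Σ xi yi observation by observation, check the rank condition, and solve the linear system. The data and the coefficient vector (1, 2, -0.5) are simulated assumptions, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 50, 3

# Simulated regressors (first column is the constant) and outcome.
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=N)

# Accumulate sum_i x_i x_i' and sum_i x_i y_i over the observations.
Sxx = np.zeros((K, K))
Sxy = np.zeros(K)
for x_i, y_i in zip(X, y):
    Sxx += np.outer(x_i, x_i)
    Sxy += x_i * y_i

# Full rank K is needed for a unique solution of the normal equations.
assert np.linalg.matrix_rank(Sxx) == K

# Solve (sum_i x_i x_i') b = sum_i x_i y_i for b.
b = np.linalg.solve(Sxx, Sxy)
print("OLS estimates:", b)
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse explicitly, which is the usual numerical practice; the result is the same b.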

Best Linear Approximation
nGiven the observations: $(y_i, x_i) = (y_i, (1, x_{i2}, \dots, x_{iK})')$, i = 1, …, N
nFor yi, the linear combination or the fitted value
n $\hat{y}_i = x_i'b$
nis the best linear combination for Y from X2, …, XK and a constant (the intercept)

Some Matrix Notation
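nN observations
n $(y_1, x_1), \dots, (y_N, x_N)$
nModel: $y_i = \beta_1 + \beta_2 x_i + \varepsilon_i$, i = 1, …, N, or
n $y = X\beta + \varepsilon$
nwith
n $y = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix}$, $X = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_N \end{pmatrix}$, $\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}$, $\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_N \end{pmatrix}$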

OLS Estimators in Matrix Notation
nMinimizing
n $S(\beta) = (y - X\beta)'(y - X\beta) = y'y - 2y'X\beta + \beta'X'X\beta$
n with respect to β gives the normal equations
n $-2(X'y - X'Xb) = 0$
n resulting from differentiating S(β) with respect to β and setting the first derivative to zero
nThe OLS-solution or OLS-estimators for β are
n $b = (X'X)^{-1}X'y$
nThe best linear combinations or predicted values for Y given X or projections of y into the space of X are obtained as
n $\hat{y} = Xb = X(X'X)^{-1}X'y = P_x y$
n the N×N-matrix $P_x$ is called the projection matrix or hat matrix

Residuals in Matrix Notation
nThe vector y can be written as $y = Xb + e = \hat{y} + e$ with residuals
n $e = y - Xb$ or $e_i = y_i - x_i'b$, i = 1, …, N
nFrom the normal equations follows
n $-2(X'y - X'Xb) = -2X'e = 0$
n i.e., each column of X is orthogonal to e
nWith
n $e = y - Xb = y - P_x y = (I - P_x)y = M_x y$
n the residual generating matrix $M_x$ is defined as
n $M_x = I - X(X'X)^{-1}X' = I - P_x$
n $M_x$ projects y into the orthogonal complement of the space of X
nProperties of $P_x$ and $M_x$: symmetry ($P_x' = P_x$, $M_x' = M_x$), idempotence ($P_x P_x = P_x$, $M_x M_x = M_x$), and orthogonality ($P_x M_x = 0$)
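The projection matrix Px = X(X'X)^{-1}X' and the residual generating matrix Mx = I - Px can be built explicitly for a small example. The sketch below uses simulated data (an assumption) and prints the largest absolute deviation from each stated property; such a numerical check illustrates the properties but does not replace an algebraic proof.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 20

# Simulated design matrix with a constant column, and an arbitrary y.
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
y = rng.normal(size=N)

# Projection (hat) matrix and residual generating matrix.
P = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(N) - P

# OLS estimates, fitted values and residuals.
b = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = P @ y
e = M @ y

print("symmetry of P:    ", np.abs(P - P.T).max())
print("idempotence of P: ", np.abs(P @ P - P).max())
print("idempotence of M: ", np.abs(M @ M - M).max())
print("orthogonality PM: ", np.abs(P @ M).max())
print("X'e:              ", np.abs(X.T @ e).max())
```

All printed values should be zero up to rounding error. Forming the N×N matrices explicitly is only practical for small N; it is done here to mirror the notation.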

Properties of Residuals
nResiduals: $e_i = y_i - x_i'b$, i = 1, …, N
nMinimum value of objective function
n $S(b) = e'e = \sum_i e_i^2$
nFrom the orthogonality of $e = (e_1, \dots, e_N)'$ to each column $x_k = (x_{1k}, \dots, x_{Nk})'$ of X, i.e., $e'x_k = 0$ for k = 1, …, K, it follows for the constant column (all elements equal to 1) that
n $\sum_i e_i = 0$
n i.e., the average residual is zero if the model has an intercept


Economic Models
nDescribe economic relationships (not only a set of observations), have an economic interpretation
nLinear regression model:
n $y_i = \beta_1 + \beta_2 x_{i2} + \dots + \beta_K x_{iK} + \varepsilon_i = x_i'\beta + \varepsilon_i$
nVariables yi, xi2, …, xiK: observable, sample (i = 1, …, N) from a well-defined population or universe
nError term εi (disturbance term) contains all influences that are not included explicitly in the model; unobservable; assumption E{εi | xi} = 0 gives
q $E\{y_i | x_i\} = x_i'\beta$
qthe model describes the expected value of y given x
nUnknown coefficients β1, …, βK: population parameters

The Sampling Concept
nThe regression model $y_i = x_i'\beta + \varepsilon_i$, i = 1, …, N; or $y = X\beta + \varepsilon$
n describes one realization out of all possible samples of size N from the population
nA) Sampling process with fixed, non-stochastic xi's
nNew sample: new error terms εi and new yi's
nRandom sampling of εi, i = 1, …, N: joint distribution of εi's determines properties of b etc.
nB) Sampling process with samples of (xi, yi) or (xi, εi)
nNew sample: new error terms εi and new xi's
nRandom sampling of (xi, εi), i = 1, …, N: joint distribution of (xi, εi)'s determines properties of b etc.

The Sampling Concept, cont’d
nThe sampling with fixed, non-stochastic xi’s is not realistic for economic data
nSampling process with samples of (xi, yi) is appropriate for modeling cross-sectional data
qExample: household surveys, e.g., EU-SILC
nSampling process with samples of (xi, yi) from time-series data: sample is seen as one out of all
possible realizations of the underlying data-generating process

The Ceteris Paribus Condition
nThe linear regression model needs assumptions to allow interpretation
nAssumption for εi's: E{εi | xi} = 0; exogeneity of variables X
nThis implies
n $E\{y_i | x_i\} = x_i'\beta$
n i.e., the regression line describes the conditional expectation of yi given xi
nCoefficient βk measures the change of the expected value of Y if Xk changes by one unit and all other Xj values, j ≠ k, remain the same (ceteris paribus condition)
nExogeneity can be restrictive

Regression Coefficients
nLinear regression model:
n $y_i = \beta_1 + \beta_2 x_{i2} + \dots + \beta_K x_{iK} + \varepsilon_i = x_i'\beta + \varepsilon_i$
nCoefficient βk measures the change of the expected value of Y if Xk changes by one unit and all other Xj values, j ≠ k, remain the same (ceteris paribus condition); marginal effect of changing Xk on Y
n $\beta_k = \partial E\{y_i | x_i\} / \partial x_{ik}$
nExample
nWage equation: wagei = β1 + β2 malei + β3 schooli + β4 experi + εi
n β3 measures the impact of one additional year at school upon a person’s wage, keeping gender and
years of experience fixed

Estimation of β
nGiven a sample (xi, yi), i = 1, …, N, the OLS-estimators for β
n $b = (X'X)^{-1}X'y$
n can be used as an approximation for β
nThe vector b is a vector of numbers, the estimates
nThe sampling concept and assumptions on εi's determine the quality, i.e., the statistical properties, of b


Fitting Economic Models to Data
nObservations allow
nto estimate parameters
nto assess how well the data-generating process is represented by the model, i.e., how well the
model coincides with reality
nto improve the model if necessary
nFitting a linear regression model to data
nParameter estimates b = (b1, …, bK)' for coefficients β = (β1, …, βK)'
nStandard errors se(bk) of the estimates bk, k = 1, …, K
nt-statistics, F-statistic, R², Durbin-Watson test statistic, etc.

OLS Estimator and OLS Estimates b
nOLS estimates b are a realization of the OLS estimator
nThe OLS estimator is a random variable
nObservations are a random sample from the population of all possible samples
nObservations are generated by some random process
nDistribution of the OLS estimator
nActual distribution not known
nTheoretical distribution determined by assumptions on
qmodel specification
qthe error term εi and regressor variables xi
nQuality criteria (bias, accuracy, efficiency) of OLS estimates are determined by the properties of
the distribution

Gauss-Markov Assumptions
Observation yi is a linear function
 $y_i = x_i'\beta + \varepsilon_i$
of observations xik, k = 1, …, K, of the regressor variables and the error term εi
for i = 1, …, N; $x_i' = (x_{i1}, \dots, x_{iK})$; $X = (x_{ik})$
A1  E{εi} = 0 for all i
A2  all εi are independent of all xi (exogenous xi)
A3  V{εi} = σ² for all i (homoskedasticity)
A4  Cov{εi, εj} = 0 for all i and j with i ≠ j (no autocorrelation)
In matrix notation: $E\{\varepsilon\} = 0$, $V\{\varepsilon\} = \sigma^2 I_N$

Systematic Part of the Model
nThe systematic part E{yi | xi} of the model $y_i = x_i'\beta + \varepsilon_i$, given observations xi, is derived under the Gauss-Markov assumptions as follows:
n(A2) implies $E\{\varepsilon | X\} = E\{\varepsilon\} = 0$ and $V\{\varepsilon | X\} = V\{\varepsilon\} = \sigma^2 I_N$
nObservations xi, i = 1, …, N, do not affect the properties of ε
nThe systematic part
n $E\{y_i | x_i\} = x_i'\beta$
n can be interpreted as the conditional expectation of yi, given observations xi

Is the OLS-estimator a Good Estimator?
nUnder the Gauss-Markov assumptions, the OLS estimator has nice properties; see below
nGauss-Markov assumptions are very strong and often not satisfied
nRelaxations of the Gauss-Markov assumptions and consequences of such relaxations are important
topics

Properties of OLS Estimators
n1. The OLS estimator b is unbiased: $E\{b | X\} = E\{b\} = \beta$
n    Needs assumptions (A1) and (A2)
n2. The variance of the OLS estimator b is given by
n $V\{b | X\} = V\{b\} = \sigma^2 \left(\sum_i x_i x_i'\right)^{-1} = \sigma^2 (X'X)^{-1}$
n    Needs assumptions (A1), (A2), (A3) and (A4)
n3. Gauss-Markov theorem: The OLS estimator b is a BLUE (best linear unbiased estimator) for β
n    Needs assumptions (A1), (A2), (A3), and (A4) and requires linearity in parameters
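Unbiasedness and the variance formula σ²(X'X)^{-1} can be illustrated with a small Monte Carlo experiment. This is a sketch under stated assumptions: the regressors are held fixed across replications, the error terms are drawn as independent normal variables with mean 0 and constant variance (which satisfies A1 to A4), and all numerical values (N, σ, β, number of replications) are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)
N, sigma = 40, 2.0
beta = np.array([1.0, 0.5])          # assumed "true" coefficients

# Regressors are drawn once and then held fixed across replications.
X = np.column_stack([np.ones(N), rng.uniform(0.0, 10.0, size=N)])
XtX_inv = np.linalg.inv(X.T @ X)

# Draw R samples of error terms: mean 0, variance sigma^2, independent.
R = 20000
eps = rng.normal(0.0, sigma, size=(R, N))
Y = X @ beta + eps                   # one sample per row

# OLS estimate for every sample: row r holds b' = y'X(X'X)^{-1}.
B = Y @ X @ XtX_inv

mean_b = B.mean(axis=0)              # Monte Carlo estimate of E{b}
cov_b = np.cov(B, rowvar=False)      # Monte Carlo estimate of V{b}
theory = sigma**2 * XtX_inv          # sigma^2 (X'X)^{-1}

print("mean of estimates:", mean_b)
print("simulated covariance:\n", cov_b)
print("theoretical covariance:\n", theory)
```

The average of the estimates should be close to β, and the simulated covariance matrix close to σ²(X'X)^{-1}; the agreement improves as the number of replications grows.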

The Gauss-Markov Theorem
nOLS estimator b is BLUE (best linear unbiased estimator) for β
nLinear estimator: b* = Ay with any full-rank K×N matrix A
nb* is an unbiased estimator: E{b*} = E{Ay} = β
nb is BLUE: V{b*} - V{b} is positive semi-definite, i.e., the variance of any linear combination d'b* is not smaller than that of d'b
n V{d'b*} ≥ V{d'b}
n e.g., V{bk*} ≥ V{bk} for any k
nThe OLS-estimator is most accurate among the linear unbiased estimators
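The Gauss-Markov theorem can be illustrated by comparing OLS with one other linear unbiased estimator. In the sketch below, the alternative b* = (Z'X)^{-1}Z'y with Z = (1, x³) is a hypothetical choice made only for illustration; any A with AX = I gives an unbiased linear estimator when X is fixed and E{ε} = 0, and under (A3) and (A4) its covariance matrix is σ²AA'.

```python
import numpy as np

rng = np.random.default_rng(4)
N, sigma2 = 30, 1.0

x = rng.uniform(0.0, 10.0, size=N)
X = np.column_stack([np.ones(N), x])
Z = np.column_stack([np.ones(N), x**3])   # hypothetical alternative weights

# Both estimators have the form b* = A y with A X = I (unbiasedness).
A_ols = np.linalg.solve(X.T @ X, X.T)     # (X'X)^{-1} X'
A_alt = np.linalg.solve(Z.T @ X, Z.T)     # (Z'X)^{-1} Z'

# Covariance matrices sigma^2 A A' under homoskedastic, uncorrelated errors.
V_ols = sigma2 * A_ols @ A_ols.T
V_alt = sigma2 * A_alt @ A_alt.T

# Gauss-Markov: V_alt - V_ols is positive semi-definite.
eigvals = np.linalg.eigvalsh(V_alt - V_ols)
print("eigenvalues of V{b*} - V{b}:", eigvals)
print("variances, OLS:        ", np.diag(V_ols))
print("variances, alternative:", np.diag(V_alt))
```

The eigenvalues of the difference should be non-negative (up to rounding), so every diagonal element of V{b*} is at least as large as the corresponding element of V{b}.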

Standard Errors of OLS Estimators
nVariance of the OLS-estimators:
n $V\{b\} = \sigma^2 (X'X)^{-1} = \sigma^2 \left(\sum_i x_i x_i'\right)^{-1}$
nStandard error of OLS estimate bk: the square root of the kth diagonal element of V{b}
nV{b} is proportional to the variance σ² of the error terms
nEstimator for σ²: sampling variance s² of the residuals ei
n $s^2 = \frac{1}{N-K} \sum_i e_i^2$
n Under assumptions (A1)-(A4), s² is unbiased for σ²
n     Attention: the estimator $\frac{1}{N-1} \sum_i e_i^2$ is biased
nEstimated variance (covariance matrix) of b:
n $\tilde{V}\{b\} = s^2 (X'X)^{-1} = s^2 \left(\sum_i x_i x_i'\right)^{-1}$

Standard Errors of OLS Estimators, cont’d
nVariance of the OLS-estimators:
n $V\{b\} = \sigma^2 (X'X)^{-1} = \sigma^2 \left(\sum_i x_i x_i'\right)^{-1}$
nStandard error of OLS estimate bk: the square root of the kth diagonal element of V{b}
n $\sigma \sqrt{c_{kk}}$
n with $c_{kk}$ the k-th diagonal element of $(X'X)^{-1}$
nEstimated variance (covariance matrix) of b:
n $\tilde{V}\{b\} = s^2 (X'X)^{-1} = s^2 \left(\sum_i x_i x_i'\right)^{-1}$
nEstimated standard error of bk:
n $se(b_k) = s \sqrt{c_{kk}}$
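The estimated standard errors can be computed step by step: residuals, then s² with the degrees-of-freedom correction N - K, then s times the square root of the diagonal elements of (X'X)^{-1}. The sketch below uses simulated data; the coefficient vector and error standard deviation are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
N, K = 100, 3

# Simulated data (assumed coefficients and error standard deviation).
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0.0, 1.5, size=N)

# OLS estimates and residuals.
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b

# Unbiased estimator of the error variance: divide by N - K, not N - 1.
s2 = e @ e / (N - K)

# Estimated covariance matrix and standard errors se(b_k) = s * sqrt(c_kk).
V_est = s2 * XtX_inv
se = np.sqrt(np.diag(V_est))

print("estimates:      ", b)
print("s^2:            ", s2)
print("standard errors:", se)
```

Dividing by N - 1 instead of N - K would give a smaller value and thus understate the error variance on average.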

Two Examples
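nSimple regression $Y_i = \alpha + \beta X_i + \varepsilon_i$
n The variance for b is
n $V\{b\} = \frac{\sigma^2}{N s_x^2}$, with $s_x^2 = \frac{1}{N} \sum_i (x_i - \bar{x})^2$
n b is the more accurate, the larger N and $s_x^2$ and the smaller σ²
nRegression with two regressors:
n $Y_i = \beta_1 + \beta_2 X_{i2} + \beta_3 X_{i3} + \varepsilon_i$
n The variance for b2 is
n $V\{b_2\} = \frac{\sigma^2}{N s_{x2}^2 (1 - r_{23}^2)}$
n with $s_{x2}^2$ the sample variance of X2 and $r_{23}$ the sample correlation coefficient of X2 and X3
n b2 is most accurate if X2 and X3 are uncorrelated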

Normality of Error Terms
nFor statistical inference purposes, a distributional assumption for the εi's is needed
A5  εi normally distributed for all i
nTogether with assumptions (A1), (A3), and (A4), (A5) implies
n $\varepsilon_i \sim NID(0, \sigma^2)$ for all i
ni.e., all εi are independent drawings from a normal distribution with mean 0 and variance σ²
nError terms are “normally and independently distributed”

Properties of OLS Estimators
n1. The OLS estimator b is unbiased: E{b} = β
n2. The variance of the OLS estimator is given by
n $V\{b\} = \sigma^2 (X'X)^{-1}$
n3. The OLS estimator b is a BLUE (best linear unbiased estimator) for β
n4. The OLS estimator b is normally distributed with mean β and covariance matrix $V\{b\} = \sigma^2 (X'X)^{-1}$
n $b \sim N(\beta, \sigma^2 (X'X)^{-1})$, $b_k \sim N(\beta_k, \sigma^2 c_{kk})$
n Needs assumptions (A2) + (A5)

Example: Individual Wages
n wagei = β1 + β2 malei + εi
nWhat do the assumptions mean?
n(A1): β1 + β2 malei contains the whole systematic part of the model; no regressors besides gender
relevant?
n(A2): xi uncorrelated with εi for all i: knowledge of a person’s gender provides no information
about further variables which affect the person’s wage; is that realistic?
n(A3): V{εi} = σ² for all i: variance of error terms (and of wages) is the same for males and females; is that realistic?
n(A4): Cov{εi, εj} = 0, i ≠ j: implied by random sampling
n(A5): Normality of εi: is that realistic? (Would allow, e.g., for negative wages)

Individual Wages, cont’d
nOLS estimated wage equation (Table 2.1, Verbeek)
n b1 = 5.147, se(b1) = 0.081: mean wage p.h. for females: $5.15, with std. error of $0.08
n b2 = 1.166, se(b2) = 0.112
n 95% confidence interval for β1: 4.988 ≤ β1 ≤ 5.306
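The confidence interval for β1 follows from the estimate and its standard error as b1 ± 1.96 · se(b1). The sketch below redoes this arithmetic; the critical value 1.96 is the normal approximation, which is an assumption here (with N = 3294 the t-quantile is practically the same).

```python
# Estimates and standard errors as reported on the slide.
b1, se_b1 = 5.147, 0.081
b2, se_b2 = 1.166, 0.112

# Assumed critical value: 97.5% quantile of the standard normal distribution.
z = 1.96

# 95% confidence interval for beta_1.
lower = b1 - z * se_b1
upper = b1 + z * se_b1
print(f"95% CI for beta_1: [{lower:.3f}, {upper:.3f}]")

# The same arithmetic applied to beta_2.
lower2 = b2 - z * se_b2
upper2 = b2 + z * se_b2
print(f"95% CI for beta_2: [{lower2:.3f}, {upper2:.3f}]")
```

The first interval reproduces the bounds 4.988 and 5.306 given above.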

Your Homework
1. Verbeek's data set “WAGES” contains for a sample of 3294 individuals the wage and other variables. Using GRETL, draw box plots (a) for all wages p.h. in the sample, (b) for wages p.h. of males and of females.
2. For Verbeek's data set “WAGES”, calculate, using GRETL, the mean wage p.h. (a) of the whole sample, (b) of males and females, and (c) of persons with schooling between (i) 0 and 6 years, (ii) 7 and 12 years, and (iii) 13 and 16 years.
3. For the simple linear regression (Y = β1 + β2X + ε): write the OLS-estimator $b = (X'X)^{-1}X'y$ in summation form; X: N×K.
4. Show that $V\{b\} = \sigma^2 (X'X)^{-1}$.
5. Show that $P_x M_x = 0$ for the hat matrix $P_x$ and the residual generating matrix $M_x$.