LECTURE 3
INTRODUCTION TO LINEAR REGRESSION ANALYSIS II
Introduction to Econometrics
October 4, 2016

REVISION: THE PREVIOUS LECTURE

Desired properties of an estimator:
An estimator is unbiased if the mean of its sampling distribution is equal to the value of the parameter it is estimating
An estimator is consistent if it converges to the value of the true parameter as the sample size increases
An estimator is efficient if the variance of its sampling distribution is the smallest possible

REVISION: THE PREVIOUS LECTURE

We explained the principle of the OLS estimator: minimizing the sum of squared differences between the observations and the regression line

    y_i = \beta_0 + \beta_1 x_i + \varepsilon_i

We found the formulae for the estimates:

    \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x}_n)(y_i - \bar{y}_n)}{\sum_{i=1}^{n} (x_i - \bar{x}_n)^2} , \qquad \hat{\beta}_0 = \bar{y}_n - \hat{\beta}_1 \bar{x}_n

REVISION: THE PREVIOUS LECTURE

We explained that the stochastic error term must be present in a regression equation because of:
1. omission of many minor influences (unavailable data)
2. measurement error
3. possibly incorrect functional form
4. stochastic character of unpredictable human behavior
Remember that all of these factors are included in the error term and may alter its properties
The properties of the error term determine the properties of the estimates

WARM-UP EXERCISE

You receive a unique dataset that includes the wages of all citizens of Brno as well as their experience (number of years spent working). Naturally, you are curious about the effect of experience on wages. You run an OLS regression of monthly wage in CZK on the number of years of experience and obtain the following results:

    \widehat{wage}_i = 14450 + 1135 \cdot exper_i

1. Interpret the meaning of the coefficient of exper_i.
2. Use the estimates to predict the average wage of a person with 1, 5, 20, and 40 years of experience.
3. Do the predicted wages seem realistic? Explain your answer.

ON TODAY'S LECTURE

We will derive the estimation formula for multivariate OLS
We will list the assumptions about the error term and the explanatory variables that are required in classical regression models
We will show that under these assumptions, OLS is the best estimator available for regression models
The rest of the course will mostly deal, in one way or another, with the question of what to do when one of the classical assumptions is not met
Readings: Studenmund - chapter 4; Wooldridge - chapters 5, 8, 9, 12

ORDINARY LEAST SQUARES WITH SEVERAL EXPLANATORY VARIABLES

Usually, there is more than one explanatory variable in a regression model
Multivariate model with k explanatory variables:

    y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_k x_{ik} + \varepsilon_i

For observations 1, 2, ..., n, we have:

    y_1 = \beta_0 + \beta_1 x_{11} + \beta_2 x_{12} + \ldots + \beta_k x_{1k} + \varepsilon_1
    y_2 = \beta_0 + \beta_1 x_{21} + \beta_2 x_{22} + \ldots + \beta_k x_{2k} + \varepsilon_2
    \vdots
    y_n = \beta_0 + \beta_1 x_{n1} + \beta_2 x_{n2} + \ldots + \beta_k x_{nk} + \varepsilon_n

MATRIX NOTATION

We can write in matrix form:

    \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} =
    \begin{pmatrix}
      1 & x_{11} & x_{12} & \cdots & x_{1k} \\
      1 & x_{21} & x_{22} & \cdots & x_{2k} \\
      \vdots & \vdots & \vdots & & \vdots \\
      1 & x_{n1} & x_{n2} & \cdots & x_{nk}
    \end{pmatrix}
    \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix} +
    \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}

or in simplified notation: y = X\beta + \varepsilon

OLS - DERIVATION UNDER MATRIX NOTATION

We have to find

    \hat{\beta} = \arg\min_{\beta} \, (y - X\beta)'(y - X\beta) = \arg\min_{\beta} \left[ y'y - y'X\beta - \beta'X'y + \beta'X'X\beta \right]

First order condition:

    \frac{\partial}{\partial \beta} : \; -X'y - X'y + X'X\hat{\beta} + (X'X)'\hat{\beta} = 0 \quad\Longrightarrow\quad X'X\hat{\beta} = X'y

This gives us

    \hat{\beta} = (X'X)^{-1} X'y
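To make the matrix formula concrete, here is a minimal numerical sketch in Python (an addition for illustration, not part of the original slides): it simulates a small dataset, computes \hat{\beta} = (X'X)^{-1}X'y directly, and cross-checks the result against numpy's built-in least-squares routine. The simulated model, coefficient values, and variable names are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate a model with k = 2 explanatory variables:
# y = 1.0 + 2.0*x1 - 0.5*x2 + eps  (illustrative values)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
eps = rng.normal(scale=1.0, size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + eps

# Design matrix X: a column of ones (the intercept) plus the regressors
X = np.column_stack([np.ones(n), x1, x2])

# OLS estimate: beta_hat = (X'X)^{-1} X'y
# (solving the normal equations is numerically safer than forming the inverse)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print("beta_hat:", beta_hat)  # should be close to [1.0, 2.0, -0.5]

# Cross-check with numpy's built-in least-squares solver
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print("lstsq:   ", beta_lstsq)
```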
MEANING OF REGRESSION COEFFICIENT

Consider the multivariate model

    Q = \beta_0 + \beta_1 P + \beta_2 P_s + \beta_3 Y + \varepsilon

estimated as

    \hat{Q} = 31.50 - 0.73 P + 0.11 P_s + 0.23 Y

Q ... quantity demanded
P ... commodity's price
P_s ... price of a substitute
Y ... disposable income

The meaning of \beta_1 is the impact of a one-unit increase in P on the dependent variable Q, holding constant the other included independent variables P_s and Y
When the price increases by 1 unit (and the price of the substitute good and income remain the same), quantity demanded decreases by 0.73 units

EXERCISE

Remember the unique dataset that includes the wages of all citizens of Brno as well as their experience (number of years spent working). Because you realize that wages may not depend linearly on experience, you add the additional variable exper_i^2 to your model and obtain the following results:

    \widehat{wage}_i = 14450 + 1160 \cdot exper_i - 25 \cdot exper_i^2

1. What is the overall impact of increasing the number of years of experience by 1 year?
2. Use the estimates to determine the average wage of a person with 1, 5, 20, and 40 years of experience.
3. Do the predicted wages seem realistic now? Explain your answer.

THE CLASSICAL ASSUMPTIONS

1. The regression model is linear in the coefficients, is correctly specified, and has an additive error term
2. The error term has a zero population mean
3. Observations of the error term are uncorrelated with each other
4. The error term has a constant variance
5. All explanatory variables are uncorrelated with the error term
6. No explanatory variable is a perfect linear function of any other explanatory variable(s)
7. The error term is normally distributed

GRAPHICAL REPRESENTATION

[Figure: observations of Y plotted against X]

1. LINEARITY IN COEFFICIENTS

The regression model is linear in the coefficients, is correctly specified, and has an additive error term.
Linearity in variables is not required
Example: the production function Y = A K^{\beta_1} L^{\beta_2}, for which we suppose A = e^{\beta_0 + \varepsilon}, can be transformed so that

    \ln Y = \beta_0 + \beta_1 \ln K + \beta_2 \ln L + \varepsilon

and the linearity in coefficients is restored
Note that it is the linearity in coefficients that allows us to rewrite the general regression model in matrix form

EXERCISE

Which of the following models is/are linear?

    y = \beta_0 + \beta_1 x + \varepsilon
    \ln y = \beta_0 + \beta_1 \ln x + \beta_2 \sqrt{z} + \varepsilon
    y = x^{\beta_1} + \varepsilon

EXERCISE

Which of the following models is/are linear?

    y = \beta_0 + \beta_1 x + \varepsilon — is a linear model
    \ln y = \beta_0 + \beta_1 \ln x + \beta_2 \sqrt{z} + \varepsilon — is a linear model
    y = x^{\beta_1} + \varepsilon — is NOT a linear model

Regression models are linear in parameters, but they do not need to be linear in variables

2. ZERO MEAN OF THE ERROR TERM

The error term has a zero population mean.
Notation: E[\varepsilon_i] = 0 or E[\varepsilon] = 0
Idea: observations are distributed around the regression line, and the average of the deviations is zero
In fact, the mean of \varepsilon_i is forced to be zero by the existence of the intercept (\beta_0) in the equation
Hence, this assumption is satisfied as long as an intercept is included in the equation

3. ERRORS UNCORRELATED WITH EACH OTHER

Observations of the error term are uncorrelated with each other.
If there is a systematic correlation between one observation of the error term and another (serial correlation), it is more difficult for OLS to get precise estimates of the coefficients of the explanatory variables
Technically: the OLS estimate will be consistent, but not efficient (see the simulation sketch below)
This often happens in time-series data, where a random shock in one time period affects the random shock in another time period
We will solve this problem using the Generalized Least Squares (GLS) estimator

GRAPHICAL REPRESENTATION

[Figure: Y against X, estimated model vs. true model]
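The simulation sketch below (an addition for illustration, not part of the slides) shows what serial correlation does in practice: with an AR(1) regressor and AR(1) errors, the OLS slope estimate is still centered on the true value, but the classical standard-error formula understates the true sampling spread. The AR(1) design, the coefficient 0.5, and all sample sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_once(n=200, rho=0.8):
    """One time series with an AR(1) regressor and AR(1) errors."""
    x = np.zeros(n)
    e = np.zeros(n)
    u = rng.normal(size=n)
    v = rng.normal(size=n)
    for t in range(1, n):
        x[t] = rho * x[t - 1] + u[t]
        e[t] = rho * e[t - 1] + v[t]
    y = 1.0 + 0.5 * x + e                       # true beta1 = 0.5
    X = np.column_stack([np.ones(n), x])
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - 2)                # classical estimate of sigma^2
    se_classical = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta_hat[1], se_classical

draws = [simulate_once() for _ in range(2000)]
b1 = np.array([d[0] for d in draws])
se = np.array([d[1] for d in draws])

print("mean of beta1_hat:      ", b1.mean())  # close to 0.5: still unbiased
print("true sampling std dev:  ", b1.std())   # actual spread across samples
print("avg classical std error:", se.mean())  # understates the true spread
```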
4. CONSTANT VARIANCE OF THE ERROR TERM

The error term has a constant variance.
This property is called homoskedasticity; if it is not satisfied, we talk about heteroskedasticity
It states that each observation of the error term is drawn from a distribution with the same variance and thus varies in the same manner around the regression line
If the error term is heteroskedastic, it is more difficult for OLS to get precise estimates of the coefficients of the explanatory variables
Technically: the OLS estimate will be consistent, but not efficient

4. CONSTANT VARIANCE OF THE ERROR TERM

Heteroskedasticity is often present in cross-sectional data
Example: analysis of household consumption patterns
The variance of the consumption of certain goods might be greater for higher-income households, because these have more discretionary income than lower-income households
We will solve this problem using Huber-White robust standard errors

GRAPHICAL REPRESENTATION

[Figure: Y against X, true model vs. estimated model]

3. NO CORRELATION + 4. HOMOSKEDASTICITY

Notation:
no correlation: corr(\varepsilon_i, \varepsilon_j) = 0 \Rightarrow E[\varepsilon_i \varepsilon_j] = 0 for each i \neq j
homoskedasticity: E[\varepsilon_i^2] = \sigma^2 for each i
Matrix notation:

    Var[\varepsilon] = \begin{pmatrix}
      \sigma^2 & 0 & 0 & \cdots & 0 \\
      0 & \sigma^2 & 0 & \cdots & 0 \\
      0 & 0 & \sigma^2 & \cdots & 0 \\
      \vdots & \vdots & & \ddots & \vdots \\
      0 & 0 & 0 & \cdots & \sigma^2
    \end{pmatrix} = \sigma^2 I

5. VARIABLES UNCORRELATED WITH THE ERROR TERM

All explanatory variables are uncorrelated with the error term.
Notation: E[x_i \varepsilon_i] = 0 or E[X'\varepsilon] = 0
If an explanatory variable and the error term were correlated with each other, the OLS estimates would be likely to attribute to x some of the variation in y that actually came from the error term
Example: analysis of household consumption patterns
Households with lower incomes may report higher consumption (because of shame)
This creates a negative correlation between X and the error term (measurement error is higher for lower incomes)
It leads to biased and inconsistent estimates
We will solve this problem using the instrumental variables (IV) approach

GRAPHICAL REPRESENTATION

[Figure: Y against X, true model vs. estimated model]

6. LINEARLY INDEPENDENT VARIABLES

No explanatory variable is a perfect linear function of any other explanatory variable(s).
If this condition does not hold, we talk about (multi)collinearity
Multicollinearity can be perfect or imperfect
Perfect multicollinearity: one explanatory variable is an exact linear function of one or more other explanatory variables
In this case, OLS is incapable of distinguishing one variable from the other
Technical consequence: (X'X)^{-1} does not exist, so OLS estimation cannot be conducted
Example: we include dummy variables for men and for women together with the intercept (illustrated in the sketch after the exercise below)

6. LINEARLY INDEPENDENT VARIABLES

Imperfect multicollinearity: there is a linear relationship between the variables, but there is some error in that relationship
Example: we include two variables that both proxy for individual health status
Consequences of multicollinearity:
Estimated coefficients remain unbiased
But the standard errors of the estimates are inflated, making variables appear insignificant even though they might be significant
Solution: drop one of the variables

EXERCISE

Which of the following pairs of independent variables would violate the assumption of no multicollinearity? (That is, which pairs of variables are perfect linear functions of each other?)
right shoe size and left shoe size (of students in the class)
consumption and disposable income (in the United States over the last 30 years)
X_i and 2X_i
X_i and X_i^2
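The dummy-variable example from Assumption 6 lends itself to a short numerical check. The sketch below (an addition for illustration, not part of the slides; the data are simulated and all names are assumptions) shows that including dummies for both men and women alongside an intercept makes X rank-deficient, so X'X cannot be inverted, and that dropping one dummy restores full rank.

```python
import numpy as np

rng = np.random.default_rng(1)

# The dummy-variable trap: an intercept plus a dummy for men AND a dummy
# for women. Since male + female = 1 = the intercept column, X has rank
# 2 instead of 3, X'X is singular, and (X'X)^{-1} does not exist.
n = 100
male = rng.integers(0, 2, size=n)   # 1 if male, 0 otherwise
female = 1 - male                   # 1 if female, 0 otherwise
X = np.column_stack([np.ones(n), male, female])

print("columns:", X.shape[1], "rank:", np.linalg.matrix_rank(X))  # rank 2 < 3

try:
    np.linalg.inv(X.T @ X)
except np.linalg.LinAlgError as err:
    print("inversion fails:", err)  # "Singular matrix"

# Dropping one dummy removes the exact linear dependence
X_ok = np.column_stack([np.ones(n), male])
print("after dropping 'female', rank:", np.linalg.matrix_rank(X_ok))
```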
7. NORMALITY OF THE ERROR TERM

The error term is normally distributed.
This assumption is optional, but it is usually invoked
Normality of the error term is inherited by the estimate \hat{\beta}
Knowing the distribution of the estimate allows us to find its confidence intervals and to test hypotheses about the coefficients

PROPERTIES OF THE OLS ESTIMATE

The OLS estimate is defined by the formula

    \hat{\beta} = (X'X)^{-1} X'y , \quad \text{where} \quad y = X\beta + \varepsilon

Hence, it depends on the random variable \varepsilon, and thus \hat{\beta} is a random variable itself
The properties of \hat{\beta} are based on the properties of \varepsilon

GAUSS-MARKOV THEOREM

Given Classical Assumptions 1.-6., the OLS estimator of \beta is the minimum variance estimator from among the set of all linear unbiased estimators of \beta.
Assumption 7., normality, is not needed for this theorem
The theorem is also stated as "OLS is BLUE", where BLUE stands for "Best Linear Unbiased Estimator"
It means that:
OLS is linear: \hat{\beta} = (X'X)^{-1}X'y = Ly
OLS is unbiased (see next slide)
OLS has the minimum variance of all linear unbiased estimators (it is efficient)

EXPECTED VALUE OF THE OLS ESTIMATE

We show:

    \hat{\beta} = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + \varepsilon)
                = \underbrace{(X'X)^{-1}X'X}_{I}\,\beta + (X'X)^{-1}X'\varepsilon
                = \beta + (X'X)^{-1}X'\varepsilon

    E[\hat{\beta}] = E\left[\beta + (X'X)^{-1}X'\varepsilon\right]
                   = \beta + (X'X)^{-1}X'\,\underbrace{E[\varepsilon]}_{0}
                   = \beta

Since E[\hat{\beta}] = \beta, OLS is unbiased

VARIANCE OF THE OLS ESTIMATE

We show:

    \hat{\beta} = (X'X)^{-1}X'y = \beta + (X'X)^{-1}X'\varepsilon

    Var[\hat{\beta}] = Var\left[\beta + (X'X)^{-1}X'\varepsilon\right]
                     = \underbrace{Var[\beta]}_{0} + Var\left[(X'X)^{-1}X'\varepsilon\right]
                     = (X'X)^{-1}X' \cdot \underbrace{Var[\varepsilon]}_{\sigma^2 I} \cdot X(X'X)^{-1}
                     = \sigma^2 (X'X)^{-1}X'X(X'X)^{-1}
                     = \sigma^2 (X'X)^{-1}

NORMALITY OF THE OLS ESTIMATE

When we assume that \varepsilon_i \sim N(0, \sigma^2), we can see that

    \hat{\beta} = (X'X)^{-1}X'y = \beta + (X'X)^{-1}X'\varepsilon

is also normally distributed (it is a linear combination of normally distributed variables)
Hence, we say that \hat{\beta} is jointly normal:

    \hat{\beta} \sim N\left(\beta, \; \sigma^2 (X'X)^{-1}\right)

This will help us to test hypotheses about regression coefficients (see next lecture)
Note that the normality of errors is not required for large samples, because \hat{\beta} is asymptotically normal anyway

CONSISTENCY OF THE OLS ESTIMATE

When no explanatory variable is correlated with the error term (Assumption 5.), the OLS estimate is consistent:

    E[X'\varepsilon] = 0 \;\Longrightarrow\; \hat{\beta} \xrightarrow{\;n \to \infty\;} \beta

In other words: as the number of observations increases, the estimate converges to the true value of the coefficient
Consistency is the most important property of any estimate!

CONSISTENCY OF THE OLS ESTIMATE

As long as the OLS estimate of \beta is consistent, the residuals are consistent estimates of the error term
If we have consistent estimates of the error term, we can test whether it satisfies the classical assumptions
Moreover, possible deviations from the classical model can be corrected
As a consequence, the assumption of zero correlation between the explanatory variables and the error term, E[X'\varepsilon] = 0, is the most important one to satisfy in regression models

SUMMARY

We expressed the multivariate OLS model in matrix notation, y = X\beta + \varepsilon, and we found the formula of the estimate:

    \hat{\beta} = (X'X)^{-1}X'y

We listed the classical assumptions of regression models:
model linear in parameters
explanatory variables linearly independent
(normally distributed) error term with zero mean and constant variance, no autocorrelation
no correlation between the error term and the explanatory variables

We showed that if these assumptions hold, the OLS estimate is:
consistent (if no correlation between X and \varepsilon)
unbiased (if no correlation between X and \varepsilon)
efficient (if homoskedasticity and no autocorrelation of \varepsilon)
normally distributed (if \varepsilon normally distributed)
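To close, here is a small Monte Carlo sketch (an addition for illustration, not part of the slides) that numerically checks two of the results derived above: with a fixed design matrix, the average of \hat{\beta} across replications approaches \beta (unbiasedness), and the empirical variance of the estimates approaches \sigma^2 (X'X)^{-1}. The design, coefficient values, and replication count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(7)

# Fixed design: keep X constant across replications so that the
# theoretical covariance sigma^2 (X'X)^{-1} is a fixed matrix
n, sigma = 100, 2.0
beta = np.array([1.0, 0.5, -1.5])
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
XtX_inv = np.linalg.inv(X.T @ X)

reps = 5000
estimates = np.empty((reps, 3))
for r in range(reps):
    eps = rng.normal(scale=sigma, size=n)   # fresh error draw each replication
    y = X @ beta + eps
    estimates[r] = np.linalg.solve(X.T @ X, X.T @ y)

print("true beta:        ", beta)
print("mean of estimates:", estimates.mean(axis=0))        # ~ beta (unbiased)
print("theoretical var:  ", np.diag(sigma**2 * XtX_inv))
print("empirical var:    ", estimates.var(axis=0))         # ~ theoretical
```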