Advanced Econometrics - Lecture 1
The Linear Regression Model: A Review

Econometrics …
- "… consists of the application of statistical data and techniques to mathematical formulations of economic theory. It serves to test the hypotheses of economic theory and to estimate the implied interrelationships." (Tinbergen, 1952)
- "… is the interaction of economic theory, observed data and statistical methods. It is the interaction of these three that makes econometrics interesting, challenging, and perhaps, difficult." (Verbeek, 2008)
- … is a methodological science with the elements
  - economic theory
  - mathematical language
  - statistical methods
  - software

March 19, 2010 Hackl, Advanced Econometrics 2

The Contents
1. Review of linear regression and the OLS estimator
2. Heteroskedasticity and autocorrelation (MV, Ch. 4)
3. Endogeneity, instrumental variables and GMM (MV, Ch. 5)
4. Maximum likelihood estimation and specification tests (MV, Ch. 7)
5. Univariate time series models (MV, Ch. 8)
6. Multivariate time series models (MV, Ch. 9)
Advanced Econometrics - Lecture 1
- Regression: a descriptive tool
- Economic models
- OLS estimation
- Properties of OLS estimators
- t-test and F-test
- Asymptotic properties of OLS estimators
- Multicollinearity
- Model specification and tests

The Linear Model
- Y: explained variable; X: explanatory or regressor variable
- The model describes the data-generating process of Y under the condition X
- Simple linear regression model: \( Y = \alpha + \beta X \), with β: coefficient of X and α: intercept
- Multiple linear regression model: \( Y = \beta_1 + \beta_2 X_2 + \dots + \beta_K X_K \)

Fitting a Model to Data
- Choice of values b1, b2 for the model parameters β1, β2 of \( Y = \beta_1 + \beta_2 X \), given the observations \( (y_i, x_i) \), i = 1, …, N
- Principle of (Ordinary) Least Squares, or OLS: \( (b_1, b_2) = \arg\min_{\beta_1, \beta_2} S(\beta_1, \beta_2) \)
- Objective function: sum of the squared deviations
  \( S(\beta_1, \beta_2) = \sum_i [y_i - (\beta_1 + \beta_2 x_i)]^2 = \sum_i u_i^2 \)
- Deviations between observations and fitted values: \( u_i = y_i - (\beta_1 + \beta_2 x_i) \)

Observations and Fitted Regression Line
[Figure: Simple linear regression: fitted line and observation points (Verbeek, Figure 2.1)]

OLS-Estimators
- The OLS estimators b1 and b2 result in
  \( b_2 = \frac{\sum_i (x_i - \bar x)(y_i - \bar y)}{\sum_i (x_i - \bar x)^2} = \frac{s_{xy}}{s_x^2}, \quad b_1 = \bar y - b_2 \bar x \)
  with the mean values \( \bar x \) and \( \bar y \) and the second moments \( s_{xy} \) (sample covariance of x and y) and \( s_x^2 \) (sample variance of x)
- They follow from equating the partial derivatives of S(β1, β2) to zero, which gives the normal equations
  \( \sum_i (y_i - b_1 - b_2 x_i) = 0, \quad \sum_i x_i (y_i - b_1 - b_2 x_i) = 0 \)

Example: Individual Wages
- Sample (US National Longitudinal Survey, 1987): wage rate (per hour), gender, experience and years of schooling; N = 3294 individuals (1569 females)
- Average wage rate (per hour): $6.31 for males, $5.15 for females
- Model (see eq. (2.39) in Verbeek): \( wage_i = \beta_1 + \beta_2\, male_i + \varepsilon_i \)
- male_i: male dummy, with value 1 if the individual is male and 0 otherwise
- OLS estimation gives \( \widehat{wage}_i = 5.15 + 1.17\, male_i \)
- Compare with the averages: b1 = 5.15 is the average wage of females, and b1 + b2 = 5.147 + 1.166 = 6.313 is that of males
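The OLS formulas for the simple regression can be checked numerically. The sketch below is plain Python and uses a small hypothetical data set (not the NLS wage sample); the helper `ols_simple` is an illustrative name. It computes b2 from the centered sums and b1 = ȳ - b2 x̄, and shows that with a 0/1 dummy regressor b1 reproduces the mean of the group with dummy 0 and b1 + b2 the mean of the group with dummy 1, the same pattern as in the wage example.

```python
from statistics import mean

def ols_simple(x, y):
    """OLS estimates (b1, b2) of y = beta1 + beta2 * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Centered cross-products and squares; the common 1/N (or 1/(N-1))
    # factor of s_xy and s_x^2 cancels in the ratio b2 = s_xy / s_x^2.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    return b1, b2

# Hypothetical hourly wages with a male dummy (1 = male, 0 = female)
male = [0, 0, 0, 1, 1, 1, 1]
wage = [4.0, 5.0, 6.0, 6.0, 7.0, 5.5, 7.5]

b1, b2 = ols_simple(male, wage)
female_mean = mean(w for w, m in zip(wage, male) if m == 0)
male_mean = mean(w for w, m in zip(wage, male) if m == 1)
print(f"b1 = {b1:.3f}, b2 = {b2:.3f}")
print(f"female mean = {female_mean:.3f}, male mean = {male_mean:.3f}")
```

With a single dummy regressor, the normal equations force the fitted values to equal the two group means, which is why the intercept is the female average and the slope is the difference between the averages.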
Example: Individual Wages, cont'd
- OLS estimated wage equation (Table 2.1, Verbeek)
- \( \widehat{wage}_i = 5.15 + 1.17\, male_i \)
- Fitted wages: males 6.313, females 5.150

OLS-Estimators: The General Case
- The model for Y contains K-1 explanatory variables:
  \( Y = \beta_1 + \beta_2 X_2 + \dots + \beta_K X_K = x'\beta \), with \( x = (1, X_2, \dots, X_K)' \) and \( \beta = (\beta_1, \beta_2, \dots, \beta_K)' \)
- Observations: \( (y_i, x_i) = (y_i, (1, x_{i2}, \dots, x_{iK})') \), i = 1, …, N
- The OLS estimates \( b = (b_1, b_2, \dots, b_K)' \) are obtained by minimizing
  \( S(\beta) = \sum_{i=1}^N (y_i - x_i'\beta)^2 \)
- This results in
  \( b = \left( \sum_{i=1}^N x_i x_i' \right)^{-1} \sum_{i=1}^N x_i y_i \)

Best Linear Approximation
- Given the observations \( (y_i, x_i') = (y_i, (1, x_{i2}, \dots, x_{iK})) \), i = 1, …, N
- For y_i, the linear combination or fitted value \( \hat y_i = x_i'b \) is the best linear approximation of y from X2, …, XK and a constant (the intercept)
- Residuals: \( e_i = y_i - x_i'b \), i = 1, …, N
- Minimum value of the objective function: \( S(b) = \sum_i e_i^2 \)
- Orthogonality of the residuals \( e = (e_1, \dots, e_N)' \) to each column \( (x_{1k}, \dots, x_{Nk})' \) of regressor values: \( \sum_i e_i x_{ik} = 0 \), k = 1, …, K
- \( \sum_i e_i = 0 \): the average residual is zero if the model has an intercept

Matrix Notation
- N observations \( (y_1, x_1), \dots, (y_N, x_N) \)
- Model: \( y_i = \beta_1 + \beta_2 x_i + \varepsilon_i \), i = 1, …, N, or \( y = X\beta + \varepsilon \)
  with \( y = (y_1, \dots, y_N)' \), \( X = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_N \end{pmatrix} \), \( \beta = (\beta_1, \beta_2)' \), \( \varepsilon = (\varepsilon_1, \dots, \varepsilon_N)' \)
- OLS estimates: \( b = (X'X)^{-1}X'y \)

Economic Models
- Describe economic relationships (not only a set of observations) and have an economic interpretation
- Linear regression model: \( y_i = \beta_1 + \beta_2 x_{i2} + \dots + \beta_K x_{iK} + \varepsilon_i = x_i'\beta + \varepsilon_i \)
- Variables y_i, x_i2, …, x_iK: observable
- Error term ε_i (disturbance term): contains all influences that are not included explicitly in the model; unobservable; the assumption \( E\{\varepsilon_i \mid x_i\} = 0 \) gives
  - \( E\{y_i \mid x_i\} = x_i'\beta \)
  - the model describes the expected value of y given x
- Unknown coefficients β1, …, βK: βk measures the change of Y if Xk changes

Regression Coefficients
- Linear regression model: \( y_i = \beta_1 + \beta_2 x_{i2} + \dots + \beta_K x_{iK} + \varepsilon_i = x_i'\beta + \varepsilon_i \)
- Coefficient βk measures the change of Y if Xk changes by one unit and all other X values remain the same (ceteris paribus condition); it is the marginal effect of changing Xk on Y:
  \( \frac{\partial E\{y_i \mid x_i\}}{\partial x_{ik}} = \beta_k \)
- Example: wage equation \( wage_i = \beta_1 + \beta_2\, male_i + \beta_3\, school_i + \beta_4\, exper_i + \varepsilon_i \)
  - β3 measures the impact of one additional year at school upon a person's wage, keeping gender and years of experience fixed

Regression Coefficients, cont'd
- The marginal effect of a changing regressor may be non-constant
- Example: wage equation \( wage_i = \beta_1 + \beta_2\, male_i + \beta_3\, age_i + \beta_4\, age_i^2 + \varepsilon_i \)
  - the impact of changing age (ceteris paribus) depends on age:
    \( \frac{\partial E\{wage_i \mid x_i\}}{\partial age_i} = \beta_3 + 2\beta_4\, age_i \)

Elasticities
- Elasticity: measures the relative change in the dependent variable Y due to a relative change in Xk
- For a linear regression, the elasticity of Y with respect to Xk is
  \( \frac{\partial E\{y_i \mid x_i\} / E\{y_i \mid x_i\}}{\partial x_{ik} / x_{ik}} = \frac{\beta_k x_{ik}}{x_i'\beta} \)
- For a loglinear model \( \log y_i = (\log x_i)'\beta + \varepsilon_i \), with \( (\log x_i)' = (1, \log x_{i2}, \dots, \log x_{iK}) \), the elasticities are the coefficients β

Fitting Economic Models to Data
- Observations allow
  - estimating the parameters
  - assessing how well the data-generating process is represented by the model, i.e., how well the model coincides with reality
  - improving the model if necessary
- Fitting a linear regression model to data gives
  - parameter estimates \( b = (b_1, \dots, b_K)' \) for the coefficients \( \beta = (\beta_1, \dots, \beta_K)' \)
  - standard errors se(bk) of the estimates bk, k = 1, …, K
  - t-statistics, F-statistic, R², Durbin-Watson test statistic, etc.

OLS Estimator and OLS Estimates b
- The OLS estimates b are a realization of the OLS estimator
- The OLS estimator is a random variable:
  - the observed sample is one draw from the population of all possible samples
  - the observations are generated by some random process
- Distribution of the OLS estimator
  - the actual distribution is not known
  - the theoretical distribution is determined by assumptions on
    - the model specification
    - the error term εi and the regressor variables xi
- Quality criteria (bias, accuracy, efficiency) of OLS estimates are determined by the properties of this distribution

Gauss-Markov Assumptions
Observation yi is a linear function \( y_i = x_i'\beta + \varepsilon_i \) of the observations xik, k = 1, …, K, of the regressor variables and the error term εi, for i = 1, …, N; \( x_i' = (x_{i1}, \dots, x_{iK}) \); \( X = (x_{ik}) \)
- A1: \( E\{\varepsilon_i\} = 0 \) for all i
- A2: all εi are independent of all xi (exogenous xi)
- A3: \( V\{\varepsilon_i\} = \sigma^2 \) for all i (homoskedasticity)
- A4: \( Cov\{\varepsilon_i, \varepsilon_j\} = 0 \) for all i and j with i ≠ j (no autocorrelation)

Systematic Part of the Model
- The systematic part E{yi | xi} of the model \( y_i = x_i'\beta + \varepsilon_i \), given the observations xi, is derived under the Gauss-Markov assumptions as follows:
- (A2), together with (A1), (A3) and (A4), implies \( E\{\varepsilon \mid X\} = E\{\varepsilon\} = 0 \) and \( V\{\varepsilon \mid X\} = V\{\varepsilon\} = \sigma^2 I_N \)
- The observations xi do not affect the properties of ε
- The systematic part \( E\{y_i \mid x_i\} = x_i'\beta \) can be interpreted as the conditional expectation of yi, given the observations xi
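The distinction between the OLS estimator (a random variable) and one OLS estimate can be made concrete by simulation. The sketch below assumes Python with NumPy; the sample size, the coefficient vector `beta` and the standard normal errors are hypothetical choices that satisfy (A1)-(A4) by construction. It computes b = (X'X)^{-1}X'y for one sample, checks that the residuals are orthogonal to every regressor column and average to zero, and then repeats the estimation over many simulated samples so that the estimates scatter around β.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # fixed seed for reproducibility

N = 200
beta = np.array([1.0, 0.5, -2.0])  # hypothetical true coefficients

# Regressors (intercept plus two variables) are drawn once and held fixed;
# errors are i.i.d. N(0, 1) in every replication, so (A1)-(A4) hold.
X = np.column_stack([np.ones(N), rng.normal(size=N), rng.uniform(0, 10, size=N)])

def ols(X, y):
    """OLS estimates b = (X'X)^{-1} X'y, computed by solving the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# One sample -> one estimate b and residuals e
y = X @ beta + rng.normal(size=N)
b = ols(X, y)
e = y - X @ b
print("b =", b.round(3))
print("X'e =", (X.T @ e).round(10))          # numerically zero: orthogonality
print("mean residual =", e.mean().round(10))  # zero because of the intercept

# Many samples -> the sampling distribution of the OLS estimator
draws = np.array([ols(X, X @ beta + rng.normal(size=N)) for _ in range(2000)])
print("average of b over replications =", draws.mean(axis=0).round(3))
```

Solving the normal equations with `np.linalg.solve` instead of forming the inverse explicitly is a common numerical choice; the result is the same b.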
Is the OLS Estimator a Good Estimator?
- Under the Gauss-Markov assumptions, the OLS estimator has desirable properties; see below
- The Gauss-Markov assumptions are very strong and often not satisfied
- Relaxations of the Gauss-Markov assumptions and the consequences of such relaxations are important topics

Properties of OLS Estimators
1. The OLS estimator b is unbiased: E{b} = β
   Needs assumptions (A1) and (A2)
2. The variance of the OLS estimator b is given by \( V\{b\} = \sigma^2 \left( \sum_i x_i x_i' \right)^{-1} \)
   Needs assumptions (A1), (A2), (A3) and (A4)
3. The OLS estimator b is BLUE (best linear unbiased estimator) for β
   Needs assumptions (A1), (A2), (A3) and (A4), and requires linearity in parameters

Standard Errors of OLS Estimators
- Variance of the OLS estimators: \( V\{b\} = \sigma^2 \left( \sum_i x_i x_i' \right)^{-1} \)
- Standard error of the OLS estimate bk: the square root of the kth diagonal element of the estimated V{b}
- V{b} is proportional to the variance σ² of the error terms
- Estimator for σ²: the sampling variance s² of the residuals ei, \( s^2 = \frac{1}{N-K} \sum_i e_i^2 \)
- Under assumptions (A1)-(A4), s² is unbiased for σ²
- Estimated variance (covariance matrix) of b: \( \hat V\{b\} = s^2 \left( \sum_i x_i x_i' \right)^{-1} \)

Normality of Error Terms
- A5: εi normally distributed for all i
- Together with assumptions (A1), (A3), and (A4), (A5) implies \( \varepsilon_i \sim NID(0, \sigma^2) \) for all i, i.e., all εi are
  - independent drawings
  - from a normal distribution
  - with mean 0
  - and variance σ²
- Error terms are "normally and independently distributed"

Properties of OLS Estimators, cont'd
1. The OLS estimator b is unbiased: E{b} = β
2. The variance of the OLS estimator is given by \( V\{b\} = \sigma^2 \left( \sum_i x_i x_i' \right)^{-1} \)
3. The OLS estimator b is BLUE (best linear unbiased estimator) for β
4. The OLS estimator b is normally distributed with mean β and covariance matrix \( V\{b\} = \sigma^2 \left( \sum_i x_i x_i' \right)^{-1} \)
   Needs assumptions (A1)-(A5), i.e., (A2) together with \( \varepsilon_i \sim NID(0, \sigma^2) \)

Example: Individual Wages
\( wage_i = \beta_1 + \beta_2\, male_i + \varepsilon_i \)
What do the assumptions mean?
- (A1): β1 + β2 male_i contains the whole systematic part of the model; are no regressors besides gender relevant?
- (A2): xi independent of εi for all i: knowledge of a person's gender provides no information about further variables that affect the person's wage; is that realistic?
- (A3): V{εi} = σ² for all i: the variance of the error terms (and of wages) is the same for males and females; is that realistic?
- (A4): Cov{εi, εj} = 0, i ≠ j: implied by random sampling
- (A5): normality of εi: is that realistic? (It would allow, e.g., for negative wages)

Example: Individual Wages, cont'd
- OLS estimated wage equation (Table 2.1, Verbeek)
- b1 = 5.147, se(b1) = 0.081; b2 = 1.166, se(b2) = 0.112
- 95% confidence interval for β1: b1 ± 1.96 se(b1) = 5.147 ± 1.96 × 0.081, i.e., 4.988 ≤ β1 ≤ 5.306

Goodness-of-fit
- The quality of the linear approximation offered by the model \( y_i = x_i'\beta + \varepsilon_i \) can be measured by R²
- R² is the proportion of the variance in y that can be explained by the linear combination of the regressors xi:
  \( R^2 = \frac{\hat V\{\hat y_i\}}{\hat V\{y_i\}} = \frac{\frac{1}{N-1}\sum_i (\hat y_i - \bar y)^2}{\frac{1}{N-1}\sum_i (y_i - \bar y)^2} \)
- If the model contains an intercept (as usual): \( \hat V\{y_i\} = \hat V\{\hat y_i\} + \hat V\{e_i\} \)
- Alternatively, R² can then be calculated as
  \( R^2 = 1 - \frac{\hat V\{e_i\}}{\hat V\{y_i\}} = 1 - \frac{\sum_i e_i^2}{\sum_i (y_i - \bar y)^2} \)

Properties of R²
- 0 ≤ R² ≤ 1, if the model contains an intercept
- Comparing R² for two models makes no sense if the dependent variables y differ
- R² cannot decrease if a variable is added
- Adjusted R²: compensates for added regressors by a penalty for increasing K:
  \( \bar R^2 = 1 - \frac{\frac{1}{N-K}\sum_i e_i^2}{\frac{1}{N-1}\sum_i (y_i - \bar y)^2} \)
- Uncentered R²: \( 1 - \sum_i e_i^2 / \sum_i y_i^2 \)
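The quantities on the last few slides can be computed together. This is a minimal sketch assuming NumPy and hypothetical simulated data for the first part (the helper name `ols_summary` is illustrative, not a library function). It computes s², the standard errors from s²(X'X)^{-1}, R² and adjusted R², and confirms that with an intercept R² also equals the squared correlation of y and ŷ. The second part only redoes the arithmetic of the wage example: the 95% interval for β1 and the adjusted R² implied by R² = 0.031746 with N = 3294 and K = 2.

```python
import numpy as np

def ols_summary(X, y):
    """OLS estimates, standard errors, s^2, R^2 and adjusted R^2 (model with intercept)."""
    N, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    s2 = e @ e / (N - K)                  # unbiased for sigma^2 under (A1)-(A4)
    se = np.sqrt(np.diag(s2 * XtX_inv))   # square roots of the diagonal of s^2 (X'X)^{-1}
    tss = np.sum((y - y.mean()) ** 2)
    r2 = 1 - (e @ e) / tss
    adj_r2 = 1 - (e @ e / (N - K)) / (tss / (N - 1))
    return b, se, s2, r2, adj_r2

# Hypothetical simulated data: intercept plus one regressor
rng = np.random.default_rng(seed=2)
N = 500
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
y = 1.0 + 0.8 * x + rng.normal(scale=2.0, size=N)

b, se, s2, r2, adj_r2 = ols_summary(X, y)
y_hat = X @ b
r2_corr = np.corrcoef(y, y_hat)[0, 1] ** 2  # squared correlation of y and fitted values
print("b:", b.round(3), "se:", se.round(3), "s2:", round(s2, 3))
print(f"R^2 = {r2:.4f}, adjusted R^2 = {adj_r2:.4f}, corr^2 = {r2_corr:.4f}")

# Arithmetic check with the figures reported for the wage example
b1, se_b1 = 5.147, 0.081
ci_low, ci_high = b1 - 1.96 * se_b1, b1 + 1.96 * se_b1
adj_wage = 1 - (1 - 0.031746) * (3294 - 1) / (3294 - 2)
print(f"95% CI for beta1: [{ci_low:.3f}, {ci_high:.3f}]")
print(f"adjusted R^2 (wage example): {adj_wage:.6f}")
```

The explicit inverse is kept here because the covariance matrix s²(X'X)^{-1} is needed anyway for the standard errors.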
Testing a Regression Coefficient: t-Test
- For testing a restriction with respect to a single regression coefficient βk:
  - Null hypothesis H0: βk = q
  - Alternative HA: βk > q (or βk < q, or βk ≠ q)
- Test statistic (computed from the sample, with known distribution under the null hypothesis):
  \( t_k = \frac{b_k - q}{se(b_k)} \)
- tk follows the t-distribution with N-K degrees of freedom (d.f.)
  - under H0
  - given the Gauss-Markov assumptions and normality of the error terms εi
- Reject H0 (against HA: βk > q) if the p-value \( P\{t_{N-K} > t_k \mid H_0\} \) is small (tk-value is large)

Example: Individual Wages, cont'd
- OLS estimated wage equation (Table 2.1, Verbeek)
- Test of the null hypothesis H0: β2 = 0 (no gender effect on wages) against HA: β2 > 0
  \( t_2 = b_2 / se(b_2) = 1.1661 / 0.1122 = 10.39 \)
- Under H0, t2 follows the t-distribution with 3294 - 2 = 3292 d.f.
- p-value = P{t3292 > 10.39 | H0} ≈ 3.4E-25: reject H0!

Example: Individual Wages, cont'd
OLS estimated wage equation: output from GRETL

Model 1: OLS, using observations 1-3294
Dependent variable: WAGE

             coefficient   std. error   t-ratio   p-value
  const      5.14692       0.0812248    63.3664   <0.00001  ***
  MALE       1.1661        0.112242     10.3891   <0.00001  ***

  Mean dependent var   5.757585    S.D. dependent var   3.269186
  Sum squared resid    34076.92    S.E. of regression   3.217364
  R-squared            0.031746    Adjusted R-squared   0.031452
  F(1, 3292)           107.9338    P-value(F)           6.71e-25
  Log-likelihood      -8522.228    Akaike criterion     17048.46
  Schwarz criterion    17060.66    Hannan-Quinn         17052.82

p-value for the t-test of MALE: < 0.00001: "gender has a significant effect on wages p.h."

OLS Estimators: Asymptotic Distribution
- If the Gauss-Markov assumptions (A1)-(A4) hold but not the normality assumption (A5), the t-statistic
  \( t_k = \frac{b_k - \beta_k}{se(b_k)} \)
  follows asymptotically (N → ∞) the standard normal distribution
- In many situations, the unknown exact properties are substituted by asymptotic results (asymptotic theory)
- The t-statistic
  - follows approximately the t-distribution with N-K d.f.
  - follows approximately the standard normal distribution N(0, 1)
- The approximation error decreases with increasing sample size N

Testing Several Regression Coefficients
- For testing a restriction with respect to more than one, say J with 1 < J < K, regression coefficients:
  - H0: \( \beta_{K-J+1} = \dots = \beta_K = 0 \)
  - HA: at least one of these βk ≠ 0
- Test statistic:
  \( F = \frac{(R_1^2 - R_0^2)/J}{(1 - R_1^2)/(N-K)} \)
  with R0² and R1²: R² of the restricted and the unrestricted model
- F follows the F-distribution with J and N-K d.f. under H0, given the Gauss-Markov assumptions and normality of the error terms
- Reject H0 if the p-value \( P\{F_{J,N-K} > F \mid H_0\} \) is small (F-value is large)

Example: Individual Wages, cont'd
- A more general model is \( wage_i = \beta_1 + \beta_2\, male_i + \beta_3\, school_i + \beta_4\, exper_i + \varepsilon_i \)
- β2 measures the difference in expected wage between a male and a female with the other regressors fixed, i.e., with the same schooling and experience: ceteris paribus condition
- Do school and exper have explanatory power?
- Test of the null hypothesis H0: β3 = β4 = 0 against HA: H0 not true
  - R0² = 0.0317 (see the GRETL output above)
  - R1² = 0.1326 (see Table 2.2, Verbeek)
  - \( F = \frac{(0.1326 - 0.0317)/2}{(1 - 0.1326)/(3294 - 4)} = 191.35 \)
- p-value = P{F2,3290 > 191.35 | H0} = 2.43E-79
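The test statistics in these examples are simple arithmetic on reported figures. A minimal sketch in plain Python (the helpers `t_stat` and `f_from_r2` are illustrative names, not library functions) recomputes t2 = b2/se(b2) from GRETL's unrounded figures, checks that its square matches the F(1, 3292) value of 107.9338 reported by GRETL, and recomputes the F-statistic for H0: β3 = β4 = 0 from R0² and R1². The p-values would require the t and F distribution functions (for instance from scipy.stats) and are not computed here.

```python
def t_stat(b, se, q=0.0):
    """t-statistic for H0: beta_k = q."""
    return (b - q) / se

def f_from_r2(r2_restricted, r2_unrestricted, J, N, K):
    """F-statistic for J zero restrictions, computed from the two R^2 values."""
    return ((r2_unrestricted - r2_restricted) / J) / ((1 - r2_unrestricted) / (N - K))

N = 3294
t2 = t_stat(1.1661, 0.112242)                   # gender effect (GRETL output)
F_r2 = f_from_r2(0.0317, 0.1326, J=2, N=N, K=4)  # H0: beta3 = beta4 = 0
print(f"t2 = {t2:.2f}, t2^2 = {t2 ** 2:.2f}")
print(f"F = {F_r2:.2f}")
```

With a single restriction, F equals the square of the t-statistic, which is why t2² reproduces GRETL's F(1, 3292) up to rounding.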
Example: Individual Wages, cont'd
- OLS estimated wage equation (Table 2.2, Verbeek)

Testing Several Regression Coefficients, cont'd
- Test again
  - H0: βk = 0, K-J+1 ≤ k ≤ K
  - HA: at least one of these βk ≠ 0
- The test statistic F can alternatively be calculated as
  \( F = \frac{(S_0 - S_1)/J}{S_1/(N-K)} \)
  with S0 (S1): sum of squared residuals of the restricted (unrestricted) model

Example: Individual Wages, cont'd
- Same model and null hypothesis as above: \( wage_i = \beta_1 + \beta_2\, male_i + \beta_3\, school_i + \beta_4\, exper_i + \varepsilon_i \), H0: β3 = β4 = 0 against HA: H0 not true
  - S0 = 34076.92 (restricted model, see the GRETL output above)
  - S1 = 30527.87
  - \( F = \frac{(34076.92 - 30527.87)/2}{30527.87/(3294 - 4)} = 191.24 \)
- Does any regressor contribute to the explanation? Overall F-test (see Table 2.2 or the GRETL output): F = 167.63, p-value: 4.0E-101

The General Case
- Test of H0: Rβ = q
- Rβ = q: J linear restrictions on the coefficients (R: J×K matrix, q: J-vector)
- Example: H0: β3 = β4 = 0 in the wage equation above corresponds to \( R = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \), \( q = (0, 0)' \)
- Wald test: \( \xi = (Rb - q)' \left[ R \hat V\{b\} R' \right]^{-1} (Rb - q) \) follows under H0 approximately the Chi-squared distribution with J d.f.
- With \( \hat V\{b\} = s^2 \left( \sum_i x_i x_i' \right)^{-1} \), F = ξ/J is algebraically identical to the F-test with
  \( F = \frac{(S_0 - S_1)/J}{S_1/(N-K)} \)

p-value, Size, and Power
- Type I error: the null hypothesis is rejected, although it is actually true
- p-value: the probability, under H0, of obtaining a test statistic at least as extreme as the observed one; i.e., the probability of committing a type I error if H0 is rejected on the basis of the observed value
- In experimental situations, the probability of committing a type I error can be chosen before applying the test; this probability is called the size α of the test
- In model-building situations, the aim is not a decision but learning from the data; multiple testing is quite usual; using the p-value is more appropriate
- Type II error: the null hypothesis is not rejected, although it is actually wrong
- The probability of deciding in favor of the true alternative, i.e., of not making a type II error, is called the power of the test; it depends on the true parameter values

p-value, Size, and Power, cont'd
- The smaller the size of the test, the smaller is its power (for a given sample size)
- The more the true parameter values deviate from H0, the larger is the power of a test of a given size (for a given sample size)
- The larger the sample size, the larger is the power of a test of a given size
- Attention! Significance vs relevance: a statistically significant coefficient need not be economically relevant, especially in large samples
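The three statements about power can be illustrated with the normal approximation for the t-statistic. The sketch below is a hypothetical set-up, not taken from the slides: a one-sided test of H0: βk = q against HA: βk > q, a standard error proportional to 1/√n, and made-up values for the deviation, the standard deviation and n. It uses only Python's standard-library `statistics.NormalDist`; the helper `power_one_sided` is an illustrative name.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def power_one_sided(delta, sd, n, alpha):
    """Approximate power of the one-sided test of H0: beta = q against HA: beta > q.

    delta: true deviation beta - q; sd: standard deviation such that
    se(b) = sd / sqrt(n); alpha: size of the test. Under the alternative,
    the t-statistic is approximately N(delta / se, 1).
    """
    se = sd / n ** 0.5
    critical = Z.inv_cdf(1 - alpha)  # reject H0 if t exceeds this value
    return 1 - Z.cdf(critical - delta / se)

# Hypothetical baseline: deviation 0.1, sd 2.0, n = 400, size 5%
base = power_one_sided(0.1, 2.0, 400, 0.05)
smaller_size = power_one_sided(0.1, 2.0, 400, 0.01)
larger_deviation = power_one_sided(0.2, 2.0, 400, 0.05)
larger_n = power_one_sided(0.1, 2.0, 1600, 0.05)
print(f"baseline: {base:.3f}, size 1%: {smaller_size:.3f}, "
      f"deviation 0.2: {larger_deviation:.3f}, n = 1600: {larger_n:.3f}")
```

Setting `delta = 0` makes the function return α itself, i.e., when H0 is true the rejection probability equals the size.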
OLS Estimators: Asymptotic Properties
- The Gauss-Markov assumptions plus the normality assumption are in many situations very restrictive
- An alternative is to use properties derived from asymptotic theory
- Asymptotic results are, hopefully, sufficiently precise approximations for large (but finite) N
- Typically, Monte Carlo simulations are used to assess the quality of asymptotic results
- Asymptotic theory deals with the case where the sample size N goes to infinity: N → ∞

OLS Estimators: Consistency
- Consistency of the OLS estimators b:
  - for N → ∞, the probability that b differs from β by more than any given amount goes to 0
  - the distribution of b collapses to β
- The OLS estimators b are consistent, \( \operatorname{plim}_{N \to \infty} b = \beta \), if (A2) from the Gauss-Markov assumptions and assumption (A6) are fulfilled:
  A6: \( \frac{1}{N} \sum_{i=1}^N x_i x_i' \) converges with growing N to a finite, nonsingular matrix \( \Sigma_{xx} \)

OLS Estimators: Consistency, cont'd
- Consistency of the OLS estimators can also be shown to hold under weaker assumptions: the OLS estimators b are consistent, \( \operatorname{plim}_{N \to \infty} b = \beta \), if assumptions (A7) and (A6) are fulfilled
  A7: The error terms have zero mean and are uncorrelated with each of the regressors: \( E\{x_i \varepsilon_i\} = 0 \)
- Attention:
  - (A7) does not imply (A2)
  - the conditions for consistency are weaker than those for unbiasedness

OLS Estimators: Consistency, cont'd
- The estimator s² for the error term variance σ² is consistent, \( \operatorname{plim}_{N \to \infty} s^2 = \sigma^2 \), if assumptions (A3), (A6), and (A7) are fulfilled

OLS Estimators: Asymptotic Normality
- Under the Gauss-Markov assumptions (A1)-(A4) and assumption (A6), the OLS estimators b follow approximately the normal distribution
  \( b \overset{a}{\sim} N\left( \beta, \sigma^2 \left( \sum_i x_i x_i' \right)^{-1} \right) \)
- The approximate distribution does not make use of assumption (A5), i.e., the normality of the error terms!
- Tests of hypotheses on coefficients βk (t-test, F-test) can be performed making use of the approximate normal distribution

Multicollinearity
- The OLS estimators \( b = (X'X)^{-1}X'y \) for the regression coefficients β require that the K×K matrix X'X, or \( \sum_i x_i x_i' \), can be inverted
- In real situations, regressors may be correlated, such as
  - experience and schooling (measured in years)
  - age and experience
  - inflation rate and nominal interest rate
  - common trends of economic time series, e.g., in lag structures
- Multicollinearity: there exists
  - an exact linear relationship, or
  - an approximate linear relationship
  between the explanatory variables

Multicollinearity: Consequences
- Exact linear relationship between regressors ("exact multicollinearity"):
  - Example: wage equation
    - regressors male and female in addition to the intercept
    - regressor exper defined as exper = age - school - 6
  - \( \sum_i x_i x_i' \) is not invertible
  - Econometric software reports an ill-defined (singular) matrix \( \sum_i x_i x_i' \)
  - GRETL drops a regressor
- Approximate linear relationship between regressors:
  - when correlations are high, it is hard to identify the individual impact of each of the regressors
  - inflated variances: if xk can be approximated by the other regressors, the variance of bk is inflated; the power of the t-test is reduced
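Both consequences can be reproduced with simulated data. The sketch below assumes NumPy and purely hypothetical data; `r2_aux` is an illustrative helper name. It builds wage-equation regressors with exper = age - school - 6 and shows that the regressor matrix loses rank, so Σi xi xi' cannot be inverted. It then creates two nearly collinear regressors and checks that the k-th diagonal element of (X'X)^{-1}, which scales the variance of bk, equals (1 - Rk²)^{-1} / Σi (xik - x̄k)², where Rk² is the R² of the auxiliary regression of xk on the other regressors.

```python
import numpy as np

rng = np.random.default_rng(seed=3)
N = 1000

# Exact multicollinearity: exper is an exact linear function of age and school
school = rng.integers(8, 19, size=N).astype(float)
age = school + 6 + rng.integers(0, 40, size=N)
exper = age - school - 6
X_exact = np.column_stack([np.ones(N), school, age, exper])
print("rank:", np.linalg.matrix_rank(X_exact), "of", X_exact.shape[1], "columns")

def r2_aux(X, k):
    """R_k^2 from regressing column k of X on all other columns (incl. the intercept)."""
    others = np.delete(X, k, axis=1)
    coef, *_ = np.linalg.lstsq(others, X[:, k], rcond=None)
    resid = X[:, k] - others @ coef
    xk = X[:, k]
    return 1 - resid @ resid / np.sum((xk - xk.mean()) ** 2)

# Approximate multicollinearity: x3 is x2 plus a little noise
x2 = rng.normal(size=N)
x3 = x2 + 0.1 * rng.normal(size=N)
X = np.column_stack([np.ones(N), x2, x3])

k = 2
r2k = r2_aux(X, k)
vif = 1 / (1 - r2k)
diag_k = np.linalg.inv(X.T @ X)[k, k]                    # scales V{b_k} = sigma^2 * diag_k
formula_k = vif / np.sum((X[:, k] - X[:, k].mean()) ** 2)
print(f"R_k^2 = {r2k:.3f}, inflation factor 1/(1 - R_k^2) = {vif:.1f}")
print(f"[(X'X)^-1]_kk = {diag_k:.6f}, via R_k^2: {formula_k:.6f}")
```

The factor (1 - Rk²)^{-1} is what inflates the variance of bk when xk is well approximated by the other regressors.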
Variance Inflation Factor
- Variance of bk:
  \( V\{b_k\} = \frac{\sigma^2}{1 - R_k^2} \left[ \sum_i (x_{ik} - \bar x_k)^2 \right]^{-1} \)
  with Rk²: R² of the regression of xk on all other regressors
- If xk can be approximated by the other regressors, Rk² is close to 1 and the variance is inflated
- Variance inflation factor: \( VIF(b_k) = (1 - R_k^2)^{-1} \)
- Large values for some or all VIFs indicate multicollinearity
- Attention! A large variance of bk can also have other causes:
  - a small variance of Xk
  - a small number N of observations

Multicollinearity: Indicators
- Large values for some or all variance inflation factors VIF(bk) are an indicator of multicollinearity
- Other indicators:
  - at least one of the Rk², k = 1, …, K, has a large value
  - large standard errors se(bk) (low t-statistics), but reasonable or good R² and F-statistic
  - effect of adding a regressor on the standard errors se(bk) of the estimates bk of regressors already in the model: increasing values of se(bk) indicate multicollinearity

Selection of Regressors
- Specification errors:
  - omission of a relevant variable
  - inclusion of an irrelevant variable
- Questions:
  - What are the consequences?
  - How to avoid specification errors?
  - How to detect a committed specification error?

Example: Income and Consumption
[Figure: PCR: private consumption, real, in bn. EUR; PYR: household's disposable income, real, in bn. EUR; 1970:1-2003:4; base year 1995; source: AWM database]

Income and Consumption
[Figure: PCR: private consumption, real, in bn. EUR; PYR: household's disposable income, real, in bn. EUR; 1970:1-2003:4; base year 1995; source: AWM database]

Income and Consumption: Growth Rates
[Figure: PCR_D4: private consumption, real, growth rate; PYR_D4: household's disposable income, real, growth rate; 1970:1-2003:4; base year 1995; source: AWM database]

Consumption Function
- C: private consumption, real, growth rate (PCR_D4)
- Y: household's disposable income, real, growth rate (PYR_D4)
- T: trend (Ti = i/1000)
- Consumption function with trend Ti = i/1000:
  \( C_i = \beta_1 + \beta_2 Y_i + \beta_3 T_i + \varepsilon_i \)

Consumption Function, cont'd
OLS estimated consumption function: output from GRETL

Dependent variable: PCR_D4

             coefficient   std. error   t-ratio   p-value
  const      0.0162489     0.00187868    8.649    1.76e-014  ***
  PYR_D4     0.707963      0.0424086    16.69     4.94e-034  ***
  T         -0.0682847     0.0188182    -3.629    0.0004     ***

  Mean dependent var   0.024911    S.D. dependent var   0.015222
  Sum squared resid    0.007726    S.E. of regression   0.007739
  R-squared            0.745445    Adjusted R-squared   0.741498
  F(2, 129)            188.8830    P-value(F)           4.71e-39
  Log-likelihood       455.9302    Akaike criterion    -905.8603
  Schwarz criterion   -897.2119    Hannan-Quinn        -902.3460
  rho                  0.701126    Durbin-Watson        0.601668
Misspecification: Omitted Regressor
- Two models:
  \( y_i = x_i'\beta + z_i'\gamma + \varepsilon_i \)   (A)
  \( y_i = x_i'\beta + v_i \)   (B)
- The OLS estimates bB of β from (B) can be written, with yi from (A), as
  \( b_B = \beta + \left( \sum_i x_i x_i' \right)^{-1} \sum_i x_i z_i' \gamma + \left( \sum_i x_i x_i' \right)^{-1} \sum_i x_i \varepsilon_i \)
- If (A) is the true model but (B) is specified, i.e., relevant regressors zi are omitted, bB is biased by
  \( E\{b_B\} - \beta = E\left\{ \left( \sum_i x_i x_i' \right)^{-1} \sum_i x_i z_i' \right\} \gamma \)
- Omitted variable bias
- No bias if (a) γ = 0 or (b) the variables in xi and zi are orthogonal (\( \sum_i x_i z_i' = 0 \))

Misspecification: Irrelevant Regressor
- Two models:
  \( y_i = x_i'\beta + z_i'\gamma + \varepsilon_i \)   (A)
  \( y_i = x_i'\beta + v_i \)   (B)
- If (B) is the true model but (A) is specified, i.e., the model contains irrelevant regressors zi, the OLS estimates bA
  - are unbiased
  - have a higher variance than the OLS estimates bB obtained from fitting model (B)

Specification Search
- General-to-specific modeling:
  1. List all potential regressors
  2. Specify the most general model: it includes all potential regressors
  3. Test iteratively which variables have to be dropped
  4. Stop if no more variables have to be dropped
- The procedure is also known as the LSE (London School of Economics) method

Specification Search, cont'd
- Some remarks
  - Alternatively, one can start with a small model and add variables as long as they turn out to contribute to explaining Y
  - Stepwise regression
  - Adding and deleting can be based on
    - t-statistic, F-statistic
    - adjusted R²
    - Akaike's Information Criterion AIC, Schwarz's Bayesian Information Criterion BIC
  - The corresponding probabilities of type I and type II errors can hardly be assessed
  - Specification search can be subsumed under data mining
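Returning to the omitted-regressor case (A) versus (B), a small Monte Carlo experiment illustrates the bias formula. The sketch below assumes NumPy and a hypothetical data-generating process of type (A) with one regressor x and one correlated regressor z; both are held fixed across replications so that only the errors vary. It compares the average OLS estimate of the coefficient of x from the correctly specified model (A) with that from model (B), which omits z, and compares the observed bias with the formula (Σi xi xi')^{-1} Σi xi zi' γ.

```python
import numpy as np

rng = np.random.default_rng(seed=4)
N, R = 500, 2000
beta, gamma = 1.0, 0.5                   # hypothetical true coefficients of model (A)

x = rng.normal(size=N)
z = 0.6 * x + rng.normal(size=N)         # z is correlated with x
X_A = np.column_stack([np.ones(N), x, z])
X_B = np.column_stack([np.ones(N), x])   # model (B) omits z

b_A, b_B = [], []
for _ in range(R):
    y = 2.0 + beta * x + gamma * z + rng.normal(size=N)  # data generated by model (A)
    b_A.append(np.linalg.lstsq(X_A, y, rcond=None)[0][1])
    b_B.append(np.linalg.lstsq(X_B, y, rcond=None)[0][1])

# Bias formula for the coefficient of x: [(X_B'X_B)^{-1} X_B'z] * gamma
predicted_bias = (np.linalg.solve(X_B.T @ X_B, X_B.T @ z) * gamma)[1]
print(f"mean b_A = {np.mean(b_A):.3f}, mean b_B = {np.mean(b_B):.3f}")
print(f"observed bias of b_B = {np.mean(b_B) - beta:.3f}, predicted = {predicted_bias:.3f}")
```

Setting the correlation between x and z to zero (or γ = 0) in this sketch removes the bias, matching conditions (a) and (b) on the slide.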
Comparison of Models
- Nested models (cf. the omitted-regressor case: model (B) is nested in model (A))
  - Do the J added regressors contribute to explaining Y?
  - F-test (t-test when J = 1) for testing H0: the coefficients of the added regressors are zero:
    \( F = \frac{(R_1^2 - R_0^2)/J}{(1 - R_1^2)/(N-K)} \)
    R0² and R1² are the R² of the models without and with the J additional regressors, respectively
  - Comparison of adjusted R²: adj R1² > adj R0² is equivalent to F > 1
  - Information criteria: penalty for an increasing number of regressors (cf. adjusted R²), e.g., Schwarz's Bayesian Information Criterion
    \( BIC = \log\left( \frac{1}{N} \sum_i e_i^2 \right) + \frac{K}{N} \log N \)

Comparison of Models, cont'd
- Non-nested alternative models: A: \( y_i = x_i'\beta + \varepsilon_i \), B: \( y_i = z_i'\gamma + v_i \)
- Non-nested or encompassing F-test: compares artificially nested models by F-tests:
  \( y_i = x_i'\beta + z_{2i}'\delta_B + \varepsilon_i \), with z2i: the regressors of B that are not elements of xi; test of δB = 0
- J-test: applies a t-test to the combined model
  \( y_i = (1 - \delta)\, x_i'\beta + \delta\, z_i'\hat\gamma + u_i \)
  where \( z_i'\hat\gamma \) are the fitted values from model B; test of δ = 0
- Choice between linear and loglinear functional form: PE-test

PE-Test
- Estimate both models
  - A: \( y_i = x_i'\beta + \varepsilon_i \)
  - B: \( \log y_i = x_i'\beta + v_i \)
  and calculate the fitted values \( \hat y_i \) (from model A) and \( \log \tilde y_i \) (the fitted values of log yi from model B)
- Test δLIN = 0 in
  \( y_i = x_i'\beta + \delta_{LIN} (\log \hat y_i - \log \tilde y_i) + u_i \)
  not rejecting δLIN = 0 favors the linear model
- Test δLOG = 0 in
  \( \log y_i = x_i'\beta + \delta_{LOG} (\hat y_i - \exp\{\log \tilde y_i\}) + u_i \)
  not rejecting δLOG = 0 favors the loglinear model
- Rejecting both null hypotheses: find a more adequate model

Testing the Functional Form
- Misspecification of \( y_i = x_i'\beta + \varepsilon_i \): violation of linearity in xi
- \( E\{y_i \mid x_i\} = g(x_i, \beta) \), e.g.,
  - \( g(x_i, \beta) = \beta_1 + \beta_2 x_i^{\beta_3} \)
  - \( g(x_i, \beta) = \beta_1 x_{i1}^{\beta_2} x_{i2}^{\beta_3} \)
- The linear model \( x_i'\beta \) does not explain Y well
- RESET (Regression Equation Specification Error Test, Ramsey)
  - Alternative model: the linear model extended by adding \( \hat y_i^2, \hat y_i^3, \dots \), with \( \hat y_i \): fitted values from the linear model
  - Uses an F-test to decide whether powers of the fitted values contribute as additional regressors to explaining Y
  - Highest power Q of the fitted values: a typical choice is Q = 2 or Q = 3

Exercise
In Exercise 2.2 of Verbeek, the sample given in data set "wages" is used to answer the question whether women are systematically underpaid compared with men. Table 2.8, p. 48, gives the output of a regression analysis in which the log hourly wages are explained by male, age and educ. Use in this exercise the whole dataset (data file WAGES1) and the definition age = school + exper + 6.
1. Repeat the analysis for the model (model 1) in which the log hourly wages are explained by male and age.
2. Repeat the analysis (model 2) after adding to model 1 four dummy variables for the educational levels 2 through 5 instead of the variable educ.

Exercise, cont'd
3. Use an F-test, adjusted R², and the BIC to decide whether model 1 or model 2 is preferable.
4. Use the PE-test (see Verbeek, p. 64) to decide whether Verbeek's model in Table 2.8 (where levels of hourly wages are explained) or model 1 extended by the variable educ is to be preferred.
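The RESET procedure described on the functional-form slide can be sketched in a few lines. This is a minimal illustration under stated assumptions: NumPy, simulated hypothetical data, Q = 3, and the F-statistic computed from the sums of squared residuals of the linear model and of the model extended by ŷ² and ŷ³ (J = Q - 1 restrictions, N - K - J residual d.f.). The helpers `ols_ssr` and `reset_f` are illustrative names, and the p-value, which would need the F distribution, is not computed.

```python
import numpy as np

def ols_ssr(X, y):
    """Sum of squared OLS residuals and the fitted values."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ b
    return np.sum((y - fitted) ** 2), fitted

def reset_f(X, y, Q=3):
    """RESET F-statistic: adds powers 2..Q of the fitted values to the linear model."""
    N, K = X.shape
    s0, y_hat = ols_ssr(X, y)                         # restricted: linear model
    powers = np.column_stack([y_hat ** q for q in range(2, Q + 1)])
    s1, _ = ols_ssr(np.column_stack([X, powers]), y)  # unrestricted: extended model
    J = Q - 1
    return ((s0 - s1) / J) / (s1 / (N - K - J))

rng = np.random.default_rng(seed=5)
N = 400
x = rng.uniform(1, 5, size=N)
X = np.column_stack([np.ones(N), x])
y_linear = 1 + 2 * x + rng.normal(size=N)        # linear in x
y_power = 1 + 2 * x ** 2.5 + rng.normal(size=N)  # g(x, beta) = beta1 + beta2 x^beta3

F_lin = reset_f(X, y_linear)
F_pow = reset_f(X, y_power)
print(f"RESET F, linear DGP: {F_lin:.2f}, nonlinear DGP: {F_pow:.2f}")
```

For the nonlinear data-generating process the powers of ŷ absorb the curvature that the linear model misses, so the F-statistic becomes very large, whereas for the linear process it stays in the range typical of an F(2, N - 4) variable.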