Econometrics - Lecture 2
Introduction to Linear Regression – Part 2
Oct 6, 2017 Hackl, Econometrics, Lecture 2

Contents
- Goodness-of-Fit
- Hypothesis Testing
- Asymptotic Properties of the OLS Estimator
- Multicollinearity
- Prediction

Goodness-of-fit R²
- The quality of the model $y_i = x_i'\beta + \varepsilon_i$, i = 1, …, N, with K regressors can be measured by R², the goodness-of-fit (GoF) statistic
- R² is the portion of the variance in Y that can be explained by the linear regression with regressors X_k, k = 1, …, K:
  $R^2 = \hat V\{\hat y_i\} / \hat V\{y_i\}$
- If the model contains an intercept (as usual):
  $R^2 = 1 - \hat V\{e_i\} / \hat V\{y_i\}$, with $\hat V\{e_i\} = (\sum_i e_i^2)/(N-1)$
- Alternatively, R² can be calculated as
  $R^2 = \mathrm{corr}^2\{y_i, \hat y_i\}$

Properties of R²
- R² is the portion of the variance in Y that can be explained by the linear regression; 100R² is measured in percent
- 0 ≤ R² ≤ 1, if the model contains an intercept
- R² = 1: all residuals are zero
- R² = 0: for all regressors, b_k = 0, k = 2, …, K; the model explains nothing
- R² cannot decrease if a variable is added
- Comparing the R² of two models makes no sense if the explained variables are different

Example: Individual Wages, cont’d
- OLS estimated wage equation (Table 2.1, Verbeek) [table not reproduced in this text]
- Only 3.17% of the variation of individual wages p.h. is due to gender

Individual Wages, cont’d
- Wage equation with three regressors (Table 2.2, Verbeek) [table not reproduced in this text]
- R² increased due to adding school and exper

Other GoF Measures
- Uncentered R²: for the case of no intercept; the uncentered R² cannot become negative
  Uncentered R² = $1 - \sum_i e_i^2 / \sum_i y_i^2$
- adj R² (adjusted R²): for comparing models; compensates for added regressors with a penalty for increasing K
  $\text{adj } R^2 = 1 - \dfrac{\frac{1}{N-K}\sum_i e_i^2}{\frac{1}{N-1}\sum_i (y_i - \bar y)^2}$
  For a given model, adj R² is smaller than R²
- For models estimated by methods other than OLS:
  $\mathrm{corr}^2\{y_i, \hat y_i\}$
  It coincides with R² for OLS estimated models

Individual Wages
- OLS estimated wage equation (Table 2.1, Verbeek)
- b1 = 5.147, se(b1) = 0.081: the mean wage p.h. for females is USD 5.15, with a standard error of USD 0.08
- b2 = 1.166, se(b2) = 0.112

OLS Estimator: Distributional Properties
- Under the assumptions (A1) to (A5):
- The OLS estimator $b = (X'X)^{-1}X'y$ is normally distributed with mean β and covariance matrix $V\{b\} = \sigma^2 (X'X)^{-1}$:
  $b \sim N(\beta, \sigma^2 (X'X)^{-1})$, $b_k \sim N(\beta_k, \sigma^2 c_{kk})$, k = 1, …, K
  with $c_{kk}$ the k-th diagonal element of $(X'X)^{-1}$
- The statistic
  $z = \dfrac{b_k - \beta_k}{\sigma \sqrt{c_{kk}}}$
  follows the standard normal distribution N(0,1)
- The statistic
  $t_k = \dfrac{b_k - \beta_k}{s \sqrt{c_{kk}}}$
  follows the t-distribution with N-K degrees of freedom (df)

Testing a Regression Coefficient: t-Test
- For testing a restriction on the (single) regression coefficient β_k:
  - Null hypothesis H0: β_k = q (most interesting case: q = 0)
  - Alternative HA: β_k > q
  - Test statistic (computed from the sample, with known distribution under the null hypothesis):
    $t_k = \dfrac{b_k - q}{se(b_k)}$
- t_k is a realization of the random variable $t_{N-K}$, which follows the t-distribution with N-K degrees of freedom (df = N-K)
  - under H0 and
  - given the Gauss-Markov assumptions and normality of the errors
- Reject H0 if the p-value $P\{t_{N-K} > t_k \mid H_0\}$ is small (the t_k-value is large)

Normal and t-Distribution
- Standard normal distribution: Z ~ N(0,1)
  - Distribution function Φ(z) = P{Z ≤ z}
- t-distribution: T_df ~ t(df)
  - Distribution function F(t) = P{T_df ≤ t}
  - p-value: $P\{T_{N-K} > t_k \mid H_0\} = 1 - F_{H_0}(t_k)$
- For growing df, the t-distribution approaches the standard normal distribution; T_df follows asymptotically (N → ∞) the N(0,1)-distribution
- 0.975-percentiles $t_{df,0.975}$ of the t(df)-distribution:

  df            5      10     20     30     50     100    200    ∞
  t_{df,0.975}  2.571  2.228  2.086  2.042  2.009  1.984  1.972  1.96

- 0.975-percentile of the standard normal distribution: z_{0.975} = 1.96
[Figure: distribution function of the normal distribution]

OLS Estimators: Asymptotic Distribution
- If the Gauss-Markov assumptions (A1)-(A4) hold but not the normality assumption (A5), the t-statistic
  $t_k = \dfrac{b_k - \beta_k}{se(b_k)}$
  follows asymptotically (N → ∞) the standard normal distribution
- In many situations, the unknown true properties are substituted by approximate results (asymptotic theory)
- The t-statistic
  - follows the t-distribution with N-K d.f. if (A1)-(A5) hold
  - follows approximately the standard normal distribution N(0,1) if only (A1)-(A4) hold
- The approximation error decreases with increasing sample size N

Two-sided t-Test
- For testing a restriction on a single regression coefficient β_k:
  - Null hypothesis H0: β_k = q
  - Alternative HA: β_k ≠ q
  - Test statistic (computed from the sample, with known distribution under the null hypothesis):
    $t_k = \dfrac{b_k - q}{se(b_k)}$
    follows the t-distribution with N-K d.f.
- Reject H0 if the p-value $P\{|T_{N-K}| > |t_k| \mid H_0\}$ is small (the |t_k|-value is large)

Individual Wages, cont’d
- OLS estimated wage equation (Table 2.1, Verbeek)
- Test of the null hypothesis H0: β2 = 0 (no gender effect on wages, equal wages for males and females) against HA: β2 > 0
  t2 = b2/se(b2) = 1.1661/0.1122 = 10.38
- Under H0, T follows the t-distribution with df = 3294-2 = 3292
- p-value = P{T_3292 > 10.38 | H0} = 3.7E-25: reject H0!

Individual Wages, cont’d
OLS estimated wage equation: Output from GRETL

  Model 1: OLS, using observations 1-3294
  Dependent variable: WAGE

             coefficient   std. error   t-ratio   p-value
    const    5,14692       0,0812248    63,3664   <0,00001 ***
    MALE     1,1661        0,112242     10,3891   <0,00001 ***

    Mean dependent var   5,757585    S.D. dependent var   3,269186
    Sum squared resid    34076,92    S.E. of regression   3,217364
    R-squared            0,031746    Adjusted R-squared   0,031452
    F(1, 3292)           107,9338    P-value(F)           6,71e-25
    Log-likelihood      -8522,228    Akaike criterion     17048,46
    Schwarz criterion    17060,66    Hannan-Quinn         17052,82

p-value for the t_MALE-test: < 0.00001
“Gender has a significant effect on wages, males earn more”

Significance Tests
- For testing a restriction on a single regression coefficient β_k:
  - Null hypothesis H0: β_k = q
  - Alternative HA: β_k ≠ q
  - Test statistic (computed from the sample, with known distribution under the null hypothesis):
    $T_k = \dfrac{b_k - q}{se(b_k)}$
- Determine the critical value $t_{N-K,1-\alpha/2}$ for the significance level α from
  $P\{|T_k| > t_{N-K,1-\alpha/2} \mid H_0\} = \alpha$
- Reject H0 if $|t_k| > t_{N-K,1-\alpha/2}$
- Typically, the value 0.05 is taken for α

Significance Tests, cont’d
- One-sided test:
  - Null hypothesis H0: β_k = q
  - Alternative HA: β_k > q (β_k < q)
  - Test statistic (computed from the sample, with known distribution under the null hypothesis):
    $T_k = \dfrac{b_k - q}{se(b_k)}$
- Determine the critical value $t_{N-K,1-\alpha}$ for the significance level α from
  $P\{T_k > t_{N-K,1-\alpha} \mid H_0\} = \alpha$
- Reject H0 if $t_k > t_{N-K,1-\alpha}$ ($t_k < -t_{N-K,1-\alpha}$)

Confidence Interval for β_k
- Range of values (b_kl, b_ku) for which the null hypothesis on β_k is not rejected:
  $b_{kl} = b_k - t_{N-K,1-\alpha/2}\, se(b_k) < \beta_k < b_k + t_{N-K,1-\alpha/2}\, se(b_k) = b_{ku}$
- Refers to the significance level α of the test
- For large values of df and α = 0.05 (1.96 ≈ 2):
  $b_k - 2\, se(b_k) < \beta_k < b_k + 2\, se(b_k)$
- Confidence level: γ = 1 - α; typically γ = 0.95
- Interpretation:
  - A range of values for the true β_k that are not unlikely (contain the true value with probability 100γ%), given the data (?)
  - A range of values for the true β_k such that 100γ% of all intervals constructed in that way contain the true β_k

Individual Wages, cont’d
- OLS estimated wage equation (Table 2.1, Verbeek)
- The confidence interval for the gender wage difference (in USD p.h.):
  - Confidence level γ = 0.95:
    1.1661 – 1.96*0.1122 < β2 < 1.1661 + 1.96*0.1122
    0.946 < β2 < 1.386 (or 0.94 < β2 < 1.39)
  - γ = 0.99: 0.877 < β2 < 1.455

Testing a Linear Restriction on Regression Coefficients
- Linear restriction r’β = q
  - Null hypothesis H0: r’β = q
  - Alternative HA: r’β > q
- Test statistic
  $t = \dfrac{r'b - q}{se(r'b)}$
  where se(r’b) is the square root of V{r’b} = r’V{b}r
- Under H0 and (A1)-(A5), t follows the t-distribution with df = N-K
- GRETL: the option Linear restrictions from Tests on the output window of the Model statement Ordinary Least Squares allows testing linear restrictions on the regression coefficients

Testing Several Regression Coefficients: F-test
- For testing a restriction on more than one, say J with 1 < J < K, regression coefficients:
  - Null hypothesis H0: β_k = 0, K-J+1 ≤ k ≤ K
  - Alternative HA: β_k ≠ 0 for at least one k, K-J+1 ≤ k ≤ K
- F-statistic (computed from the sample, with known distribution under the null hypothesis; R0² (R1²): R² for the restricted (unrestricted) model):
  $F = \dfrac{(R_1^2 - R_0^2)/J}{(1 - R_1^2)/(N-K)}$
  F follows the F-distribution with J and N-K d.f.
  - under H0 and given the Gauss-Markov assumptions (A1)-(A4) and normality of the ε_i (A5)
- Reject H0 if the p-value $P\{F_{J,N-K} > F \mid H_0\}$ is small (the F-value is large)
- The F-test with J = K-1 is a standard test in GRETL

Individual Wages, cont’d
- A more general model is
  wage_i = β1 + β2 male_i + β3 school_i + β4 exper_i + ε_i
- β2 measures the difference in expected wages p.h. between males and females, holding the other regressors fixed, i.e., with the same schooling and experience: ceteris paribus condition
- Do school and exper have explanatory power?
- Test of the null hypothesis H0: β3 = β4 = 0 against HA: H0 not true
  - R0² = 0.0317
  - R1² = 0.1326
  - $F = \dfrac{(R_1^2 - R_0^2)/2}{(1 - R_1^2)/(3294-4)} = 191.24$ (with unrounded R² values; the rounded values above give 191.35)
  - p-value = P{F_{2,3290} > 191.24 | H0} = 2.68E-79

Individual Wages, cont’d
- OLS estimated wage equation (Table 2.2, Verbeek) [table not reproduced in this text]

Alternatives for Testing Several Regression Coefficients
- Test again
  - H0: β_k = 0, K-J+1 ≤ k ≤ K
  - HA: at least one of these β_k ≠ 0
1. The test statistic F can alternatively be calculated as
   $F = \dfrac{(S_0 - S_1)/J}{S_1/(N-K)}$
   - S0 (S1): sum of squared residuals for the restricted (unrestricted) model
   - F follows under H0 and (A1)-(A5) the F(J,N-K)-distribution
2. If σ² is known, the test can be based on
   $F = (S_0 - S_1)/\sigma^2$
   which under H0 and (A1)-(A5) is Chi-squared distributed with J d.f.
   - For large N, s² is very close to σ²; the test with F (with s² substituted for σ²) approximates the F-test

Individual Wages, cont’d
- A more general model is
  wage_i = β1 + β2 male_i + β3 school_i + β4 exper_i + ε_i
- Do school and exper have explanatory power?
- Test of the null hypothesis H0: β3 = β4 = 0 against HA: H0 not true
  - S0 = 34076.92, S1 = 30527.87
  - s = 3.046143
  F(1) = [(34076.92 - 30527.87)/2]/[30527.87/(3294-4)] = 191.24
  F(2) = [(34076.92 - 30527.87)/2]/3.046143² = 191.24
- Does any regressor contribute to explanation?
- Overall F-test for H0: β2 = … = β4 = 0 against HA: H0 not true (see Table 2.2 or GRETL output): J = 3
  F = 167.63, p-value: 4.0E-101

The General Case
- Test of H0: Rβ = q
  - Rβ = q: J linear restrictions on the coefficients (R: J×K matrix, q: J-vector)
- Example: [example not reproduced in this text]
- Wald test: the test statistic
  $\xi = (Rb - q)'\,[R\,V\{b\}\,R']^{-1}\,(Rb - q)$
  follows under H0, for large N, approximately the Chi-squared distribution with J d.f.
- The test based on F = ξ/J is algebraically identical to the F-test with
  $F = \dfrac{(S_0 - S_1)/J}{S_1/(N-K)}$

p-value, Size, and Power
- Type I error: the null hypothesis is rejected, while it is actually true
- p-value: the probability, under H0, of a test statistic at least as extreme as the observed one; rejecting H0 only when the p-value is below α limits the probability of a type I error to α
- In experimental situations, the probability of committing the type I error can be chosen before applying the test; this probability is the significance level α, also denoted as the size of the test
- In model-building situations, not a decision but learning from data is intended; multiple testing is quite usual; the use of p-values is more appropriate than using a strict α
- Type II error: the null hypothesis is not rejected, while it is actually wrong; the decision is not in favor of the true alternative
- The probability of deciding in favor of the true alternative, i.e., of not making a type II error, is called the power of the test; it depends on the true parameter values

p-value, Size, and Power, cont’d
- The smaller the size of the test, the smaller is its power (for a given sample size)
- The more HA deviates from H0, the larger is the power of a test of a given size (given the sample size)
- The larger the sample size, the larger is the power of a test of a given size
- Attention! Significance vs relevance

OLS Estimators: Asymptotic Properties
- The Gauss-Markov assumptions (A1)-(A4) plus the normality assumption (A5) are in many situations very restrictive
- An alternative is to rely on properties derived from asymptotic theory
  - Asymptotic results hopefully are sufficiently precise approximations for large (but finite) N
  - Typically, Monte Carlo simulations are used to assess the quality of asymptotic results
- Asymptotic theory deals with the case where the sample size N goes to infinity: N → ∞

Chebychev’s Inequality
- Chebychev’s Inequality: bound for the probability of deviations of a random variable from its mean
  $P\{|z - E\{z\}| > r\sigma\} < r^{-2}$
  for all r > 0; true for any distribution with moments E{z} and σ² = V{z}
- For the OLS estimator b_k:
  $P\{|b_k - \beta_k| > \delta\} < \dfrac{\sigma^2 c_{kk}}{\delta^2}$
  for all δ > 0; $c_{kk}$: the k-th diagonal element of $(X'X)^{-1} = (\sum_i x_i x_i')^{-1}$
- For growing N: the elements of $\sum_i x_i x_i'$ increase, V{b_k} decreases
- Given (A6) [see below], for all δ > 0
  $\lim_{N \to \infty} P\{|b_k - \beta_k| > \delta\} = 0$
  b_k converges in probability to β_k for N → ∞; $\mathrm{plim}_{N \to \infty}\, b_k = \beta_k$

Consistency of the OLS-estimator
- Simple linear regression
  $y_i = \beta_1 + \beta_2 x_i + \varepsilon_i$
- Observations: (y_i, x_i), i = 1, …, N
- OLS estimator
  $b_2 = \beta_2 + \dfrac{\frac{1}{N}\sum_i (x_i - \bar x)\,\varepsilon_i}{\frac{1}{N}\sum_i (x_i - \bar x)^2}$
- Numerator and denominator converge in probability to Cov{x, ε} and V{x}, respectively
- Due to (A2), Cov{x, ε} = 0; with V{x} > 0 it follows that
  $\mathrm{plim}_{N \to \infty}\, b_2 = \beta_2 + \mathrm{Cov}\{x, \varepsilon\}/V\{x\} = \beta_2$

OLS Estimators: Consistency
- If (A2) from the Gauss-Markov assumptions (exogenous x_i, all x_i and ε_i are independent) and the assumption (A6) are fulfilled:
  $\lim_{N \to \infty} P\{|b_k - \beta_k| > \delta\} = 0$ for all δ > 0
  b_k converges in probability to β_k for N → ∞
- Consistency of the OLS estimators b:
  - For N → ∞, b converges in probability to β, i.e., the probability that b differs from β by a certain amount goes to zero for N → ∞
  - The distribution of b collapses in β
  - $\mathrm{plim}_{N \to \infty}\, b = \beta$
- Needs no assumptions beyond (A2) and (A6)!

  (A6) $\frac{1}{N}\sum_{i=1}^N x_i x_i' = \frac{1}{N} X'X$ converges with growing N to a finite, nonsingular matrix $\Sigma_{xx}$

OLS Estimators: Consistency, cont’d
- Consistency of OLS estimators can also be shown to hold under weaker assumptions:
- The OLS estimators b are consistent,
  $\mathrm{plim}_{N \to \infty}\, b = \beta$,
  if the assumptions (A7) and (A6) are fulfilled
- Follows from
  $b = \beta + \left(\frac{1}{N}\sum_i x_i x_i'\right)^{-1} \frac{1}{N}\sum_i x_i \varepsilon_i$
  and
  $\mathrm{plim}(b - \beta) = \Sigma_{xx}^{-1} E\{x_i \varepsilon_i\}$

  (A7) The error terms have zero mean and are uncorrelated with each of the regressors: E{x_i ε_i} = 0

Consistency of s²
- The estimator s² for the error term variance σ² is consistent,
  $\mathrm{plim}_{N \to \infty}\, s^2 = \sigma^2$,
  if the assumptions (A3), (A6), and (A7) are fulfilled

Consistency: Some Properties
- plim g(b) = g(β) for a continuous function g
  - If plim s² = σ², then plim s = σ
- The conditions for consistency are weaker than those for unbiasedness

OLS Estimators: Asymptotic Normality
- The distribution of OLS estimators is mostly unknown
- Approximate distribution, based on the asymptotic distribution
- Many estimators in econometrics follow asymptotically the normal distribution
- Asymptotic distribution of the consistent estimator b: distribution of
  $N^{1/2}(b - \beta)$ for N → ∞
- Under the Gauss-Markov assumptions (A1)-(A4) and assumption (A6), the OLS estimators b fulfill
  $\sqrt{N}\,(b - \beta) \to N(0, \sigma^2 \Sigma_{xx}^{-1})$
  where “→” means “is asymptotically distributed as”

OLS Estimators: Approximate Normality
- Under the Gauss-Markov assumptions (A1)-(A4) and assumption (A6), the OLS estimators b follow approximately the normal distribution
  $N(\beta, s^2 (X'X)^{-1})$
- The approximate distribution does not make use of assumption (A5), i.e., the normality of the error terms!
- Tests of hypotheses on coefficients β_k (t-test, F-test) can be performed by making use of the approximate normal distribution

Assessment of Approximate Normality
- The quality of
  - the approximate normal distribution of OLS estimators
  - p-values of t- and F-tests
  - power of tests, confidence intervals, etc.
  depends on the sample size N and on factors related to the Gauss-Markov assumptions etc.
- Monte Carlo studies: simulations that indicate consequences of deviations from ideal situations
- Example: $y_i = \beta_1 + \beta_2 x_i + \varepsilon_i$; distribution of b2 under classical assumptions?
  - 1) Choose N; 2) generate x_i, ε_i, calculate y_i, i = 1, …, N; 3) estimate b2
  - Repeat steps 1)-3) R times: the R values of b2 allow assessment of the distribution of b2

Multicollinearity
- OLS estimators $b = (X'X)^{-1}X'y$ for the regression coefficients β require that the K×K matrix
  X’X or $\sum_i x_i x_i'$
  can be inverted
- In real situations, regressors may be correlated, such as
  - age and experience (measured in years)
  - experience and schooling
  - inflation rate and nominal interest rate
  - common trends of economic time series, e.g., in lag structures
- Multicollinearity: between the explanatory variables there exists
  - an exact linear relationship (exact collinearity), or
  - an approximate linear relationship

Multicollinearity: Consequences
- Approximate linear relationship between regressors:
- When correlations between regressors are high, it is difficult to identify the individual impact of each of the regressors
- Inflated variances
  - If x_k can be approximated by the other regressors, the variance of b_k is inflated
  - Smaller t_k-statistic, reduced power of the t-test
- Example: $y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i$
  - with sample variances of X1 and X2 equal to 1 and correlation r12,
    $V\{b\} = \dfrac{\sigma^2}{N}\,\dfrac{1}{1 - r_{12}^2} \begin{pmatrix} 1 & -r_{12} \\ -r_{12} & 1 \end{pmatrix}$

  r12            0.3    0.5    0.7    0.9
  1/(1-r12²)     1.10   1.33   1.96   5.26

Exact Collinearity
- Exact linear relationship between regressors
- Example: Wage equation
  - Regressors male and female in addition to the intercept
  - Regressor age defined as age = 6 + school + exper
- $\sum_i x_i x_i'$ is not invertible
- Econometric software reports an ill-defined matrix $\sum_i x_i x_i'$
- GRETL drops a regressor
- Remedy:
  - Exclude (one of the) regressors
  - Example: Wage equation
    - Drop regressor female, use only regressor male in addition to the intercept
    - Alternatively: use female and intercept
    - Not good: use of male and female, no intercept

Variance Inflation Factor
- Variance of b_k
  $V\{b_k\} = \dfrac{\sigma^2}{(1 - R_k^2)\sum_i (x_{ik} - \bar x_k)^2}$
  R_k²: R² of the regression of x_k on all other regressors
- If x_k can be approximated by a linear combination of the other regressors, R_k² is close to 1 and the variance of b_k is inflated
- Variance inflation factor: $VIF(b_k) = (1 - R_k^2)^{-1}$
- Large values for some or all VIFs indicate multicollinearity
- Warning! Large values of the variance of b_k (and reduced power of the t-test) can have various causes
  - Multicollinearity
  - Small value of the variance of X_k
  - Small number N of observations

Other Indicators for Multicollinearity
- Large values for some or all variance inflation factors VIF(b_k) are an indicator for multicollinearity
- Other indicators:
  - At least one of the R_k², k = 1, …, K, has a large value
  - Large values of standard errors se(b_k) (low t-statistics), but reasonable or good R² and F-statistic
  - Effect of adding a regressor on the standard errors se(b_k) of estimates b_k of regressors already in the model: increasing values of se(b_k) indicate multicollinearity

The Predictor
- Given the relation $y_i = x_i'\beta + \varepsilon_i$
- Given estimators b, the predictor for Y at x0, i.e., for $y_0 = x_0'\beta + \varepsilon_0$, is $\hat y_0 = x_0'b$
- Prediction error: $f_0 = \hat y_0 - y_0 = x_0'(b - \beta) - \varepsilon_0$
- Some properties of ŷ0
  - Under assumptions (A1) and (A2), E{b} = β and ŷ0 is an unbiased predictor
  - Variance of ŷ0
    $V\{\hat y_0\} = V\{x_0'b\} = x_0' V\{b\} x_0 = \sigma^2 x_0'(X'X)^{-1}x_0 = \sigma_0^2$
  - Variance of the prediction error f0
    $V\{f_0\} = V\{x_0'(b - \beta) - \varepsilon_0\} = \sigma^2 (1 + x_0'(X'X)^{-1}x_0) = \sigma_{f0}^2$
    given that ε0 and b are uncorrelated

Prediction Intervals
- 100γ% prediction interval
  - for the expected value of Y at x0, i.e., $E\{y_0\} = x_0'\beta$, with $\hat y_0 = x_0'b$:
    $\hat y_0 - z_{(1+\gamma)/2}\, s_0 \le x_0'\beta \le \hat y_0 + z_{(1+\gamma)/2}\, s_0$
    with the standard error s0 of ŷ0 from $s_0^2 = s^2 x_0'(X'X)^{-1}x_0$
  - for the prediction of Y at x0:
    $\hat y_0 - z_{(1+\gamma)/2}\, s_{f0} \le y_0 \le \hat y_0 + z_{(1+\gamma)/2}\, s_{f0}$
    with s_f0 from $s_{f0}^2 = s^2 (1 + x_0'(X'X)^{-1}x_0)$; takes the error term ε0 into account
- Calculation of s_f0
  - OLS estimate s² of σ² from the regression output (GRETL reports s as “S.E. of regression”)
  - Substitution of s² for σ²: $s_0 = s\,[x_0'(X'X)^{-1}x_0]^{0.5}$, $s_{f0} = [s^2 + s_0^2]^{0.5}$

Example: Simple Regression
- Given the relation $y_i = \beta_1 + x_i \beta_2 + \varepsilon_i$
- Predictor for Y at x0, i.e., for $y_0 = \beta_1 + x_0 \beta_2 + \varepsilon_0$:
  $\hat y_0 = b_1 + x_0 b_2$
- Variance of the prediction error
  $V\{f_0\} = \sigma^2 \left(1 + \dfrac{1}{N} + \dfrac{(x_0 - \bar x)^2}{\sum_i (x_i - \bar x)^2}\right)$
[Figure: prediction intervals for various x0’s (indicated as “x”) for γ = 0.95]

Individual Wages: Prediction
- The fitted model is
  wage_i = −3.3800 + 1.3444 male_i + 0.6388 school_i + 0.1248 exper_i
- For a male with school = 12 and exper = 5, the predicted wage is
  wage0 = 6.25405 ≈ 6.25
- Calculation of the variance s0²:
  - Calculation based on $s_0^2 = x_0' \hat V\{b\} x_0 = s^2 x_0'(X'X)^{-1}x_0$ is laborious
  - Re-estimating the model for regressors m1 = male–1, s1 = school–12, e1 = exper–5 gives
    wage = 6.25405 + 1.3444 m1 + 0.6388 s1 + 0.1248 e1
    with a std.err. of the intercept of 0.10695
  - The std.err. of the intercept, i.e., of the expected wage wage0, is s0

Individual Wages: Prediction, cont’d
- The 95% confidence interval for the expected wage at x0 is
  6.25405 – 1.96*0.10695 ≤ E{wage0} ≤ 6.25405 + 1.96*0.10695
  or 6.04 ≤ E{wage0} ≤ 6.46
- The 95% prediction interval for wage0:
  - From the model fit: s = 3.046143
  - s_f0 = [s² + s0²]^0.5 = [3.046143² + 0.10695²]^0.5 = 3.048
  - 95% prediction interval
    6.254 – 1.96*3.048 ≤ wage0 ≤ 6.254 + 1.96*3.048
    or 0.28 ≤ wage0 ≤ 12.23

Your Homework
1. For Verbeek’s data set “wages1” use GRETL (a) for estimating a linear regression model with intercept for wage p.h. with explanatory variables male and school; (b) interpret the coefficients of the model; (c) test the hypothesis that men and women, on average, have the same wage p.h., against the alternative that women’s wages p.h. are different from men’s wages p.h.; (d) repeat this test against the alternative that women earn less; (e) calculate a 95% confidence interval for the wage difference of males and females.
2. Generate a variable exper_b by adding the Binomial random variable BE ~ B(2,0.5) to exper; (a) estimate two linear regression models with intercept for wage p.h. with explanatory variables (i) male and exper, and (ii) male, exper_b, and exper; compare the standard errors of the estimated coefficients; (b) compare the VIFs for the variables of the two models; (c) check the correlations of the involved regressors.
3. Show for a linear regression with intercept that adj R² < R².
4. Show that the F-test based on
   $F = \dfrac{(R_1^2 - R_0^2)/J}{(1 - R_1^2)/(N-K)}$
   and the F-test based on
   $F = \dfrac{(S_0 - S_1)/J}{S_1/(N-K)}$
   are identical.
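
Appendix sketch: re-checking the wage example by hand
The arithmetic of the wage example (t-ratio, confidence interval, the two forms of the F-statistic, and the standard error of the prediction error) can be re-checked from the figures quoted in the slides. The following is a minimal Python sketch and not part of the original lecture material. The function and constant names are illustrative assumptions. The value 1.96 is used as the normal approximation to the t critical value, which seems reasonable at df = 3290.

```python
import math

# Figures quoted in the slides (Verbeek's wages1 data, N = 3294).
N = 3294
B_MALE, SE_MALE = 1.1661, 0.1122   # Table 2.1: coefficient and std. error of male
R2_0, R2_1 = 0.0317, 0.1326        # R-squared: restricted / unrestricted model
S0, S1 = 34076.92, 30527.87        # sums of squared residuals: restricted / unrestricted
S = 3.046143                       # S.E. of regression, unrestricted model
SE_FIT = 0.10695                   # s0: std. error of the expected wage at x0
Z975 = 1.96                        # 0.975-percentile of N(0,1)


def t_ratio(b, se, q=0.0):
    """t-statistic for H0: beta_k = q."""
    return (b - q) / se


def interval(center, se, crit=Z975):
    """Symmetric interval center -/+ crit * se."""
    return center - crit * se, center + crit * se


def f_from_ssr(s0, s1, j, n, k):
    """F-statistic from restricted (s0) and unrestricted (s1) sums of squared residuals."""
    return ((s0 - s1) / j) / (s1 / (n - k))


def f_from_r2(r2_0, r2_1, j, n, k):
    """F-statistic from restricted (r2_0) and unrestricted (r2_1) R-squared."""
    return ((r2_1 - r2_0) / j) / ((1.0 - r2_1) / (n - k))


def prediction_se(s, s0):
    """Std. error of the prediction error: sqrt(s^2 + s0^2)."""
    return math.sqrt(s ** 2 + s0 ** 2)


def predicted_wage(male, school, exper):
    """Fitted wage equation with three regressors, coefficients as quoted in the slides."""
    return -3.3800 + 1.3444 * male + 0.6388 * school + 0.1248 * exper


print("t-ratio (male):", t_ratio(B_MALE, SE_MALE))
print("95% CI (male):", interval(B_MALE, SE_MALE))
print("F from SSR:", f_from_ssr(S0, S1, j=2, n=N, k=4))
print("F from R2 :", f_from_r2(R2_0, R2_1, j=2, n=N, k=4))
wage0 = predicted_wage(male=1, school=12, exper=5)
print("predicted wage:", wage0)
print("95% prediction interval:", interval(wage0, prediction_se(S, SE_FIT)))
```

The two F-statistics agree only up to rounding here, because the R² values are quoted with four decimals while the sums of squared residuals carry more digits; with unrounded inputs the two expressions are algebraically the same.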