LECTURE 5
Introduction to Econometrics
Hypothesis testing & Goodness of fit
October 20, 2017

ON THE PREVIOUS LECTURE

We discussed the principle of hypothesis testing
Type I and Type II errors
Critical value and rejection region
We derived the t-statistic t = (β̂ − β) / s.e.(β̂)
We defined the concept of the p-value
We explained what significance of a coefficient means
We studied the impact of years of education on wages:
(regression output not shown)
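As a quick recap, the t-statistic above can be sketched in Python with made-up numbers. The estimate, standard error, and hypothesized value below are hypothetical, and the two-sided p-value uses the standard normal CDF as a large-sample approximation to the t distribution:

```python
import math

def t_statistic(beta_hat, beta_null, se):
    """t-statistic for H0: beta = beta_null, t = (beta_hat - beta_null) / s.e."""
    return (beta_hat - beta_null) / se

def p_value_normal(t):
    """Approximate two-sided p-value from the standard normal CDF
    (large-sample approximation to the t distribution)."""
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(t) / math.sqrt(2.0))))

# Hypothetical estimate: beta_hat = 0.10, H0: beta = 0, s.e. = 0.02
t = t_statistic(0.10, 0.0, 0.02)   # t ≈ 5.0, far in the rejection region
p = p_value_normal(t)              # very small p-value: reject H0 at 5%
```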
ON TODAY'S LECTURE

We will explain how multiple hypotheses are tested in a regression model
We will define the notion of the overall significance of a regression
We will introduce a measure of the goodness of fit of a regression (R²)

Readings for this week:
Studenmund, Chapters 5.5 & 2.4
Wooldridge, Chapters 4 & 3

TESTING MULTIPLE HYPOTHESES

Suppose we have a model yi = β0 + β1xi1 + β2xi2 + β3xi3 + εi
Suppose we want to test multiple linear hypotheses in this model
For example, we want to see if the following restrictions on the coefficients hold jointly: β1 + β2 = 1 and β3 = 0
We cannot use a t-test in this case (a t-test can be used only for one hypothesis at a time)
We will use an F-test

RESTRICTED VS.
UNRESTRICTED MODEL

We can reformulate the model by plugging in the restrictions as if they were true (the model under H0)
We call this model the restricted model, as opposed to the unrestricted model
The unrestricted model is yi = β0 + β1xi1 + β2xi2 + β3xi3 + εi
Substituting β2 = 1 − β1 and β3 = 0 and rearranging, we derive (in the lecture) the restricted model:
y*i = β0 + β1x*i + εi, where y*i = yi − xi2 and x*i = xi1 − xi2

IDEA OF THE F-TEST

If the restrictions are true, then the restricted model fits the data in the same way as the unrestricted model
(the residuals are nearly the same)
If the restrictions are false, then the restricted model fits the data poorly
(the residuals from the restricted model are much larger than those from the unrestricted model)
The idea is thus to compare the residuals from the two models

How to compare residuals in the two models?
Calculate the sum of squared residuals in the two models
Test if the difference between the two sums is (statistically) equal to zero
H0: the difference is zero (residuals in the two models are the same, the restrictions hold)
HA: the difference is positive (residuals in the restricted model are bigger, the restrictions do not hold)
Sum of squared residuals: SSE = Σᵢ (yi − ŷi)² = Σᵢ e²i (sums run over i = 1, …, n)

F-TEST

The test statistic is defined as
F = [(SSER − SSEU)/J] / [SSEU/(n − k)] ~ F(J, n−k),
where:
SSER ... sum of squared residuals from the restricted model
SSEU ... sum of squared residuals from the unrestricted model
J ... number of restrictions
n ... number of observations
k ... number of estimated coefficients (including intercept)

[Figure: the F(J, n−k) distribution, with the 5% rejection region to the right of the critical value F(J, n−k, 0.95)]

EXAMPLE

We had the model yi = β0 + β1xi1 + β2xi2 + β3xi3 + εi
We wanted to test H0: β1 + β2 = 1 and β3 = 0 vs. HA: β1 + β2 ≠ 1 and/or β3 ≠ 0
Under H0, we obtained the restricted model y*i = β0 + β1x*i + εi, where y*i = yi − xi2 and x*i = xi1 − xi2
We run the regression on the unrestricted model and obtain SSEU
We run the regression on the restricted model and obtain SSER
We have k = 4 and J = 2
We construct the F-statistic F = [(SSER − SSEU)/2] / [SSEU/(n − 4)]
We find the critical value of the F distribution with 2 and n − 4 degrees of freedom at
the 95% confidence level
If F > F(2, n−4, 0.95), we reject the null hypothesis, i.e. we reject that the restrictions hold jointly

OVERALL SIGNIFICANCE OF THE REGRESSION

Usually, we are interested in knowing if the model has some explanatory power, i.e. if the independent variables indeed "explain" the dependent variable
We test this using the F-test of the joint significance of all (k − 1) slope coefficients:
H0: β1 = β2 = … = βk−1 = 0 vs. HA: βj ≠ 0 for at least one j = 1, …, k − 1
Unrestricted model: yi = β0 + β1xi1 + β2xi2 + … + βk−1xi,k−1 + εi
Restricted model: yi = β0 + εi
F-statistic: F = [(SSER − SSEU)/(k − 1)] / [SSEU/(n − k)] ~ F(k−1, n−k)
Number of restrictions = k − 1
This F-statistic and the corresponding p-value are part of the regression output

EXAMPLE
(regression output not shown)
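Once the two regressions have been run, the F-statistic is simple arithmetic on the two sums of squared residuals. A minimal sketch with hypothetical SSE values (the critical value would still come from F-distribution tables or software):

```python
def f_statistic(sse_r, sse_u, n_restrictions, n_obs, n_coeffs):
    """F = [(SSE_R - SSE_U)/J] / [SSE_U/(n - k)]."""
    numerator = (sse_r - sse_u) / n_restrictions      # averaged over J restrictions
    denominator = sse_u / (n_obs - n_coeffs)          # n - k degrees of freedom
    return numerator / denominator

# Hypothetical values: SSE_R = 120, SSE_U = 100, J = 2, n = 104, k = 4
f = f_statistic(120.0, 100.0, 2, 104, 4)  # (20/2) / (100/100) = 10.0
```

With a critical value F(2, 100, 0.95) of roughly 3.1, an F of 10 would lead us to reject the restrictions.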
GOODNESS OF FIT MEASURE

We know that education and experience have a significant influence on wages
But how important are they in determining wages?
How much of the difference in wages between people is explained by differences in education and in experience?
How well does variation in the independent variable(s) explain variation in the dependent variable?
These are the questions answered by the goodness of fit measure, R²

TOTAL AND EXPLAINED VARIATION

Total variation in the dependent variable: Σᵢ (yi − ȳ)²
Predicted value of the dependent variable = the part that is explained by the independent variables:
ŷi = β̂0 + β̂1xi (case of the regression line, for simplicity of notation)
Explained variation in the dependent variable: Σᵢ (ŷi − ȳ)²

GOODNESS OF FIT - R²

Denote:
SST = Σᵢ (yi − ȳ)² ... Total Sum of Squares
SSR = Σᵢ (ŷi − ȳ)² ... Regression Sum of Squares
Define the measure of the goodness of fit:
R² = SSR/SST = (Explained variation in y) / (Total variation in y)

In all models: 0 ≤ R² ≤ 1
R² tells us what percentage of the total variation in the dependent variable is explained by the variation in the independent variable(s)
R² = 0.3 means that the independent variables can explain 30% of the variation in the dependent variable
A higher R² means a better fit of the regression model (not necessarily a better model!)

DECOMPOSING THE VARIANCE

For models with an intercept, R² can be rewritten using the decomposition of variance:
Σᵢ (yi − ȳ)² = Σᵢ (ŷi − ȳ)² + Σᵢ e²i
SST = Σᵢ (yi − ȳ)² ... Total Sum of Squares
SSR = Σᵢ (ŷi − ȳ)² ... Regression Sum of Squares
SSE = Σᵢ e²i ...
Sum of Squared Residuals

VARIANCE DECOMPOSITION AND R²

Variance decomposition: SST = SSR + SSE
Intuition: the total variation can be divided between the explained variation and the unexplained variation
The true value yi is the sum of the estimated (explained) part ŷi and the residual ei (the unexplained part): yi = ŷi + ei
We can rewrite R²:
R² = SSR/SST = (SST − SSE)/SST = 1 − SSE/SST

ADJUSTED R²

The sum of squared residuals (SSE) decreases when additional explanatory variables are introduced in the model, whereas the total sum of squares (SST) remains the same
Hence, R² = 1 − SSE/SST increases if we add explanatory variables
Models with more variables automatically have a better fit.
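The decomposition SST = SSR + SSE and the formula R² = 1 − SSE/SST can be verified on a tiny simple-regression example. The data below are made up, and the slope and intercept come from the usual closed-form OLS formulas for a single regressor:

```python
# Made-up data for a simple regression y = b0 + b1*x + e
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form OLS estimates for the simple regression
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
     sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar

y_hat = [b0 + b1 * xi for xi in x]               # fitted values
resid = [yi - yh for yi, yh in zip(y, y_hat)]    # residuals e_i

sst = sum((yi - y_bar) ** 2 for yi in y)         # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)     # explained variation
sse = sum(e ** 2 for e in resid)                 # unexplained variation

r2 = 1.0 - sse / sst
# With this data: sst = 6.0, ssr = 3.6, sse = 2.4, r2 = 0.6 (up to rounding),
# and sst == ssr + sse, as the decomposition promises for a model with intercept.
```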
To deal with this problem, we define the adjusted R²:
R²adj = 1 − [SSE/(n − k)] / [SST/(n − 1)] ≤ R²
(k is the number of coefficients, including the intercept)
This measure introduces a "punishment" for including more explanatory variables

EXAMPLE
(regression output not shown)

F-TEST - REVISITED

Let us recall the F-statistic:
F = [(SSER − SSEU)/J] / [SSEU/(n − k)] ~ F(J, n−k)
We can use the formula R² = 1 − SSE/SST to rewrite the F-statistic in R² form:
F = [(R²U − R²R)/J] / [(1 − R²U)/(n − k)] ~ F(J, n−k)
We can use this R² form of the F-statistic under the condition that SSTU = SSTR (i.e. the dependent variables in the restricted and unrestricted models are the same)
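When SSTU = SSTR, the SSE form and the R² form of the F-statistic give the same number. A small numerical check with hypothetical values (same figures as in the earlier example, plus an assumed common SST of 200):

```python
# Hypothetical values; both models share the same dependent variable,
# so SST is the same in the restricted and unrestricted regressions.
sst, sse_u, sse_r = 200.0, 100.0, 120.0
n, k, J = 104, 4, 2

# SSE form of the F-statistic
f_sse = ((sse_r - sse_u) / J) / (sse_u / (n - k))

# R² form, using R² = 1 - SSE/SST for each model
r2_u = 1.0 - sse_u / sst   # 0.5
r2_r = 1.0 - sse_r / sst   # 0.4
f_r2 = ((r2_u - r2_r) / J) / ((1.0 - r2_u) / (n - k))

# Both forms give F = 10.0 (up to floating-point rounding)
```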
SUMMARY

We showed how restrictions are incorporated in regression models
We explained the idea of the F-test
We defined the notion of the overall significance of a regression
We introduced the measure of the goodness of fit, R²
We learned how the total variation in the dependent variable can be decomposed