LECTURE 5
Introduction to Econometrics
Nonlinear specifications and dummy variables
October 22, 2021

TESTING MULTIPLE HYPOTHESES REVISITED

• Suppose we have a model yi = β0 + β1xi1 + β2xi2 + β3xi3 + εi
• Suppose we want to test multiple linear hypotheses in this model
• For example, we want to see if the following restrictions on the coefficients hold jointly: β1 + β2 = 1 and β3 = 0
• We cannot use a t-test in this case (a t-test can be used only for one hypothesis at a time)
• We will use an F-test

RESTRICTED VS. UNRESTRICTED MODEL

• We can reformulate the model by plugging in the restrictions as if they were true (the model under H0)
• We call this model the restricted model, as opposed to the unrestricted model
• The unrestricted model is yi = β0 + β1xi1 + β2xi2 + β3xi3 + εi
• Substituting β2 = 1 − β1 and β3 = 0, the restricted model can be derived to have the following form:

  y∗i = β0 + β1x∗i + εi,  where y∗i = yi − xi2 and x∗i = xi1 − xi2

IDEA OF THE F-TEST

• If the restrictions are true, then the restricted model fits the data in the same way as the unrestricted model: the residuals are nearly the same
• If the restrictions are false, then the restricted model fits the data poorly: the residuals from the restricted model are much larger than those from the unrestricted model
• The idea is thus to compare the residuals from the two models

IDEA OF THE F-TEST

• How do we compare the residuals in the two models?
  - Calculate the sum of squared residuals in each of the two models
  - Test if the difference between the two sums is equal to zero (statistically)
  - H0: the difference is zero (residuals in the two models are the same, the restrictions hold)
  - HA: the difference is positive (residuals in the restricted model are bigger, the restrictions do not hold)

F-TEST

• The test statistic is defined as

  F = [(SSRr − SSRur)/q] / [SSRur/(n − k − 1)] ∼ F(q, n − k − 1)

  where:
  SSRr . . . sum of squared residuals from the restricted model
  SSRur . . . sum of squared residuals from the unrestricted model
  q . . . number of restrictions
  n . . . number of observations
  k . . . number of slope coefficients in the unrestricted model

GOODNESS OF FIT MEASURE

• We know that education and experience have a significant influence on wages
• But how important are they in determining wages?
• How much of the difference in wages between people is explained by differences in education and in experience?
• How well does variation in the independent variable(s) explain variation in the dependent variable?
• These are the questions answered by the goodness of fit measure, R²

TOTAL AND EXPLAINED VARIATION

• Total variation in the dependent variable: SST = Σ (yi − ȳ)²
• Predicted value of the dependent variable = the part that is explained by the independent variables: ŷi = β̂0 + β̂1xi (case of the regression line, for simplicity of notation)
• Explained variation in the dependent variable: SSE = Σ (ŷi − ȳ)²

GOODNESS OF FIT - R²

• Denote SST the total variation in y and SSE the explained variation in y
• Define the measure of the goodness of fit:

  R² = SSE/SST = (explained variation in y) / (total variation in y)

GOODNESS OF FIT - R²

• In all models: 0 ≤ R² ≤ 1
• R² tells us what percentage of the total variation in the dependent variable is explained by the variation in the independent variable(s)
  - R² = 0.3 means that the independent variables can explain 30% of the variation in the dependent variable
• A higher R² means a better fit of the regression model (not necessarily a better model!)
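A minimal sketch of both ideas in Python (statsmodels), using simulated data in which the restrictions β1 + β2 = 1 and β3 = 0 hold by construction; the variable names and parameter values are illustrative assumptions, not results from the lecture:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x1, x2, x3 = rng.normal(size=(3, n))
# Data generated so that b1 + b2 = 1 and b3 = 0 are true
y = 2.0 + 0.4 * x1 + 0.6 * x2 + rng.normal(size=n)

# Unrestricted model: y = b0 + b1*x1 + b2*x2 + b3*x3 + e
X = sm.add_constant(np.column_stack([x1, x2, x3]))
unrestricted = sm.OLS(y, X).fit()

# Joint F-test of the two restrictions, built for us by statsmodels
print(unrestricted.f_test("x1 + x2 = 1, x3 = 0"))

# Manual check via the restricted model: y - x2 = b0 + b1*(x1 - x2) + e
restricted = sm.OLS(y - x2, sm.add_constant(x1 - x2)).fit()
q, df_ur = 2, n - 3 - 1
F = ((restricted.ssr - unrestricted.ssr) / q) / (unrestricted.ssr / df_ur)
print("manual F:", F)
print("R-squared of the unrestricted model:", unrestricted.rsquared)
```

The two F values should agree, and with these simulated data the test should (correctly) fail to reject H0 most of the time.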
DECOMPOSING THE VARIANCE

• For models with an intercept, R² can be rewritten using the decomposition of variance
• Variance decomposition: SST = SSE + SSR
• Intuition: the total variation can be divided between the explained variation and the unexplained variation (the residual ei is the unexplained part)
• We can rewrite R²:

  R² = SSE/SST = (SST − SSR)/SST = 1 − SSR/SST

ADJUSTED R²

• The sum of squared residuals (SSR) decreases when additional explanatory variables are introduced in the model, whereas the total sum of squares (SST) remains the same
• Hence R² = 1 − SSR/SST increases if we add explanatory variables: models with more variables automatically have a better fit
• To deal with this problem, we define the adjusted R²:

  R²adj = 1 − [SSR/(n − k − 1)] / [SST/(n − 1)] ≤ R²  (k is the number of slope coefficients)

• This measure introduces a "punishment" for including more explanatory variables

OMITTED VARIABLES

• We omit a variable when we
  - forget to include it
  - do not have data for it
• This misspecification results in
  - not having the coefficient for this variable
  - biasing the estimated coefficients of the other variables in the equation → omitted variable bias

OMITTED VARIABLES

• For the model with an omitted variable, the estimated coefficient satisfies E[β̂] = β + γα, so the bias is γ · α
  - coefficients β and γ are from the true model yi = βxi + γzi + ui
  - coefficient α is from a regression of z on x, i.e. zi = αxi + ei
• The bias is zero if γ = 0 or α = 0 (not likely to happen)

OMITTED VARIABLES

• Example: what would happen if you estimated a production function with capital only and omitted labor?

OMITTED VARIABLES

• Example: estimating the demand for chicken meat in the US
  Yt . . . per capita chicken consumption
  PCt . . . price of chicken
  PBt . . . price of beef
  YDt . . . per capita disposable income

OMITTED VARIABLES

• When we omit the price of beef: R² = 0.895, n = 44
• Compare to the true model: R² = 0.986, n = 44
• We observe a positive bias in the coefficient of PC (was it expected?)

OMITTED VARIABLES

• Determining the direction of the bias: bias = γ · α
  - γ captures the relationship between the omitted variable and the dependent variable (the price of beef and chicken consumption): γ is likely to be positive
  - α captures the relationship between the omitted variable and the included independent variable (the price of beef and the price of chicken): α is likely to be positive
• Conclusion: the bias in the coefficient of the price of chicken is likely to be positive if we omit the price of beef from the equation
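A small simulation makes the bias formula concrete. This sketch assumes illustrative parameter values (β = 1, γ = 2, α = 0.8); omitting z should push the estimate toward β + γα = 2.6:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5000
beta, gamma, alpha = 1.0, 2.0, 0.8
x = rng.normal(size=n)
z = alpha * x + rng.normal(size=n)             # z is correlated with x (alpha != 0)
y = beta * x + gamma * z + rng.normal(size=n)  # true model includes z

full = sm.OLS(y, sm.add_constant(np.column_stack([x, z]))).fit()
short = sm.OLS(y, sm.add_constant(x)).fit()    # z omitted

print("true beta:           ", beta)
print("full-model estimate: ", full.params[1])   # close to 1.0 (unbiased)
print("short-model estimate:", short.params[1])  # close to beta + gamma*alpha = 2.6
```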
OMITTED VARIABLES

• In reality, we usually do not have the true model to compare with
  - because we do not know what the true model is
  - because we do not have data for some important variable
• We can often recognize the bias if we obtain some unexpected results
• We can prevent omitting variables by relying on the theory
• If we cannot prevent omitting variables, we can at least determine in what way this biases our estimates

IRRELEVANT VARIABLES

• A second type of specification error is including a variable that does not belong in the model
• This misspecification does not cause bias, but it increases the variances of the estimated coefficients of the included variables

IRRELEVANT VARIABLES

• True model: yi = βxi + ui  (1)
• Model as it looks when we add the irrelevant variable z: yi = βxi + γzi + ũi  (2)
• We can represent the error term as ũi = ui − γzi
• But since in the true model γ = 0, we have ũi = ui and there is no bias

SUMMARY OF THE THEORY

• Bias-efficiency trade-off:

               Omitted variable   Irrelevant variable
  Bias         Yes*               No
  Variance     Decreases*         Increases*

  * as long as there is correlation between x and z

FOUR IMPORTANT SPECIFICATION CRITERIA

Does a variable belong in the equation?
1. Theory: Is the variable's place in the equation unambiguous and theoretically sound? Does intuition tell you it should be included?
2. t-test: Is the variable's estimated coefficient significant in the expected direction?
3. R²: Does the overall fit of the equation improve (enough) when the variable is added to the equation?
4. Bias: Do other variables' coefficients change significantly when the variable is added to the equation?

FOUR IMPORTANT SPECIFICATION CRITERIA

• If all conditions hold, the variable belongs in the equation
• If none of them holds, the variable is irrelevant and can be safely excluded
• If the criteria give contradictory answers, most importance should be attributed to the theoretical justification
  - Therefore, if theory (intuition) says that a variable belongs in the equation, we include it (even though its coefficient might be insignificant!)
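As a sketch of how criteria 2-4 can be checked in practice, the following simulation (all variable names and parameter values are assumptions for illustration) adds a candidate variable z that is correlated with x but truly irrelevant:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 1000
x = rng.normal(size=n)
z = 0.5 * x + rng.normal(size=n)        # candidate variable, truly irrelevant here
y = 1.0 + 2.0 * x + rng.normal(size=n)  # true model: z does not enter

without_z = sm.OLS(y, sm.add_constant(x)).fit()
with_z = sm.OLS(y, sm.add_constant(np.column_stack([x, z]))).fit()

print("criterion 2 (t-test): t-stat on z =", with_z.tvalues[2])
print("criterion 3 (fit):    adj. R2", without_z.rsquared_adj, "->", with_z.rsquared_adj)
print("criterion 4 (bias):   beta_x", without_z.params[1], "->", with_z.params[1])
print("efficiency cost:      se(beta_x)", without_z.bse[1], "->", with_z.bse[1])
```

Because z is irrelevant here, its t-statistic should be insignificant and the coefficient on x should barely move; the visible cost is the larger standard error, exactly the trade-off summarized in the table above.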
NONLINEAR SPECIFICATION

• We will discuss different specifications that are nonlinear in the dependent and independent variables, and their interpretation
• We will define the notion of a dummy variable and show its different uses in linear regression models

NONLINEAR SPECIFICATION

• There is not always a linear relationship between the dependent variable and the explanatory variables
  - The use of OLS requires that the equation be linear in coefficients
  - However, there is a wide variety of functional forms that are linear in coefficients while being nonlinear in variables!
• We have to choose the functional form of the relationship between the dependent variable and each explanatory variable carefully
  - The choice of a functional form should be based on the underlying economic theory and/or intuition
  - Do we expect a curve instead of a straight line? Does the effect of a variable peak at some point and then start to decline?

LINEAR FORM

  y = β0 + β1x1 + β2x2 + ε

• Assumes that the effect of the explanatory variable on the dependent variable is constant:

  ∂y/∂xk = βk,  k = 1, 2

• Interpretation: if xk increases by 1 unit (in which xk is measured), then y will change by βk units (in which y is measured)
• The linear form is used as the default functional form until strong evidence that it is inappropriate is found

LOG-LOG FORM

  ln y = β0 + β1 ln x1 + β2 ln x2 + ε

• Assumes that the elasticity of the dependent variable with respect to the explanatory variable is constant:

  ∂ ln y / ∂ ln xk = (∂y/y) / (∂xk/xk) = βk,  k = 1, 2

• Interpretation: if xk increases by 1 percent, then y will change by βk percent
• Before using a double-log model, make sure that there are no negative or zero observations in the data set

EXAMPLE

• Estimating the production function of the Indian sugar industry:

  ln Q̂ = 2.70 + 0.59 ln L + 0.33 ln K
  (standard errors: 0.14 for the coefficient of ln L, 0.17 for that of ln K)

  Q . . . output
  L . . . labor
  K . . . capital employed

• Interpretation: if we increase the amount of labor by 1%, the production of sugar will increase by 0.59%, ceteris paribus (ceteris paribus is a Latin phrase meaning "other things being equal")

LOG-LINEAR FORMS

• Linear-log form: y = β0 + β1 ln x1 + β2 ln x2 + ε
  - Interpretation: if xk increases by 1 percent, then y will change by (βk/100) units (k = 1, 2)
• Log-linear form: ln y = β0 + β1x1 + β2x2 + ε
  - Interpretation: if xk increases by 1 unit, then y will change by (βk · 100) percent (k = 1, 2)

EXAMPLES OF LOG-LINEAR FORMS

• Estimating the demand for chicken meat (linear-log form):
  Y . . . annual chicken consumption (kg)
  PC . . . price of chicken
  PB . . . price of beef
  YD . . . annual disposable income
• Interpretation: an increase in annual disposable income by 1% increases chicken consumption by 0.12 kg per year, ceteris paribus

EXAMPLES OF LOG-LINEAR FORMS

• Estimating the influence of education and experience on wages (log-linear form):
  wage . . . annual wage (USD)
  educ . . . years of education
  exper . . . years of experience
• Interpretation:
  - An increase in education by one year increases the annual wage by 9.8%, ceteris paribus
  - An increase in experience by one year increases the annual wage by 1%, ceteris paribus
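To make the log-linear reading concrete, here is a minimal sketch on simulated data; the coefficient values (0.098 and 0.010) are assumptions chosen only to mirror the interpretation above, not the lecture's actual estimates:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 1000
educ = rng.integers(8, 21, size=n).astype(float)   # years of education
exper = rng.integers(0, 31, size=n).astype(float)  # years of experience
# Assumed true coefficients: 0.098 (education), 0.010 (experience)
log_wage = 9.0 + 0.098 * educ + 0.010 * exper + rng.normal(scale=0.3, size=n)

fit = sm.OLS(log_wage, sm.add_constant(np.column_stack([educ, exper]))).fit()
# Log-linear reading: a one-unit change in x changes y by about 100*beta percent
print(f"one more year of education: ~{100 * fit.params[1]:.1f}% higher wage")
print(f"one more year of experience: ~{100 * fit.params[2]:.1f}% higher wage")
```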
POLYNOMIAL FORM

  y = β0 + β1x1 + β2x1² + ε

• To determine the effect of x1 on y, we need to calculate the derivative:

  ∂y/∂x1 = β1 + 2 · β2 · x1

• Clearly, the effect of x1 on y is not constant, but changes with the level of x1
• We might also have higher-order polynomials, e.g.:

  y = β0 + β1x1 + β2x1² + β3x1³ + β4x1⁴ + ε

EXAMPLE OF POLYNOMIAL FORM

• The impact of the number of hours of studying on the grade from Introductory Econometrics
• To determine the effect of hours on the grade, calculate the derivative of the estimated polynomial with respect to hours
• Decreasing returns to hours of studying: more hours imply a higher grade, but the positive effect of an additional hour of studying decreases with more hours

CHOICE OF CORRECT FUNCTIONAL FORM

• The functional form has to be correctly specified in order to avoid biased and inconsistent estimates
  - Remember that one of the OLS assumptions is that the model is correctly specified
• Ideally: the specification is given by the underlying theory of the equation
• In reality: the underlying theory does not give a precise functional form
• In most cases, either the linear form is adequate, or common sense will point out an easy choice from among the alternatives

CHOICE OF CORRECT FUNCTIONAL FORM

• Nonlinearity of explanatory variables is often approximated by a polynomial form
  - missing higher powers of a variable can be detected as omitted variables (see next lecture)
• Nonlinearity of the dependent variable is harder to detect based on the statistical fit of the regression
  - R² is incomparable across models in which y is transformed
  - dependent variables are often transformed to log form in order to make their distribution closer to the normal distribution

DUMMY VARIABLES

• A dummy variable takes on the value 0 or 1, depending on a qualitative attribute
• Example of a dummy variable: M = 1 if the person is male, M = 0 otherwise (used in the wage examples below)

INTERCEPT DUMMY

• A dummy variable included in a regression alone (not interacted with other variables) is an intercept dummy
• It changes the intercept for the subset of data defined by the dummy variable condition:

  yi = β0 + β1Di + β2xi + εi

  where Di = 1 if the i-th observation meets a particular condition, and Di = 0 otherwise

• We have
  yi = (β0 + β1) + β2xi + εi  if Di = 1
  yi = β0 + β2xi + εi  if Di = 0

[Figure: two parallel lines with common slope β2, with intercept β0 + β1 for Di = 1 and β0 for Di = 0]

EXAMPLE

• Estimating the determinants of wages
• Interpretation of the dummy variable M: men earn on average $2.156 per hour more than women, ceteris paribus

SLOPE DUMMY

• If a dummy variable is interacted with another variable (x), it is a slope dummy
• It changes the slope of the relationship between x and y for the subset of data defined by the dummy variable condition:

  yi = β0 + β1xi + β2(xi · Di) + εi

• We have
  yi = β0 + (β1 + β2)xi + εi  if Di = 1
  yi = β0 + β1xi + εi  if Di = 0

[Figure: two lines with common intercept β0, with slope β1 + β2 for Di = 1 and slope β1 for Di = 0]

EXAMPLE

• Estimating the determinants of wages
• Interpretation: men gain on average 17 cents per hour more than women for each additional year of education, ceteris paribus

SLOPE AND INTERCEPT DUMMIES

• Allow both for a different slope and a different intercept for two subsets of data distinguished by a qualitative condition:

  yi = β0 + β1Di + β2xi + β3(xi · Di) + εi

  where Di = 1 if the i-th observation meets a particular condition, and Di = 0 otherwise

• We have
  yi = (β0 + β1) + (β2 + β3)xi + εi  if Di = 1
  yi = β0 + β2xi + εi  if Di = 0

[Figure: two lines with intercepts β0 + β1 and β0, with slope β2 + β3 for Di = 1 and slope β2 for Di = 0]

DUMMY VARIABLES - MULTIPLE CATEGORIES

• What if a variable defines three or more qualitative attributes?
• Example: level of education - elementary school, high school, and college
• Define and use a set of dummy variables, e.g. one dummy for high school graduates and one for college graduates (see the sketch below)
• Should we also include a third dummy in the regression, equal to 1 for people with elementary education?
  - No, unless we exclude the intercept!
  - Using the full set of dummies leads to perfect multicollinearity (the dummy variable trap)
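A minimal sketch of these dummy specifications in Python; the data are simulated, and the coefficient values (2.156 and 0.17) are assumptions chosen to mirror the wage examples above:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 1000
educ = rng.integers(8, 21, size=n).astype(float)
male = rng.integers(0, 2, size=n)  # intercept dummy: 1 for men, 0 for women
wage = 5.0 + 2.156 * male + 0.5 * educ + 0.17 * male * educ + rng.normal(size=n)
df = pd.DataFrame({"wage": wage, "educ": educ, "male": male})

# Intercept dummy (male) plus slope dummy (the male:educ interaction)
fit = smf.ols("wage ~ male + educ + male:educ", data=df).fit()
print(fit.params)

# Multiple categories: with an intercept, include only k-1 dummies.
# The formula interface (patsy) drops one reference category automatically,
# which is exactly how the dummy variable trap is avoided.
df["school"] = rng.choice(["elementary", "high school", "college"], size=n)
fit2 = smf.ols("wage ~ educ + C(school)", data=df).fit()
print(fit2.params)
```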
SUMMARY

• We discussed different nonlinear specifications of a regression equation and their interpretation
• We defined the concept of a dummy variable and showed its uses
• Further readings: Studenmund, Chapter 7; Wooldridge, Chapters 6 & 7