LECTURE 6
Introduction to Econometrics
Nonlinear specifications and dummy variables
Hieu Nguyen
Fall semester, 2024

NONLINEAR SPECIFICATION

• We will discuss different specifications that are nonlinear in the dependent and independent variables, and their interpretation
• We will define the notion of a dummy variable and show its different uses in linear regression models

NONLINEAR SPECIFICATION

• The relationship between the dependent variable and the explanatory variables is not always linear
  - The use of OLS requires that the equation be linear in coefficients
  - However, there is a wide variety of functional forms that are linear in coefficients while being nonlinear in variables!
• We have to choose carefully the functional form of the relationship between the dependent variable and each explanatory variable
  - The choice of a functional form should be based on the underlying economic theory and/or intuition
  - Do we expect a curve instead of a straight line?
  - Does the effect of a variable peak at some point and then start to decline?

LINEAR FORM

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon \]

• Assumes that the effect of the explanatory variable on the dependent variable is constant:

\[ \frac{\partial y}{\partial x_k} = \beta_k, \qquad k = 1, 2 \]

• Interpretation: if \( x_k \) increases by 1 unit (in which \( x_k \) is measured), then \( y \) will change by \( \beta_k \) units (in which \( y \) is measured)
• The linear form is used as the default functional form until strong evidence is found that it is inappropriate

LOG-LOG FORM

\[ \ln y = \beta_0 + \beta_1 \ln x_1 + \beta_2 \ln x_2 + \varepsilon \]

• Assumes that the elasticity of the dependent variable with respect to the explanatory variable is constant:

\[ \frac{\partial \ln y}{\partial \ln x_k} = \frac{\partial y / y}{\partial x_k / x_k} = \beta_k, \qquad k = 1, 2 \]

• Interpretation: if \( x_k \) increases by 1 percent, then \( y \) will change by \( \beta_k \) percent
• Before using a double-log model, make sure that there are no negative or zero observations in the data set

EXAMPLE

• Estimating the production function of the Indian sugar industry:

\[ \widehat{\ln Q} = 2.70 + \underset{(0.14)}{0.59} \ln L + \underset{(0.17)}{0.33} \ln K \]

  Q: output
  L: labor
  K: capital employed

  - Interpretation: if we increase the amount of labor by 1%, the production of sugar will increase by 0.59%, ceteris paribus. Ceteris paribus is a Latin phrase meaning 'other things being equal'.

LOG-LINEAR FORMS

• Linear-log form:

\[ y = \beta_0 + \beta_1 \ln x_1 + \beta_2 \ln x_2 + \varepsilon \]

  - Interpretation: if \( x_k \) increases by 1 percent, then \( y \) will change by approximately \( \beta_k / 100 \) units (\( k = 1, 2 \))

• Log-linear form:

\[ \ln y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon \]

  - Interpretation: if \( x_k \) increases by 1 unit, then \( y \) will change by approximately \( 100 \cdot \beta_k \) percent (\( k = 1, 2 \))

EXAMPLES OF LOG-LINEAR FORMS

• Estimating demand for chicken meat:

  Y: annual chicken consumption (kg)
  PC: price of chicken
  PB: price of beef
  YD: annual disposable income

• Interpretation: An increase in annual disposable income by 1% increases chicken consumption by 0.12 kg per year, ceteris paribus.

EXAMPLES OF LOG-LINEAR FORMS

• Estimating the influence of education and experience on wages:

  wage: annual wage (USD)
  educ: years of education
  exper: years of experience

• Interpretation: An increase in education by one year increases annual wage by 9.8%, ceteris paribus. An increase in experience by one year increases annual wage by 1%, ceteris paribus.
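The percentage interpretations of the log forms above are first-order approximations. The following Python sketch, which is an illustration and not part of the original slides, compares each rule of thumb with the exact change implied by the functional form. The helper names are hypothetical. The elasticity 0.59 is taken from the sugar example; the coefficients 12 (income) and 0.098 (education) are assumptions backed out from the stated interpretations of 0.12 kg and 9.8%, since the estimated equations themselves are not reproduced in the text.

```python
import math


def loglog_pct_change(beta, pct=1.0):
    """Exact % change in y when x rises by `pct` percent in ln y = b0 + beta * ln x."""
    return 100.0 * ((1.0 + pct / 100.0) ** beta - 1.0)


def linlog_unit_change(beta, pct=1.0):
    """Exact change in y (in units of y) when x rises by `pct` percent in y = b0 + beta * ln x."""
    return beta * math.log(1.0 + pct / 100.0)


def loglin_pct_change(beta, dx=1.0):
    """Exact % change in y when x rises by `dx` units in ln y = b0 + beta * x."""
    return 100.0 * (math.exp(beta * dx) - 1.0)


# Log-log: labor elasticity 0.59 from the sugar production example.
sugar = loglog_pct_change(0.59)

# Linear-log: the coefficient 12 is an ASSUMPTION implied by "0.12 kg per 1% of income".
chicken = linlog_unit_change(12.0)

# Log-linear: the coefficient 0.098 is an ASSUMPTION implied by "9.8% per year of education".
wage = loglin_pct_change(0.098)

print(f"log-log:    rule of thumb 0.59 %,  exact {sugar:.3f} %")
print(f"linear-log: rule of thumb 0.12 kg, exact {chicken:.4f} kg")
print(f"log-linear: rule of thumb 9.8 %,   exact {wage:.2f} %")
```

For small coefficients the rule of thumb and the exact figure are very close. The gap widens as the coefficient grows: in the education case the exact change is a little above 10%, so the 9.8% reading should be treated as an approximation.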
POLYNOMIAL FORM

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \varepsilon \]

• To determine the effect of \( x_1 \) on \( y \), we need to calculate the derivative:

\[ \frac{\partial y}{\partial x_1} = \beta_1 + 2 \beta_2 x_1 \]

• Clearly, the effect of \( x_1 \) on \( y \) is not constant, but changes with the level of \( x_1 \)
• We might also have higher-order polynomials, e.g.:

\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_1^3 + \beta_4 x_1^4 + \varepsilon \]

EXAMPLE OF POLYNOMIAL FORM

• The impact of the number of hours of studying on the grade in Introductory Econometrics:
• To determine the effect of hours on grade, calculate the derivative:
  - Decreasing returns to hours of studying: more hours imply a higher grade, but the positive effect of an additional hour of studying decreases with more hours

CHOICE OF CORRECT FUNCTIONAL FORM

• The functional form has to be correctly specified in order to avoid biased and inconsistent estimates
  - Remember that one of the OLS assumptions is that the model is correctly specified
• Ideally: the specification is given by the underlying theory of the equation
• In reality: the underlying theory does not give a precise functional form
• In most cases, either the linear form is adequate, or common sense will point out an easy choice from among the alternatives

CHOICE OF CORRECT FUNCTIONAL FORM

• Nonlinearity of independent variables
  - often approximated by a polynomial form
  - missing higher powers of a variable can be detected as omitted variables (see next lecture)
• Nonlinearity of the dependent variable
  - harder to detect based on the statistical fit of the regression
  - \( R^2 \) is not comparable across models in which \( y \) is transformed
  - dependent variables are often transformed to log form in order to make their distribution closer to the normal distribution

DUMMY VARIABLES

• Dummy variable: takes on the values of 0 or 1, depending on a qualitative attribute
• Examples of dummy variables:

INTERCEPT DUMMY

• A dummy variable included in a regression alone (not interacted with other variables) is an intercept dummy
• It changes the intercept for the subset of data defined by a dummy variable condition:

\[ y_i = \beta_0 + \beta_1 D_i + \beta_2 x_i + \varepsilon_i \]

  where

\[ D_i = \begin{cases} 1 & \text{if the } i\text{-th observation meets a particular condition} \\ 0 & \text{otherwise} \end{cases} \]

• We have

\[ y_i = (\beta_0 + \beta_1) + \beta_2 x_i + \varepsilon_i \quad \text{if } D_i = 1 \]
\[ y_i = \beta_0 + \beta_2 x_i + \varepsilon_i \quad \text{if } D_i = 0 \]

INTERCEPT DUMMY

[Figure: \( y \) plotted against \( X \). Two parallel lines with slope \( \beta_2 \): intercept \( \beta_0 + \beta_1 \) for \( D_i = 1 \) and intercept \( \beta_0 \) for \( D_i = 0 \).]

EXAMPLE

• Estimating the determinants of wages:
• Interpretation of the dummy variable M: men earn on average $2.156 per hour more than women, ceteris paribus

SLOPE DUMMY

• If a dummy variable is interacted with another variable (\( x \)), it is a slope dummy.
• It changes the relationship between \( x \) and \( y \) for a subset of data defined by a dummy variable condition:

\[ y_i = \beta_0 + \beta_1 x_i + \beta_2 (x_i \cdot D_i) + \varepsilon_i \]

• We have

\[ y_i = \beta_0 + (\beta_1 + \beta_2) x_i + \varepsilon_i \quad \text{if } D_i = 1 \]
\[ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \quad \text{if } D_i = 0 \]

SLOPE DUMMY

[Figure: \( y \) plotted against \( X \). Two lines with common intercept \( \beta_0 \): slope \( \beta_1 + \beta_2 \) for \( D_i = 1 \) and slope \( \beta_1 \) for \( D_i = 0 \).]

EXAMPLE

• Estimating the determinants of wages:
• Interpretation: men gain on average 17 cents per hour more than women for each additional year of education, ceteris paribus

SLOPE AND INTERCEPT DUMMIES

• Allow for both a different slope and a different intercept for two subsets of data distinguished by a qualitative condition:

\[ y_i = \beta_0 + \beta_1 D_i + \beta_2 x_i + \beta_3 (x_i \cdot D_i) + \varepsilon_i \]

  where

\[ D_i = \begin{cases} 1 & \text{if the } i\text{-th observation meets a particular condition} \\ 0 & \text{otherwise} \end{cases} \]

• We have

\[ y_i = (\beta_0 + \beta_1) + (\beta_2 + \beta_3) x_i + \varepsilon_i \quad \text{if } D_i = 1 \]
\[ y_i = \beta_0 + \beta_2 x_i + \varepsilon_i \quad \text{if } D_i = 0 \]

SLOPE AND INTERCEPT DUMMIES

[Figure: \( y \) plotted against \( X \). For \( D_i = 1 \): intercept \( \beta_0 + \beta_1 \), slope \( \beta_2 + \beta_3 \). For \( D_i = 0 \): intercept \( \beta_0 \), slope \( \beta_2 \).]

DUMMY VARIABLES - MULTIPLE CATEGORIES

• What if a variable defines three or more qualitative attributes?
• Example: level of education (elementary school, high school, and college)
• Define and use a set of dummy variables:
• Should we also include a third dummy in the regression, equal to 1 for people with elementary education?
  - No, unless we exclude the intercept!
  - Using the full set of dummies leads to perfect multicollinearity (the dummy variable trap)

SUMMARY

• We discussed different nonlinear specifications of a regression equation and their interpretation
• We defined the concept of a dummy variable and showed its use
• Further readings:
  - Studenmund, Chapter 7
  - Wooldridge, Chapters 6 & 7
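The dummy variable trap from the education example can be checked numerically. The following Python sketch is an illustration that is not part of the original slides: the six-person sample and the helper names `dummy` and `rank` are hypothetical. It builds the regressor columns and computes the rank of the regressor matrix by exact Gaussian elimination. OLS requires this rank to equal the number of columns.

```python
from fractions import Fraction

# Hypothetical sample: education level of six people (illustrative data only).
levels = ["elementary", "high school", "college",
          "college", "elementary", "high school"]


def dummy(values, category):
    """Return a 0/1 column: 1 where the value equals `category`, 0 otherwise."""
    return [1 if v == category else 0 for v in values]


def rank(columns):
    """Rank of the matrix with the given columns, via exact Gaussian elimination."""
    rows = [[Fraction(v) for v in row] for row in zip(*columns)]
    r = 0  # number of pivots found so far
    for c in range(len(columns)):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: it depends on earlier columns
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


intercept = [1] * len(levels)
elementary = dummy(levels, "elementary")
high_school = dummy(levels, "high school")
college = dummy(levels, "college")

# Intercept plus the full set of dummies: the three dummies sum to the intercept column.
X_trap = [intercept, elementary, high_school, college]
# Intercept plus two dummies: elementary education is the base group.
X_base = [intercept, high_school, college]
# Full set of dummies without an intercept.
X_no_intercept = [elementary, high_school, college]

rank_trap = rank(X_trap)
rank_base = rank(X_base)
rank_no_intercept = rank(X_no_intercept)

print(f"intercept + 3 dummies: rank {rank_trap} of {len(X_trap)} columns")
print(f"intercept + 2 dummies: rank {rank_base} of {len(X_base)} columns")
print(f"3 dummies, no intercept: rank {rank_no_intercept} of {len(X_no_intercept)} columns")
```

With the intercept and all three dummies, the rank is 3 for 4 columns, so the coefficients cannot be separately estimated. Dropping the elementary dummy (making it the base group) or dropping the intercept restores full column rank, which matches the answer given in the slides.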