LECTURE 7
Introduction to Econometrics
Omitted & Irrelevant Variables
Hieu Nguyen
Fall semester, 2024

SPECIFICATION OF A REGRESSION
►We discussed the specification of a regression equation
►Specification consists of choosing:
1. the correct independent variables
2. the correct functional form
3. the correct form of the stochastic error term
►A specification error occurs if any of these choices is wrong
►In lecture 6, we discussed the correct functional form. In today's and the following two lectures, we will learn how to deal with the other two choices.

IN TODAY'S LECTURE
►We will talk about the problem of omitting relevant independent variables and of including irrelevant independent variables
►We will learn that:
•Omitting a relevant variable biases our estimates of the other coefficients
•Including an irrelevant variable increases the variance of our estimates of the other coefficients
•Since in real estimation it is often hard to judge whether or not to include a variable, we need economic theory and statistical tools to decide

OMITTING RELEVANT VARIABLES
•We omit a variable when we:
  •forget to include it, or
  •do not have data for it
•This misspecification results in:
  •not having a coefficient for this variable
  •biasing the estimated coefficients of the other variables in the equation → omitted variable bias

OMITTED VARIABLES
•Example: estimating the demand for chicken in the US
  $Y_t$: per capita chicken consumption
  $PC_t$: price of chicken
  $PB_t$: price of beef
  $YD_t$: per capita disposable income
•When we omit the price of beef: n = 44, R² = 0.895
•Compare to the true model: n = 44, R² = 0.986
•We observe a positive bias in the coefficient of PC (was it expected?)
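Whether the positive bias was expected can be worked out from the omitted variable bias formula: if the omitted variable z enters the true model with coefficient $\beta_z$, the slope on an included regressor x in the short regression is, on average, $\beta_x + \beta_z \,\mathrm{cov}(x,z)/\mathrm{var}(x)$. The sign of the bias is therefore the sign of $\beta_z$ times the sign of the correlation between x and z. For chicken, theory suggests beef is a substitute ($\beta_{PB} > 0$), so if PC and PB are positively correlated, as seems plausible, omitting PB should push the PC coefficient upward. The minimal Python/NumPy simulation below is a sketch of this logic on made-up data; the function name `ovb_simulation` and every parameter value in it are hypothetical and are not the chicken estimates.

```python
import numpy as np


def ovb_simulation(beta_x=-0.7, beta_z=0.3, rho=0.6, n=200, reps=2000, seed=0):
    """Average OLS slope on x with z included (full) and with z omitted (short).

    Hypothetical data-generating process, for illustration only:
        y = 1 + beta_x * x + beta_z * z + u,   corr(x, z) = rho
    """
    rng = np.random.default_rng(seed)
    full_slopes, short_slopes = [], []
    for _ in range(reps):
        x = rng.normal(size=n)
        # z has unit variance and correlation rho with x
        z = rho * x + np.sqrt(1.0 - rho**2) * rng.normal(size=n)
        y = 1.0 + beta_x * x + beta_z * z + rng.normal(size=n)
        ones = np.ones(n)
        X_full = np.column_stack([ones, x, z])   # correctly specified
        X_short = np.column_stack([ones, x])     # z omitted
        full_slopes.append(np.linalg.lstsq(X_full, y, rcond=None)[0][1])
        short_slopes.append(np.linalg.lstsq(X_short, y, rcond=None)[0][1])
    return np.mean(full_slopes), np.mean(short_slopes)


full_mean, short_mean = ovb_simulation()
# Omitted variable bias formula: beta_z * cov(x, z) / var(x) = 0.3 * 0.6 = 0.18
predicted_short = -0.7 + 0.3 * 0.6
print("true beta_x: -0.700")
print(f"z included:  {full_mean:.3f}")
print(f"z omitted:   {short_mean:.3f} (formula predicts {predicted_short:.3f})")
```

With these settings the short regression should average close to the formula's value of about -0.52 instead of -0.70, an upward (positive) bias, while the full regression stays close to -0.70.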
OMITTED VARIABLES
•In reality, we usually do not have the true model to compare with, because:
  •we do not know what the true model is, or
  •we do not have data for some important variable
•We can often recognize the bias when we obtain unexpected results (e.g., a coefficient with an unexpected sign or an implausible magnitude)
•We can prevent omitting variables by relying on theory
•If we cannot prevent omitting a variable, we can at least determine the likely direction in which this biases our estimates

IRRELEVANT VARIABLES
•A second type of specification error is including a variable that does not belong in the model
•This misspecification:
  •does not cause bias
  •but it increases the variances of the estimated coefficients of the included variables

Why does the variance of the estimated coefficients increase when we include an irrelevant variable? For an explanation, see:
https://stats.stackexchange.com/questions/193229/why-does-the-parameter-variance-change-when-control-variables-are-added-to-a-reg

•True model: $y_i = \beta x_i + u_i$ (1)
•Estimated model: $y_i = \beta x_i + \gamma z_i + u_i$ (2), where the true $\gamma = 0$

SUMMARY OF THE THEORY
•Bias – efficiency trade-off:

              Omitted variable    Irrelevant variable
Bias          Yes*                No
Variance      Decreases*          Increases*

* As long as x and z are correlated

FOUR IMPORTANT SPECIFICATION CRITERIA
Does a variable belong in the equation?
1. Theory: Is the variable's place in the equation unambiguous and theoretically sound? Does intuition tell you it should be included?
2. t-test: Is the variable's estimated coefficient significant in the expected direction?
3. Adjusted R² (R̄²): Does the overall fit of the equation, adjusted for degrees of freedom, improve (enough) when the variable is added to the equation? (The unadjusted R² never falls when a variable is added, so it cannot answer this question.)
4. Bias: Do other variables' coefficients change significantly when the variable is added to the equation?
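The "Variance: Increases*" entry of the bias – efficiency table can be checked the same way. The sketch below (Python/NumPy, hypothetical values β = 2, n = 100, unit-variance x and u) generates data from the true model $y_i = \beta x_i + u_i$ with no z and no intercept, then compares the sampling spread of $\hat\beta$ with and without an irrelevant regressor z whose correlation with x is ρ. In OLS, adding z inflates the variance of $\hat\beta$ by roughly the variance inflation factor $1/(1-\rho^2)$, so the standard deviation should grow by about $1/\sqrt{1-\rho^2} \approx 1.67$ for ρ = 0.8 and hardly change for ρ = 0, which is the point of the footnote. The function name `slope_stats` is illustrative only.

```python
import numpy as np


def slope_stats(rho, beta=2.0, n=100, reps=2000, seed=1):
    """Mean and std. dev. of the OLS estimate of beta across simulated samples.

    Hypothetical setup: the true model is y = beta * x + u (no z, no intercept);
    the 'extra' regression also includes an irrelevant z with corr(x, z) = rho.
    """
    rng = np.random.default_rng(seed)
    b_true, b_extra = [], []
    for _ in range(reps):
        x = rng.normal(size=n)
        z = rho * x + np.sqrt(1.0 - rho**2) * rng.normal(size=n)
        y = beta * x + rng.normal(size=n)          # z plays no role in y
        b_true.append(np.linalg.lstsq(x[:, None], y, rcond=None)[0][0])
        X_extra = np.column_stack([x, z])          # irrelevant z included
        b_extra.append(np.linalg.lstsq(X_extra, y, rcond=None)[0][0])
    return np.mean(b_true), np.mean(b_extra), np.std(b_true), np.std(b_extra)


for rho in (0.0, 0.8):
    m_true, m_extra, s_true, s_extra = slope_stats(rho)
    print(f"rho={rho}: mean {m_true:.3f} vs {m_extra:.3f}, "
          f"sd {s_true:.3f} vs {s_extra:.3f} (ratio {s_extra / s_true:.2f})")
```

In both cases the mean of $\hat\beta$ should stay close to 2 (no bias); only the spread changes, and only noticeably when x and z are correlated.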
•If all criteria hold, the variable belongs in the equation
•If none of them holds, the variable is irrelevant and can be safely excluded
•If the criteria give contradictory answers, most weight should be given to theoretical justification
•Therefore, if theory (intuition) says that a variable belongs in the equation, we include it (even if its coefficient turns out to be insignificant!)

EXAMPLE FOR SPECIFICATION CRITERIA

THE DANGER OF OVERSPECIFICATION
•"If you just torture the data long enough, they will confess."
•If too many specifications are tried:
  •the final result may have the desired properties only by chance
  •the statistical significance of the result is overestimated, because the previously estimated regressions are ignored
•How to solve this issue:
  •keep the number of regressions you try low
  •focus on theory (very important)
  •keep a record of all regressions you tried

SPECIFICATION TEST
RESET TYPE I
RESET TYPE II

SUMMARY
•Omitting a relevant variable biases our estimates of the other coefficients
•Including an irrelevant variable increases the variance of our estimates of the other coefficients
•Since in real estimation it is often hard to judge whether or not to include a variable, we need economic theory and statistical tools to decide
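The warning about torturing the data can be quantified. If y is unrelated to every candidate regressor and each candidate is tested separately at the 5% level, the chance that at least one looks significant is $1 - 0.95^k$ for k independent tests, about 0.64 for k = 20. The minimal Python/NumPy sketch below checks this by simulation under hypothetical settings (n = 50 observations, k = 20 pure-noise candidates, and a two-sided 5% critical value of about 2.01 for 48 degrees of freedom). It uses the fact that in a simple regression with an intercept the slope t-statistic equals $r\sqrt{n-2}/\sqrt{1-r^2}$, where r is the sample correlation. The function name `share_with_spurious_hit` is illustrative only.

```python
import numpy as np


def share_with_spurious_hit(n=50, k=20, reps=2000, t_crit=2.01, seed=2):
    """Share of simulated data sets in which at least one of k irrelevant
    candidate regressors has |t| > t_crit in its own simple regression."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        y = rng.normal(size=n)                    # y is pure noise
        X = rng.normal(size=(n, k))               # k irrelevant candidates
        yc = y - y.mean()
        Xc = X - X.mean(axis=0)
        # sample correlation between y and each candidate
        r = (Xc.T @ yc) / np.sqrt((Xc**2).sum(axis=0) * (yc**2).sum())
        # slope t-statistic of a simple regression with an intercept
        t = r * np.sqrt(n - 2) / np.sqrt(1.0 - r**2)
        hits += int(np.any(np.abs(t) > t_crit))
    return hits / reps


simulated = share_with_spurious_hit()
theoretical = 1 - 0.95**20
print(f"simulated: {simulated:.3f}, theoretical: {theoretical:.3f}")
```

Each candidate on its own has only a 5% chance of looking significant, yet searching over 20 of them finds a "significant" regressor in roughly two data sets out of three. This is why the significance of a result reached through a long specification search is overstated.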
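RESET (Ramsey's Regression Specification Error Test) checks for omitted terms without requiring us to know which variable is missing. Below is a minimal Python/NumPy sketch of its most common form, which may differ in detail from the Type I and Type II variants named in the lecture: fit the original equation, add powers of the fitted values ($\hat{Y}^2$, $\hat{Y}^3$) as extra regressors, and F-test whether their coefficients are jointly zero. The helper names `ols_rss` and `reset_f`, and the simulated data, are hypothetical.

```python
import numpy as np


def ols_rss(X, y):
    """OLS coefficients and residual sum of squares."""
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    return coef, resid @ resid


def reset_f(X, y, max_power=3):
    """RESET F-statistic: add yhat**2 ... yhat**max_power and test them jointly."""
    n = X.shape[0]
    coef, rss_restricted = ols_rss(X, y)
    yhat = X @ coef
    extra = np.column_stack([yhat**p for p in range(2, max_power + 1)])
    X_aug = np.column_stack([X, extra])
    _, rss_unrestricted = ols_rss(X_aug, y)
    q = extra.shape[1]                 # number of restrictions tested
    df = n - X_aug.shape[1]            # residual degrees of freedom
    f_stat = ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / df)
    return f_stat, q, df


# Hypothetical example: the estimated equation is linear in x in both cases
rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y_linear = 1.0 + 2.0 * x + rng.normal(size=n)               # correctly specified
y_curved = 1.0 + 2.0 * x + 1.5 * x**2 + rng.normal(size=n)  # x**2 omitted

f_lin, q, df = reset_f(X, y_linear)
f_cur, _, _ = reset_f(X, y_curved)
print(f"F({q}, {df}): linear truth {f_lin:.2f}, curved truth {f_cur:.2f}")
```

Compare each statistic with an F(q, df) critical value (about 3.0 at the 5% level for 2 and 196 degrees of freedom); the curved case, where a relevant term was left out, should reject by a wide margin.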