Introductory Econometrics
Lecture 3: OLS Assumptions
by Hieu Nguyen
Fall 2024

1. Suppose you have accepted a summer job as a weight guesser at the local amusement park, Magic Hill. Customers pay 50 cents each, which you get to keep if you guess their weight within 5 kilograms. If you miss by more than 5 kilograms, then you have to give the customer a small prize that you buy from Magic Hill for 60 cents each. Luckily, the friendly managers of Magic Hill have arranged a number of marks on the wall behind the customer so that you can accurately measure the customer’s height. Unfortunately, there is a 150 cm wall between you and the customer so that you can tell little about the person except for height and (usually) gender.

On your first day on the job, you do so poorly that you work all day and somehow lose two dollars, so on the second day, you decide to collect data to run a regression to estimate the relationship between weight and height (above 150 cm). Since most of the participants are male, you decide to limit your sample to males. You hypothesize the following theoretical relationship:

$\text{weight}_i = \beta_0 + \beta_1 \text{height}_i + \varepsilon_i$.

The next day you collect the data summarized in the following table:

Customer   Height (cm)   Weight (kg)
1          170           65
2          180           75
3          175           80
4          160           60
5          185           85
6          155           55
7          165           70
8          170           72
9          175           78
10         180           83
11         185           90
12         190           95
13         195           100
14         160           55
15         155           50

Then you run your regression on the Magic Hill computer, obtaining the following estimates: $\hat{\beta}_0 = 46.49$, $\hat{\beta}_1 = 1.14$.

(a) Interpret the estimated coefficients.

(b) When you observe the table, how well do you think the regression works?

(c) Identify the three customers who seem to be quite a distance from the estimated regression line. Would you have a better regression equation if we dropped these customers from the sample?

(d) Look over the sample with the thought that it might not be randomly drawn. Does the sample look abnormal in any way?
If the sample is not random, would this affect the regression results and the estimated weights?

(e) Think of at least one other factor besides height that might be a good choice as a variable in the weight/height equation. What would the expected sign of this variable’s coefficient be if the variable were added to the equation?

(f) Does this simple regression capture a causal relationship between height and weight? Explain.

2. The data file collegetown contains observations on 500 single-family houses sold in Baton Rouge, Louisiana during 2009–2013. The data include sale price (in thousands of dollars) PRICE and total interior area of the house in hundreds of square feet SQFT.

a. Plot house price against house size in a scatter diagram.

b. Estimate the linear regression model $PRICE = \beta_1 + \beta_2 SQFT + e$. Interpret the estimates. Draw a sketch of the fitted line.

c. Estimate the quadratic regression model $PRICE = \alpha_1 + \alpha_2 SQFT^2 + e$. Compute the marginal effect of an additional 100 square feet of living area in a home with 2000 square feet of living space.

d. For the regressions in (b) and (c), compute the least squares residuals and plot them against SQFT. Do any of our assumptions appear violated?

e. One basis for choosing between these two specifications is how well the data are fit by the model. Compare the sum of squared residuals (SSR) from the models in (b) and (c). Which model has a lower SSR? How does having a lower SSR indicate a “better-fitting” model?
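As a rough check on Problem 1, and as a starting point for the SSR comparison in Problem 2(e), the least squares fit can be sketched in Python with NumPy. This is a minimal sketch, not the course's official solution. The helper name `ols_fit` is hypothetical, and the code assumes the regressor in Problem 1 is height measured in centimetres above 150, as the problem text suggests. The numbers it prints may differ slightly from the estimates quoted in Problem 1.

```python
import numpy as np


def ols_fit(x, y):
    """Fit y = b0 + b1 * x by least squares.

    Returns (b0, b1, ssr), where ssr is the sum of squared residuals.
    Hypothetical helper written for this illustration.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with a column of ones for the intercept.
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(coef[0]), float(coef[1]), float(resid @ resid)


# Data copied from the table in Problem 1.
height_cm = [170, 180, 175, 160, 185, 155, 165, 170,
             175, 180, 185, 190, 195, 160, 155]
weight_kg = [65, 75, 80, 60, 85, 55, 70, 72,
             78, 83, 90, 95, 100, 55, 50]

# Assumption: the regressor is height above 150 cm, not raw height.
height_above_150 = np.array(height_cm, dtype=float) - 150.0

b0, b1, ssr = ols_fit(height_above_150, weight_kg)
print(f"intercept = {b0:.2f}, slope = {b1:.2f}, SSR = {ssr:.2f}")
```

For Problem 2, the same helper could be called once with SQFT and once with SQFT squared as the regressor, and the two returned SSR values compared. Loading the collegetown file is left out here because its format is not described in this handout.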