Example #1: Effect of Studying on Grades

What is the effect on grades of studying for an additional hour per day?

Y = GPA
X = study time (hours per day)

Data: grades and study hours of college freshmen.

Would you expect the OLS estimator of $\beta_1$ (the effect on GPA of studying an extra hour per day) to be unbiased? Why or why not?

Studying on grades, ctd.

Stinebrickner, Ralph and Stinebrickner, Todd R. (2008) "The Causal Effect of Studying on Academic Performance," The B.E. Journal of Economic Analysis & Policy: Vol. 8: Iss. 1 (Frontiers), Article 14.

• n = 210 freshmen at Berea College (Kentucky) in 2001
• Y = first-semester GPA
• X = average study hours per day (time use survey)
• Roommates were randomly assigned
• Z = 1 if roommate brought a video game, = 0 otherwise

Do you think $Z_i$ (whether a roommate brought a video game) is a valid instrument?
1. Is it relevant (correlated with X)?
2. Is it exogenous (uncorrelated with u)?

Studying on grades, ctd.

$X_i = \pi_0 + \pi_1 Z_i + v_i$
$Y_i = \beta_0 + \beta_1 \hat{X}_i + u_i$

Y = GPA (4 point scale)
X = time spent studying (hours per day)
Z = 1 if roommate brought a video game, = 0 otherwise

Stinebrickner and Stinebrickner's findings

$\hat{\pi}_1 = -0.668$
$\hat{\beta}_1^{IV} = 0.360$

Do these estimates make sense in a real-world way?
(Note: They actually ran the regressions including additional regressors; more on this later.)

Example #2: Supply and demand for butter

IV regression was first developed to estimate demand elasticities for agricultural goods, for example, butter:

$\ln(Q_i) = \beta_0 + \beta_1 \ln(P_i) + u_i$

• $\beta_1$ = price elasticity of butter = percent change in quantity for a 1% change in price (recall log-log specification discussion)
• Data: observations on price and quantity of butter for different years
• The OLS regression of $\ln(Q_i)$ on $\ln(P_i)$ suffers from simultaneous causality bias (why?)
Simultaneous causality bias in the OLS regression of $\ln(Q_i)$ on $\ln(P_i)$ arises because price and quantity are determined by the interaction of demand and supply:

This interaction of demand and supply produces data like…

Would a regression using these data produce the demand curve?

But…what would you get if only supply shifted?

• TSLS estimates the demand curve by isolating shifts in price and quantity that arise from shifts in supply.
• Z is a variable that shifts supply but not demand.

TSLS in the supply-demand example:

$\ln(Q_i) = \beta_0 + \beta_1 \ln(P_i) + u_i$

Let Z = rainfall in dairy-producing regions.

Is Z a valid instrument?
(1) Relevant? $\mathrm{corr}(rain_i, \ln(P_i)) \neq 0$?
Plausibly: insufficient rainfall means less grazing, which means less butter, which means higher prices
(2) Exogenous? $\mathrm{corr}(rain_i, u_i) = 0$?
Plausibly: whether it rains in dairy-producing regions should not affect demand for butter

TSLS in the supply-demand example, ctd.

$\ln(Q_i) = \beta_0 + \beta_1 \ln(P_i) + u_i$

$Z_i = rain_i$ = rainfall in dairy-producing regions.

Stage 1: regress $\ln(P_i)$ on $rain_i$, get $\widehat{\ln(P_i)}$
$\widehat{\ln(P_i)}$ isolates changes in log price that arise from supply (part of supply, at least)

Stage 2: regress $\ln(Q_i)$ on $\widehat{\ln(P_i)}$
The regression counterpart of using shifts in the supply curve to trace out the demand curve.
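
The two stages can be tried out on simulated data. The sketch below is only an illustration, not the butter data: the function names (`simulate_market`, `ols`, `tsls_slope`), the demand elasticity of -1, the supply elasticity of 1, the rainfall effect of 1, and the standard normal shocks are all assumptions made up for this example. Rainfall enters the supply equation only, so it is relevant and exogenous by construction.

```python
import numpy as np


def simulate_market(n, rng, demand_elasticity=-1.0, supply_elasticity=1.0,
                    rain_effect=1.0):
    """Simulate equilibrium log price and log quantity (hypothetical parameters).

    Demand: ln_q = demand_elasticity * ln_p + u
    Supply: ln_q = supply_elasticity * ln_p + rain_effect * rain + e
    """
    rain = rng.normal(size=n)  # instrument Z: shifts supply only
    u = rng.normal(size=n)     # demand shock (the error term in the demand equation)
    e = rng.normal(size=n)     # supply shock
    # Equilibrium price: set demand equal to supply and solve for ln_p.
    ln_p = (u - e - rain_effect * rain) / (supply_elasticity - demand_elasticity)
    ln_q = demand_elasticity * ln_p + u
    return ln_q, ln_p, rain


def ols(y, x):
    """Return (intercept, slope) from an OLS regression of y on x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def tsls_slope(y, x, z):
    """Two stage least squares slope with one regressor and one instrument."""
    pi0, pi1 = ols(x, z)       # Stage 1: regress x on z
    x_hat = pi0 + pi1 * z      # predicted values of x
    return ols(y, x_hat)[1]    # Stage 2: regress y on the predicted values


rng = np.random.default_rng(0)
ln_q, ln_p, rain = simulate_market(100_000, rng)

first_stage_slope = ols(ln_p, rain)[1]
ols_slope = ols(ln_q, ln_p)[1]
tsls_slope_est = tsls_slope(ln_q, ln_p, rain)

print("first-stage slope of ln(P) on rain:", first_stage_slope)
print("OLS slope of ln(Q) on ln(P):      ", ols_slope)
print("TSLS slope:                        ", tsls_slope_est)
```

Under these assumptions the demand shock raises the equilibrium price, so ln(P) is correlated with u and the OLS slope is pulled away from the true elasticity of -1, while the TSLS slope should land close to it. The sketch reports point estimates only: running the second stage by hand with OLS does not give correct standard errors.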