Advanced Topics in Applied Regression
Day 2: Interactions & Fixed Effects

Constantin Manuel Bosancianu
Wissenschaftszentrum Berlin, Institutions and Political Inequality unit
manuel.bosancianu@wzb.eu
September 30, 2017

Why interactions?

They allow for a much richer set of hypotheses to be put forward and tested. In my own area of focus (political institutions, economic phenomena, and voter attitudes/behavior), hypotheses involving moderation are very common. One prominent example: income inequality's effect on voter turnout at different levels of a person's income (Solt, 2008).

Despite their importance, misunderstandings still persist about how to interpret coefficients/effects in such models.

Basic setup

Why specify interactions?

So far, we've worked with simple models. Think of the example from yesterday, with Boston neighborhood average house prices. Here, I complicate it a bit by also adding a dummy for whether the neighborhood is on the Charles river or not:

Prices = a + b1*Rooms + b2*River + e                                (1)

Here, the effect of River is assumed to be constant, b2, no matter the level of the other variable in the model. This is not always the case: think of the effect of SES and union membership on political participation, where b_union likely varies with SES.

What if the effect isn't constant?

The riverfront is a desirable real-estate location. Houses with more rooms are certainly more expensive everywhere in Boston, but it's likely that the price difference between n + 1 and n rooms is higher on the riverfront than elsewhere. In modelling terms, we might say that the effect of Rooms on Price differs based on the value of the River dummy.

From words to equation (I)

Prices = a1 + b1*Rooms + b2*River + e
b1 = a2 + b3*River
a1 = a3 + b4*River

The second equation expresses how the effect of Rooms (b1) varies depending on River. The third equation allows the intercept to vary as well (which usually happens if the slope varies).

From words to equation (II)

Substituting the expressions for b1 and a1 into the first equation:

Prices = a3 + b4*River + (a2 + b3*River)*Rooms + b2*River + e
       = a3 + (b4 + b2)*River + a2*Rooms + b3*River*Rooms + e
       = a3 + (b4 + b2)*River + (a2 + b3*River)*Rooms + e          (2)

The third row shows most clearly how the effect of Rooms, a2 + b3*River, now varies depending on the precise value of the River indicator. This depends, of course, on b3 being statistically significant. If it is not, then the effect of Rooms is simply a2.

Basic interaction model

Prices = a3 + (b4 + b2)*River + (a2 + b3*River)*Rooms + e          (3)

If we designate a3 as γ1, b4 + b2 as γ2, a2 as γ3, and b3 as γ4, we get the general form of the interaction:

Prices = γ1 + γ2*River + γ3*Rooms + γ4*River*Rooms + e             (4)
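As a minimal sketch of how equation (4) is estimated in practice, assuming the Boston data shipped with R's MASS package, where medv (median house value), rm (average rooms), and chas (Charles river dummy) play the roles of Prices, Rooms, and River:

library(MASS)  # ships the Boston housing data

# rm * chas expands to rm + chas + rm:chas, i.e. the gamma3,
# gamma2, and gamma4 terms of equation (4). (The tables later
# in the deck additionally demean rm before interacting.)
m_inter <- lm(medv ~ rm * chas, data = Boston)
summary(m_inter)

The coefficient printed for rm:chas is γ4; whether the effect of rm differs by riverfront status rests on its significance.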
Interaction model (cont.)

When River = 0:

Prices = γ1 + γ2*0 + γ3*Rooms + γ4*Rooms*0 + e
       = γ1 + γ3*Rooms + e                                         (5)

When River = 1:

Prices = γ1 + γ2*1 + γ3*Rooms + γ4*Rooms*1 + e
       = γ1 + γ2 + (γ3 + γ4)*Rooms + e                             (6)

The effect of Rooms varies depending on the value of River.

Symmetry in interpretation

When Rooms = 0:

Prices = γ1 + γ2*River + γ3*0 + γ4*River*0 + e
       = γ1 + γ2*River + e

When Rooms = 1:

Prices = γ1 + γ2*River + γ3*1 + γ4*River*1 + e
       = γ1 + γ3 + (γ2 + γ4)*River + e

The effect of River varies depending on the level of Rooms.

Interpretation

Wages in 1976

We have information on 526 US workers:

wage: wage in USD per hour;
educ: years of education;
gender: male or female (1 = female);
exper: labor-force experience (years in the labor market);
tenure: years with current employer.

The goal is to predict wages. (In fact, we'll be predicting log(wage), as wages tend to be right-skewed, which causes problems with the normality of errors.)

Interpreting coefficients

DV: Log hourly wage (USD)

(Intercept)         1.762***  (0.025)
Female             −0.311***  (0.037)
Yrs. education      0.089***  (0.007)
Yrs. experience     0.005**   (0.002)
Yrs. tenure         0.021***  (0.003)
Female * Tenure    −0.013*    (0.006)
R2                  0.399
Adj. R2             0.393
Num. obs.           526
RMSE                0.414
***p < 0.001, **p < 0.01, *p < 0.05. Continuous variables were demeaned.

Specification with interaction: Female * Tenure.

Interpreting coefficients (cont.)

How do you interpret β_female = −0.311? Important: after demeaning, the "0" for a variable X refers to the mean of X, X̄.
How do you interpret β_tenure = 0.021?
How do you interpret β_female*tenure = −0.013?
How is the effect of tenure different for men, compared to women?

Graphical depiction

[Figure: log(wages) plotted against tenure (rescaled), with separate lines for women and men. The vertical gap at the intercept is β_female; the two slopes differ by β*, where β* means β_female*tenure. Adapted from Brambor et al., 2005.]

Difference between coefficients and effects

For linear models without interactions, coefficient = effect: β_X = 2 means the effect of a 1-unit increase in X on Y is 2.

For linear models with (significant) interactions, coefficient ≠ effect. Rather, the effect of an interacted variable is a function of 2 coefficients:

Wage = 1.762 − 0.311*Fem. + 0.021*Tnr. − 0.013*Fem.*Tnr. + ...
     = 1.762 + 0.021*Tnr. + (−0.311 − 0.013*Tnr.)*Fem. + ...

The term in parentheses, −0.311 − 0.013*Tnr., is the effect of Fem., and it changes with tenure.
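A sketch of how such a specification can be reproduced in R. I am assuming the wage1 data from the wooldridge package, which matches the description above (526 workers, 1976 CPS); whether it is the exact dataset behind these slides is an assumption. The demeaning mirrors the table note:

library(wooldridge)  # assumption: the slides' data corresponds to wage1
data("wage1")

# Demean the continuous predictors, so that "0" means "at the mean"
wage1$educ_c   <- wage1$educ   - mean(wage1$educ)
wage1$exper_c  <- wage1$exper  - mean(wage1$exper)
wage1$tenure_c <- wage1$tenure - mean(wage1$tenure)

m_wage <- lm(log(wage) ~ female * tenure_c + educ_c + exper_c,
             data = wage1)
summary(m_wage)

# Effect of tenure for men (female = 0):   coef on tenure_c
# Effect of tenure for women (female = 1): coef on tenure_c
#                                          + coef on female:tenure_c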
2nd example: differences in salaries

DV: Salary in company

(Intercept)        14180.85***  (333.93)
Experience           452.66***   (60.18)
Management          7172.32***  (506.82)
Exper.*Managem.      222.74*    (104.09)
R2                     0.88
Adj. R2                0.87
Num. obs.             46
RMSE                1701.23
***p < 0.001, **p < 0.01, *p < 0.05. Experience has been centered by subtracting 7.5 from each value.

Experience is measured in years; management is a dichotomous indicator (1 = manager).

3rd example: Boston house prices

Predicting house price in the neighborhood. Number of rooms has been demeaned.

DV: House price             Model 1             Model 2
(Intercept)           22.251*** (0.302)   22.250*** (0.301)
Average num. rooms     9.024*** (0.440)    8.967*** (0.416)
Charles river          4.194*** (1.186)    4.083*** (1.151)
Charles*Rooms                             −0.536    (1.355)
R2                     0.496               0.496
Adj. R2                0.493               0.494
Num. obs.            506                 506
RMSE                   6.547               6.541
***p < 0.001, **p < 0.01, *p < 0.05.

Interactions – other measurement scales

The interpretations carry over perfectly to other measurement scales, e.g. when both variables are continuous (we will practice more during the lab):

Y = a + b1*X1 + b2*X2 + b3*(X1*X2) + e                             (7)

b2 is the effect of X2 on Y when X1 is 0. The converse interpretation, for b1, is identical: the effect of X1 on Y when X2 is 0.

Collinearity

High correlations in interactions

require(MASS)
out <- mvrnorm(300,                # number of observations
               mu = c(5, 5),       # means of the variables
               # correlation matrix
               Sigma = matrix(c(1, 0.35, 0.35, 1), ncol = 2),
               empirical = TRUE)
colnames(out) <- c("x1", "x2")
out <- as.data.frame(out)
cor(out$x1, out$x2)                # So, that's the correlation
[1] 0.35
out$inter <- out$x1 * out$x2       # Construct the interaction term
cor(out$x1, out$inter)             # Correlation with first term
[1] 0.8179821
cor(out$x2, out$inter)             # Correlation with second term
[1] 0.8137691

In these situations, the VIF becomes very large, making the sampling variance of the coefficients large as well.

High correlations – "solution"

Essentially, it's justified that we have large SEs: the software is telling us it doesn't have enough unique information to estimate the effect precisely. The "solution": center the variables, i.e. subtract the mean (or median) from all observations on the variable.

X*_i = X_i − X̄                                                    (8)

High correlations – "solution" (cont.)

out$x1mod <- out$x1 - mean(out$x1)
out$x2mod <- out$x2 - mean(out$x2)
cor(out$x1mod, out$x2mod)          # cor(x1, x2) is the same
[1] 0.35
out$intermod <- out$x1mod * out$x2mod
cor(out$x1mod, out$intermod)       # Correlation with first term
[1] 0.02988741
cor(out$x2mod, out$intermod)       # Correlation with second term
[1] -0.005910737

Not so much a solution; more of a re-specification of the original model (Kam & Franzese Jr., 2007, pp. 93–99). Centering will produce different bs, a, and SEs, simply because these now refer to different quantities.
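To attach a number to "the VIF becomes very large", here is a short continuation of the simulated example, treating the availability of the car package (for vif()) as an assumption. Since the VIF is a function of the predictors alone, any response will do for the illustration:

library(car)         # assumption: car is installed, for vif()

out$y <- rnorm(300)  # arbitrary response; the VIF ignores it

vif(lm(y ~ x1 + x2 + inter, data = out))           # raw terms: inflated VIFs
vif(lm(y ~ x1mod + x2mod + intermod, data = out))  # centered terms: near 1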
Presentation

Significance testing in interactions

With interactions, significance tests also take on a different interpretation (Braumoeller, 2004).

Y_i = a + b1*X1_i + b2*X2_i + b3*(X1_i*X2_i) + e_i                 (9)

The significance test on b1 is valid only for the case X2 = 0. At other levels of X2, the effect of X1 might no longer be statistically significant.

Sampling variance

Y_i = a + b1*X1_i + b2*X2_i + b3*(X1_i*X2_i) + e_i                 (10)

Since it's an interaction, b1 is merely the coefficient of X1, while eff_X1 is the effect of X1 on Y. If b3 is significant, b1 ≠ eff_X1, and

V(eff_X1) = V(b1) + X2² * V(b3) + 2*X2*Cov(b1, b3)                 (11)

This makes it clear that the sampling variance of the effect also depends on X2.

Presenting results

There is little need to use the formula in Equation 11 to compute things by hand (an example of how to do this can be found in today's script, and a sketch follows below). The best way to present results from a specification with interactions is by plotting both the effect and its associated uncertainty. An easy way to do this is with the effects package in R (but also check out Thomas Leeper's margins package).

Predicting salaries

The specification is the one from the 2nd example above: salary regressed on centered experience, the management dummy, and their interaction.

Predicting salaries – effect of experience

[Figure: effect of experience on salary, with confidence intervals, for Management = No vs. Yes. Per the table, the effect is 452.66 for non-managers and 452.66 + 222.74 = 675.40 for managers.]

Predicting salaries – effect of management

[Figure: effect of management on salary, with a confidence band, across experience (centered, roughly −5 to 10 years). Per the table, the effect is 7172.32 + 222.74 * Experience.]

Predicting hourly wage – 3-way interaction

[Figure: a Female * Married * Tenure specification; predicted log(wage) plotted against tenure in four panels, one for each combination of female (0/1) and married (0/1).]
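The by-hand computation referred to above, as a minimal sketch in base R. The names m, d, x1, and x2 are placeholders for a generic fit m <- lm(y ~ x1 * x2, data = d), not objects from the slides:

# Effect of x1 and its 95% CI across values of x2, per Equation 11
b <- coef(m)
V <- vcov(m)

x2_seq <- seq(min(d$x2), max(d$x2), length.out = 100)
eff    <- b["x1"] + b["x1:x2"] * x2_seq
se_eff <- sqrt(V["x1", "x1"] +
               x2_seq^2 * V["x1:x2", "x1:x2"] +
               2 * x2_seq * V["x1", "x1:x2"])

plot(x2_seq, eff, type = "l",
     xlab = "x2", ylab = "Effect of x1 on y")
lines(x2_seq, eff + 1.96 * se_eff, lty = 2)  # upper 95% bound
lines(x2_seq, eff - 1.96 * se_eff, lty = 2)  # lower 95% bound

The effects and margins packages automate exactly this calculation, which is why plotting is usually the better route.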
Fixed effects

Why fixed effects?

Predicting house price using number of rooms:

DV: House price (ave.)
(Intercept)          −42.757***  (9.620)
Average num. rooms    10.139***  (1.568)
R2                      0.471
Adj. R2                 0.460
Num. obs.              49
RMSE                    8.277
***p < 0.001, **p < 0.01, *p < 0.05.

[Figure: scatterplot of house price against average rooms for the towns of Cambridge and Roxbury, with each town following its own trend.]

Why fixed effects?

1. As a solution to the issue of heteroskedasticity, when the problem is caused by different trends in each of the groups.
2. As a solution to the issue of omitted variable bias, on the road to a better causal estimate of the effect of X on Y.

These two issues are related, inasmuch as the trends in the groups are caused by variables which our model specification does not include.

Classic example

We have 172 children assessed with a test at 3 points in time. The goal is to understand what predicts their test scores, and whether extra courses help.

Measurements at multiple points in time are great for boosting sample size and lowering SEs, but they add a complication to the analysis: clustering.

Classic example

Predicting test scores:

DV: Test score
(Intercept)        48.613***  (0.661)
Female              1.647*    (0.764)
SES index          −1.712**   (0.531)
AP courses          4.812***  (0.447)
R2                  0.196
Adj. R2             0.191
Num. obs.         516
RMSE                8.605
***p < 0.001, **p < 0.01, *p < 0.05.

What if other factors, e.g. genetic or psychological, are at play both for AP courses and test scores?

Standard model

Score_i = a + b1*X1_i + ... + bk*Xk_i + e_i                        (12)

In the standard model, one of the assumptions is that the es are distributed N(0, σ²_e). This is no longer the case if there are omitted predictors Z, which were not included in the model. (The bigger implication is that the effects of X1, ..., Xk are likely biased in this case.)

The error term

Score_it = b1*X1_it + ... + bk*Xk_it + α_i + e_it                  (13)

Now the error is decomposed into an individual-specific term, α_i, and an observation-specific one, e_it. (Each observation can be understood as an "individual i at time t" case.) If any time-invariant factors not in the model have an effect on test score, estimates for some Xs are biased.

Within- and between-variance

There are 2 sources of variance: between individuals and within individuals (over time). Suppose that over time we have a good model. The between-individual variance, however, is the source of problems, as it may include variables we cannot observe in the data: drive to succeed, or genetic factors.

The solution adopted by FE is to do away with the problematic variance, as our interest is in the time-varying factor either way: the number of AP courses.

FE strategy: demeaning

If we average the values over time for each student, Ȳ_i, X̄1_i, ..., X̄k_i, and then subtract these averages from the observations over time, we get

Score_it − Scorē_i = b1*(X1_it − X̄1_i) + ... + bk*(Xk_it − X̄k_i) + (e_it − ē_i)   (14)

This takes care of the problematic between-variance; all that remains is within-variance.

              Raw             Demeaned
              t1   t2   t3    t1   t2   t3
Individual 1  10   20   30   −10    0   10
Individual 2  60   70   80   −10    0   10

FE "cousins": LSDV

Least Squares Dummy Variable (LSDV) regression. Add a set of i − 1 dummy indicators for persons (one is dropped because we still want to estimate an intercept), which capture all the between-person variation, i.e. the problematic one:

Score_it = a + b1*X1_it + ... + bk*Xk_it + P_1 + ... + P_(i−1) + e_it   (15)

where P_1, ..., P_(i−1) are the i − 1 person dummies. These allow the causal effect to be estimated based only on within-variance. LSDV and FE estimates will be identical. A sketch of both routes follows below.
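A sketch of the demeaning and LSDV routes in R. I am assuming a long-format data frame named pupils with columns id, time, score, and ap (none of these names come from the slides), and that the plm package, which implements the within (demeaning) estimator, is installed:

library(plm)  # assumption: plm installed, for the within estimator

# Fixed effects via demeaning (the "within" transformation)
m_fe <- plm(score ~ ap, data = pupils,
            index = c("id", "time"), model = "within")

# The LSDV equivalent: one dummy per pupil, minus one for the intercept
m_lsdv <- lm(score ~ ap + factor(id), data = pupils)

summary(m_fe)  # the slope on ap is identical across the two models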
FE "cousins": first differences (FD)

Particularly valuable for cases where auto-correlation between measurements proximate in time might be an issue. Instead of trying to explain raw scores, this approach focuses on score differences between adjacent time points:

ΔY_it = b1*ΔX1_it + ... + bk*ΔXk_it + Δe_it                        (16)

where ΔY_it = Y_i,t+1 − Y_it. FE and FD will be identical only in instances with 2 time points.

Thank you for the kind attention!

References

Brambor, T., Clark, W. R., & Golder, M. (2005). Understanding Interaction Models: Improving Empirical Analyses. Political Analysis, 14(1), 63–82.
Braumoeller, B. F. (2004). Hypothesis Testing and Multiplicative Interaction Terms. International Organization, 58(4), 807–820.
Kam, C. D., & Franzese Jr., R. J. (2007). Modeling and Interpreting Interactive Hypotheses in Regression Analysis. Ann Arbor, MI: University of Michigan Press.
Solt, F. (2008). Economic Inequality and Democratic Political Engagement. American Journal of Political Science, 52(1), 48–60.