Advanced Topics in Applied Regression
Day 2: Interactions & Fixed Effects

Constantin Manuel Bosancianu
Wissenschaftszentrum Berlin, Institutions and Political Inequality unit
manuel.bosancianu@wzb.eu
September 30, 2017

Why interactions?

They allow for a much richer set of hypotheses to be put forward and tested. In my own area of focus (political institutions, economic phenomena, and voter attitudes/behavior), hypotheses involving moderation are very common. One prominent example: income inequality's effect on voter turnout at different levels of a person's income (Solt, 2008).

Despite their importance, misunderstandings still persist about how to interpret coefficients/effects in such models.

Basic setup

Why specify interactions?

So far, we've worked with simple models. Think of the example from yesterday, with Boston neighborhood average house prices. Here, I complicate it a bit by also adding a dummy for whether the neighborhood is on the Charles river or not:

Prices = a + b1*Rooms + b2*River + e                                (1)

Here, the effect of River is assumed to be constant, b2, no matter the level of the other variable in the model. This is not always the case: think of the effect of SES and union membership on political participation, where b_union likely varies with SES.

What if the effect isn't constant?

The riverfront is a desirable real-estate location. Houses with more rooms are certainly more expensive everywhere in Boston, but it's likely that the price difference between n + 1 and n rooms is higher on the riverfront than elsewhere. In modelling terms, we might say that the effect of Rooms on Price differs based on the value of the River dummy.

From words to equation (I)

Prices = a1 + b1*Rooms + b2*River + e
b1 = a2 + b3*River
a1 = a3 + b4*River

The second equation expresses how the effect of Rooms (b1) varies depending on River. The third equation allows the intercept to vary as well (which usually happens if the slope varies).

From words to equation (II)

Substituting the expressions for b1 and a1 into the first equation:

Prices = a3 + b4*River + (a2 + b3*River)*Rooms + b2*River + e
       = a3 + (b4 + b2)*River + a2*Rooms + b3*River*Rooms + e
       = a3 + (b4 + b2)*River + (a2 + b3*River)*Rooms + e          (2)

The third row shows most clearly how the effect of Rooms, a2 + b3*River, now varies depending on the precise value of the River indicator. This depends, of course, on b3 being statistically significant. If it is not, then the effect of Rooms is simply a2.

Basic interaction model

Prices = a3 + (b4 + b2)*River + (a2 + b3*River)*Rooms + e          (3)

If we designate a3 as γ1, b4 + b2 as γ2, a2 as γ3, and b3 as γ4, we get the general form of the interaction:

Prices = γ1 + γ2*River + γ3*Rooms + γ4*River*Rooms + e             (4)
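As a minimal sketch of how equation (4) is estimated in practice, assuming the Boston data shipped with R's MASS package, where medv (median house value), rm (average rooms), and chas (Charles river dummy) play the roles of Prices, Rooms, and River:

library(MASS)  # ships the Boston housing data

# rm * chas expands to rm + chas + rm:chas, i.e. the gamma3,
# gamma2, and gamma4 terms of equation (4). (The tables later
# in the deck additionally demean rm before interacting.)
m_inter <- lm(medv ~ rm * chas, data = Boston)
summary(m_inter)

The coefficient printed for rm:chas is γ4; whether the effect of rm differs by riverfront status rests on its significance.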
Interaction model (cont.)

When River = 0:

Prices = γ1 + γ2*0 + γ3*Rooms + γ4*Rooms*0 + e
       = γ1 + γ3*Rooms + e                                         (5)

When River = 1:

Prices = γ1 + γ2*1 + γ3*Rooms + γ4*Rooms*1 + e
       = γ1 + γ2 + (γ3 + γ4)*Rooms + e                             (6)

The effect of Rooms varies depending on the value of River.

Symmetry in interpretation

When Rooms = 0:

Prices = γ1 + γ2*River + γ3*0 + γ4*River*0 + e
       = γ1 + γ2*River + e

When Rooms = 1:

Prices = γ1 + γ2*River + γ3*1 + γ4*River*1 + e
       = γ1 + γ3 + (γ2 + γ4)*River + e

The effect of River varies depending on the level of Rooms.

Interpretation

Wages in 1976

We have information on 526 US workers:

wage: wage in USD per hour;
educ: years of education;
gender: male or female (1 = female);
exper: labor-force experience (years in the labor market);
tenure: years with current employer.

The goal is to predict wages. (In fact, we'll be predicting log(wage), as wages tend to be right-skewed, which causes problems with the normality of errors.)

Interpreting coefficients

DV: Log hourly wage (USD)

(Intercept)         1.762***  (0.025)
Female             −0.311***  (0.037)
Yrs. education      0.089***  (0.007)
Yrs. experience     0.005**   (0.002)
Yrs. tenure         0.021***  (0.003)
Female * Tenure    −0.013*    (0.006)
R2                  0.399
Adj. R2             0.393
Num. obs.           526
RMSE                0.414
***p < 0.001, **p < 0.01, *p < 0.05. Continuous variables were demeaned.

Specification with interaction: Female * Tenure.

Interpreting coefficients (cont.)

How do you interpret β_female = −0.311? Important: after demeaning, the "0" for a variable X refers to the mean of X, X̄.
How do you interpret β_tenure = 0.021?
How do you interpret β_female*tenure = −0.013?
How is the effect of tenure different for men, compared to women?

Graphical depiction

[Figure: log(wages) plotted against tenure (rescaled), with separate lines for women and men. The vertical gap at the intercept is β_female; the two slopes differ by β*, where β* means β_female*tenure. Adapted from Brambor et al., 2005.]

Difference between coefficients and effects

For linear models without interactions, coefficient = effect: β_X = 2 means the effect of a 1-unit increase in X on Y is 2.

For linear models with (significant) interactions, coefficient ≠ effect. Rather, the effect of an interacted variable is a function of 2 coefficients:

Wage = 1.762 − 0.311*Fem. + 0.021*Tnr. − 0.013*Fem.*Tnr. + ...
     = 1.762 + 0.021*Tnr. + (−0.311 − 0.013*Tnr.)*Fem. + ...

The term in parentheses, −0.311 − 0.013*Tnr., is the effect of Fem., and it changes with tenure.
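A sketch of how such a specification can be reproduced in R. I am assuming the wage1 data from the wooldridge package, which matches the description above (526 workers, 1976 CPS); whether it is the exact dataset behind these slides is an assumption. The demeaning mirrors the table note:

library(wooldridge)  # assumption: the slides' data corresponds to wage1
data("wage1")

# Demean the continuous predictors, so that "0" means "at the mean"
wage1$educ_c   <- wage1$educ   - mean(wage1$educ)
wage1$exper_c  <- wage1$exper  - mean(wage1$exper)
wage1$tenure_c <- wage1$tenure - mean(wage1$tenure)

m_wage <- lm(log(wage) ~ female * tenure_c + educ_c + exper_c,
             data = wage1)
summary(m_wage)

# Effect of tenure for men (female = 0):   coef on tenure_c
# Effect of tenure for women (female = 1): coef on tenure_c
#                                          + coef on female:tenure_c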
2nd example: differences in salaries

DV: Salary in company

(Intercept)        14180.85***  (333.93)
Experience           452.66***   (60.18)
Management          7172.32***  (506.82)
Exper.*Managem.      222.74*    (104.09)
R2                     0.88
Adj. R2                0.87
Num. obs.             46
RMSE                1701.23
***p < 0.001, **p < 0.01, *p < 0.05. Experience has been centered by subtracting 7.5 from each value.

Experience is measured in years; management is a dichotomous indicator (1 = manager).

3rd example: Boston house prices

Predicting house price in the neighborhood. Number of rooms has been demeaned.

DV: House price             Model 1             Model 2
(Intercept)           22.251*** (0.302)   22.250*** (0.301)
Average num. rooms     9.024*** (0.440)    8.967*** (0.416)
Charles river          4.194*** (1.186)    4.083*** (1.151)
Charles*Rooms                             −0.536    (1.355)
R2                     0.496               0.496
Adj. R2                0.493               0.494
Num. obs.            506                 506
RMSE                   6.547               6.541
***p < 0.001, **p < 0.01, *p < 0.05.

Interactions – other measurement scales

The interpretations carry over perfectly to other measurement scales, e.g. when both variables are continuous (we will practice more during the lab):

Y = a + b1*X1 + b2*X2 + b3*(X1*X2) + e                             (7)

b2 is the effect of X2 on Y when X1 is 0. The converse interpretation, for b1, is identical: the effect of X1 on Y when X2 is 0.

Collinearity

High correlations in interactions

require(MASS)
out <- mvrnorm(300,                # number of observations
               mu = c(5, 5),       # means of the variables
               # correlation matrix
               Sigma = matrix(c(1, 0.35, 0.35, 1), ncol = 2),
               empirical = TRUE)
colnames(out) <- c("x1", "x2")
out <- as.data.frame(out)
cor(out$x1, out$x2)                # So, that's the correlation
[1] 0.35
out$inter <- out$x1 * out$x2       # Construct the interaction term
cor(out$x1, out$inter)             # Correlation with first term
[1] 0.8179821
cor(out$x2, out$inter)             # Correlation with second term
[1] 0.8137691

In these situations, the VIF becomes very large, making the sampling variance of the coefficients large as well.

High correlations – "solution"

Essentially, it's justified that we have large SEs: the software is telling us it doesn't have enough unique information to estimate the effect precisely. The "solution": center the variables, i.e. subtract the mean (or median) from all observations on the variable.

X*_i = X_i − X̄                                                    (8)

High correlations – "solution" (cont.)

out$x1mod <- out$x1 - mean(out$x1)
out$x2mod <- out$x2 - mean(out$x2)
cor(out$x1mod, out$x2mod)          # cor(x1, x2) is the same
[1] 0.35
out$intermod <- out$x1mod * out$x2mod
cor(out$x1mod, out$intermod)       # Correlation with first term
[1] 0.02988741
cor(out$x2mod, out$intermod)       # Correlation with second term
[1] -0.005910737

Not so much a solution; more of a re-specification of the original model (Kam & Franzese Jr., 2007, pp. 93–99). Centering will produce different bs, a, and SEs, simply because these now refer to different quantities.
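To attach a number to "the VIF becomes very large", here is a short continuation of the simulated example, treating the availability of the car package (for vif()) as an assumption. Since the VIF is a function of the predictors alone, any response will do for the illustration:

library(car)         # assumption: car is installed, for vif()

out$y <- rnorm(300)  # arbitrary response; the VIF ignores it

vif(lm(y ~ x1 + x2 + inter, data = out))           # raw terms: inflated VIFs
vif(lm(y ~ x1mod + x2mod + intermod, data = out))  # centered terms: near 1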
Presentation

Significance testing in interactions

With interactions, significance tests also take on a different interpretation (Braumoeller, 2004).

Y_i = a + b1*X1_i + b2*X2_i + b3*(X1_i*X2_i) + e_i                 (9)

The significance test on b1 is valid only for the case X2 = 0. At other levels of X2, the effect of X1 might no longer be statistically significant.

Sampling variance

Y_i = a + b1*X1_i + b2*X2_i + b3*(X1_i*X2_i) + e_i                 (10)

Since it's an interaction, b1 is merely the coefficient of X1, while eff_X1 is the effect of X1 on Y. If b3 is significant, b1 ≠ eff_X1, and

V(eff_X1) = V(b1) + X2² * V(b3) + 2*X2*Cov(b1, b3)                 (11)

This makes it clear that the sampling variance of the effect also depends on X2.

Presenting results

There is little need to use the formula in Equation 11 to compute things by hand (an example of how to do this can be found in today's script, and a sketch follows below). The best way to present results from a specification with interactions is by plotting both the effect and its associated uncertainty. An easy way to do this is with the effects package in R (but also check out Thomas Leeper's margins package).

Predicting salaries

The specification is the one from the 2nd example above: salary regressed on centered experience, the management dummy, and their interaction.

Predicting salaries – effect of experience

[Figure: effect of experience on salary, with confidence intervals, for Management = No vs. Yes. Per the table, the effect is 452.66 for non-managers and 452.66 + 222.74 = 675.40 for managers.]

Predicting salaries – effect of management

[Figure: effect of management on salary, with a confidence band, across experience (centered, roughly −5 to 10 years). Per the table, the effect is 7172.32 + 222.74 * Experience.]

Predicting hourly wage – 3-way interaction

[Figure: a Female * Married * Tenure specification; predicted log(wage) plotted against tenure in four panels, one for each combination of female (0/1) and married (0/1).]
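The by-hand computation referred to above, as a minimal sketch in base R. The names m, d, x1, and x2 are placeholders for a generic fit m <- lm(y ~ x1 * x2, data = d), not objects from the slides:

# Effect of x1 and its 95% CI across values of x2, per Equation 11
b <- coef(m)
V <- vcov(m)

x2_seq <- seq(min(d$x2), max(d$x2), length.out = 100)
eff    <- b["x1"] + b["x1:x2"] * x2_seq
se_eff <- sqrt(V["x1", "x1"] +
               x2_seq^2 * V["x1:x2", "x1:x2"] +
               2 * x2_seq * V["x1", "x1:x2"])

plot(x2_seq, eff, type = "l",
     xlab = "x2", ylab = "Effect of x1 on y")
lines(x2_seq, eff + 1.96 * se_eff, lty = 2)  # upper 95% bound
lines(x2_seq, eff - 1.96 * se_eff, lty = 2)  # lower 95% bound

The effects and margins packages automate exactly this calculation, which is why plotting is usually the better route.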
Fixed effects

Why fixed effects?

Predicting house price using number of rooms:

DV: House price (ave.)
(Intercept)          −42.757***  (9.620)
Average num. rooms    10.139***  (1.568)
R2                      0.471
Adj. R2                 0.460
Num. obs.              49
RMSE                    8.277
***p < 0.001, **p < 0.01, *p < 0.05.

[Figure: scatterplot of house price against average rooms for the towns of Cambridge and Roxbury, with each town following its own trend.]

Why fixed effects?

1. As a solution to the issue of heteroskedasticity, when the problem is caused by different trends in each of the groups.
2. As a solution to the issue of omitted variable bias, on the road to a better causal estimate of the effect of X on Y.

These two issues are related, inasmuch as the trends in the groups are caused by variables which our model specification does not include.

Classic example

We have 172 children assessed with a test at 3 points in time. The goal is to understand what predicts their test scores, and whether extra courses help.

Measurements at multiple points in time are great for boosting sample size and lowering SEs, but they add a complication to the analysis: clustering.

Classic example

Predicting test scores:

DV: Test score
(Intercept)        48.613***  (0.661)
Female              1.647*    (0.764)
SES index          −1.712**   (0.531)
AP courses          4.812***  (0.447)
R2                  0.196
Adj. R2             0.191
Num. obs.         516
RMSE                8.605
***p < 0.001, **p < 0.01, *p < 0.05.

What if other factors, e.g. genetic or psychological, are at play both for AP courses and test scores?

Standard model

Score_i = a + b1*X1_i + ... + bk*Xk_i + e_i                        (12)

In the standard model, one of the assumptions is that the es are distributed N(0, σ²_e). This is no longer the case if there are omitted predictors Z, which were not included in the model. (The bigger implication is that the effects of X1, ..., Xk are likely biased in this case.)

The error term

Score_it = b1*X1_it + ... + bk*Xk_it + α_i + e_it                  (13)

Now the error is decomposed into an individual-specific term, α_i, and an observation-specific one, e_it. (Each observation can be understood as an "individual i at time t" case.) If any time-invariant factors not in the model have an effect on test score, estimates for some Xs are biased.

Within- and between-variance

There are 2 sources of variance: between individuals and within individuals (over time). Suppose that over time we have a good model. The between-individual variance, however, is the source of problems, as it may include variables we cannot observe in the data: drive to succeed, or genetic factors.

The solution adopted by FE is to do away with the problematic variance, as our interest is in the time-varying factor either way: the number of AP courses.

FE strategy: demeaning

If we average the values over time for each student, Ȳ_i, X̄1_i, ..., X̄k_i, and then subtract these averages from the observations over time, we get

Score_it − Scorē_i = b1*(X1_it − X̄1_i) + ... + bk*(Xk_it − X̄k_i) + (e_it − ē_i)   (14)

This takes care of the problematic between-variance; all that remains is within-variance.

              Raw             Demeaned
              t1   t2   t3    t1   t2   t3
Individual 1  10   20   30   −10    0   10
Individual 2  60   70   80   −10    0   10

FE "cousins": LSDV

Least Squares Dummy Variable (LSDV) regression. Add a set of i − 1 dummy indicators for persons (one is dropped because we still want to estimate an intercept), which capture all the between-person variation, i.e. the problematic one:

Score_it = a + b1*X1_it + ... + bk*Xk_it + P_1 + ... + P_(i−1) + e_it   (15)

where P_1, ..., P_(i−1) are the i − 1 person dummies. These allow the causal effect to be estimated based only on within-variance. LSDV and FE estimates will be identical. A sketch of both routes follows below.
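A sketch of the demeaning and LSDV routes in R. I am assuming a long-format data frame named pupils with columns id, time, score, and ap (none of these names come from the slides), and that the plm package, which implements the within (demeaning) estimator, is installed:

library(plm)  # assumption: plm installed, for the within estimator

# Fixed effects via demeaning (the "within" transformation)
m_fe <- plm(score ~ ap, data = pupils,
            index = c("id", "time"), model = "within")

# The LSDV equivalent: one dummy per pupil, minus one for the intercept
m_lsdv <- lm(score ~ ap + factor(id), data = pupils)

summary(m_fe)  # the slope on ap is identical across the two models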
FE "cousins": first differences (FD)

Particularly valuable for cases where auto-correlation between measurements proximate in time might be an issue. Instead of trying to explain raw scores, this approach focuses on score differences between adjacent time points:

ΔY_it = b1*ΔX1_it + ... + bk*ΔXk_it + Δe_it                        (16)

where ΔY_it = Y_i,t+1 − Y_it. FE and FD will be identical only in instances with 2 time points.

Thank you for the kind attention!

References

Brambor, T., Clark, W. R., & Golder, M. (2005). Understanding Interaction Models: Improving Empirical Analyses. Political Analysis, 14(1), 63–82.
Braumoeller, B. F. (2004). Hypothesis Testing and Multiplicative Interaction Terms. International Organization, 58(4), 807–820.
Kam, C. D., & Franzese Jr., R. J. (2007). Modeling and Interpreting Interactive Hypotheses in Regression Analysis. Ann Arbor, MI: University of Michigan Press.
Solt, F. (2008). Economic Inequality and Democratic Political Engagement. American Journal of Political Science, 52(1), 48–60.