Analysing interactions of fitted models

Helios De Rosario Martínez

October 12, 2013

Abstract

This vignette presents a brief review of the existing approaches for the post-hoc analysis of interactions in factorial experiments, and describes how to perform some of the cited calculations and tests with the functions of the package phia in R. Those functions include the calculation and plotting of cell means, and testing simple effects, residual effects, and interaction contrasts, among other possibilities. They can be applied to linear and generalized linear models, with or without covariates, and to mixed or multivariate linear models for repeated measures experiments.

1 Introduction

The post-hoc analysis of interactions in factorial ANOVA is a controversial issue that has generated many discussions and a variety of methods. Traditionally, the most frequent practice has been the analysis of simple main effects, i.e. the main effect of one factor at fixed values of the other factors. This type of analysis, however, has been severely criticized for mixing main and interaction effects. Marascuilo and Levin stated in 1970 that analysing the simple effects of a significant interaction was a typical case of the so-called "Type IV error": a wrong interpretation of a correctly rejected null hypothesis, since that analysis does not investigate the hypothesis that is presumably being tested [1]. Instead, they proposed the analysis of interaction contrasts (crossed contrasts of different factors) or of interaction effects (the value of the interaction after removing lower-order effects). The latter option was also avidly supported by Rosnow and Rosenthal, who called that concept residual or leftover contrasts [2, 3].

However, such proposals have not produced an established "correct" practice at all. In fact, the analysis of interactions has been a heated field of debate for years; see Games' defence of simple effect tests [4] and Levin's and Marascuilo's response [5], or the criticisms of Meyer and of Petty et al. of Rosnow's and Rosenthal's proposals [6, 7], as well as the answer to the latter [8]. Although the theoretical issues of simple effects tests are generally acknowledged, an eventual consensus about the "best" alternative method is probably difficult to achieve. A generally valid procedure is not possible in the first place, since the correct test depends on the specific problem addressed by the experiment. Unfortunately, many researchers do not choose a method depending on the question they want to answer. In spite of the criticism received by the analysis of simple effects, various reviews of research published in the last three decades have shown that it is still by far the most frequent practice [2, 9, 10]. According to the interpretation of Pardo et al., this is partly due to the limitations of commonplace software packages, which do not provide direct facilities for analysing the contrasts that isolate interaction effects [10]. On the other hand, the flexibility of R does make it possible to analyse any kind of contrast across the factors of fitted models, even beyond the two- or three-factorial designs that are normally discussed in the literature. After all, any contrast can be described as a linear combination of the model coefficients.
Thus, since the mathematical details of fitted models (like their matrices of coefficients and covariances) are easily available in R, the values and errors of those contrasts can be calculated without difficulty, and moreover there are contributed functions that facilitate statistical tests based on such combinations of coefficients, like linearHypothesis in the package car [11], or glht in multcomp [12], which is especially suited for testing main effects.

The package phia (Post-Hoc Interaction Analysis) provides a usable interface for calculating the different types of contrasts that are mentioned in the literature on the analysis of interactions, as well as other combinations of factors that could be of interest for the researcher. The functionality of this package also extends to more complex models, like generalized linear models, mixed-effects models, multivariate linear models for repeated measures designs, and models with covariates. The testing procedures provided by this package are the ones covered by linearHypothesis, i.e. tests based on F and χ² statistics, with adjusted p-values if needed. The following sections of this paper give a more detailed description of the main types of contrasts that can be used for analysing interactions, and examples of using the functions of this package for calculating and testing them.

2 Mathematical formulation of interactions

Interactions are often described in terms of a linear, two-way factorial model, where the response variable Y is a function of the factors A and B. Interactions are said to exist when a change in the level of one factor has different effects on the response variable, depending on the value of the other factor [13]. This is often represented by the following formula:

    Y_{ijk} = µ + α_i + β_j + (αβ)_{ij} + ε_{ijk},   (1)

where α_i, β_j represent the "main effects" of the i-th and j-th levels of A and B, respectively, (αβ)_{ij} is the effect of the interaction in that combination of levels, and ε_{ijk} is the error term of the k-th observation in that combination.

R analyses such models in the more general framework of linear models, defined by the following matrix equation:

    Y_(n×m) = X_(n×(r+1)) B_((r+1)×m) + E_(n×m),   (2)

where Y contains the n observations of the m-dimensional response variable (with m often equal to 1), X is the model matrix that only depends on the observed values of the predictor variables and the structure of the model, B is the coefficient matrix for that model structure and data (with r degrees of freedom — d.o.f.), and E contains the error term.

The structure of X and B is very simple for linear regression models, where all predictors are numeric variables. Let us take a regression model with a univariate response and two regressors (X1 and X2), including their interaction. In this case (2) would just be the formulation in matrix form of the regression equation:

    [ Y_1 ]   [ 1  X_11  X_21  X_11·X_21 ]   [ β_0  ]   [ ε_1 ]
    [  ⋮  ] = [ ⋮    ⋮     ⋮        ⋮     ] · [ β_1  ] + [  ⋮  ]   (3)
    [ Y_n ]   [ 1  X_1n  X_2n  X_1n·X_2n ]   [ β_2  ]   [ ε_n ]
                                             [ β_12 ]

This model has 3 terms, each with one d.o.f., that are represented by different columns of X and coefficients of B (besides the intercept, represented by the column of ones and β_0). Two of these terms are the main effects of the regressors, represented by their values in X and the "slopes" β_1, β_2; the other one is the interaction term, represented by the product X_{1i}X_{2i} and the coefficient β_12.
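To make this correspondence concrete, the following minimal sketch (with hypothetical simulated data, not used elsewhere in this vignette) shows that the interaction term is just one more column of the model matrix, with its own coefficient:

> set.seed(1)
> X1 <- rnorm(20); X2 <- rnorm(20)
> Y <- 1 + 2*X1 - X2 + 0.5*X1*X2 + rnorm(20)   # arbitrary "true" coefficients
> mod.sim <- lm(Y ~ X1 * X2)
> colnames(model.matrix(mod.sim))   # "(Intercept)" "X1" "X2" "X1:X2"
> coef(mod.sim)                     # estimates of beta0, beta1, beta2 and beta12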
For more complex regression models, there may be as many terms as possible products of regressors, such that if there are k regressors, there may be up to 2^k − 1 terms. If some or all of the predictor variables are factors, the same equations hold, but the terms would not be represented in the model matrix by single numeric variables like X_{1i}, X_{2i}, or X_{1i}X_{2i}. The main term of each factor would be represented by a set of "dummy variables", whose number would be equal to the d.o.f. of the factor (the number of levels minus 1), and the interactions would be represented by all the possible products of the corresponding dummy variables. Thus, for instance, if there are two factors A and B with 3 and 4 levels, respectively, the term of A would be represented by 2 dummy variables, B by 3 of them, and their interaction by 2 × 3 = 6 dummy variables.

The problem is that the coefficients that define the interactions in this framework are not always useful for describing the model in practical terms. The products of regressors are usually meaningless variables, and this poses a difficulty in the interpretation of the associated coefficients (see section 7 below). This issue is further aggravated when there are factors involved in the interactions, since the meaning of the dummy variables may be even more opaque. That is one of the reasons that motivate the different ways of describing interactions, which are commented on next.

3 Analysis of simple effects in factorial models

3.1 Calculation and plots of "cell means"

Let us take, for this and later sections, an example data set based on R.J. Boik's hypothetical data [14], which he used for demonstrating how to analyse interaction contrasts in linear models, although it will be used here for a larger variety of interaction analyses. It represents a hypothetical experiment in which people affected by hemophobia were treated with different fear reduction therapies and different doses of antianxiety medication, in a balanced factorial design, and the effect of these combined treatments was measured by their electrodermal response in an experimental session.

First we need to create the linear model from the data. We will use the data frame Boik, also included in the package phia, which has the response edr and two factors (therapy, with levels control, T1, and T2; and medication, with levels placebo, D1, D2, and D3). We use the function some of the package car (imported by phia) to see some cases:

> library(phia)
> some(Boik)
   therapy medication      edr
1  control    placebo 49.44617
2  control    placebo 52.84019
9  control         D1 48.11244
13 control         D2 47.64107
16 control         D2 44.86313
34      T1         D1 30.55422
35      T1         D1 36.47223
38      T1         D2 29.99902
55      T2         D1 38.67048
61      T2         D2 36.03652

Before proceeding with detailed analyses of the interactions, we should first check whether the factorial model is coherent with the data, and whether the interaction between both factors is actually significant. We can do this by examining the residuals of the model (see figure 1) and the ANOVA table.

> mod.boik <- lm(edr ~ therapy*medication, data=Boik)
> par(mfcol=c(1,2))
> plot(mod.boik, 1:2)  # Plot diagnostics for the model
> Anova(mod.boik)
Anova Table (Type II tests)

Response: edr
                   Sum Sq Df F value    Pr(>F)    
therapy            2444.1  2  63.813 1.399e-15 ***
medication         2370.9  3  41.269 1.342e-14 ***
therapy:medication 1376.4  6  11.979 8.539e-09 ***
Residuals          1149.0 60                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[Figure 1: Residuals vs. fitted values and Q-Q plot of mod.boik]

Although the plots of figure 1 show a minor departure from normality of the residuals, especially due to a couple of extremely low observations, for the sake of keeping the design balanced we will retain all the data, and assume that the model assumptions hold. Then we see in the ANOVA table that the interaction between therapy and medication is significant, so it does make sense to investigate this effect (footnote 1).

Footnote 1: The data set is based on the results reported in Boik's paper for the different tests, but not directly copied from his original work (which actually gives no data set). Thus, the residual plots are irrespective of Boik's paper, and due to rounding inaccuracies, the figures presented in this vignette and the ones of Boik's tables differ in the last decimals. Regarding the ANOVA calculations, the Anova function from the package car has been used to be consistent with later sections, although for this set of balanced data the results would be the same if we had used anova from the base R package.

In factorial experiments like this one, the dependency between factor levels and the response variable is usually represented in a contingency table, where the rows and columns are related to the different levels of both treatments, and each cell contains the adjusted mean of the response for the corresponding interaction of factors. When there is an interaction effect, the cell means are taken as the most straightforward way of representing this effect. These values can be obtained from the model coefficients with the function interactionMeans in the package phia, using the fitted model as its first (and in this case only) argument:

> (boik.means <- interactionMeans(mod.boik))
   therapy medication adjusted mean
1  control    placebo      50.20043
2       T1    placebo      49.89963
3       T2    placebo      45.69925
4  control         D1      47.49899
5       T1         D1      38.20065
6       T2         D1      39.09930
7  control         D2      45.99989
8       T1         D2      28.50055
9       T2         D2      36.50036
10 control         D3      47.89981
11      T1         D3      18.99962
12      T2         D3      32.69961

This function calculates by default the cell means for the interactions of highest order between factors. To obtain means of lower-order interactions, the optional argument factors admits a character vector with the names of the factors that are included in the desired interaction (but consider the contradiction of this approach with the marginality principle discussed in the next section). If this argument gives only one factor, the result will be the means of its zeroth-order interaction (i.e. the marginal means for that factor). Thus, for instance:

> interactionMeans(mod.boik, factors="therapy")
  therapy adjusted mean
1 control      47.89978
2      T1      33.90011
3      T2      38.49963

The output of interactionMeans can be plotted via the generic plot method, which produces the set of plots shown in figure 2.

[Figure 2: Result of plot(boik.means)]
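Each off-diagonal panel of that figure is an ordinary interaction plot; as a minimal sketch (the appearance will differ from the figure), one of them can be reproduced from the columns of boik.means with base R's interaction.plot:

> with(boik.means, interaction.plot(medication, therapy, `adjusted mean`))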
The off-diagonal panels are the typical interaction plots, that can also be created by interaction.plot from the columns of boik.means, where the lack of parallelism between lines reveals how one factor changes the effect of the other one. In this case, we see that the control group hardly obtains any benefit from the medication, whereas with the other therapies (T1 and T2) the fear to blood is reduced proportionally to the medication dose, and more markedly for the former. On the other hand, the diagonal panels represent the marginal means of each factor. If the interaction involved more than two factors, the graphical device would have as many rows and columns as factors, and the off-diagonal panels would show the first-order interaction means for each pair of factors. For interactions with many factors, the matrix of panels may be cluttered, so it would be more convenient to show them in separate figures. The argument multiple (TRUE by default) can be modified for this purpose: 2But consider the contradiction of this approach with the marginality principle discussed in the next section. 5 > plot(boik.means, multiple=FALSE) # Not printed in this paper 3.2 Caveat: low-order interactions and the marginality principle The basic methods of statistical analysis in R favour the so-called “marginality principle”, whereby the main effects of factors with non-null interactions should not be interpreted or tested [15]. Now, the same warning applies to interactions that are themselves contained in interactions of higher order. Thus, although the plots described in the previous section are commonplace in the study of interactions, they are not necessarily meaningful in all circumstances. For instance, since the interaction between therapy and medication is significant, according to the marginality principle we should not be concerned with the main effects of those factors, so the diagonal panels of figure 2 would be irrelevant for this model. Likewise, if a model has more than two factors and an interaction of second or higher order is significant, no plot containing those factors would actually be of interest, since the method plot on the result of interactionMeans only represents main effects and first-order interactions. (This is a limitation of the graphic representation alone, since the data frame can represent higher-order interactions). A suitable alternative for representing higher-order interactions are the functions effect and allEffects from the package effects [16]. For factorial models, their outputs contain the same type of data as interactionMeans, although they handle the numeric predictors of the model in another way, and the types of models that can be analysed is different (see sections 6 and 7). Those functions are specially devised to analyse and plot the interactions of highest order in the model, through more sophisticated graphic tools, and they also calculate and plot confidence intervals of the means. 3.3 Testing simple effects The tabulation or graphical representation of cell means may give us a hint of the underlying structure of interactions, but they do not suffice to verify whether a specific change in the factors plays a significant role in an interaction. As commented on above, the most frequent approach for solving this issue consists in testing the simple effects, as an extension of the post-hoc methods that are widely applied to the study of main factor effects. The available methods for the post-hoc analysis of main effects are manifold. 
3.3 Testing simple effects

The tabulation or graphical representation of cell means may give us a hint of the underlying structure of interactions, but it does not suffice to verify whether a specific change in the factors plays a significant role in an interaction. As commented on above, the most frequent approach to this issue consists in testing the simple effects, as an extension of the post-hoc methods that are widely applied to the study of main factor effects.

The available methods for the post-hoc analysis of main effects are manifold. The most basic procedure consists in evaluating multiple contrasts between factor levels, possibly with corrections of the p-value in order to protect the family-wise error rate. Pairwise comparisons between levels are usually a default strategy when the researcher has no previous plan, although this is inefficient when the factor has many levels. Tukey's method for testing pairwise contrasts, and Scheffé's method for all possible contrasts within a factor, are probably the most popular ones. The package multcomp provides useful tools for this kind of main effects contrasts.

Testing simple main effects for interactions consists in evaluating contrasts across the levels of one factor, when the values of the other interacting factors are fixed at certain levels. This test is then repeated at other fixed levels, and the results are compared. For instance, we could test the effect of medication at the different levels of therapy. This can be done with the function testInteractions, using the arguments fixed and across to define the factors that are fixed and tested across their levels in each test:

> testInteractions(mod.boik, fixed="therapy", across="medication")
F Test: 
P-value adjustment method: holm
          medication1 medication2 medication3 Df Sum of Sq       F    Pr(>F)    
control        2.3006     -0.4008     -1.8999  3     54.38  0.9465    0.4239    
T1            30.9000     19.2010      9.5009  3   3153.95 54.8985 < 2.2e-16 ***
T2            12.9996      6.3997      3.8007  3    538.99  9.3818 7.117e-05 ***
Residuals                                      60   1149.01                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The columns medication1, ..., medication3 in the resulting table contain the values of the three orthogonal contrasts across the levels of medication, for each level of therapy (the only fixed factor in this example) (footnote 3). The rest of the columns show the information of the multivariate test applied to those contrasts. These tests just quantify the qualitative interpretation that was made from the plots: the medication does not have a significant effect for the control therapy group, but its effect is remarkable for the other groups.

Footnote 3: The specific contrasts that are calculated depend on various elements. In this case, since the original data frame defines medication as an ordered factor, polynomial contrasts are computed by default. For unordered factors they would have been "sum-to-zero contrasts". This default behaviour can be overridden by setting another contrast in the original data frame, in the fitted model, or with additional arguments in testInteractions.

The criticism often posed to this method is that interactions are mixed with main effects (or with lower-order interactions contained in them), so the tests are not really related to the term that is supposedly under investigation. In this example, the post-hoc analysis of the term therapy:medication is being performed because the ANOVA told us that it is significant, and this means that the coefficients of the matrix B related to this term are unlikely to be null. However, the tests of simple effects that have just been described do not only involve those coefficients, but also the coefficients related to the lower-order terms therapy and medication (footnote 4).

Footnote 4: testInteractions makes multiple calls to the function testFactors, which in its turn defines a linear combination of the model coefficients and passes it down to linearHypothesis from car. The hypothesis matrices used in these tests can be inspected to see what coefficients are actually involved.

On the other hand, many researchers like simple effects for their relatively straightforward interpretation. Moreover, the interference of lower-order coefficients may be regarded as a lesser issue when the marginality principle is considered. In this theoretical framework, the presence of a high-order interaction makes lower-order terms meaningless, so their effects are absorbed by the interaction. Therefore, the coefficients of lower-order terms would partially be related to the interaction effect as well.
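The same function can of course evaluate the simple effects in the other direction; for instance, a sketch of testing the effect of therapy at each fixed level of medication (output omitted):

> testInteractions(mod.boik, fixed="medication", across="therapy")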
4 Analysis of residual effects

To address the conceptual problems of simple effects, Rosnow and Rosenthal encouraged the analysis of residual effects, by "peeling away" the lower-order effects from the cell means [2]. For instance, let us see the cell means calculated in boik.means, in a table with the marginal means for both factors and the grand mean:

> boik.mtable <- xtabs(boik.means$"adjusted mean" ~ therapy+medication, boik.means)
> boik.mtable <- addmargins(boik.mtable, FUN=mean, quiet=TRUE)
> print(boik.mtable, digits=4)
         medication
therapy   placebo   D1   D2   D3 mean
  control    50.2 47.5 46.0 47.9 47.9
  T1         49.9 38.2 28.5 19.0 33.9
  T2         45.7 39.1 36.5 32.7 38.5
  mean       48.6 41.6 37.0 33.2 40.1

The "corrected means" would be obtained by subtracting the lowest-order effect (the grand mean) from the rest of the values of the table, and then sweeping out the corrected marginal means from the individual cells:

> boik.resid <- boik.mtable - boik.mtable[4,5]        # Subtract the grand mean
> boik.resid <- sweep(boik.resid, 1, boik.resid[,5])  # Subtract row means
> boik.resid <- sweep(boik.resid, 2, boik.resid[4,])  # Subtract column means
> print(boik.resid, digits=4)
         medication
therapy   placebo      D1      D2      D3   mean
  control -6.1993 -1.9006  1.1997  6.9002 0.0000
  T1       7.4996  2.8007 -2.3000 -8.0003 0.0000
  T2      -1.3003 -0.9001  1.1003  1.1001 0.0000
  mean     0.0000  0.0000  0.0000  0.0000 0.0000

These values can also be calculated (and moreover tested) by testInteractions, via the argument residual instead of fixed or across:

> testInteractions(mod.boik, residual=c("therapy","medication"))
F Test: 
P-value adjustment method: holm
                                      Value Df Sum of Sq       F    Pr(>F)    
control (resid.) : placebo (resid.) -6.1993  1    461.17 24.0819 6.672e-05 ***
T1 (resid.) : placebo (resid.)       7.4996  1    674.93 35.2438 1.724e-06 ***
T2 (resid.) : placebo (resid.)      -1.3003  1     20.29  1.0595    1.0000    
control (resid.) : D1 (resid.)      -1.9006  1     43.35  2.2635    0.8262    
T1 (resid.) : D1 (resid.)            2.8007  1     94.13  4.9153    0.2434    
T2 (resid.) : D1 (resid.)           -0.9001  1      9.72  0.5077    1.0000    
control (resid.) : D2 (resid.)       1.1997  1     17.27  0.9019    1.0000    
T1 (resid.) : D2 (resid.)           -2.3000  1     63.48  3.3148    0.5155    
T2 (resid.) : D2 (resid.)            1.1003  1     14.53  0.7586    1.0000    
control (resid.) : D3 (resid.)       6.9002  1    571.35 29.8353 9.515e-06 ***
T1 (resid.) : D3 (resid.)           -8.0003  1    768.06 40.1073 4.069e-07 ***
T2 (resid.) : D3 (resid.)            1.1001  1     14.52  0.7584    1.0000    
Residuals                                   60   1149.01                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

However, these results may cause confusion, and their actual interest may be dubious in many cases, including the analysis of this model. We can plot the corrected means (omitting the margins of the table) for a clearer inspection of what is happening; see figure 3:

> matplot(t(boik.resid[-4,-5]), type="b", xaxt="n", ylab="Interaction residuals")
> axis(1, at=1:4, labels=levels(Boik$medication))

[Figure 3: Residual effects of mod.boik]

The lines of this plot (representing the three therapies) show the typical symmetry of residual effects.
The line labelled 1 (the control group, without specific therapy) shows a negative residual effect of the placebo (a lower electrodermal response), which goes up to positive values as the medication dose increases. The residual effects of the group treated with therapy T1 are the opposite, whereas the T2 group has a trend similar to the controls, but much smaller.

The first problem is that a hasty interpretation of these values would lead to nonsense. Of course, they do not mean that the hemophobia of people who did not participate in any therapy worsened with the medication! The correct interpretation is that for these people, the effect of increasing the medication was lower than the value expected from the average of the marginal means, and that the opposite happened with the group treated with therapy T1. But this reasoned interpretation, albeit mathematically sound, only makes sense as long as the expectations based on the marginal averages mean anything, and this is contrary to the principle of marginality commented on above. Since the factors therapy and medication do interact, the marginal means depend on the distribution of the sample across the factors; they are not reliable information about the response that we can expect for the different factor levels, except for a population with the same distribution as the sample (in this case a balanced distribution). See, for instance, Games' and Meyer's discussions of this issue (although they do not explicitly refer to the "principle of marginality") [4, 6]. That principle can be neglected if there is a good reason to study the marginal means that are obtained with a specific experimental design, but those circumstances are rare and special [17]. In the current example there is no compelling reason to do so, therefore the significant differences found for the placebo and the highest dose in the different therapies are rather uninformative.

5 Interaction contrasts

Another alternative to simple effects is the study of interaction contrasts, which were in fact the subject of the paper our working data is derived from, although Boik used a slightly different procedure for their analysis. As in the analysis of interaction residuals, the hypothesis tested by interaction contrasts is not affected by the coefficients of main effects, but this approach overcomes the issue of interaction residuals commented on above, because it does not make use of the marginal means; it only uses the data of the cells [5, 6].

Interaction contrasts are defined as "differential effects", or more descriptively as "differences of differences", or "contrasts between contrasts". They basically consist in calculating one or more contrasts across one factor, and then iterating on the results of that operation across the remaining factors. For instance, the test of simple effects previously calculated for mod.boik could be transformed into a test of interaction contrasts if, instead of fixing the levels of therapy for evaluating the contrasts across medication, we do pairwise contrasts between therapy levels.
For this we must use the argument pairwise instead of fixed:

> testInteractions(mod.boik, pairwise="therapy", across="medication")
F Test: 
P-value adjustment method: holm
           medication1 medication2 medication3 Df Sum of Sq       F    Pr(>F)    
control-T1     -28.599    -19.6019    -11.4008  3   1332.10 23.1869 1.302e-09 ***
control-T2     -10.699     -6.8005     -5.7007  3    175.95  3.0627   0.03481 *  
T1-T2           17.900     12.8013      5.7002  3    556.55  9.6874 5.270e-05 ***
Residuals                                       60   1149.01                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

These tests show how the contrasts across medication differ between pairs of therapy groups. We can see that the medication effect with therapy T1 differs from the effect in the controls and from the effect with the other therapy; the medication effect with T2 also differs from the effect in the controls, but this difference is not so remarkable. This conclusion was not so clear from the simple effects tests. Moreover, these results are not disturbed by the main effects of the factors at all, because the calculation of contrasts has removed them for both factors, without having to define them explicitly.

The most basic interaction contrasts involve pairwise contrasts for all factors. That is what testInteractions does by default, when only the model is specified:

> testInteractions(mod.boik)
F Test: 
P-value adjustment method: holm
                           Value Df Sum of Sq       F    Pr(>F)    
control-T1 : placebo-D1  -8.9975  1    121.43  6.3411 0.1448291    
control-T2 : placebo-D1  -3.8985  1     22.80  1.1905 0.6951907    
T1-T2 : placebo-D1        5.0990  1     39.00  2.0365 0.6951907    
control-T1 : placebo-D2 -17.1985  1    443.68 23.1687 0.0001563 ***
control-T2 : placebo-D2  -4.9984  1     37.48  1.9569 0.6951907    
T1-T2 : placebo-D2       12.2002  1    223.27 11.6587 0.0149653 *  
control-T1 : placebo-D3 -28.5994  1   1226.89 64.0665 8.678e-10 ***
control-T2 : placebo-D3 -10.6990  1    171.70  8.9661 0.0439050 *  
T1-T2 : placebo-D3       17.9004  1    480.63 25.0981 8.163e-05 ***
control-T1 : D1-D2       -8.2010  1    100.88  5.2681 0.2270868    
control-T2 : D1-D2       -1.0999  1      1.81  0.0948 0.7592877    
T1-T2 : D1-D2             7.1012  1     75.64  3.9498 0.4115755    
control-T1 : D1-D3      -19.6019  1    576.35 30.0962 1.479e-05 ***
control-T2 : D1-D3       -6.8005  1     69.37  3.6224 0.4326396    
T1-T2 : D1-D3            12.8013  1    245.81 12.8360 0.0095516 ** 
control-T1 : D2-D3      -11.4008  1    194.97 10.1810 0.0270962 *  
control-T2 : D2-D3       -5.7007  1     48.75  2.5455 0.6951907    
T1-T2 : D2-D3             5.7002  1     48.74  2.5450 0.6951907    
Residuals                         60   1149.01                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

If all the factors of the model had 2 levels, this would have been an optimal strategy for analysing the interaction, since the result would have been reduced to one test, corresponding to the single d.o.f. of such an interaction. But factors with more levels heavily increase the number of tests, so that for our 3 × 4 factorial design, with 2 × 3 = 6 d.o.f., we obtain 18 partly redundant tests. Such a high number of tests is difficult to interpret, let alone the lack of reliability of the p-values (with or without the corrections that can be set by the argument adjustment of testInteractions).

A more sensible strategy consists in defining a small number of meaningful contrasts that can be of interest for the researcher. For instance, we might be interested in knowing the effect of crossing the following contrasts for each factor:

1. For therapy: controls vs. any therapy, and one therapy vs. the other.

2. For medication: placebo vs. any real dose, the minimum dose vs. the maximum, and the medium dose vs. the average of all doses.
The function testInteractions also allows such custom contrasts to be defined, via the argument custom. This argument must be a named list of matrices, one per factor, with the vectors of coefficients that define the contrasts arranged in columns. The auxiliary function contrastCoefficients provides a convenient interface to generate that list from symbolic formulas that represent the contrasts (footnote 5):

> (custom.contr <- contrastCoefficients(
+     therapy ~ control - (T1 + T2)/2,          # Control vs. both therapies
+     therapy ~ T1 - T2,                        # Therapy T1 vs. T2
+     medication ~ placebo - (D1 + D2 + D3)/3,  # Placebo vs. all doses
+     medication ~ D1 - D3,                     # Min. dose vs. max
+     medication ~ D2 - (D1 + D2 + D3)/3,       # Med. dose vs. average
+     data=Boik, normalize=TRUE))               # Normalize to homogenize the scale
$therapy
           therapy  therapy.1
control  0.8164966  0.0000000
T1      -0.4082483  0.7071068
T2      -0.4082483 -0.7071068

$medication
        medication medication.1 medication.2
placebo  0.8660254    0.0000000    0.0000000
D1      -0.2886751    0.7071068   -0.4082483
D2      -0.2886751    0.0000000    0.8164966
D3      -0.2886751   -0.7071068   -0.4082483

Footnote 5: These matrices are transformed combinations of Helmert and polynomial contrasts, so they could have been defined with the functions contr.helmert and contr.poly as well.

Then we use this list to define the contrasts in testInteractions (after renaming the columns of the matrices for a clearer interpretation of the output):

> names(custom.contr$therapy) <- c("cntrl.vs.all", "T1.vs.T2")
> names(custom.contr$medication) <- c("plcb.vs.all", "D1.vs.D3", "D2.vs.avg")
> testInteractions(mod.boik, custom=custom.contr)
F Test: 
P-value adjustment method: holm
                             Value Df Sum of Sq       F    Pr(>F)    
cntrl.vs.all : plcb.vs.all -8.7671  1    461.17 24.0819 4.448e-05 ***
T1.vs.T2 : plcb.vs.all      7.1851  1    309.75 16.1749 0.0006558 ***
cntrl.vs.all : D1.vs.D3    -7.6217  1    348.54 18.2005 0.0003582 ***
T1.vs.T2 : D1.vs.D3         6.4007  1    245.81 12.8360 0.0020468 ** 
cntrl.vs.all : D2.vs.avg   -1.3001  1     10.14  0.5296 0.9392219    
T1.vs.T2 : D2.vs.avg       -0.4044  1      0.98  0.0512 0.9392219    
Residuals                          60   1149.01                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This table of results is much clearer than the former. Moreover, all these contrasts are orthogonal to each other (none of them can be obtained as a combination of the others), so the tests are independent and the adjustment of the p-values is reliable (a quick numerical check of this orthogonality is sketched after the conclusions below). Taking some care about the meaning of the positive and negative figures in the Value column, we can draw the following conclusions:

1. According to the first two tests, the benefit of taking medication (pooling over the three doses) is greater if the subject also receives some therapy, and this effect is especially marked for therapy T1.

2. According to the second two tests, the therapies interact in the same manner with the benefit of increasing the medication from the minimum to the maximum dose. On the other hand, we cannot say that the therapy influences the difference between the medium dose and the average of all doses.
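As a quick numerical check of that orthogonality (a sketch: if the contrast columns defined within each factor are mutually orthogonal, so are the interaction contrasts crossed from them), the off-diagonal elements of these cross products should be virtually zero:

> crossprod(as.matrix(custom.contr$therapy))
> crossprod(as.matrix(custom.contr$medication))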
6 Multivariate models for repeated-measures

Repeated-measures experiments are common in many disciplines, including psychology and agriculture, although in the latter they are usually found with the specific structure and name of "split-plot" designs. The classical approach for analysing this kind of experiment is via multi-strata ANOVA or univariate mixed-effects models, where the subjects or plots are introduced as factors with random effects, added to the error term (see section 9). However, when the design is balanced and adequately sized, the multivariate approach is recommended if possible, since it does not depend on the sphericity assumption and the results are more robust [18]. An example of such an analysis in R is published in the web appendices of Fox's and Weisberg's R Companion to Applied Regression [19]. We will use that example, based on the OBrienKaiser data set from car [20].

> some(OBrienKaiser, 6)
   treatment gender pre.1 pre.2 pre.3 pre.4 pre.5 post.1 post.2 post.3 post.4 post.5 fup.1 fup.2 fup.3 fup.4 fup.5
1    control      M     1     2     4     2     1      3      2      5      3      2     2     3     2     4     4
3    control      M     5     6     5     7     7      4      5      7      5      4     7     6     9     7     6
4    control      F     5     4     7     5     4      2      2      3      5      3     4     4     5     3     4
6          A      M     7     8     7     9     9      9      9     10      8      9     9    10    11     9     6
12         B      M     6     7     8     6     3      9     10     11      9      6     8     7    10     8     7
15         B      F     2     2     3     4     4      6      6      7      9      7     7     7     8     6     7

That data set has 16 rows with observations of people classified by two between-subjects factors (gender, with levels F and M; and treatment, with levels control, A, and B), so that each subject has 15 measures arranged in columns. These 15 columns correspond to 5 consecutive observations in 3 different phases (pre-test, post-test, and follow-up); this within-subjects model may be coded in a data frame with two crossed factors:

> (idata <- expand.grid(hour=ordered(1:5), phase=c("pre", "post", "fup")))
   hour phase
1     1   pre
2     2   pre
3     3   pre
4     4   pre
5     5   pre
6     1  post
7     2  post
8     3  post
9     4  post
10    5  post
11    1   fup
12    2   fup
13    3   fup
14    4   fup
15    5   fup

The between-subjects factor treatment, however, is not balanced in this case, as can be seen in the following frequency table:

> addmargins(table(OBrienKaiser[c("gender","treatment")]))
        treatment
gender   control A B Sum
  F            2 2 4   8
  M            3 2 3   8
  Sum          5 4 7  16

We skip the exploration of the data, which is already done in [19], and proceed to define the multivariate model:

> mod.ok <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
+                    post.1, post.2, post.3, post.4, post.5,
+                    fup.1, fup.2, fup.3, fup.4, fup.5) ~ treatment*gender,
+              data=OBrienKaiser)

The multivariate ANOVA with response transformation for repeated measures may be done with the function Anova in car, using the auxiliary data frame idata and the argument idesign, a formula with the within-subjects design. For the sake of coherence with the published example, we report a type-III test:

> Anova(mod.ok, idata=idata, idesign=~phase*hour, type=3)

Type III Repeated Measures MANOVA Tests: Pillai test statistic
                            Df test stat approx F num Df den Df    Pr(>F)    
(Intercept)                  1   0.96736  296.389      1     10 9.241e-09 ***
treatment                    2   0.44075    3.940      2     10 0.0547069 .  
gender                       1   0.26789    3.659      1     10 0.0848003 .  
treatment:gender             2   0.36350    2.855      2     10 0.1044692    
phase                        1   0.81363   19.645      2      9 0.0005208 ***
treatment:phase              2   0.69621    2.670      4     20 0.0621085 .  
gender:phase                 1   0.06614    0.319      2      9 0.7349696    
treatment:gender:phase       2   0.31060    0.919      4     20 0.4721498    
hour                         1   0.93286   24.315      4      7 0.0003345 ***
treatment:hour               2   0.31634    0.376      8     16 0.9183275    
gender:hour                  1   0.33922    0.898      4      7 0.5129764    
treatment:gender:hour        2   0.57022    0.798      8     16 0.6131884    
phase:hour                   1   0.56043    0.478      8      3 0.8202673    
treatment:phase:hour         2   0.66238    0.248     16      8 0.9915531    
gender:phase:hour            1   0.71151    0.925      8      3 0.5894907    
treatment:gender:phase:hour  2   0.79277    0.328     16      8 0.9723693    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Besides the intercept, the only significant effects at α = 0.05 are the main effects of phase and hour.
Nevertheless, let us suppose that we have reasons to be more liberal and want to investigate the interaction treatment:phase, which is near the α level of significance. (The main effect of treatment is also near that level, but we may ignore it, since we will focus on its interaction with another factor.)

First we may explore and plot the cell means of this interaction with interactionMeans, using the auxiliary data frame idata to specify the within-subjects model (idesign is not needed). We will only plot the panel with phase on the X-axis and different lines for each treatment, using the optional arguments atx and traces:

> ok.means <- interactionMeans(mod.ok, c("hour","treatment","phase"), idata=idata)
> plot(ok.means, atx="phase", traces="treatment")

[Figure 4: Means of the treatment:phase interaction for the O'Brien and Kaiser model]

The plot of figure 4 shows that in the post-test and follow-up phases, the response of the control group remains more or less at the same level as in the pre-test phase, whereas the response for the other treatments increases. However, this plot does not show the confidence intervals of those means, so we need to test the differences to know whether they are significant. An analysis of all the possible pairwise interaction contrasts between treatment and phase gives the following result:

> testInteractions(mod.ok, pairwise=c("treatment", "phase"), idata=idata)
Multivariate Test: Pillai test statistic
P-value adjustment method: holm
                        Value Df test stat approx F num Df den Df   Pr(>F)   
control-A : pre-post  1.66667  1   0.20106   2.5165      1     10 0.862456   
control-B : pre-post  2.62500  1   0.44686   8.0786      1     10 0.132872   
A-B : pre-post        0.95833  1   0.08787   0.9634      1     10 1.000000   
control-A : pre-fup   2.16667  1   0.45202   8.2489      1     10 0.132872   
control-B : pre-fup   3.04167  1   0.67782  21.0383      1     10 0.009002 **
A-B : pre-fup         0.87500  1   0.13478   1.5578      1     10 1.000000   
control-A : post-fup  0.50000  1   0.04467   0.4675      1     10 1.000000   
control-B : post-fup  0.41667  1   0.04032   0.4202      1     10 1.000000   
A-B : post-fup       -0.08333  1   0.00150   0.0150      1     10 1.000000   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Although the interaction contrasts result in one-dimensional values, we have transformed a multivariate response to obtain them, so the rows of this table report Pillai tests of MANOVA. These tests show that the only significant contrast is between the pre-test and follow-up phases, when compared between the control group and treatment B. Given the similarity between the means of treatments A and B, we could have expected a significant difference between controls and treatment A as well, but the test does not reject the null hypothesis in that case, because the number of observations for treatment A is lower, and therefore the size of the effect is relatively smaller.

The main effect of hour could be analysed similarly. As told above, there are other packages that handle main effects efficiently; however, the most typical procedures are designed for univariate analyses, and the recommended solution for post-hoc analysis of this kind of multivariate model is using linearHypothesis from car [21, 22]. The function testInteractions is actually a wrapper for linearHypothesis that does the required operations with a convenient user interface. There are other possible ways of analysing this interaction, using other options of testInteractions, or by other methods as proposed by Keselman [18]. The reader is encouraged to try these alternatives.
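One such alternative could be the pairwise comparisons between phases within each treatment group; a sketch, reusing the same arguments already shown above (output omitted):

> testInteractions(mod.ok, fixed="treatment", pairwise="phase", idata=idata)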
7 Linear models with covariates

Data sets may include numeric predictors, as mentioned in section 2. When they are combined with factors, they are usually called covariates, and the ANOVA is called ANCOVA (Analysis of Covariance). The analysis of interactions that involve these variables is also different. A factorial model has a finite number of factor combinations where the adjusted mean of the response can be evaluated, but the possible values of a covariate are infinite. Therefore, the effects of covariates are usually represented as continuous functions within the range of their observed values (the model may allow the calculation of effects beyond that range, but such predictions would normally have little reliability). For instance, the functions effect and allEffects from the package effects evaluate the adjusted mean of the response at a sample of values of the covariates, either automatically calculated by the function or specified by the user.

We have seen that in factorial models, the effects of factors may be analysed by the contrasts between their levels. Obviously, the number of "contrasts" between possible values of a covariate would be infinite, although they are constrained by the d.o.f. of the model. Let us take a pure linear regression model without factors, and two covariates that do not interact:

    Y_i = β_0 + β_1 X_{1i} + β_2 X_{2i} + ε_i   (4)

For the effect of X_1, there are infinite pairs of values X_{1a} ≠ X_{1b} at which we could estimate such "contrasts", but the expected value of the result would always be proportional to the difference between X_{1a} and X_{1b}. The ratio between both differences would be equal to the derivative of E(Y) with respect to X_1, which is equal to the model coefficient for X_1:

    ΔE(Y) / ΔX_1 = ∂E(Y) / ∂X_1 = β_1   (5)

Thus, when covariates do not interact, their effects can just be described by the values of their corresponding model coefficients. They are a measure of the "slope" along the covariate, or the increment in the expected value of the response when the covariate increases by one unit. Thus, if the researcher also wants the adjusted value of the response for different values of the covariate, as given by the functions of effects, the only additional information that he or she needs is the adjusted mean at an arbitrary point.

The functions of the package phia can report the values of those slopes. Let us take the model for the prestige of Canadian occupations defined in [11], p. 165. That model uses the data set Prestige from car, which contains several variables related to 102 different occupations:

> str(Prestige)
'data.frame':   102 obs. of  6 variables:
 $ education: num  13.1 12.3 12.8 11.4 14.6 ...
 $ income   : int  12351 25879 9271 8865 8403 11030 8258 14163 11377 11023 ...
 $ women    : num  11.16 4.02 15.7 9.11 11.68 ...
 $ prestige : num  68.8 69.1 63.4 56.8 73.5 77.6 72.6 78.1 73.1 68.8 ...
 $ census   : int  1113 1130 1171 1175 2111 2113 2133 2141 2143 2153 ...
 $ type     : Factor w/ 3 levels "bc","prof","wc": 2 2 2 2 2 2 2 2 2 2 ...

In this model, the prestige score of each profession (prestige) is fitted to a linear model that depends on education (average years of education), income (average income), and the factor type, which has three levels: bc (blue collar), prof (professional), and wc (white collar). The two former variables are continuous covariates (with income transformed to a logarithmic scale to improve the normality of the residuals), with different responses for the three types of occupation (an interaction with type).
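Before fitting the model, that interaction can be inspected informally. As a minimal exploratory sketch (not part of the original example in [11]), car's scatterplot can draw the income-prestige relation separately for each type of occupation:

> scatterplot(prestige ~ income | type, data=Prestige)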
We fit this model and do an ANOVA of it:

> mod.prestige <- lm(prestige ~ (log2(income)+education)*type, data=Prestige)
> Anova(mod.prestige)
Anova Table (Type II tests)

Response: prestige
                  Sum Sq Df F value    Pr(>F)    
log2(income)      1690.8  1 41.1670 6.589e-09 ***
education         1209.3  1 29.4446 4.912e-07 ***
type               469.1  2  5.7103  0.004642 ** 
log2(income):type  290.3  2  3.5344  0.033338 *  
education:type     178.8  2  2.1762  0.119474    
Residuals         3655.4 89                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This analysis detects significant main effects of all the predictors, plus a significant interaction between log2(income) and the factor type. This interaction can be explored with interactionMeans (see figure 5) and tested with testInteractions, by just giving the name of the relevant covariate(s) in the argument slope:

> plot(interactionMeans(mod.prestige, "type", slope="log2(income)"))
> testInteractions(mod.prestige, pairwise="type", slope="log2(income)")
Adjusted slope for log2(income) 
F Test: 
P-value adjustment method: holm
            Value Df Sum of Sq      F  Pr(>F)  
bc-prof    6.5356  1     256.2 6.2381 0.04302 *
bc-wc      5.6530  1     140.9 3.4311 0.13459  
prof-wc   -0.8825  1       3.3 0.0808 0.77684  
Residuals         89    3655.4                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

[Figure 5: Plot of the interaction log2(income):type]

The plot and the test table show the adjusted values of the slope with respect to log2(income) (instead of the adjusted mean of the response), for the levels and contrasts of the factor type. That slope, i.e. the proportional relation between the occupation's income and its prestige, is greater for blue collar occupations, whereas the difference between the other two types is smaller. However, the tests only reveal a significant difference between blue collar occupations and professionals, due to the unbalancedness of the data. We can see in the table that although the adjusted slope for bc-prof is relatively similar to the one for bc-wc, the sums of squares are far greater in the former case, since there are many more cases of prof:

> table(Prestige$type)  # Frequencies of occupation types

  bc prof   wc 
  44   31   23 

If there had been a significant interaction between the covariate education and type, we could have analysed it independently, since both covariates have an additive effect (they do not interact).

Things may become more complicated if there are interactions between covariates. In that case, the slopes are not constant for given combinations of factors, but are a function of the interacting covariates. For instance, in a regression with two interacting variables,

    Y_i = β_0 + β_1 X_{1i} + β_2 X_{2i} + β_12 X_{1i} X_{2i} + ε_i,   (6)

the slope with respect to X_1 is (β_1 + β_12 X_2), and the slope for X_2 is (β_2 + β_12 X_1). This means that the results of the previous calculations may depend on the values of the other covariates. For instance, let us take another model for the Prestige data, where we consider that log2(income) and education can interact (the influence of the occupation's income may vary with the average years of education), and we discard the data of white collar occupations to have a more balanced design and simplify things.
> mod.prestige2 <- update(mod.prestige, formula=.~.+(log2(income):education)*type,
+                         subset=(Prestige$type!="wc"))
> Anova(mod.prestige2)
Anova Table (Type II tests)

Response: prestige
                             Sum Sq Df F value    Pr(>F)    
log2(income)                1525.16  1 39.6299 2.702e-08 ***
education                    719.35  1 18.6918 5.221e-05 ***
type                         244.25  1  6.3466   0.01415 *  
log2(income):type            194.83  1  5.0625   0.02774 *  
education:type                 3.80  1  0.0988   0.75424    
log2(income):education        57.78  1  1.5013   0.22476    
log2(income):education:type    3.30  1  0.0858   0.77048    
Residuals                   2578.49 67                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The significant effects shown by the ANOVA table are the same as for the previous model, including the interaction log2(income):type. And since type now has only two levels, the post-hoc test of that interaction gives virtually the same result as that table:

> testInteractions(mod.prestige2, pairwise="type", slope="log2(income)")
Adjusted slope for log2(income) 
F Test: 
P-value adjustment method: holm
           Value Df Sum of Sq      F  Pr(>F)  
bc-prof   12.721  1    194.19 5.0458 0.02799 *
Residuals        67   2578.49                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

However, the algorithm used by this test (the function testFactors) sets the covariates at definite values, and the interaction between covariates considered by the model makes the test sensitive to those values. The default values of the covariates are the averages of the cases observed in the model fit. Nevertheless, the user may choose other arbitrary values, using the extra argument covariates, which is passed down to testFactors. This argument may contain a named numeric vector with the custom values of any covariate. For instance, we might want to set education at its 75th percentile, and see what happens with the interaction log2(income):type:

> # Look at the quantiles of the model frame (a subset of the original data)
> quantile(model.frame(mod.prestige2)$education)
    0%    25%    50%    75%   100% 
 6.380  8.015  9.930 13.990 15.970 
> testInteractions(mod.prestige2, pairwise="type", slope="log2(income)",
+                  covariates=c(education=14))
Adjusted slope for log2(income) 
F Test: 
P-value adjustment method: holm
           Value Df Sum of Sq      F Pr(>F)
bc-prof   10.862  1     62.02 1.6114 0.2087
Residuals        67   2578.49              

Altering the value of education has substantially changed the result of the test on a term that contains log2(income), because of the interaction between both covariates. Note, however, that this interaction, and the other terms that contain it, are not significant according to the ANOVA of the model. So it could be wise to simplify the model (turning back to the original one), and remove this needlessly problematic interaction.

Now, if an interaction between covariates were really significant, the interest of the researcher should focus on it. The high-order interactions of a linear model have a constant effect at any combination of the covariates, so the problem of the arbitrary values used by the tests disappears. If the argument slope of interactionMeans or testInteractions is given the names of two or more covariates of the model, the calculations will be done on the values of the coefficients related to those interactions (and to higher-order terms that might contain them). As commented on at the beginning of this document, these coefficients may be more difficult to understand, since the "slope" along a product of covariates may seem meaningless, but it can just be interpreted as an extension of the analogy previously used between factor contrasts and derivatives with respect to covariates.
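For instance, a sketch of such a request for mod.prestige2, whose covariates do interact in the model formula (output omitted, and only as informative as that interaction itself):

> testInteractions(mod.prestige2, pairwise="type", slope=c("log2(income)", "education"))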
We have already seen that when two factors interact, their effect can be evaluated by means of "contrasts between contrasts". Likewise, when two covariates interact, we can study a "derivative of the derivative", i.e. the second-order partial derivative of the response, which is represented by the interaction coefficient. For the regression model of equation (6):

    ∂²E(Y) / (∂X_1 ∂X_2) = β_12   (7)

This interpretation can be extended to third-order or higher interactions between covariates as well, although models with that complexity are rarer, and the very meaning of such interactions will probably be more difficult to interpret than the coefficients used for representing them.

8 Generalized linear models

Generalized linear models (GLM) are very much like classical (Gaussian) linear models in most respects, aside from the distribution of the error term and the fact that the expected value of the response, µ = E(Y), is related to the linear predictor by means of a link function η(µ). Accordingly, the interactions of such models can be analysed by the methods explained in the previous sections, although the interpretation of the adjusted values in cell means and contrasts may be a bit more involved.

Let us take the example of the AMS survey about PhDs in the mathematical sciences. This example uses the data set AMSsurvey of the package car, which contains a cross-classification of all the PhDs awarded in the mathematical sciences for different periods in the US, assigned to 24 different categories depending on various characteristics of the doctoral students [23].

> str(AMSsurvey)
'data.frame':   24 obs. of  5 variables:
 $ type   : Factor w/ 6 levels "I(Pr)","I(Pu)",..: 2 2 1 1 3 3 4 4 5 5 ...
 $ sex    : Factor w/ 2 levels "Female","Male": 2 1 2 1 2 1 2 1 2 1 ...
 $ citizen: Factor w/ 2 levels "Non-US","US": 2 2 2 2 2 2 2 2 2 2 ...
 $ count  : int  132 35 87 20 96 47 47 32 71 54 ...
 $ count11: int  148 40 63 22 161 53 71 28 89 55 ...

The data are structured in a data frame, with one row per category. The variable count gives the number of PhDs in each category in 2008-2009, and the preceding variables are factors that identify the category (footnote 6). One of these factors is type, the type of institution the doctorate was affiliated with, which has 6 levels: I(Pr), I(Pu), II, III, IV, and Va. Levels I to III are math departments in universities of progressively lower quality, with private and public institutions distinguished by the parenthetical abbreviations; IV are statistics and biostatistics departments; and Va are applied mathematics departments. The other factors are sex (the gender of the doctorate, a factor with levels Female and Male) and citizen (the citizenship status, a factor with levels Non-US and US). The contents of the data frame are clearer if shown as a contingency table:

> ftable(xtabs(count ~ sex + citizen + type, AMSsurvey))
               type I(Pr) I(Pu)  II III  IV  Va
sex    citizen                                 
Female Non-US          25    29  50  39 105  12
       US              20    35  47  32  54  14
Male   Non-US          79   130  89  53 122  28
       US              87   132  96  47  71  34

Footnote 6: AMSsurvey also contains data of other periods, but we will focus on the period 2008-2009. The analysis will be based on a parsimonious model for those data, which is already fitted in [11, pp. 253-255].

A saturated model of these data would represent count as a function of the three-way interaction of type, sex, and citizen. But the influences of sex and citizen are independent of each other, so we can use a simplified model [11]. Since we are dealing with count data, the appropriate family would be a Poisson distribution.
All in all, the model could be defined as follows:

> mod.ams <- glm(count ~ type*(sex+citizen), family=poisson, data=AMSsurvey)

The ANOVA of this model confirms that the high-order interaction terms that remain are significant:

> Anova(mod.ams)
Analysis of Deviance Table (Type II tests)

Response: count
             LR Chisq Df Pr(>Chisq)    
type          233.336  5  < 2.2e-16 ***
sex           182.983  1  < 2.2e-16 ***
citizen         5.923  1    0.01494 *  
type:sex       71.169  5  5.851e-14 ***
type:citizen   26.075  5  8.628e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The terms of interest are the interactions of type with sex and citizen. We may have a look at the adjusted means with interactionMeans, as in other models:

> ams.means <- interactionMeans(mod.ams)
> plot(ams.means, atx="type", traces=c("sex","citizen"))

[Figure 6: Adjusted (geometric) means of the interactions in mod.ams]

The resulting plots (see figure 6) are very similar to the ones shown in previous sections, with the exception of the Y-axis scale, which is nonlinear in this case. The reason is that for GLM, interactionMeans does not average over values of the response variable (the counts of awarded PhDs); the calculations are actually done in the domain of the link function, and the plots are drawn in its scale, although the resulting averages and the Y-axis labels are eventually transformed to show values of the response. In this case the link function is η = log(µ); therefore, the adjusted means are geometric (not arithmetic) means of the response variable (footnote 7), and the Y-axis is plotted on a logarithmic scale.

Footnote 7: The adjusted link function is an arithmetic mean, but then µ(Σ_i η_i / n) = exp(Σ_i log(µ_i) / n) = Π_i µ_i^(1/n).

Nevertheless, the interpretation of the plot is not much affected by this issue. The mean number of male doctorates is higher in all institutions, but the difference seems larger in "first-class" universities (both public and private), and smaller in "third-class" universities or statistics departments (group IV). On the other hand, the influence of US citizenship seems negligible except for statistics departments, where there are more foreign doctorates. The tables of testInteractions just confirm that these differences are significant:

> testInteractions(mod.ams, pairwise=c("type","sex"))  # test type:sex
Chisq Test: 
P-value adjustment method: holm
                            Value Df   Chisq Pr(>Chisq)    
I(Pr)-I(Pu) : Female-Male 1.10975  1  0.2274  1.0000000    
I(Pr)-II : Female-Male    0.51702  1  9.8992  0.0165352 *  
I(Pr)-III : Female-Male   0.38181  1 17.7152  0.0003079 ***
I(Pr)-IV : Female-Male    0.32905  1 31.1081  3.417e-07 ***
I(Pr)-Va : Female-Male    0.64643  1  2.2978  0.6430680    
I(Pu)-II : Female-Male    0.46588  1 16.5948  0.0005090 ***
I(Pu)-III : Female-Male   0.34405  1 26.1549  4.096e-06 ***
I(Pu)-IV : Female-Male    0.29651  1 47.8081  7.051e-11 ***
I(Pu)-Va : Female-Male    0.58250  1  3.9450  0.3290761    
II-III : Female-Male      0.73848  1  2.3092  0.6430680    
II-IV : Female-Male       0.63644  1  7.5105  0.0552052 .  
II-Va : Female-Male       1.25031  1  0.7098  1.0000000    
III-IV : Female-Male      0.86182  1  0.6219  1.0000000    
III-Va : Female-Male      1.69308  1  3.5240  0.3629141    
IV-Va : Female-Male       1.96453  1  6.9022  0.0688734 .  
Residuals                          6                       
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> testInteractions(mod.ams, pairwise=c("type","citizen")) # test type:citizen
Chisq Test: 
P-value adjustment method: holm
                           Value Df   Chisq Pr(>Chisq)    
I(Pr)-I(Pu) : Non-US-US 1.02087  1  0.0137  1.0000000    
I(Pr)-II    : Non-US-US 0.99993  1  0.0000  1.0000000    
I(Pr)-III   : Non-US-US 0.83462  1  0.7692  1.0000000    
I(Pr)-IV    : Non-US-US 0.53522  1 12.4565  0.0054149 ** 
I(Pr)-Va    : Non-US-US 1.16636  1  0.3655  1.0000000    
I(Pu)-II    : Non-US-US 0.97949  1  0.0162  1.0000000    
I(Pu)-III   : Non-US-US 0.81756  1  1.1332  1.0000000    
I(Pu)-IV    : Non-US-US 0.52428  1 16.8929  0.0005932 ***
I(Pu)-Va    : Non-US-US 1.14251  1  0.3055  1.0000000    
II-III      : Non-US-US 0.83468  1  0.8659  1.0000000    
II-IV       : Non-US-US 0.53526  1 14.6896  0.0017744 ** 
II-Va       : Non-US-US 1.16643  1  0.3949  1.0000000    
III-IV      : Non-US-US 0.64128  1  5.4935  0.2099597    
III-Va      : Non-US-US 1.39747  1  1.6147  1.0000000    
IV-Va       : Non-US-US 2.17920  1 10.4188  0.0149688 *  
Residuals                         6                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Notice, however, that the interpretation of the column Value in these tables requires considering the relation between the link function and the response variable. We have combined pairwise contrasts, which result in differences in the domain of the link function; for instance, if we focus on the interaction contrast I(Pu)-II : Female-Male, the adjusted value of the link is:

ηI(Pu),F − ηI(Pu),M − (ηII,F − ηII,M)    (8)

But ηi,j = log(µi,j), therefore the previous equation is equivalent to:

log [ (µI(Pu),F / µI(Pu),M) ÷ (µII,F / µII,M) ]    (9)

And if we go back to the response variable domain, the logarithm is cancelled, so we get that the interaction contrast is, in this case, a "ratio of ratios", rather than a "difference of differences". In other words, the value of about 0.47 in the column Value for the mentioned interaction contrast means that the proportion of females vs. males in "first-class" public universities is 0.47 times (less than half) the proportion in "second-class" universities (this reading is checked numerically in the sketch below). For other families of GLM, the interpretation of the contrasts in the domain of the response variable may be more obscure; so, if desired, the argument link=TRUE may be used in all the functions of phia to force the representation of the adjusted values in the domain of the link function.

GLM may include covariates, and the analysis of interactions may involve terms that contain them. In that case, the means and contrasts reported by interactionMeans or testInteractions are always in the link function domain, since the response variable no longer has a linear relation with the predictors, and therefore there is no such thing as a "slope" of that variable, except in a very local, differential sense. Anyway, if there is some reason to do it, it is possible to obtain the derivatives of the response variable with respect to the covariates, via the "chain rule".
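Returning to the type:sex table above, the "ratio of ratios" reading can be illustrated numerically (this sketch is an addition to the original text; gmean and cells are just illustrative names). Aggregating the model's fitted counts over citizenship with geometric means and forming the ratio of female/male ratios for the two institution types should give a value close to the 0.466 reported for I(Pu)-II : Female-Male:

> gmean <- function(x) exp(mean(log(x)))  # geometric mean on the response scale
> # fitted counts aggregated over citizenship, for each type:sex cell
> cells <- with(AMSsurvey, tapply(fitted(mod.ams), list(type, sex), gmean))
> (cells["I(Pu)","Female"] / cells["I(Pu)","Male"]) /
+   (cells["II","Female"]  / cells["II","Male"])    # ratio of female/male ratios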
The slopes reported by the functions of phia are derivatives of the link function, ∂η/∂Xi, and if model is the name of the object with the fitted GLM, we can obtain the derivative of the expected response with respect to the link, dµ/dη, as a function:

> dm.de <- family(model)$mu.eta

Now, to get the derivative of µ with respect to the covariate Xi at a specific point, we just have to evaluate dm.de at the corresponding value of the linear predictor (e.g. dm.de(eta), with the desired value bound to eta), and then multiply:

∂µ/∂Xi (X) = ∂η/∂Xi · (dµ/dη)(X)    (10)

More generally, all the calculations that may be done with classical linear models can also be applied to GLM, but it is necessary to consider that the model is defined in terms of the link function, whereas the outcomes are usually reported in units of the response variable, and the calculations must be defined according to the domain where they operate.

9 Mixed-effects models

9.1 Differences with fixed-effects models

Linear and generalized linear models assume error independence between observations, i.e. that there is no correlation between the values of the residual error. However, this assumption may be violated in many experiments, where the observations are clustered in groups associated with noise factors, e.g. sets of measures taken in different environments, with different instruments, or in different subjects. Such factors are normally uninteresting for the purposes of the study, but their influence may nonetheless be significant. This conflict is solved by including the influence of those factors in the model, so that (2) is expanded as follows (simplifying the model to a univariate response):

Y(n×1) = X(n×(r+1)) β((r+1)×1) + Z(n×s) u(s×1) + ε(n×1),    (11)

where the additional coefficients u and their corresponding model matrix Z are related to the s groups of observations associated with the noise factors (plus their possible interactions with other factors and covariates). Now, if it is possible to assume that the effects of the noise factors (i.e. the coefficients represented in u) are random values from a normal distribution, the model may be simplified, so that instead of fitting the individual values of u, it is only necessary to fit their covariance matrix Ψ, which is usually constrained as a function of one or a few parameters. This is what is known as a mixed-effects model (with a combination of both "fixed" and "random" effects).8

These models may be analysed in R with the functions included in the packages nlme [24] and lme4 [25]. In the following examples we will use the latter, which is more flexible and efficient, although the procedure for analysing interactions is the same in both cases. The functions of phia for the post-hoc analysis of mixed-effects models are used just as in the case of linear and generalized linear models. Multivariate responses in mixed-effects models are not supported, however, although in some cases this may be worked around by a modification of the data structure, as will be shown in the examples. From a theoretical point of view, another difference is that the tests are not exact, so there may be concerns about the reliability of the reported p-values, as will be discussed in the examples as well.

Note, moreover, that the analysis is limited to the fixed effects, for good reasons. Random effects are normally noise that must be controlled, but are not interesting for the conclusions of the study, and that is in fact why their effect is generally simplified as a set of random values.
Accordingly, the parameters that are fitted in the model are not the specific values of the random effects, but their covariances or other parameters that define their distribution. In other words: the model does not really tell anything about those specific values, so we should not interrogate it about them. Although the level of abstraction needed to understand this principle may be high in a complex model, it may be better grasped in a trivial model, defined just as the average of a set of random values, plus the variance of the residual error. Doing a post-hoc test on a random effect in a mixed-effects model is, in this case, like asking whether two arbitrary observations are too far apart. This may be a legitimate diagnostic question at best, but a positive answer would just mean that the chosen model is inadequate for those data.

8 The structure of the random part of the model is similar to the error term. In fact, in the limit case where all the groups of random factors only contain one observation, Z is equivalent to the identity matrix, and the coefficients u cannot be distinguished from the residual errors ε. Obviously, in that case no special technique would be necessary for fitting the model, and the linear regression method would just do the work (although with an increased residual error). It is also possible to work with generalized mixed-effects models, where both the coefficients of random effects and the residual error follow a non-normal distribution.

9.2 Fitting and analysing a mixed-effects model

As an example to illustrate the post-hoc analysis of mixed-effects models, we will use Snijders' and Bosker's data on the language achievements of 2287 high-school students [26], included in the package nlme. Following one of the examples worked by Snijders and Bosker, we will study the scores of a language test as a regression on the students' verbal IQ and their socio-economic status (SES), possibly influenced by their gender, and by the average SES and IQ of their schoolmates. In addition, we will explore a strategy for analysing repeated measures with a mixed-effects model, by comparing the achievements in two repetitions of the language test: at the start and at the end of the school year. All in all, the variables of interest in the original data frame bdf are:

langPRET: Score in the first repetition of the test.
langPOST: Score in the second repetition of the test.
pupilNR: Student code.
IQ.ver.cen: Student IQ (school-centered).
ses: Student socio-economic status (SES).
sex: Student gender.
schoolNR: School code.
schoolSES: School average SES.
avg.IQ.ver.cen: School average IQ.

We first create a subset of the data frame with those variables, taken from the namespace of nlme without attaching it (with the operator ::).9 We will also add meaningful labels to the sex factor, and simplify some variable names for the sake of clarity.

> Snijders <- nlme::bdf[c("langPRET","langPOST",            # Outcomes
+   "pupilNR", "IQ.ver.cen", "ses", "sex",                   # Student-related variables
+   "schoolNR", "schoolSES", "avg.IQ.ver.cen")]              # School-related variables
> Snijders$sex <- factor(Snijders$sex, labels=c("F","M"))
> names(Snijders) <-
+   c("score.1","score.2","student","IQ","SES","sex","school","avgSES","avgIQ")

Following Snijders and Bosker, for a better model fit we will also include quadratic components of IQ around 0 (remember that the IQ variable is centered) [26, p. 113].
> Snijders$IQ2.pos <- with(Snijders, (IQ > 0)*IQ^2)
> Snijders$IQ2.neg <- with(Snijders, (IQ < 0)*IQ^2)

The model proposed by Snijders and Bosker was focused on the result of one of the tests (say score.2), and contained the following fixed terms: the crossed linear effects of IQ and SES, both at student level (IQ*SES) and at school level (avgIQ*avgSES), the quadratic effects of IQ (IQ2.pos, IQ2.neg), and the effect of gender (sex). They also considered that the average scores and the linear effect of IQ could vary across schools. Such a model could be fitted as follows:

> library(lme4)
> form1 <- score.2 ~
+   IQ * SES + IQ2.pos + IQ2.neg + sex + avgIQ * avgSES +  # Fixed part
+   (IQ | school)                                          # Random part
> mod.snijders.1 <- lmer(form1, data=Snijders)

9 Since the namespaces of nlme and lme4 partially overlap, it is not advisable to load both packages in the same session.

Now, this is one of the cases where we cannot use the multivariate approach described in section 6 for analysing the effect of repeating the test. We could attempt a similar analysis step by step, fitting two models with different dependent variables, corresponding to the response transformations that are used in the analysis of within-subjects effects. Both models would have the same right-hand side of the formula as form1 above. The dependent variable in one of them would be the average of score.1 and score.2; in the other one it would be their difference:

> form2.1 <- update(form1, (score.1+score.2)/2 ~ .)
> form2.2 <- update(form1, (score.2-score.1) ~ .)

If we consider that score.1 and score.2 are two subsets of the same variable, distinguished by a two-level factor (say repetition), the model fitted with form2.1 would tell us the effects of the terms defined in that formula on the pooled variable, whereas the model fitted with form2.2 would define the effects of the interaction between repetition and those terms (i.e. repetition:IQ, repetition:SES, etc.).

On the other hand, we may fit a single model that accounts for all the terms that we want to analyse, with more flexibility. For that, we first have to transform the "wide" data frame, with score.1 and score.2 as different columns, into a "long" data frame with both of them in one column. We will also need the repetition factor explicitly defined in the data frame, and a variable that identifies each original row (the already existing variable student in the original frame can do that work). We can get all this with the reshape function:

> Snijders.long <- reshape(Snijders, direction="long", idvar="student",
+   varying=list(c("score.1","score.2")), v.names="score", timevar="repetition")
> # The within-subjects factor must be coded as a factor
> Snijders.long$repetition <- as.factor(Snijders.long$repetition)
> # See the variables of the long data frame
> str(Snijders.long)
'data.frame':   4574 obs. of  11 variables:
 $ student   : Factor w/ 2287 levels "17001","17002",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ IQ        : num  3.166 2.666 -2.334 -0.834 -3.834 ...
 $ SES       : num  23 10 15 23 10 10 23 10 13 15 ...
 $ sex       : Factor w/ 2 levels "F","M": 1 1 1 1 1 1 1 1 1 1 ...
 $ school    : Ord.factor w/ 131 levels "47"<"103"<"2"<..: 119 119 119 119 119 119 119 119 119 119 ...
 $ avgSES    : num  11 11 11 11 11 11 11 11 11 11 ...
 $ avgIQ     : Named num  -1.51 -1.51 -1.51 -1.51 -1.51 ...
  ..- attr(*, "names")= chr  "1" "1" "1" "1" ...
 $ IQ2.pos   : num  10.02 7.11 0 0 0 ...
 $ IQ2.neg   : num  0 0 5.448 0.696 14.7 ...
 $ repetition: Factor w/ 2 levels "1","2": 1 1 1 1 1 1 1 1 1 1 ...
 $ score     : num  36 36 33 29 19 22 20 44 34 31 ...
 - attr(*, "reshapeLong")=List of 4
  ..$ varying:List of 1
  .. ..$ : chr  "score.1" "score.2"
  ..$ v.names: chr "score"
  ..$ idvar  : chr "student"
  ..$ timevar: chr "repetition"

Now we can use this data frame to fit a model based on the formula form1, adding the interactions of repetition with the terms we want (in the multivariate approach, all the interactions of between-subjects and within-subjects terms are always included). The price that is paid for that is the addition of the random effects of student, since now we have more than one observation for each value of this factor. We will consider that repetition only interacts with the student-related terms. Regarding the random effects, the student factor can only influence the intercept term, since all the other terms of the model are constant for the two measures of each student. Thus, the new formula and the model fitted with it will be:

> form3 <- score ~
+   repetition * (IQ * SES + IQ2.pos + IQ2.neg + sex) +  # Student-related
+   avgIQ * avgSES +                                     # School-related
+   (IQ | school) + (1 | student)                        # Random part
> mod.snijders.3 <- lmer(form3, data=Snijders.long)
> # See the main parameters of the model (omit correlations table)
> print(mod.snijders.3, correlation=FALSE)
Linear mixed model fit by REML ['lmerMod']
Formula: score ~ repetition * (IQ * SES + IQ2.pos + IQ2.neg + sex) + avgIQ * avgSES + (IQ | school) + (1 | student)
   Data: Snijders.long
REML criterion at convergence: 28555.76
Random effects:
 Groups   Name        Std.Dev. Corr 
 student  (Intercept) 3.7060        
 school   (Intercept) 1.8850        
          IQ          0.2058   -1.00
 Residual             4.2474        
Number of obs: 4574, groups: student, 2287; school, 131
Fixed Effects:
        (Intercept)          repetition2                   IQ  
           31.85441              4.27322              2.65186  
                SES              IQ2.pos              IQ2.neg  
            0.09074             -0.13133              0.14604  
               sexM                avgIQ               avgSES  
            1.23928              1.64625             -0.04053  
             IQ:SES       repetition2:IQ      repetition2:SES  
           -0.01403              1.41339              0.06727  
repetition2:IQ2.pos  repetition2:IQ2.neg     repetition2:sexM  
           -0.17917              0.16430              1.39074  
       avgIQ:avgSES   repetition2:IQ:SES  
           -0.04550             -0.01065  

As said above, the methods for analysing a model like this are similar to the ones used in fixed-effects models. However, there are important caveats about the results that must be considered. The main objective of many researchers when using statistical techniques for data analysis is to obtain p-values for accepting or rejecting their hypotheses, but the calculation of those values in mixed-effects models is problematic. That is the reason why the table of coefficients of a model fitted by lmer does not show p-values, as may be seen above, nor are they presented in the ANOVA table given by the standard anova method [27].10 There are different techniques for working around this problem, like the Kenward-Roger approximation or the parametric bootstrap, which are implemented in the package pbkrtest [28]. An alternative approach is to assume that the covariances of the random part of the model are equal to the estimated values, and then use standard tests [29]. The function Anova from car produces tables with p-values based on Wald tests, following the latter assumption [30], and by extension the same happens with the test functions of the package phia, which also depend on the function linearHypothesis from car.

10 Mixed models fitted with lme from the package nlme do show p-values, but this does not mean that they should always be trusted.

This being said, we can first look at which terms are significant according to the Wald tests:

> Anova(mod.snijders.3)
Analysis of Deviance Table (Type II Wald chisquare tests)

Response: score
                        Chisq Df Pr(>Chisq)    
repetition          2882.6756  1  < 2.2e-16 ***
IQ                   479.2602  1  < 2.2e-16 ***
SES                  110.2882  1  < 2.2e-16 ***
IQ2.pos               30.8978  1  2.720e-08 ***
IQ2.neg               45.3552  1  1.643e-11 ***
sex                   89.5825  1  < 2.2e-16 ***
avgIQ                 12.0599  1  0.0005152 ***
avgSES                 0.9457  1  0.3308155    
IQ:SES                13.7841  1  0.0002051 ***
repetition:IQ         52.3653  1  4.608e-13 ***
repetition:SES        28.8549  1  7.801e-08 ***
repetition:IQ2.pos    13.4414  1  0.0002461 ***
repetition:IQ2.neg    16.3300  1  5.322e-05 ***
repetition:sex        30.4621  1  3.405e-08 ***
avgIQ:avgSES           1.1707  1  0.2792549    
repetition:IQ:SES      2.9505  1  0.0858503 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ANOVA table shows that most terms of our model are significant for the Wald test at α = 0.05. The exceptions are the terms related to avgSES, and the second-order interaction of repetition with IQ and SES. All the remaining main and interaction effects of factors and covariates could be investigated further, as explained in the previous sections. Just as an example, let us focus on the only interaction between factors: repetition:sex. We can see the table of adjusted means, and calculate the simple main effects and pairwise interactions:

> # Cell means
> interactionMeans(mod.snijders.3)
  repetition sex adjusted mean
1          1   F      33.67415
2          2   F      39.82938
3          1   M      34.91343
4          2   M      42.45940

> # Simple effect of sex at each repetition
> testInteractions(mod.snijders.3, fixed="repetition", across="sex")
Chisq Test: 
P-value adjustment method: holm
    Value Df   Chisq Pr(>Chisq)    
1 -1.2393  1  26.638  2.454e-07 ***
2 -2.6300  1 119.973  < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> # Pairwise interactions (default)
> testInteractions(mod.snijders.3)
Chisq Test: 
P-value adjustment method: holm
           Value Df  Chisq Pr(>Chisq)    
1-2 : F-M 1.3907  1 30.462  3.405e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

These results show that female students had significantly lower scores than male students, and that in the second repetition the difference was about twice the initial one. Note, however, that the size of the difference is small in spite of the statistical significance (gender differences were between 1 and 3 points, for scores that were normally over 30 points). That is something that often happens when models are fitted to very large data sets, and it reminds us that p-values must always be interpreted carefully and critically. In this case, where we used thousands of observations, we may suspect that many of the effects with low p-values in the ANOVA table are irrelevant. In fact, if we look at the coefficients of the model (in the table above) and the ranges of the associated covariates in the data frame, we can see that most effects are negligible in size when compared with the differences associated with repetition and the linear effect of IQ. This means that they are not relevant from a practical point of view, even though they may contribute to a significantly better fit of the model.

Acknowledgements

I thank Professors John Fox and Michael Friendly for their encouragement and advice in the creation of the functions of phia.

References

[1] L. A. Marascuilo and J. R. Levin, "Appropriate post hoc comparisons for interaction and nested hypotheses in analysis of variance designs: The elimination of type IV errors," American Educational Research Journal, vol. 7, no. 3, pp. 397-421, 1970.

[2] R. L. Rosnow and R. Rosenthal, "Definition and interpretation of interaction effects," Psychological Bulletin, vol. 105, no. 1, pp. 143-146, 1989.
[3] R. L. Rosnow and R. Rosenthal, ""Some things you learn aren't so": Cohen's paradox, Asch's paradigm, and the interpretation of interaction," Psychological Science, vol. 6, no. 1, pp. 3-9, 1995.

[4] P. A. Games, "Type IV errors revisited," Psychological Bulletin, vol. 80, no. 4, pp. 304-307, 1973.

[5] J. R. Levin and L. A. Marascuilo, "Type IV errors and Games," Psychological Bulletin, vol. 80, no. 4, pp. 308-309, 1973.

[6] D. L. Meyer, "Misinterpretation of interaction effects: a reply to Rosnow and Rosenthal," Psychological Bulletin, vol. 110, no. 3, pp. 571-573; discussion 574-576, 1991.

[7] R. E. Petty, L. R. Fabrigar, D. T. Wegener, and J. R. Priester, "Understanding data when interactions are present or hypothesized," Psychological Science, vol. 7, no. 4, pp. 247-252, 1996.

[8] R. L. Rosnow and R. Rosenthal, "Contrasts and interactions redux: Five easy pieces," Psychological Science, vol. 7, no. 4, pp. 253-257, 1996.

[9] K. J. Ottenbacher, "Interpretation of interaction in factorial analysis of variance design," Statistics in Medicine, vol. 10, no. 10, pp. 1565-1571, 1991.

[10] A. Pardo, J. Garrido García, M. A. Ruiz, and R. San Martín Castellanos, "La interacción entre factores en el análisis de varianza: errores de interpretación," Psicothema, vol. 19, no. 2, pp. 343-349, 2007.

[11] J. Fox and S. Weisberg, An R Companion to Applied Regression. Thousand Oaks, CA: Sage, second ed., 2011.

[12] T. Hothorn, F. Bretz, and P. Westfall, "Simultaneous inference in general parametric models," Biometrical Journal, vol. 50, no. 3, pp. 346-363, 2008.

[13] Y. Dodge, The Concise Encyclopedia of Statistics. Springer, 2008.

[14] R. J. Boik, "Interactions, partial interactions, and interaction contrasts in the analysis of variance," Psychological Bulletin, vol. 86, no. 5, pp. 1084-1089, 1979.

[15] J. A. Nelder, "A reformulation of linear models," Journal of the Royal Statistical Society, Series A (General), vol. 140, no. 1, pp. 48-77, 1977.

[16] J. Fox and J. Hong, "Effect displays in R for multinomial and proportional-odds logit models: Extensions to the effects package," Journal of Statistical Software, vol. 32, no. 1, pp. 1-24, 2009.

[17] W. Venables, "Exegeses on linear models." Paper presented to the S-PLUS User's Conference (Washington, DC, 8-9 October 1998), 2000.

[18] H. J. Keselman, "Testing treatment effects in repeated measures designs: an update for psychophysiological researchers," Psychophysiology, vol. 35, no. 4, pp. 470-478, 1998.

[19] J. Fox and S. Weisberg, "Multivariate Linear Models in R. An appendix to An R Companion to Applied Regression, second edition," July 2011.

[20] R. G. O'Brien and M. K. Kaiser, "MANOVA method for analyzing repeated measures designs: an extensive primer," Psychological Bulletin, vol. 97, no. 2, pp. 316-333, 1985.

[21] J. Fox, "Post-hoc after Anova() car package using linear.hypothesis() in a repeated measure analysis." R-help Archives, September 27, 2007.

[22] J. Fox, "Multiple comparisons on anova.mlm object." R-help Archives, April 17, 2010.

[23] P. Phipps, J. W. Maxwell, and C. A. Rose, "2009 Annual Survey of the Mathematical Sciences in the United States (first report)," Notices of the AMS, vol. 57, no. 2, pp. 250-259, 2010. Supplementary Table 4.

[24] J. Pinheiro, D. Bates, S. DebRoy, D. Sarkar, and R Core Team, nlme: Linear and Nonlinear Mixed Effects Models, 2012. R package version 3.1-105.

[25] D. Bates, M. Maechler, and B. Bolker, lme4: Linear mixed-effects models using S4 classes, 2012. R package version 0.999999-0.
[26] T. A. B. Snijders and R. J. Bosker, Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling. Sage, 1999.

[27] D. Bates, "lmer, p-values and all that." R-help Archives, May 19, 2006.

[28] U. Halekoh and S. Højsgaard, pbkrtest: Parametric bootstrap and Kenward-Roger based methods for mixed model comparison, 2012. R package version 0.3-4.

[29] J. J. Faraway, Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models. Boca Raton: Chapman & Hall/CRC, 2006.

[30] B. Bolker, "F and Wald chi-square tests in mixed-effects models." R-sig-ME Archives, October 4, 2011.