This lab illustrates methods for analyzing data collected in
experiments using a between-subjects, factorial design. The first
example analyzed data from a 2(A) x 2(B) factorial design that is stored
in the data frame df0
Load the data, inspect the data frame
df0
, and examine the design using xtabs
. The
data in each condition are plotted in Figure 1. Some R
commands that are useful for summarizing data from factorial designs can
be found here.
First, initialize R
, load the data, and then use
summary
and xtabs
to inspect the data and
confirm that the design is balanced with 10 participants per
condition.
options(contrasts=c('contr.sum','contr.poly'))
load(url("http://pnb.mcmaster.ca/bennett/psy710/datasets/rlab5-df0-2x2.rda"))
summary(df0) # A&B are factors; Y is numeric
# A B Y
# a1:20 b1:20 Min. : 60.92
# a2:20 b2:20 1st Qu.: 78.65
# Median : 99.17
# Mean :101.07
# 3rd Qu.:123.15
# Max. :175.35
xtabs(~A+B,data=df0) # balanced; n=10 per condition
# B
# A b1 b2
# a1 10 10
# a2 10 10
Next, we are going to plot the data in each condition. Note how I use
the formula Y~A*B
in boxplot
.
op <- par(no.readonly = T)
par(mar=c(5,5,5,2)+0.1, cex.axis=1.25, cex.lab=1.5,cex.main=1.5)
par(mfrow=c(1,1))
boxplot(Y~A*B,df0,main="A x B Cell Means")
Now we perform a 2(A) x 2(B) ANOVA. The ANOVA table is shown below. The main effect of A is significant, but the main effect of B is not. The AxB interaction also is significant.
# 2 x 2 ANOVA:
options(contrasts=c("contr.sum","contr.poly")) # IMPORTANT!
aov.01 <- aov(Y~A*B,data=df0)
# aov.01 <- aov(Y ~ 1 + A + B + A:B,data=df0)
summary(aov.01)
# Df Sum Sq Mean Sq F value Pr(>F)
# A 1 5096 5096 10.361 0.00273 **
# B 1 284 284 0.578 0.45207
# A:B 1 2803 2803 5.699 0.02235 *
# Residuals 36 17705 492
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
A
.ANOVA assumes that the residuals are distributed normally. We can
check this assumption with shapiro.test
.
shapiro.test(residuals(aov.01)) # check normality
#
# Shapiro-Wilk normality test
#
# data: residuals(aov.01)
# W = 0.98818, p-value = 0.9454
We also can evaluate the normality assumption with a qq
plot.
ANOVA also assumes that the residuals have constant variance across
conditions. We can check this assumption with
bartlett.test
. However, that command does not
accept a factorial formula of the form
Y~A+B+A:B
.
# bartlett.test(Y~A*B,data=df0) # DOES NOT WORK!"
Because bartlett.test
does not work with factorial model
formulae, we have to use interaction
create a new one-way
grouping variable that has one level for each condition, and then
re-submit the data to bartlett.test
. The test is not
significant, so we do not have any reason to reject the null hypothesis
that the variance is constant across groups. (N.B. Inspection of Figure
1 leads to the same conclusion).
df0$group <- interaction(df0$A,df0$B) # create new factor
levels(df0$group)
bartlett.test(Y~group,data=df0)
Our ANOVA found a main effect of A
which leads us to
reject the null hypothesis of no differences among the marginal means of
A
. To learn more about the pattern of differences
among marginal means, we often perform linear contrasts or pairwise
tests on the marginal means. In this example, such tests are not
necessary because A
only has two levels, and therefore main
effect of A
is unambiguous (i.e., the two levels differ
significantly). Also, an analysis of the main effect might be
premature and/or misleading because we found a significant
AxB
interaction, which implies that the main
effect of A
varies across the levels of
B
. In situations where the interaction is significant but
we are interested in variation among levels of A
, we
normally would decompose the interaction into simple main effects and
then look for patterns in the simple main effects. Despite these
concerns, I perform the tests anyway for illustrative purposes.
Here I analyze main effect of A
with pairwise Tukey
tests. Remember: it is not necessary to obtain a significant omnibus
\(F\) test before performing the Tukey
pairwise test.
# perform Tukey HSD test on levels of factor A
TukeyHSD(aov.01,which="A")
# with emmeans
library(emmeans)
aov.emm.A <- emmeans(aov.01,specs="A")
contrast(aov.emm.A,method="pairwise",adjust="Tukey")
A significant main effect also can be analyzed with a linear contrast.
levels(df0$A)
# [1] "a1" "a2"
myC <- c(-1,1)
contrasts(df0$A) <- cbind(myC)
aov.01c <- aov(Y~A*B,data=df0) # note formula
summary(aov.01c,split=list(A=list(myC=1)))
# Df Sum Sq Mean Sq F value Pr(>F)
# A 1 5096 5096 10.361 0.00273 **
# A: myC 1 5096 5096 10.361 0.00273 **
# B 1 284 284 0.578 0.45207
# A:B 1 2803 2803 5.699 0.02235 *
# A:B: myC 1 2803 2803 5.699 0.02235 *
# Residuals 36 17705 492
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The 2nd line in the anova table indicates that my contrast was
significant, so I reject the null hypothesis that the marginal means of
a1
and a2
are equal. Furthermore, the 5th line
in the anova table indicates that my contrast differed between
the levels of B
. Inspection of the anova table shows that
the results for my contrast are identical to the results for the main
effect of A
, and the results for my contast x
B
interaction are the same as the results for the
AxB
interaction. The reason the results are identical is
that factors A
and B
have two levels and one
degree-of-freedom, and therefore my contrasts are equivalent to the
omnibus F tests.
The next block of code illustrates how to perform the contrast with
emmeans
. Unlike the aov
command,
emmeans
does not automatically test the hypothesis that the
contrast varies across the levels of B
.
library(emmeans)
myC <- c(-1,1)
aov.emm.A <- emmeans(aov.01,specs="A")
contrast(aov.emm.A,method=list(myC),adjust="none")
The following code illustrates a simple way to visualize the
A x B
interaction.
# see: https://rpubs.com/tf_peterson/interactionplotDemo
par(mar=c(5,5,5,2)+0.1,
cex.axis=1.125,
cex.lab=1.35,
cex.main=1.5)
interaction.plot(x.factor=df0$A,
trace.factor=df0$B,
response=df0$Y,
type="b",
ylab="Y",
xlab="A",
trace.label="B",
main="AxB Interaction")
There are a variety of ways to analyze an interaction. First, we will
analyze the simple main effect of A
at each level
of B
. The first block of code uses aov
to
perform a 1-way ANOVA on subsets of the data and then recalculates the
\(F\) and \(p\) values using MS error and
degrees-of-freedom from the main ANOVA. The simple main effect of \(A\) at \(b_1\) is significant, but the simple main
effect of \(A\) at \(b_2\) is not.
# simple main effect using aov
# summary(aov.01)
MS.resid <- 492
df.resid <- 36
levels(df0$B)
aov.A_at_b1 <- aov(Y~A,data=subset(df0,B=="b1"))
aov.A_at_b2 <- aov(Y~A,data=subset(df0,B=="b2"))
summary(aov.A_at_b1)
summary(aov.A_at_b2)
# recalculate F & p values:
( F.recalc <- c(7728,170.1)/MS.resid ) # recalculate F's
( p.recalc <- 1-pf(F.recalc,df1=1,df2=df.resid)) # recalculate p's
( p.recalc < .05) # 1st simple main effect is significant
The same analysis can be performed with emmeans
. Note
that the \(F\) and \(p\) values correspond to the recalculated
values above. You can read more about how emmeans
analyzes
interactions here.
# simple main effect using emmeans
# https://cran.r-project.org/web/packages/emmeans/vignettes/interactions.html
library(emmeans)
aov.emm <- emmeans(aov.01,specs="A",by="B")
#aov.emm <- emmeans(aov.01,specs=~A+B) # yields the same result
joint_tests(aov.emm,by="B")
B
at each level of
A
.This section uses data in the data frame biofeed
. Load
the data into R
, inspect the variables, and perform a
factorial ANOVA that evaluates the effects of biofeedback
and drug
on score
. The interaction,
illustrated in Figure 4, is significant
load(url("http://pnb.mcmaster.ca/bennett/psy710/datasets/biofeed.rda"))
summary(biofeed)
# biofeedback drug score
# Bio :15 drugX:10 Min. :160.0
# None:15 drugY:10 1st Qu.:181.5
# drugZ:10 Median :196.5
# Mean :195.0
# 3rd Qu.:206.0
# Max. :228.0
bio.aov.01 <- aov(score~drug*biofeedback,data=biofeed)
summary(bio.aov.01)
# Df Sum Sq Mean Sq F value Pr(>F)
# drug 2 1882 941.0 5.210 0.01321 *
# biofeedback 1 1904 1904.0 10.542 0.00343 **
# drug:biofeedback 2 1248 624.0 3.455 0.04801 *
# Residuals 24 4335 180.6
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In this section we analyze the interaction by evaluating an
interaction contrast. Let us suppose that we we
predicted that score
would be lower in the drug X condition
than the drug Y and Z conditions, and that this drug effect would be
greater in the biofeedback-present condition. This prediction can be
tested by evaluating a linear contrast that compares drug X to the
average of the means in the drug Y and Z conditions, and then seeing if
that contrast differs significantly between the two biofeedback
conditions. We perform that contrast next.
# create contrast weights
levels(biofeed$drug)
# [1] "drugX" "drugY" "drugZ"
cDrug <- c(-1,1/2,1/2) # drug X vs (Y + Z)/2
levels(biofeed$biofeedback)
# [1] "Bio" "None"
cBiofeed <- c(1,-1) # biofeedback vs none
# assign contrasts to factors:
contrasts(biofeed$drug) <- cbind(cDrug)
contrasts(biofeed$biofeedback) <- cbind(cBiofeed)
# perform ANOVA (again)
bio.aov.02 <- aov(score~drug*biofeedback,data=biofeed)
summary(bio.aov.02,split=list(drug=list(cDrug=1),biofeedback=list(cBiofeed=1)))
# Df Sum Sq Mean Sq F value Pr(>F)
# drug 2 1882 941.0 5.210 0.01321 *
# drug: cDrug 1 1837 1837.1 10.171 0.00394 **
# biofeedback 1 1904 1904.0 10.542 0.00343 **
# biofeedback: cBiofeed 1 1904 1904.0 10.542 0.00343 **
# drug:biofeedback 2 1248 624.0 3.455 0.04801 *
# drug:biofeedback: cDrug.cBiofeed 1 528 528.1 2.924 0.10019
# Residuals 24 4335 180.6
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The contrast under the main effect of drug
is
significant, which means that we reject the null hypothesis that the
mean score in drugX
equals the average of the mean scores
in the drugY
and drugZ
conditions. The
interaction contrast, listed under the drug:biofeedback
interation, is not significant. So we do not reject the null hypothesis
that the contrast between drugX
and the average of the
drugY
and drugZ
conditions is the same in the
two biofeedback condition.
To perform the interaction contrast with emmeans
we need
to create a new factor that contains one level for each condition, and
then apply appropriate contrast weights to the new factor. In other
words, we need to perform the contrast as though we used a one-way
design instead of a factorial design. First, we create the new factor
and assign contrast weights.
# create new factor with 1 level per condition:
biofeed$condition <- with(biofeed,interaction(drug,biofeedback))
levels(biofeed$condition)
# [1] "drugX.Bio" "drugY.Bio" "drugZ.Bio" "drugX.None" "drugY.None" "drugZ.None"
# create contrast weights
cDrug <- c(-1,1/2,1/2,-1,1/2,1/2) # X vs (Y + Z)/2
cBio <- c(1,1,1,-1,-1,-1) # Biofeedback vs None
cDrugXBio <- cDrug*cBio # interaction contrast
contrasts(biofeed$condition) <- cDrugXBio # assign contrast grouping factor
levels(biofeed$condition)
# [1] "drugX.Bio" "drugY.Bio" "drugZ.Bio" "drugX.None" "drugY.None" "drugZ.None"
cDrugXBio # c(-1, 1/2, 1/2) **MINUS** c(-1, 1/2, 1/2)
# [1] -1.0 0.5 0.5 1.0 -0.5 -0.5
Note that the weights for the interaction contrast equal the weights
for the comparison of drugX
to drugY
and
drugZ
in the biofeedback
condition, c(-1, 1/2,
1/2), MINUS the drug contrast in the none
condition, -1*c(-1, 1/2, 1/2) = c(1, -1/2, -1/2). Next we use those
weights to perform the contrast in emmeans
.
library(emmeans)
bio.aov.03 <- aov(score~condition,data=biofeed) # perform ANOVA
biofeed.emm <- emmeans(bio.aov.03,specs="condition") # create emmeans object
contrast(biofeed.emm,method=list(DxB=cDrugXBio),adjust="none") # evaluate contrast
Finally, we can also use aov
to perform this
contrast:
# perform ANOVA (again) note forumula!
#bio.aov.03 <- aov(score~condition,data=biofeed)
summary(bio.aov.03,split=list(condition=list(DrugXBio=1)))
This section uses data in the data frame df1
contains
data from a 2 (group
) by 3 (task
) factorial
design. Load the data into R and inspect the variables.
load(url("http://pnb.mcmaster.ca/bennett/psy710/datasets/rlab5-df1.rda"))
sapply(df1,class)
# group task Y
# "factor" "factor" "numeric"
xtabs(~group+task,data=df1)
# task
# group t1 t2 t3
# patient 10 10 10
# control 10 10 10
summary(df1)
# group task Y
# patient:30 t1:20 Min. : 40.87
# control:30 t2:20 1st Qu.: 83.70
# t3:20 Median : 97.22
# Mean : 99.61
# 3rd Qu.:109.94
# Max. :157.90
Plot the data with interaction.plot
.
Conduct an ANOVA to assess the main effects of A
and
B
and the AxB
interaction.
Evaluate the normality and constant-variance assumptions.
Plot the marginal means for group
and
task
.
Perform pairwise Tukey tests on the marginal means of
task
.
Use aov
to evaluate the hypotheses that i)
performance in task 1 does not differ from average performance in the
other two tasks; and ii) that this task contrast does not differ between
groups.