1 Load & Inspect Data

This lab illustrates methods for analyzing data collected in experiments using a between-subjects, factorial design. The first example analyzed data from a 2(A) x 2(B) factorial design that is stored in the data frame df0 Load the data, inspect the data frame df0, and examine the design using xtabs. The data in each condition are plotted in Figure 1. Some R commands that are useful for summarizing data from factorial designs can be found here.

First, initialize R, load the data, and then use summary and xtabs to inspect the data and confirm that the design is balanced with 10 participants per condition.

options(contrasts=c('contr.sum','contr.poly'))
load(url("http://pnb.mcmaster.ca/bennett/psy710/datasets/rlab5-df0-2x2.rda"))
summary(df0) # A&B are factors; Y is numeric
#   A       B            Y         
#  a1:20   b1:20   Min.   : 60.92  
#  a2:20   b2:20   1st Qu.: 78.65  
#                  Median : 99.17  
#                  Mean   :101.07  
#                  3rd Qu.:123.15  
#                  Max.   :175.35
xtabs(~A+B,data=df0) # balanced; n=10 per condition
#     B
# A    b1 b2
#   a1 10 10
#   a2 10 10

Next, we are going to plot the data in each condition. Note how I use the formula Y~A*B in boxplot.

op <- par(no.readonly = T)
par(mar=c(5,5,5,2)+0.1, cex.axis=1.25, cex.lab=1.5,cex.main=1.5)
par(mfrow=c(1,1))
boxplot(Y~A*B,df0,main="A x B Cell Means")
Figure 1. Data from the 2 x 2 design.

Figure 1. Data from the 2 x 2 design.

2 Perform ANOVA

Now we perform a 2(A) x 2(B) ANOVA. The ANOVA table is shown below. The main effect of A is significant, but the main effect of B is not. The AxB interaction also is significant.

# 2 x 2 ANOVA:
options(contrasts=c("contr.sum","contr.poly")) # IMPORTANT!
aov.01 <- aov(Y~A*B,data=df0)
# aov.01 <- aov(Y ~ 1 + A + B + A:B,data=df0)
summary(aov.01)
#             Df Sum Sq Mean Sq F value  Pr(>F)   
# A            1   5096    5096  10.361 0.00273 **
# B            1    284     284   0.578 0.45207   
# A:B          1   2803    2803   5.699 0.02235 * 
# Residuals   36  17705     492                   
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

2.1 Questions

  1. What null hypotheses are being evaluated by each line in the ANOVA table?
  1. Create a graph that illustrates the main effect of A.

3 Check Assumptions

3.1 Normality

ANOVA assumes that the residuals are distributed normally. We can check this assumption with shapiro.test.

shapiro.test(residuals(aov.01)) # check normality
# 
#   Shapiro-Wilk normality test
# 
# data:  residuals(aov.01)
# W = 0.98818, p-value = 0.9454

We also can evaluate the normality assumption with a qq plot.

3.2 Constant Variance

ANOVA also assumes that the residuals have constant variance across conditions. We can check this assumption with bartlett.test. However, that command does not accept a factorial formula of the form Y~A+B+A:B.

# bartlett.test(Y~A*B,data=df0) # DOES NOT WORK!"

Because bartlett.test does not work with factorial model formulae, we have to use interaction create a new one-way grouping variable that has one level for each condition, and then re-submit the data to bartlett.test. The test is not significant, so we do not have any reason to reject the null hypothesis that the variance is constant across groups. (N.B. Inspection of Figure 1 leads to the same conclusion).

df0$group <- interaction(df0$A,df0$B) # create new factor
levels(df0$group)
bartlett.test(Y~group,data=df0)

4 Analyzing a Main Effect

Our ANOVA found a main effect of A which leads us to reject the null hypothesis of no differences among the marginal means of A. To learn more about the pattern of differences among marginal means, we often perform linear contrasts or pairwise tests on the marginal means. In this example, such tests are not necessary because A only has two levels, and therefore main effect of A is unambiguous (i.e., the two levels differ significantly). Also, an analysis of the main effect might be premature and/or misleading because we found a significant AxB interaction, which implies that the main effect of A varies across the levels of B. In situations where the interaction is significant but we are interested in variation among levels of A, we normally would decompose the interaction into simple main effects and then look for patterns in the simple main effects. Despite these concerns, I perform the tests anyway for illustrative purposes.

4.1 Tukey HSD

Here I analyze main effect of A with pairwise Tukey tests. Remember: it is not necessary to obtain a significant omnibus \(F\) test before performing the Tukey pairwise test.

# perform Tukey HSD test on levels of factor A
TukeyHSD(aov.01,which="A") 
# with emmeans
library(emmeans)
aov.emm.A <- emmeans(aov.01,specs="A")
contrast(aov.emm.A,method="pairwise",adjust="Tukey")

4.2 Linear Contrasts

A significant main effect also can be analyzed with a linear contrast.

levels(df0$A)
# [1] "a1" "a2"
myC <- c(-1,1)
contrasts(df0$A) <- cbind(myC)
aov.01c <- aov(Y~A*B,data=df0) # note formula
summary(aov.01c,split=list(A=list(myC=1)))
#             Df Sum Sq Mean Sq F value  Pr(>F)   
# A            1   5096    5096  10.361 0.00273 **
#   A: myC     1   5096    5096  10.361 0.00273 **
# B            1    284     284   0.578 0.45207   
# A:B          1   2803    2803   5.699 0.02235 * 
#   A:B: myC   1   2803    2803   5.699 0.02235 * 
# Residuals   36  17705     492                   
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The 2nd line in the anova table indicates that my contrast was significant, so I reject the null hypothesis that the marginal means of a1 and a2 are equal. Furthermore, the 5th line in the anova table indicates that my contrast differed between the levels of B. Inspection of the anova table shows that the results for my contrast are identical to the results for the main effect of A, and the results for my contast x B interaction are the same as the results for the AxB interaction. The reason the results are identical is that factors A and B have two levels and one degree-of-freedom, and therefore my contrasts are equivalent to the omnibus F tests.

The next block of code illustrates how to perform the contrast with emmeans. Unlike the aov command, emmeans does not automatically test the hypothesis that the contrast varies across the levels of B.

library(emmeans)
myC <- c(-1,1)
aov.emm.A <- emmeans(aov.01,specs="A")
contrast(aov.emm.A,method=list(myC),adjust="none")

5 Plotting Interactions

The following code illustrates a simple way to visualize the A x B interaction.

# see: https://rpubs.com/tf_peterson/interactionplotDemo
par(mar=c(5,5,5,2)+0.1, 
    cex.axis=1.125, 
    cex.lab=1.35,
    cex.main=1.5)
interaction.plot(x.factor=df0$A,
                 trace.factor=df0$B,
                 response=df0$Y,
                 type="b",
                 ylab="Y",
                 xlab="A",
                 trace.label="B",
                 main="AxB Interaction")
Figure 3. A x B interaction.

Figure 3. A x B interaction.

6 Simple Main Eeffects

There are a variety of ways to analyze an interaction. First, we will analyze the simple main effect of A at each level of B. The first block of code uses aov to perform a 1-way ANOVA on subsets of the data and then recalculates the \(F\) and \(p\) values using MS error and degrees-of-freedom from the main ANOVA. The simple main effect of \(A\) at \(b_1\) is significant, but the simple main effect of \(A\) at \(b_2\) is not.

# simple main effect using aov
# summary(aov.01)
MS.resid <- 492
df.resid <- 36
levels(df0$B)
aov.A_at_b1 <- aov(Y~A,data=subset(df0,B=="b1"))
aov.A_at_b2 <- aov(Y~A,data=subset(df0,B=="b2"))
summary(aov.A_at_b1)
summary(aov.A_at_b2)
# recalculate F & p values:
( F.recalc <- c(7728,170.1)/MS.resid ) # recalculate F's
( p.recalc <- 1-pf(F.recalc,df1=1,df2=df.resid)) # recalculate p's
( p.recalc < .05) # 1st simple main effect is significant

The same analysis can be performed with emmeans. Note that the \(F\) and \(p\) values correspond to the recalculated values above. You can read more about how emmeans analyzes interactions here.

# simple main effect using emmeans
# https://cran.r-project.org/web/packages/emmeans/vignettes/interactions.html
library(emmeans)
aov.emm <- emmeans(aov.01,specs="A",by="B")
#aov.emm <- emmeans(aov.01,specs=~A+B) # yields the same result
joint_tests(aov.emm,by="B")

6.1 Question

  1. Evaluate the simple main effect of B at each level of A.

7 Interaction Contrasts

This section uses data in the data frame biofeed. Load the data into R, inspect the variables, and perform a factorial ANOVA that evaluates the effects of biofeedback and drug on score. The interaction, illustrated in Figure 4, is significant

load(url("http://pnb.mcmaster.ca/bennett/psy710/datasets/biofeed.rda"))
summary(biofeed)
#  biofeedback    drug        score      
#  Bio :15     drugX:10   Min.   :160.0  
#  None:15     drugY:10   1st Qu.:181.5  
#              drugZ:10   Median :196.5  
#                         Mean   :195.0  
#                         3rd Qu.:206.0  
#                         Max.   :228.0
bio.aov.01 <- aov(score~drug*biofeedback,data=biofeed)
summary(bio.aov.01)
#                  Df Sum Sq Mean Sq F value  Pr(>F)   
# drug              2   1882   941.0   5.210 0.01321 * 
# biofeedback       1   1904  1904.0  10.542 0.00343 **
# drug:biofeedback  2   1248   624.0   3.455 0.04801 * 
# Residuals        24   4335   180.6                   
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Figure 4. Drug x Biofeedback Interaction.

Figure 4. Drug x Biofeedback Interaction.

In this section we analyze the interaction by evaluating an interaction contrast. Let us suppose that we we predicted that score would be lower in the drug X condition than the drug Y and Z conditions, and that this drug effect would be greater in the biofeedback-present condition. This prediction can be tested by evaluating a linear contrast that compares drug X to the average of the means in the drug Y and Z conditions, and then seeing if that contrast differs significantly between the two biofeedback conditions. We perform that contrast next.

# create contrast weights
levels(biofeed$drug)
# [1] "drugX" "drugY" "drugZ"
cDrug <- c(-1,1/2,1/2) # drug X vs (Y + Z)/2
levels(biofeed$biofeedback)
# [1] "Bio"  "None"
cBiofeed <- c(1,-1) # biofeedback vs none

# assign contrasts to factors:
contrasts(biofeed$drug) <- cbind(cDrug)
contrasts(biofeed$biofeedback) <- cbind(cBiofeed)

# perform ANOVA (again)
bio.aov.02 <- aov(score~drug*biofeedback,data=biofeed)
summary(bio.aov.02,split=list(drug=list(cDrug=1),biofeedback=list(cBiofeed=1)))
#                                    Df Sum Sq Mean Sq F value  Pr(>F)   
# drug                                2   1882   941.0   5.210 0.01321 * 
#   drug: cDrug                       1   1837  1837.1  10.171 0.00394 **
# biofeedback                         1   1904  1904.0  10.542 0.00343 **
#   biofeedback: cBiofeed             1   1904  1904.0  10.542 0.00343 **
# drug:biofeedback                    2   1248   624.0   3.455 0.04801 * 
#   drug:biofeedback: cDrug.cBiofeed  1    528   528.1   2.924 0.10019   
# Residuals                          24   4335   180.6                   
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The contrast under the main effect of drug is significant, which means that we reject the null hypothesis that the mean score in drugX equals the average of the mean scores in the drugY and drugZ conditions. The interaction contrast, listed under the drug:biofeedback interation, is not significant. So we do not reject the null hypothesis that the contrast between drugX and the average of the drugY and drugZ conditions is the same in the two biofeedback condition.

To perform the interaction contrast with emmeans we need to create a new factor that contains one level for each condition, and then apply appropriate contrast weights to the new factor. In other words, we need to perform the contrast as though we used a one-way design instead of a factorial design. First, we create the new factor and assign contrast weights.

# create new factor with 1 level per condition:
biofeed$condition <- with(biofeed,interaction(drug,biofeedback))
levels(biofeed$condition)
# [1] "drugX.Bio"  "drugY.Bio"  "drugZ.Bio"  "drugX.None" "drugY.None" "drugZ.None"
# create contrast weights
cDrug <- c(-1,1/2,1/2,-1,1/2,1/2) # X vs (Y + Z)/2
cBio <- c(1,1,1,-1,-1,-1) # Biofeedback vs None
cDrugXBio <- cDrug*cBio # interaction contrast
contrasts(biofeed$condition) <- cDrugXBio # assign contrast grouping factor
levels(biofeed$condition)
# [1] "drugX.Bio"  "drugY.Bio"  "drugZ.Bio"  "drugX.None" "drugY.None" "drugZ.None"
cDrugXBio # c(-1, 1/2, 1/2) **MINUS** c(-1, 1/2, 1/2)
# [1] -1.0  0.5  0.5  1.0 -0.5 -0.5

Note that the weights for the interaction contrast equal the weights for the comparison of drugX to drugY and drugZ in the biofeedback condition, c(-1, 1/2, 1/2), MINUS the drug contrast in the none condition, -1*c(-1, 1/2, 1/2) = c(1, -1/2, -1/2). Next we use those weights to perform the contrast in emmeans.

library(emmeans)
bio.aov.03 <- aov(score~condition,data=biofeed) # perform ANOVA
biofeed.emm <- emmeans(bio.aov.03,specs="condition") # create emmeans object
contrast(biofeed.emm,method=list(DxB=cDrugXBio),adjust="none") # evaluate contrast

Finally, we can also use aov to perform this contrast:

# perform ANOVA (again) note forumula!
#bio.aov.03 <- aov(score~condition,data=biofeed)
summary(bio.aov.03,split=list(condition=list(DrugXBio=1)))

8 (2 X 3) factorial ANOVA

This section uses data in the data frame df1 contains data from a 2 (group) by 3 (task) factorial design. Load the data into R and inspect the variables.

load(url("http://pnb.mcmaster.ca/bennett/psy710/datasets/rlab5-df1.rda"))
sapply(df1,class)
#     group      task         Y 
#  "factor"  "factor" "numeric"
xtabs(~group+task,data=df1)
#          task
# group     t1 t2 t3
#   patient 10 10 10
#   control 10 10 10
summary(df1)
#      group    task          Y         
#  patient:30   t1:20   Min.   : 40.87  
#  control:30   t2:20   1st Qu.: 83.70  
#               t3:20   Median : 97.22  
#                       Mean   : 99.61  
#                       3rd Qu.:109.94  
#                       Max.   :157.90
  1. Plot the data with interaction.plot.

  2. Conduct an ANOVA to assess the main effects of A and B and the AxB interaction.

  3. Evaluate the normality and constant-variance assumptions.

  4. Plot the marginal means for group and task.

  5. Perform pairwise Tukey tests on the marginal means of task.

  6. Use aov to evaluate the hypotheses that i) performance in task 1 does not differ from average performance in the other two tasks; and ii) that this task contrast does not differ between groups.