Set R to use the sum-to-zero definition of effects and load the runData data frame with the following commands:
options(contrasts=c("contr.sum","contr.poly"))
load(url("http://pnb.mcmaster.ca/bennett/psy710/datasets/runData-2023.rda") )
The data frame runData contains data from a 3x3 between-subjects factorial experiment that measured the time (in seconds) to complete a 1.5-mile course. All runners were men who were divided into three age groups and three fitness categories. The independent variables were age and fitness, and the dependent variable was runtime. The data frame also contains a variable, id, which is an ID number assigned to each subject.
age and fitness on runtime. What are the null hypotheses for the main effects and interaction?
run.aov.01 <- aov(runtime~age*fitness,data=runData)
summary(run.aov.01)
## Df Sum Sq Mean Sq F value Pr(>F)
## age 2 566371 283186 121.570 < 2e-16 ***
## fitness 2 550202 275101 118.099 < 2e-16 ***
## age:fitness 4 38689 9672 4.152 0.00602 **
## Residuals 45 104823 2329
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Answer: The null hypothesis for each of the two main effects is that the marginal means (of age and of fitness, respectively) are equal. The null hypothesis for the interaction is that the effect of age is the same across all levels of fitness, and the effect of fitness is the same across all levels of age.
The main effects of age and fitness are significant, as is the age x fitness interaction. The significant main effects mean that we can reject the null hypotheses that the marginal means of age and of fitness are the same. The significant interaction implies that the effect of age depends on the level of fitness (and vice versa), and therefore it might not make sense to focus on main effects.
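As a check on the ANOVA table above, each F value is the mean square for an effect divided by the residual mean square, and each p value is the upper tail of the corresponding F distribution. The sketch below recomputes them from the (rounded) sums of squares printed by summary(run.aov.01). The vector names are illustrative and are not part of the original analysis.

```r
# Sums of squares and df copied from the summary(run.aov.01) table (rounded values)
SS <- c(age = 566371, fitness = 550202, "age:fitness" = 38689)
df <- c(age = 2, fitness = 2, "age:fitness" = 4)
SS.resid <- 104823
df.resid <- 45

MS <- SS / df                      # mean square for each effect
MS.resid <- SS.resid / df.resid    # residual mean square
F.ratio <- MS / MS.resid           # F = MS.effect / MS.resid
p.value <- pf(F.ratio, df1 = df, df2 = df.resid, lower.tail = FALSE)
round(cbind(df, MS, F = F.ratio, p = p.value), 5)
```

Because the inputs are rounded, the recomputed values should agree with the table only to about three significant digits.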
age and the main effect of fitness.
boxplot(runtime~age,data=runData,main="Main Effect of Age")
boxplot(runtime~fitness,data=runData,main="Main Effect of Fitness")
runData$condition <- interaction(runData$age,runData$fitness)
levels(runData$condition)
## [1] "a40.low" "b50.low" "c60.low" "a40.medium" "b50.medium" "c60.medium" "a40.high" "b50.high" "c60.high"
bartlett.test(runtime~condition,data=runData)
##
## Bartlett test of homogeneity of variances
##
## data: runtime by condition
## Bartlett's K-squared = 13.381, df = 8, p-value = 0.09939
Answer: The Bartlett test was not significant and therefore the null hypothesis of constant variance is not rejected.
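The p value in the Bartlett output comes from referring K-squared to a chi-square distribution with (number of groups - 1) degrees of freedom. A minimal check using the values printed above (the variable names are illustrative):

```r
K2 <- 13.381   # Bartlett's K-squared from the output above
k  <- 9        # number of age x fitness conditions
p  <- pchisq(K2, df = k - 1, lower.tail = FALSE)  # upper-tail probability
round(p, 4)
```

The result should closely match the p-value reported by bartlett.test; any small difference reflects rounding of K-squared in the printed output.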
shapiro.test(residuals(run.aov.01))
##
## Shapiro-Wilk normality test
##
## data: residuals(run.aov.01)
## W = 0.97205, p-value = 0.2369
Answer: The Shapiro test was not significant and therefore the null hypothesis of normality is not rejected. The qq plot also is consistent with the hypothesis that the residuals are distributed normally.
qqnorm(residuals(run.aov.01))
qqline(residuals(run.aov.01))
fitness at each level of age.
fitness.a40.aov.01 <- aov(runtime~fitness,data=subset(runData,age=="a40"))
fitness.b50.aov.01 <- aov(runtime~fitness,data=subset(runData,age=="b50"))
fitness.c60.aov.01 <- aov(runtime~fitness,data=subset(runData,age=="c60"))
summary(fitness.a40.aov.01)
## Df Sum Sq Mean Sq F value Pr(>F)
## fitness 2 86695 43347 22.53 3.03e-05 ***
## Residuals 15 28854 1924
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(fitness.b50.aov.01)
## Df Sum Sq Mean Sq F value Pr(>F)
## fitness 2 225260 112630 79.2 1.07e-08 ***
## Residuals 15 21331 1422
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(fitness.c60.aov.01)
## Df Sum Sq Mean Sq F value Pr(>F)
## fitness 2 276935 138468 38.01 1.34e-06 ***
## Residuals 15 54638 3643
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# recalculate F & p using omnibus anova
# summary(run.aov.01)
MS.resid <- 2329 # from original ANOVA
df.resid <- 45 # from original ANOVA
MS.fitness <- c(43347,112630,138468)
( F.recalc <- MS.fitness/MS.resid ) # recalculate F
## [1] 18.61185 48.35981 59.45384
( p.recalc <- 1-pf(F.recalc,df1=2,df2=df.resid) ) # recalculate p
## [1] 1.287800e-06 6.167511e-12 2.338130e-13
p.recalc < .05 # all simple main effects are significant
## [1] TRUE TRUE TRUE
# simple main effect using emmeans:
library(emmeans)
run.emm.01 <- emmeans(run.aov.01,specs="fitness", by="age")
joint_tests(run.emm.01,by="age")
age by performing a linear contrast that evaluates the difference between mean runtime in the b50 and c60 age groups, and determine if the value of this linear contrast depends on the level of fitness.
levels(runData$age)
## [1] "a40" "b50" "c60"
w <- c(0,-1,1) # contrast weights
contrasts(runData$age) <- w
run.aov.02 <- aov(runtime~age*fitness,data=runData)
summary(run.aov.02,split=list(age=list(b50vsc60=1)))
## Df Sum Sq Mean Sq F value Pr(>F)
## age 2 566371 283186 121.570 < 2e-16 ***
## age: b50vsc60 1 311770 311770 133.841 4.46e-15 ***
## fitness 2 550202 275101 118.099 < 2e-16 ***
## age:fitness 4 38689 9672 4.152 0.00602 **
## age:fitness: b50vsc60 2 9061 4530 1.945 0.15484
## Residuals 45 104823 2329
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# using emmeans:
library(emmeans)
# contrast on marginal means:
contrast(emmeans(run.aov.02,specs="age"),method=list(w))
## contrast estimate SE df t.ratio p.value
## c(0, -1, 1) 186 16.1 45 11.569 <.0001
##
## Results are averaged over the levels of: fitness
# age x fitness interaction:
# run.emm.02 <- emmeans(run.aov.02,specs=~age|fitness) # same as next line
run.emm.02 <- emmeans(run.aov.02,specs="age", by="fitness")
con <- contrast(run.emm.02,method=list(w))
joint_tests(con) # contrast x fitness interaction
Answer: The contrast listed under the main effect of age evaluates whether the difference between the marginal means in the b50 and c60 age groups, averaged across the fitness conditions, differs from zero. The contrast listed under the interaction evaluates the null hypothesis that the difference between the b50 and c60 age groups is the same at all levels of fitness. The contrast for the marginal means is significant (F(1,45) = 133.84, p < .001), so we reject the null hypothesis that the marginal means in the b50 and c60 age groups are equal. The contrast under the interaction is not significant (F(2,45) = 1.95, p = 0.155), so we do not reject the null hypothesis that the difference between the b50 and c60 age groups is the same at all levels of fitness.
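To make the marginal contrast concrete, the sketch below recomputes it by hand. It assumes the cell means that emmeans reports for this model (rounded to whole seconds, as printed later in this document), 6 runners per cell, and the residual mean square from the omnibus ANOVA. The object names are illustrative.

```r
# Cell means (rounded) as reported by emmeans for the age x fitness model;
# rows = age, columns = fitness (matrix() fills column by column)
cell.means <- matrix(c(701, 806, 1025,
                       630, 683,  827,
                       532, 532,  727),
                     nrow = 3,
                     dimnames = list(age = c("a40", "b50", "c60"),
                                     fitness = c("low", "medium", "high")))
w <- c(0, -1, 1)            # b50 vs c60 contrast weights
n.cell <- 6                 # runners per cell (balanced design)
MS.resid <- 104823 / 45     # residual mean square from the omnibus ANOVA

# Contrast on the marginal age means (averaged over fitness)
marg <- rowMeans(cell.means)
psi.marg <- sum(w * marg)
se.marg <- sqrt(MS.resid * sum(w^2) / (n.cell * ncol(cell.means)))
t.marg <- psi.marg / se.marg

# The same contrast computed separately at each level of fitness
psi.simple <- as.vector(w %*% cell.means)
names(psi.simple) <- colnames(cell.means)
round(c(estimate = psi.marg, SE = se.marg, t = t.marg), 3)
psi.simple
```

The marginal estimate (186 s) and its standard error (about 16.1) agree with the emmeans output. The simple contrasts are 219, 144, and 195 s at low, medium, and high fitness; the interaction contrast asks whether that variation is larger than expected by chance.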
b50 and c60 age groups in the low fitness condition differs from the difference between means in the b50 and c60 age groups averaged across the medium and high fitness conditions. Design a set of contrast weights that you could use to evaluate this hypothesis and then perform the contrast.
# with aov:
levels(runData$age)
## [1] "a40" "b50" "c60"
ageC <- c(0,-1,1)
levels(runData$fitness)
## [1] "low" "medium" "high"
fitC <- c(-1,1/2,1/2)
contrasts(runData$age) <- ageC
contrasts(runData$fitness) <- fitC
run.aov.03 <- aov(runtime~age*fitness,data=runData)
summary(run.aov.03,split=list(age=list(b50VSc60=1),fitness=list(lowVSmedhigh=1)))
## Df Sum Sq Mean Sq F value Pr(>F)
## age 2 566371 283186 121.570 < 2e-16 ***
## age: b50VSc60 1 311770 311770 133.841 4.46e-15 ***
## fitness 2 550202 275101 118.099 < 2e-16 ***
## fitness: lowVSmedhigh 1 428594 428594 183.993 < 2e-16 ***
## age:fitness 4 38689 9672 4.152 0.00602 **
## age:fitness: b50VSc60.lowVSmedhigh 1 5119 5119 2.197 0.14522
## Residuals 45 104823 2329
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# with emmeans
require(emmeans)
run.emm.03 <- emmeans(run.aov.03,specs=~age*fitness)
# emmip(run.emm.03,~age|fitness) # a nicer version of an interaction plot:
# list order of conditions in run.emm.03 variable:
run.emm.03
## age fitness emmean SE df lower.CL upper.CL
## a40 low 701 19.7 45 662 741
## b50 low 806 19.7 45 766 845
## c60 low 1025 19.7 45 986 1065
## a40 medium 630 19.7 45 590 669
## b50 medium 683 19.7 45 644 723
## c60 medium 827 19.7 45 787 867
## a40 high 532 19.7 45 492 572
## b50 high 532 19.7 45 492 572
## c60 high 727 19.7 45 687 767
##
## Confidence level used: 0.95
# create contrast weights:
ageC <- c(0,-1,1,0,-1,1,0,-1,1)
fitC <- c(-1,-1,-1,1/2,1/2,1/2,1/2,1/2,1/2)
ageXfit <- ageC * fitC
# sum(ageXfit) # sums to zero
contrast(run.emm.03,method=list(c1=ageXfit))
## contrast estimate SE df t.ratio p.value
## c1 -50.6 34.1 45 -1.482 0.1452
# Here is another way of performing the contrast:
run.emm.04 <- emmeans(run.aov.03,specs="age",by="fitness")
ageC <- c(0,-1,1)
fitC <- c(-1,1/2,1/2)
contrast(run.emm.04,interaction=list(age=list(ageC),fitness=list(fitC)),by=NULL)
## age_custom fitness_custom estimate SE df t.ratio p.value
## c(0, -1, 1) c(-1, 0.5, 0.5) -50.6 34.1 45 -1.482 0.1452
Answer: My contrast evaluates the null hypothesis that the difference between the b50 and c60 age groups in the low fitness condition is the same as the average difference between the b50 and c60 age groups in the medium and high fitness conditions. The contrast is not significant (t(45) = -1.48, p = 0.145), and therefore I do not reject the null hypothesis that the difference between the two oldest groups in the low fitness condition is the same as the average age difference in the medium and high fitness conditions.
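The interaction contrast weights are the products of the age and fitness weights, one per cell. The sketch below builds them with outer() and recomputes the estimate and standard error by hand. It assumes the rounded cell means printed by emmeans above, 6 runners per cell, and the residual mean square from the omnibus ANOVA. The object names are illustrative.

```r
ageC <- c(0, -1, 1)          # b50 vs c60
fitC <- c(-1, 1/2, 1/2)      # low vs average of medium and high
# outer() gives one weight per cell; as.vector() orders them with age varying
# fastest, matching the row order of the emmeans grid (a40 low, b50 low, ...)
ageXfit <- as.vector(outer(ageC, fitC))

# Rounded cell means in the same order as the emmeans grid
cell.means <- c(701, 806, 1025, 630, 683, 827, 532, 532, 727)
n.cell <- 6                  # runners per cell
MS.resid <- 104823 / 45      # residual mean square from the omnibus ANOVA

psi <- sum(ageXfit * cell.means)                     # contrast estimate
se.psi <- sqrt(MS.resid * sum(ageXfit^2) / n.cell)   # standard error
t.psi <- psi / se.psi
p.psi <- 2 * pt(abs(t.psi), df = 45, lower.tail = FALSE)
round(c(estimate = psi, SE = se.psi, t = t.psi, p = p.psi), 4)
```

Because the cell means used here are rounded to whole seconds, the estimate (-49.5) differs slightly from the -50.6 that emmeans computes from the unrounded means. The standard error (about 34.1) is unaffected by that rounding.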
age and fitness? Why or why not? Verify your answer by calculating the Type II and III sums of squares.
Answer: Yes, I would expect the Type II and III sums of squares to be equal because the design is balanced (and therefore the Type I, II, and III SS are identical). When a design is unbalanced, the Type II and III sums of squares for the main effects are similar only when the interaction is very small. In this case the interaction is associated with a significant portion of SS-total, and therefore I would not expect the Type II and III SS to be similar if the design were unbalanced. The following code shows that the Type II and III SS are identical.
xtabs(formula = ~age+fitness,data = runData)
## fitness
## age low medium high
## a40 6 6 6
## b50 6 6 6
## c60 6 6 6
library(car)
Anova(run.aov.01,type="2")
Anova(run.aov.01,type="3")
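To illustrate why balance matters, the sketch below compares Type II and Type III sums of squares for one factor in a balanced and an unbalanced design. It uses simulated data, not runData, and base R model comparisons rather than car::Anova. The helper functions make.data and ss.A, the factor names A and B, and the cell sizes are all hypothetical.

```r
# Simulated data only: none of these values come from runData
set.seed(710)
op <- options(contrasts = c("contr.sum", "contr.poly"))  # sum-to-zero coding

make.data <- function(n.per.cell) {
  cells <- expand.grid(A = factor(c("a1", "a2", "a3")),
                       B = factor(c("b1", "b2", "b3")))
  d <- cells[rep(seq_len(nrow(cells)), times = n.per.cell), ]
  # cell mean depends on A, B, and their product, so an interaction exists
  mu <- 10 + 2 * as.integer(d$A) + 3 * as.integer(d$B) +
        2 * as.integer(d$A) * as.integer(d$B)
  d$y <- rnorm(nrow(d), mean = mu, sd = 2)
  d
}

ss.A <- function(d) {
  full <- lm(y ~ A * B, data = d)
  # Type III: drop A from the full model, leaving B and A:B in place
  type3 <- drop1(full, scope = . ~ ., test = "F")["A", "Sum of Sq"]
  # Type II: compare y ~ A + B against y ~ B (interaction ignored)
  type2 <- anova(lm(y ~ B, data = d), lm(y ~ A + B, data = d))[2, "Sum of Sq"]
  c(typeII = type2, typeIII = type3)
}

balanced   <- ss.A(make.data(rep(6, 9)))
unbalanced <- ss.A(make.data(c(2, 6, 9, 4, 8, 3, 7, 5, 10)))
options(op)  # restore the previous contrast setting
rbind(balanced, unbalanced)
```

In the balanced row the two sums of squares should coincide; in the unbalanced row they should differ because the simulated interaction is not small.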