Initialize R with the following commands. The load command loads three data frames (L4.df01, visRT, and blood), though we will not use blood in this lab.
options(digits=4,contrasts=c("contr.sum","contr.poly")) # IMPORTANT! [Use sum-to-zero constraint for effects]
load(file=url("http://pnb.mcmaster.ca/bennett/psy710/datasets/rlab4-data-2023.rda"))
In this section we will analyze data in the L4.df01 data frame, which contains the data from an experiment that measured scores in four groups of subjects. There are 12 observations per group.
with(L4.df01,tapply(score,group,length))
## 1 2 3 4
## 12 12 12 12
summary(L4.df01)
## group score
## Min. :1.00 Min. : 64.1
## 1st Qu.:1.75 1st Qu.: 88.8
## Median :2.50 Median :100.2
## Mean :2.50 Mean :101.1
## 3rd Qu.:3.25 3rd Qu.:112.8
## Max. :4.00 Max. :132.4
Use the variables group and score in L4.df01 to evaluate the null hypothesis that the four group means are equal. Based on the ANOVA table, what is your decision regarding the null hypothesis?
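One way to run this first analysis is sketched below (the object name L4.aov.00 is our own and is not taken from the lab):
# a minimal sketch; at this point group is still stored as a numeric variable
L4.aov.00 <- aov(score~group,data=L4.df01)
summary(L4.aov.00) # inspect the ANOVA table, including the df for group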
What is wrong with the previous analysis? Hint: Look at the degrees of freedom.
# Note the degrees of freedom for group is 1, which is NOT correct.
# There are 4 groups so there should be 3 degrees of freedom.
# R has treated group as a numeric variable, not a factor
sapply(L4.df01,class)
## group score
## "numeric" "numeric"
The following code (i) transforms group into a factor and (ii) creates another factor, condition, and adds it to our data frame.
L4.df01$group <- factor(x=L4.df01$group,labels="g")
n <- 12
a <- 4
gnumber <- rep(x=1:a,each=n)
L4.df01$condition <- factor(x=gnumber,labels="c",ordered=FALSE)
class(L4.df01$condition)
## [1] "factor"
levels(L4.df01$condition)
## [1] "c1" "c2" "c3" "c4"
summary(L4.df01)
## group score condition
## g1:12 Min. : 64.1 c1:12
## g2:12 1st Qu.: 88.8 c2:12
## g3:12 Median :100.2 c3:12
## g4:12 Mean :101.1 c4:12
## 3rd Qu.:112.8
## Max. :132.4
Visualize the data by plotting box plots of score for each condition.
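A simple version of this plot could be produced along the following lines:
# a minimal sketch: box plots of score for each level of condition
boxplot(score~condition,data=L4.df01,xlab="condition",ylab="score")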
Now use aov to evaluate the null hypothesis that all of the group means are equal. How does the result differ from the previous ANOVA?
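Later code in this lab refers to a fitted model named L4.aov.01, which presumably comes from this step; a minimal sketch of such a model is:
# sketch of the model that the TukeyHSD and emmeans code below appears to use
L4.aov.01 <- aov(score~group,data=L4.df01) # group is now a factor
summary(L4.aov.01) # df for group should now be 3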
Check the normality assumption.
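One common approach is to inspect the model residuals, for example (assuming the L4.aov.01 model sketched above):
# minimal sketch: graphical and formal checks of residual normality
qqnorm(residuals(L4.aov.01)); qqline(residuals(L4.aov.01)) # Q-Q plot
shapiro.test(residuals(L4.aov.01)) # Shapiro-Wilk test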
Evaluate the null hypothesis of equal group means using an analysis that assumes the residuals are distributed normally but does not assume that the variances are equal across groups.
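One such analysis is Welch's ANOVA, which could be run along these lines:
# minimal sketch: Welch's test does not assume equal variances across groups
oneway.test(score~group,data=L4.df01,var.equal=FALSE)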
The following code shows how to perform all pairwise tests and adjust the \(p\) values using the Tukey method of adjustment. The confidence intervals listed by TukeyHSD are adjusted, or simultaneous, confidence intervals. An individual 95% CI includes the true population value 95% of the time (in the long run). The adjusted 95% CIs, on the other hand, have simultaneous coverage: if we repeated this experiment many times, then in 95% of those replications all of the adjusted CIs would contain their respective true values.
# Tukey HSD -----------
TukeyHSD(L4.aov.01,"group") # assumes equal variance
# using emmeans -------
library(emmeans)
lab4.emm <- emmeans(L4.aov.01,specs="group")
contrast(lab4.emm,method="pairwise",adjust="tukey")
Tukey HSD assumes equal variance across conditions, but Dunnett’s T3 test does not.
# Dunnett T3 Test (unequal variances) -------
library(PMCMRplus)
dunnettT3Test(score~group,data=L4.df01) # doesn't assume equal variance
Dunnett’s test is useful when you are comparing several groups to a single control group.
## Dunnett pairwise tests (to control group) ----------
# install.packages("DescTools")
library(DescTools)
levels(L4.df01$group)
DunnettTest(score~group,data=L4.df01,control="g4")
In this section we illustrate how to use linear contrasts/comparisons to analyze the data in L4.df01. First we create three contrasts, demonstrate that they are a complete orthogonal set, and assign them to our factor condition.
# create contrasts ---------
c1 <- c(-1,-1,1,1)
c2 <- c(-1,1,0,0)
c3 <- c(0,0,-1,1)
# store 3 contrasts in one matrix:
( cMat <- cbind(c1,c2,c3) )
# make sure contrast weights sum to zero
apply(cMat,MARGIN=2,FUN=sum) # note MARGIN & FUN are capitalized
# 4 levels; 3 degrees of freedom
levels(L4.df01$condition)
# complete orthogonal set has 3 mutually-orthogonal contrasts
sum(c1*c2) # orthogonal
sum(c1*c3) # orthogonal
sum(c2*c3) # orthogonal
# finally, assign contrasts to our factor
( contrasts(L4.df01$condition) <- cMat ) # assign contrasts to our factor
Next, we incorporate the contrasts into our analysis of variance using aov and split. Notice that the sum of the SS values for the three contrasts equals the SS for condition (within rounding error), and that the average \(F\) for the three contrasts equals the omnibus \(F\).
L4.aov.02 <- aov(score~condition,data=L4.df01)
options(digits=8)
summary(L4.aov.02,split=list(condition=list(c1=1,c2=2,c3=3)))
## Df Sum Sq Mean Sq F value Pr(>F)
## condition 3 1515.5 505.15 2.2611 0.09456 .
## condition: c1 1 1474.6 1474.60 6.6006 0.01366 *
## condition: c2 1 20.6 20.56 0.0920 0.76303
## condition: c3 1 20.3 20.29 0.0908 0.76457
## Residuals 44 9829.8 223.40
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
1474.6+20.6+20.3 # SS values
## [1] 1515.5
(6.6006+0.092+0.0908)/3 # F values
## [1] 2.2611333
options(digits=4)
The next chunk of code shows how to use emmeans to perform the same three contrasts.
## with emmeans -------
library(emmeans)
myContrasts <- list(c1,c2,c3)
con.res <- contrast(lab4.emm,method=myContrasts)
summary(con.res,adjust="none")
Finally, we can use linear.comparison to evaluate the contrasts.
# download linear.comparison function:
source(url("http://pnb.mcmaster.ca/bennett/psy710/Rscripts/linear_contrast_v2.R"))
myC.res <- linear.comparison(y=L4.df01$score,
group=L4.df01$group,
c.weights=list(c1,c2,c3),
var.equal=T)
Trend analysis uses linear contrasts to decompose the observed effects into orthogonal linear and nonlinear trends across the levels of our grouping variable. Trend analysis can only be applied in cases where the groups are differentiated on a factor that has ordered levels and the differences between levels are meaningful (e.g., grade, age, stimulus duration, etc.). The following code creates an ordered factor, age, that defines four equally-spaced levels on a grouping variable. We then use trend analysis to evaluate the linear and nonlinear trends of score across the levels of age.
a <- 4 # number of groups
n <- 12 # n per group
L4.df01$age <- factor(x=rep(1:a,each=n),
labels=c("age6","age8","age10","age12"),
ordered=TRUE)
class(L4.df01$age) # type of variable
levels(L4.df01$age) # levels
# We set-up R to use the following coding schemes:
# options(contrasts=c("contr.sum","contr.poly"))
# so our ordered factor already uses polynomial contrasts
contrasts(L4.df01$age) # contrasts for Linear, Quadratic, & Cubic trends
# do anova:
L4.age.aov <- aov(score~age,data=L4.df01)
summary(L4.age.aov,split=list(age=list(Lin=1,Quad=2,Cubic=3)))
Although R can be set up to automatically use polynomial contrasts on ordered factors, sometimes it is necessary to assign the weights ourselves. For example, we may have an ordered factor with levels that are not separated by equal intervals. In that case, the automatic trend weights assigned by R are not correct. The following code shows how to use contr.poly to calculate the correct weights and assign them to our factor.
a <- 4 # number of groups
n <- 12 # n per group
# unequal steps in our age factor:
L4.df01$age2 <- factor(x=rep(1:a,each=n),
labels=c("age6","age8","age12","age18"),
ordered=TRUE)
levels(L4.df01$age2)
contrasts(L4.df01$age2) # these weights assume equal intervals!
# these are the correct weights:
(newWeights <- contr.poly(n=4,scores=c(6,8,12,18)))
contrasts(L4.df01$age2) <- newWeights # assign to our factor
# do anova:
L4.age2.aov <- aov(score~age2,data=L4.df01)
summary(L4.age2.aov,split=list(age2=list(Lin=1,Quad=2,Cubic=3)))
Note that you can use contr.poly to assign trend weights to a regular, non-ordered factor.
class(L4.df01$group)
contrasts(L4.df01$group) <- newWeights # assign to our factor
L4.group.aov <- aov(score~group,data=L4.df01)
summary(L4.group.aov,split=list(group=list(Lin=1,Quad=2,Cubic=3)))
Finally, trend analysis can be performed with emmeans. Note that there is no need to assign trend weights to the grouping variable. Instead, the weights are specified by using the poly method in the contrast command. The poly method assumes the levels of the grouping variable are equally spaced.
class(L4.df01$condition)
L4.condition.aov <- aov(score~condition,data=L4.df01)
# library(emmeans)
L4.condition.emm <- emmeans(L4.condition.aov,specs="condition")
contrast(L4.condition.emm,method="poly")
When the levels are not equally spaced, then you must provide the weights yourself.
L4.age2.emm <- emmeans(L4.age2.aov,specs="age2")
# newWeights <- contr.poly(n=4,scores=c(6,8,12,18))
wLin <- newWeights[,".L"] # 1st column
wQuad <- newWeights[,".Q"] # 2nd column
wCubic <- newWeights[,".C"] # 3rd column
myTrends <- list("lin"=wLin,"quad"=wQuad,"cub"=wCubic)
contrast(L4.age2.emm,method=myTrends)
This section illustrates how to adjust the \(p\) values for our various contrasts with the Holm and FDR procedures. First we use emmeans:
# emmeans -------------
# library(emmeans)
# myContrasts <- list(c1,c2,c3)
# con.res <- contrast(lab4.emm,method=myContrasts)
summary(con.res,adjust="holm") # holm correction
summary(con.res,adjust="fdr") # false discovery rate
Next we use p.adjust
:
## p.adjust ---------
summary(L4.aov.02,split=list(condition=list(c1=1,c2=2,c3=3)))
# p values taken from ANOVA table:
p.adjust(c(0.0137,0.7630,0.7646),"holm")
p.adjust(c(0.0137,0.7630,0.7646),"fdr")
Finally, we use the Scheffe method to adjust \(p\) values for post-hoc tests. Note that scheffe.rank must equal the degrees of freedom for our grouping factor, regardless of the number of contrasts that we analyze. Also note that you need to use the summary method for the Scheffe test to work properly: including scheffe.rank directly in the contrast command yields an incorrect result.
## scheffe ----------
c1 <- c(-1,-1,1,1)
# the next command yields an INCORRECT answer (p=0.0137)!
contrast(lab4.emm,method=list(c1),adjust="scheffe",scheffe.rank=3)
# (the Scheffe adjustment is not applied correctly here)
# this is how you get the correct answer:
c1.post.hoc <- contrast(lab4.emm,method=list(c1),adjust="scheffe",scheffe.rank=3)
summary(c1.post.hoc,adjust="scheffe",scheffe.rank=3) # p=0.1015 is correct
# alternative method:
# install.packages("DescTools")
library(DescTools)
ScheffeTest(L4.aov.02,which="condition",contrasts=c1)
Effect size for linear contrasts can be expressed using Cohen's \(d\), which divides the value of \(\Psi\) by an estimate of the standard deviation of the within-group error. The next chunk of code calculates Cohen's \(d\) with eff_size in the emmeans package. Note that emmeans has a bug: specifically, it does not scale the contrast weights correctly. Instead, it assumes that the positive and negative weights within a contrast sum to 1 and -1, respectively. In the following code, c1 and c4 are equivalent contrasts and should lead to the same effect size, but emmeans says that the effect size for c4 is twice the effect size for c1.
# library(emmeans)
c1 <- c(-1/2,-1/2,1/2,1/2)
c2 <- c(-1,1,0,0)
c3 <- c(0,0,-1,1)
(c4 <- 2*c1)
myContrasts <- list(c1,c2,c3,c4)
eff_size(lab4.emm,
sigma=sigma(L4.aov.01),
edf=df.residual(L4.aov.01),
method=myContrasts)
Next we calculate effect size using linear.comparison. It scales the contrast weights correctly, so the estimated effect size is the same for c1 and c4.
# source(url("http://pnb.mcmaster.ca/bennett/psy710/Rscripts/linear_contrast_v2.R"))
# closeAllConnections() # close connection
myC.res <- linear.comparison(y=L4.df01$score,
group=L4.df01$group,
c.weights=list(c1,c2,c3,c4),
var.equal=T)
class(myC.res) # this is a list
length(myC.res) # has 4 parts; 1 for each contrast
# str(myC.res[[1]]) # each part has LOTS of components
c(myC.res[[1]]$d.effect.size,
myC.res[[2]]$d.effect.size,
myC.res[[3]]$d.effect.size,
myC.res[[4]]$d.effect.size)
Finally, we can use linear.comparison to calculate various measures of association strength. Measures of association strength also can be calculated directly from values in the ANOVA table.
# Three measures of association strength for the first contrast
myC.res[[1]]$contrast
c(myC.res[[1]]$R2.alerting,
myC.res[[1]]$R2.effect.size,
myC.res[[1]]$R2.contrast)
summary(L4.aov.01,split=list(group=list(c1=1,c2=2,c3=3)))
# values taken from ANOVA table
(c1.R2.alerting = 1475/1515) # proportion of between-group SS
(c1.R2.effectsize = 1475/(1515+9830) ) # proportion of SS-total
(c1.R2.contrast = 1475/(1475+9830) ) # proportion of (SS-contrast + SS-residual)
An experiment measured the effect of dividing attention on reaction times in a visual detection task. Performance was measured in five groups of 12 participants while they simultaneously completed an auditory detection task. The difficulty of the auditory detection task was varied across the five groups by presenting the auditory target (i.e., a pure tone) in higher levels of background noise. The experimenters hypothesized that performance in the visual task would vary non-monotonically with the level of background noise. In other words, the prediction is that visual performance will be best at an intermediate level of noise. The data are stored in the data frame visRT, which contains the numeric variable RT and the ordered factor noise.
sapply(visRT,class)
## $noise
## [1] "ordered" "factor"
##
## $RT
## [1] "numeric"
summary(visRT)
## noise RT
## n1:12 Min. :-162
## n2:12 1st Qu.: 261
## n3:12 Median : 380
## n4:12 Mean : 423
## n5:12 3rd Qu.: 597
## Max. :1161
Here is a plot of the data:
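A box plot of RT at each noise level, similar in spirit to this figure, could be produced with a sketch like the following:
# minimal sketch: RT as a function of noise level
boxplot(RT~noise,data=visRT,xlab="noise level",ylab="RT")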
Use ANOVA to evaluate the differences among the means. Is the difference among group means statistically significant?
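A minimal sketch of this analysis (the object name vis.aov is our own):
# one-way ANOVA on the visRT data
vis.aov <- aov(RT~noise,data=visRT)
summary(vis.aov)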
Explain why the ANOVA in the previous question is or is not an appropriate test of the experimenters’ hypothesis about the effect of noise level on visual performance.
What type of variable is noise? What contrasts does R use for that variable?
Evaluate the linear and quadratic trends of RT across the levels of noise. (For this analysis, you may assume that the trend analysis was planned, and that the successive levels of noise differ by a constant amount.)
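One possible approach follows the emmeans examples above (the object names vis.aov and vis.emm are our own, and the poly method assumes equally spaced levels):
# minimal sketch: planned trend analysis with unadjusted p values
# library(emmeans)
vis.emm <- emmeans(vis.aov,specs="noise") # vis.aov from the ANOVA sketch above
contrast(vis.emm,method="poly",adjust="none") # linear, quadratic, cubic, & quartic trends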
Now perform the analysis under the assumption that we decided to analyze the trends only after looking at the data. Explain how the results differ from the ones obtained in the previous question.
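For a post-hoc analysis, the trends could be adjusted with the Scheffe method as illustrated earlier (scheffe.rank should equal the degrees of freedom for noise, which is 4):
# minimal sketch: Scheffe-adjusted (post-hoc) trend tests
vis.trends <- contrast(vis.emm,method="poly")
summary(vis.trends,adjust="scheffe",scheffe.rank=4)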
Create a set of contrast weights to evaluate the hypothesis that mean RT in noise levels 1-4 equals the mean RT in noise level 5. Perform that linear contrast.
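One possible set of weights and a sketch of the contrast (the label lev5.vs.rest is our own, and vis.emm comes from the sketch above):
# minimal sketch: mean of noise levels 1-4 vs noise level 5
w <- c(-1/4,-1/4,-1/4,-1/4,1)
sum(w) # weights sum to zero
contrast(vis.emm,method=list(lev5.vs.rest=w),adjust="none")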
Now perform the contrast assuming it was done after looking at the data.
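As in the Scheffe example above, the post-hoc version could be obtained via the summary method (again assuming w and vis.emm from the sketches above):
# minimal sketch: Scheffe-adjusted post-hoc version of the same contrast
w.res <- contrast(vis.emm,method=list(lev5.vs.rest=w))
summary(w.res,adjust="scheffe",scheffe.rank=4) # rank = df for noise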