1 Initialize R

Initialize R with the following commands. The load command loads three data frames – L4.df01, visRT, and blood – though we will not use blood in this lab.

options(digits=4,contrasts=c("contr.sum","contr.poly")) # IMPORTANT! [Use sum-to-zero constraint for effects]
load(file=url("http://pnb.mcmaster.ca/bennett/psy710/datasets/rlab4-data-2023.rda"))

2 One-way ANOVA

In this section we will analyze data in the L4.df01 data frame, which contains the data from an experiment that measured scores in for groups of subjects. There are 12 observations per group.

with(L4.df01,tapply(score,group,length))
##  1  2  3  4 
## 12 12 12 12
summary(L4.df01)
##      group          score      
##  Min.   :1.00   Min.   : 64.1  
##  1st Qu.:1.75   1st Qu.: 88.8  
##  Median :2.50   Median :100.2  
##  Mean   :2.50   Mean   :101.1  
##  3rd Qu.:3.25   3rd Qu.:112.8  
##  Max.   :4.00   Max.   :132.4
  1. Use the variables group and score in L4.df01 to evaluate the null hypothesis that the four group means are equal. Based on the ANOVA table, what is your decision regarding the null hypothesis?

  2. What is wrong with the previous analysis? Hint: Look at the degrees of freedom.

# Note the degrees of freedom for group is 1, which is NOT correct.
# There are 4 groups so there should be 3 degrees of freedom.
# R has treated group as a numeric variable, not a factor
sapply(L4.df01,class)
##     group     score 
## "numeric" "numeric"

The following code i) transforms group into a factor; and creates another factor condition and adds it to our data frame.

L4.df01$group <- factor(x=L4.df01$group,labels="g")
n <- 12
a <- 4
gnumber <- rep(x=1:a,each=n)
L4.df01$condition <- factor(x=gnumber,labels="c",ordered=FALSE)
class(L4.df01$condition)
## [1] "factor"
levels(L4.df01$condition)
## [1] "c1" "c2" "c3" "c4"
summary(L4.df01)
##  group       score       condition
##  g1:12   Min.   : 64.1   c1:12    
##  g2:12   1st Qu.: 88.8   c2:12    
##  g3:12   Median :100.2   c3:12    
##  g4:12   Mean   :101.1   c4:12    
##          3rd Qu.:112.8            
##          Max.   :132.4
  1. Visualize the data by plotting box plots of score for each condition.

  2. Now use aov to evaluate the null hypothesis that all of the group means are equal. How does the result differ from the previous ANOVA?

  3. Check the normality assumption.

  4. Evaluate the null hypothesis of equal group means using an analysis that assumes the residuals are distributed normally but does not assume that the variances are equal across groups.

2.1 Pairwise Tests

The following code shows how to perform all pairwise tests and adjust the \(p\) values using the Tukey method of adjustment. The confidence intervals listed by TukeyHSD are adjusted, or simultaneous, confidence intervals. An individual 95% CI includes the true population value 95% of the time (in the long run). On the other hand, all of the adjusted 95% CIs will (in the long run) contain the true population value. In other words, if we repeated this experiment many times, all of the adjusted CIs would contain their respective true values 95% of the time.

# Tukey HSD -----------
TukeyHSD(L4.aov.01,"group") # assumes equal variance

# using emmeans -------
library(emmeans)
lab4.emm <- emmeans(L4.aov.01,specs="group")
contrast(lab4.emm,method="pairwise",adjust="tukey")

Tukey HSD assumes equal variance across conditions, but Dunnett’s T3 test does not.

# Dunnett T3 Test (unequal variances) -------
library(PMCMRplus)
dunnettT3Test(score~group,data=L4.df01) # doesn't assume equal variance

Dunnett’s test is useful when you are comparing several groups to a single control group.

## Dunnett pairwise tests (to control group) ----------
# install.packages("DescTools")
library(DescTools)
levels(L4.df01$group)
DunnettTest(score~group,data=L4.df01,control="g4")

2.2 Linear Contrasts

In this section we illustrate how to use linear contrasts/comparisons to analyze the data in L4.df01. First we create three contrasts, demonstrate that they are a complete orthogonal set, and assign them to our factor condition.

# create contrasts ---------
c1 <- c(-1,-1,1,1) 
c2 <- c(-1,1,0,0)
c3 <- c(0,0,-1,1)

# store 3 contrasts in one matrix:
( cMat <- cbind(c1,c2,c3) )

# make sure contrast weights sum to zero
apply(cMat,MARGIN=2,FUN=sum) # note MARGIN & FUN are capitalized

# 4 levels; 3 degrees of freedom
levels(L4.df01$condition) 

# complete orthogonal set has 3 mutually-ortogonal contrasts
sum(c1*c2) # orthogonal
sum(c1*c3) # orthogonal
sum(c2*c3) # orthogonal

# finally, assign contrasts to our factor
( contrasts(L4.df01$condition) <- cMat ) # assign contrasts to our factor

Next, we incorporate the contrasts into our analysis of variance using aov and split. Notice that the sum of the SS values for the three contrasts equals the SS for condition (within rounding error), and that average \(F\) for the three contrasts equals the omnibus \(F\).

L4.aov.02 <- aov(score~condition,data=L4.df01)
options(digits=8)
summary(L4.aov.02,split=list(condition=list(c1=1,c2=2,c3=3)))
##                 Df Sum Sq Mean Sq F value  Pr(>F)  
## condition        3 1515.5  505.15  2.2611 0.09456 .
##   condition: c1  1 1474.6 1474.60  6.6006 0.01366 *
##   condition: c2  1   20.6   20.56  0.0920 0.76303  
##   condition: c3  1   20.3   20.29  0.0908 0.76457  
## Residuals       44 9829.8  223.40                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
1474.6+20.6+20.3 # SS values
## [1] 1515.5
(6.6006+0.092+0.0908)/3 # F values
## [1] 2.2611333
options(digits=4)

The next chunk of code shows how to use emmeans to perform the same three contrasts.


## with emmeans -------
library(emmeans)
myContrasts <- list(c1,c2,c3)
con.res <- contrast(lab4.emm,method=myContrasts)
summary(con.res,adjust="none")

Finally, we can use linear.comparison to evaluate the contrasts.

# download linear.comparison function:
source(url("http://pnb.mcmaster.ca/bennett/psy710/Rscripts/linear_contrast_v2.R"))
myC.res <- linear.comparison(y=L4.df01$score,
                             group=L4.df01$group,
                             c.weights=list(c1,c2,c3),
                             var.equal=T)
  1. What null hypothesis is evaluated by each contrast?

2.3 Trend Analysis

Trend analysis uses linear contrasts to decompose the observed effects into orthogonal linear and nonlinear trends across the levels of our grouping variable. Trend analysis can only be applied in cases where the groups are differentiated on a factor that has ordered levels and the differences between levels are meaningful (e.g., grade, age, stimulus duration, etc.). The following code creates an ordered factor, age, that defines four equally-spaced levels on a grouping variable. We then use trend analysis to evaluate the linear and nonlinear trends of score across the levels of age.

a <- 4 # number of gruops
n <- 12 # n per group
L4.df01$age <- factor(x=rep(1:a,each=n),
                      labels=c("age6","age8","age10","age12"),
                      ordered=TRUE)
class(L4.df01$age) # type of variable
levels(L4.df01$age) # levels
# We set-up R to use the following coding schemes:
# options(contrasts=c("contr.sum","contr.poly"))
# so our ordered factor already uses polynomial contrasts
contrasts(L4.df01$age) # contrasts for Linear, Quadratic, & Cubic trends

# do anova:
L4.age.aov <- aov(score~age,data=L4.df01)
summary(L4.age.aov,split=list(age=list(Lin=1,Quad=2,Cubic=3)))

Although R can be set up to automatically use polynomial contrasts on ordered factors, sometimes it is necessary to assign the weights ourselves. For example, we may have an ordered factor with levels that are not separated by equal intervals. In that case, the automatic trend weights assigned by R are not correct. The following code shows how to use contr.poly to calculate the correct weights and assign them to our factor.

a <- 4 # number of gruops
n <- 12 # n per group
# unequal steps in our age factor:
L4.df01$age2 <- factor(x=rep(1:a,each=n),
                      labels=c("age6","age8","age12","age18"),
                      ordered=TRUE)
levels(L4.df01$age2)
contrasts(L4.df01$age2) # these weights assume equal intervals!
# these are the correct weights:
(newWeights <- contr.poly(n=4,scores=c(6,8,12,18)))
contrasts(L4.df01$age2) <- newWeights # assign to our factor
# do anova:
L4.age2.aov <- aov(score~age2,data=L4.df01)
summary(L4.age2.aov,split=list(age2=list(Lin=1,Quad=2,Cubic=3)))

Note that you can use contr.poly to assign trend weights to a regular, non-ordered factor.

class(L4.df01$group)
contrasts(L4.df01$group) <- newWeights # assign to our factor
L4.group.aov <- aov(score~group,data=L4.df01)
summary(L4.group.aov,split=list(group=list(Lin=1,Quad=2,Cubic=3)))

Finally, trend analysis can be performed with emmeans. Note that there is no need assign trend weights to the grouping variable. Instead, the weights are specified by using the poly method of in the contrast command. The poly method assumes the levels of the grouping variable are equally spaced.

class(L4.df01$condition)
L4.condition.aov <- aov(score~condition,data=L4.df01)
# library(emmeans)
L4.condition.emm <- emmeans(L4.condition.aov,specs="condition")
contrast(L4.condition.emm,method="poly")

When the levels are not equally spaced, then you must provide the weights yourself.

L4.age2.emm <- emmeans(L4.age2.aov,specs="age2")
# newWeights <- contr.poly(n=4,scores=c(6,8,12,18))
wLin <- newWeights[,".L"] # 1st column
wQuad <- newWeights[,".Q"] # 2nd column
wCubic <- newWeights[,".C"] # 3rd column
myTrends <- list("lin"=wLin,"quad"=wQuad,"cub"=wCubic)
contrast(L4.condition.emm,method=myTrends)

2.4 Multiple Comparisons

This section illustrates how to adjust the \(p\) values for our various trends with the Holm and FDR procedures. First we use emmeans:

# emmeans -------------
# library(emmeans)
# myContrasts <- list(c1,c2,c3)
# con.res <- contrast(lab4.emm,method=myContrasts)
summary(con.res,adjust="holm") # holm correction
summary(con.res,adjust="fdr") # false discovery rate

Next we use p.adjust:

## p.adjust ---------
summary(L4.aov.02,split=list(condition=list(c1=1,c2=2,c3=3)))
# p values taken from ANOVA table:
p.adjust(c(0.0137,0.7630,0.7646),"holm")
p.adjust(c(0.0137,0.7630,0.7646),"fdr")

Finally, we use the Scheffe method to adjust \(p\) values for post-hoc tests. Note that scheffe.rank must equal the degrees-of-freedom for our grouping factor, regardless of the number of contrasts that we analyze. Also note that you need to use the summary method for the Scheffe test to work properly: including scheffe.rank directly into the contrast command yields an incorrect result.

## scheffe ----------
c1 <- c(-1,-1,1,1)
# next command yields INCORRECT answer (p=0.0137)!
contrast(lab4.emm,method=list(c1),adjust="scheffe",scheffe.rank=3)
# incorrect Scheffe.rank!!

# this is how you get the correct answer:
c1.post.hoc <- contrast(lab4.emm,method=list(c1),adjust="scheffe",scheffe.rank=3)
summary(c1.post.hoc,adjust="scheffe",scheffe.rank=3) # p=0.1015  is correct

# alternative method:
# install.packages("DescTools")
library(DescTools)
ScheffeTest(L4.aov.02,which="condition",contrasts=c1)

2.5 Effect Size & Association Strength

Effect size for linear contrasts can be expressed using Cohen’s d, which divides the value of \(\Psi\) by an estimate of the standard deviation of the within-group error. The next chunk of code calculates Cohen’s \(d\) with eff_size in the emmeans package. Note that emmeans has a bug: specifically, it does not scale the contrast weights correctly. Instead, it assumes that the postive and negative weights within a contrast sum to 1 and -1, respectively. In the following code, c1 and c4 are equivalent contrasts and should lead to the same effect size, but emmeans says that the effect size for c4 is twice the effect size for c1.

# library(emmeans)
c1 <- c(-1/2,-1/2,1/2,1/2)
c2 <- c(-1,1,0,0)
c3 <- c(0,0,-1,1)
(c4 <- 2*c1)
myContrasts <- list(c1,c2,c3,c4)
eff_size(lab4.emm,
         sigma=sigma(L4.aov.01),
         edf=df.residual(L4.aov.01),
         method=myContrasts)

Next we calculate effect size using linear.comparison. It scales the contrast weights correctly so the estimated effect size is the same for c1 and c4.

# source(url("http://pnb.mcmaster.ca/bennett/psy710/Rscripts/linear_contrast_v2.R"))
# closeAllConnections() # close connection

myC.res <- linear.comparison(y=L4.df01$score,
                             group=L4.df01$group,
                             c.weights=list(c1,c2,c3,c4),
                             var.equal=T)
class(myC.res) # this is a list
length(myC.res) # has 4 parts; 1 for each contrast
# str(myC.res[[1]]) # each part has LOTS of components
c(myC.res[[1]]$d.effect.size,
  myC.res[[2]]$d.effect.size,
  myC.res[[3]]$d.effect.size,
  myC.res[[4]]$d.effect.size)

Finally, we can use linear.comparison to calculate various measures of association strength. Measures of association strength also can be calculated directly from values in the ANOVA table.

# Three measures of association strength for the first contrast
myC.res[[1]]$contrast
c(myC.res[[1]]$R2.alerting,
  myC.res[[1]]$R2.effect.size,
  myC.res[[1]]$R2.contrast)

summary(L4.aov.01,split=list(group=list(c1=1,c2=2,c3=3)))
# values taken from ANOVA table
(c1.R2.alerting = 1475/1515) # proportion of between-group SS
(c1.R2.effectsize = 1475/(1515+9830) ) # proportion of SS-total
(c1.R2.contrast = 1475/(1475+9830) ) # proportion of (SS-contrast + SS-residual)

3 Divided Attention Experiment

An experiment measured the effect of dividing attention on reaction times in a visual detection task. Performance was measured in five groups of 12 participants while they simultaneously completed an auditory detection task. The difficulty of the auditory detection task was varied across the five groups by presenting the the auditory target (i.e., a pure tone) in higher levels of background noise. The experimenters hypothesized that performance in the visual task would vary non-monotonically with the level of background noise. In other words, the prediction is that visual performance will be best at an intermediate level of noise. The data are stored in the data frame visRT which contains the numeric variable, RT, and the ordered factor, noise.

sapply(visRT,class)
## $noise
## [1] "ordered" "factor" 
## 
## $RT
## [1] "numeric"
summary(visRT)
##  noise         RT      
##  n1:12   Min.   :-162  
##  n2:12   1st Qu.: 261  
##  n3:12   Median : 380  
##  n4:12   Mean   : 423  
##  n5:12   3rd Qu.: 597  
##          Max.   :1161

Here is a plot of the data:

  1. Use ANOVA to evaluate the differences among the means. Is the difference among group means statistically significant?

  2. Explain why the ANOVA in the previous question is or is not an appropriate test of the experimenters’ hypothesis about the effect of noise level on visual performance.

  3. What type of variable is noise? What contrasts does R use for that variable?

  4. Evaluate the linear and quadratic trends of RT across the levels of noise. (For this analysis, you may assume that the trend analysis was planned, and that the successive levels of noise differ by a constant amount.)

  5. Now perform the analysis under the assumption that we decided to analyze the trends only after looking at the data. Explain how the results differ from the ones obtained in the previous question.

  1. Create a set of contrast weights to evaluate the hypothesis that mean RT in noise levels 1-4 equals the mean RT in noise level 5. Perform that linear contrast.

  2. Now perform the contrast assuming it was done after looking at the data.