1 Initialize R

Clear R’s workspace and set up your working directory, then initialize R with the following command.

options(contrasts=c("contr.sum","contr.poly"),width=100) # sum-to-zero contrasts & 100-character output. IMPORTANT!

2 Wide & long data frames

In this section we download two numeric vectors and use them to create data frames in the so-called wide and long formats. The numeric vectors are named cog.at.risk and cog.control and can be loaded into R with the following command:

load(file=url('http://pnb.mcmaster.ca/bennett/psy710/datasets/cogTest.rda'))

First combine the two vectors into a single, wide-format data frame consisting of two numeric variables: risk and control.

# store vectors in wide-format data frame:
cogWide.df <- data.frame(risk=cog.at.risk,control=cog.control)
summary(cogWide.df)
##       risk          control     
##  Min.   : 55.6   Min.   : 80.0  
##  1st Qu.: 87.8   1st Qu.: 87.5  
##  Median : 95.9   Median : 93.8  
##  Mean   : 94.0   Mean   :100.0  
##  3rd Qu.:104.2   3rd Qu.:113.2  
##  Max.   :115.9   Max.   :128.8
head(cogWide.df) # inspect first several rows


Next, we use the melt command in the reshape2 package to transform cogWide.df into a long-format data frame, cogLong.df:

# install.packages("reshape2")
library(reshape2)
cogLong.df <- melt(cogWide.df,
                   measure.vars=c("risk","control"), # dependent vars
                   value.name="score", # new dependent var
                   variable.name="group") # new grouping var
# head(cogLong.df) # inspect first six rows; compare to last 6 rows
# tail(cogLong.df) # last six rows

The gather command in tidyr does the same thing:

# install.packages("tidyr")
library(tidyr)
cogLong.df <- gather(cogWide.df,key="group",value="score",risk,control)
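
In current versions of tidyr, gather is superseded by pivot_longer. A minimal equivalent is sketched below; note that, unlike melt, it returns a tibble in which group is a character vector (the same is true of gather unless you set factor_key=TRUE), so we convert group to a factor:

cogLong.df <- pivot_longer(cogWide.df,cols=c(risk,control),
                           names_to="group",values_to="score")
cogLong.df$group <- factor(cogLong.df$group,levels=c("risk","control"))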

Next, compare the summaries of the wide and long data frames.

The wide data frame contains two variables, risk and control, that are separated into two columns, so each row contains two dependent measures (i.e., two test scores). The long data frame also contains two variables: a numeric variable, score, which contains the test scores from the risk and control groups, and a factor, group, which consists of two levels (i.e., risk and control). Unlike the wide-format data frame, the long data frame contains a single dependent measure in each row.

summary(cogWide.df)
##       risk          control     
##  Min.   : 55.6   Min.   : 80.0  
##  1st Qu.: 87.8   1st Qu.: 87.5  
##  Median : 95.9   Median : 93.8  
##  Mean   : 94.0   Mean   :100.0  
##  3rd Qu.:104.2   3rd Qu.:113.2  
##  Max.   :115.9   Max.   :128.8
summary(cogLong.df)
##      group        score      
##  risk   :20   Min.   : 55.6  
##  control:20   1st Qu.: 87.5  
##               Median : 94.8  
##               Mean   : 97.0  
##               3rd Qu.:107.9  
##               Max.   :128.8

The difference between the formats has important implications for how you analyze and graph data. For example, the following code illustrates how the means of risk and control are computed for the two formats.

# wide format
colMeans(cogWide.df)
##    risk control 
##      94     100
# apply(X = cogWide.df,MARGIN=2,FUN = "mean") # same results as colMeans
# long format
with(cogLong.df,tapply(score,group,mean))
##    risk control 
##      94     100
# aggregate(score~group,cogLong.df,mean) # same result

During the term you will find that some things are more easily done with wide-format data frames, but that many others are more easily done with long-format data frames. For example, it is arguably easier to use cor.test and t.test with wide-format data, but it is much easier to perform ANOVAs with long-format data. So you should become comfortable with both formats.
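
For example, a two-sample \(t\) test can be run on either format (a quick sketch; the formula interface requires the long format):

t.test(cogWide.df$risk,cogWide.df$control) # wide format: two separate vectors
t.test(score~group,data=cogLong.df) # long format: formula interface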

You can read more about converting between wide and long formats at the following websites:

  1. www.cookbook-r.com/Manipulating_data/Converting_data_between_wide_and_long_format/

  2. ademos.people.uic.edu/Chapter8.html

  3. www.r-bloggers.com/2019/07/how-to-reshape-a-dataframe-from-wide-to-long-or-long-to-wide-format/

  4. www.r-bloggers.com/2016/09/reshape-r-package-reshape2-melt-and-cast/

  5. www.r-bloggers.com/2016/06/how-to-reshape-data-in-r-tidyr-vs-reshape2/

2.1 Acuity experiment data

An experiment is conducted that measures visual acuity in groups of infants ranging from 1 to 5 months of age. Load the data into R with the following command:

load(file=url('http://pnb.mcmaster.ca/bennett/psy710/datasets/acuityWide.rda'))

Tasks:

  1. Identify the number, names, and types of variables in acuityWide.df.

The data from 10 infants in 5 age groups (i.e., 1-5 months) are stored in the wide-format data frame acuityWide.df, which contains five numeric variables named m1, m2, m3, m4, and m5.

dim(acuityWide.df) # [rows,columns]
## [1] 10  5
sapply(acuityWide.df,class) # list types of variables
##        m1        m2        m3        m4        m5 
## "numeric" "numeric" "numeric" "numeric" "numeric"
summary(acuityWide.df) # summary of each variable
##        m1              m2              m3              m4              m5       
##  Min.   : 7.89   Min.   : 8.34   Min.   : 8.18   Min.   : 9.66   Min.   : 8.94  
##  1st Qu.: 8.44   1st Qu.: 9.38   1st Qu.: 9.48   1st Qu.: 9.70   1st Qu.: 9.42  
##  Median : 9.96   Median : 9.75   Median :10.10   Median : 9.80   Median :10.81  
##  Mean   : 9.45   Mean   : 9.75   Mean   :10.00   Mean   :10.20   Mean   :10.50  
##  3rd Qu.:10.22   3rd Qu.:10.30   3rd Qu.:10.74   3rd Qu.:10.37   3rd Qu.:11.31  
##  Max.   :10.76   Max.   :11.09   Max.   :11.12   Max.   :12.29   Max.   :11.96
  2. Convert acuityWide.df into a long-format data frame named acuityLong.df. Store the five ages in a single variable named age and the acuity measures in a single variable named acuity.
library(reshape2)
acuityLong.df <- melt(acuityWide.df,
                   measure.vars=c("m1","m2","m3","m4","m5"), # dependent vars
                   value.name="acuity", # new dependent var
                   variable.name="age") # new grouping var
names(acuityLong.df)
## [1] "age"    "acuity"
  3. Use summary to get a summary of the variables in the new data frame. Verify that age is a factor and acuity is numeric.
sapply(acuityLong.df,class)
##       age    acuity 
##  "factor" "numeric"
summary(acuityLong.df)
##  age         acuity     
##  m1:10   Min.   : 7.89  
##  m2:10   1st Qu.: 9.38  
##  m3:10   Median : 9.94  
##  m4:10   Mean   : 9.98  
##  m5:10   3rd Qu.:10.76  
##          Max.   :12.29
  4. Calculate the mean acuity for each age.
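One way, following the tapply approach from Section 2 (compare the result with the Mean row of summary(acuityWide.df) above):
with(acuityLong.df,tapply(acuity,age,mean))
# aggregate(acuity~age,acuityLong.df,mean) # same result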
  5. Plot a box plot of acuity for each age.
op <- par(no.readonly = TRUE) # save current graphics settings
par(cex=1.5,cex.lab=1.5,cex.main=2,mar=c(5,5,3,1)+0.1,mgp=c(3,1,0))
boxplot(acuity~age,
        data=acuityLong.df,
        xlab="Age (months)",
        names=c("1","2","3","4","5"),
        main="Acuity Data")

par(op) # restore settings
# boxplot(acuity~age,data=acuityLong.df) # minimal version, without the par() adjustments

3 ANOVA

The analysis of variance (ANOVA) is a method for evaluating differences among 3 or more group means. In the following examples, the numeric dependent variable score and an independent variable, the four-level factor condition, are contained in the data frame theAOVdata.

load(file=url('http://pnb.mcmaster.ca/bennett/psy710/datasets/theAOVdataRL3.rda'))
summary(theAOVdata)
##  condition     score      
##  c1:20     Min.   : 54.4  
##  c2:20     1st Qu.: 87.2  
##  c3:20     Median : 98.8  
##  c4:20     Mean   : 98.9  
##            3rd Qu.:112.9  
##            Max.   :147.2

Task: Plot the distributions with a box plot for each group.

op <- par(no.readonly=TRUE) # save current graphics settings
par(cex=1.5,cex.lab=1.5,cex.main=2,mar=c(5,5,3,1)+0.1,mgp=c(3,1,0))
boxplot(score~condition,data=theAOVdata,xlab="Condition",ylab="Score",main="theAOVdata")

par(op)

In R, ANOVA is performed with the lm and aov commands. The formula score~1+condition defines our linear model: score is modeled as the sum of an intercept (1) and an effect of condition. Note that the intercept is added by default, and so we would obtain the same results if we used the formula score~condition. The aov and lm commands compute the best-fitting (least-squares) coefficients for our model. Use aov and lm to perform an ANOVA (i.e., compute the least-squares estimates of the coefficients of our linear model), and then display the results in a table using the anova command.


options(contrasts=c("contr.sum","contr.poly")) # important!
# contr.sum means we are using sum-to-zero definition of an "effect"

# using aov()
aov.res.01 <- aov(score~1+condition,data=theAOVdata)
class(aov.res.01)
## [1] "aov" "lm"
anova(aov.res.01) 
# summary(aov.res.01) # same as anova()

# using lm()
lm.res.01 <- lm(score~1+condition,data=theAOVdata)
class(lm.res.01)
## [1] "lm"
anova(lm.res.01)
# summary(lm.res.01) # not the same as anova() !

Question: The anova tables contain an \(F\) test and a \(p\) value for condition. What hypothesis is being evaluated by this \(F\) test?

Answer: The \(F\) test is evaluating the null hypothesis that all of the condition means are equal or, equivalently, that all of the effects (i.e., \(\alpha\)’s) of condition are zero. When the null hypothesis is true, all of the data are sampled from populations with the same mean, and therefore the variation among the group means is due solely to sampling variability.

3.1 coefficients

We can view the best-fitting coefficients with the coef and dummy.coef commands.

coef(aov.res.01) # coefficients for 3 groups listed here
## (Intercept)  condition1  condition2  condition3 
##      98.911      -2.647      -8.552       8.637
dummy.coef(aov.res.01) # coefficients for all 4 groups listed here... why?
## Full coefficients are 
##                                            
## (Intercept):     98.91                     
## condition:          c1     c2     c3     c4
##                 -2.647 -8.552  8.637  2.562

coef(lm.res.01) # coefficients for 3 groups listed here
## (Intercept)  condition1  condition2  condition3 
##      98.911      -2.647      -8.552       8.637
dummy.coef(lm.res.01) # coefficients for all 4 groups listed here... why?
## Full coefficients are 
##                                            
## (Intercept):     98.91                     
## condition:          c1     c2     c3     c4
##                 -2.647 -8.552  8.637  2.562

We are using the sum-to-zero constraint on our \(\alpha\)’s, which means that the group effects (\(\alpha\)’s) must sum to zero. Therefore, our ANOVA estimates only the intercept and three group effects (the fourth is determined by the sum-to-zero constraint). The coef command lists the coefficients estimated by the ANOVA, whereas dummy.coef lists the effects for all groups.
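
As a quick check, the omitted c4 effect can be recovered from the constraint:

-sum(coef(aov.res.01)[2:4]) # equals 2.562, the c4 effect listed by dummy.coef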

For an lm object, but not an aov object, the summary command performs a \(t\) test on each coefficient.

summary(lm.res.01) # t test on each coefficient
## 
## Call:
## lm(formula = score ~ 1 + condition, data = theAOVdata)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -38.51 -10.52   0.66  11.49  45.76 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    98.91       1.87   52.81   <2e-16 ***
## condition1     -2.65       3.24   -0.82   0.4172    
## condition2     -8.55       3.24   -2.64   0.0102 *  
## condition3      8.64       3.24    2.66   0.0095 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16.8 on 76 degrees of freedom
## Multiple R-squared:  0.131,  Adjusted R-squared:  0.0971 
## F-statistic: 3.83 on 3 and 76 DF,  p-value: 0.013

Question: What null hypothesis is being evaluated by each \(t\) test?

Answer: Each \(t\) test is evaluating the null hypothesis that the coefficient was drawn from a population with \(\mu=0\). Note that these \(t\) tests differ from the \(F\) test performed in ANOVA, which evaluates the hypothesis that all of the effects/coefficients are zero.
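For example, the \(t\) test on condition2 evaluates \(H_0: \alpha_2 = 0\), i.e., that the effect of group c2 alone is zero.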

Because we are using the sum-to-zero constraint on our \(\alpha\)’s, the effect of group \(j\) is defined as the difference between the mean of group \(j\) and the unweighted mean (i.e., the average of the group means), and the intercept corresponds to the unweighted mean. We have equal \(n\) in each group, so the unweighted mean equals the grand mean of the scores.

Task: Calculate the group means, the grand mean, the unweighted mean, and the effect for each group (i.e., the difference between each group mean and the unweighted mean). Verify that these \(\alpha\)’s are the same as those estimated by aov and/or lm.

dummy.coef(aov.res.01) 
## Full coefficients are 
##                                            
## (Intercept):     98.91                     
## condition:          c1     c2     c3     c4
##                 -2.647 -8.552  8.637  2.562
( grpMeans <- with(theAOVdata,tapply(score,condition,mean)) ) # group means
##     c1     c2     c3     c4 
##  96.26  90.36 107.55 101.47
( grandMean <- mean(theAOVdata$score) ) # equals Intercept
## [1] 98.91
( unweightedMean <- mean(grpMeans)) # equal n so this equals grand mean & intercept
## [1] 98.91
grpMeans - grandMean # equals alphas/coefficients
##     c1     c2     c3     c4 
## -2.647 -8.552  8.637  2.562

3.2 treatment definition of effects

It is possible to use alternative definitions of effects. For example, we could define one of our conditions (c1) as a baseline condition, the others (i.e., c2, c3, & c4) as so-called treatment conditions, and the effect of a treatment as the difference between the treatment group mean and the baseline group mean. The treatment definition of effects, which is the default in R, changes the values of the best-fitting coefficients estimated by ANOVA: the intercept equals the mean of the baseline group, and the three coefficients correspond to the differences between each treatment group and the baseline group.

options(contrasts=c("contr.treatment","contr.poly")) # important!
aov.res.02 <- aov(score~condition,data=theAOVdata) # recalculate least-squares coefficients
coef(aov.res.02)
## (Intercept) conditionc2 conditionc3 conditionc4 
##      96.264      -5.905      11.283       5.208
dummy.coef(aov.res.02) 
## Full coefficients are 
##                                            
## (Intercept):     96.26                     
## condition:          c1     c2     c3     c4
##                  0.000 -5.905 11.283  5.208
grpMeans - grpMeans[1]
##     c1     c2     c3     c4 
##  0.000 -5.905 11.283  5.208
options(contrasts=c("contr.sum","contr.poly")) # restore sum-to-zero definition

Which definition of an effect is best? It depends on the circumstances; in some sense the choice is arbitrary. Fortunately, the \(F\) test in a 1-way ANOVA always evaluates the same null hypothesis (\(H_0: \alpha_j = 0\) for all \(j\)) and yields the same \(p\) value regardless of how we define \(\alpha_j\). (However, this is not true for an unbalanced factorial ANOVA, a point that we will return to later in the course.)

anova(aov.res.02) # same F test and p value as anova(aov.res.01)

3.3 anova table

The ANOVA table is listed with the anova command. The first line in the ANOVA table lists the change in goodness-of-fit (i.e., SS-residuals) when we force all of the group effects to equal zero. The \(F\) test evaluates the null hypothesis that all of the group effects are zero or, equivalently, that the four conditions are samples from populations with the same mean. Essentially, we are comparing the goodness-of-fit (i.e., SS-residuals) of our full model, which includes an intercept and an effect for each group, with the fit of a reduced model that includes only an intercept. We expect the full model to provide a better fit because it has more parameters. The question is whether the change in goodness-of-fit is unusually large assuming that the true values of the coefficients are zero.

anova(aov.res.01) # standard anova table:
# anova(lm.res.01) # gives same results

In this case, the \(F\) test is significant (\(p<0.05\)), and therefore we may reject the null hypothesis that the effects are all zero in favor of the alternative hypothesis that they are not all zero.

The data presented in an ANOVA table can be interpreted as a comparison of the goodness-of-fit obtained by a full model, which includes an intercept and effects of condition, and a reduced model that includes only an intercept (i.e., the condition effects are set to zero). To illustrate this point, we will analyze theAOVdata with a reduced linear model that contains only an intercept (i.e., no effect of condition), and compare the goodness-of-fit obtained with the full and reduced models.

# compare directly to reduced model:
aov.reduced.model <- aov(score~1,theAOVdata) # intercept only
anova(aov.reduced.model,aov.res.01) # compare 2 models (compare SS to ANOVA table)

Now calculate the difference between the sum-of-squared residuals for each model. Can you find this difference in the original ANOVA table?

# calculate sum of squared residuals from each model
SS.resid.reduced <- sum( residuals(aov.reduced.model)^2 )
SS.resid.full <- sum( residuals(aov.res.01)^2 )
SS.resid.reduced - SS.resid.full # difference
## [1] 3226

Answer: The difference between SS residuals is listed as the SS condition in the original ANOVA table.
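
You can extract this value directly from the ANOVA table, since anova returns a data frame:

anova(aov.res.01)["condition","Sum Sq"] # 3226, the difference computed above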

Note that the Sum of Squares (SS) value for condition in the full ANOVA table corresponds to the difference between the SS residuals in the full and reduced models, and therefore is the amount of variation in our dependent variable that is accounted for by the condition. Also note that the degrees of freedom for condition corresponds to the number of coefficients for that factor in the full model, and equals the number of levels in condition minus 1. The degrees of freedom for the residuals in the full model corresponds to the total number of observations in the data set (80) minus the number of parameters in the full model, which in this case is four (i.e., the intercept and 3 condition effects).
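
A quick check on these degrees of freedom:

nlevels(theAOVdata$condition) - 1 # df for condition: 4 levels minus 1
nrow(theAOVdata) - 4 # df for residuals: 80 observations minus 4 parameters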

3.4 association strength & effect size

Association strength and effect size characterize the magnitude of the association between the independent and dependent variables in ways that (in theory) do not depend on sample size. Association strength refers to the proportion of variance that is accounted for by the grouping variable. One measure, eta-squared (\(\eta^2\)), is simply SS-group divided by SS-total. Another common measure, omega-squared (\(\omega^2\)), is a less biased estimate of the value of \(\eta^2\) in the population. Omega-squared typically is less than eta-squared, though the difference is small when sample size is large.

# install.packages("effectsize") # use this if you don't have effectsize on your computer
library(effectsize)
eta_squared(aov.res.01) # in effectsize package
omega_squared(aov.res.01) # in effectsize package

Effect size is a standardized measure of differences among means. Cohen’s \(d\), for example, expresses the difference between two group means relative to the pooled standard deviation. Cohen’s \(f\) is a generalization of Cohen’s \(d\) to cases of 3 or more means.

# install.packages("effectsize") # use this if you don't have effectsize on your computer
# library(effectsize)
cohens_f(aov.res.01)
cohens_f_squared(aov.res.01) # f-squared also is common

One interpretation of Cohen’s \(f\) is that it represents the average standardized deviation of group means from the grand mean:

MS.residuals <- 281 # from ANOVA table
( within.grp.sd <- sqrt(MS.residuals) )
## [1] 16.76
# sqrt( mean( with(theAOVdata,tapply(score,condition,var)) ) ) # equals sqrt(MS.residuals) with equal n
( Cohen_d_squared <- ((grpMeans-grandMean)/within.grp.sd)^2 ) # cohen d^2 for each group
##      c1      c2      c3      c4 
## 0.02493 0.26025 0.26544 0.02335
sqrt( mean( Cohen_d_squared ) ) # approximate Cohen's f 
## [1] 0.3788

You can get measures of association strength and effect size directly from the values in an ANOVA table, which is useful when you are trying to estimate these quantities from previously published data.
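Specifically, \(\eta^2 = SS_{\text{group}} / (SS_{\text{group}} + SS_{\text{residuals}})\) and Cohen’s \(f = \sqrt{\eta^2 / (1 - \eta^2)}\), which is how the following lines compute them.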

( eta2 <- 3226/(3226+21328)) # values from anova table
## [1] 0.1314
sqrt(eta2 / (1-eta2)) # estimate of cohen's f
## [1] 0.3889

3.5 checking assumptions

The analysis of variance assumes that the residuals are distributed normally and that the variance is equal across groups. We can evaluate those assumptions with shapiro.test and bartlett.test.

shapiro.test(residuals(aov.res.01)) # H0: residuals are distributed normally
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(aov.res.01)
## W = 0.99, p-value = 0.8
bartlett.test(score~condition,theAOVdata) # H0: variance is equal across groups
## 
##  Bartlett test of homogeneity of variances
## 
## data:  score by condition
## Bartlett's K-squared = 1, df = 3, p-value = 0.8
par(cex=1.5,cex.lab=1.25,cex.main=2,mar=c(5,5,3,1)+0.1,mgp=c(3,1,0),mfrow=c(1,2))
qqnorm(residuals(aov.res.01),main=""); # graphical test of normality
qqline(residuals(aov.res.01)) 
boxplot(score~condition,theAOVdata) # graphical test of variance

par(op)

3.6 ANOVA questions

Use the data in acuityLong.df (created in Section 2.1) to answer the following questions. Make sure to use the sum-to-zero definition of effects ( options(contrasts=c("contr.sum","contr.poly")) ).

  1. Use the analysis of variance to evaluate the effect of age on acuity and list the ANOVA table.
summary(acuityLong.df) # check variable names
##  age         acuity     
##  m1:10   Min.   : 7.89  
##  m2:10   1st Qu.: 9.38  
##  m3:10   Median : 9.94  
##  m4:10   Mean   : 9.98  
##  m5:10   3rd Qu.:10.76  
##          Max.   :12.29
acuity.lm0 <- lm(acuity~age,data=acuityLong.df)
anova(acuity.lm0)
# library(knitr)
# kable(anova(acuity.lm0),col.names=c("df","SS","MS","F","p")) # print ANOVA table
  2. What null hypothesis is being evaluated by the \(F\) test?

The ANOVA table is shown above. The null hypothesis is that the effects of age are all zero. Equivalently, we can say that the population means of acuity are equal across ages.

  3. Calculate omega-squared and Cohen’s \(f\) for age. What do these values mean?
omega_squared(acuity.lm0) 
cohens_f(acuity.lm0)

Omega-squared is the proportion of variance in acuity that is accounted for, or explained, by age. Cohen’s \(f\) can be thought of as an estimate of the average, standardized difference between the group means and the grand mean.

  4. Check the normality and homogeneity of variance assumptions.
# check normality:
# shapiro.test(acuityLong.df$acuity) # not correct!
shapiro.test(residuals(acuity.lm0)) # correct (test is on residuals not raw scores)
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(acuity.lm0)
## W = 0.97, p-value = 0.3
# qqnorm(residuals(acuity.lm0));qqline(residuals(acuity.lm0))

# check variance
bartlett.test(acuity~age,data=acuityLong.df)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  acuity by age
## Bartlett's K-squared = 1.2, df = 4, p-value = 0.9
# boxplot(acuity~age,data=acuityLong.df)
  5. Show that the SS value for age is the change in the goodness-of-fit when the group effects in the full model are set to zero.
# create full and restricted models
lm.full <- acuity.lm0
lm.restricted <- lm(acuity ~ 1,data=acuityLong.df) # intercept only
sum(residuals(lm.restricted)^2) - sum(residuals(lm.full)^2)
## [1] 6.53
anova(lm.restricted,lm.full)
  6. Verify that MS-residuals in the full model equals the average within-group variance.
( error.sd <- sigma(lm.full) ) # extract the estimated error standard deviation from the fitted model
## [1] 1
( error.var <- error.sd^2 )
## [1] 1
( grpVar <- with(acuityLong.df,tapply(acuity,age,var))) # within-group variance
##  m1  m2  m3  m4  m5 
## 1.2 0.8 1.0 0.7 1.3
mean(grpVar) #  average within-group variance
## [1] 1
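
With equal \(n\) in each group, MS-residuals equals the unweighted average of the five within-group variances exactly, which is why the two values above agree.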