1 Initialize R

Initialize R with the following commands.

options(contrasts=c("contr.sum","contr.poly")) # IMPORTANT!!
load(file=url('http://pnb.mcmaster.ca/bennett/psy710/datasets/hw3.rda'))

The options command sets up R to define ANOVA effects using the sum-to-zero constraint. The load command loads the data frames df0, df1, and df3.
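As a small illustration of the sum-to-zero constraint, the sketch below prints the contrast matrix that contr.sum generates for a hypothetical three-level factor. The factor is illustrative only and is not part of the homework data. Each column of the matrix sums to zero, which is why the group effects estimated under this coding are deviations from the grand mean.

```r
# Contrast matrix for a hypothetical 3-level factor under sum-to-zero coding
cm <- contr.sum(3)
print(cm)

# Every column sums to zero, so the coded group effects sum to zero as well
colSums(cm)
```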

Answer the questions in Section 2 and submit your answers as a script file on Avenue 2 Learn. Make sure to begin your script file with the following lines:

# PSYCH 710 Lab 3 Homework
# Script File
# SEP-2023
# Your Name: <<Your name here>>
# Student ID: <<Your ID here>>
# Collaborators: <<Names of your collaborators here>>

Also, make sure that text that is not an R command is preceded by a comment symbol (#). For example, you can insert questions or comments among your commands like this:

# The following command doesn't work... not sure why...
# ttest(x=g1,y=g2) # was trying to do a t test

2 Questions

The data frames df0 and df1 each contain two numeric variables, X and Y. Use df0 and df1 to answer questions 1-4. Unless stated otherwise, you may assume that the data are consistent with the assumptions of the t test, and that alpha is 0.05.

  1. Use a 2-sample (independent groups) t test to evaluate the hypothesis of no difference between X and Y means for both df0 and df1. Compute Cohen’s d for each t test.
library(effectsize)
with(df0,t.test(X,Y))
## 
##  Welch Two Sample t-test
## 
## data:  X and Y
## t = -1.6, df = 38, p-value = 0.1
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.402   1.402
## sample estimates:
## mean of x mean of y 
##       100       105
cohens_d(x="X",y="Y",data=df0,paired=FALSE)
## Cohen's d |        95% CI
## -------------------------
## -0.50     | [-1.13, 0.13]
## 
## - Estimated using pooled SD.
with(df1,t.test(X,Y))
## 
##  Welch Two Sample t-test
## 
## data:  X and Y
## t = -1.6, df = 38, p-value = 0.1
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.402   1.402
## sample estimates:
## mean of x mean of y 
##       100       105
cohens_d(x="X",y="Y",data=df1,paired=FALSE)
## Cohen's d |        95% CI
## -------------------------
## -0.50     | [-1.13, 0.13]
## 
## - Estimated using pooled SD.
  2. Use a paired-sample t test to evaluate the hypothesis of no difference between X and Y means for both df0 and df1. Compute Cohen’s d for each t test.
with(df0,t.test(X,Y,paired=T))
## 
##  Paired t-test
## 
## data:  X and Y
## t = -2.9, df = 19, p-value = 0.009
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -8.625 -1.375
## sample estimates:
## mean difference 
##              -5
cohens_d(x="X",y="Y",data=df0,paired=TRUE)
## Cohen's d |         95% CI
## --------------------------
## -0.65     | [-1.12, -0.16]
with(df1,t.test(X,Y,paired=T))
## 
##  Paired t-test
## 
## data:  X and Y
## t = -1.7, df = 19, p-value = 0.1
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -11.279   1.279
## sample estimates:
## mean difference 
##              -5
cohens_d(x="X",y="Y",data=df1,paired=TRUE)
## Cohen's d |        95% CI
## -------------------------
## -0.37     | [-0.82, 0.09]
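The two versions of Cohen's d reported for the independent-groups and paired analyses can also be computed by hand. The sketch below assumes the definitions that match the output above: the pooled-SD d for independent groups, and the mean difference divided by the SD of the differences for paired scores. The function names and the toy vectors are hypothetical and are not part of the homework data or of the effectsize package.

```r
# Cohen's d for two independent groups, using the pooled standard deviation
cohens_d_pooled <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  sd_pooled <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / sd_pooled
}

# Cohen's d for paired scores: mean difference over the SD of the differences
cohens_d_paired <- function(x, y) {
  d <- x - y
  mean(d) / sd(d)
}

# Toy vectors (hypothetical), just to show the calls
x_toy <- c(1, 2, 3)
y_toy <- c(2, 4, 3)
cohens_d_pooled(x_toy, y_toy)
cohens_d_paired(x_toy, y_toy)
```

With the homework data, the same functions would be called as cohens_d_pooled(df0$X, df0$Y) and cohens_d_paired(df0$X, df0$Y).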
  3. Were the effects of going from a 2-sample t test to a paired-sample t test the same for df0 and df1? Why or why not?
## No, the effect of using a paired-sample t test was much larger for df0 than for df1. The effect differed because the correlation between X and Y was much greater in df0 (r=0.7) than in df1 (r=0.1). Pairing reduces the variance of the difference scores when X and Y are positively correlated, so the benefits of pairing are much greater in df0 than in df1.
cor(df0$X,df0$Y)
## [1] 0.7
cor(df1$X,df1$Y)
## [1] 0.1
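The link between the correlation and the benefit of pairing follows from the identity var(X - Y) = var(X) + var(Y) - 2 cov(X, Y): the larger the positive covariance, the smaller the variance of the differences and the smaller the standard error of the paired test. The sketch below checks the identity on simulated data. The seed, sample size, means, and target correlation are assumptions chosen for illustration and are not the homework data.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
n   <- 20                          # hypothetical number of pairs
rho <- 0.7                         # hypothetical target correlation
x <- rnorm(n, mean = 100, sd = 10)
z <- rnorm(n, mean = 0, sd = 10)
y <- 105 + rho * (x - 100) + sqrt(1 - rho^2) * z

# Variance of the difference scores, computed directly and via the identity
var_direct   <- var(x - y)
var_identity <- var(x) + var(y) - 2 * cov(x, y)
all.equal(var_direct, var_identity)
```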
  4. Perform an equivalence test to determine if the difference between the paired values X and Y in df0 is equivalent to zero. For this test, define the equivalence bounds as being between ±6. Your answer should include a description of the equivalence test’s null and alternative hypotheses.
# equivalence H0: true diff < LOWER.BOUND OR true diff > UPPER.BOUND
# equivalence H1: LOWER.BOUND <= true diff <= UPPER.BOUND

LOWER.BOUND <- -6
UPPER.BOUND <- 6

with(df0,t.test(X,Y,paired=T,mu=0,conf.level=0.9))
## 
##  Paired t-test
## 
## data:  X and Y
## t = -2.9, df = 19, p-value = 0.009
## alternative hypothesis: true mean difference is not equal to 0
## 90 percent confidence interval:
##  -7.995 -2.005
## sample estimates:
## mean difference 
##              -5
# are the edges of the 90% CI between the lower and upper bounds?
-7.995 > LOWER.BOUND  # FALSE
## [1] FALSE
-2.005 < UPPER.BOUND # TRUE
## [1] TRUE

# TOST Lower Bound:
with(df0,t.test(X,Y,paired=T,mu=LOWER.BOUND,alternative="greater")) # p = 0.29
## 
##  Paired t-test
## 
## data:  X and Y
## t = 0.58, df = 19, p-value = 0.3
## alternative hypothesis: true mean difference is greater than -6
## 95 percent confidence interval:
##  -7.995    Inf
## sample estimates:
## mean difference 
##              -5
# TOST Upper Bound:
with(df0,t.test(X,Y,paired=T,mu=UPPER.BOUND,alternative="less")) # p < 0.001
## 
##  Paired t-test
## 
## data:  X and Y
## t = -6.4, df = 19, p-value = 2e-06
## alternative hypothesis: true mean difference is less than 6
## 95 percent confidence interval:
##    -Inf -2.005
## sample estimates:
## mean difference 
##              -5

# do not reject equivalence H0 (true diff < LOWER.BOUND) OR (true diff > UPPER.BOUND)
# EQUIVALENCE TESTS:
library(TOSTER)
t_TOST(x=df0$X,
       y=df0$Y,
       paired=TRUE,
       low_eqbound=LOWER.BOUND,
       high_eqbound=UPPER.BOUND,
       eqbound_type = "raw")
## 
## Paired t-test
## 
## The equivalence test was non-significant, t(19) = 0.58, p = 0.29
## The null hypothesis test was significant, t(19) = -2.89, p < 0.01
## NHST: reject null significance hypothesis that the effect is equal to zero 
## TOST: don't reject null equivalence hypothesis
## 
## TOST Results 
##                  t df p.value
## t-test     -2.8868 19   0.009
## TOST Lower  0.5774 19   0.285
## TOST Upper -6.3509 19 < 0.001
## 
## Effect Sizes 
##               Estimate     SE               C.I. Conf. Level
## Raw            -5.0000 1.7321 [-7.9949, -2.0051]         0.9
## Hedges's g(z)  -0.6196 0.2441  [-1.0017, -0.223]         0.9
## Note: SMD confidence intervals are an approximation. See vignette("SMD_calcs").
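## Only the upper-bound one-sided test is significant (alpha=.05); the lower-bound test is not (p = 0.29). Because both one-sided tests must be rejected to claim equivalence, I do NOT reject the null hypothesis that the true mean difference between X and Y is less than -6 OR greater than +6.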
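The two routes used in this answer, the pair of one-sided t tests and the 90% confidence interval check, lead to the same decision. The sketch below wraps both in a small base-R helper. The function name tost_paired and the toy data are hypothetical; the helper is not part of TOSTER and is meant only to show the logic.

```r
# Paired TOST by hand: two one-sided t tests plus the matching CI check
tost_paired <- function(x, y, low, high, alpha = 0.05) {
  p_lower <- t.test(x, y, paired = TRUE, mu = low,
                    alternative = "greater")$p.value
  p_upper <- t.test(x, y, paired = TRUE, mu = high,
                    alternative = "less")$p.value
  # A (1 - 2*alpha) CI lying inside the bounds gives the same decision
  ci <- t.test(x, y, paired = TRUE, conf.level = 1 - 2 * alpha)$conf.int
  list(p_lower = p_lower,
       p_upper = p_upper,
       equivalent_by_tests = max(p_lower, p_upper) < alpha,
       equivalent_by_ci = ci[1] > low && ci[2] < high)
}

# Toy paired data (hypothetical) with differences very close to zero
x_demo <- c(1, 2, 3, 4, 5)
y_demo <- x_demo + c(0.1, -0.1, 0.2, -0.2, 0)
res <- tost_paired(x_demo, y_demo, low = -6, high = 6)
res$equivalent_by_tests
res$equivalent_by_ci
```

For the homework data the call would be tost_paired(df0$X, df0$Y, low = LOWER.BOUND, high = UPPER.BOUND).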

The data frame df3 contains the results of a study that measured performance on a cognitive test in children in grades 1 through 5. The data frame contains a factor, grade, and a numeric variable, score. Use df3 to answer questions 5-8. Unless stated otherwise, you may assume that the data are consistent with the assumptions of the F test, and that alpha is 0.05.

  5. Create a figure that plots the mean score as a function of grade.
plot(x=1:5,
     y=with(df3,tapply(score,grade,mean)),
     type="b",
     ylim=c(90,106),
     xlab="Grade",
     ylab="Score")

  6. Conduct an ANOVA to examine the hypothesis that the means do not vary among grades. Your answer should include the ANOVA table, a description of the null and alternative hypotheses, and your conclusion regarding the null hypothesis.
aov.01 <- aov(score~grade,data=df3)
summary(aov.01)
##             Df Sum Sq Mean Sq F value Pr(>F)  
## grade        4   1361     340    2.26  0.072 .
## Residuals   70  10551     151                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## The null hypothesis is that the groups were selected from populations with equal means. The alternative hypothesis is that not all of the group means are equal. Alternatively, we could say that the null hypothesis is that all of the group effects (i.e., the alpha's estimated by ANOVA) are zero, and that the alternative is that they are not all equal to zero. Using an alpha of 0.05, the omnibus F test is not significant (F(4,70) = 2.26, p = 0.072), and therefore we do not reject the null hypothesis.
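The F value in the ANOVA table is the ratio of the between-group mean square to the within-group mean square. The sketch below reproduces that calculation by hand and compares it with aov. It uses simulated data with five groups of 15 observations, an assumption chosen to match the degrees of freedom in the table above; the seed and the simulated scores are hypothetical and are not df3.

```r
set.seed(2)                                  # arbitrary seed
g     <- factor(rep(1:5, each = 15))         # hypothetical: 5 groups of 15
score <- rnorm(75, mean = 100, sd = 12)      # simulated scores, not df3

group_means <- tapply(score, g, mean)
group_n     <- tapply(score, g, length)

# Sums of squares and degrees of freedom
ss_between <- sum(group_n * (group_means - mean(score))^2)
ss_within  <- sum((score - ave(score, g))^2) # ave() returns each row's group mean
df_between <- nlevels(g) - 1
df_within  <- length(score) - nlevels(g)

# F is the ratio of the two mean squares
F_by_hand <- (ss_between / df_between) / (ss_within / df_within)
F_aov     <- summary(aov(score ~ g))[[1]][["F value"]][1]
all.equal(F_by_hand, F_aov)
```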
  7. ANOVA assumes that the within-group variance is constant. Is that assumption reasonable for this case?
bartlett.test(score~grade,data=df3)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  score by grade
## Bartlett's K-squared = 6.6, df = 4, p-value = 0.2
# The Bartlett test was not significant (p = 0.2), and therefore we do not reject the null hypothesis of equal group variances. Therefore, the homogeneity assumption does not appear to be unreasonable in this case.
  8. Regardless of how you answered question 7, evaluate the effect of grade on score using an alternative to ANOVA that does not assume that variance is constant across grades.
oneway.test(score~grade,data=df3)
## 
##  One-way analysis of means (not assuming equal variances)
## 
## data:  score and grade
## F = 2.8, num df = 4, denom df = 35, p-value = 0.04

# We reject the null hypothesis of no difference among group means.
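
By default oneway.test applies the Welch correction, which is why the denominator degrees of freedom above (35) differ from the 70 in the ANOVA table. Setting var.equal=TRUE gives the classical F test instead. The sketch below compares the two on simulated data with unequal group standard deviations; the seed, group sizes, and standard deviations are assumptions for illustration and are not df3.

```r
set.seed(3)                                   # arbitrary seed
g     <- factor(rep(1:5, each = 15))          # hypothetical: 5 groups of 15
score <- rnorm(75, mean = 100,
               sd = rep(c(6, 9, 12, 15, 18), each = 15))  # unequal SDs

welch   <- oneway.test(score ~ g)                    # Welch correction (default)
classic <- oneway.test(score ~ g, var.equal = TRUE)  # classical F test

# Compare the F statistics and the degrees of freedom
welch$statistic
welch$parameter
classic$statistic
classic$parameter
```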