1 Initialize R

Initialize R with the following commands:

# Initialize R -------------------------------------

# I set the working directory to be the folder that includes this document
graphics.off() # clear all graphs
options(digits=6,width=80) # R console display parameters
options(contrasts=c("contr.sum","contr.poly") ) # set up coding scheme for factors:
op <- par(no.readonly = TRUE) # save copy of current parameters
# usually a good idea to CLEAR WORKSPACE MEMORY
# use command in RStudio "Session" menu
# load libraries into R:
library(effectsize)
library(pwr)
library(TOSTER) # optional

2 One-sample t test

Load the iq data file.

load(file=url("http://pnb.mcmaster.ca/bennett/psy710/datasets/iq.rda"))

The numeric variable iq contains IQ scores from 20 individuals. Graphically inspect the data: check for outliers, extreme skew, etc.

# Check distributions:
par(mfrow=c(1,2)); # create 2 graphs on 1 page
boxplot(iq); # box plot
qqnorm(iq);qqline(iq) # qq plot

The data were analyzed with the following command.

# The data were analyzed with the following command.
t.test(iq,mu=100) # do t test
  1. What are the null and alternative hypotheses that are being evaluated by this t test?
# What are the null and alternative hypotheses
# that are begin evaluated by this t test?
tres <- t.test(iq,mu=100) # do t test
print(tres$alternative)
## [1] "two.sided"

Answer: The null hypothesis (H0) is that the \(\mu=100\). The alternative hypothesis (H1) is \(\mu \ne 100\).

  1. Assuming alpha is 0.05, can we conclude that our sample came from a population with a mean that differed from 100?
t.test(iq,mu=100)
## 
##  One Sample t-test
## 
## data:  iq
## t = -1.874, df = 19, p-value = 0.0764
## alternative hypothesis: true mean is not equal to 100
## 95 percent confidence interval:
##   91.1297 100.4900
## sample estimates:
## mean of x 
##   95.8099

Answer: If my alpha is 0.05, then I do not reject H0. But I do NOT accept H0 because the failure to reject H0 could reflect low power.

  1. Use t.test to calculate the 95% and 90% confidence intervals for the population mean of iq. Which interval is narrower? Why?
# The t.test command lists the confidence interval with all of the other results.
# The next 2 commands list ONLY the confidence intervals:
t.test(iq,mu=100)$conf.int # default is 95%
## [1]  91.1297 100.4900
## attr(,"conf.level")
## [1] 0.95
t.test(iq,mu=100,conf.level=0.90)$conf.int # 90%
## [1] 91.9434 99.6763
## attr(,"conf.level")
## [1] 0.9

Answer: The 90% CI is narrower than the 95% CI, and because it is narrower we expect to contain the true population mean less often than the 95% CI. In fact, the 90% CI will (in the long run) contain the true value only 90% of the time instead of 95% of the time. A 90% CI is said to have lower coverage than a 95% CI.

  1. Calculate the mean and standard deviation of iq. How many standard deviations is the mean away from 100? Next, read the help page for cohens_d and then use that command to compute Cohens d, a common measure of effect size, for iq (assuming that the true value of the population mean is 100).
( mean(iq) )
## [1] 95.8099
( sd(iq) )
## [1] 10
( ( mean(iq)-100 ) / sd(iq) )
## [1] -0.419014
# library(effectsize)
# help(package="effectsize")
# ?cohens_d
cohens_d(iq,mu=100) # need to specify the H0 mean
## Cohen's d |        95% CI
## -------------------------
## -0.42     | [-0.87, 0.04]
## 
## - Deviation from a difference of 100.
# The effect size d = 0.42 

Answer: Our observed mean (95.81) is 0.42 standard deviations below the hypothesized mean (\(\mu=100\)). This value is the same as that returned by cohens_d.

  1. Statistical power refers to the probability of correctly rejecting a false null hypothesis. Assuming that our sample of 20 was drawn from a population with a true mean of 95 and a standard deviation of 10, use the command power.t.test to calculate the power of our t test to reject the false null hypothesis \(\mu=100\). (You may want to read the help page for power.t.test.)
# ?power.t.test # see help page
n <- 20 # number of subjects
power.t.test(n=n,delta=(100-95),sd=10,type="one.sample",alternative="two.sided")
## 
##      One-sample t test power calculation 
## 
##               n = 20
##           delta = 5
##              sd = 10
##       sig.level = 0.05
##           power = 0.564483
##     alternative = two.sided
# power = 0.56

# you can get the same result using Cohen's d:
(100-95)/10 # we are assuming an effect size Cohen's d = 1
## [1] 0.5
# recalculate power using Cohen's d:
power.t.test(n=n,delta=0.5,sd=1,type="one.sample",alternative="two.sided")
## 
##      One-sample t test power calculation 
## 
##               n = 20
##           delta = 0.5
##              sd = 1
##       sig.level = 0.05
##           power = 0.564483
##     alternative = two.sided
# power is 0.56 assuming Cohen's d is 0.5

5b. Now use power.t.test to calculate the sample size that we would need to attain a power of 0.8.

# specify power=0.8 and set n to NULL to
# calculate sample size need for power = 0.8
power.t.test(n=NULL,power=0.8,delta=(100-95),sd=10,type="one.sample",alternative="two.sided")
## 
##      One-sample t test power calculation 
## 
##               n = 33.3672
##           delta = 5
##              sd = 10
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
# repeat with Cohen's d
power.t.test(n=NULL,power=0.8,delta=0.5,sd=1,type="one.sample",alternative="two.sided")
## 
##      One-sample t test power calculation 
## 
##               n = 33.3672
##           delta = 0.5
##              sd = 1
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
# for power = 0.8, n = 34
  1. Use a t test to determine if the sample was drawn from a population with a mean that is less than 100.
# H0: mu >= 100; H1: mu < 100
t.test(iq,mu=100,alternative="less") # NOTE THE CONFIDENCE INTERVAL!
## 
##  One Sample t-test
## 
## data:  iq
## t = -1.874, df = 19, p-value = 0.0382
## alternative hypothesis: true mean is less than 100
## 95 percent confidence interval:
##     -Inf 99.6763
## sample estimates:
## mean of x 
##   95.8099

3 One-sample equivalence test

An equivalence test is used to determine if an effect falls between pre-defined lower and upper bounds. IQ tests are designed to have a population mean of 100 and a standard deviation of 15. Let’s assume that the smallest difference of interest is 7 (i.e., approximately 1/2 of a standard deviation). In other words, we will assume that any population mean 93 and 107 is essentially equivalent to a mean of 100.

  1. Use two one-sided tests (TOST) to evaluate the null hypothesis that the population mean is less than 93 OR greater than 107. Set alpha to 0.05 for these tests. Based on those tests, do you reject the null hypothesis that our sample was selected from a population with a mean that is equivalent to 100?
LOWER.BOUND <- 93
UPPER.BOUND <- 107
# t.test(x=iq,mu=100,alternative="two.sided") # not significantly different from 100
t.test(x=iq,mu=UPPER.BOUND,alternative="less") # significantly less than 107
## 
##  One Sample t-test
## 
## data:  iq
## t = -5.004, df = 19, p-value = 3.94e-05
## alternative hypothesis: true mean is less than 107
## 95 percent confidence interval:
##     -Inf 99.6763
## sample estimates:
## mean of x 
##   95.8099
t.test(x=iq,mu=LOWER.BOUND,alternative="greater") # not significantly greater than 93
## 
##  One Sample t-test
## 
## data:  iq
## t = 1.257, df = 19, p-value = 0.112
## alternative hypothesis: true mean is greater than 93
## 95 percent confidence interval:
##  91.9434     Inf
## sample estimates:
## mean of x 
##   95.8099

Answer: For the equivalence test, the null hypothesis is that \(\mu \le 93\) OR \(\mu \ge 107\). Our 1-tailed \(t\) for the upper bound of the equivalence zone was significant: we rejected the null hypothesis \(\mu \ge 107\) in favor of the alternative hypothesis \(\mu < 107\). However, the 1-tailed test on the lower bound was not significant: we failed to reject the null hypothesis \(\mu \le 93\). Because only one 1-tailed test was significant, we do not reject the null hypothesis that the population mean is not equivalent to 100. When reporting the statistical result, it is standard practice to report the results from the test with larger \(p\) value. In this case, we would say that the equivalence test was not significant \(t(19) =1.257\), \(p=0.11\).

We also can perform the TOST procedure using t_TOST in the TOSTER package.

# install.packages("TOSTER")
# library(TOSTER)
# ?t_TOST
#       low_eqbound=LOWER.BOUND,
#       high_eqbound=UPPER.BOUND,

LOWER.BOUND <- 93
UPPER.BOUND <- 107
t_TOST(x=iq,
       eqb=c(LOWER.BOUND,UPPER.BOUND),
       mu=100)
## 
## One Sample t-test
## 
## The equivalence test was non-significant, t(19) = 1.26, p = 0.11
## The null hypothesis test was non-significant, t(19) = -1.87p = 0.08
## NHST: don't reject null significance hypothesis that the effect is equal to 100 
## TOST: don't reject null equivalence hypothesis
## 
## TOST Results 
##                 t df p.value
## t-test     -1.874 19   0.076
## TOST Lower  1.257 19   0.112
## TOST Upper -5.004 19 < 0.001
## 
## Effect Sizes 
##            Estimate     SE               C.I. Conf. Level
## Raw         -4.1901 2.2361 [91.9434, 99.6763]         0.9
## Hedges's g  -0.4022 0.2325  [-0.766, -0.0284]         0.9
## Note: SMD confidence intervals are an approximation. See vignette("SMD_calcs").
  1. Use a two-tailed t test to evaluate the null hypothesis that the population mean IQ equals 100. However, for this this test I want you to set the parameter conf.level to 0.90 (rather than the default value of 0.95). This parameter sets the width of the confidence interval returned by t.test. What is the value of the confidence interval, and how does it compare to the equivalence test performed in the previous question?
t.test(iq,mu=100,conf.level=0.90)
## 
##  One Sample t-test
## 
## data:  iq
## t = -1.874, df = 19, p-value = 0.0764
## alternative hypothesis: true mean is not equal to 100
## 90 percent confidence interval:
##  91.9434 99.6763
## sample estimates:
## mean of x 
##   95.8099

The boundaries of the two-sided 90% CI equal the boundaries of the two one-sided CIs calculated in the previous question. We can use this CI to perform the equivalence test: if the 90% CI lies within the equivalence region, then we reject the null hypothesis that the mean differs from zero in favour of the alternative hypothesis that the mean is equivalent to zero. In this case we would not reject the null hypothesis that the population mean is equivalent to 100, \(p > .05\).

4 Two-sample t test & equivalence test

Load the vitcap data file.

load(file=url("http://pnb.mcmaster.ca/bennett/psy710/datasets/vitcap.rda"))

The data frame vitcap contains two numeric variables, control and exposed, each containing 75 values. The data are from an experiment that measured so-called vital capacity (a measure of lung volume) in liters in a group of workers exposed to cadmium and an age-matched control group. Use box plots to graphically inspect each variable for outliers, extreme skew, etc.

# look for outliers & skew:
with(vitcap,boxplot(control,exposed,names=c("control","exposed")))

  1. Use a two-sample t test to evaluate the null hypothesis that the population means for the two groups are equal.
# The data were analyzed with the following command.
t.test(x=vitcap$control,y=vitcap$exposed,mu=0) # 2-tailed t test is default
## 
##  Welch Two Sample t-test
## 
## data:  vitcap$control and vitcap$exposed
## t = 1.573, df = 131.9, p-value = 0.118
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.0621838  0.5452014
## sample estimates:
## mean of x mean of y 
##   5.04252   4.80101
  1. Use an equivalence test to evaluate the null hypothesis that the true difference between populations means is less than -0.5 OR greater than +0.5. Using an alpha of 0.05, what is your decision about this null hypothesis.
UPPER.BOUND <- 0.5
LOWER.BOUND <- -0.5
# is difference (y-x) less than upper bound?
t.test(x=vitcap$control,y=vitcap$exposed,mu=UPPER.BOUND,alternative="less") 
## 
##  Welch Two Sample t-test
## 
## data:  vitcap$control and vitcap$exposed
## t = -1.684, df = 131.9, p-value = 0.0473
## alternative hypothesis: true difference in means is less than 0.5
## 95 percent confidence interval:
##      -Inf 0.495823
## sample estimates:
## mean of x mean of y 
##   5.04252   4.80101

# is difference (y-x) greater than lower bound?
t.test(x=vitcap$control,y=vitcap$exposed,mu=LOWER.BOUND,alternative="greater") 
## 
##  Welch Two Sample t-test
## 
## data:  vitcap$control and vitcap$exposed
## t = 4.83, df = 131.9, p-value = 1.86e-06
## alternative hypothesis: true difference in means is greater than -0.5
## 95 percent confidence interval:
##  -0.0128058        Inf
## sample estimates:
## mean of x mean of y 
##   5.04252   4.80101

# alternative method with 90% CI
t.test(x=vitcap$control,y=vitcap$exposed,mu=0,conf.level=0.90) # do t test
## 
##  Welch Two Sample t-test
## 
## data:  vitcap$control and vitcap$exposed
## t = 1.573, df = 131.9, p-value = 0.118
## alternative hypothesis: true difference in means is not equal to 0
## 90 percent confidence interval:
##  -0.0128058  0.4958233
## sample estimates:
## mean of x mean of y 
##   5.04252   4.80101

Answer: Using the TOST procedure, I used two 1-tailed tests to i) reject the null hypothesis \(\mu_d \le -0.5\) in favor of the alternative \(\mu_d > -0.5\); and ii) reject the null hypothesis \(\mu_d \ge 0.5\) in favor of the alternative \(\mu_d < 0.5\). Because both tests were significant, we reject the hypothesis of non-equivalence \(t(131.9)=-1.684\), \(p=0.047\), in favor of the alternative that the means are equivalent \(-0.5 < \mu_d < 0.5\). The same result is obtained with the second method: the boundaries of the two-tailed 90% CI, [-0.0128, 0.495], are within the equivalence zone, \(\pm 0.5\), so we reject the hypothesis of non-equivalence, \(p<.05\).