1 Initialize R

options(contrasts=c("contr.sum","contr.poly") )
options(width = 70,digits=3) 
load(url("http://pnb.mcmaster.ca/bennett/psy710/datasets/goats.rda") )

2 Goats Data

The following text was taken from material that was posted on www.statlab.uni-heidelberg.de/data/ancova/goats.story.html.

Experiments were carried out on six commercial goat farms to determine whether the standard worm drenching program was adequate. Forty goats were used in each experiment. Twenty of these, chosen at random, were drenched according to the standard program, while the remaining twenty were drenched more frequently. The goats were individually tagged and weighed at the start and end of the year-long study. For the first farm in the study the resulting live weight gains, along with the initial live weights, are given for the first farm in the study. In each experiment the main interest was in the comparison of the live weight gains between the two treatments.

  1. Use linear regression to evaluate the linear relation between gain and wt:
goats.lm.01 <- lm(gain~wt,data=goats)
summary(goats.lm.01)
## 
## Call:
## lm(formula = gain ~ wt, data = goats)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.607 -1.207  0.163  1.054  3.871 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  14.3958     1.8505    7.78  2.2e-09 ***
## wt           -0.3540     0.0791   -4.48  6.7e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.72 on 38 degrees of freedom
## Multiple R-squared:  0.345,  Adjusted R-squared:  0.328 
## F-statistic: 20.1 on 1 and 38 DF,  p-value: 6.68e-05
  1. Plot the linear regression line and weight gain in a single figure.
# with(goats,plot(wt,gain,"p",xlab="wt",ylab="gain",cex.lab=2))
plot(gain~wt,goats,"p",xlab="wt",ylab="gain",ylim=c(0,12),cex.lab=2)
abline(goats.lm.01)

Answer: This is an important part of ANCOVA! We want to account for the effect of the covariate using linear regression, and therefore we should make sure linear regression is appropriate. For example, we should see if the association between the covariate and dependent variable is (approximately) linear. We could do this by plotting the residuals against the covariate: the covariate is linearly associated with the dependent variable, then the residuals not be associated with the covariate and a residual-vs.-covariate plot should not exhibit any obvious trends. Also, the spread of the residuals should be constant across values of the covariate. Finally, the residuals should be distributed normally. The residuals-vs-wt figure shown below suggests there is a non-linear trend in the residuals, but it is small and not significant. Also, there is no obvious change in the spread of the residuals across wt. Finally, the qq plot shows that the residuals are distributed normally. We should also see if there are outliers or so-called high-leverage points that are having a strong result on the regression: there do not seem to be any unusual points. So linear regression is reasonable in this case. You can read more about diagnostic plots for regression here and here. Residuals also can be visualized with the residualPlots command in the car package.

par(mfrow=c(1,2),cex=1)
plot(goats.lm.01,which=c(1,2),ask=FALSE) # see ?plot.lm for more information

par(mfrow=c(1,2),cex=1)
plot(x=goats$wt,y=residuals(goats.lm.01),type="p",xlab="wt",ylab="residsuals",ylim=c(-4,4))
abline(h=0,lt=2)

# regression on residuals:
x <- goats$wt
y <- residuals(goats.lm.01)
lm.01 <- lm(y~1+x+I(x^2)) # note quadratic/curved component
lines(sort(x),fitted(lm.01)[order(x)],col="red")

# www.sthda.com/english/articles/39-regression-model-diagnostics/161-linear-regression-assumptions-and-diagnostics-in-r-essentials/

# https://www.statmethods.net/stats/rdiagnostics.html

# check for linear & non-linear trends in residuals:
summary(lm.01)
## 
## Call:
## lm(formula = y ~ 1 + x + I(x^2))
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.359 -1.104  0.094  1.177  3.718 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  14.1799    12.7217    1.11     0.27
## x            -1.2283     1.0932   -1.12     0.27
## I(x^2)        0.0260     0.0231    1.13     0.27
## 
## Residual standard error: 1.72 on 37 degrees of freedom
## Multiple R-squared:  0.0332, Adjusted R-squared:  -0.0191 
## F-statistic: 0.635 on 2 and 37 DF,  p-value: 0.536

# check normality
shapiro.test(residuals(goats.lm.01))
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(goats.lm.01)
## W = 1, p-value = 0.8
qqnorm(residuals(goats.lm.01),main="");qqline(residuals(goats.lm.01))

  1. Plot the marginal means of treatment, and then use ANOVA to evaluate the effect of treatment.
par(mfrow=c(1,1))
boxplot(gain~treatment,data=goats)

# ANOVA
with(goats,tapply(gain,treatment,mean)) # group means
## intensive  standard 
##      6.85      5.55
options(contrasts=c("contr.sum","contr.poly") ) # make sure to use sum-to-zero effects
goats.lm.02 <- lm(gain~treatment,data=goats)
anova(goats.lm.02) # anova table
  1. Use ANCOVA to evaluate the effect of treatment after controlling for the linear association between weight gain and initial wt.
goats.lm.03 <- lm(gain~wt+treatment,data=goats) # ancova model
anova(goats.lm.03) # anova table
  1. List the coefficients of the ANCOVA model. What do the coefficients mean?
dummy.coef(goats.lm.03)
## Full coefficients are 
##                                   
## (Intercept):         14.3         
## wt:                -0.351         
## treatment:      intensive standard
##                     0.632   -0.632
alpha.intensive <- 0.632;
alpha.standard <- 0 - 0.632;

Answer: The parameters define 2 straight lines that relate weight gain to pre-treatment weight. The lines have the same slope, \(m=-0.351\). The intercepts of the lines are \(14.3 + 0.632\) for the intensive group and \(14.3 - 0.632\) for the standard group.

  1. Plot the data and best-fitting lines computed by the ANCOVA.
x.range <- range(goats$wt) # [min,max] for x-axis
y.range <- range(goats$gain) # [min,max] for y=axis
goats.intensive <- subset(goats,treatment=="intensive")
goats.standard <- subset(goats,treatment=="standard")
with(goats.intensive,
     plot(wt,gain,"p",pch=21,cex=2,xlim=x.range,ylim=y.range,xlab="wt",ylab="gain"))
with(goats.standard,points(wt,gain,pch=19)) # add points for standard group
abline(a=14.33+alpha.intensive,b=-0.351,lty=2) # line for intensive group
abline(a=14.33+alpha.standard,b=-0.351,lty=2) # line for standard group
abline(goats.lm.01) # overall regression line
# label lines:
text(x=28.5,y=5,"intensive") 
text(x=27,y=4,"standard")
text(x=28,y=4.5,"all points")
legend(x=26,y=10.5,legend=c("intensive","standard"),pch=c(21,19))

  1. Evaluate the homogeneity of slopes assumption.
goats.lm.04 <- lm(gain~wt+treatment+wt:treatment,data=goats)
anova(goats.lm.04)
# effects plot created with effects package
library(effects)
plot(effect(mod=goats.lm.04,term="wt:treatment"),
     lines=list(multiline=TRUE), 
     confint=list(style="auto"))

  1. Using the model with the interaction term, what are the values of the slopes of the best-fitting lines for the two treatment groups?
dummy.coef(goats.lm.04)
## Full coefficients are 
##                                    
## (Intercept):          14.4         
## wt:                 -0.353         
## treatment:       intensive standard
##                     0.0104  -0.0104
## wt:treatment:    intensive standard
##                     0.0269  -0.0269
beta <- - 0.353;
alpha.intensive <- 0.0104;
alpha.standard <- 0-0.0104;
delta.beta.intensive <- 0.02686
delta.beta.standard <- -0.02686
# slopes for each group are -0.326 and -0.380:
beta + c(delta.beta.intensive, delta.beta.standard)
## [1] -0.326 -0.380
  1. Calculate strength of association and effect size for treatment.
library(effectsize)
omega_squared(goats.lm.03,partial=FALSE)
omega_squared(goats.lm.03,partial=TRUE) # default
cohens_f(goats.lm.03,partial=TRUE)
  1. Calculate the means and the adjusted means of weight gain for each treatment group. What is the (conceptual) difference between these two sets of means? Is there a substantial numerical difference between the two sets of means? Why or why not?
with(goats,tapply(gain,treatment,mean)) # group means on gain
## intensive  standard 
##      6.85      5.55
# adjusted means:
library(effects)
effect(term="treatment",goats.lm.03)
## 
##  treatment effect
## treatment
## intensive  standard 
##      6.83      5.57
library(emmeans)
emmeans(goats.lm.03,specs=~treatment)
##  treatment emmean    SE df lower.CL upper.CL
##  intensive   6.83 0.362 37     6.10     7.57
##  standard    5.57 0.362 37     4.83     6.30
## 
## Confidence level used: 0.95
with(goats,tapply(wt,treatment,mean)) # group means on covariate
## intensive  standard 
##      23.1      23.2

Answer: Group means of gain include the effects of any differences in the covariate, whereas adjusted means are the group means after controlling for differences in the covariate. The two sets of means are nearly identical, which suggests that the distributions of the covariate are similar in the two groups.

summary(aov(wt~treatment,data=goats))
##             Df Sum Sq Mean Sq F value Pr(>F)
## treatment    1      0     0.1    0.01   0.93
## Residuals   38    475    12.5
boxplot(wt~treatment,data=goats)

  1. Evaluate the difference between adjusted-means in the two treatment groups. How do the results of this analysis compare to a \(t\) test on the non-adjusted group means?
t.test(gain~treatment,data=goats) # regular t test
## 
##  Welch Two Sample t-test
## 
## data:  gain by treatment
## t = 2, df = 38, p-value = 0.05
## alternative hypothesis: true difference in means between group intensive and group standard is not equal to 0
## 95 percent confidence interval:
##  0.00493 2.59507
## sample estimates:
## mean in group intensive  mean in group standard 
##                    6.85                    5.55
# comparison of adjusted group means
library(emmeans)
goat.emm <- emmeans(goats.lm.03,specs=~treatment)
pairs(goat.emm)
##  contrast             estimate    SE df t.ratio p.value
##  intensive - standard     1.26 0.512 37   2.472  0.0182
# there are only 2 treatment levels so ancova yields same result:
anova(goats.lm.03) # t^2 = 2.472^2 = F = 6.11

Answer: There are only two treatment groups, so the \(F\) test in the ANCOVA is equivalent to a pairwise comparison of the adjusted group means. These tests find that the effect size for treatment is larger than the effect size estimated in the \(t\) test. Why? Because the test on adjusted means is done after removing variation associated with the covariate, whereas the \(t\) includes that variation in the error/residuals term.