options(contrasts=c("contr.sum","contr.poly") )
options(width = 70,digits=3)
load(url("http://pnb.mcmaster.ca/bennett/psy710/datasets/goats.rda") )
The following text was taken from material that was posted on www.statlab.uni-heidelberg.de/data/ancova/goats.story.html.
Experiments were carried out on six commercial goat farms to determine whether the standard worm drenching program was adequate. Forty goats were used in each experiment. Twenty of these, chosen at random, were drenched according to the standard program, while the remaining twenty were drenched more frequently. The goats were individually tagged and weighed at the start and end of the year-long study. For the first farm in the study the resulting live weight gains, along with the initial live weights, are given for the first farm in the study. In each experiment the main interest was in the comparison of the live weight gains between the two treatments.
gain
and wt
:goats.lm.01 <- lm(gain~wt,data=goats)
summary(goats.lm.01)
##
## Call:
## lm(formula = gain ~ wt, data = goats)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.607 -1.207 0.163 1.054 3.871
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 14.3958 1.8505 7.78 2.2e-09 ***
## wt -0.3540 0.0791 -4.48 6.7e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.72 on 38 degrees of freedom
## Multiple R-squared: 0.345, Adjusted R-squared: 0.328
## F-statistic: 20.1 on 1 and 38 DF, p-value: 6.68e-05
# with(goats,plot(wt,gain,"p",xlab="wt",ylab="gain",cex.lab=2))
plot(gain~wt,goats,"p",xlab="wt",ylab="gain",ylim=c(0,12),cex.lab=2)
abline(goats.lm.01)
Answer: This is an important part of ANCOVA! We want
to account for the effect of the covariate using linear
regression, and therefore we should make sure linear regression is
appropriate. For example, we should see if the association between the
covariate and dependent variable is (approximately) linear. We could do
this by plotting the residuals against the covariate: the covariate is
linearly associated with the dependent variable, then the residuals not
be associated with the covariate and a residual-vs.-covariate plot
should not exhibit any obvious trends. Also, the spread of the residuals
should be constant across values of the covariate. Finally, the
residuals should be distributed normally. The residuals-vs-wt figure
shown below suggests there is a non-linear trend in the residuals, but
it is small and not significant. Also, there is no obvious change in the
spread of the residuals across wt
. Finally, the qq plot
shows that the residuals are distributed normally. We should also see if
there are outliers or so-called high-leverage points that are having a
strong result on the regression: there do not seem to be any unusual
points. So linear regression is reasonable in this case. You can read
more about diagnostic plots for regression here
and here.
Residuals also can be visualized with the residualPlots
command in the car
package.
par(mfrow=c(1,2),cex=1)
plot(goats.lm.01,which=c(1,2),ask=FALSE) # see ?plot.lm for more information
par(mfrow=c(1,2),cex=1)
plot(x=goats$wt,y=residuals(goats.lm.01),type="p",xlab="wt",ylab="residsuals",ylim=c(-4,4))
abline(h=0,lt=2)
# regression on residuals:
x <- goats$wt
y <- residuals(goats.lm.01)
lm.01 <- lm(y~1+x+I(x^2)) # note quadratic/curved component
lines(sort(x),fitted(lm.01)[order(x)],col="red")
# www.sthda.com/english/articles/39-regression-model-diagnostics/161-linear-regression-assumptions-and-diagnostics-in-r-essentials/
# https://www.statmethods.net/stats/rdiagnostics.html
# check for linear & non-linear trends in residuals:
summary(lm.01)
##
## Call:
## lm(formula = y ~ 1 + x + I(x^2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.359 -1.104 0.094 1.177 3.718
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 14.1799 12.7217 1.11 0.27
## x -1.2283 1.0932 -1.12 0.27
## I(x^2) 0.0260 0.0231 1.13 0.27
##
## Residual standard error: 1.72 on 37 degrees of freedom
## Multiple R-squared: 0.0332, Adjusted R-squared: -0.0191
## F-statistic: 0.635 on 2 and 37 DF, p-value: 0.536
# check normality
shapiro.test(residuals(goats.lm.01))
##
## Shapiro-Wilk normality test
##
## data: residuals(goats.lm.01)
## W = 1, p-value = 0.8
qqnorm(residuals(goats.lm.01),main="");qqline(residuals(goats.lm.01))
treatment
, and then use
ANOVA to evaluate the effect of
treatment
.par(mfrow=c(1,1))
boxplot(gain~treatment,data=goats)
# ANOVA
with(goats,tapply(gain,treatment,mean)) # group means
## intensive standard
## 6.85 5.55
options(contrasts=c("contr.sum","contr.poly") ) # make sure to use sum-to-zero effects
goats.lm.02 <- lm(gain~treatment,data=goats)
anova(goats.lm.02) # anova table
treatment
after controlling for the linear association
between weight gain and initial wt
.goats.lm.03 <- lm(gain~wt+treatment,data=goats) # ancova model
anova(goats.lm.03) # anova table
dummy.coef(goats.lm.03)
## Full coefficients are
##
## (Intercept): 14.3
## wt: -0.351
## treatment: intensive standard
## 0.632 -0.632
alpha.intensive <- 0.632;
alpha.standard <- 0 - 0.632;
Answer: The parameters define 2 straight lines that relate weight gain to pre-treatment weight. The lines have the same slope, \(m=-0.351\). The intercepts of the lines are \(14.3 + 0.632\) for the intensive group and \(14.3 - 0.632\) for the standard group.
x.range <- range(goats$wt) # [min,max] for x-axis
y.range <- range(goats$gain) # [min,max] for y=axis
goats.intensive <- subset(goats,treatment=="intensive")
goats.standard <- subset(goats,treatment=="standard")
with(goats.intensive,
plot(wt,gain,"p",pch=21,cex=2,xlim=x.range,ylim=y.range,xlab="wt",ylab="gain"))
with(goats.standard,points(wt,gain,pch=19)) # add points for standard group
abline(a=14.33+alpha.intensive,b=-0.351,lty=2) # line for intensive group
abline(a=14.33+alpha.standard,b=-0.351,lty=2) # line for standard group
abline(goats.lm.01) # overall regression line
# label lines:
text(x=28.5,y=5,"intensive")
text(x=27,y=4,"standard")
text(x=28,y=4.5,"all points")
legend(x=26,y=10.5,legend=c("intensive","standard"),pch=c(21,19))
goats.lm.04 <- lm(gain~wt+treatment+wt:treatment,data=goats)
anova(goats.lm.04)
# effects plot created with effects package
library(effects)
plot(effect(mod=goats.lm.04,term="wt:treatment"),
lines=list(multiline=TRUE),
confint=list(style="auto"))
dummy.coef(goats.lm.04)
## Full coefficients are
##
## (Intercept): 14.4
## wt: -0.353
## treatment: intensive standard
## 0.0104 -0.0104
## wt:treatment: intensive standard
## 0.0269 -0.0269
beta <- - 0.353;
alpha.intensive <- 0.0104;
alpha.standard <- 0-0.0104;
delta.beta.intensive <- 0.02686
delta.beta.standard <- -0.02686
# slopes for each group are -0.326 and -0.380:
beta + c(delta.beta.intensive, delta.beta.standard)
## [1] -0.326 -0.380
treatment
.library(effectsize)
omega_squared(goats.lm.03,partial=FALSE)
omega_squared(goats.lm.03,partial=TRUE) # default
cohens_f(goats.lm.03,partial=TRUE)
with(goats,tapply(gain,treatment,mean)) # group means on gain
## intensive standard
## 6.85 5.55
# adjusted means:
library(effects)
effect(term="treatment",goats.lm.03)
##
## treatment effect
## treatment
## intensive standard
## 6.83 5.57
library(emmeans)
emmeans(goats.lm.03,specs=~treatment)
## treatment emmean SE df lower.CL upper.CL
## intensive 6.83 0.362 37 6.10 7.57
## standard 5.57 0.362 37 4.83 6.30
##
## Confidence level used: 0.95
with(goats,tapply(wt,treatment,mean)) # group means on covariate
## intensive standard
## 23.1 23.2
Answer: Group means of gain
include the
effects of any differences in the covariate, whereas adjusted means are
the group means after controlling for differences in the
covariate. The two sets of means are nearly identical, which suggests
that the distributions of the covariate are similar in the two
groups.
summary(aov(wt~treatment,data=goats))
## Df Sum Sq Mean Sq F value Pr(>F)
## treatment 1 0 0.1 0.01 0.93
## Residuals 38 475 12.5
boxplot(wt~treatment,data=goats)
t.test(gain~treatment,data=goats) # regular t test
##
## Welch Two Sample t-test
##
## data: gain by treatment
## t = 2, df = 38, p-value = 0.05
## alternative hypothesis: true difference in means between group intensive and group standard is not equal to 0
## 95 percent confidence interval:
## 0.00493 2.59507
## sample estimates:
## mean in group intensive mean in group standard
## 6.85 5.55
# comparison of adjusted group means
library(emmeans)
goat.emm <- emmeans(goats.lm.03,specs=~treatment)
pairs(goat.emm)
## contrast estimate SE df t.ratio p.value
## intensive - standard 1.26 0.512 37 2.472 0.0182
# there are only 2 treatment levels so ancova yields same result:
anova(goats.lm.03) # t^2 = 2.472^2 = F = 6.11
Answer: There are only two treatment groups, so the
\(F\) test in the ANCOVA is equivalent
to a pairwise comparison of the adjusted group means. These tests find
that the effect size for treatment
is larger than the
effect size estimated in the \(t\)
test. Why? Because the test on adjusted means is done after removing
variation associated with the covariate, whereas the \(t\) includes that variation in the
error/residuals term.