The following code compares trend analysis and regression. The file trendsData.rda contains a data frame, data.df, with data from a hypothetical experiment that measured performance (score) in infants ranging from 1 to 5 months of age. Age is stored in the numeric variable age and, rounded to the nearest month, in age.months. Age also is represented by the ordered factor age.group and the unordered factor group.
load(file=url('http://pnb.mcmaster.ca/bennett/psy710/trends/trendsData.rda'))
summary(data.df)
## score age age.group age.months group
## Min. : 69.31 Min. :0.6885 a1:10 Min. :1 g1:10
## 1st Qu.: 94.83 1st Qu.:1.9441 a2:10 1st Qu.:2 g2:10
## Median : 99.65 Median :2.9784 a3:10 Median :3 g3:10
## Mean :100.81 Mean :3.0023 a4:10 Mean :3 g4:10
## 3rd Qu.:107.20 3rd Qu.:4.1805 a5:10 3rd Qu.:4 g5:10
## Max. :132.95 Max. :5.1608 Max. :5
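The summary suggests that the five levels of age.group line up with the five rounded ages in age.months. A quick cross-tabulation (a small check added here, not part of the original analysis) should confirm that each factor level maps onto a single month:
with(data.df,table(age.group,age.months)) # each row should contain a single non-zero count of 10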
The following code was used to create Figure 1. The filled and open symbols represent group means and individual scores, respectively. The solid line is the linear regression line, which indicates that there is a slight upward linear trend of score across age. But the group means also exhibit a clear curvilinear trend around the regression line.
gmeans <- with(data.df,tapply(score,age.group,mean)) # mean score at each age
age.levels <- sort(unique(data.df$age.months)) # ages (in months) at which to plot the means
plot(score~age,data=data.df,type="p",xlab="age (months)",ylab="scores") # individual scores (open symbols)
points(x=age.levels,y=gmeans,type="p",pch=19,cex=2) # group means (filled symbols)
# lines(x=age.levels,y=gmeans,type="l") # optional: connect the group means
abline(lm(score~age,data=data.df)) # linear regression line
These observations can be evaluated quantitatively with a trend analysis, which uses the ordered factor age.group as the grouping variable.
options(contrasts=c("contr.sum","contr.poly")) # use sum-to-zero & polynomial contrasts for anova
options(digits=3) # limit the number of printed digits
class(data.df$age.group)
## [1] "ordered" "factor"
contrasts(data.df$age.group) # age.group is coded with polynomial trends
## .L .Q .C ^4
## [1,] -6.32e-01 0.535 -3.16e-01 0.120
## [2,] -3.16e-01 -0.267 6.32e-01 -0.478
## [3,] -3.29e-17 -0.535 9.64e-17 0.717
## [4,] 3.16e-01 -0.267 -6.32e-01 -0.478
## [5,] 6.32e-01 0.535 3.16e-01 0.120
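The columns of this matrix are orthonormal polynomial contrasts: each column sums to zero, has unit length, and is orthogonal to the others. A quick check of that property (an added sketch, not part of the original analysis):
round(crossprod(contrasts(data.df$age.group)),10) # t(C) %*% C should be the 4 x 4 identity matrix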
aov01 <- aov(score~age.group,data=data.df)
summary(aov01,split=list(age.group=list(lin=1,quad=2,cubic=3, quart=4)))
## Df Sum Sq Mean Sq F value Pr(>F)
## age.group 4 1145 286 3.62 0.0122 *
## age.group: lin 1 348 348 4.41 0.0414 *
## age.group: quad 1 630 630 7.97 0.0071 **
## age.group: cubic 1 30 30 0.38 0.5420
## age.group: quart 1 136 136 1.72 0.1960
## Residuals 45 3558 79
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
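Each trend sum-of-squares can be reconstructed by hand from the group means and the contrast weights: with n = 10 scores per group and weights w, SS = n*(sum(w*gmeans))^2/sum(w^2). A sketch for the linear trend, using objects created above:
w <- contrasts(data.df$age.group)[,1] # linear (.L) weights
psi <- sum(w*gmeans) # estimated linear contrast
10*psi^2/sum(w^2) # should reproduce the linear trend SS of 348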
The trends could also be analyzed with the emmeans package:
require(emmeans)
poly.weights <- contr.poly(n=5,scores=sort(unique(data.df$age.months))) # polynomial contrast weights
contrast.list <- list(L=poly.weights[,1],Q=poly.weights[,2],
                      Cube=poly.weights[,3],Quart=poly.weights[,4]) # store weights in a named list
groupMeans <- emmeans(aov01,"age.group") # get & store group means
contrast(groupMeans,contrast.list,adjust="none") # evaluate the trend contrasts
## contrast estimate SE df t.ratio p.value
## L 5.90 2.81 45 2.099 0.0410
## Q -7.94 2.81 45 -2.823 0.0070
## Cube -1.73 2.81 45 -0.615 0.5420
## Quart 3.69 2.81 45 1.312 0.1960
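These are the same tests reported in the anova table above: squaring each t ratio reproduces the corresponding F value, and the p values match.
c(L=2.099,Q=-2.823,Cube=-0.615,Quart=1.312)^2 # t^2 reproduces the F values 4.41, 7.97, 0.38, 1.72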
Next, we compare these results to a linear regression model. The sum-of-squares associated with the linear trend is the same as the value obtained with aov and emmeans, but the F and p values differ from those obtained previously.
aovLinear <- aov(score~age.months,data=data.df)
summary(aovLinear)
## Df Sum Sq Mean Sq F value Pr(>F)
## age.months 1 348 348 3.84 0.056 .
## Residuals 48 4354 91
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The reason for this difference is that the residual sum-of-squares and degrees of freedom differ from the values obtained previously. Why? Because the previous models contained non-linear trends that captured some of the variation in the dependent variable and therefore produced a smaller residual sum-of-squares. If those non-linear terms are incorporated into a fourth-order polynomial model, the regression analysis yields the same results as before.
aovPoly4 <- aov(score~age.months+I(age.months^2)+I(age.months^3)+I(age.months^4),data=data.df)
summary(aovPoly4)
## Df Sum Sq Mean Sq F value Pr(>F)
## age.months 1 348 348 4.41 0.0414 *
## I(age.months^2) 1 630 630 7.97 0.0071 **
## I(age.months^3) 1 30 30 0.38 0.5420
## I(age.months^4) 1 136 136 1.72 0.1960
## Residuals 45 3558 79
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
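The effect of the error term is easy to see by recomputing the F value for the linear trend from the tabled values: the numerator sum-of-squares (348) is identical in both analyses, but the residual mean square differs.
348/79 # F for the linear trend with the quartic model's error term (4.41)
348/91 # F for the linear trend with the linear-only error term (3.84)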
It turns out that the cubic and quartic terms capture almost no variance, so we can drop them from the model and obtain nearly the same results.
aovPoly2 <- lm(score~age.months+I(age.months^2),data=data.df)
summary(aovPoly2)
##
## Call:
## lm(formula = score ~ age.months + I(age.months^2), data = data.df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -23.524 -6.295 0.846 5.394 27.899
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 80.359 6.037 13.31 <2e-16 ***
## age.months 14.595 4.601 3.17 0.0027 **
## I(age.months^2) -2.121 0.752 -2.82 0.0070 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.9 on 47 degrees of freedom
## Multiple R-squared: 0.208, Adjusted R-squared: 0.174
## F-statistic: 6.17 on 2 and 47 DF, p-value: 0.00416
anova(aovPoly2,aovPoly4) # compare 2 regression models
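The anova command performs an F test on the change in residual sum-of-squares between the two models, which here is a joint test of the cubic and quartic terms. From the tables above, the residual SS for the quadratic model is approximately 4354 - 630 = 3724, so the test statistic should be about
((3724-3558)/2)/79 # joint F for the cubic & quartic terms (approx. 1.05 on 2 and 45 df)
a small value that is consistent with the negligible cubic and quartic effects found earlier.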