Last updated: 2022-10-14 10:06:48
In the context of linear statistical models, marginality refers to the fact that some effects are marginal to others. In polynomial models, linear terms (e.g., x) are marginal to quadratic terms (e.g., x^2), which in turn are marginal to cubic terms (e.g., x^3), and so forth. In analysis of variance models, main effects (e.g., A and B) are marginal to the interactions that contain them (e.g., A x B and A x B x C). The so-called principle of marginality holds that a linear model containing a higher-order term should also include all terms that are marginal to it. For example, a model that contains x^2 should also include x, and a model that contains an A x B interaction should also include the main effects A and B. Models that do not obey the principle of marginality have some problematic properties, one of which is illustrated below.
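The distinction is easy to see in R's formula language: the crossing operator in A*B expands to the main effects plus the interaction, and so obeys marginality, whereas A:B requests the interaction alone. A small symbolic sketch (the variables need not exist for this check):
attr(terms(y ~ A*B), 'term.labels')  # 'A' 'B' 'A:B' -- obeys marginality
attr(terms(y ~ A:B), 'term.labels')  # 'A:B' only    -- violates marginality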
First we create two linear models that obey the marginality principle.
knitr::opts_chunk$set(collapse = TRUE)
options(contrasts = c('contr.sum', 'contr.poly'))  # sum-to-zero coding for factors
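The contrasts option set here will matter at the end of the discussion; for reference, these are the two most common codings for a three-level factor:
contr.sum(3)        # sum-to-zero coding: each column sums to zero
contr.treatment(3)  # default dummy coding against a reference level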
Create x and y values:
set.seed(81201)
x <- seq(-10, 10, 0.25)
y <- 2*x + 0.25*x^2 + rnorm(length(x), 0, 10)  # quadratic trend plus Gaussian noise
Fit a polynomial model:
lm01 <- lm(y ~ 1 + x + I(x^2))  # quadratic model that obeys marginality
yFit <- predict(lm01)
Now we linearly transform x to create a new variable:
a <- 4 + 0.5*x  # linear transformation of x
lm02 <- lm(y ~ 1 + a + I(a^2))
yFit2 <- predict(lm02)
par(mar = c(4, 4, .1, .1), mfrow = c(1, 2))
plot(y ~ x, cex.lab = 1.5, cex.axis = 1.25, xlab = 'X', ylab = 'Y')
lines(yFit ~ x)
plot(y ~ a, cex.lab = 1.5, cex.axis = 1.25, xlab = 'A', ylab = 'Y')
lines(yFit2 ~ a)
The two linear models fit the data equally well. This observation is confirmed quantitatively by examining the regression and ANOVA tables.
summary(lm01)
##
## Call:
## lm(formula = y ~ 1 + x + I(x^2))
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -25.1658  -8.0243  -0.8025   8.6979  27.3940
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.01108    1.86267   0.006    0.995
## x            2.23895    0.21242  10.540  < 2e-16 ***
## I(x^2)       0.29643    0.04064   7.294 2.16e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.17 on 78 degrees of freedom
## Multiple R-squared: 0.6781, Adjusted R-squared: 0.6698
## F-statistic: 82.15 on 2 and 78 DF, p-value: < 2.2e-16
summary(lm02)
##
## Call:
## lm(formula = y ~ 1 + a + I(a^2))
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -25.1658  -8.0243  -0.8025   8.6979  27.3940
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   1.0710     2.4288   0.441 0.660478
## a            -5.0078     1.3681  -3.661 0.000456 ***
## I(a^2)        1.1857     0.1626   7.294 2.16e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11.17 on 78 degrees of freedom
## Multiple R-squared: 0.6781, Adjusted R-squared: 0.6698
## F-statistic: 82.15 on 2 and 78 DF, p-value: < 2.2e-16
anova(lm01)
## Analysis of Variance Table
##
## Response: y
##           Df  Sum Sq Mean Sq F value    Pr(>F)
## x          1 13873.2 13873.2 111.100 < 2.2e-16 ***
## I(x^2)     1  6643.9  6643.9  53.206 2.159e-10 ***
## Residuals 78  9740.0   124.9
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(lm02)
## Analysis of Variance Table
##
## Response: y
##           Df  Sum Sq Mean Sq F value    Pr(>F)
## a          1 13873.2 13873.2 111.100 < 2.2e-16 ***
## I(a^2)     1  6643.9  6643.9  53.206 2.159e-10 ***
## Residuals 78  9740.0   124.9
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
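We can also verify directly that the two models produce the same fitted values, up to numerical error:
all.equal(fitted(lm01), fitted(lm02))  # the two sets of fitted values coincide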
The bottom line is that a linear transformation of the predictor variable (i.e., transforming x into a) alters the coefficients of the best-fitting model but does not alter the goodness of fit.
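To see why only the coefficients change, note that a = 4 + 0.5*x implies x = 2*a - 8; substituting this into the fitted quadratic in x and collecting powers of a reproduces the lm02 coefficients from the lm01 coefficients:
b <- coef(lm01)
c(b[1] - 8*b[2] + 64*b[3],  # new intercept
  2*b[2] - 32*b[3],         # coefficient on a
  4*b[3])                   # coefficient on a^2
coef(lm02)                  # matches the values computed above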
Next, we examine what happens when we violate marginality by using models that include a quadratic term but not a linear term.
lm01B <- lm(y ~ 1 + I(x^2))  # quadratic term without the marginal linear term
lm02B <- lm(y ~ 1 + I(a^2))
yFit1B <- predict(lm01B)
yFit2B <- predict(lm02B)
par(mar = c(4, 4, .1, .1), mfrow = c(1, 2))
plot(y ~ x, cex.lab = 1.5, cex.axis = 1.25, xlab = 'X', ylab = 'Y')
lines(yFit1B ~ x)
plot(y ~ a, cex.lab = 1.5, cex.axis = 1.25, xlab = 'A', ylab = 'Y')
lines(yFit2B ~ a)
The graphs show that the data are not fit equally well by the two models. This observation is confirmed quantitatively by examining the regression and ANOVA tables.
summary(lm01B)
##
## Call:
## lm(formula = y ~ 1 + I(x^2))
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -34.752 -13.062   2.539  13.779  32.624
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.01108    2.88182   0.004    0.997
## I(x^2)       0.29643    0.06287   4.715 1.02e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.29 on 79 degrees of freedom
## Multiple R-squared: 0.2196, Adjusted R-squared: 0.2097
## F-statistic: 22.23 on 1 and 79 DF, p-value: 1.025e-05
summary(lm02B)
##
## Call:
## lm(formula = y ~ 1 + I(a^2))
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -29.9279  -8.3211  -0.1873   8.0349  26.5127
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -5.0792     1.8866  -2.692  0.00866 **
## I(a^2)        0.6201     0.0543  11.421  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.02 on 79 degrees of freedom
## Multiple R-squared: 0.6228, Adjusted R-squared: 0.618
## F-statistic: 130.4 on 1 and 79 DF, p-value: < 2.2e-16
anova(lm01B)
## Analysis of Variance Table
##
## Response: y
##           Df  Sum Sq Mean Sq F value    Pr(>F)
## I(x^2)     1  6643.9  6643.9  22.228 1.025e-05 ***
## Residuals 79 23613.2   298.9
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(lm02B)
## Analysis of Variance Table
##
## Response: y
##           Df Sum Sq Mean Sq F value    Pr(>F)
## I(a^2)     1  18844 18843.9  130.43 < 2.2e-16 ***
## Residuals 79  11413   144.5
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
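For a compact comparison of the two marginality-violating fits, the residual sums of squares and R-squared values can be extracted directly:
c(deviance(lm01B), deviance(lm02B))                    # residual SS: 23613.2 vs 11413
c(summary(lm01B)$r.squared, summary(lm02B)$r.squared)  # R^2: 0.2196 vs 0.6228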
Thus, a linear transformation of the predictor variable can alter the goodness of fit when the model violates marginality. This is not a good outcome, because it means that the result of our analysis depends strongly on a more-or-less arbitrary choice: the units used to express the predictor variable (e.g., pounds vs. grams). The analogous problem for ANOVA models is that the results depend on the contrast coding used to define the group effects (e.g., sum-to-zero vs. treatment coding). These are very good reasons for avoiding models that violate the principle of marginality.
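As a sketch of the ANOVA version of the problem (a hypothetical example with made-up data, assuming the car package is available): when a model contains an interaction, Type III tests of the main effects address the usual marginal-means hypotheses under sum-to-zero coding, but different, coding-dependent hypotheses under treatment coding, so the reported tests change with the coding.
library(car)  # assumed to be installed; provides Anova()
set.seed(1)
d <- data.frame(A = gl(2, 2, 40), B = gl(2, 1, 40), y = rnorm(40))
options(contrasts = c('contr.sum', 'contr.poly'))
Anova(lm(y ~ A*B, data = d), type = 3)             # tests of marginal means
options(contrasts = c('contr.treatment', 'contr.poly'))
Anova(lm(y ~ A*B, data = d), type = 3)             # different main-effect tests
options(contrasts = c('contr.sum', 'contr.poly'))  # restore the setting used above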