Last updated: 2022-10-14 10:06:48

Overview

In the context of linear statistical models, marginality refers to the fact that some effects are marginal to others. In polynomial models, linear terms (e.g., x) are marginal to quadratic terms (e.g., x^2), which in turn are marginal to cubic terms (e.g., x^3), and so forth. In analysis of variance models, main effects (e.g., A and B) are marginal to interactions that contain them (e.g., A x B and A x B x C). The so-called principle of marginality holds that a linear model containing a higher-order term should also include all terms that are marginal to it. For example, a model that contains x^2 should also include x, and a model that contains an A x B interaction should also include the main effects A and B. Models that do not obey the principle of marginality have some problematic properties, one of which is illustrated below.
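
In R's model-formula notation the distinction looks like this (purely illustrative formulas, with a hypothetical response y and predictors x, A, and B; they are not part of the analysis below):

y ~ 1 + x + I(x^2)    # obeys marginality: x accompanies x^2
y ~ 1 + A + B + A:B   # obeys marginality: A and B accompany A:B
y ~ 1 + I(x^2)        # violates marginality: x^2 without x
y ~ 1 + A + A:B       # violates marginality: A:B without B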

Models that obey marginality

First we create two linear models that obey the marginality principle.

knitr::opts_chunk$set(collapse = TRUE)
options(contrasts=c('contr.sum','contr.poly')) # sum-to-zero and polynomial contrasts, as in classical ANOVA

Create x and y values:

set.seed(81201)
x <- seq(-10,10,.25) # 81 equally spaced values
y <- 2*x + 0.25*(x^2) + rnorm(length(x),0,10) # quadratic trend plus Gaussian noise (sd = 10)

Fit a polynomial model:

lm01 <- lm(y~1+x+I(x^2)) # quadratic model that includes the marginal linear term
yFit <- predict(lm01)

Now linearly transform x to create a new variable:

a <- 4+0.5*x # linear transformation of x
lm02 <- lm(y~1+a+I(a^2))
yFit2 <- predict(lm02)
par(mar = c(4, 4, .1, .1), mfrow = c(1, 2))
plot(y~x,cex.lab=1.5,cex.axis=1.25,xlab='X',ylab='Y')
lines(yFit~x)
plot(y~a,cex.lab=1.5,cex.axis=1.25,xlab='A',ylab='Y')
lines(yFit2~a)

The two linear models fit the data equally well. This observation is confirmed quantitatively by examining the regression and ANOVA tables.
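
The agreement is exact, not approximate: because a = 4 + 0.5*x is an invertible linear transformation, the columns 1, a, and a^2 span the same space as 1, x, and x^2, so the two models must produce identical fitted values. A direct check (a small addition to the original script):

all.equal(yFit, yFit2) # TRUE: the fitted values coincide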

Compare regression tables:

summary(lm01)
## 
## Call:
## lm(formula = y ~ 1 + x + I(x^2))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -25.1658  -8.0243  -0.8025   8.6979  27.3940 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.01108    1.86267   0.006    0.995    
## x            2.23895    0.21242  10.540  < 2e-16 ***
## I(x^2)       0.29643    0.04064   7.294 2.16e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.17 on 78 degrees of freedom
## Multiple R-squared:  0.6781, Adjusted R-squared:  0.6698 
## F-statistic: 82.15 on 2 and 78 DF,  p-value: < 2.2e-16
summary(lm02)
## 
## Call:
## lm(formula = y ~ 1 + a + I(a^2))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -25.1658  -8.0243  -0.8025   8.6979  27.3940 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.0710     2.4288   0.441 0.660478    
## a            -5.0078     1.3681  -3.661 0.000456 ***
## I(a^2)        1.1857     0.1626   7.294 2.16e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.17 on 78 degrees of freedom
## Multiple R-squared:  0.6781, Adjusted R-squared:  0.6698 
## F-statistic: 82.15 on 2 and 78 DF,  p-value: < 2.2e-16

Compare ANOVA tables:

anova(lm01)
## Analysis of Variance Table
## 
## Response: y
##           Df  Sum Sq Mean Sq F value    Pr(>F)    
## x          1 13873.2 13873.2 111.100 < 2.2e-16 ***
## I(x^2)     1  6643.9  6643.9  53.206 2.159e-10 ***
## Residuals 78  9740.0   124.9                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(lm02)
## Analysis of Variance Table
## 
## Response: y
##           Df  Sum Sq Mean Sq F value    Pr(>F)    
## a          1 13873.2 13873.2 111.100 < 2.2e-16 ***
## I(a^2)     1  6643.9  6643.9  53.206 2.159e-10 ***
## Residuals 78  9740.0   124.9                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The bottom line is that a linear transformation of the predictor variable (i.e., transforming x into a) alters the values of the coefficients of the best-fitting models but does not alter the goodness-of-fit.
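
In fact, the new coefficients follow directly from the old ones: since x = 2*(a - 4), substituting into b0 + b1*x + b2*(x^2) and collecting powers of a gives the coefficients of lm02 in closed form. The check below is an addition to the original script:

b <- unname(coef(lm01))     # b0, b1, b2 from the model in x
c(b[1] - 8*b[2] + 64*b[3],  # implied intercept
  2*b[2] - 32*b[3],         # implied coefficient of a
  4*b[3])                   # implied coefficient of a^2
coef(lm02)                  # matches the implied values above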

Models that violate marginality

In this section, we examine what happens when we violate marginality by using models that include a quadratic term but not a linear term.

lm01B <- lm(y~1+I(x^2)) # quadratic term without its marginal linear term
lm02B <- lm(y~1+I(a^2))
yFit1B <- predict(lm01B)
yFit2B <- predict(lm02B)
par(mar = c(4, 4, .1, .1), mfrow = c(1, 2))
plot(y~x,cex.lab=1.5,cex.axis=1.25,xlab='X',ylab='Y')
lines(yFit1B~x)
plot(y~a,cex.lab=1.5,cex.axis=1.25,xlab='A',ylab='Y')
lines(yFit2B~a)

The graphs show that the data are not fit equally well by the two models. This observation is confirmed quantitatively by examining the regression and ANOVA tables.
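
A one-line comparison of the two R-squared values (added here for convenience; the full tables follow) shows the gap directly:

c(x.model = summary(lm01B)$r.squared, a.model = summary(lm02B)$r.squared) # about 0.22 vs 0.62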

Compare regression tables:

summary(lm01B)
## 
## Call:
## lm(formula = y ~ 1 + I(x^2))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -34.752 -13.062   2.539  13.779  32.624 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.01108    2.88182   0.004    0.997    
## I(x^2)       0.29643    0.06287   4.715 1.02e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.29 on 79 degrees of freedom
## Multiple R-squared:  0.2196, Adjusted R-squared:  0.2097 
## F-statistic: 22.23 on 1 and 79 DF,  p-value: 1.025e-05
summary(lm02B)
## 
## Call:
## lm(formula = y ~ 1 + I(a^2))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -29.9279  -8.3211  -0.1873   8.0349  26.5127 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -5.0792     1.8866  -2.692  0.00866 ** 
## I(a^2)        0.6201     0.0543  11.421  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 12.02 on 79 degrees of freedom
## Multiple R-squared:  0.6228, Adjusted R-squared:  0.618 
## F-statistic: 130.4 on 1 and 79 DF,  p-value: < 2.2e-16

Compare ANOVA tables:

anova(lm01B)
## Analysis of Variance Table
## 
## Response: y
##           Df  Sum Sq Mean Sq F value    Pr(>F)    
## I(x^2)     1  6643.9  6643.9  22.228 1.025e-05 ***
## Residuals 79 23613.2   298.9                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(lm02B)
## Analysis of Variance Table
## 
## Response: y
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## I(a^2)     1  18844 18843.9  130.43 < 2.2e-16 ***
## Residuals 79  11413   144.5                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Thus, a linear transformation of the predictor variable can alter the goodness-of-fit when the model violates marginality. This is not a good outcome, because it means that the result of our analysis depends strongly on our more-or-less arbitrary choice of how the predictor variable is expressed. Note that it is the shift of origin in a = 4 + 0.5*x that does the damage: a pure change of scale (e.g., pounds vs. grams) leaves the span of 1 and x^2 unchanged, whereas a transformation that also moves the origin (e.g., degrees Celsius vs. degrees Fahrenheit) does not. Within the context of ANOVA models, the analogous problem is that the results depend on how the group effects are coded (e.g., on the choice of contrasts). These are very good reasons for avoiding models that violate the principle of marginality.
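
A quick check of the origin-versus-scale distinction (an addition to the original script):

summary(lm(y ~ 1 + I((453.6*x)^2)))$r.squared # pure rescaling (as in pounds to grams): same R^2 as lm01B
summary(lm(y ~ 1 + I((x + 5)^2)))$r.squared   # shifted origin: a different R^2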