Analysis of Variance

Inference comparing multiple population means

Today’s agenda

  1. [lecture] inference comparing several population means
  2. [lab] fitting ANOVA models in R

More than two means?

You previously considered this data on chick weights at 20 days of age by diet:

Here we have four means to compare rather than just two.

diet   mean     se      sd    n
   1  170.4  13.45   55.44   17
   2  205.6  22.22   70.25   10
   3  258.9  20.63   65.24   10
   4  233.9  12.52   37.57    9
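A grouped summary like this one can be produced directly in R; here is a sketch using dplyr (assuming a data frame chicks with columns weight and diet — the resulting chicks.summary object is the one used in the calculations later on):

# sketch: grouped summary of weight by diet (assumes chicks has columns diet, weight)
library(dplyr)

chicks.summary <- chicks |>
  group_by(diet) |>
  summarize(mean = mean(weight),
            se = sd(weight)/sqrt(n()),
            sd = sd(weight),
            n = n())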

Does mean weight at 20 days differ by diet? How do you test this?

Hypotheses for a difference in means

Let \(\mu_i\) denote the mean weight on diet \(i\), for \(i = 1, 2, 3, 4\).

The hypothesis that there are no differences in means by diet is:

\[ H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 \quad (\text{no difference in means}) \]

The alternative, if this is false, is that there is at least one difference:

\[ H_A: \mu_i \neq \mu_j \quad\text{for some}\quad i \neq j \quad (\text{at least one difference}) \]

How much difference is enough?

Here are two made-up examples of four sample means.

Why does it look like there’s a difference at right but not at left?

Think about the \(t\)-test: we say there’s a difference if \(T = \frac{\text{estimate} - \text{hypothesis}}{\text{variability}}\) is large.

Same idea here: we see differences if they are big relative to the variability in estimates.

Partitioning variation

Partitioning variation into two or more components is called “analysis of variance.”

For the chick data, two sources of variability:

  • group variability between diets

  • error variability among chicks

The analysis of variance (ANOVA) model:

\[\color{grey}{\text{total variation}} = \color{red}{\text{group variation}} + \color{blue}{\text{error variation}}\]

We’ll base the test on the ratio \(F = \frac{\color{red}{\text{group variation}}}{\color{blue}{\text{error variation}}}\).

The \(F\) statistic: a variance ratio

The \(F\) statistic measures variability attributable to group differences relative to variability attributable to individual differences.

Notation:

  • \(\bar{x}\): “grand” mean of all observations
  • \(\bar{x}_i\): mean of observations in group \(i\)
  • \(s_i\): SD of observations in group \(i\)
  • \(k\) groups
  • \(n\) total observations
  • \(n_i\) observations per group

Measures of variability:

\[\color{red}{MSG} = \frac{1}{k - 1}\sum_i n_i(\bar{x}_i - \bar{x})^2 \quad(\color{red}{\text{group}})\]

\[\color{blue}{MSE} = \frac{1}{n - k}\sum_i (n_i - 1)s_i^2 \quad(\color{blue}{\text{error}})\]

Ratio:

\[F = \frac{\color{red}{MSG}}{\color{blue}{MSE}} \quad\left(\frac{\color{red}{\text{group variation}}}{\color{blue}{\text{error variation}}}\right)\]

Sampling distribution for \(F\)

If the data satisfy these conditions:

  1. the distribution of values is symmetric and unimodal within each group
  2. the variability (standard deviation) is roughly the same across groups

Then the \(F\) statistic has a sampling distribution well-approximated by an \(F_{k - 1, n - k}\) model.

  • numerator degrees of freedom \(k - 1\)
  • denominator degrees of freedom \(n - k\)

\(F\) models for several different numerator degrees of freedom \(k - 1\) with fixed \(n = 30\).
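The shapes referenced in that figure can be reproduced with the F density function df(); a minimal sketch (the particular choices of k are illustrative):

# sketch: F densities for k = 3, 5, 10 groups with n = 30 observations
x <- seq(0, 5, length.out = 200)
n <- 30
plot(x, df(x, df1 = 3 - 1, df2 = n - 3), type = "l", ylab = "density")
lines(x, df(x, df1 = 5 - 1, df2 = n - 5), lty = 2)
lines(x, df(x, df1 = 10 - 1, df2 = n - 10), lty = 3)
legend("topright", legend = c("k = 3", "k = 5", "k = 10"), lty = 1:3)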

\(p\)-values for the \(F\) test

To test the hypotheses:

\[ \begin{cases} H_0: &\mu_1 = \mu_2 = \mu_3 = \mu_4 \\ H_A: &\mu_i \neq \mu_j \quad\text{for some}\quad i \neq j \end{cases} \]

Calculate the \(F\) statistic:

# ingredients of mean squares
k <- nrow(chicks.summary)
n <- nrow(chicks)
n.i <- chicks.summary$n
xbar.i <- chicks.summary$mean
s.i <- chicks.summary$sd
xbar <- mean(chicks$weight)

# mean squares
msg <- sum(n.i*(xbar.i - xbar)^2)/(k - 1)
mse <- sum((n.i - 1)*s.i^2)/(n - k)

# f statistic
fstat <- msg/mse
fstat
[1] 5.463598

And reject \(H_0\) when \(F\) is large.

For a significance level \(\alpha\) test, reject \(H_0\) when \(\underbrace{P(F > F_\text{obs})}_\text{p-value} < \alpha\).

pf(fstat, 4 - 1, 46 - 4, lower.tail = F)
[1] 0.002909054
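Equivalently, one can compare \(F_\text{obs}\) with the critical value of the \(F_{k - 1,\, n - k}\) model using qf(); a sketch for \(\alpha = 0.05\):

# sketch: alpha = 0.05 critical value for F with 3 and 42 df
qf(0.95, df1 = 4 - 1, df2 = 46 - 4)

Since the observed statistic exceeds this critical value, the conclusion agrees with the small \(p\)-value above.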

Interpreting \(F\) statistics and \(p\)-values

\[ \begin{cases} H_0: &\mu_1 = \mu_2 = \mu_3 = \mu_4 \\ H_A: &\mu_i \neq \mu_j \quad\text{for some}\quad i \neq j \end{cases} \]

\(F = \frac{\color{red}{\text{group variation}}}{\color{blue}{\text{error variation}}} = \frac{MSG}{MSE} = 5.4636\).

pf(fstat, 4 - 1, 46 - 4, lower.tail = F)
[1] 0.002909054

F = 5.4636 means that the variation in weight attributable to differences between diets is about 5.46 times the variation attributable to differences among individual chicks within diets.

The statistical significance of this result is measured by the \(p\)-value:

  • if there is in fact no difference in means, then only about 0.29% of samples (roughly 3 in 1000) would produce at least as much diet-to-diet variability as we observed.

  • so in this case we reject \(H_0\) at the 1% level

ANOVA in R

The aov(...) function fits ANOVA models using a formula/dataframe specification:

# fit anova model
fit <- aov(weight ~ diet, data = chicks)

# generate table
summary(fit)
            Df Sum Sq Mean Sq F value   Pr(>F)
diet         3  55881   18627   5.464 0.002909
Residuals   42 143190    3409

The typical style of interpretation closely follows that used for previous inferences about means:

The data provide strong evidence of an effect of diet on mean weight (F = 5.464 on 3 and 42 df, p = 0.0029).
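The numbers quoted in that sentence can also be extracted programmatically; summary(fit) returns a list whose first element is the ANOVA table (a sketch):

# sketch: pull the F statistic and p-value out of the ANOVA table
tab <- summary(fit)[[1]]
tab[["F value"]][1]
tab[["Pr(>F)"]][1]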

Analysis of variance table

The results of an analysis of variance are traditionally displayed in a table.

Source | Degrees of freedom | Sum of squares | Mean square | F statistic | p-value
Group  | \(k - 1\) | SSG | \(MSG = \frac{SSG}{k - 1}\) | \(\frac{MSG}{MSE}\) | \(P(F > F_\text{obs})\)
Error  | \(n - k\) | SSE | \(MSE = \frac{SSE}{n - k}\) |  |
  • the sum of squares terms are ‘raw’ measures of variability
  • the mean square terms are averages adjusted for the amount of data available to estimate variability due to each source

Formally, the ANOVA model says \((n - 1)s^2 = SSG + SSE\), where \(s\) is the standard deviation of all \(n\) observations, so the left-hand side is the total sum of squares.
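A quick numeric check of this identity, reusing chicks and the fitted model fit from above (a sketch):

# sketch: total sum of squares equals SSG + SSE
sst <- (nrow(chicks) - 1)*var(chicks$weight)
all.equal(sst, sum(summary(fit)[[1]][["Sum Sq"]]))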

Checking assumptions

The ANOVA test assumes:

  1. the distribution of values is symmetric and unimodal within each group
  2. the variability (standard deviation) is roughly the same across groups

To check these assumptions:

  • compare group standard deviations for similarity
  • visually inspect distributions within each group for approximate symmetry

Similar to the \(t\) test, greater departures from these assumptions are allowable for larger sample sizes.

diet   mean     se      sd    n
   1  170.4  13.45   55.44   17
   2  205.6  22.22   70.25   10
   3  258.9  20.63   65.24   10
   4  233.9  12.52   37.57    9
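In R, both checks amount to a grouped summary and a grouped plot; a minimal sketch, assuming the chicks data frame:

# sketch: compare group SDs and inspect group distributions
aggregate(weight ~ diet, data = chicks, FUN = sd)
boxplot(weight ~ diet, data = chicks)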

Another example: treating anorexia

Weight change was measured for 72 young female anorexia patients randomly allocated to three treatment groups:

  • cognitive behavioral therapy (CBT)
  • family treatment (FT)
  • a control (Cont)

Grouped summary statistics:

treat   mean(post - pre)     sd    n
CBT                3.007  7.309   29
Cont               -0.45  7.989   26
FT                 7.265  7.157   17

Were any of the treatments more effective than others?
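The change scores summarized above can be computed from pre- and post-treatment weights; a sketch assuming the data are the anorexia data frame from the MASS package (the column names change and treat below match those used in the model fit on the next slide):

# sketch: construct weight change and treatment variables from MASS::anorexia
library(MASS)
data(anorexia)
anorexia$change <- anorexia$Postwt - anorexia$Prewt
anorexia$treat <- anorexia$Treat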

Another example: treating anorexia

# fit anova model
fit <- aov(change ~ treat, data = anorexia)

# generate table
summary(fit)
            Df Sum Sq Mean Sq F value   Pr(>F)
treat        2  614.6   307.3   5.422 0.006499
Residuals   69   3911   56.68

The data provide strong evidence of an effect of therapeutic treatment on mean weight change among young women with anorexia (F = 5.422 on 2 and 69 degrees of freedom, p = 0.0065).