Inference comparing multiple population means
Here we have four means to compare rather than just two.
You previously considered these data on chick weights at 20 days of age by diet:
diet | mean | se | sd | n |
---|---|---|---|---|
1 | 170.4 | 13.45 | 55.44 | 17 |
2 | 205.6 | 22.22 | 70.25 | 10 |
3 | 258.9 | 20.63 | 65.24 | 10 |
4 | 233.9 | 12.52 | 37.57 | 9 |
Does mean weight at 20 days differ by diet? How do you test this?
Let \(\mu_i\) denote the mean weight on diet \(i\), for \(i = 1, 2, 3, 4\).
The hypothesis that there are no differences in means by diet is:
\[ H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 \quad (\text{no difference in means}) \]
The alternative, if this is false, is that there is at least one difference:
\[ H_A: \mu_i \neq \mu_j \quad\text{for some}\quad i \neq j \quad (\text{at least one difference}) \]
Here are two made-up examples of four sample means.
Why does it look like there’s a difference at right but not at left?
Think about the \(t\)-test: we say there’s a difference if \(T = \frac{\text{estimate} - \text{hypothesis}}{\text{variability}}\) is large.
Same idea here: we see differences if they are big relative to the variability in estimates.
Partitioning variation into two or more components is called “analysis of variance”.
For the chick data, there are two sources of variability:
group variability between diets
error variability among chicks
The analysis of variance (ANOVA) model:
\[\color{grey}{\text{total variation}} = \color{red}{\text{group variation}} + \color{blue}{\text{error variation}}\]
We’ll base the test on the ratio \(F = \frac{\color{red}{\text{group variation}}}{\color{blue}{\text{error variation}}}\).
The \(F\) statistic measures variability attributable to group differences relative to variability attributable to individual differences.
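Notation:
\(k\) = number of groups
\(n\) = total number of observations
\(n_i\) = number of observations in group \(i\)
\(\bar{x}_i\) = sample mean of group \(i\)
\(s_i\) = sample standard deviation of group \(i\)
\(\bar{x}\) = grand mean of all observations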
Measures of variability:
\[\color{red}{MSG} = \frac{1}{k - 1}\sum_i n_i(\bar{x}_i - \bar{x})^2 \quad(\color{red}{\text{group}})\]
\[\color{blue}{MSE} = \frac{1}{n - k}\sum_i (n_i - 1)s_i^2 \quad(\color{blue}{\text{error}})\]
Ratio:
\[F = \frac{\color{red}{MSG}}{\color{blue}{MSE}} \quad\left(\frac{\color{red}{\text{group variation}}}{\color{blue}{\text{error variation}}}\right)\]
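As a rough illustration of these formulas, the mean squares can be computed directly from the rounded summary table of chick weights. This is only a sketch: the three vectors are typed in from the table rather than computed from raw data, and the grand mean is taken as the sample-size-weighted average of the group means, so the result should land close to, but not exactly on, the value obtained from the full data.

```r
# Summary statistics copied by hand from the chick weight table (rounded values)
n.i    <- c(17, 10, 10, 9)                  # group sizes
xbar.i <- c(170.4, 205.6, 258.9, 233.9)     # group means
s.i    <- c(55.44, 70.25, 65.24, 37.57)     # group standard deviations

k <- length(n.i)   # number of groups
n <- sum(n.i)      # total number of observations

# grand mean as the sample-size-weighted average of the group means
xbar <- sum(n.i * xbar.i) / n

# mean squares and their ratio
msg <- sum(n.i * (xbar.i - xbar)^2) / (k - 1)   # group variation
mse <- sum((n.i - 1) * s.i^2) / (n - k)         # error variation
fstat <- msg / mse
fstat
```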
If the data satisfy these conditions:
Then the \(F\) statistic has a sampling distribution well-approximated by an \(F_{k - 1, n - k}\) model.
To test the hypotheses:
\[ \begin{cases} H_0: &\mu_1 = \mu_2 = \mu_3 = \mu_4 \\ H_A: &\mu_i \neq \mu_j \quad\text{for some}\quad i \neq j \end{cases} \]
Calculate the \(F\) statistic:
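```r
# ingredients of mean squares
# (chicks holds one row per chick; chicks.summary holds one row per diet)
k <- nrow(chicks.summary)        # number of groups
n <- nrow(chicks)                # total number of observations
n.i <- chicks.summary$n          # group sizes
xbar.i <- chicks.summary$mean    # group means
s.i <- chicks.summary$sd         # group standard deviations
xbar <- mean(chicks$weight)      # grand mean

# mean squares
msg <- sum(n.i*(xbar.i - xbar)^2)/(k - 1)
mse <- sum((n.i - 1)*s.i^2)/(n - k)

# f statistic
fstat <- msg/mse
fstat
```
```
[1] 5.463598
```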
And reject \(H_0\) when \(F\) is large.
\[ \begin{cases} H_0: &\mu_1 = \mu_2 = \mu_3 = \mu_4 \\ H_A: &\mu_i \neq \mu_j \quad\text{for some}\quad i \neq j \end{cases} \]
\(F = \frac{\color{red}{\text{group variation}}}{\color{blue}{\text{error variation}}} = \frac{MSG}{MSE} = 5.4636\).
\(F = 5.4636\) means that the variation in weight attributable to diets is about 5.46 times the variation attributable to differences among individual chicks, with each source measured as a mean square.
The statistical significance of this result is measured by the \(p\)-value:
if there is in fact no difference in means, then only 0.29% of samples (i.e., about 3 in 1000) would produce at least as much diet-to-diet variability as we observed.
So in this case we reject \(H_0\) at the 1% level.
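The \(p\)-value is the upper-tail area of the \(F_{k - 1, n - k}\) model beyond the observed statistic. A minimal sketch of that calculation, plugging in the observed value and the degrees of freedom for four diets and 46 chicks:

```r
# observed F statistic and degrees of freedom (k - 1 = 3, n - k = 42)
fstat <- 5.4636
df1 <- 3
df2 <- 42

# upper-tail probability P(F > fstat) under the F model
pval <- pf(fstat, df1 = df1, df2 = df2, lower.tail = FALSE)
pval
```

The result should agree with the 0.29% figure quoted above, up to rounding of the \(F\) statistic.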
The aov(...) function fits ANOVA models using a formula/dataframe specification:
Source | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
---|---|---|---|---|---|
diet | 3 | 55881 | 18627 | 5.464 | 0.002909 |
Residuals | 42 | 143190 | 3409 | | |
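A call producing a table of this form might look like the sketch below. The construction of the data frame is an assumption: it uses R's built-in ChickWeight data restricted to day 20, which has the same group sizes as the summary table, and lowercases the column names so that the formula matches the `weight` and `diet` names used here.

```r
# ASSUMPTION: the built-in ChickWeight data at day 20 stands in for `chicks`
chicks <- subset(as.data.frame(ChickWeight), Time == 20)
names(chicks) <- tolower(names(chicks))   # columns: weight, time, chick, diet

# formula/dataframe specification: response ~ grouping factor
fit <- aov(weight ~ diet, data = chicks)

# ANOVA table: Df, Sum Sq, Mean Sq, F value, Pr(>F)
summary(fit)
```

The formula puts the response on the left and the grouping factor on the right; `diet` must be a factor rather than a numeric code for `aov` to treat the diets as groups.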
The typical style for interpretation closely follows that of previous inferences for the mean:
The data provide strong evidence of an effect of diet on mean weight (F = 5.464 on 3 and 42 df, p = 0.0029).
The results of an analysis of variance are traditionally displayed in a table.
Source | degrees of freedom | Sum of squares | Mean square | F statistic | p-value |
---|---|---|---|---|---|
Group | \(k - 1\) | SSG | \(MSG = \frac{SSG}{k - 1}\) | \(\frac{MSG}{MSE}\) | \(P(F > F_\text{obs})\) |
Error | \(n - k\) | SSE | \(MSE = \frac{SSE}{n - k}\) | | |
Formally, the ANOVA model says \((n - 1)s^2 = SSG + SSE\).
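To make the decomposition concrete, here is a small check using the sums of squares reported in the chick weight ANOVA table. The values are copied from that table (so they are rounded), and the variable names are chosen for illustration.

```r
# sums of squares copied from the chick weight ANOVA table
ssg <- 55881     # group (diet) sum of squares
sse <- 143190    # error (residual) sum of squares
k <- 4           # number of diets
n <- 46          # number of chicks

# total variation is the sum of the two components: (n - 1) * s^2
sst <- ssg + sse
s2 <- sst / (n - 1)   # implied overall sample variance of weight

# mean squares and F statistic recovered from the sums of squares
msg <- ssg / (k - 1)
mse <- sse / (n - k)
fstat <- msg / mse
c(SST = sst, s2 = s2, MSG = msg, MSE = mse, F = fstat)
```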
The ANOVA test assumes:
To check these assumptions:
Similar to the \(t\) test, greater departures from these assumptions are allowable for larger sample sizes.
diet | mean | se | sd | n |
---|---|---|---|---|
1 | 170.4 | 13.45 | 55.44 | 17 |
2 | 205.6 | 22.22 | 70.25 | 10 |
3 | 258.9 | 20.63 | 65.24 | 10 |
4 | 233.9 | 12.52 | 37.57 | 9 |
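One quick check that can be read off this table is whether the group standard deviations are of similar size. The sketch below uses a common rule of thumb (an assumption here, not necessarily the criterion used in this course) that the largest standard deviation should be no more than about twice the smallest.

```r
# group standard deviations copied from the summary table
s.i <- c(55.44, 70.25, 65.24, 37.57)

# ratio of largest to smallest standard deviation
sd.ratio <- max(s.i) / min(s.i)
sd.ratio

# ASSUMPTION: rule-of-thumb threshold of 2 for "roughly equal" spread
sd.ratio < 2
```

Checking the shape of the distribution within each group would need the raw weights, for example through side-by-side boxplots.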
Weight change was measured for 72 young female anorexia patients randomly allocated to three treatment groups: cognitive behavioral therapy (CBT), family therapy (FT), and a control group (Cont).
Grouped summary statistics:
treat | mean (post - pre) | sd | n |
---|---|---|---|
CBT | 3.007 | 7.309 | 29 |
Cont | -0.45 | 7.989 | 26 |
FT | 7.265 | 7.157 | 17 |
Were any of the treatments more effective than others?
Source | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
---|---|---|---|---|---|
treat | 2 | 614.6 | 307.3 | 5.422 | 0.006499 |
Residuals | 69 | 3911 | 56.68 | | |
The data provide strong evidence of an effect of therapeutic treatment on mean weight change among young women with anorexia (F = 5.422 on 2 and 69 degrees of freedom, p = 0.0065).
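The anorexia result can be roughly reproduced from the grouped summary statistics alone. The helper below, `anova_from_summary`, is a hypothetical function written for this illustration; it applies the MSG and MSE formulas to group sizes, means, and standard deviations, and because the inputs are rounded, its output should match the table only approximately.

```r
# Hypothetical helper: one-way ANOVA quantities from grouped summary statistics
anova_from_summary <- function(n.i, xbar.i, s.i) {
  k <- length(n.i)                       # number of groups
  n <- sum(n.i)                          # total sample size
  xbar <- sum(n.i * xbar.i) / n          # weighted grand mean
  ssg <- sum(n.i * (xbar.i - xbar)^2)    # group sum of squares
  sse <- sum((n.i - 1) * s.i^2)          # error sum of squares
  fstat <- (ssg / (k - 1)) / (sse / (n - k))
  pval <- pf(fstat, k - 1, n - k, lower.tail = FALSE)
  c(df1 = k - 1, df2 = n - k, SSG = ssg, SSE = sse, Fstat = fstat, p = pval)
}

# rows of the summary table, in the order CBT, Cont, FT
res <- anova_from_summary(n.i    = c(29, 26, 17),
                          xbar.i = c(3.007, -0.45, 7.265),
                          s.i    = c(7.309, 7.989, 7.157))
round(res, 4)
```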
STAT218