Analysis of Variance

Inference comparing multiple population means

Today’s agenda

[lecture] inference comparing several population means
[lab] fitting ANOVA models in R

More than two means?

You previously considered this data on chick weights at 20 days of age by diet:

Here we have four means to compare rather than just two.

diet	mean	se	sd	n
1	170.4	13.45	55.44	17
2	205.6	22.22	70.25	10
3	258.9	20.63	65.24	10
4	233.9	12.52	37.57	9

Does mean weight at 20 days differ by diet? How do you test this?

Hypotheses for a difference in means

Let \(\mu_i = \text{mean weight on diet } i = 1, 2, 3, 4\).

The hypothesis that there are no differences in means by diet is:

\[ H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 \quad (\text{no difference in means}) \]

The alternative, if this is false, is that there is at least one difference:

\[ H_A: \mu_i \neq \mu_j \quad (\text{at least one difference}) \]

How much difference is enough?

Here are two made-up examples of four sample means.

Why does it look like there’s a difference at right but not at left?

Think about the \(t\)-test: we say there’s a difference if \(T = \frac{\text{estimate} - \text{hypothesis}}{\text{variability}}\) is large.

Same idea here: we see differences if they are big relative to the variability in estimates.

Partitioning variation

Partitioning variation into two or more components is called “analysis of variance”

For the chick data, two sources of variability:

group variability between diets
error variability among chicks

The analysis of variance (ANOVA) model:

\[\color{grey}{\text{total variation}} = \color{red}{\text{group variation}} + \color{blue}{\text{error variation}}\]

We’ll base the test on the ratio \(F = \frac{\color{red}{\text{group variation}}}{\color{blue}{\text{error variation}}}\).

The \(F\) statistic: a variance ratio

The \(F\) statistic measures variability attributable to group differences relative to variability attributable to individual differences.

Notation:

\(\bar{x}\): “grand” mean of all observations
\(\bar{x}_i\): mean of observations in group \(i\)
\(s_i\): SD of observations in group \(i\)
\(k\) groups
\(n\) total observations
\(n_i\) observations per group

Measures of variability:

\[\color{red}{MSG} = \frac{1}{k - 1}\sum_i n_i(\bar{x}_i - \bar{x})^2 \quad(\color{red}{\text{group}})\] \[\color{blue}{MSE} = \frac{1}{n - k}\sum_i (n_i - 1)s_i^2 \quad(\color{blue}{\text{error}})\] Ratio:

\[F = \frac{\color{red}{MSG}}{\color{blue}{MSE}} \quad\left(\frac{\color{red}{\text{group variation}}}{\color{blue}{\text{error variation}}}\right)\]

Sampling distribution for \(F\)

If the data satisfy these conditions:

the distribution of values is symmetric and unimodal within each group
the variability (standard deviation) is roughly the same across groups

Then the \(F\) statistic has a sampling distribution well-approximated by an \(F_{k - 1, n - k}\) model.

numerator degrees of freedom \(k - 1\)
denominator degrees of freedom \(n - k\)

\(F\) models for several different numerator degrees of freedom \(k - 1\) with fixed \(n = 30\).

\(p\)-values for the \(F\) test

To test the hypotheses:

\[ \begin{cases} H_0: &\mu_1 = \mu_2 = \mu_3 = \mu_4 \\ H_0: &\mu_i \neq \mu_j \quad\text{for some}\quad i \neq j \end{cases} \] Calculate the \(F\) statistic:

# ingredients of mean squares
k <- nrow(chicks.summary)
n <- nrow(chicks)
n.i <- chicks.summary$n
xbar.i <- chicks.summary$mean
s.i <- chicks.summary$sd
xbar <- mean(chicks$weight)

# mean squares
msg <- sum(n.i*(xbar.i - xbar)^2)/(k - 1)
mse <- sum((n.i - 1)*s.i^2)/(n - k)

# f statistic
fstat <- msg/mse
fstat

[1] 5.463598

And reject \(H_0\) when \(F\) is large.

For a significance level \(\alpha\) test, reject \(H_0\) when \(\underbrace{P(F > F_\text{obs})}_\text{p-value} < \alpha\).

pf(fstat, 4 - 1, 46 - 4, lower.tail = F)

[1] 0.002909054

Interpreting \(F\) statistics and \(p\)-values

\[ \begin{cases} H_0: &\mu_1 = \mu_2 = \mu_3 = \mu_4 \\ H_0: &\mu_i \neq \mu_j \quad\text{for some}\quad i \neq j \end{cases} \]

\(F = \frac{\color{red}{\text{group variation}}}{\color{blue}{\text{error variation}}} = \frac{MSG}{MSE} = 5.4636\).

pf(fstat, 4 - 1, 46 - 4, lower.tail = F)

[1] 0.002909054

F = 5.4636 means the proportion of variation in weight attributable to diets is 5.46 times greater than the proportion of variation attributable to chicks.

The statistical significance of this result is measured by the \(p\)-value:

if there is in fact no difference in means, then only 0.29% of samples (i.e., 2 in 1000) would produce at least as much diet-to-diet variability as we observed.
so in this case we reject \(H_0\) at the 1% level

ANOVA in R

The aov(...) function fits ANOVA models using a formula/dataframe specification:

# fit anova model
fit <- aov(weight ~ diet, data = chicks)

# generate table
summary(fit)

Analysis of Variance Model
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
diet	3	55881	18627	5.464	0.002909
Residuals	42	143190	3409	NA	NA

The typical style for interpretation closely follows that of previous inferences for the mean:

The data provide strong evidence of an effect of diet on mean weight (F = 5.464 on 3 and 42 df, p = 0.0029).

Analysis of variance table

The results of an analysis of variance are traditionally displayed in a table.

Source	degrees of freedom	Sum of squares	Mean square	F statistic	p-value
Group	\(k - 1\)	SSG	\(MSG = \frac{SSG}{k - 1}\)	\(\frac{MSG}{MSE}\)	\(P(F > F_\text{obs})\)
Error	\(n - k\)	SSE	\(MSE = \frac{SSE}{n - k}\)

the sum of square terms are ‘raw’ measures of variability
the mean square terms are averages adjusted for the amount of data available to estimate variability due to each source

Formally, the ANOVA model says \((n - 1)s^2 = SSG + SSE\).

Checking assumptions

The ANOVA test assumes:

the distribution of values is symmetric and unimodal within each group
the variability (standard deviation) is roughly the same across groups

To check these assumptions:

compare group standard deviations for similarity
visually inspect distributions within each group for approximate symmetry

Similar to the \(t\) test, greater departures from these assumptions are allowable for larger sample sizes.

diet	mean	se	sd	n
1	170.4	13.45	55.44	17
2	205.6	22.22	70.25	10
3	258.9	20.63	65.24	10
4	233.9	12.52	37.57	9

Another example: treating anorexia

Weight change was measured for 72 young female anorexia patients randomly allocated to three treatment groups:

cognitive behavioral therapy (CBT)
family treatment (FT)
a control (Cont)

Grouped summary statistics:

treat	post - pre	sd	n
CBT	3.007	7.309	29
Cont	-0.45	7.989	26
FT	7.265	7.157	17

Were any of the treatments more effective than others?

Another example: treating anorexia

# fit anova model
fit <- aov(change ~ treat, data = anorexia)

# generate table
summary(fit)

Analysis of Variance Model
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
treat	2	614.6	307.3	5.422	0.006499
Residuals	69	3911	56.68	NA	NA

The data provide strong evidence of an effect of therapeutic treatment on mean weight change among young women with anorexia (F = 5.422 on 2 and 69 degrees of freedom, p = 0.0065).