STAT218 - Analysis of Variance

treatment	mean	sd	n
seeded	442	650.8	26
unseeded	164.6	278.4	26

sex	mean.temp	se	n
female	98.39	0.09222	65
male	98.1	0.08667	65

Chick weight data

Chick weights at 20 days of age by diet:

Here we have four means to compare.

diet	mean	se	sd	n
1	170.4	13.45	55.44	17
2	205.6	22.22	70.25	10
3	258.9	20.63	65.24	10
4	233.9	12.52	37.57	9

This is experimental data: chicks were randomly allocated one of the four diets.

looks like there are differences in mean weight by diet
how do you test for an effect of diet?
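The summary table above can be reproduced in R. The sketch below assumes the data are the day-20 weights from R's built-in `ChickWeight` dataset; the source does not say where `chicks` comes from, so treat that as an assumption. The names `chicks` and `chicks.summary` follow the code used later in these notes.

```r
# ASSUMPTION: the chick data are the day-20 weights in the built-in ChickWeight data
chicks <- data.frame(
  weight = ChickWeight$weight[ChickWeight$Time == 20],
  diet   = factor(ChickWeight$Diet[ChickWeight$Time == 20])
)

# per-diet mean, SD, and group size
chicks.summary <- data.frame(
  diet = levels(chicks$diet),
  mean = as.numeric(tapply(chicks$weight, chicks$diet, mean)),
  sd   = as.numeric(tapply(chicks$weight, chicks$diet, sd)),
  n    = as.numeric(tapply(chicks$weight, chicks$diet, length))
)

# standard error of each group mean: SD / sqrt(n)
chicks.summary$se <- chicks.summary$sd / sqrt(chicks.summary$n)
chicks.summary
```

If the assumption holds, the output should agree with the table above up to rounding.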

Hypotheses comparing many means

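Let $\mu_i$ denote the population mean weight at 20 days for chicks on diet $i = 1, 2, 3, 4$.

The hypothesis that there are no differences in means by diet is:

$$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$$

The alternative, if this is false, is that there is at least one difference:

$$H_A: \mu_i \neq \mu_j \quad \text{for some } i \neq j$$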

This is known as an “omnibus” test because of the composite alternative.

How much difference is enough?

Here are two made-up examples of the same four sample means with different SEs.

Why does it look like there’s a difference at right but not at left?

Think about the $t$-test: we say there’s a difference if $T = \frac{\text{estimate} - \text{hypothesized value}}{\text{SE}}$ is large in magnitude.

Same idea here: we see differences if they are big relative to the variability in estimates.
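A small sketch with made-up numbers (not from the chick data) shows the point: the same difference between two sample means gives very different $t$ statistics depending on the standard error. All values below are hypothetical.

```r
# made-up illustration: identical difference in means, two different SEs
diff.means <- 50   # hypothetical difference between two sample means
se.small   <- 10   # hypothetical SE when estimates are precise
se.large   <- 40   # hypothetical SE when estimates are noisy

# the difference is judged relative to its variability
t.small <- diff.means / se.small
t.large <- diff.means / se.large
c(t.small = t.small, t.large = t.large)
```

With the small SE the difference is 5 standard errors from zero; with the large SE it is only 1.25 standard errors from zero.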

Partitioning variation

Partitioning variation into two or more components is called “analysis of variance”

For the chick data, two sources of variability:

group variability between diets
error variability among chicks

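The analysis of variance (ANOVA) model:

$$\text{total variation} = \text{group variation} + \text{error variation}$$

We’ll base the test on the ratio $F = \frac{\text{group variation}}{\text{error variation}}$ and reject $H_0$ if the ratio is large enough.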

The $F$ statistic: a variance ratio

The $F$ statistic measures variability attributable to group differences relative to variability attributable to individual differences.

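Notation:

$\bar{x}$: “grand” mean of all observations
$\bar{x}_i$: mean of observations in group $i$
$s_i$: SD of observations in group $i$
$k$ groups
$n$ total observations
$n_i$ observations in group $i$

Measures of variability:

$$MSG = \frac{1}{k - 1}\sum_{i = 1}^{k} n_i(\bar{x}_i - \bar{x})^2 \quad \text{(group)}$$

$$MSE = \frac{1}{n - k}\sum_{i = 1}^{k} (n_i - 1)s_i^2 \quad \text{(error)}$$

Ratio:

$$F = \frac{MSG}{MSE}$$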

Sampling distribution for $F$

Provided that

each group satisfies conditions for a $t$ test
the variability (standard deviation) is about the same across groups

the $F$ statistic has a sampling distribution well-approximated by an $F_{k - 1, n - k}$ model.

numerator degrees of freedom $k - 1$
denominator degrees of freedom $n - k$

$F$ models for several different numerator degrees of freedom with fixed $n$.
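One way to see why the $F$ model is a reasonable reference distribution is to simulate data with no group differences and check how often the variance ratio exceeds the model's 95th percentile. The sketch below borrows the four group sizes from the chick table; the common mean (200) and SD (60) are arbitrary, hypothetical values.

```r
set.seed(218)

# group sizes from the chick table; mean and SD below are hypothetical
n.i <- c(17, 10, 10, 9)
k <- length(n.i)
n <- sum(n.i)
group <- factor(rep(1:k, times = n.i))

# simulate F statistics when all group means are truly equal
sim.f <- replicate(2000, {
  y <- rnorm(n, mean = 200, sd = 60)
  summary(aov(y ~ group))[[1]][["F value"]][1]
})

# 95th percentile of the F model with k - 1 and n - k degrees of freedom
crit <- qf(0.95, k - 1, n - k)

# share of simulated statistics beyond the cutoff (should be near 0.05)
reject.rate <- mean(sim.f > crit)
reject.rate
```

If the approximation is good, about 5% of the simulated statistics land beyond the cutoff.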

The ANOVA test “by hand”

To test the hypotheses:

$$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 \qquad H_A: \mu_i \neq \mu_j \text{ for some } i \neq j$$

Calculate the $F$ statistic:

# ingredients of mean squares
k <- nrow(chicks.summary)
n <- nrow(chicks)
n.i <- chicks.summary$n
xbar.i <- chicks.summary$mean
s.i <- chicks.summary$sd
xbar <- mean(chicks$weight)

# mean squares
msg <- sum(n.i*(xbar.i - xbar)^2)/(k - 1)
mse <- sum((n.i - 1)*s.i^2)/(n - k)

# f statistic
fstat <- msg/mse
fstat

[1] 5.463598

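And reject when $F$ is large.

For a significance level $\alpha$ test, reject when $p < \alpha$, where the $p$-value is the upper-tail area of the $F_{k - 1, n - k}$ model beyond the observed statistic.

pf(fstat, 4 - 1, 46 - 4, lower.tail = FALSE)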

[1] 0.002909054
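The same calculation can be run from the summary table alone, without the raw data. This sketch types in the rounded means, SDs, and group sizes from the diet table and takes the grand mean as the weighted average of the group means, so the result differs slightly from the value above because of rounding.

```r
# summary statistics copied (rounded) from the diet table
n.i    <- c(17, 10, 10, 9)
xbar.i <- c(170.4, 205.6, 258.9, 233.9)
s.i    <- c(55.44, 70.25, 65.24, 37.57)

k <- length(n.i)   # number of groups
n <- sum(n.i)      # total observations

# grand mean as the weighted average of the group means
xbar <- sum(n.i * xbar.i) / n

# mean squares and variance ratio
msg <- sum(n.i * (xbar.i - xbar)^2) / (k - 1)
mse <- sum((n.i - 1) * s.i^2) / (n - k)
fstat <- msg / mse

# upper-tail area under the F model with k - 1 and n - k degrees of freedom
pval <- pf(fstat, k - 1, n - k, lower.tail = FALSE)
c(F = fstat, p = pval)
```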

Test outcome

$F = 5.4636$ means the variation in weight attributable to diets is 5.46 times greater than individual variation among chicks.

The $p$-value for the test is 0.0029:

if there is truly no difference in means, then under 1% of samples (about 3 in 1000) would produce at least as much diet-to-diet variability as we observed
so in this case we reject $H_0$ at the 1% level

ANOVA in R

The aov(...) function fits ANOVA models using a formula/dataframe specification:

# fit anova model
fit <- aov(weight ~ diet, data = chicks)

# generate table
summary(fit)

Analysis of Variance Model
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
diet	3	55881	18627	5.464	0.002909
Residuals	42	143190	3409	NA	NA

Conventional interpretation style closely follows that of previous inferences for one or two means:

The data provide evidence of an effect of diet on mean weight (F = 5.464 on 3 and 42 df, p = 0.0029).
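The numbers in that sentence can be pulled from the fitted object instead of copied by hand. This is a sketch that again assumes `chicks` holds the day-20 weights from the built-in `ChickWeight` data; the variable names are illustrative.

```r
# ASSUMPTION: the chick data are the day-20 weights in the built-in ChickWeight data
chicks <- data.frame(
  weight = ChickWeight$weight[ChickWeight$Time == 20],
  diet   = factor(ChickWeight$Diet[ChickWeight$Time == 20])
)

fit <- aov(weight ~ diet, data = chicks)

# summary() returns a list whose first element is the ANOVA table
tab <- summary(fit)[[1]]

# row 1 is the grouping factor, row 2 is the residual (error) line
f.value <- tab[["F value"]][1]
p.value <- tab[["Pr(>F)"]][1]
df.num  <- tab[["Df"]][1]
df.den  <- tab[["Df"]][2]

# assemble the reporting sentence fragment
sprintf("F = %.3f on %.0f and %.0f df, p = %.4f",
        f.value, df.num, df.den, p.value)
```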

Analysis of variance table

The results of an analysis of variance are traditionally displayed in a table.

Source	degrees of freedom	Sum of squares	Mean square	F statistic	p-value
Group	$k - 1$	SSG	$MSG = \frac{SSG}{k - 1}$	$F = \frac{MSG}{MSE}$	$P(F_{k - 1, n - k} > F)$
Error	$n - k$	SSE	$MSE = \frac{SSE}{n - k}$

the sum of squares terms are ‘raw’ measures of variability from each source
the mean square terms are adjusted for the amount of data available and number of parameters

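Formally, the ANOVA model says

$$SST = SSG + SSE$$

where $SST = \sum_{i}\sum_{j}(x_{ij} - \bar{x})^2$ is the total sum of squares (with $x_{ij}$ the $j$th observation in group $i$), $SSG = \sum_{i} n_i(\bar{x}_i - \bar{x})^2$, and $SSE = \sum_{i}(n_i - 1)s_i^2$.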
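The partition of total variation into group and error sums of squares can be checked numerically. The sketch below assumes `chicks` is the day-20 subset of the built-in `ChickWeight` data; the identity itself holds for any one-way layout.

```r
# ASSUMPTION: the chick data are the day-20 weights in the built-in ChickWeight data
chicks <- data.frame(
  weight = ChickWeight$weight[ChickWeight$Time == 20],
  diet   = factor(ChickWeight$Diet[ChickWeight$Time == 20])
)

tab <- summary(aov(weight ~ diet, data = chicks))[[1]]
ssg <- tab[["Sum Sq"]][1]   # group (diet) sum of squares
sse <- tab[["Sum Sq"]][2]   # error (residual) sum of squares

# total sum of squares: squared deviations from the grand mean
sst <- sum((chicks$weight - mean(chicks$weight))^2)

# the two sides should agree up to floating-point error
all.equal(sst, ssg + sse)
```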

Constructing the table “by hand”

Source	DF	Sum sq.	Mean sq.	F statistic	p-value
Group	$k - 1$	SSG	SSG/$(k - 1)$	MSG/MSE	$P(F_{k - 1, n - k} > F)$
Error	$n - k$	SSE	SSE/$(n - k)$

Here is what the fitted model looks like in R:

# fitted anova model
aov(weight ~ diet, data = chicks)

Call:
   aov(formula = weight ~ diet, data = chicks)

Terms:
                     diet Residuals
Sum of Squares   55881.02 143190.31
Deg. of Freedom         3        42

Residual standard error: 58.38915
Estimated effects may be unbalanced

Source	DF	Sum Sq.	Mean Sq.	F statistic
Diet	3	55881.02	55881.02/3 = 18627.01	18627.01/3409.29 = 5.464
Error	42	143190.31	143190.31/42 = 3409.29
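The table entries follow from the sums of squares and degrees of freedom printed in the R output. A minimal sketch of that arithmetic, with the four inputs typed in from the output above (object names are illustrative):

```r
# sums of squares and degrees of freedom copied from the fitted-model output
ss <- c(diet = 55881.02, error = 143190.31)
df <- c(diet = 3, error = 42)

# mean squares: each sum of squares divided by its degrees of freedom
ms <- ss / df

# variance ratio and upper-tail p-value
fstat <- ms[["diet"]] / ms[["error"]]
pval  <- pf(fstat, df[["diet"]], df[["error"]], lower.tail = FALSE)

# assemble the ANOVA table
anova.table <- data.frame(
  Source = c("Diet", "Error"),
  DF     = df,
  SumSq  = ss,
  MeanSq = ms,
  F      = c(fstat, NA),
  p      = c(pval, NA),
  row.names = NULL
)
anova.table

# square root of the error mean square
resid.se <- sqrt(ms[["error"]])
resid.se
```

The last value should match the "Residual standard error" line in the R output, since that quantity is the square root of the error mean square.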

Rules of thumb for level tests:

almost always rejects
usually rejects for (need )
usually rejects for (need )
rarely rejects unless

Measuring effect size

A common measure of effect size is $\eta^2 = \frac{SSG}{SST}$ where $SST = SSG + SSE$. This is the proportion of total variation attributed to the grouping factor.

possible values: $0 \leq \eta^2 \leq 1$
$\eta^2$ near 1: large effect
$\eta^2$ near 0: small effect

library(effectsize)
fit <- aov(weight ~ diet, data = chicks)
eta_squared(fit, partial = FALSE)

# Effect Size for ANOVA (Type I)

Parameter | Eta2 |       95% CI
-------------------------------
diet      | 0.28 | [0.08, 1.00]

- One-sided CIs: upper bound fixed at [1.00].

An estimated 28% of variation in weight is attributable to diet.

The confidence interval is for the population analogue of $\eta^2$.
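The point estimate can also be computed directly from the sums of squares in the ANOVA table, without the `effectsize` package. A short sketch using the values printed earlier; the second expression is an equivalent form in terms of $F$ and the degrees of freedom.

```r
# sums of squares from the ANOVA table for the chick data
ssg <- 55881.02
sse <- 143190.31

# proportion of total variation attributed to diet
eta2 <- ssg / (ssg + sse)
eta2

# equivalent form using the F statistic and its degrees of freedom
fstat <- (ssg / 3) / (sse / 42)
eta2.from.f <- (fstat * 3) / (fstat * 3 + 42)
eta2.from.f
```

Both expressions give about 0.28, in agreement with the `eta_squared` output.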

Powering for effect size

How many chicks should we measure to detect an average group difference of 20g?

Group variation on the order of 20g amounts to about $SSG \approx 14400$ (approximate, since $n_i$ varies), which is an effect size of $\eta^2 = \frac{14400}{14400 + 143190} \approx 0.0914$, i.e., group variation is around 9.14% of total variation.

To power the study to detect $\eta^2 \approx 0.1$, use the approximation (assumes equal group sizes):

power.anova.test(groups = 4,
                 sig.level = 0.05,
                 power = 0.8,
                 between.var = 0.1, # eta^2
                 within.var = 0.9) # 1 - eta^2


     Balanced one-way analysis of variance power calculation 

         groups = 4
              n = 33.70068
    between.var = 0.1
     within.var = 0.9
      sig.level = 0.05
          power = 0.8

NOTE: n is number in each group

We’d need 34 chicks per group.
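As a sanity check, the calculation can be run in the other direction: fix the group size at the rounded-up value and ask for the resulting power. This sketch reuses the same `power.anova.test` inputs as above; the object names are illustrative.

```r
# solve for the per-group sample size, then round up to whole chicks
res <- power.anova.test(groups = 4,
                        sig.level = 0.05,
                        power = 0.8,
                        between.var = 0.1,  # eta^2
                        within.var = 0.9)   # 1 - eta^2
n.per.group <- ceiling(res$n)
n.per.group

# power achieved with that many chicks per group
check <- power.anova.test(groups = 4,
                          n = n.per.group,
                          sig.level = 0.05,
                          between.var = 0.1,
                          within.var = 0.9)
check$power
```

Rounding up rather than to the nearest integer keeps the achieved power at or above the 0.8 target.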

diet	mean	sd	n
NP	27.4	6.134	49
N/N85	32.69	5.125	57
N/R50	42.3	7.768	71
N/R40	45.12	6.703	60

treat	post - pre	sd	n
ctrl	-0.45	7.989	26
cbt	3.007	7.309	29
ft	7.265	7.157	17
