Hypothesis tests and intervals for comparing two population means
Practice problem: test whether actual body weight exceeds desired body weight.
subject | actual | desired | difference |
---|---|---|---|
1 | 265 | 225 | 40 |
2 | 150 | 150 | 0 |
3 | 137 | 150 | -13 |
4 | 159 | 125 | 34 |
5 | 145 | 125 | 20 |
One Sample t-test
data: weight.diffs
t = 4.2172, df = 59, p-value = 4.311e-05
alternative hypothesis: true mean is greater than 0
95 percent confidence interval:
10.99824 Inf
sample estimates:
mean of x
18.21667
The data provide very strong evidence that the average U.S. adult’s actual weight exceeds their desired weight (T = 4.2172 on 59 degrees of freedom, p < 0.0001).
Inference is on the mean difference: \(H_0: \delta = 0\) vs. \(H_A: \delta > 0\).
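As a minimal sketch of what "inference on the mean difference" means in R, the block below uses only the five subjects displayed in the table, so its numbers will not match the full-sample output above (which has 59 degrees of freedom). The vector names `actual`, `desired`, and `diffs` are illustrative choices, not names from the course materials.

```r
# The five subjects displayed in the table (the full sample has more rows).
actual  <- c(265, 150, 137, 159, 145)
desired <- c(225, 150, 150, 125, 125)

# Reduce the two measurements per subject to a single difference.
diffs <- actual - desired

# One-sample t-test on the differences: H0: delta = 0 vs. HA: delta > 0.
res_diff <- t.test(diffs, mu = 0, alternative = 'greater')

# Equivalent paired call on the two original columns.
res_pair <- t.test(actual, desired, paired = TRUE, alternative = 'greater')

# Both calls give the same statistic and degrees of freedom.
c(one_sample = unname(res_diff$statistic), paired = unname(res_pair$statistic))
```

The two calls agree because a paired test is, by construction, a one-sample test applied to the within-subject differences.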
Can we also do inference on a difference in means?
Peter and Rosemary Grant caught and measured birds from more than 20 generations of finches on Daphne Major.
severe drought in 1977 limited food to large tough seeds
selection pressure favoring larger and stronger beaks
hypothesis: beak depth increased in 1978 relative to 1976
year | depth |
---|---|
1976 | 10.8 |
1976 | 7.4 |
1978 | 11.4 |
1978 | 10.6 |
To answer this, we need to test a hypothesis involving two means:
\[ \begin{cases} H_0: &\mu_{1976} = \mu_{1978} \\ H_A: &\mu_{1976} < \mu_{1978} \end{cases} \]
If \(x_1, \dots, x_{58}\) are the 1976 observations and \(y_1, \dots, y_{65}\) are the 1978 observations:
Inference uses a new \(T\) statistic:
\[ T = \frac{\bar{x} - \bar{y} - \delta_0}{SE(\bar{x} - \bar{y})} \]
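For reference, a sketch of the standard error in the denominator, assuming the unpooled (Welch) form that corresponds to the Welch output used throughout these slides. Here \(s_x, s_y\) are the sample standard deviations and \(n_x, n_y\) the sample sizes, the same notation as in the pooled formula given later:

\[ SE(\bar{x} - \bar{y}) = \sqrt{\frac{s_x^2}{n_x} + \frac{s_y^2}{n_y}} \]

Under \(H_0: \mu_{1976} = \mu_{1978}\) the hypothesized difference is \(\delta_0 = 0\), so \(T\) reduces to the observed difference in sample means divided by this standard error.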
The two-sample test is appropriate whenever two one-sample tests would be.
In other words, the test assumes that both samples are either:
To check, simply inspect each histogram.
Could also check side-by-side boxplots for:
This is also a nice visualization of differences between samples.
Welch Two Sample t-test
data: depth by year
t = -4.5727, df = 111.79, p-value = 6.255e-06
alternative hypothesis: true difference in means between group 1976 and group 1978 is less than 0
95 percent confidence interval:
-Inf -0.4698812
sample estimates:
mean in group 1976 mean in group 1978
9.453448 10.190769
The data provide very strong evidence that mean beak depth increased following the drought (T = -4.5727 on 111.79 degrees of freedom, p < 0.0001). With 95% confidence, the mean increase is estimated to be at least 0.4699 mm, with a point estimate of 0.7373 (SE 0.1612).
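The call that produces output of this kind can be sketched as follows. The finch measurements themselves are not reproduced here, so the block runs on simulated stand-in data: the data frame name `finch_sim`, the seed, and the simulation means and standard deviation are all assumptions made for illustration, and the printed numbers will differ from the output above.

```r
# Simulated stand-in for the finch data (NOT the Grants' measurements).
# Group sizes follow the 58 and 65 observations mentioned earlier;
# the means and sd used to simulate are arbitrary illustrative values.
set.seed(218)
finch_sim <- data.frame(
  year  = factor(rep(c(1976, 1978), times = c(58, 65))),
  depth = c(rnorm(58, mean = 9.5, sd = 0.9),
            rnorm(65, mean = 10.2, sd = 0.9))
)

# Formula interface: depth depends on year. mu is the hypothesized
# difference in means; 'less' tests [1976 mean] < [1978 mean].
res <- t.test(depth ~ year, data = finch_sim, mu = 0, alternative = 'less')
res

# The T statistic by hand: estimated difference over its standard error.
x <- finch_sim$depth[finch_sim$year == "1976"]
y <- finch_sim$depth[finch_sim$year == "1978"]
se <- sqrt(var(x) / length(x) + var(y) / length(y))
t_by_hand <- (mean(x) - mean(y) - 0) / se
```

Computing `t_by_hand` alongside the call makes the link to the \(T\) formula explicit: `t.test()` reports the same ratio.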
Highly similar, but notice:
the inputs are now a formula depth ~ year (“depth depends on year”) and the data frame finch
mu now indicates the hypothesized difference in means
Does seeding clouds with silver iodide increase mean rainfall?
Data are rainfall measurements in a target area from 26 days when clouds were seeded and 26 days when clouds were not seeded.
rainfall gives volume of rainfall in acre-feet
treatment indicates whether clouds were seeded
Hypotheses to test: \[ \begin{cases} H_0: &\mu_\text{seeded} = \mu_\text{unseeded} \\ H_A: &\mu_\text{seeded} > \mu_\text{unseeded} \end{cases} \]
rainfall | treatment |
---|---|
334.1 | seeded |
489.1 | seeded |
200.7 | seeded |
40.6 | seeded |
21.7 | unseeded |
17.3 | unseeded |
68.5 | unseeded |
830.1 | unseeded |
Welch Two Sample t-test
data: rainfall by treatment
t = 1.9982, df = 33.855, p-value = 0.9731
alternative hypothesis: true difference in means between group seeded and group unseeded is less than 0
95 percent confidence interval:
-Inf 512.1582
sample estimates:
mean in group seeded mean in group unseeded
441.9846 164.5885
Welch Two Sample t-test
data: rainfall by treatment
t = 1.9982, df = 33.855, p-value = 0.02689
alternative hypothesis: true difference in means between group seeded and group unseeded is greater than 0
95 percent confidence interval:
42.63408 Inf
sample estimates:
mean in group seeded mean in group unseeded
441.9846 164.5885
You can tell which group R considers first based on which estimate is printed first.
'greater' is interpreted as [FIRST GROUP] > [SECOND GROUP]
'less' is interpreted as [FIRST GROUP] < [SECOND GROUP]
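The group-ordering rule can be checked directly. This is a minimal sketch using only the eight rows displayed in the table earlier (four seeded, four unseeded), so its p-values will not match the full-data output. The data frame name `cloud_rows` is an illustrative choice.

```r
# The eight rows displayed in the table (the full data have 26 days per group).
cloud_rows <- data.frame(
  rainfall  = c(334.1, 489.1, 200.7, 40.6, 21.7, 17.3, 68.5, 830.1),
  treatment = factor(rep(c('seeded', 'unseeded'), each = 4))
)

# Factor levels are alphabetical by default; the first level is the first group.
levels(cloud_rows$treatment)

# The two one-sided alternatives are complementary: their p-values sum to 1.
p_greater <- t.test(rainfall ~ treatment, data = cloud_rows,
                    alternative = 'greater')$p.value
p_less    <- t.test(rainfall ~ treatment, data = cloud_rows,
                    alternative = 'less')$p.value
p_greater + p_less

# Reversing the group order flips the meaning of 'less' and 'greater'.
flipped <- transform(cloud_rows, treatment = relevel(treatment, ref = 'unseeded'))
p_flipped <- t.test(rainfall ~ treatment, data = flipped,
                    alternative = 'less')$p.value
```

After releveling, `'less'` means unseeded < seeded, which is the same hypothesis as `'greater'` under the original ordering, so `p_flipped` equals `p_greater`.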
Welch Two Sample t-test
data: rainfall by treatment
t = 1.9982, df = 33.855, p-value = 0.02689
alternative hypothesis: true difference in means between group seeded and group unseeded is greater than 0
95 percent confidence interval:
42.63408 Inf
sample estimates:
mean in group seeded mean in group unseeded
441.9846 164.5885
The data provide moderate evidence that cloud seeding increases mean rainfall (T = 1.9982 on 33.855 degrees of freedom, p = 0.02689). With 95% confidence, seeding is estimated to increase mean rainfall by at least 42.63 acre-feet, with a point estimate of 277.4 (SE 138.8199).
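The figures in this interpretation can be recovered from the printed output. The sketch below takes the group means, the degrees of freedom, and the reported standard error as given, and recomputes the statistic, the one-sided p-value, and the lower confidence bound. Small discrepancies from rounding in the printed values are expected.

```r
# Values copied from the output and interpretation above.
est <- 441.9846 - 164.5885   # point estimate: seeded mean minus unseeded mean
se  <- 138.8199              # reported standard error of the difference
df  <- 33.855                # Welch degrees of freedom

# Test statistic: estimate divided by its standard error.
t_stat <- est / se

# One-sided p-value for HA: seeded mean > unseeded mean.
p_value <- pt(t_stat, df, lower.tail = FALSE)

# One-sided 95% lower confidence bound for the difference.
lower <- est - qt(0.95, df) * se

round(c(estimate = est, t = t_stat, p = p_value, lower = lower), 4)
```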
Does mean body temperature differ between men and women?
Test \(H_0: \mu_F = \mu_M\) against \(H_A: \mu_F \neq \mu_M\)
Welch Two Sample t-test
data: body.temp by sex
t = 1.7118, df = 34.329, p-value = 0.09595
alternative hypothesis: true difference in means between group female and group male is not equal to 0
95 percent confidence interval:
-0.09204497 1.07783444
sample estimates:
mean in group female mean in group male
98.65789 98.16500
Suggestive but insufficient evidence that mean body temperature differs by sex.
Notice: estimated difference (F - M) is 0.493 °F (SE 0.2879)
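As a quick check of these figures against the output above, the estimate is the difference of the two group means, and the test statistic is that estimate divided by its standard error:

\[ \bar{x}_F - \bar{x}_M = 98.65789 - 98.16500 = 0.49289, \qquad T = \frac{0.49289}{0.2879} \approx 1.712 \]

The estimate is less than two standard errors from zero, which is why the 95% interval \((-0.092, 1.078)\) includes 0.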
Here are estimates from two larger samples of 65 individuals each (compared with 19, 20):
sex | mean.temp | se | n |
---|---|---|---|
female | 98.39 | 0.09222 | 65 |
male | 98.1 | 0.08667 | 65 |
Welch Two Sample t-test
data: body.temp by sex
t = 2.2854, df = 127.51, p-value = 0.02394
alternative hypothesis: true difference in means between group female and group male is not equal to 0
95 percent confidence interval:
0.03881298 0.53964856
sample estimates:
mean in group female mean in group male
98.39385 98.10462
The data provide moderate evidence that mean body temperature differs by sex (T = 2.29 on 127.51 degrees of freedom, p = 0.02394).
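The same test can be reconstructed from the summary table alone. This is a sketch under two assumptions: that the `se` column is each group's standard error \(s/\sqrt{n}\), and that the degrees of freedom use the standard Welch-Satterthwaite approximation (which these slides do not spell out). The group means are taken from the output above because it prints more digits than the table.

```r
# Summary statistics from the table and output above.
mean_f <- 98.39385; se_f <- 0.09222; n_f <- 65
mean_m <- 98.10462; se_m <- 0.08667; n_m <- 65

# Estimated difference (F - M) and its unpooled standard error.
est     <- mean_f - mean_m
se_diff <- sqrt(se_f^2 + se_m^2)

# Test statistic and Welch-Satterthwaite degrees of freedom.
t_stat <- est / se_diff
df     <- se_diff^4 / (se_f^4 / (n_f - 1) + se_m^4 / (n_m - 1))

# Two-sided p-value and 95% confidence interval.
p_value <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)
ci      <- est + c(-1, 1) * qt(0.975, df) * se_diff

round(c(t = t_stat, df = df, p = p_value, lower = ci[1], upper = ci[2]), 4)
```

The estimated difference here is smaller than in the first sample. The evidence is stronger because the larger samples shrink the standard error.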
How much data do you need to collect in order to detect a difference of \(\delta\)?
The statistical power of a test captures how often it detects a specified alternative.
measures how often the test correctly rejects (proportion of samples)
value depends on…
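# n is left unspecified, so power.t.test() solves for the per-group sample size
power.t.test(power = 0.95,               # desired power
             delta = 0.5,                # difference to detect (sd defaults to 1)
             sig.level = 0.05,           # significance level of the test
             type = 'two.sample',
             alternative = 'two.sided')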
Two-sample t test power calculation
n = 104.928
delta = 0.5
sd = 1
sig.level = 0.05
power = 0.95
alternative = two.sided
NOTE: n is number in *each* group
\(\Rightarrow\) need 105 observations in each group to detect a difference of 0.5 standard deviations for 95% of samples with a 5% significance level test
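To see how the required sample size responds to the size of the difference, the calculation above can be repeated over several values of `delta`. This is a minimal sketch; the particular grid of differences is an arbitrary choice for illustration.

```r
# Differences to detect, in standard deviation units (sd = 1).
deltas <- c(0.25, 0.5, 1)

# Per-group sample size needed for 95% power at the 5% level.
n_needed <- sapply(deltas, function(d) {
  power.t.test(power = 0.95, delta = d, sd = 1, sig.level = 0.05,
               type = 'two.sample', alternative = 'two.sided')$n
})
data.frame(delta = deltas, n_per_group = ceiling(n_needed))

# Going the other way: fix n and solve for the power achieved.
power_at_105 <- power.t.test(n = 105, delta = 0.5, sd = 1, sig.level = 0.05,
                             type = 'two.sample',
                             alternative = 'two.sided')$power
power_at_105
```

Leaving exactly one of `n`, `delta`, and `power` unspecified tells `power.t.test()` which quantity to solve for. Smaller differences require more observations per group.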
If you collect enough data, you can detect an arbitrarily small difference in means almost always.
So keep in mind:
If it is reasonable to assume the (population) standard deviations are the same in each group, one can gain a bit of power by using a different standard error:
\[SE_\text{pooled}(\bar{x} - \bar{y}) = \sqrt{\frac{\color{red}{s_p^2}}{n_x} + \frac{\color{red}{s_p^2}}{n_y}} \quad\text{where}\quad \color{red}{s_p^2} = \underbrace{\frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2}}_{\text{weighted average of } s_x^2 \;\&\; s_y^2}\]
Implement by adding var.equal = T as an argument to t.test().
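A minimal sketch of the pooled calculation, comparing the formula above with what `t.test()` reports. The two vectors are hypothetical values invented for illustration, not data from the course.

```r
# Hypothetical measurements for two groups (illustration only).
x <- c(10.1, 9.4, 11.2, 10.8, 9.9, 10.5)
y <- c(9.0, 9.7, 8.8, 10.2, 9.5)
n_x <- length(x)
n_y <- length(y)

# Pooled variance: weighted average of the two sample variances.
s_p2 <- ((n_x - 1) * var(x) + (n_y - 1) * var(y)) / (n_x + n_y - 2)

# Pooled standard error and the resulting test statistic.
se_pooled <- sqrt(s_p2 / n_x + s_p2 / n_y)
t_by_hand <- (mean(x) - mean(y)) / se_pooled

# The same test via t.test(); TRUE is the safer spelling of T.
res_pooled <- t.test(x, y, var.equal = TRUE)
c(by_hand = t_by_hand, t.test = unname(res_pooled$statistic))

# Degrees of freedom are n_x + n_y - 2 rather than the Welch approximation.
res_pooled$parameter
```

Writing `TRUE` instead of `T` avoids surprises if a variable named `T` has been defined earlier in the session.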
STAT218