Studies have provided evidence that the hippocampus is smaller, on average, in patients with schizophrenia. The hippocampus dataset contains data on volumes of the left hippocampus in cubic centimeters for pairs of monozygotic twins; one twin in each pair was affected by schizophrenia and the other was not. The difference variable shows the difference (unaffected - affected) in hippocampal volume between twins in each pair.
[L3] Construct a histogram of the differences in hippocampal volume. Are there any major outliers?
[L5] Test whether hippocampal volume differs between unaffected and affected twins at the 1% significance level. Report the result in context following conventional style.
[L5] Test whether hippocampal volume is greater among unaffected individuals compared with their affected twins at the 1% significance level. Report the result in context following conventional style.
[L5] Construct a lower 99% confidence bound for the mean difference in hippocampal volume (unaffected - affected). Interpret the result in context following conventional style.
Solution
# load data
load('data/hippocampus.RData')

# histogram of differences
hist(hippocampus$difference)

# test whether hippocampal volume differs
t.test(hippocampus$difference, alternative = 'two.sided', conf.level = 0.99)
One Sample t-test
data: hippocampus$difference
t = 3.2289, df = 14, p-value = 0.006062
alternative hypothesis: true mean is not equal to 0
99 percent confidence interval:
0.01551009 0.38182325
sample estimates:
mean of x
0.1986667
# test whether hippocampal volume is greater among unaffected
t.test(hippocampus$difference, alternative = 'greater', conf.level = 0.99)
One Sample t-test
data: hippocampus$difference
t = 3.2289, df = 14, p-value = 0.003031
alternative hypothesis: true mean is greater than 0
99 percent confidence interval:
0.03718909 Inf
sample estimates:
mean of x
0.1986667
The histogram is shown above. There are no major outliers.
The data provide evidence that mean hippocampal volume differs between affected and unaffected twins (T = 3.2289 on 14 df, p = 0.006062).
The data provide evidence that mean hippocampal volume is greater among unaffected individuals compared with their affected twins (T = 3.2289 on 14 df, p = 0.003031).
With 99% confidence, mean hippocampal volume is at least 0.037 cubic centimeters greater among unaffected individuals compared with their affected twins.
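The p-values and the lower bound can be cross-checked by hand from the printed summary statistics. The following is a minimal sketch that assumes only the mean, t statistic, and degrees of freedom shown in the t.test output above; the variable names are illustrative and not part of the original solution.

```r
# values copied from the t.test output above
xbar  <- 0.1986667  # mean difference in volume (cubic cm)
tstat <- 3.2289     # reported t statistic
df    <- 14         # degrees of freedom (15 twin pairs - 1)

# standard error implied by t = xbar / se
se <- xbar / tstat

# two-sided and one-sided ("greater") p-values
p_two <- 2 * pt(tstat, df, lower.tail = FALSE)
p_one <- pt(tstat, df, lower.tail = FALSE)

# lower 99% confidence bound: estimate minus one-sided critical value times SE
lower <- xbar - qt(0.99, df) * se

c(se = se, p.two.sided = p_two, p.greater = p_one, lower.bound = lower)
```

The one-sided p-value is half the two-sided one because the observed statistic falls in the direction of the alternative.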
The table and figures below show summary statistics and distributions of egg clutch sizes for frogs at two sites in the Tibetan Plateau. Data are from Chen, W., et al., Maternal investment increases with altitude in a frog on the Tibetan Plateau. Journal of Evolutionary Biology 26-12 (2013).
[L4] Compute a point estimate and standard error for the difference in mean clutch size between the two sites. Report these quantities in context following conventional style.
[L4] Construct an approximate 99% confidence interval for the difference in mean clutch size.
[L5] Compute the test statistic you would use to test whether mean clutch size differs by site.
[L5] What would the conclusion of the test be at the 1% level?
site   altitude   csize.mean   csize.sd   n
063    3,098      625.67       197.01     23
077    2,035      733.44       202.72     37
Solution
# point estimate and std error
est <- 733.44 - 625.67
se <- sqrt(197.01^2/23 + 202.72^2/37)
c(estimate = est, stderr = se)
estimate stderr
107.77000 52.89807
# 99% interval
est + c(-1, 1)*3*se
[1] -50.9242 266.4642
# test statistic
est/se
[1] 2.037314
The difference in mean clutch size between site 77 and site 63 is estimated to be 107.8 eggs (SE = 52.9).
With 99% confidence, the difference in mean clutch size between site 77 and site 63 is estimated to be between -50.9 and 266.5 eggs.
The T statistic for a test of whether the means differ is 2.037.
At the 1% level, there is no evidence of a difference in mean clutch size between frog populations at the two sites. This is because the 99% interval includes zero (so no difference is plausible at the corresponding confidence level) or, equivalently, because |T| = 2.037 < 3, the multiplier used to construct the interval.
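The two ways of reaching the test decision can be compared directly. This sketch recomputes the quantities from the summary table and checks both criteria; it assumes the same multiplier of 3 used for the approximate 99% interval in the solution, and the variable names are illustrative.

```r
# summary statistics from the table (difference taken as site 077 minus site 063)
mean_077 <- 733.44; sd_077 <- 202.72; n_077 <- 37
mean_063 <- 625.67; sd_063 <- 197.01; n_063 <- 23

est   <- mean_077 - mean_063
se    <- sqrt(sd_077^2 / n_077 + sd_063^2 / n_063)
tstat <- est / se

# multiplier used for the approximate 99% interval in the solution
crit <- 3
ci   <- est + c(-1, 1) * crit * se

# criterion 1: reject if the statistic exceeds the multiplier in absolute value
reject <- abs(tstat) > crit
# criterion 2: reject if the interval excludes zero
excludes_zero <- ci[1] > 0 | ci[2] < 0

c(reject = reject, excludes.zero = excludes_zero)
```

Both criteria agree by construction, since the interval excludes zero exactly when the estimate is more than `crit` standard errors from zero.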
The lizards data contains top running speeds in meters per second (m/s) from two species of lizard: western fence and sagebrush.
[L3] Construct side-by-side boxplots and comment on whether there appears to be a difference in mean top running speed between species. If so, which species appears to run faster?
[L5] Test for a difference in mean top running speed between species at the 5% significance level. Report the test result following conventional style.
[L4] Compute a point estimate for the difference in means and a standard error. Report the estimate following conventional style.
[L4] Construct a confidence interval at a level consistent with your test and interpret the interval in context following conventional style.
[L5] How many lizards of each species would you need to measure to detect a difference in mean top speed of 0.5 m/s using the test you performed above 85% of the time? Use the larger standard deviation of the two species in the lizards data to perform the calculation.
Solution
# test for a difference in mean top speed
tt.out <- t.test(top.speed ~ species, data = lizards)
tt.out
Welch Two Sample t-test
data: top.speed by species
t = -5.2217, df = 32.57, p-value = 9.939e-06
alternative hypothesis: true difference in means between group western.fence and group sagebrush is not equal to 0
95 percent confidence interval:
-0.9754526 -0.4282537
sample estimates:
mean in group western.fence mean in group sagebrush
1.612692 2.314545
# point estimate and standard error
c(diff = diff(tt.out$estimate), se = tt.out$stderr)
diff.mean in group sagebrush se
0.7018531 0.1344115
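# sample size needed for 85% power
power.t.test(delta = 0.5, sd = 0.555, sig.level = 0.05, power = 0.85)
Two-sample t test power calculation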
n = 23.12677
delta = 0.5
sd = 0.555
sig.level = 0.05
power = 0.85
alternative = two.sided
NOTE: n is number in *each* group
The plot is shown above. Sagebrush lizards appear to run faster than western fence lizards.
The data provide evidence of a difference in mean top running speeds between western fence and sagebrush lizards (T = -5.2217 on 32.57 df, p < 0.0001).
The mean top running speed of sagebrush lizards is estimated to be 0.702 m/s faster than that of western fence lizards (SE = 0.1344).
With 95% confidence, the mean top running speed of sagebrush lizards is estimated to be between 0.4283 and 0.9755 m/s faster than that of western fence lizards.
You would need 24 lizards per species (n = 23.13, rounded up).
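The reported interval and sample size follow from the printed estimate, standard error, and degrees of freedom. The sketch below reconstructs them from those values alone, so it does not require the lizards data; the variable names are illustrative.

```r
# values copied from the output above (sagebrush minus western fence)
est <- 0.7018531   # difference in mean top speed (m/s)
se  <- 0.1344115   # standard error of the difference
df  <- 32.57       # Welch degrees of freedom

# 95% interval: estimate plus or minus critical value times SE
ci <- est + c(-1, 1) * qt(0.975, df) * se

# sample sizes are rounded up so that power is at least the target
n_per_species <- ceiling(23.12677)

list(ci = round(ci, 4), n.per.species = n_per_species)
```

The interval has the opposite sign from the t.test output because R orders the groups as western fence minus sagebrush.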
Meadowfoam is a small plant found growing in moist meadows of the US Pacific Northwest. Researchers reported the results from one study in a series designed to find out how to elevate meadowfoam production to a profitable crop. In a controlled growth chamber, they focused on the effect of light intensity (μmol/m^2/sec) on flowering by recording the average number of flowers per plant in experimental plots when exposed to one of six light intensity levels. The resulting data is stored in the meadow dataset.
[L3] Construct side-by-side boxplots of the distributions of flowers per plant by intensity level. Does it appear that mean flowering varies depending on light intensity? If so, describe the apparent relationship.
[L9] Do the data seem to satisfy the assumptions for ANOVA? Why or why not?
[L9] Test for an effect of intensity on mean flowering at the 1% significance level. Report the result of the omnibus test in context following conventional style.
[L9] Estimate the effect size. Provide a 99% confidence interval and interpret the interval in context following conventional style.
[L9] Estimate the proportion of variation in flowering not attributable to light intensity.
[L9] Suppose you were redesigning the study with only the 300, 600, and 900 levels of light intensity. How many experimental plots would you need in total to detect an effect of η² = 0.25 using a 1% level omnibus test 80% of the time?
Solution
# fit anova model
fit <- aov(flowers ~ intensity, data = meadow)
summary(fit)
Df Sum Sq Mean Sq F value Pr(>F)
intensity 5 2684 536.7 5.839 0.00224 **
Residuals 18 1654 91.9
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# estimate effect size (eta_squared() is provided by the effectsize package)
library(effectsize)
eta_squared(fit, partial = FALSE, alternative = 'two.sided', ci = 0.99)
# Effect Size for ANOVA (Type I)
Parameter | Eta2 | 99% CI
-------------------------------
intensity | 0.62 | [0.05, 0.80]
# power sample size calculations
power.anova.test(groups = 3,
                 sig.level = 0.01,
                 power = 0.8,
                 between.var = 0.25,
                 within.var = 0.75)
Balanced one-way analysis of variance power calculation
groups = 3
n = 22.3919
between.var = 0.25
within.var = 0.75
sig.level = 0.01
power = 0.8
NOTE: n is number in each group
The boxplots are shown above. It does look like there’s an effect: flowering appears to decrease with light intensity.
Yes: the variation in flowering is similar across groups and there are no major outliers.
The data provide evidence of an effect of light intensity on mean flowering (F = 5.839 on 5 and 18 df, p = 0.00224).
An estimated 62% of variation in flowering is attributable to light intensity. With 99% confidence, an estimated 5%-80% of variation in flowering is attributable to light intensity.
If an estimated 62% of variation in flowering is attributable to light intensity, then the estimated share of variation in flowering not attributable to light intensity is 38%.
You would need 23 plots per group, or 69 in total.
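The effect size estimate and the plot count can be checked by hand from the printed output. This sketch assumes only the rounded sums of squares in the ANOVA table and the per-group n from the power calculation; the variable names are illustrative.

```r
# sums of squares copied from the ANOVA table above (rounded as printed)
ss_between <- 2684   # intensity
ss_within  <- 1654   # residuals

# eta-squared: share of total variation attributable to intensity
eta2 <- ss_between / (ss_between + ss_within)
unexplained <- 1 - eta2

# per-group n from the power calculation, rounded up, times 3 groups
n_per_group <- ceiling(22.3919)
total_plots <- 3 * n_per_group

round(c(eta2 = eta2, unexplained = unexplained), 2)
total_plots
```

Rounding up before multiplying keeps the design balanced, which the power calculation assumes.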