Homework 2

With solutions

Due

July 3, 2025

Remarks: problem 1 is based on Lab 5: t-tests in R; problems 2 and 3 are based on Lab 6: Two-sample inference; problem 4 is based on Lab 7: Analysis of variance.

  1. Studies have provided evidence that the hippocampus is smaller in schizophrenic patients on average. The hippocampus dataset contains data on volumes of the left hippocampus in cubic centimeters for pairs of monozygotic twins; one twin in each pair was affected by schizophrenia and the other was not. The difference variable shows the difference (unaffected - affected) in hippocampal volume between twins in each pair.

    1. [L3] Construct a histogram of the differences in hippocampal volume. Are there any major outliers?
    2. [L5] Test whether hippocampal volume differs between unaffected and affected twins at the 1% significance level. Report the result in context following conventional style.
    3. [L5] Test whether hippocampal volume is greater among unaffected individuals compared with their affected twins at the 1% significance level. Report the result in context following conventional style.
    4. [L5] Construct a lower 99% confidence bound for the mean difference in hippocampal volume (unaffected - affected). Interpret the result in context following conventional style.
# load data
load('data/hippocampus.RData')

# histogram of differences
hist(hippocampus$difference)

# test whether hippocampal volume differs
t.test(hippocampus$difference, alternative = 'two.sided', conf.level = 0.99)

    One Sample t-test

data:  hippocampus$difference
t = 3.2289, df = 14, p-value = 0.006062
alternative hypothesis: true mean is not equal to 0
99 percent confidence interval:
 0.01551009 0.38182325
sample estimates:
mean of x 
0.1986667 
# test whether hippocampal volume is greater among unaffected
t.test(hippocampus$difference, alternative = 'greater', conf.level = 0.99)

    One Sample t-test

data:  hippocampus$difference
t = 3.2289, df = 14, p-value = 0.003031
alternative hypothesis: true mean is greater than 0
99 percent confidence interval:
 0.03718909        Inf
sample estimates:
mean of x 
0.1986667 
  1. The histogram is shown above. There are no major outliers.
  2. The data provide evidence that mean hippocampal volume differs between affected and unaffected twins (T = 3.2289 on 14 df, p = 0.006062).
  3. The data provide evidence that mean hippocampal volume is greater among unaffected individuals compared with their affected twins (T = 3.2289 on 14 df, p = 0.003031).
  4. With 99% confidence, mean hippocampal volume is at least 0.037 cm^3 greater among unaffected individuals compared with their affected twins.
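The p-values and confidence bound reported above can be reproduced from the summary values in the printed output. The following is a minimal sketch, not part of the original solution: it assumes only the printed mean, t statistic, and degrees of freedom, and backs out the standard error as mean/t. The variable names are illustrative.

```r
# Summary values copied from the t.test output above
xbar  <- 0.1986667   # mean difference (unaffected - affected), in cm^3
tstat <- 3.2289      # reported t statistic
df    <- 14          # degrees of freedom (15 twin pairs)

# Standard error backed out from t = xbar / se (an assumption of this sketch)
se <- xbar / tstat

# Two-sided and one-sided ("greater") p-values
p_two <- 2 * pt(abs(tstat), df, lower.tail = FALSE)
p_one <- pt(tstat, df, lower.tail = FALSE)

# Two-sided 99% interval and lower 99% confidence bound
ci_99    <- xbar + c(-1, 1) * qt(0.995, df) * se
lower_99 <- xbar - qt(0.99, df) * se

round(c(p_two = p_two, p_one = p_one, lower_99 = lower_99), 4)
```

The one-sided p-value is half the two-sided one because the observed mean falls on the side named by the alternative. The lower bound uses the 0.99 quantile rather than the 0.995 quantile, which is why it is higher than the lower endpoint of the two-sided interval.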
  2. The table and figures below show summary statistics and distributions of egg clutch sizes for frogs at two sites in the Tibetan Plateau. Data are from Chen, W., et al., Maternal investment increases with altitude in a frog on the Tibetan Plateau. Journal of Evolutionary Biology 26-12 (2013).

    1. [L4] Compute a point estimate and standard error for the difference in mean clutch size between the two sites. Report these quantities in context following conventional style.
    2. [L4] Construct an approximate 99% confidence interval for the difference in mean clutch size.
    3. [L5] Compute the test statistic you would use to test whether mean clutch size differs by site.
    4. [L5] What would the conclusion of the test be at the 1% level?

site  altitude  csize.mean  csize.sd   n
063      3,098      625.67    197.01  23
077      2,035      733.44    202.72  37
# point estimate and std error
est <- 733.44 - 625.67
se <- sqrt(197.01^2/23 + 202.72^2/37)
c(estimate = est, stderr = se)
 estimate    stderr 
107.77000  52.89807 
# 99% interval
est + c(-1, 1)*3*se
[1] -50.9242 266.4642
# test statistic
est/se
[1] 2.037314
  1. The difference in mean clutch size between site 77 and site 63 is estimated to be 107.8 eggs (SE = 52.9).
  2. With 99% confidence, the difference in mean clutch size between site 77 and site 63 is estimated to be between -50.9 and 266.5 eggs.
  3. The T statistic for a test of whether the means differ is 2.037.
  4. At the 1% level, the data do not provide evidence of a difference in mean clutch size between frog populations at the two sites. This is because the approximate 99% interval includes zero (so no difference is plausible at the corresponding confidence level), or, equivalently, because |T| = 2.037 < 3, where 3 is the approximate critical value used to construct the interval.
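The solution above uses 3 as a rough critical value for the approximate 99% interval. As a point of comparison, the sketch below computes the Welch-Satterthwaite degrees of freedom from the summary table and uses the corresponding t quantile instead. This is an alternative calculation, not the method used in the solution, and the variable names are illustrative.

```r
# Summary statistics from the table above
m1 <- 625.67; s1 <- 197.01; n1 <- 23   # site 063
m2 <- 733.44; s2 <- 202.72; n2 <- 37   # site 077

# Point estimate and standard error (same as in the solution)
est <- m2 - m1
v1  <- s1^2 / n1
v2  <- s2^2 / n2
se  <- sqrt(v1 + v2)

# Welch-Satterthwaite approximate degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# 99% interval using the t quantile instead of the rough value 3
crit <- qt(0.995, df_welch)
ci   <- est + c(-1, 1) * crit * se

# Two-sided p-value for the test statistic
tstat <- est / se
pval  <- 2 * pt(abs(tstat), df_welch, lower.tail = FALSE)

c(df = df_welch, crit = crit, lower = ci[1], upper = ci[2], p = pval)
```

The t-based critical value is smaller than 3, so the interval is somewhat narrower. It still contains zero and the p-value exceeds 0.01, so the conclusion at the 1% level is unchanged.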
  3. The lizards dataset contains top running speeds in meters per second (m/s) from two species of lizard: western fence and sagebrush.

    1. [L3] Construct side-by-side boxplots and comment on whether there appears to be a difference in mean top running speed between species. If so, which species appears to run faster?
    2. [L5] Test for a difference in mean top running speed between species at the 5% significance level. Report the test result following conventional style.
    3. [L4] Compute a point estimate for the difference in means and a standard error. Report the estimate following conventional style.
    4. [L4] Construct a confidence interval at a level consistent with your test and interpret the interval in context following conventional style.
    5. [L5] How many lizards of each species would you need to measure to detect a difference in mean top speed of 0.5 m/s 85% of the time using the test you performed above? Use the larger standard deviation of the two species in the lizards data to perform the calculation.
# load data
load('data/lizards.RData')

# side-by-side boxplots
boxplot(top.speed ~ species, data = lizards, horizontal = TRUE)

# test for a difference in mean top speed
tt.out <- t.test(top.speed ~ species, data = lizards)
tt.out

    Welch Two Sample t-test

data:  top.speed by species
t = -5.2217, df = 32.57, p-value = 9.939e-06
alternative hypothesis: true difference in means between group western.fence and group sagebrush is not equal to 0
95 percent confidence interval:
 -0.9754526 -0.4282537
sample estimates:
mean in group western.fence     mean in group sagebrush 
                   1.612692                    2.314545 
# point estimate and standard error
c(diff = diff(tt.out$estimate), se = tt.out$stderr)
diff.mean in group sagebrush                           se 
                   0.7018531                    0.1344115 
# confidence interval
tt.out$conf.int
[1] -0.9754526 -0.4282537
attr(,"conf.level")
[1] 0.95
# sample size power calculation
power.t.test(delta = 0.5,
             sd = 0.555,
             type = 'two.sample',
             alternative = 'two.sided',
             sig.level = 0.05,
             power = 0.85)

     Two-sample t test power calculation 

              n = 23.12677
          delta = 0.5
             sd = 0.555
      sig.level = 0.05
          power = 0.85
    alternative = two.sided

NOTE: n is number in *each* group
  1. The plot is shown above. Sagebrush lizards appear to run faster than western fence lizards.
  2. The data provide evidence of a difference in mean top running speeds between western fence and sagebrush lizards (T = -5.2217 on 32.57 df, p < 0.0001).
  3. The mean top running speed of sagebrush lizards is estimated to be 0.702 m/s faster than that of western fence lizards (SE = 0.134).
  4. With 95% confidence, the mean top running speed of sagebrush lizards is estimated to be between 0.4283 and 0.9755 m/s faster than that of western fence lizards.
  5. You would need to measure 24 lizards of each species (n = 23.13, rounded up).
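t.test reports the difference as western fence minus sagebrush, so the interval printed above is negative, while the written answers are stated as sagebrush minus western fence. The sketch below, which is not part of the original solution, rebuilds the interval from the printed estimate, standard error, and degrees of freedom with the sign flipped, and then checks the sample-size rounding by evaluating power at 23 and 24 lizards per species. Variable names are illustrative.

```r
# Values copied from the output above
est <- 0.7018531   # sagebrush mean minus western fence mean, in m/s
se  <- 0.1344115   # standard error of the difference
df  <- 32.57       # Welch degrees of freedom

# 95% interval for (sagebrush - western fence)
ci <- est + c(-1, 1) * qt(0.975, df) * se
round(ci, 4)

# Power of the two-sided 5% test at 23 and 24 lizards per species,
# using the same delta and sd as the sample-size calculation above
pow <- sapply(c(23, 24), function(n) {
  power.t.test(n = n, delta = 0.5, sd = 0.555,
               type = 'two.sample', alternative = 'two.sided',
               sig.level = 0.05)$power
})
names(pow) <- c('n = 23', 'n = 24')
round(pow, 3)
```

Power falls just short of 0.85 at 23 per species and exceeds it at 24, which is why the computed n of 23.13 is rounded up rather than to the nearest integer.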
  4. Meadowfoam is a small plant found growing in moist meadows of the US Pacific Northwest. Researchers reported the results from one study in a series designed to find out how to elevate meadowfoam production to a profitable crop. In a controlled growth chamber, they focused on the effect of light intensity (μmol/m^2/sec) on flowering by recording the average number of flowers per plant in experimental plots when exposed to one of six light intensity levels. The resulting data is stored in the meadow dataset.

    1. [L3] Construct side-by-side boxplots of the distributions of flowers per plant by intensity level. Does it appear that mean flowering varies depending on light intensity? If so, describe the apparent relationship.
    2. [L9] Do the data seem to satisfy the assumptions for ANOVA? Why or why not?
    3. [L9] Test for an effect of intensity on mean flowering at the 1% significance level. Report the result of the omnibus test in context following conventional style.
    4. [L9] Estimate the effect size. Provide a 99% confidence interval and interpret the interval in context following conventional style.
    5. [L9] Estimate the proportion of variation in flowering not attributable to light intensity.
    6. [L9] Suppose you were redesigning the study with only the 300, 600, and 900 levels of light intensity. How many experimental plots would you need in total to detect an effect of η^2 = 0.25 using a 1% level omnibus test 80% of the time?
library(effectsize)

# load data
load('data/meadow.RData')

# side-by-side boxplots
boxplot(flowers ~ intensity, data = meadow, xlab = 'light intensity')

# fit anova model
fit <- aov(flowers ~ intensity, data = meadow)
summary(fit)
            Df Sum Sq Mean Sq F value  Pr(>F)   
intensity    5   2684   536.7   5.839 0.00224 **
Residuals   18   1654    91.9                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# estimate effect size
eta_squared(fit, partial = FALSE, alternative = 'two.sided', ci = 0.99)
# Effect Size for ANOVA (Type I)

Parameter | Eta2 |       99% CI
-------------------------------
intensity | 0.62 | [0.05, 0.80]
# power sample size calculations
power.anova.test(groups = 3,
                 sig.level = 0.01,
                 power = 0.8,
                 between.var = 0.25,
                 within.var = 0.75)

     Balanced one-way analysis of variance power calculation 

         groups = 3
              n = 22.3919
    between.var = 0.25
     within.var = 0.75
      sig.level = 0.01
          power = 0.8

NOTE: n is number in each group
  1. The boxplots are shown above. Mean flowering does appear to vary with light intensity: it decreases as intensity increases.
  2. Yes: the variation in flowering is similar across groups and there are no major outliers.
  3. The data provide evidence of an effect of light intensity on mean flowering (F = 5.839 on 5 and 18 df, p = 0.00224).
  4. An estimated 62% of variation in flowering is attributable to light intensity. With 99% confidence, an estimated 5%-80% of variation in flowering is attributable to light intensity.
  5. If an estimated 62% of variation in flowering is attributable to light intensity, then the estimated share of variation in flowering not attributable to light intensity is 38%.
  6. You would need 23 plots per group, or 69 in total.
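The effect size, its complement, and the total number of plots follow directly from numbers printed above. A minimal sketch, using the rounded sums of squares from the ANOVA table (so the results agree with the output only up to rounding); the variable names are illustrative and not from the original solution.

```r
# Sums of squares and degrees of freedom from the ANOVA table above
ss_between <- 2684; df_between <- 5
ss_within  <- 1654; df_within  <- 18

# Eta-squared: share of total variation attributable to light intensity
eta2 <- ss_between / (ss_between + ss_within)

# F statistic and p-value recomputed from the rounded table entries
fstat <- (ss_between / df_between) / (ss_within / df_within)
pval  <- pf(fstat, df_between, df_within, lower.tail = FALSE)

# Total plots for the redesigned three-group study:
# round the per-group n up, then multiply by the number of groups
n_per_group <- ceiling(22.3919)
n_total     <- 3 * n_per_group

round(c(eta2 = eta2, unexplained = 1 - eta2, F = fstat, p = pval), 4)
n_total
```

The per-group n is rounded up before multiplying, since rounding the total instead could leave one group short of the required size.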