Test 1 practice

Course

STAT218

Updated

April 17, 2025

NHANES data

We have used the NHANES data for several examples so far in class. The top few rows are shown below.

load('data/nhanes.RData')
head(nhanes)
  subj.id gender age poverty pulse bpsys1 bpdia1 totchol sleephrsnight
1       1   male  34    1.36    70    114     88    3.49             4
2       2   male  34    1.36    70    114     88    3.49             4
3       3   male  34    1.36    70    114     88    3.49             4
4       5 female  49    1.91    86    118     82    6.70             8
5       8 female  45    5.00    62    106     62    5.82             8
6       9 female  45    5.00    62    106     62    5.82             8

Here are some example questions representative of what you might see on the test.

  1. [L2] Is this observational or experimental data?
  2. [L1] Which variables are numeric and which are categorical?
  3. [L1] Classify each numeric variable as discrete or continuous.
  4. [L1] How many observations and variables are in the dataset?
  5. [L3] What proportion of respondents are male?
  6. [L3] Make histograms of systolic and diastolic blood pressure (bpsys1 and bpdia1, respectively). Describe the distributions.
  7. [L4] Construct 99% confidence intervals for mean systolic and diastolic blood pressure and interpret the intervals in context.
  8. [L4] A person is considered to have hypertension if their systolic pressure is over 130 OR their diastolic pressure is over 80. Do the intervals suggest that the average adult has hypertension?
  9. [L3] (challenge problem) How many individuals in the dataset have hypertension?
  10. [L5] Test whether mean resting pulse is 72 bpm at the 5% level. Interpret the result of the test in context.
# proportion of male respondents
table(nhanes$gender) |> proportions()

   female      male 
0.4995282 0.5004718 
# histograms of systolic and diastolic blood pressure
hist(nhanes$bpsys1)

hist(nhanes$bpdia1)

# intervals for systolic and diastolic blood pressure (quick way)
t.test(nhanes$bpsys1, conf.level = 0.99)

    One Sample t-test

data:  nhanes$bpsys1
t = 402.24, df = 3178, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
99 percent confidence interval:
 119.9750 121.5224
sample estimates:
mean of x 
 120.7487 
t.test(nhanes$bpdia1, conf.level = 0.99)

    One Sample t-test

data:  nhanes$bpdia1
t = 314.42, df = 3178, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
99 percent confidence interval:
 69.01999 70.16088
sample estimates:
mean of x 
 69.59044 
# intervals for systolic and diastolic blood pressure (long way)
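bps.mean <- mean(nhanes$bpsys1)          # point estimate
bps.se <- sd(nhanes$bpsys1)/sqrt(3179)   # standard error, n = 3179
cval <- qt(0.995, df = 3178)             # critical value for 99% confidence, df = n - 1
bps.mean + c(-1, 1)*cval*bps.se          # estimate +/- margin of error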
[1] 119.9750 121.5224
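bpd.mean <- mean(nhanes$bpdia1)          # point estimate
bpd.se <- sd(nhanes$bpdia1)/sqrt(3179)   # standard error, n = 3179
cval <- qt(0.995, df = 3178)             # same critical value as for systolic pressure
bpd.mean + c(-1, 1)*cval*bpd.se          # estimate +/- margin of error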
[1] 69.01999 70.16088
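The quick and long ways above should always agree. The sketch below checks this on simulated data, since the NHANES file is not bundled with this example; `mean_ci` is a hypothetical helper name and the simulated values are placeholders, not course data.

```r
# Simulated stand-in for a numeric column (assumption: NOT the NHANES data)
set.seed(218)
x <- rnorm(200, mean = 120, sd = 17)

# Hypothetical helper: t-based confidence interval for a population mean
mean_ci <- function(x, conf.level = 0.95) {
  x <- x[!is.na(x)]                                 # drop missing values
  n <- length(x)
  se <- sd(x) / sqrt(n)                             # standard error of the mean
  cval <- qt(1 - (1 - conf.level) / 2, df = n - 1)  # critical value
  mean(x) + c(-1, 1) * cval * se                    # estimate +/- margin of error
}

ci_long  <- mean_ci(x, conf.level = 0.99)
ci_quick <- as.numeric(t.test(x, conf.level = 0.99)$conf.int)

# TRUE when the two approaches give the same interval
all.equal(ci_long, ci_quick)
```

Computing `n` with `length()` instead of typing 3179 keeps the calculation correct if the data change.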
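# how many individuals have hypertension (challenge)
sys.over130 <- (nhanes$bpsys1 > 130)   # TRUE if systolic pressure is over 130
dia.over80 <- (nhanes$bpdia1 > 80)     # TRUE if diastolic pressure is over 80
# the sum counts how many criteria each person meets: 0 = neither, 1 = exactly one, 2 = both
table(sys.over130 + dia.over80)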

   0    1    2 
2169  770  240 
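Anyone meeting at least one criterion has hypertension, so the count is 770 + 240 = 1010. A more direct route uses the logical OR operator `|`. The sketch below applies it to the six rows printed by `head(nhanes)`, typed in by hand so the example runs without the data file.

```r
# Blood pressure values copied from the six rows shown by head(nhanes)
bpsys1 <- c(114, 114, 114, 118, 106, 106)
bpdia1 <- c(88, 88, 88, 82, 62, 62)

sys.over130 <- bpsys1 > 130
dia.over80  <- bpdia1 > 80

# TRUE when either criterion is met
hypertension <- sys.over130 | dia.over80

# Summing a logical vector counts the TRUE values
n_hyper <- sum(hypertension)

# Equivalent count using the sum-of-indicators approach
n_hyper_alt <- sum((sys.over130 + dia.over80) >= 1)

n_hyper
mean(hypertension)   # proportion meeting the definition
```

On the full dataset the same two lines, `sum(sys.over130 | dia.over80)` and its `mean()`, give the count and proportion directly.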
# test whether resting pulse is 72 bpm
t.test(nhanes$pulse, mu = 72, conf.level = 0.95)

    One Sample t-test

data:  nhanes$pulse
t = 2.8235, df = 3178, p-value = 0.00478
alternative hypothesis: true mean is not equal to 72
95 percent confidence interval:
 72.18071 73.00205
sample estimates:
mean of x 
 72.59138 
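To see where the reported t statistic and p-value come from, the test can be reproduced by hand. The sketch below does so on simulated pulse values (an assumption, since the NHANES file is not bundled with this example) and then recomputes the two-sided p-value from the t and df reported above.

```r
# Simulated stand-in for resting pulse (assumption: NOT the NHANES data)
set.seed(72)
pulse <- round(rnorm(150, mean = 73, sd = 12))

mu0 <- 72                    # hypothesized mean
n <- length(pulse)

# t statistic: (estimate - hypothesized value) / standard error
t_stat <- (mean(pulse) - mu0) / (sd(pulse) / sqrt(n))

# two-sided p-value: area in both tails beyond |t|
p_val <- 2 * pt(-abs(t_stat), df = n - 1)

# compare with the built-in test
fit <- t.test(pulse, mu = mu0)
all.equal(t_stat, unname(fit$statistic))
all.equal(p_val, fit$p.value)

# p-value recomputed from the t and df in the NHANES output above
p_reported <- 2 * pt(-2.8235, df = 3178)
p_reported
```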
  1. Observational; this is survey data with no intervention.
  2. All variables are numeric except for gender.
  3. All numeric variables are discrete (integer-valued) except for poverty and totchol, which take non-integer values.
  4. There are 3179 observations of 9 variables (8 if the subject identifier subj.id is not counted).
  5. 50.05% of respondents were male.
  6. Histograms are above; the distribution of systolic pressure is right-skewed, and the distribution of diastolic pressure is symmetric. Both are unimodal.
  7. With 99% confidence, mean systolic pressure is estimated to be between 119.98 and 121.52 mmHg. With 99% confidence, mean diastolic pressure is estimated to be between 69.02 and 70.16 mmHg.
  8. No: the upper limits of both intervals (121.52 and 70.16 mmHg) fall well below the cutoffs of 130 and 80 mmHg, so neither mean is in the elevated range.
  9. 1010 individuals, or approximately one third of respondents, have hypertension.
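  10. At the 5% level, the data provide evidence that mean resting pulse differs from 72 bpm (T = 2.82 on 3178 degrees of freedom, p = 0.00478). The sample mean is 72.59 bpm, with a 95% confidence interval of 72.18 to 73.00 bpm.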
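Answers 2 through 4 can also be checked in R rather than by eye. The sketch below rebuilds the six rows printed by `head(nhanes)` by hand so it runs without the data file. `nhanes_head` is a hypothetical name, and storing gender as a factor is an assumption about how the real data are coded.

```r
# Six rows copied from the head(nhanes) output
nhanes_head <- data.frame(
  subj.id = c(1, 2, 3, 5, 8, 9),
  gender = factor(c("male", "male", "male", "female", "female", "female")),
  age = c(34, 34, 34, 49, 45, 45),
  poverty = c(1.36, 1.36, 1.36, 1.91, 5.00, 5.00),
  pulse = c(70, 70, 70, 86, 62, 62),
  bpsys1 = c(114, 114, 114, 118, 106, 106),
  bpdia1 = c(88, 88, 88, 82, 62, 62),
  totchol = c(3.49, 3.49, 3.49, 6.70, 5.82, 5.82),
  sleephrsnight = c(4, 4, 4, 8, 8, 8)
)

# number of rows and columns (ncol counts every column, including subj.id)
dim(nhanes_head)

# which columns are numeric, and which are not
is_num <- sapply(nhanes_head, is.numeric)
names(nhanes_head)[!is_num]

# among numeric columns, which contain only whole numbers
is_whole <- sapply(nhanes_head[is_num], function(x) all(x == round(x)))
names(is_whole)[!is_whole]   # columns with non-integer values
```

On the full data, replace `nhanes_head` with `nhanes`; `str(nhanes)` gives a similar overview in one call.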

Egg clutches

Data from Chen, W., et al., Maternal investment increases with altitude in a frog on the Tibetan Plateau. Journal of Evolutionary Biology 26(12) (2013) include measurements of egg clutches from several frog populations at breeding ponds (sites) in the eastern Tibetan Plateau. The first few rows are shown below.

load('data/frog.RData')
head(frog)
  site altitude clutch.size clutch.volume egg.size body.size
1  040    3,462    181.9701      177.8279 1.949845  3.630781
2  040    3,462    269.1535      257.0396 1.949845  3.630781
3  040    3,462    158.4893      151.3561 1.949845  3.715352
4  040    3,462    234.4229      223.8721 1.949845  3.801894
5  040    3,462    245.4709      234.4229 1.949845  3.890451
6  040    3,462    301.9952      288.4032 1.949845  3.890451

One of the problems on the test uses this dataset. Use this space to explore the data and carry out possible summaries and analyses. You might practice one or more of the following:

  • descriptive statistics
  • graphical summaries (histograms, barplots)
  • point and interval estimates for a population mean
  • t-tests for a population mean
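As a starting point, the sketch below works with the six rows printed by `head(frog)`, typed in by hand so it runs without the data file. The comma in the altitude values suggests that column may be stored as text; treating it that way is an assumption, and `frog_head` and `altitude.num` are hypothetical names.

```r
# Six rows copied from the head(frog) output
frog_head <- data.frame(
  site = rep("040", 6),
  altitude = rep("3,462", 6),   # assumption: stored as text because of the comma
  clutch.size = c(181.9701, 269.1535, 158.4893, 234.4229, 245.4709, 301.9952),
  clutch.volume = c(177.8279, 257.0396, 151.3561, 223.8721, 234.4229, 288.4032),
  egg.size = rep(1.949845, 6),
  body.size = c(3.630781, 3.630781, 3.715352, 3.801894, 3.890451, 3.890451),
  stringsAsFactors = FALSE
)

# strip the thousands separator before converting to a number
frog_head$altitude.num <- as.numeric(gsub(",", "", frog_head$altitude))

# descriptive statistics for clutch size
summary(frog_head$clutch.size)
sd(frog_head$clutch.size)

# point and interval estimate for mean clutch size
clutch_mean <- mean(frog_head$clutch.size)
clutch_ci <- as.numeric(t.test(frog_head$clutch.size, conf.level = 0.95)$conf.int)

# graphical summary
hist(frog_head$clutch.size, main = "Clutch size", xlab = "clutch.size")
```

Six rows from a single site are far too few for real conclusions; with the full data, swap `frog_head` for `frog`.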

Carry out any scratch work in this cell:

And take any notes here.