Final study guide

The information below provides an overview of the final exam: what it covers, how to prepare, the format, and practice exercises.

Scope

The final is comprehensive in scope – any material covered in lecture, labs, or assignments could appear. The material for the course broadly falls under two umbrellas:

  1. statistical inference from continuous data

    • one- and two-sample t-tests and confidence intervals
    • analysis of variance
    • nonparametric alternatives to t-tests and ANOVA
    • simple linear regression
  2. statistical inference from categorical data (weeks 8-10)

    • exact and approximate tests and intervals for proportions
    • χ² tests
    • relative risk and odds ratios

All of the methods above are instances of parameter estimation, hypothesis testing, and confidence interval construction. You are therefore expected to be fluent in the following general concepts and in how each one appears in the methods listed above:

  • population/model parameters
  • sample statistics
  • point estimates
  • standard errors and sampling variability
  • sampling distributions
  • interval coverage and construction
  • statistical hypotheses/alternatives
  • test statistics
  • p-values
  • type I and type II errors
  • statistical power

These ideas are very general and provide a core conceptual framework for statistical methodology that extends well beyond the scope of this class.
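
Several of these concepts (type I error, power, interval coverage) can be checked by simulation. The sketch below is only an illustration under assumed values: the sample size, hypothesized mean, standard deviation, and alternative mean are hypothetical choices, not course data, and `reject_rate` is a helper name made up for this example.

```r
# Simulation sketch; all numeric settings here are assumed for illustration
set.seed(2024)
n_sims <- 2000   # number of simulated samples
n      <- 39     # sample size per simulated study
mu0    <- 98.6   # hypothesized mean under H0
sigma  <- 0.9    # assumed population SD

# proportion of simulated samples in which a two-sided 5% t-test rejects H0
reject_rate <- function(true_mean) {
  pvals <- replicate(n_sims, t.test(rnorm(n, true_mean, sigma), mu = mu0)$p.value)
  mean(pvals < 0.05)
}

type1    <- reject_rate(98.6)   # H0 true: rejection rate estimates the type I error rate
power    <- reject_rate(98.2)   # H0 false: rejection rate estimates power at this alternative
coverage <- 1 - type1           # the 95% interval covers mu0 exactly when the test does not reject
```

Under these settings the estimated type I error rate should land near the nominal 0.05 and the coverage near 0.95, with some simulation noise.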

Preparation

I recommend preparing a set of review notes based on the above, your existing notes, lecture slides, assigned reading, and assignments and labs.

One straightforward strategy would be to list the “core concepts” above for each of the main methods we discussed and identify one example from a past assignment. For instance, see the example method summary below.

Example method summary

The one-sample t-test.

Method

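  • Population/model parameters: population mean $\mu$
  • Sample statistics: sample mean and SD $\bar{x}, S_x$
  • Point estimate: $\hat{\mu} = \bar{x}$
  • Standard error: $SE(\bar{x}) = \frac{S_x}{\sqrt{n}}$
  • Sampling distribution: $t_{n-1}$ model for $\frac{\bar{x} - \mu}{SE(\bar{x})}$
  • Interval: $\bar{x} \pm c \times SE(\bar{x})$, where $c$ is a quantile from the $t_{n-1}$ model
  • Statistical hypothesis and alternatives: $H_0: \mu = \mu_0$ and $H_A: \mu \neq \mu_0$, $\mu > \mu_0$, or $\mu < \mu_0$
  • Test statistic: $T = \frac{\bar{x} - \mu_0}{SE(\bar{x})}$
  • p-value: from the $t_{n-1}$ model, the proportion of samples with a test statistic exceeding the observed value in the direction of the alternative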
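
To connect the summary above to software output, each quantity can be computed by hand and compared with `t.test`. This is a minimal sketch on simulated data: the sample `x` and the hypothesized value `mu0` are made up for illustration and do not come from the course.

```r
# Hypothetical data: 25 simulated measurements (not course data)
set.seed(1)
x   <- rnorm(25, mean = 10, sd = 2)
mu0 <- 9.5                                  # hypothesized mean, assumed for illustration

n     <- length(x)
xbar  <- mean(x)                            # point estimate of mu
se    <- sd(x) / sqrt(n)                    # standard error of the sample mean
tstat <- (xbar - mu0) / se                  # test statistic
crit  <- qt(0.975, df = n - 1)              # quantile c for a 95% interval
ci    <- xbar + c(-1, 1) * crit * se        # interval: estimate +/- c * SE
pval  <- 2 * pt(-abs(tstat), df = n - 1)    # two-sided p-value from the t model

# the built-in test should agree with the hand calculations
fit <- t.test(x, mu = mu0, alternative = 'two.sided')
```

Comparing `tstat`, `pval`, and `ci` with `fit$statistic`, `fit$p.value`, and `fit$conf.int` is a quick way to check that you know where each number in the printed output comes from.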

Example application

Body temperature data. Is mean body temperature actually 98.6 degrees Fahrenheit?

$$\begin{cases} H_0: \mu = 98.6 \\ H_A: \mu \neq 98.6 \end{cases}$$

# load data
load('data/temps.RData')
btemp <- temps$body.temp

# two-sided test
t.test(btemp, mu = 98.6, alternative = 'two.sided')

    One Sample t-test

data:  btemp
t = -1.3283, df = 38, p-value = 0.192
alternative hypothesis: true mean is not equal to 98.6
95 percent confidence interval:
 98.10813 98.70213
sample estimates:
mean of x 
 98.40513 

The data provide no evidence that mean body temperature differs from 98.6 degrees Fahrenheit (T = -1.3283 on 38 df, p = 0.192). With 95% confidence, mean body temperature is estimated to be between 98.11 and 98.70 degrees Fahrenheit.
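
As practice for the kind of follow-up calculation you may be asked to do from printed results, the sketch below recovers the standard error, p-value, and interval using only the numbers shown in the output above. The variable names are arbitrary, and because the inputs are rounded, the results should match the printed values only up to rounding.

```r
# values copied from the printed t.test output
xbar  <- 98.40513   # sample mean
tstat <- -1.3283    # test statistic
df    <- 38         # degrees of freedom (n - 1)
mu0   <- 98.6       # hypothesized mean

se   <- (xbar - mu0) / tstat                    # rearrange T = (xbar - mu0) / SE
pval <- 2 * pt(-abs(tstat), df)                 # two-sided p-value
ci   <- xbar + c(-1, 1) * qt(0.975, df) * se    # 95% confidence interval

round(se, 4)
round(pval, 3)
round(ci, 2)
```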

You may also wish to repeat a few example problems from topics that you struggled with (or found easy, if you want a confidence boost).

Format

The test comprises a series of short data analyses in which quantitative results are provided for you. Each analysis includes several prompts that require you to interpret results in context or perform simple follow-up calculations.

During the exam you may consult any of your course notes and are allowed the use of a calculator. However, you are not permitted access to digital materials. My personal recommendation is that you prepare a few sheets of concise notes and bring or print relevant class notes.