Test 1 study guide

Course: STAT218
Updated: January 21, 2025

The information below provides an overview of the first test: what it covers, how to prepare, the format and due date, and relevant datasets.

Scope

The first test covers the following topics:

  • data semantics
  • descriptive statistics
  • point and interval estimation for a population mean

The relevant learning outcomes on which you’ll be assessed are:

  • [L2] distinguish between observational studies and experiments and understand the limitations (practical and consequential) of each
  • [L3] summarize data using graphical and numerical techniques
  • [L4] construct and interpret confidence intervals for means

In terms of class material, this covers all lectures in weeks 1 and 2 and textbook sections 1.1-1.5, 3.3.1-3.3.3, and 4.1-4.2.

Preparation

I recommend reviewing the following materials to prepare:

  1. Lecture notes and slides
  2. Textbook sections
  3. Homework assignments
  4. Lab activities

You may also wish to try a few example problems from the relevant textbook sections.

Format

The first test will be a take-home assessment that you may complete at any time on Thursday, January 23. Its format will be similar to that of your homework assignments: you will have

  • a Posit Cloud project to complete
  • a Microsoft Forms quiz to submit

The test is open-book and open-note, but you must complete it individually. No discussion or collaboration with other students in the class is allowed; failure to adhere to this policy will result in loss of credit.

To complete the test on time, you must submit the Microsoft Forms quiz by 11:59 pm on Thursday, January 23. In accordance with class policy, there is a one-hour grace period on this deadline; any other extension must be arranged in advance.

Datasets

The test will involve conducting simple analyses of real datasets. These datasets are provided below in advance to help you prepare. I recommend that you do the following:

  1. familiarize yourself with the ‘semantics’ of each dataset, including variables and their interpretations, study units, and dimensions;
  2. consider what analyses you might be asked to perform;
  3. conduct some exploratory analyses.
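As a starting point for steps 1 and 3, here is a minimal sketch of a few base R functions for checking dimensions, variable types, and simple summaries. So that it runs without the course files, it builds a hypothetical stand-in, gss_preview, by hand from three columns of the six GSS rows shown in the preview below. With the real data you would apply the same functions to gss.

```r
# Hypothetical stand-in for the full data: three columns of the six
# rows shown in the GSS preview, typed in by hand
gss_preview <- data.frame(
  age = c(36, 34, 24, 42, 31, 32),
  sex = c('male', 'female', 'male', 'male', 'male', 'female'),
  weekly.hours.worked = c(50, 31, 40, 40, 40, 53)
)

# dimensions: rows are study units (respondents), columns are variables
dim(gss_preview)

# variable names and storage types
str(gss_preview)

# numerical summary of a numeric variable
summary(gss_preview$weekly.hours.worked)

# frequency table of a categorical variable
table(gss_preview$sex)
```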

Each dataset is provided along with a short description below.

GSS data

The General Social Survey (GSS) is a public opinion survey of U.S. households that has been ongoing for over 50 years. The gss dataset is a small subset of data from a recent year.

# load and preview data
gss <- read.csv('data/gss.csv')
head(gss)
  age    sex college.degree political.party household.size weekly.hours.worked         class
1  36   male         degree             ind              3                  50  middle class
2  34 female      no degree             rep              4                  31 working class
3  24   male         degree             ind              1                  40 working class
4  42   male      no degree             ind              4                  40 working class
5  31   male         degree             rep              2                  40  middle class
6  32 female      no degree             rep              4                  53  middle class
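Learning outcome L4 asks you to construct confidence intervals for means. The sketch below is an illustration, not a required workflow. It computes a 95% t interval for mean weekly hours worked in two ways: by hand from mean ± t* × s/√n, and with base R's t.test(). It uses only the six values in the preview above, so the interval is very wide. With the full data you would use gss$weekly.hours.worked instead of the hypothetical vector hours.

```r
# Hypothetical stand-in: weekly.hours.worked for the six previewed rows
hours <- c(50, 31, 40, 40, 40, 53)

n <- length(hours)
xbar <- mean(hours)
se <- sd(hours) / sqrt(n)          # estimated standard error of the mean
t_crit <- qt(0.975, df = n - 1)    # critical value for 95% confidence

# 95% confidence interval computed by hand
ci_manual <- c(xbar - t_crit * se, xbar + t_crit * se)
ci_manual

# the same interval reported by t.test()
ci_ttest <- t.test(hours, conf.level = 0.95)$conf.int
ci_ttest
```

Both approaches should give the same endpoints. Computing the interval by hand first makes it easier to interpret each piece (point estimate, standard error, critical value).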

Egg clutches

Data from Chen, W., et al. (2013), "Maternal investment increases with altitude in a frog on the Tibetan Plateau," Journal of Evolutionary Biology 26(12), include measurements of egg clutches from several frog populations at breeding ponds (sites) in the eastern Tibetan Plateau.

# load and preview data (load() restores the saved frog object)
load('data/frog.RData')
head(frog)
  site altitude clutch.size clutch.volume egg.size body.size
1  040    3,462    181.9701      177.8279 1.949845  3.630781
2  040    3,462    269.1535      257.0396 1.949845  3.630781
3  040    3,462    158.4893      151.3561 1.949845  3.715352
4  040    3,462    234.4229      223.8721 1.949845  3.801894
5  040    3,462    245.4709      234.4229 1.949845  3.890451
6  040    3,462    301.9952      288.4032 1.949845  3.890451
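In the preview, altitude is printed as 3,462, with a thousands separator. This suggests it may be stored as text or as a factor rather than as a number; you can check with class(frog$altitude). If so, convert it before computing numerical summaries. Note that calling as.numeric() directly on a factor returns the internal level codes, not the values. Here is a minimal sketch, assuming altitude is a factor, using a hypothetical stand-in frog_preview:

```r
# Hypothetical stand-in: altitude stored as a factor, as it may be in frog
frog_preview <- data.frame(
  altitude = factor(c('3,462', '3,462'))
)

# pitfall: as.numeric() on a factor returns level codes (here 1, 1)
as.numeric(frog_preview$altitude)

# convert via character, dropping the comma separator first
frog_preview$altitude.num <- as.numeric(gsub(',', '', as.character(frog_preview$altitude)))
frog_preview$altitude.num
```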

YRBSS

The objective of the CDC’s Youth Risk Behavior Surveillance System (YRBSS) is to track adolescent behaviors that can negatively affect physical and mental health. The yrbss dataset contains measurements of a few variables from 10,712 survey responses collected between 1991 and 2013.

# load and preview data (load() restores the saved yrbss object)
load('data/yrbss.RData')
head(yrbss)
  age    sex grade sleep.hours exercise.days
1  14 female     9           8             4
2  14 female     9           6             2
3  15 female     9          <5             7
4  15 female     9           6             0
5  15 female     9           9             2
6  15 female     9           8             1
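The preview shows that sleep.hours includes the value <5 alongside whole numbers. It is therefore best treated as a categorical (ordinal) variable and summarized with counts and proportions rather than a mean, while exercise.days is numeric. How sleep.hours is stored in the actual file is not shown here, so check with class(). Here is a minimal sketch using a hypothetical stand-in, yrbss_preview, built from the six previewed rows:

```r
# Hypothetical stand-in: two columns of the six rows shown in the preview
yrbss_preview <- data.frame(
  sleep.hours = c('8', '6', '<5', '6', '9', '8'),
  exercise.days = c(4, 2, 7, 0, 2, 1)
)

# categorical variable: counts and proportions by category
sleep_counts <- table(yrbss_preview$sleep.hours)
sleep_counts
prop.table(sleep_counts)

# numeric variable: mean and standard deviation
mean(yrbss_preview$exercise.days)
sd(yrbss_preview$exercise.days)
```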