The information below provides an overview of the first test: what it covers, how to prepare, the format and due date, and relevant datasets.
Scope
The first test covers the following topics:
data semantics
descriptive statistics
point and interval estimation for a population mean
The relevant learning outcomes on which you’ll be assessed are:
[L2] distinguish between observational studies and experiments and understand the limitations (practical and consequential) of each
[L3] summarize data using graphical and numerical techniques
[L4] construct and interpret confidence intervals for means
In terms of class material, this covers all lectures in weeks 1 and 2 and textbook sections 1.1-1.5, 3.3.1-3.3.3, and 4.1-4.2.
Preparation
I recommend reviewing the following materials to prepare:
Lecture notes and slides
Textbook sections
Homework assignments
Lab activities
You may also wish to try a few example problems from the relevant textbook sections.
Format
The first test will be a take-home assessment that you may complete at any time on Thursday, January 23. Its format will be similar to that of your homework assignments; you will have:
a Posit Cloud project to complete
a Microsoft Forms quiz to submit
The test is open-book and open-note, but you must complete it individually. No discussion or collaboration with other students in the class is allowed; failure to adhere to this policy will result in loss of credit.
To complete the test on time, you must submit the MS Forms quiz by 11:59 pm on Thursday, January 23. In accordance with class policy, there is a one-hour grace period on the submission deadline; any other extension must be arranged in advance.
Datasets
The test will involve conducting simple analyses of real datasets. The datasets you’ll work with are provided to you in advance below to help with your preparation. I recommend that you do the following:
familiarize yourself with the ‘semantics’ of each dataset, including variables and their interpretations, study units, and dimensions;
consider what analyses you might be asked to perform; and
conduct some exploratory analyses.
Each dataset is provided along with a short description below.
GSS data
The General Social Survey (GSS) is a public opinion survey of U.S. households that has been ongoing for over 50 years. The gss dataset is a small subset of data from a recent year.
# load and preview data
gss <- read.csv('data/gss.csv')
head(gss)
  age    sex college.degree political.party household.size weekly.hours.worked         class
1  36   male         degree             ind              3                  50  middle class
2  34 female      no degree             rep              4                  31 working class
3  24   male         degree             ind              1                  40 working class
4  42   male      no degree             ind              4                  40 working class
5  31   male         degree             rep              2                  40  middle class
6  32 female      no degree             rep              4                  53  middle class
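As a starting point for exploratory analysis, the sketch below illustrates the kinds of summaries covered in the test topics: dataset dimensions and variable types, numerical and tabular summaries, and a 95% confidence interval for a mean. It is a minimal sketch that assumes the usual t-based interval, and it is not a preview of the actual test questions. So that it runs without the CSV file, it rebuilds a few columns of the six rows printed by head(gss); the name gss_preview is a hypothetical stand-in, and with the real file you would use the full gss data frame instead.

```r
# Hypothetical stand-in: a few columns of the six rows shown by head(gss).
# With the real file, use gss <- read.csv('data/gss.csv') instead.
gss_preview <- data.frame(
  age = c(36, 34, 24, 42, 31, 32),
  sex = c("male", "female", "male", "male", "male", "female"),
  college.degree = c("degree", "no degree", "degree",
                     "no degree", "degree", "no degree"),
  weekly.hours.worked = c(50, 31, 40, 40, 40, 53)
)

# Data semantics: number of rows (study units) and columns (variables),
# plus the type of each variable
dim(gss_preview)
str(gss_preview)

# Numerical summaries of a quantitative variable
summary(gss_preview$weekly.hours.worked)
sd(gss_preview$weekly.hours.worked)

# Frequency table of a categorical variable
table(gss_preview$college.degree)

# 95% t-interval for the mean: xbar +/- t(0.975, n - 1) * s / sqrt(n)
x <- gss_preview$weekly.hours.worked
n <- length(x)
se <- sd(x) / sqrt(n)
ci <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * se
ci

# t.test() computes the same one-sample interval
t.test(x, conf.level = 0.95)$conf.int
```

Computing the interval by hand makes each ingredient (sample mean, standard error, t critical value) visible, while t.test() gives the same result in one call. With only six observations the interval is wide; on the full dataset it would be much narrower.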
Egg clutches
This dataset, from Chen, W., et al., "Maternal investment increases with altitude in a frog on the Tibetan Plateau," Journal of Evolutionary Biology 26(12) (2013), contains measurements of egg clutches from several frog populations at breeding ponds (sites) in the eastern Tibetan Plateau.
YRBSS data
The objective of the CDC’s Youth Risk Behavior Surveillance System (YRBSS) is to track behaviors with potential negative physical and mental health impacts among adolescents. The yrbss dataset contains measurements on a few variables from 10,712 survey responses collected between 1991 and 2013.