Homework 1

Course

STAT218

Due

January 14, 2025

The census dataset contains a sample of data for 377 individuals included in the 2000 U.S. census. Load and inspect the dataset, and determine:

a. how many variables are in the dataset, not including census year and FIPS code
b. how many categorical variables are in the dataset, not including FIPS code
c. how many individuals are in the dataset
d. the ages of the youngest and oldest individuals in the sample
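
As a rough sketch of what inspecting a dataset can look like, the commands below use R's built-in iris data rather than the census data, so the variable names here are stand-ins and not part of this assignment. The sketch assumes categorical variables are stored as factors; in other datasets they may be stored as character vectors instead.

```r
# Illustration on the built-in iris data, NOT the census data
data(iris)

str(iris)   # variable names and types (num = numeric, Factor = categorical)
dim(iris)   # number of rows (observations) and columns (variables)

n_obs <- nrow(iris)                            # number of observations
n_vars <- ncol(iris)                           # number of variables
n_categorical <- sum(sapply(iris, is.factor))  # number of factor columns

range(iris$Sepal.Length)  # smallest and largest value of one numeric variable
```

The same functions should apply to any data frame once it is loaded; only the dataset and variable names change.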

Then:

e. construct a histogram of total family incomes with an appropriate amount of binning
f. determine an appropriate measure of center and explain your choice
g. determine an appropriate measure of spread; explain your choice and interpret it in context
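
The following is a minimal sketch of the mechanics on simulated values, not the census incomes: the vector income, the seed, and the lognormal parameters are all invented for illustration. It shows how the breaks argument of hist() controls binning and how to compute some common measures of center and spread. Which measures are appropriate for the actual data depends on the shape you see in your own histogram.

```r
set.seed(218)  # arbitrary seed so the simulated values are reproducible

# simulated right-skewed values (hypothetical, not the census data)
income <- rlnorm(300, meanlog = 10.5, sdlog = 0.8)

# breaks = 20 is a suggestion; hist() may pick a nearby "pretty" number of bins
hist(income, breaks = 20,
     main = "Simulated incomes", xlab = "Income (dollars)")

# candidate measures of center
center_mean <- mean(income)
center_median <- median(income)

# candidate measures of spread
spread_sd <- sd(income)
spread_iqr <- IQR(income)

c(mean = center_mean, median = center_median, sd = spread_sd, IQR = spread_iqr)
```

Trying a few values of breaks and comparing the resulting plots is one way to judge whether the binning hides or exaggerates features of the distribution.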

Solution

# load and inspect dataset

a. [your answer here]
b. [your answer here]
c. [your answer here]

# part d: minimum and maximum age

# part e: histogram of family incomes

# part f: measure of center

# part g: measure of spread

Part f explanation: [your answer here]
Part g explanation and interpretation: [your answer here]

The cdc.samp dataset in the oibiostat package contains a sample of data for 60 individuals surveyed by the CDC’s Behavioral Risk Factor Surveillance System (BRFSS). Use the provided commands to load the dataset, then inspect it as usual. Notice that several of the variables are coded as 1’s and 0’s. Use the provided command ?oibiostat::cdc.samp to view the data documentation.
a. What do the values (1’s and 0’s) mean in the exerany variable?
b. What proportion of the sample are men? What proportion are women?
c. For each general health category, find the proportion of respondents who rated themselves in that category.
d. How many of the respondents have health coverage? (Hint: sum(x) will add up the values in a vector x; adding up a collection of 1’s and 0’s is equivalent to counting the number of 1’s.)
e. What percentage of the respondents have health coverage?
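
The hint about sum(x) can be illustrated with a small made-up example. The vectors x and g below are hypothetical and are not taken from cdc.samp; the sketch shows that summing a 0/1 vector counts the 1's, that its mean gives the proportion of 1's, and that prop.table(table(...)) gives proportions for a categorical variable.

```r
# hypothetical 0/1 indicator, invented for illustration (1 = yes, 0 = no)
x <- c(1, 0, 1, 1, 0, 1, 0, 1)

n_yes <- sum(x)             # number of 1's
prop_yes <- mean(x)         # proportion of 1's: sum(x) / length(x)
pct_yes <- 100 * prop_yes   # the same quantity as a percentage

# hypothetical categorical variable, invented for illustration
g <- c("good", "fair", "good", "excellent", "good", "fair", "poor", "good")

counts <- table(g)           # number of values in each category
props <- prop.table(counts)  # proportion of values in each category
props
```

Here n_yes is 5, prop_yes is 0.625, and pct_yes is 62.5, since five of the eight values are 1's.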

Solution

# load data
data('cdc.samp', package = 'oibiostat')

# check documentation
?oibiostat::cdc.samp

# part b: proportions of men and women

# part c: proportions of respondents in each general health category

# part d: number of respondents with health coverage

# part e: percentage of respondents with health coverage