Homework 1

Course

STAT218

Due

January 14, 2025

  1. The census dataset contains a sample of data for 377 individuals included in the 2000 U.S. census. Load and inspect the dataset, and determine:
    a. how many variables are in the dataset, not including census year and FIPS code
    b. how many categorical variables are in the dataset, not including FIPS code
    c. how many individuals are in the dataset
    d. the ages of the youngest and oldest individuals in the sample

Then:

    e. construct a histogram of total family incomes with an appropriate amount of binning
    f. determine an appropriate measure of center and explain your choice
    g. determine an appropriate measure of spread; explain your choice and interpret it in context
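
As a rough illustration of how binning can be adjusted, the sketch below uses simulated right-skewed values rather than the census data. The vector name incomes, the seed, and the choice of 25 breaks are all assumptions made for the example; the number of bins appropriate for the actual data is left for you to judge.

```r
# Simulated right-skewed values standing in for incomes (NOT the census data)
set.seed(218)
incomes <- rexp(200, rate = 1 / 50000)

# default binning (R picks the number of bins automatically)
hist(incomes)

# request roughly 25 bins; a single number passed to 'breaks' is only a
# suggestion, and R rounds to convenient cut points
h <- hist(incomes, breaks = 25,
          main = "Simulated incomes", xlab = "Income (USD)")

# number of bins actually used
length(h$counts)
```

Comparing a few values of breaks side by side is usually the quickest way to see whether the bins are so wide that they hide the shape or so narrow that the plot looks noisy.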
# load and inspect dataset
    a. [your answer here]
    b. [your answer here]
    c. [your answer here]
# part d: minimum and maximum age

# part e: histogram of family incomes

# part f: measure of center

# part g: measure of spread
    f. Explanation: [your answer here]

    g. Explanation and interpretation: [your answer here]

  2. The cdc.samp dataset in the oibiostat package contains a sample of data for 60 individuals surveyed by the CDC’s Behavioral Risk Factor Surveillance System (BRFSS). Use the provided commands to load the dataset, then inspect it in the usual way. Notice that several of the variables are coded as 1’s and 0’s. Use the provided command ?oibiostat::cdc.samp to view the data documentation.

    a. What do the values (1’s and 0’s) mean in the exerany variable?
    b. What proportion of the sample are men? What proportion are women?
    c. For each general health category, find the proportion of respondents who rated themselves in that category.
    d. How many of the respondents have health coverage? (Hint: sum(x) will add up the values in a vector x; adding up a collection of 1’s and 0’s is equivalent to counting the number of 1’s.)
    e. What percentage of the respondents have health coverage?
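
The hint about sum() can be checked on a small made-up vector. The values below are hypothetical and are not taken from cdc.samp; the sketch only shows that summing 1’s and 0’s counts the 1’s, and that dividing by the length gives the corresponding proportion.

```r
# Hypothetical 0/1 indicator vector (made up for illustration)
x <- c(1, 0, 1, 1, 0, 1, 0, 1)

# adding up the 1's and 0's counts the 1's
sum(x)              # 5

# same count, obtained by testing each element
sum(x == 1)         # 5

# dividing by the number of observations gives the proportion of 1's
sum(x) / length(x)  # 0.625
```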
Solution
# load data
data('cdc.samp', package = 'oibiostat')

# check documentation
?oibiostat::cdc.samp

# part b: proportions of men and women

# part c: proportions of respondents in each general health category

# part d: number of respondents with health coverage

# part e: percentage of respondents with health coverage