Homework 1
The census dataset contains a sample of data for 377 individuals included in the 2000 U.S. census. Load and inspect the dataset, and determine:
- (a) how many variables are in the dataset, not including census year and FIPS code
- (b) how many categorical variables are in the dataset, not including FIPS code
- (c) how many individuals are in the dataset
- (d) the ages of the youngest and oldest individuals in the sample
Then:
- (e) construct a histogram of total family incomes with an appropriate number of bins
- (f) determine an appropriate measure of center and explain your choice
- (g) determine an appropriate measure of spread; explain your choice and interpret it in context
Solution
# load and inspect dataset
- [your answer here]
- [your answer here]
- [your answer here]
# part d: minimum and maximum age
# part e: histogram of family incomes
# part f: measure of center
# part g: measure of spread
Explanation: [your answer here]
Explanation and interpretation: [your answer here]
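The inspection and summary tasks in this problem map onto a handful of base R functions. The sketch below is only an illustration and does not use the census data: it runs on R's built-in iris data frame and on a small made-up income vector (hypothetical values, in thousands of dollars). The choice of breaks = 15 is likewise an arbitrary assumption, not a recommended bin count.

```r
# Illustration only: uses the built-in iris data frame, not the census data.
df <- iris

str(df)                      # variable names, types, and first few values
n_rows <- nrow(df)           # number of observations (rows)
n_cols <- ncol(df)           # number of variables (columns)

# Count categorical variables, treating factor and character columns as categorical
is_categorical <- sapply(df, function(col) is.factor(col) || is.character(col))
n_categorical <- sum(is_categorical)

# Smallest and largest value of a numeric variable
sepal_range <- range(df$Sepal.Length)

# Histogram; a single number for `breaks` is only a suggestion,
# since R adjusts it to "pretty" cut points
hist(df$Sepal.Length, breaks = 15,
     main = "Sepal length", xlab = "Sepal length (cm)")

# Hypothetical right-skewed incomes (thousands of dollars) with one large value
income <- c(20, 25, 30, 35, 40, 45, 500)
income_mean   <- mean(income)    # pulled upward by the single large value
income_median <- median(income)  # resistant to the large value
income_sd     <- sd(income)      # also inflated by the large value
income_iqr    <- IQR(income)     # spread of the middle 50% of the values
```

In this toy vector the mean (about 99.3) lands far above the median (35) because of one large value. This behavior is why the median and IQR are often preferred for strongly skewed variables, though the choice for the census incomes should still be justified from the histogram itself.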
The cdc.samp dataset in the oibiostat package contains a sample of data for 60 individuals surveyed by the CDC’s Behavioral Risk Factor Surveillance System (BRFSS). Use the provided commands to load the dataset, and then inspect it in the usual way. Notice that several of the variables are coded as 1’s and 0’s. Use the provided command ?oibiostat::cdc.samp to view the data documentation.
- (a) What do the values (1’s and 0’s) mean in the exerany variable?
- (b) What proportion of the sample are men? What proportion are women?
- (c) For each general health category, find the proportion of respondents who rated themselves in that category.
- (d) How many of the respondents have health coverage? (Hint: sum(x) will add up the values in a vector x; adding up a collection of 1’s and 0’s is equivalent to counting the number of 1’s.)
- (e) What percentage of the respondents have health coverage?
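The hint about sum(x) relies on 0/1 coding: summing a 0/1 vector counts the 1’s, and averaging it gives the proportion of 1’s. The sketch below shows this, together with table() and prop.table() for category proportions. It runs on small made-up vectors; the names coverage and health and all of their values are hypothetical stand-ins, not columns or values from cdc.samp.

```r
# Hypothetical 0/1 indicator: 1 = has coverage, 0 = no coverage
coverage <- c(1, 0, 1, 1, 0, 1, 1, 1)

n_covered    <- sum(coverage)        # counts the 1's
prop_covered <- mean(coverage)       # proportion of 1's (sum divided by length)
pct_covered  <- 100 * prop_covered   # the same quantity as a percentage

# Hypothetical categorical variable with several levels
health <- factor(c("good", "excellent", "good", "fair", "very good",
                   "good", "poor", "very good"))

counts <- table(health)              # number of respondents in each category
props  <- prop.table(counts)         # each count divided by the total count
print(props)
```

Because mean() of a 0/1 vector equals sum() divided by length(), the count and the proportion answer the same question on different scales.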
Solution
# load data
data('cdc.samp', package = 'oibiostat')
# check documentation
?oibiostat::cdc.samp
# part b: proportions of men and women
# part c: proportions of respondents in each general health category
# part d: number of respondents with health coverage
# part e: percentage of respondents with health coverage