library(epitools)
Test 3 practice
Practice problems
The American Community Survey (ACS) is conducted annually by the U.S. Census Bureau to gather data on socioeconomic and demographic composition of American households in communities across the nation. The acs dataset contains 1605 responses from the 2012 ACS.
- [L3] Compute the proportion of respondents of each employment status.
- [L6] Construct a 95% confidence interval for the 2012 unemployment rate (defined here as the share of U.S. adults who are unemployed).
- [L7] Test for an association between education level and employment (at the 5% level).
- [L8] Estimate the relative likelihood of unemployment by education level. Provide a 95% confidence interval.
- [L7] How do you explain the association given that there is not a significant difference in unemployment rates?
# load data
load('data/acs.RData')
# proportion of respondents in each employment category
table(acs$employment) |> prop.table()
not in labor force         unemployed           employed
        0.41957563         0.07118412         0.50924025
# CI for unemployment rate
phat <- 0.06604361
phat.se <- sqrt(phat*(1 - phat)/1605)
phat + 2*c(-1, 1)*phat.se
[1] 0.05364505 0.07844217
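The interval above uses a multiplier of 2 as a round stand-in for the 97.5% normal quantile. The sketch below wraps the same calculation in a small helper so the two multipliers can be compared; `wald_ci` is a hypothetical name introduced here for illustration, and the estimate and sample size are copied from the solution above.

```r
# Wald-style interval for a proportion: estimate +/- multiplier * standard error.
# `wald_ci` is a hypothetical helper, not part of any package.
wald_ci <- function(phat, n, mult = 2) {
  se <- sqrt(phat * (1 - phat) / n)
  phat + mult * c(-1, 1) * se
}

# estimate and sample size copied from the solution above
ci_2se <- wald_ci(0.06604361, 1605)                 # multiplier of 2, as above
ci_z   <- wald_ci(0.06604361, 1605, qnorm(0.975))   # normal quantile, about 1.96

ci_2se
ci_z
```

The quantile-based interval is slightly narrower than the one computed with a multiplier of 2.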
# test for association between employment and education
table(acs$edu, acs$employment) |> chisq.test()
Pearson's Chi-squared test
data: table(acs$edu, acs$employment)
X-squared = 54.813, df = 2, p-value = 1.252e-12
# relative likelihood of unemployment
table(acs$edu, acs$employment) |> prop.table(margin = 1)
              not in labor force unemployed  employed
  hs or lower          0.4727768  0.0707804 0.4564428
  college              0.2562674  0.0724234 0.6713092
phat.college <- 0.0724234
phat.hs <- 0.0707804
rl <- phat.college/phat.hs
rl
# confidence interval for relative likelihood
table(acs$edu)
hs or lower     college
       1102         359
log.rl.se <- sqrt((1 - phat.college)/(phat.college*359) + (1 - phat.hs)/(phat.hs*1102))
log.rl.ci <- log(rl) + 2*c(-1, 1)*log.rl.se
log.rl.ci
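The interval computed above is on the log scale. To read it as a relative likelihood it can be exponentiated back to the ratio scale. The following is a minimal, self-contained sketch of that step using the estimates and group sizes printed above; `n.college`, `n.hs`, and `rl.ci` are names introduced here for illustration.

```r
# estimates and group sizes copied from the output above
phat.college <- 0.0724234
phat.hs <- 0.0707804
n.college <- 359
n.hs <- 1102

rl <- phat.college / phat.hs

# standard error of log(rl), same formula as in the solution
log.rl.se <- sqrt((1 - phat.college) / (phat.college * n.college) +
                  (1 - phat.hs) / (phat.hs * n.hs))

# interval on the log scale, then back-transformed to the ratio scale
log.rl.ci <- log(rl) + 2 * c(-1, 1) * log.rl.se
rl.ci <- exp(log.rl.ci)
rl.ci
```

With these inputs the back-transformed interval spans 1, which is consistent with the premise of the last question that unemployment rates do not differ significantly by education level.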
# inspect residuals
test_rslt <- table(acs$edu, acs$employment) |> chisq.test()
test_rslt$residuals
              not in labor force  unemployed    employed
  hs or lower         2.72650675 -0.05023204 -2.45607768
  college            -4.77694400  0.08800845  4.30314194
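To see where these residuals come from, the sketch below computes them by hand. The cell counts are an assumption: they are not printed in the output above, but were reconstructed by multiplying the row proportions by the row totals (1102 and 359) and rounding. `counts`, `expected`, and `pearson_resid` are names introduced here.

```r
# counts reconstructed from the printed row proportions and row totals
# (an assumption; the raw counts are not shown in the original output)
counts <- matrix(c(521, 78, 503,
                    92, 26, 241),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("hs or lower", "college"),
                                 c("not in labor force", "unemployed", "employed")))

# expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson residuals: (observed - expected) / sqrt(expected)
pearson_resid <- (counts - expected) / sqrt(expected)
round(pearson_resid, 3)

# the squared residuals add up to the chi-squared statistic
sum(pearson_resid^2)
```

The residuals in the unemployed column are close to zero, so nearly all of the test statistic comes from the other two columns: the association reflects differences in employment and labor force participation rather than in unemployment.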
The avandia dataset contains data from a 2010 JAMA study investigating the risk of cardiovascular problems (acute myocardial infarction or heart failure) among elderly patients on two common diabetes medications. Data were obtained from health records for 31,840 patients treated with pioglitazone and 13,674 patients treated with rosiglitazone.
- [L6] Which population proportions are estimable? Compute point estimates for the proportions you identify.
- [L6] Compute a point estimate for the difference in the risk of cardiovascular problems between the two medications. Interpret the estimate in context.
- [L7] Test at the 5% level for an association between medication and the rate of cardiovascular problems.
- [L8] Construct a 95% confidence interval for the relative risk of cardiovascular issues between the two treatments.
- [L8] Interpret your interval in context. Which drug is safer?
# load data
load('data/avandia.RData')
# point estimates for estimable population proportions
table(avandia) |> prop.table(margin = 1)
               cardiovascular_problems
treatment              yes         no
  Pioglitazone  0.03278894 0.96721106
  Rosiglitazone 0.03707767 0.96292233
# estimate for difference in risk of cardiovascular issues
0.03707767 - 0.03278894
[1] 0.00428873
# test for association
table(avandia) |> chisq.test()
Pearson's Chi-squared test with Yates' continuity correction
data: table(avandia)
X-squared = 5.2158, df = 1, p-value = 0.02238
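For a 2x2 table, `chisq.test()` applies Yates' continuity correction by default, which is why the p-value here differs slightly from the `chi.square` p-value reported by `riskratio()` in the next step. The sketch below shows both versions; the cell counts are hard-coded from the study's table so the block runs without the data file, and `tab`, `with_yates`, and `no_yates` are names introduced here.

```r
# cell counts hard-coded so the block runs without the data file
# rows = treatment, columns = cardiovascular problems (yes, no)
tab <- matrix(c(1044, 30796,
                 507, 13167),
              nrow = 2, byrow = TRUE,
              dimnames = list(treatment = c("Pioglitazone", "Rosiglitazone"),
                              cardiovascular_problems = c("yes", "no")))

with_yates <- chisq.test(tab)                   # default for 2x2 tables
no_yates   <- chisq.test(tab, correct = FALSE)  # plain Pearson statistic

c(yates = with_yates$p.value, uncorrected = no_yates$p.value)
```

Either way the p-value is below 0.05, so the conclusion of the test does not depend on the correction.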
# estimate measure of association
riskratio(avandia$treatment, avandia$cardiovascular_problems,
rev = 'columns')
$data
Outcome
Predictor no yes Total
Pioglitazone 30796 1044 31840
Rosiglitazone 13167 507 13674
Total 43963 1551 45514
$measure
risk ratio with 95% C.I.
Predictor estimate lower upper
Pioglitazone 1.000000 NA NA
Rosiglitazone 1.130798 1.018914 1.254968
$p.value
two-sided
Predictor midp.exact fisher.exact chi.square
Pioglitazone NA NA NA
Rosiglitazone 0.02163307 0.02243094 0.0207785
$correction
[1] FALSE
attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"
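The Wald interval reported above can be reproduced by hand from the counts in the `$data` table, which may help in seeing what the method label refers to. This is a minimal sketch, not the package's source; `events`, `totals`, `rr`, `log_rr_se`, and `rr_ci` are names introduced here.

```r
# cell counts copied from the $data table above
events <- c(pioglitazone = 1044, rosiglitazone = 507)
totals <- c(pioglitazone = 31840, rosiglitazone = 13674)

# risk in each group and their ratio (rosiglitazone relative to pioglitazone)
risk <- events / totals
rr <- unname(risk["rosiglitazone"] / risk["pioglitazone"])

# standard error of log(rr) computed from the counts
log_rr_se <- sqrt(sum(1 / events - 1 / totals))

# 95% interval on the log scale, back-transformed to the ratio scale
rr_ci <- exp(log(rr) + qnorm(0.975) * c(-1, 1) * log_rr_se)
c(estimate = rr, lower = rr_ci[1], upper = rr_ci[2])
```

The interval lies entirely above 1, so the data suggest a higher risk of cardiovascular problems on rosiglitazone than on pioglitazone.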
The ethanol dataset contains observations from a 2017 study comparing ethanol ablation treatments for superficial solid tumors with respect to clinical regression. In the study, hamsters with oral cancer tumors were randomly allocated to receive treatments; after a period of time, it was recorded whether the tumor had diminished (‘regressed’).
- [L6] What proportion of each treatment group showed regression?
- [L7] Test for a treatment effect at the 5% level.
- [L8] Estimate the relative likelihood of regression on the ethyl cellulose treatment compared with the pure ethanol treatment. Provide a point estimate and 95% confidence interval.
# load data
load('data/ethanol.RData')
# proportion of regressions in each treatment group
table(ethanol) |> prop.table(margin = 1)
                  regress
treatment                 no       yes
  ethyl_cellulose  0.1428571 0.8571429
  pure_ethanol_16x 0.7272727 0.2727273
# test for treatment effect
table(ethanol) |> fisher.test()
Fisher's Exact Test for Count Data
data: table(ethanol)
p-value = 0.04977
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
0.00122006 0.99821923
sample estimates:
odds ratio
0.07544934
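With only 18 animals, an exact test is used in place of the chi-squared approximation. The sketch below reproduces the two-sided p-value from the hypergeometric distribution. The cell counts (1 and 6 for ethyl cellulose, 8 and 3 for pure ethanol) are hard-coded so the block runs without the data file, and all object names are introduced here for illustration.

```r
# 2x2 counts: rows = treatment, columns = regression (no, yes)
tab <- matrix(c(1, 6,
                8, 3),
              nrow = 2, byrow = TRUE,
              dimnames = list(treatment = c("ethyl_cellulose", "pure_ethanol_16x"),
                              regress = c("no", "yes")))

# with the margins fixed, the top-left cell is hypergeometric under no treatment effect
n_no <- sum(tab[, "no"])
n_yes <- sum(tab[, "yes"])
n_row1 <- sum(tab[1, ])
support <- max(0, n_row1 - n_yes):min(n_row1, n_no)
probs <- dhyper(support, m = n_no, n = n_yes, k = n_row1)

# two-sided p-value: total probability of tables no more likely than the observed one
# (the small factor guards against floating-point ties)
p_obs <- dhyper(tab[1, 1], m = n_no, n = n_yes, k = n_row1)
p_by_hand <- sum(probs[probs <= p_obs * (1 + 1e-7)])

c(by_hand = p_by_hand, fisher = fisher.test(tab)$p.value)
```

The p-value falls just under 0.05, so the test rejects at the 5% level, though only narrowly.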
# estimate relative likelihood of regression
riskratio(ethanol$treatment, ethanol$regress, rev = 'rows')
$data
Outcome
Predictor no yes Total
pure_ethanol_16x 8 3 11
ethyl_cellulose 1 6 7
Total 9 9 18
$measure
risk ratio with 95% C.I.
Predictor estimate lower upper
pure_ethanol_16x 1.000000 NA NA
ethyl_cellulose 3.142857 1.143202 8.64025
$p.value
two-sided
Predictor midp.exact fisher.exact chi.square
pure_ethanol_16x NA NA NA
ethyl_cellulose 0.0260181 0.04977376 0.01562887
$correction
[1] FALSE
attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"
The antibiotics dataset contains observations of pre-existing medical conditions of 92 children involved in a study on the optimal duration of antibiotic use in treatment of tracheitis, which is an upper respiratory infection. Assume the subjects are a representative sample of children that develop tracheitis.
- [L3] Based on the data, which pre-existing condition is estimated to be most common among children who develop tracheitis?
- [L6] Perform an exact 5% level test to determine whether the prevalence of pre-existing respiratory conditions exceeds 10%.
- [L6] Perform an exact 5% level test to determine whether the prevalence of pre-existing cardiovascular conditions exceeds 10%.
# load data
load('data/antibiotics.RData')
# proportions of pre-existing conditions
table(antibiotics) |> prop.table()
condition
Cardiovascular Gastrointestinal Genetic/metabolic Immunocompromised
0.17391304 0.02173913 0.06521739 0.02173913
Neuromuscular Prematurity Respiratory Trauma
0.10869565 0.35869565 0.14130435 0.10869565
# exact test of whether respiratory condition rate exceeds 10%
table(antibiotics)
condition
Cardiovascular Gastrointestinal Genetic/metabolic Immunocompromised
16 2 6 2
Neuromuscular Prematurity Respiratory Trauma
10 33 13 10
binom.test(x = 13, n = 92, p = 0.1, alternative = 'greater')
Exact binomial test
data: 13 and 92
number of successes = 13, number of trials = 92, p-value = 0.1277
alternative hypothesis: true probability of success is greater than 0.1
95 percent confidence interval:
0.08565304 1.00000000
sample estimates:
probability of success
0.1413043
# exact test of whether cardiovascular condition rate exceeds 10%
table(antibiotics)
condition
Cardiovascular Gastrointestinal Genetic/metabolic Immunocompromised
16 2 6 2
Neuromuscular Prematurity Respiratory Trauma
10 33 13 10
binom.test(x = 16, n = 92, p = 0.1, alternative = 'greater')
Exact binomial test
data: 16 and 92
number of successes = 16, number of trials = 92, p-value = 0.01985
alternative hypothesis: true probability of success is greater than 0.1
95 percent confidence interval:
0.1122375 1.0000000
sample estimates:
probability of success
0.173913
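The one-sided exact p-values above are upper-tail binomial probabilities: the chance of seeing at least the observed count if the true prevalence were exactly 10%. A minimal sketch of that calculation follows, using the counts from the table (13 respiratory, 16 cardiovascular, out of 92); `p_resp` and `p_card` are names introduced here.

```r
n <- 92
p0 <- 0.1

# P(X >= x) under Binomial(n, p0);
# pbinom(x - 1, ..., lower.tail = FALSE) gives P(X > x - 1) = P(X >= x)
p_resp <- pbinom(13 - 1, size = n, prob = p0, lower.tail = FALSE)
p_card <- pbinom(16 - 1, size = n, prob = p0, lower.tail = FALSE)

c(respiratory = p_resp, cardiovascular = p_card)
```

At the 5% level, the data support a prevalence above 10% for cardiovascular conditions but not for respiratory conditions.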