Test 3 practice

Course: STAT218

In-class activity

March 6, 2025

Practice problems

library(epitools)

The American Community Survey (ACS) is conducted annually by the U.S. Census Bureau to gather data on the socioeconomic and demographic composition of American households in communities across the nation. The acs dataset contains 1605 responses from the 2012 ACS.
1. [L3] Compute the proportion of respondents of each employment status.
2. [L6] Construct a 95% confidence interval for the 2012 unemployment rate (defined here as the share of U.S. adults who are unemployed).
3. [L7] Test for an association between education level and employment (at the 5% level).
4. [L8] Estimate the relative likelihood of unemployment by education level. Provide a 95% confidence interval.
5. [L7] How do you explain the association given that there is not a significant difference in unemployment rates?

Solution

# load data
load('data/acs.RData')

# proportion of respondents in each employment category
table(acs$employment) |> prop.table()


not in labor force         unemployed           employed 
        0.41957563         0.07118412         0.50924025

# CI for unemployment rate
phat <- 0.06604361
phat.se <- sqrt(phat*(1 - phat)/1605)
phat + 2*c(-1, 1)*phat.se

[1] 0.05364505 0.07844217
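
The multiplier of 2 above is a round approximation to the normal critical value. As a minimal sketch, the same calculation can be wrapped in a small helper. Here `wald_ci` is a hypothetical name (not part of epitools or the course materials), and the inputs are the estimate and sample size used above.

```r
# hypothetical helper: Wald interval for a proportion
# phat = point estimate, n = sample size, crit = critical value
wald_ci <- function(phat, n, crit = 2) {
  se <- sqrt(phat * (1 - phat) / n)
  phat + crit * c(-1, 1) * se
}

# same inputs as above, with the multiplier of 2
ci_approx <- wald_ci(0.06604361, 1605)

# the exact normal critical value gives a slightly narrower interval
ci_exact <- wald_ci(0.06604361, 1605, crit = qnorm(0.975))

rbind(approx = ci_approx, exact = ci_exact)
```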

# test for association between employment and education
table(acs$edu, acs$employment) |> chisq.test()


    Pearson's Chi-squared test

data:  table(acs$edu, acs$employment)
X-squared = 54.813, df = 2, p-value = 1.252e-12

# relative likelihood of unemployment
table(acs$edu, acs$employment) |> prop.table(margin = 1)

             
              not in labor force unemployed  employed
  hs or lower          0.4727768  0.0707804 0.4564428
  college              0.2562674  0.0724234 0.6713092

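# unemployment proportions by education level (from the table above)
phat.college <- 0.0724234
phat.hs <- 0.0707804

# relative likelihood of unemployment: college relative to hs or lower
rl <- phat.college/phat.hs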

# confidence interval for relative likelihood
table(acs$edu)


hs or lower     college 
       1102         359

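# standard error of log(rl), using the group sizes above
log.rl.se <- sqrt((1 - phat.college)/(phat.college*359) + (1 - phat.hs)/(phat.hs*1102))
log.rl.ci <- log(rl) + 2*c(-1, 1)*log.rl.se

# back-transform to get the interval for the relative likelihood itself
exp(log.rl.ci)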

# inspect residuals
test_rslt <- table(acs$edu, acs$employment) |> chisq.test()
test_rslt$residuals

             
              not in labor force  unemployed    employed
  hs or lower         2.72650675 -0.05023204 -2.45607768
  college            -4.77694400  0.08800845  4.30314194
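
One way to read these residuals: each is (observed - expected)/sqrt(expected), and their squares sum to the chi-squared statistic. The sketch below assumes cell counts reconstructed from the row proportions and group sizes printed above (for example, 0.0707804 * 1102 is about 78). They are not read from the data file, and the variable names are illustrative.

```r
# counts reconstructed from printed row proportions and group sizes (assumption)
counts <- matrix(c(521, 78, 503,
                    92, 26, 241),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c('hs or lower', 'college'),
                                 c('not in labor force', 'unemployed', 'employed')))

# expected counts under no association
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Pearson residuals and each cell's contribution to the statistic
pearson_resid <- (counts - expected) / sqrt(expected)
contrib <- pearson_resid^2
x2 <- sum(contrib)

# share of the statistic attributable to each employment category
colSums(contrib) / x2
```

Under these assumed counts, the unemployed column contributes almost nothing to the statistic; the association is driven by the 'not in labor force' and 'employed' columns.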

The avandia dataset contains data from a 2010 JAMA study investigating the risk of cardiovascular problems (acute myocardial infarction or heart failure) among elderly patients on two common diabetes medications. Data were obtained from health records for 31,840 patients treated with pioglitazone and 13,674 patients treated with rosiglitazone.
1. [L6] Which population proportions are estimable? Compute point estimates for the proportions you identify.
2. [L6] Compute a point estimate for the difference in the risk of cardiovascular problems between the two medications. Interpret the estimate in context.
3. [L7] Test at the 5% level for an association between medication and the rate of cardiovascular problems.
4. [L8] Construct a 95% confidence interval for the relative risk of cardiovascular issues between the two treatments.
5. [L8] Interpret your interval in context. Which drug is safer?

Solution

# load data
load('data/avandia.RData')

# point estimates for estimable population proportions
table(avandia) |> prop.table(margin = 1)

               cardiovascular_problems
treatment              yes         no
  Pioglitazone  0.03278894 0.96721106
  Rosiglitazone 0.03707767 0.96292233

# estimate for difference in risk of cardiovascular issues
0.03707767 - 0.03278894

[1] 0.00428873

# test for association
table(avandia) |> chisq.test()


    Pearson's Chi-squared test with Yates' continuity correction

data:  table(avandia)
X-squared = 5.2158, df = 1, p-value = 0.02238
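
For a 2 x 2 table, `chisq.test()` applies Yates' continuity correction by default, which is why the output header mentions it. As a sketch, the table can be rebuilt from cell counts to compare the corrected and uncorrected tests. The counts below are those reported for this dataset in the `$data` table of the `riskratio()` output; the object names are illustrative.

```r
# cell counts as reported in the riskratio() $data table for avandia
tbl <- matrix(c(1044, 30796,
                 507, 13167),
              nrow = 2, byrow = TRUE,
              dimnames = list(treatment = c('Pioglitazone', 'Rosiglitazone'),
                              cardiovascular_problems = c('yes', 'no')))

# default: Yates' continuity correction (applies to 2 x 2 tables only)
corrected <- chisq.test(tbl)

# uncorrected Pearson test for comparison
uncorrected <- chisq.test(tbl, correct = FALSE)

c(corrected = corrected$p.value, uncorrected = uncorrected$p.value)
```

With a sample this large the correction changes the p-value only slightly, and the conclusion at the 5% level is the same either way.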

# estimate measure of association
riskratio(avandia$treatment, avandia$cardiovascular_problems, 
          rev = 'columns')

$data
               Outcome
Predictor          no  yes Total
  Pioglitazone  30796 1044 31840
  Rosiglitazone 13167  507 13674
  Total         43963 1551 45514

$measure
               risk ratio with 95% C.I.
Predictor       estimate    lower    upper
  Pioglitazone  1.000000       NA       NA
  Rosiglitazone 1.130798 1.018914 1.254968

$p.value
               two-sided
Predictor       midp.exact fisher.exact chi.square
  Pioglitazone          NA           NA         NA
  Rosiglitazone 0.02163307   0.02243094  0.0207785

$correction
[1] FALSE

attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"
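
The method attribute refers to a Wald interval built on the log scale. Below is a sketch of that calculation in base R using the counts from `$data`. The standard error is the usual large-sample formula for a log risk ratio, and the variable names are illustrative.

```r
# event counts and group sizes from the $data table above
yes <- c(Pioglitazone = 1044, Rosiglitazone = 507)
n   <- c(Pioglitazone = 31840, Rosiglitazone = 13674)

# risk in each group and risk ratio (rosiglitazone relative to pioglitazone)
risk <- yes / n
rr <- unname(risk['Rosiglitazone'] / risk['Pioglitazone'])

# standard error of log(rr), then back-transform the interval endpoints
log_rr_se <- sqrt(sum(1 / yes) - sum(1 / n))
rr_ci <- exp(log(rr) + qnorm(0.975) * c(-1, 1) * log_rr_se)

c(estimate = rr, lower = rr_ci[1], upper = rr_ci[2])
```

The interval is symmetric around log(rr) but not around rr, which is why the lower and upper limits sit at unequal distances from the estimate.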

The ethanol dataset contains observations from a 2017 study comparing ethanol ablation treatments for superficial solid tumors with respect to clinical regression. In the study, hamsters with oral cancer tumors were randomly allocated to receive treatments; after a period of time, it was recorded whether the tumor had diminished (‘regressed’).
1. [L6] What proportion of each treatment group showed regression?
2. [L7] Test for a treatment effect at the 5% level.
3. [L8] Estimate the relative likelihood of regression on the ethyl cellulose treatment compared with the pure ethanol treatment. Provide a point estimate and 95% confidence interval.

Solution

# load data
load('data/ethanol.RData')

# proportion of regressions in each treatment group
table(ethanol) |> prop.table(margin = 1)

                  regress
treatment                 no       yes
  ethyl_cellulose  0.1428571 0.8571429
  pure_ethanol_16x 0.7272727 0.2727273

# test for treatment effect
table(ethanol) |> fisher.test()


    Fisher's Exact Test for Count Data

data:  table(ethanol)
p-value = 0.04977
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.00122006 0.99821923
sample estimates:
odds ratio 
0.07544934
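
Fisher's test conditions on the table margins, so under no treatment effect the count in any one cell follows a hypergeometric distribution. The sketch below assumes the cell counts 1, 6, 8, 3 implied by the group proportions above (1/7, 6/7, 8/11, 3/11) and sums the probabilities of all tables no more likely than the observed one. Variable names are illustrative.

```r
# cell counts: rows = treatment, columns = regress (no, yes)
tbl <- matrix(c(1, 6,
                8, 3),
              nrow = 2, byrow = TRUE,
              dimnames = list(treatment = c('ethyl_cellulose', 'pure_ethanol_16x'),
                              regress = c('no', 'yes')))

# fixed margins
m <- sum(tbl[, 'no'])               # total without regression
n <- sum(tbl[, 'yes'])              # total with regression
k <- sum(tbl['ethyl_cellulose', ])  # size of the ethyl cellulose group

# possible values of the top-left cell and their probabilities
x <- max(0, k - n):min(k, m)
probs <- dhyper(x, m, n, k)
p_obs <- dhyper(tbl[1, 1], m, n, k)

# two-sided p-value: tables at most as probable as the observed one
# (small tolerance guards against floating-point ties)
p_value <- sum(probs[probs <= p_obs * (1 + 1e-7)])
p_value
```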

# estimate relative likelihood of regression
riskratio(ethanol$treatment, ethanol$regress, rev = 'rows')

$data
                  Outcome
Predictor          no yes Total
  pure_ethanol_16x  8   3    11
  ethyl_cellulose   1   6     7
  Total             9   9    18

$measure
                  risk ratio with 95% C.I.
Predictor          estimate    lower   upper
  pure_ethanol_16x 1.000000       NA      NA
  ethyl_cellulose  3.142857 1.143202 8.64025

$p.value
                  two-sided
Predictor          midp.exact fisher.exact chi.square
  pure_ethanol_16x         NA           NA         NA
  ethyl_cellulose   0.0260181   0.04977376 0.01562887

$correction
[1] FALSE

attr(,"method")
[1] "Unconditional MLE & normal approximation (Wald) CI"

The antibiotics dataset contains observations of pre-existing medical conditions of 92 children involved in a study on the optimal duration of antibiotic use in the treatment of tracheitis, an upper respiratory infection. Assume the subjects are a representative sample of children who develop tracheitis.
1. [L3] Based on the data, which pre-existing condition is estimated to be most common among children who develop tracheitis?
2. [L6] Perform an exact 5% level test to determine whether the prevalence of pre-existing respiratory conditions exceeds 10%.
3. [L6] Perform an exact 5% level test to determine whether the prevalence of pre-existing cardiovascular conditions exceeds 10%.

Solution

# load data
load('data/antibiotics.RData')

# proportions of pre-existing conditions
table(antibiotics) |> prop.table()

condition
   Cardiovascular  Gastrointestinal Genetic/metabolic Immunocompromised 
       0.17391304        0.02173913        0.06521739        0.02173913 
    Neuromuscular       Prematurity       Respiratory            Trauma 
       0.10869565        0.35869565        0.14130435        0.10869565

# exact test of whether respiratory condition rate exceeds 10%
table(antibiotics)

condition
   Cardiovascular  Gastrointestinal Genetic/metabolic Immunocompromised 
               16                 2                 6                 2 
    Neuromuscular       Prematurity       Respiratory            Trauma 
               10                33                13                10

binom.test(x = 13, n = 92, p = 0.1, alternative = 'greater')


    Exact binomial test

data:  13 and 92
number of successes = 13, number of trials = 92, p-value = 0.1277
alternative hypothesis: true probability of success is greater than 0.1
95 percent confidence interval:
 0.08565304 1.00000000
sample estimates:
probability of success 
             0.1413043

# exact test of whether cardiovascular condition rate exceeds 10%
table(antibiotics)

condition
   Cardiovascular  Gastrointestinal Genetic/metabolic Immunocompromised 
               16                 2                 6                 2 
    Neuromuscular       Prematurity       Respiratory            Trauma 
               10                33                13                10

binom.test(x = 16, n = 92, p = 0.1, alternative = 'greater')


    Exact binomial test

data:  16 and 92
number of successes = 16, number of trials = 92, p-value = 0.01985
alternative hypothesis: true probability of success is greater than 0.1
95 percent confidence interval:
 0.1122375 1.0000000
sample estimates:
probability of success 
              0.173913
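
For the one-sided 'greater' alternative, the exact p-value is the upper tail of the Binomial(92, 0.1) distribution at the observed count. Below is a sketch with `pbinom()` using the counts from the table above; `upper_tail` is a hypothetical helper name.

```r
# hypothetical helper: P(X >= x) for X ~ Binomial(n, p)
upper_tail <- function(x, n, p) {
  pbinom(x - 1, size = n, prob = p, lower.tail = FALSE)
}

# respiratory: 13 of 92; cardiovascular: 16 of 92
p_resp <- upper_tail(13, 92, 0.1)
p_cardio <- upper_tail(16, 92, 0.1)

# compare each p-value with the 5% level
c(respiratory = p_resp, cardiovascular = p_cardio) < 0.05
```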