Measures of association

Inference for relative risk and odds ratios from 2x2 contingency tables

Today’s agenda

  1. [lecture] relative risk and odds ratios
  2. [lab] using epitools to estimate risk and odds ratios in R
  3. [discussion] final project guidelines

From last time: smoking

Researchers sampled 86 lung cancer patients (cases) and 86 healthy individuals (controls) and recorded smoking status:

  Smokers NonSmokers
cancer 83 3
control 72 14

At the 5% level, association is significant:

smoking.tbl <- table(smoking$group, 
                     smoking$smoking) 
smoking.tbl |> chisq.test()

    Pearson's Chi-squared test with Yates' continuity correction

data:  smoking.tbl
X-squared = 6.5275, df = 1, p-value = 0.01062

But there is a difficulty for measuring the association:

  • data are really two independent samples
  • so can’t estimate cancer prevalence

Case-control studies

Outcome-based sampling limits which proportions are estimable

  • cases and controls are two independent samples
  • exposure prevalence is estimable (within case and control populations)
  • case prevalence is not estimable

Smoking example

So due to study design, the only estimable difference in proportions is:

smoking.prop.test <- smoking.tbl |> prop.test()
smoking.prop.test$conf
[1] 0.029149 0.226665
attr(,"conf.level")
[1] 0.95

With 95% confidence, the proportion of smokers among cancer patients is estimated to be betwteen 2.9 and 22.7 percentage points higher than among controls.

This makes for an awkward conclusion:

  • we really want to know how smoking affects cancer risk
  • …not how cancer affects smoking risk

So for this kind of study, we need a different measure of association: odds ratios.

Odds

The odds of an outcome measure its relative likelihood

If \(p\) is the true cancer prevalence (a population proportion), then the odds of cancer are defined as the ratio: \[ \text{odds} = \frac{p}{1 - p} \] The odds represent the factor by which cancer is more likely than not, e.g.:

  • \(\text{odds} = 2\) indicates the outcome (cancer) is twice as likely as not
  • \(\text{odds} = 1/2\) indicates the outcome (cancer) is half as likely as not

Odds ratios

An odds ratio compares the relative likelihood of an outcome between two groups

If \(a, b, c, d\) are population proportions:

\(\;\) outcome 1 (O1) outcome 2 (O2)
group 1 (G1) a b
group 2 (G2) c d
  • the odds of outcome 1 in group 1 is \(\frac{\textcolor{red}{a}}{\textcolor{blue}{b}}\)
  • the odds of outcome 1 in group 2 is \(\frac{\textcolor{orange}{c}}{\textcolor{purple}{d}}\)

The odds ratio of outcome 1 comparing group 1 with group 2 is:

\[ \frac{\text{odds}_{G1}(O1)}{\text{odds}_{G2}(O1)} = \frac{\textcolor{red}{a}/\textcolor{blue}{b}}{\textcolor{orange}{c}/\textcolor{purple}{d}} = \frac{\textcolor{red}{a}\textcolor{purple}{d}}{\textcolor{blue}{b}\textcolor{orange}{c}} \] A surprising algebraic fact is that:

\[ \frac{\text{odds}_{G1}(O1)}{\text{odds}_{G2}(O1)} =\frac{\text{odds}_{O1}(G1)}{\text{odds}_{O2}(G1)} \]

This means that the ratio of odds of cancer between smokers and nonsmokers can be estimated from the ratio of odds of smoking between cases and controls.

Estimating odds ratios

For notation let:

  • \(\hat{p}_\text{case}\) : proportion of smokers among cases
  • \(\hat{p}_\text{control}\) : proportion of smokers among controls
  Smokers NonSmokers
cancer 83 3
control 72 14

An estimate of ratio of odds of cancer among smokers compared with nonsmokers (\(\omega\)) is the estimated ratio of odds of smoking among cancer patients compared with controls: \[ \hat{\omega} =\left(\frac{\hat{p}_\text{case}}{1 - \hat{p}_\text{case}}\right)\Bigg/\left(\frac{\hat{p}_\text{control}}{1 - \hat{p}_\text{control}}\right) = \frac{83/3}{72/14} = 5.38 \]

It is estimated that the odds of lung cancer are 5.38 times greater for smokers compared with nonsmokers.

Confidence intervals for odds ratios

The sampling distribution of the log odds ratio can be approximated by a normal model.

\[ \log\left(\hat{\omega}\right) \pm c \times SE\left(\log\left(\hat{\omega}\right)\right) \quad\text{where}\quad SE\left(\log\left(\hat{\omega}\right)\right) = \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}} \]

The oddsratio(...) function in the epitools package will compute and back-transform the interval for you.

oddsratio(smoking.tbl, rev = 'both', 
          method = 'wald', correction = T)
         odds ratio with 95% C.I.
          estimate    lower    upper
  control  1.00000       NA       NA
  cancer   5.37963 1.486376 19.47045
  • critical value \(c\) from the normal model
  • exponentiate to obtain an interval for \(\omega\)

With 95% confidence, the odds of lung cancer are estimated to be between 1.49 and 19.47 times greater for smokers compared with nonsmokers.

Implementation details

oddsratio(smoking.tbl, rev = 'both',
          method = 'wald', correction = T)
$data
         
          NonSmokers Smokers Total
  control         14      72    86
  cancer           3      83    86
  Total           17     155   172

$measure
         odds ratio with 95% C.I.
          estimate    lower    upper
  control  1.00000       NA       NA
  cancer   5.37963 1.486376 19.47045

$p.value
         two-sided
           midp.exact fisher.exact chi.square
  control          NA           NA         NA
  cancer  0.005116319  0.008822805 0.01062183

oddsratio is picky about data inputs:

  • outcome of interest should be second column

  • group of interest should be second row

  • the rev argument will reverse orders

    • rev = neither keeps original orientation (default)
    • rev = rows reverses order of rows
    • rev = columns reverses order of columns
    • rev = both reverses both

Interpretation

oddsratio(smoking.tbl, rev = 'both',
          method = 'wald', correction = T)
$data
         
          NonSmokers Smokers Total
  control         14      72    86
  cancer           3      83    86
  Total           17     155   172

$measure
         odds ratio with 95% C.I.
          estimate    lower    upper
  control  1.00000       NA       NA
  cancer   5.37963 1.486376 19.47045

$p.value
         two-sided
           midp.exact fisher.exact chi.square
  control          NA           NA         NA
  cancer  0.005116319  0.008822805 0.01062183

First report the test result, then the measure of association:

The data provide evidence of an association between smoking and lung cancer (\(\chi^2\) = 6.53 on 1 degree of freedom, p = 0.1062). With 95% confidence, the odds of cancer are estimated to be between 1.49 amd 19.47 times greater among smokers compared with nonsmokers, with a point estimate of 5.38.

Comments:

  • the chi.square \(p\)-value is from the test of association/independence
  • if assumptions aren’t met, can use the fisher.exact instead (see V&H 8.3.5)

Asthma data

Consider estimating the difference in proportions:

  asthma no asthma
female 49 781
male 30 769
asthma.tbl <- table(asthma$sex, asthma$asthma)
prop.test(asthma.tbl, conf.level = 0.9)

    2-sample test for equality of proportions with continuity correction

data:  asthma.tbl
X-squared = 3.6217, df = 1, p-value = 0.05703
alternative hypothesis: two.sided
90 percent confidence interval:
 0.002841347 0.040137075
sample estimates:
    prop 1     prop 2 
0.05903614 0.03754693 

With 90% confidence, asthma prevalence is estimated to be between 0.28 and 4.01 percentage points higher among women than among men.

Is a difference of up to 4 percentage points practically meaningful? Well, it depends:

  • yes if prevalence is very low
  • no if prevalence is very high

Relative risk

If \(p_F, p_M\) are the (population) proportions of women and men with asthma, then the relative risk of asthma among women compared with men is defined as:

\[ RR = \frac{p_F}{p_M} \qquad \left(\frac{\text{risk among women}}{\text{risk among men}}\right) \]

An estimate of the relative risk is simply the ratio of estimated proportions. For the asthma data, an estimate is: \[ \widehat{RR} = \frac{\hat{p}_F}{\hat{p}_M} = \frac{0.059}{0.038} = 1.57 \]

It is estimated that the risk of asthma among women is 1.57 times greater than among men.

Confidence intervals for relative risk

A normal model can be used to approximate the sampling distribution of \(\log(RR)\) and construct a confidence interval. If \(\hat{p}_1\) and \(\hat{p}_2\) are the two estimated proportions:

\[\log\left(\widehat{RR}\right) \pm c \times SE\left(\log\left(\widehat{RR}\right)\right) \quad\text{where}\quad SE\left(\log\left(\widehat{RR}\right)\right) = \sqrt{\frac{1 - p_1}{p_1n_1} + \frac{1 - p_2}{p_2n_2}}\]

The riskratio(...) function from the epitools package will compute and back-transform the interval for you:

riskratio(asthma.tbl, 
          rev = 'both', conf.level = 0.9,
          method = 'wald', correction = T)
        risk ratio with 90% C.I.
         estimate    lower    upper
  male   1.000000       NA       NA
  female 1.572329 1.083353 2.282007
  • critical value \(c\) from normal model
  • exponentiate for an interval for \(RR\)

With 90% confidence, the risk of asthma is estimated to be betwen 1.08 and 2.28 times greater for women than for men, with a point estimate of 1.57.

Implementation details

riskratio(asthma.tbl, 
          rev = 'both', conf.level = 0.9,
          method = 'wald', correction = T)
$data
        
         no asthma asthma Total
  male         769     30   799
  female       781     49   830
  Total       1550     79  1629

$measure
        risk ratio with 90% C.I.
         estimate    lower    upper
  male   1.000000       NA       NA
  female 1.572329 1.083353 2.282007

$p.value
        two-sided
         midp.exact fisher.exact chi.square
  male           NA           NA         NA
  female 0.04412095   0.04961711 0.05703135

Implementation is identical to oddsratio:

  • outcome of interest should be second column
  • group of interest should be second row
  • rearrange the input data using rev

Also similarly:

  • chi.square gives the \(p\)-value for the \(\chi^2\) test of association
  • fisher.exact gives an exact p-value you can use if assumptions aren’t met

Interpretation

riskratio(asthma.tbl, 
          rev = 'both', conf.level = 0.9,
          method = 'wald', correction = T)
$data
        
         no asthma asthma Total
  male         769     30   799
  female       781     49   830
  Total       1550     79  1629

$measure
        risk ratio with 90% C.I.
         estimate    lower    upper
  male   1.000000       NA       NA
  female 1.572329 1.083353 2.282007

$p.value
        two-sided
         midp.exact fisher.exact chi.square
  male           NA           NA         NA
  female 0.04412095   0.04961711 0.05703135

First report the test result, then the measure of association:

The data provide evidence at the 10% significance level of an association between asthma and sex (\(\chi^2\) = 3.62 on 1 degree of freedom, p = 0.057). With 90% confidence, the risk of asthma is estimated to be betwen 1.08 and 2.28 times greater for women than for men, with a point estimate of 1.57.

Comments:

  • the chi.square \(p\)-value is from the test of association/independence
  • if assumptions aren’t met, can use the fisher.exact instead

Further examples: cyclosporiasis

An outbreak of cyclosporiasis was detected among residents of New Jersey. In a case-control study, investigators found that 21 of 30 case-patients and 4 of 60 controls had eaten raspberries.

  raspberries no raspberries
case 21 9
control 4 56
  • outcome-based sampling
  • odds ratio should be used
# check test assumptions
chisq.test(cyclo.tbl)$expected
                cyclosporiasis
raspberries           case  control
  raspberries     8.333333 16.66667
  no raspberries 21.666667 43.33333
# compute measure of association and p value
oddsratio(cyclo.tbl, rev = 'both',
          method = 'wald', correction = T)
$measure
                odds ratio with 95% C.I.
raspberries      estimate    lower    upper
  no raspberries  1.00000       NA       NA
  raspberries    32.66667 9.081425 117.5048

$p.value
                two-sided
raspberries        midp.exact fisher.exact   chi.square
  no raspberries           NA           NA           NA
  raspberries    6.358611e-10 6.183017e-10 1.247883e-09

The data provide very strong evidence of an association between eating raspberries during the outbreak and case incidence (\(\chi^2\) = 36.89 on 1 degree of freedom, p < 0.0001). With 95% confidence, the odds of indcidence are estimated to be between 9.08 and 117.5 times higher among NJ residents who ate raspberries during the outbreak.

Further examples: smoking and CHD

A cohort study of 3,000 smokers and 5,000 nonsmokers investigated the link between smoking and the development of coronary heart disease (CHD) over 1 year.

  CHD no CHD
smoker 84 2916
nonsmoker 87 4913
  • two independent samples, but not outcome-based sampling
# check test assumptions
chisq.test(chd.tbl)$expected
           chd
smoker          CHD   no CHD
  smoker     64.125 2935.875
  nonsmoker 106.875 4893.125
riskratio(chd.tbl, rev = 'both',
          method = 'wald', correction = T)
$measure
           risk ratio with 95% C.I.
smoker      estimate    lower    upper
  nonsmoker 1.000000       NA       NA
  smoker    1.609195 1.196452 2.164325

$p.value
           two-sided
smoker       midp.exact fisher.exact  chi.square
  nonsmoker          NA           NA          NA
  smoker    0.001799736  0.001800482 0.001976694

The data provide very strong evidence that smoking is associated with coronary heart disease (\(\chi^2\) = 9.5711 on 1 degree of freedom, p = 0.00198). With 95% confidence, the risk of CHD is estimated to be between 1.196 and 2.164 times greater among smokers compared with nonsmokers, with a point estimate of 1.609.

Further examples: malaria vaccine

In a randomized trial for a malaria vaccine, 20 individuals were randomly allocated to receive a dose of the vaccine or a placebo.

  infection no infection
placebo 6 0
vaccine 5 9
  • small sample size, so expected counts are likely too small to meet \(\chi^2\) test assumptions
# check test assumptions
chisq.test(malaria.tbl)$expected
         outcome
treatment infection no infection
  placebo       3.3          2.7
  vaccine       7.7          6.3
riskratio(malaria.tbl, rev = 'columns',
          method = 'wald', correction = T)
$measure
         risk ratio with 95% C.I.
treatment  estimate     lower     upper
  placebo 1.0000000        NA        NA
  vaccine 0.3571429 0.1768593 0.7212006

$p.value
         two-sided
treatment midp.exact fisher.exact chi.square
  placebo         NA           NA         NA
  vaccine  0.0119195   0.01408669 0.03094368

The data provide moderate evidence that the malaria vaccine is effective (Fisher’s exact test, p = 0.0141).

A twist: how to interpret the CI? Answer: \(\text{efficacy = 1 - RR}\)

With 95% confidence, the risk of infection is estimated to be between 27.88% and 82.31% lower among vaccinated individuals, with a point estimate of 64.29% efficacy.