
Applied Statistics for Life Sciences

Mean flowering depends on light intensity.
In an ANOVA model, the explanatory variable is treated as categorical.
But what if we had observational data instead?
Could we estimate mean flowering as a continuous function of intensity?
The Ruff Figural Fluency Test (RFFT) is a cognitive assessment.

| casenr | age | rfft |
|---|---|---|
| 126 | 37 | 136 |
| 33 | 36 | 80 |
| 145 | 37 | 102 |
| 146 | 37 | 85 |

Question of interest:
How much does cognitive ability as measured by RFFT decline with age on average?

A straight line seems to describe the RFFT-age relationship well enough.
This suggests a model:
The equation of a line in slope-intercept form is:
There is exactly one line through any two points.
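For reference, one common way to write slope-intercept form, and the line through two points, is sketched below. The symbols $a$ (intercept) and $b$ (slope) are notation chosen here for illustration and may differ from the notation used elsewhere in the course.

```latex
\[
y = a + b x
\]
% Line through two points (x_1, y_1) and (x_2, y_2) with x_1 \neq x_2:
\[
b = \frac{y_2 - y_1}{x_2 - x_1}, \qquad a = y_1 - b\, x_1
\]
```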

We say a set of data points
Some things to keep in mind:

We can articulate two properties of linear trends:
Note that trends in each row are identical.

Correlation is a signed measure of strength of linear relationship.
When data points tend to be far from both means at the same time and in a consistent direction, the magnitude of the correlation is larger.
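A minimal sketch of this idea, using made-up toy values rather than course data: standardize each variable, multiply the standardized values pairwise, and average the products (dividing by n - 1). The result should agree with R's built-in `cor()`.

```r
# Toy data (hypothetical values, for illustration only)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Standardize: how far is each value from its mean, in SD units?
zx <- (x - mean(x)) / sd(x)
zy <- (y - mean(y)) / sd(y)

# Products are large and positive when both values are far from
# their means in the same direction; average them using n - 1
r_manual <- sum(zx * zy) / (length(x) - 1)

# Compare with the built-in Pearson correlation
r_builtin <- cor(x, y)
print(c(manual = r_manual, builtin = r_builtin))
```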
Common mistake: strength

Most of these are bad. Some are worse than others. How might one measure this?

Hint: consider the residuals – distances from the line to each point.


Residuals are the distances to each point:
Quality of fit can be measured by:
Now consider what the bias and SSE (total squared error) capture.
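To make the comparison concrete, here is a small sketch on made-up toy data. It assumes bias means the average residual and SSE means the sum of squared residuals, with residuals taken as observed minus fitted; `fit_quality` is a hypothetical helper, not a function from any package.

```r
# Toy data (hypothetical values, for illustration only)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Hypothetical helper: bias and SSE for a candidate line
fit_quality <- function(intercept, slope, x, y) {
  res <- y - (intercept + slope * x)   # residual = observed - fitted
  c(bias = mean(res), sse = sum(res^2))
}

# Three candidate lines
q_flat    <- fit_quality(4.0, 0.0, x, y)  # flat line at the mean of y
q_shifted <- fit_quality(3.0, 0.6, x, y)  # right slope, shifted too high
q_ls      <- fit_quality(2.2, 0.6, x, y)  # least squares line for these data

print(rbind(flat = q_flat, shifted = q_shifted, least_squares = q_ls))
```

The flat line is unbiased but has a larger SSE than the least squares line; the shifted line has nonzero bias because it sits above most of the points.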


The line with no bias and minimal total error is called the least squares line:
With each year of age, RFFT decreases by 1.191 points on average.
The least squares line has an analytic solution:
The simple linear regression model is:
The values that minimize error subject to the model being unbiased are:
These are called the least squares estimates.
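As a rough check of the analytic solution, the sketch below assumes the usual closed form (slope = correlation × SD of response / SD of explanatory variable; intercept = mean response minus slope × mean explanatory variable) and compares it with `lm()`. The data are simulated and only loosely imitate the RFFT example.

```r
# Simulated data (hypothetical; not the course dataset)
set.seed(1)
x <- runif(50, min = 20, max = 80)
y <- 130 - 1.2 * x + rnorm(50, sd = 20)

# Closed-form least squares estimates (assumed standard formulas)
b1 <- cor(x, y) * sd(y) / sd(x)   # slope
b0 <- mean(y) - b1 * mean(x)      # intercept

# Same estimates from lm()
fit <- lm(y ~ x)
print(rbind(by_hand = c(b0, b1), lm = coef(fit)))
```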
According to the model, a one-unit increment in the explanatory variable is associated with a change in the mean response equal to the slope.
With each additional year of age, mean RFFT score decreases by an estimated 1.191 points.
formula = <RESPONSE> ~ <EXPLANATORY> specifies the model
data = <DATAFRAME> specifies the observations

The residual standard deviation provides an estimate of error variability:
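A minimal sketch of the call pattern, using a simulated data frame (`toy`, with made-up columns `age` and `score`) in place of the course data. It also assumes the residual standard deviation is the square root of the residual sum of squares divided by the residual degrees of freedom, which is what `sigma()` reports for an `lm` fit.

```r
# Simulated data frame (hypothetical; stands in for real observations)
set.seed(2)
toy <- data.frame(age = round(runif(40, min = 35, max = 80)))
toy$score <- 134 - 1.2 * toy$age + rnorm(40, sd = 20)

# formula names the response and explanatory variable; data names the data frame
fit_toy <- lm(formula = score ~ age, data = toy)
print(coef(fit_toy))

# Residual standard deviation, two ways
s_builtin <- sigma(fit_toy)
s_manual  <- sqrt(sum(resid(fit_toy)^2) / df.residual(fit_toy))
print(c(builtin = s_builtin, manual = s_manual))
```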
Is the relationship actually linear?

Two ways to check:
Local smoothing is shown in blue.
The linear model is a fine approximation here – the curvature is very minor – but let’s consider an alternative model specification as a thought exercise.

Linear models can be used to capture more than just linear relationships!
Kleiber’s law refers to the relationship between metabolic rate and body mass.
Exponentiating both sides of the fitted SLR model equation:
So we’ve really estimated what’s known as a power law relationship:
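The exponentiation step can be sketched as follows, using the coefficient estimates reported in the model summary for the `kleiber` data and assuming the logs are natural logs (the base is not stated in the output).

```latex
\[
\widehat{\log(\text{metabolism})} = 5.638 + 0.739 \,\log(\text{mass})
\]
% Exponentiating both sides (natural log assumed):
\[
\widehat{\text{metabolism}} = e^{5.638} \times \text{mass}^{0.739}
\]
```

Because the model describes the mean of log metabolism, the back-transformed equation is usually read as describing median metabolism on the original scale.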
The estimate and interval for
Every doubling of body mass is associated with an estimated 66.87% increase in median metabolism.
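The doubling interpretation follows from the power law: multiplying mass by 2 multiplies estimated metabolism by 2 raised to the slope. A quick sketch of the arithmetic, using the slope estimate from the model summary:

```r
# Slope estimate for log.mass, taken from the model summary
b1 <- 0.73874

# Multiplicative change in metabolism when mass doubles
ratio <- 2^b1

# Expressed as a percent increase
pct_increase <- (ratio - 1) * 100
print(round(pct_increase, 2))
```

This calculation does not depend on the base of the logarithm, as long as the same base is used for both variables.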
How much does RFFT decline with age?

Simple linear regression (SLR) model:
Call:
lm(formula = rfft ~ age, data = prevend)
Coefficients:
(Intercept) age
134.098 -1.191
Interpretation:
With each additional year of age, mean RFFT score decreases by an estimated 1.191 points.
The residual standard deviation is an estimate of the unexplained variation in RFFT.
More unexplained variation entails more sampling variability in the model fit.
Standard errors for the coefficients are:
While you won’t need to know these formulae, do notice that:
If the errors are symmetric and unimodal, then the sampling distribution of the standardized slope estimate is well approximated by a t model with n − 2 degrees of freedom.

Significance test:
Confidence interval:
confidence interval using
Inference for the intercept is analogous, but not very common.
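For orientation, the standard forms of the slope test and interval are sketched below. The notation ($\hat{\beta}_1$ for the slope estimate, $t^*$ for the critical value) is assumed here and may differ from the course's own symbols.

```latex
% Test of H_0: \beta_1 = 0 against H_A: \beta_1 \neq 0
\[
T = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)}, \qquad
\text{compared with a } t_{n-2} \text{ model}
\]
% (1 - \alpha) \times 100\% confidence interval
\[
\hat{\beta}_1 \pm t^{*}_{n-2,\, 1 - \alpha/2} \times SE(\hat{\beta}_1)
\]
```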
The model summary shows most quantities of interest, except CIs.
Call:
lm(formula = rfft ~ age, data = prevend)
Residuals:
Min 1Q Median 3Q Max
-56.085 -14.690 -2.937 12.744 77.975
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 134.0981 6.0701 22.09 <2e-16 ***
age -1.1908 0.1007 -11.82 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 20.52 on 206 degrees of freedom
Multiple R-squared: 0.4043, Adjusted R-squared: 0.4014
F-statistic: 139.8 on 1 and 206 DF, p-value: < 2.2e-16
Age explains an estimated 40.43% of variation in RFFT.
With each additional year of age, mean RFFT declines by an estimated 1.19 points (SE 0.10).
There is a significant association between age and mean RFFT score (T = -11.82 on 206 degrees of freedom, p < 0.0001).
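The test statistic and p-value can be approximately reproduced from the rounded estimate and standard error in the summary table. This is a sketch of the arithmetic only; small differences from the printed t value come from rounding.

```r
# Values copied from the coefficient table (rounded as printed)
est <- -1.1908   # slope estimate for age
se  <- 0.1007    # standard error
df  <- 206       # residual degrees of freedom

# Test statistic: estimate divided by its standard error
t_stat <- est / se

# Two-sided p-value from the t model
p_value <- 2 * pt(-abs(t_stat), df = df)

print(c(t = t_stat, p = p_value))
```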
Take a moment to locate the quantities that support the conclusions listed above.
Confidence interval:
2.5 % 97.5 %
(Intercept) 122.130647 146.0654574
age -1.389341 -0.9922471
With 95% confidence, each additional year of age is associated with a decrease in mean RFFT score of between 0.99 and 1.39 points.
Since the intercept is not meaningful in this context, we don’t interpret that interval.
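The interval for the slope can be approximately rebuilt by hand as estimate ± critical value × standard error. The sketch below uses the rounded values printed in the summary, so it matches the `confint` output only up to rounding.

```r
# Values copied from the coefficient table (rounded as printed)
est <- -1.1908
se  <- 0.1007
df  <- 206

# Critical value for 95% confidence from the t model
t_crit <- qt(0.975, df = df)

# Interval: estimate plus or minus margin of error
ci_slope <- est + c(-1, 1) * t_crit * se
print(round(ci_slope, 3))
```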
There are two possible ways to interpret model predictions:

With 95% confidence, the mean RFFT score among 55-year-olds is estimated to be between 65.71 and 71.50 points.
With 95% confidence, the RFFT score for an individual 55-year-old is estimated to be between 28.05 and 109.16 points.
Pointwise intervals shown along the line provide a visual of the model uncertainty.


Why the difference? Individual observations are more variable than averages.
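The relationship between the two intervals can be sketched from the reported numbers. The sketch assumes the standard forms: the interval for the mean uses the standard error of the fitted value, and the interval for an individual adds the residual variance to it. The standard error of the fitted value is not printed anywhere above, so it is backed out here from the reported interval for the mean.

```r
# Values copied from the model summary and the reported interval
b0 <- 134.0981; b1 <- -1.1908   # coefficient estimates
s  <- 20.52                     # residual standard error
df <- 206
ci_mean <- c(65.71, 71.50)      # reported 95% interval for the mean at age 55

# Point prediction at age 55
pred <- b0 + b1 * 55

# Back out the standard error of the fitted value (an inferred quantity)
t_crit  <- qt(0.975, df = df)
se_mean <- diff(ci_mean) / (2 * t_crit)

# Individual predictions carry the extra residual variability
se_pred  <- sqrt(s^2 + se_mean^2)
pred_int <- pred + c(-1, 1) * t_crit * se_pred

print(round(c(prediction = pred, lower = pred_int[1], upper = pred_int[2]), 2))
```

The rebuilt interval agrees with the reported prediction interval up to rounding, and nearly all of its width comes from the residual standard deviation.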
Call:
lm(formula = log.metab ~ log.mass, data = kleiber)
Residuals:
Min 1Q Median 3Q Max
-1.14216 -0.26466 -0.04889 0.25308 1.37616
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.63833 0.04709 119.73 <2e-16 ***
log.mass 0.73874 0.01462 50.53 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.4572 on 93 degrees of freedom
Multiple R-squared: 0.9649, Adjusted R-squared: 0.9645
F-statistic: 2553 on 1 and 93 DF, p-value: < 2.2e-16

There is a significant association between body mass and metabolism (p < 0.0001). Log body mass explains an estimated 96.49% of the variation in log metabolism. With 95% confidence, a unit increment in log mass is associated with an increase in mean log metabolism of between 0.7097 and 0.7678.
How much energy do we consume on a daily basis?
Conversions:

Using the SLR model, estimated resting energy consumption is:
Left, prediction curve with 95% confidence interval.
How much energy do you consume on a daily basis?
Conversions:

Using the SLR model, estimated resting energy consumption is:
Left, prediction curve with 95% prediction interval.
The Hubble constant
Least squares estimate of
90% CI for the age of the universe:
# interval for age of universe in bn yr
km.mpc <- 3.09e19            # kilometers per megaparsec
yr.sec <- 1/(60*60*24*365)   # years per second
confint(fit, level = 0.9)*km.mpc*yr.sec/1e9
5 % 95 %
velocity 10.98235 13.12108
With 90% confidence, the universe is estimated to be between 10.98 and 13.12 billion years old.
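As a sanity check on the unit conversion, the sketch below applies the same constants to a single illustrative value of the Hubble constant. It assumes the fitted slope estimates the reciprocal of the Hubble constant in Mpc per (km/s), which the coefficient name `velocity` suggests but the output does not confirm; the value of 70 km/s/Mpc is a round number chosen for illustration, not an estimate from the data.

```r
# Conversion constants, as in the interval calculation
km.mpc <- 3.09e19            # kilometers per megaparsec
yr.sec <- 1/(60*60*24*365)   # years per second

# Illustrative Hubble constant in km/s per Mpc (assumed value)
h0 <- 70

# 1/h0 has units Mpc per (km/s); converting Mpc to km leaves seconds,
# then seconds to years, then years to billions of years
age_bn <- (1 / h0) * km.mpc * yr.sec / 1e9
print(round(age_bn, 1))
```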
STAT218