15.5 Examples
15.5.1 Dementia and sex
We now return to the dementia dataset and explore the relationship between sex and diagnosis of dementia during the study period. In this example, our outcome \(Y\) is the binary variable indicating whether the patient was diagnosed with dementia during follow-up (1=yes, 0=no). Our single independent variable \(S\) is sex (0=male, 1=female). The logistic regression model we will fit is:
\[\log\left(\frac{\pi_i}{1-\pi_i}\right) = \beta_0 + \beta_1 s_i\]
where \(\pi_i=E(Y| S=s_i)\).
We will use the glm() function to perform logistic regression in R. See the R help page (?glm) for details of how this command works.
The following code can be used to perform this logistic regression in R. We need to specify the formula for the model, using syntax very similar to that used in linear regression modelling. In addition, we now need to tell R that we are using the logit link function and that the data are assumed to follow a Bernoulli distribution (which, recall, is a special case of the Binomial distribution).
dementia <- read.csv("Practicals/Datasets/Dementia/dementia2.csv")
dementia1 <- glm(dementia ~ sex, data = dementia, family = binomial(link="logit"))
summary(dementia1)
Call:
glm(formula = dementia ~ sex, family = binomial(link = "logit"),
data = dementia)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.2211 -0.2211 -0.1771 -0.1771 2.8855
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.14722 0.02439 -170.01 <2e-16 ***
sex 0.44771 0.03264 13.72 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 38333 on 199999 degrees of freedom
Residual deviance: 38143 on 199998 degrees of freedom
AIC: 38147
Number of Fisher Scoring iterations: 7
We interpret the two estimated coefficients as follows:
The estimated log-odds of dementia diagnosis among males (the “baseline” group, with \(S=0\)) is -4.147.
The estimated log odds ratio for females, compared with males, is 0.4477.
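To see why the two coefficients have this interpretation, the following minimal sketch fits the same kind of model to simulated data (not the dementia dataset; the sample size and true log-odds are arbitrary assumptions). With a single binary covariate, the fitted intercept and slope should match the log-odds and log odds ratio computed directly from the observed group proportions.

```r
# Simulated data for illustration only (NOT the dementia dataset).
set.seed(1)
n   <- 20000
sex <- rbinom(n, size = 1, prob = 0.5)   # 0 = male, 1 = female
# Hypothetical true log-odds: -4 for males, -3.5 for females
y   <- rbinom(n, size = 1, prob = plogis(-4 + 0.5 * sex))

fit <- glm(y ~ sex, family = binomial(link = "logit"))

# Observed proportion with the outcome in each group
p_male   <- mean(y[sex == 0])
p_female <- mean(y[sex == 1])

# qlogis() is the logit function: log(p / (1 - p))
obs_log_odds_male <- qlogis(p_male)
obs_log_or        <- qlogis(p_female) - qlogis(p_male)

# The fitted coefficients agree with the observed quantities
# (up to the numerical tolerance of the fitting algorithm)
print(c(coef(fit),
        obs_log_odds_male = obs_log_odds_male,
        obs_log_or        = obs_log_or))
```

The intercept reproduces the observed log-odds in the baseline group, and the slope reproduces the difference in log-odds between the two groups.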
For a slightly more intuitive interpretation, we will take the exponential transformation.
exp(coefficients(dementia1))
(Intercept): 0.0158083366518193
sex: 1.5647202094551
Now we can equivalently, and perhaps more intuitively, interpret the coefficients as follows:
The estimated odds of dementia diagnosis among males is 0.0158.
The estimated odds ratio for females, compared with males, is 1.565. In other words, the odds of dementia diagnosis among females are estimated to be 1.565 times the odds among males.
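Odds can also be converted to probabilities, which some readers find easier to interpret. The sketch below uses the coefficients copied from the summary output above (the variable names `b0` and `b1` are introduced here for illustration) and applies the inverse logit via `plogis()`.

```r
# Coefficients copied from the summary(dementia1) output
b0 <- -4.14722   # intercept: log-odds of diagnosis for males (S = 0)
b1 <-  0.44771   # log odds ratio, females compared with males

# Inverse logit: p = exp(x) / (1 + exp(x)); plogis() computes this
p_male   <- plogis(b0)
p_female <- plogis(b0 + b1)

# Estimated probability of dementia diagnosis in each group
print(round(c(male = p_male, female = p_female), 4))
```

This gives an estimated probability of diagnosis of about 1.6% for males and 2.4% for females. Because the outcome is rare, the estimated odds and the estimated probabilities are numerically close.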
15.5.2 Dementia and age
We now explore the relationship between dementia diagnosis and age. In this example, our outcome \(Y\) remains dementia diagnosis, as above, but our single independent variable \(A\) is age, measured in years. The logistic regression model we will fit is:
\[\log\left(\frac{\pi_i}{1-\pi_i}\right) = \beta_0 + \beta_1 a_i\]
where \(\pi_i=E(Y| A=a_i)\).
dementia2 <- glm(dementia ~ age, data = dementia, family = binomial(link="logit"))
summary(dementia2)
Call:
glm(formula = dementia ~ age, family = binomial(link = "logit"),
data = dementia)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.9935 -0.1989 -0.1140 -0.0721 3.5947
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -10.533958 0.103139 -102.13 <2e-16 ***
age 0.101865 0.001402 72.66 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 38333 on 199999 degrees of freedom
Residual deviance: 31876 on 199998 degrees of freedom
AIC: 31880
Number of Fisher Scoring iterations: 8
We interpret the two estimated coefficients as follows:
The estimated log-odds of dementia diagnosis among people aged 0 is -10.53. Of course, this is not a meaningful quantity. As for linear regression, we could center the age variable to provide an interpretable intercept.
The estimated log odds ratio for each increase of one year in age is 0.102.
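The centering mentioned above can be sketched as follows. This example uses simulated data (not the dementia dataset), and the centering value of 70 years is an arbitrary assumption chosen for illustration. Centering changes the intercept, which becomes the log-odds at age 70, but leaves the slope unchanged.

```r
# Simulated data for illustration only (NOT the dementia dataset).
set.seed(2)
n   <- 20000
age <- round(runif(n, min = 50, max = 90))
# Hypothetical true model: log-odds = -10.5 + 0.1 * age
y   <- rbinom(n, size = 1, prob = plogis(-10.5 + 0.1 * age))

fit_raw <- glm(y ~ age, family = binomial(link = "logit"))

# Center age at 70 years (an arbitrary illustrative choice)
age_c <- age - 70
fit_centered <- glm(y ~ age_c, family = binomial(link = "logit"))

# Slope is the same in both fits; only the intercept differs
print(coef(fit_raw))
print(coef(fit_centered))

# The centered intercept equals b0 + 70 * b1 from the uncentered fit
log_odds_at_70 <- unname(coef(fit_raw)[1] + 70 * coef(fit_raw)[2])
print(log_odds_at_70)
```

In practice one might center at the sample mean or at a clinically relevant age, so that the intercept describes a person who could plausibly appear in the study.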
For a slightly more intuitive interpretation, we will take the exponential transformation.
exp(coefficients(dementia2))
(Intercept): 2.66170781376626e-05
age: 1.10723429559232
Now we can interpret the two estimated coefficients as follows:
The estimated odds of dementia diagnosis among people aged 0 is \(2.66 \times 10^{-5}\) (again, not a meaningful quantity).
The estimated odds ratio for each increase of one year in age is 1.107. In other words, the estimated odds of dementia diagnosis is multiplied by 1.11 (or, increased by 11%) with each increase in year of age at study baseline.
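Because the model is linear on the log-odds scale, the odds ratio for a larger age difference is obtained by multiplying the log odds ratio by that difference before exponentiating. The sketch below works this through for a 10-year difference, using the age coefficient copied from the summary output above (the name `b_age` and the choice of 10 years are illustrative assumptions).

```r
# Age coefficient copied from the summary(dementia2) output
b_age <- 0.101865           # log odds ratio per 1-year increase in age

or_1  <- exp(b_age)         # odds ratio per 1 year
or_10 <- exp(10 * b_age)    # odds ratio per 10 years

# The 10-year odds ratio equals the 1-year odds ratio raised to the power 10
print(c(or_1 = or_1, or_10 = or_10, or_1_power_10 = or_1^10))
```

Under this model, a 10-year difference in age corresponds to the odds of dementia diagnosis being multiplied by about 2.77, not by 10 times the 1-year increase, because the effect compounds multiplicatively.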