15.7 Multivariable logistic regression
Suppose we wish to relate a binary outcome (\(Y\)) to \(p\) predictor variables \((X_1, X_2, ..., X_p)\). The appropriate multivariable logistic regression model is a straightforward extension of the simple logistic regression model:
\[\log\left(\frac{\pi_i}{1-\pi_i}\right) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + ... + \beta_p x_{pi}\]
where \(x_{ji}\) is the value of the \(j\)th predictor variable for the \(i\)th participant and \(\pi_i = P(Y_i = 1 | X_1=x_{1i}, ..., X_p=x_{pi})\).
The parameters in the model are interpreted as follows:
\(\beta_0\) is the intercept. It is the log-odds of \(Y = 1\) when all the \(X_j\)’s are zero.
\(\beta_j\) is the change in the log-odds of \(Y = 1\) for a 1 unit increase in \(X_j\) with all the other covariates held constant.
The \(\beta_j\)’s are the regression coefficients (otherwise known as partial regression coefficients). Each one measures the effect of one covariate controlled (or adjusted) for all of the others.
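The log-odds interpretation of \(\beta_j\) translates directly into an odds ratio. As a short worked step (writing \(\text{odds}(\cdot)\) as shorthand, introduced here, for \(\pi/(1-\pi)\)), compare two participants who differ by one unit in \(X_j\) and share the same values of all the other covariates:
\[\log \text{odds}(x_j + 1) - \log \text{odds}(x_j) = \beta_j (x_j + 1) - \beta_j x_j = \beta_j \quad \Longrightarrow \quad \frac{\text{odds}(x_j + 1)}{\text{odds}(x_j)} = e^{\beta_j}\]
Exponentiating a coefficient therefore gives the adjusted odds ratio for a 1 unit increase in that covariate.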
The maximum likelihood estimation process outlined earlier can be naturally extended to the multivariable model above.
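As an illustration only, the sketch below simulates a small dataset and maximises the Bernoulli log-likelihood directly with optim, then compares the result with glm. It assumes the estimation process referred to above is maximisation of this log-likelihood. The simulated covariates x1 and x2, the true coefficient values and the helper negloglik are all hypothetical and are not part of the dementia analysis.

```r
set.seed(1)

# Simulated data: one continuous and one binary covariate (hypothetical)
n  <- 5000
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.5)
X  <- cbind(1, x1, x2)                # design matrix with an intercept column
beta_true <- c(-1, 0.5, 0.8)
y  <- rbinom(n, 1, as.vector(plogis(X %*% beta_true)))

# Negative log-likelihood of the logistic model:
# -sum_i [ y_i * eta_i - log(1 + exp(eta_i)) ], where eta_i is the linear predictor
negloglik <- function(beta) {
  eta <- as.vector(X %*% beta)
  -sum(y * eta - log1p(exp(eta)))
}

# Direct numerical maximisation versus glm (which uses Fisher scoring)
fit_optim <- optim(rep(0, 3), negloglik, method = "BFGS")
fit_glm   <- glm(y ~ x1 + x2, family = binomial(link = "logit"))

round(cbind(optim = fit_optim$par, glm = coef(fit_glm)), 3)
```

The two columns should agree to within numerical tolerance, since both approaches maximise the same likelihood.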
15.7.1 Example
We consider an example using the dementia dataset. This time, our interest lies in modelling how the odds of being diagnosed with dementia during study follow-up relate to sex (\(S\)), age (\(A\)) and BMI (\(B\)) at study baseline.
Our multivariable logistic regression model is:
\[\log\left(\frac{\pi_i}{1-\pi_i}\right) = \beta_0 + \beta_1 S_i + \beta_2 A_i + \beta_3 B_i\]
This model can be estimated in R using the glm function:
dementia <- read.csv("Practicals/Datasets/Dementia/dementia2.csv")
dementia2 <- glm(dementia ~ sex + age + bmi, data = dementia, family = binomial(link="logit"))
summary(dementia2)
Call:
glm(formula = dementia ~ sex + age + bmi, family = binomial(link = "logit"),
data = dementia)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.1067  -0.1959  -0.1134  -0.0732   3.6917
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -9.783837   0.152138 -64.309  < 2e-16 ***
sex          0.306798   0.033773   9.084  < 2e-16 ***
age          0.098682   0.001413  69.826  < 2e-16 ***
bmi         -0.025619   0.003596  -7.124 1.05e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 38333 on 199999 degrees of freedom
Residual deviance: 31732 on 199996 degrees of freedom
AIC: 31740
Number of Fisher Scoring iterations: 8
cbind(exp(coefficients(dementia2)), exp(confint(dementia2)))
Waiting for profiling to be done...
Term | Odds ratio | 2.5 % | 97.5 % |
---|---|---|---|
(Intercept) | 5.635516e-05 | 4.179134e-05 | 7.587428e-05 |
sex | 1.359066e+00 | 1.272090e+00 | 1.452170e+00 |
age | 1.103716e+00 | 1.100675e+00 | 1.106790e+00 |
bmi | 9.747061e-01 | 9.678335e-01 | 9.815740e-01 |
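The odds ratios in the table are the exponentiated coefficients from the summary output. As a rough check, the sketch below re-derives them from the printed estimates and standard errors. It uses Wald intervals (estimate plus or minus about 1.96 standard errors) rather than the profile-likelihood intervals computed by confint, so the limits should agree closely but not exactly. The vectors est and se are typed in by hand from the output and are not objects from the original analysis.

```r
# Estimates and standard errors copied by hand from the summary() output
est <- c("(Intercept)" = -9.783837, sex = 0.306798, age = 0.098682, bmi = -0.025619)
se  <- c("(Intercept)" =  0.152138, sex = 0.033773, age = 0.001413, bmi =  0.003596)

z <- qnorm(0.975)  # approximately 1.96

# Odds ratios with Wald 95% confidence limits
wald <- cbind(OR    = exp(est),
              lower = exp(est - z * se),
              upper = exp(est + z * se))
round(wald, 4)
```

With a sample of this size the Wald and profile-likelihood limits are nearly identical; in smaller samples they can differ more noticeably.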
We can interpret the parameters as follows:
sex: Females are estimated to have 1.36 times the odds of being diagnosed with dementia compared with males of the same age and BMI at study baseline. The 95% confidence interval indicates that the data are consistent with the true odds ratio lying between 1.27 and 1.45. The p-value, \(p<0.001\), provides strong evidence against the null hypothesis of no association between sex and dementia after adjusting for age and BMI.
age: The odds of being diagnosed with dementia are estimated to increase 1.10-fold for each additional year of age at study baseline, for participants of the same sex and BMI. The data are consistent with the true odds ratio lying between 1.101 and 1.107. The p-value, \(p<0.001\), provides strong evidence against the null hypothesis of no association between age and dementia after adjusting for sex and BMI.
bmi: The odds of being diagnosed with dementia are estimated to be multiplied by 0.97 (a reduction of about 2.5%) for each 1 unit increase in BMI at study baseline, for participants of the same sex and age, suggesting an inverse association between BMI and odds of dementia diagnosis. The data are consistent with the true odds ratio lying between 0.97 and 0.98. The p-value, \(p<0.001\), provides strong evidence against the null hypothesis of no association between BMI and dementia after adjusting for sex and age.
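To make these interpretations concrete, the sketch below converts the fitted coefficients into a predicted probability for one hypothetical participant and into an odds ratio for a 10-year age difference. The coefficient vector b is copied by hand from the summary output. The participant (aged 70, BMI 25) is invented for illustration, and the code assumes that sex = 1 codes female, which the interpretation above implies but the output does not state.

```r
# Coefficients copied by hand from the summary() output
b <- c(intercept = -9.783837, sex = 0.306798, age = 0.098682, bmi = -0.025619)

# Hypothetical participant: female (ASSUMING sex = 1 codes female), aged 70, BMI 25
lp <- b["intercept"] + b["sex"] * 1 + b["age"] * 70 + b["bmi"] * 25

# Inverse logit turns the linear predictor into a probability
p_hat <- unname(plogis(lp))
p_hat

# Odds ratio for a 10-year difference in baseline age, sex and BMI held fixed
or_age10 <- unname(exp(10 * b["age"]))
or_age10
```

Under these assumptions the predicted probability is approximately 0.039, and a 10-year difference in baseline age corresponds to an odds ratio of approximately 2.68, which is the per-year odds ratio of 1.10 raised to the power of 10.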