15.7 Multivariable logistic regression

Suppose we wish to relate a binary outcome (\(Y\)) to \(p\) predictor variables \((X_1, X_2, \dots, X_p)\). The multivariable logistic regression model is a straightforward extension of the simple logistic regression model:

\[ \mathrm{logit}(\pi_i) = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_p x_{pi} \]

where \(x_{ji}\) is the value of the \(j\)th predictor variable for the \(i\)th participant and \(\pi_i = P(Y_i = 1 \mid X_1 = x_{1i}, \dots, X_p = x_{pi})\).
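Solving the model equation for \(\pi_i\), using \(\mathrm{logit}(\pi) = \log\{\pi/(1-\pi)\}\), expresses the probability itself as a function of the predictors:

\[ \pi_i = \frac{\exp(\beta_0 + \beta_1 x_{1i} + \dots + \beta_p x_{pi})}{1 + \exp(\beta_0 + \beta_1 x_{1i} + \dots + \beta_p x_{pi})} = \frac{1}{1 + \exp\{-(\beta_0 + \beta_1 x_{1i} + \dots + \beta_p x_{pi})\}} \]

Whatever the values of the predictors, this always lies between 0 and 1.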

The parameters in the model are interpreted as follows:

  • \(\beta_0\) is the intercept: the log-odds that \(Y = 1\) when all the \(X_j\)’s are zero.

  • \(\beta_j\) (\(j = 1, \dots, p\)) is the change in the log-odds that \(Y = 1\) for a 1-unit increase in \(X_j\), with all the other covariates held constant.

The \(\beta_j\)’s are the regression coefficients (otherwise known as partial regression coefficients). Each one measures the effect of one covariate controlled (or adjusted) for all of the others.
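To see why each \(\beta_j\) is an adjusted log odds ratio, compare two participants whose covariate values are identical except that the second has \(X_j\) one unit higher. Writing \(\mathrm{odds} = \pi/(1-\pi)\), so that \(\mathrm{logit}(\pi) = \log(\mathrm{odds})\), every term except the \(j\)th cancels:

\[ \log\frac{\mathrm{odds}(x_1, \dots, x_j + 1, \dots, x_p)}{\mathrm{odds}(x_1, \dots, x_j, \dots, x_p)} = \left[\beta_0 + \dots + \beta_j (x_j + 1) + \dots + \beta_p x_p\right] - \left[\beta_0 + \dots + \beta_j x_j + \dots + \beta_p x_p\right] = \beta_j \]

Hence \(e^{\beta_j}\) is the odds ratio for a 1-unit increase in \(X_j\) with the other covariates held fixed. By the same argument, \(e^{k\beta_j}\) is the odds ratio for a \(k\)-unit increase.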

The maximum likelihood estimation process outlined earlier can be naturally extended to the multivariable model above.
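Concretely, the log-likelihood \(\ell(\boldsymbol\beta) = \sum_i \{y_i \log \pi_i + (1 - y_i)\log(1 - \pi_i)\}\) now depends on \(p + 1\) parameters and is maximised iteratively. The R sketch below is a minimal illustration, not the internals of glm. It applies Newton-Raphson to simulated data; for the logit link, Newton-Raphson takes the same steps as the Fisher scoring algorithm that glm reports. The helper fit_logistic_nr, the simulated covariates and the "true" coefficients are all hypothetical.

```r
# Minimal Newton-Raphson fit of a multivariable logistic regression model.
# All data are simulated; the "true" coefficients are illustrative only.
set.seed(1)
n  <- 5000
x1 <- rbinom(n, 1, 0.5)            # hypothetical binary covariate
x2 <- rnorm(n, mean = 70, sd = 8)  # hypothetical continuous covariate
X  <- cbind(1, x1, x2)             # design matrix with an intercept column
beta_true <- c(-6, 0.3, 0.07)
y  <- rbinom(n, 1, plogis(drop(X %*% beta_true)))

# Hypothetical helper: maximise the log-likelihood by Newton-Raphson
fit_logistic_nr <- function(X, y, tol = 1e-8, max_iter = 50) {
  beta <- rep(0, ncol(X))                      # starting values
  for (iter in seq_len(max_iter)) {
    p     <- plogis(drop(X %*% beta))          # current fitted pi_i
    score <- crossprod(X, y - p)               # gradient: X'(y - pi)
    info  <- crossprod(X, X * (p * (1 - p)))   # information matrix: X'WX
    step  <- drop(solve(info, score))          # Newton step
    beta  <- beta + step
    if (max(abs(step)) < tol) break
  }
  # Standard errors from the inverse information matrix at convergence
  list(coef = beta, se = sqrt(diag(solve(info))), iterations = iter)
}

nr      <- fit_logistic_nr(X, y)
glm_fit <- glm(y ~ x1 + x2, family = binomial(link = "logit"),
               control = glm.control(epsilon = 1e-12))
cbind(newton_raphson = nr$coef, glm = coef(glm_fit))
```

The two columns should agree to many decimal places. The tighter glm.control(epsilon = ...) setting is used only so that the comparison is not limited by glm's default convergence tolerance.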

15.7.1 Example

We consider an example using the dementia dataset. This time, our interest lies in modeling the relationship between the odds of being diagnosed with dementia during study follow-up and sex (\(S\)), age (\(A\)) and BMI (\(B\)) at study baseline.

Our multivariable logistic regression model is:

\[ \mathrm{logit}(\pi_i) = \beta_0 + \beta_1 s_i + \beta_2 a_i + \beta_3 b_i \]

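where \(s_i\), \(a_i\) and \(b_i\) are the values of \(S\), \(A\) and \(B\) for the \(i\)th participant. This model can be fitted in R using the glm function: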

# Read in the data
dementia <- read.csv("Practicals/Datasets/Dementia/dementia2.csv")
# Fit the multivariable logistic regression of dementia on sex, age and BMI
dementia2 <- glm(dementia ~ sex + age + bmi, data = dementia,
                 family = binomial(link = "logit"))
summary(dementia2)
Call:
glm(formula = dementia ~ sex + age + bmi, family = binomial(link = "logit"), 
    data = dementia)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.1067  -0.1959  -0.1134  -0.0732   3.6917  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -9.783837   0.152138 -64.309  < 2e-16 ***
sex          0.306798   0.033773   9.084  < 2e-16 ***
age          0.098682   0.001413  69.826  < 2e-16 ***
bmi         -0.025619   0.003596  -7.124 1.05e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 38333  on 199999  degrees of freedom
Residual deviance: 31732  on 199996  degrees of freedom
AIC: 31740

Number of Fisher Scoring iterations: 8
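The z value column is simply Estimate / Std. Error, and Pr(>|z|) is the two-sided standard normal tail probability. The short R sketch below recomputes these from the printed, rounded numbers. It also forms Wald 95% confidence intervals, \(\exp(\hat\beta_j \pm 1.96\,\mathrm{SE})\), which can be compared with the profile-likelihood intervals that confint() produces next. Small discrepancies come from the rounding of the printed values. They are most visible for age, where the z statistic is largest.

```r
# Estimates and standard errors copied from the summary() output above
est <- c(sex = 0.306798, age = 0.098682, bmi = -0.025619)
se  <- c(sex = 0.033773, age = 0.001413, bmi = 0.003596)

z <- est / se                # Wald z statistic
p <- 2 * pnorm(-abs(z))      # two-sided p-value

# Wald 95% confidence intervals, transformed to the odds-ratio scale
or_wald <- exp(cbind(OR    = est,
                     lower = est - qnorm(0.975) * se,
                     upper = est + qnorm(0.975) * se))
round(z, 3)
signif(p, 3)
round(or_wald, 4)
```

With a sample this large, the Wald and profile-likelihood intervals are almost identical. The two methods can differ noticeably in small samples or when events are rare.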
# Odds ratios (exponentiated coefficients) with 95% profile-likelihood confidence intervals
cbind(exp(coefficients(dementia2)), exp(confint(dementia2)))
Waiting for profiling to be done...
                                2.5 %       97.5 %
(Intercept) 5.635516e-05 4.179134e-05 7.587428e-05
sex         1.359066e+00 1.272090e+00 1.452170e+00
age         1.103716e+00 1.100675e+00 1.106790e+00
bmi         9.747061e-01 9.678335e-01 9.815740e-01
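The fitted coefficients can also be turned into a predicted probability of a dementia diagnosis by applying the inverse logit to the linear predictor. The R sketch below does this for a hypothetical participant with sex = 1, aged 75 at baseline and with a BMI of 25. These covariate values are illustrative assumptions. Here sex = 1 denotes whichever category is coded 1 in the data, and the interpretation below reads the sex coefficient as comparing females with males.

```r
# Coefficients copied from the summary() output above
b <- c(intercept = -9.783837, sex = 0.306798, age = 0.098682, bmi = -0.025619)

# Hypothetical participant: sex = 1, aged 75 at baseline, BMI of 25
x <- c(intercept = 1, sex = 1, age = 75, bmi = 25)

eta <- sum(b * x)   # linear predictor: the log-odds of a dementia diagnosis
pr  <- plogis(eta)  # inverse logit: exp(eta) / (1 + exp(eta))
c(log_odds = eta, odds = exp(eta), probability = pr)
```

Because the sex coefficient is labelled simply sex in the output, sex appears to be stored as a numeric variable. In that case, predict(dementia2, newdata = data.frame(sex = 1, age = 75, bmi = 25), type = "response") should give the same probability, up to the rounding of the printed coefficients.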

We can interpret the parameters as follows:

  • sex: Females are estimated to have 1.36 times the odds of being diagnosed with dementia compared with males of the same age and BMI at study baseline. The data are consistent with the true odds ratio lying between 1.27 and 1.45. The p-value, \(p<0.001\), provides strong evidence against the null hypothesis of no association between sex and dementia after adjusting for age and BMI.

  • age: The odds of being diagnosed with dementia are estimated to be multiplied by 1.10 (an increase of about 10%) for each additional year of age at study baseline, holding sex and BMI constant. The data are consistent with the true odds ratio lying between 1.101 and 1.107. The p-value, \(p<0.001\), provides strong evidence against the null hypothesis of no association between age and dementia after adjusting for sex and BMI.

  • bmi: The odds of being diagnosed with dementia are estimated to be multiplied by 0.975 (a decrease of about 2.5%) for each one-unit increase in BMI at study baseline, holding sex and age constant. This suggests an inverse association between BMI and the odds of a dementia diagnosis. The data are consistent with the true odds ratio lying between 0.968 and 0.982. The p-value, \(p<0.001\), provides strong evidence against the null hypothesis of no association between BMI and dementia after adjusting for sex and age.
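Per-unit odds ratios close to 1 can understate an association, so it is often clearer to report odds ratios for larger increments. For a \(k\)-unit increase, the odds ratio is \(e^{k\beta_j}\). This assumes the log-odds changes linearly in the covariate over that range, as the model does. The R sketch below uses the printed coefficients. The 10-year and 5-unit increments are illustrative choices, not values from the original analysis.

```r
# Log odds ratios copied from the summary() output above
b_age <- 0.098682
b_bmi <- -0.025619

or_age_10 <- exp(10 * b_age)  # OR for a 10-year increase in age
or_bmi_5  <- exp(5 * b_bmi)   # OR for a 5-unit increase in BMI

# Percentage change in the odds per 1-unit increase: 100 * (OR - 1)
pct_age_1 <- 100 * (exp(b_age) - 1)
pct_bmi_1 <- 100 * (exp(b_bmi) - 1)

round(c(or_age_10 = or_age_10, or_bmi_5 = or_bmi_5,
        pct_age_1 = pct_age_1, pct_bmi_1 = pct_bmi_1), 3)
```

The same scaling applies to the confidence limits on the log scale. For example, a 95% interval for the 10-year odds ratio is obtained by raising the per-year limits, 1.100675 and 1.106790, to the 10th power.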