13.3 Including multiple covariates
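We are interested in a model that relates birthweight to length of pregnancy and mother’s height. We will use the following multivariable linear regression model:
\[ y_i = \beta_0 + \beta_1 l_i + \beta_2 h_i + \epsilon_i \]
Here \(\epsilon_i\) is the random error term for the \(i^{th}\) baby.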
The outcome \(y_i\) denotes the birthweight (in oz) of the \(i^{th}\) baby. The predictors \(l_i\) and \(h_i\) denote, respectively, the length of pregnancy (the number of gestational days) for the \(i^{th}\) baby and the height (in inches) of that baby’s mother.
The linear regression can be conducted in R using the lm() command:
# Model 4: Relating birthweight to length of pregnancy and mother's height
data <- read.csv('https://www.inferentialthinking.com/data/baby.csv')
model4 <- lm(Birth.Weight ~ Gestational.Days + Maternal.Height, data = data)
summary(model4)
Call:
lm(formula = Birth.Weight ~ Gestational.Days + Maternal.Height,
data = data)
Residuals:
Min 1Q Median 3Q Max
-53.829 -10.589 0.246 10.254 54.403
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -88.51993 14.31910 -6.182 8.73e-10 ***
Gestational.Days 0.45237 0.03006 15.051 < 2e-16 ***
Maternal.Height 1.27598 0.19049 6.698 3.27e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 16.44 on 1171 degrees of freedom
Multiple R-squared: 0.1969, Adjusted R-squared: 0.1955
F-statistic: 143.5 on 2 and 1171 DF, p-value: < 2.2e-16
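Reading the estimates off the Coefficients table and rounding them to two decimal places, the fitted model can be written as follows (a restatement of the output above, with \(\hat{y}_i\) denoting the fitted mean birthweight in oz):

\[ \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 l_i + \hat{\beta}_2 h_i = -88.52 + 0.45\, l_i + 1.28\, h_i \]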
Interpretation of the regression coefficients
\(\hat{\beta}_1=0.45\). This is the estimated regression coefficient for the number of gestational days. It is interpreted as follows: amongst babies whose mothers are of the same height, each additional gestational day is associated with an expected increase in birthweight of 0.45 ounces.
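One way to make this interpretation concrete is to compare the fitted mean birthweights of two babies who differ by one gestational day but whose mothers are the same height. The sketch below is only an illustration: it copies the estimates from the summary output above, and the helper `predict_bw()` and the example values (280 and 281 days, a 64-inch mother) are assumptions introduced here, not part of the original analysis.

```r
# Coefficient estimates copied from the summary(model4) output above
b0 <- -88.51993  # (Intercept)
b1 <- 0.45237    # Gestational.Days
b2 <- 1.27598    # Maternal.Height

# Hypothetical helper (not part of the original analysis):
# fitted mean birthweight (oz) for a given number of gestational days
# and a given maternal height (inches)
predict_bw <- function(days, height) {
  b0 + b1 * days + b2 * height
}

# Two illustrative babies, born one gestational day apart,
# whose mothers are both 64 inches tall
bw_280 <- predict_bw(280, 64)
bw_281 <- predict_bw(281, 64)

# The difference equals b1 (up to floating-point rounding),
# whichever common maternal height is chosen
diff_one_day <- bw_281 - bw_280
print(c(bw_280 = bw_280, bw_281 = bw_281, difference = diff_one_day))
```

If the fitted object is available, `predict(model4, newdata = data.frame(Gestational.Days = c(280, 281), Maternal.Height = 64))` should give the same two fitted values, up to the rounding of the coefficients copied above.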
It may be tempting to draw causal inferences from regression models such as Model 4, e.g. “longer pregnancies cause an increase in birthweight”. However, this is far from straightforward. Based on the results presented above, it would be reasonable to say that “birthweight increases with length of pregnancy”. It is much less reasonable to claim, based on these results alone, that higher birthweight is caused by longer pregnancies, because there may be an unobserved third variable that is the “real” cause of both increased length of pregnancy and increased birthweight. Causal statements require more than just the results of a statistical model to make them plausible; this is a topic that we return to in the next lesson.
Exercise: What is the interpretation of \(\hat{\beta}_2\)?
Interpretation of the intercept
\(\hat{\beta}_0=-88.52\). The interpretation is that the estimated mean birthweight for a child who was born after 0 gestational days and whose mother’s height is 0 inches is -88.52 ounces. Clearly this is an absurd value to estimate, because no babies are born that quickly and no mothers are that short. If we wish to obtain a more reasonable intercept, we can use a technique called centering.
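A minimal sketch of the idea, assuming that centering here means subtracting each predictor's sample mean before fitting: the intercept then refers to a baby with an average length of pregnancy and a mother of average height, rather than to zero days and zero inches. To keep the example self-contained it uses simulated stand-in data with the same column names as baby.csv; the simulated values and the names `sim`, `Days.c`, `Height.c`, `raw_fit` and `centered_fit` are assumptions for illustration only.

```r
# Simulated stand-in data with the same column names as baby.csv.
# ASSUMPTION: these values are made up for illustration; they are not the real data.
set.seed(1)
n <- 200
sim <- data.frame(
  Gestational.Days = round(rnorm(n, mean = 280, sd = 16)),
  Maternal.Height  = round(rnorm(n, mean = 64, sd = 2.5))
)
sim$Birth.Weight <- -90 + 0.45 * sim$Gestational.Days +
  1.3 * sim$Maternal.Height + rnorm(n, sd = 16)

# Center each predictor by subtracting its sample mean
sim$Days.c   <- sim$Gestational.Days - mean(sim$Gestational.Days)
sim$Height.c <- sim$Maternal.Height - mean(sim$Maternal.Height)

# Fit the model with the raw predictors and with the centered predictors
raw_fit      <- lm(Birth.Weight ~ Gestational.Days + Maternal.Height, data = sim)
centered_fit <- lm(Birth.Weight ~ Days.c + Height.c, data = sim)

# The slopes are the same in both fits; only the intercept changes
print(coef(raw_fit))
print(coef(centered_fit))

# With mean-centered predictors, the least-squares intercept
# equals the sample mean of the outcome
print(mean(sim$Birth.Weight))
```

Applying the same two centering lines to the real `data` object, and refitting with the centered columns, would leave the estimated slopes unchanged and replace the intercept with the estimated mean birthweight at the average number of gestational days and the average maternal height.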