13.3 Including multiple covariates

We are interested in investigating a model that relates birthweight to length of pregnancy and mother’s height. We will use the following multivariable linear regression model:

\[ \text{Model 4: } y_i = \beta_0 + \beta_1 l_i + \beta_2h_i + \epsilon_i \]

The outcome \(y_i\) denotes the birthweight (in oz) of the \(i^{th}\) baby. The predictors \(l_i\) and \(h_i\) denote, respectively, the length of pregnancy (in gestational days) and the height of the mother (in inches) for the \(i^{th}\) baby.

The linear regression can be conducted in R using the lm() command:

# Model 4: Relating birthweight to length of pregnancy and mother's height
data <- read.csv('https://www.inferentialthinking.com/data/baby.csv')
model4 <- lm(Birth.Weight ~ Gestational.Days + Maternal.Height, data = data)
summary(model4)
Call:
lm(formula = Birth.Weight ~ Gestational.Days + Maternal.Height, 
    data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-53.829 -10.589   0.246  10.254  54.403 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)      -88.51993   14.31910  -6.182 8.73e-10 ***
Gestational.Days   0.45237    0.03006  15.051  < 2e-16 ***
Maternal.Height    1.27598    0.19049   6.698 3.27e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 16.44 on 1171 degrees of freedom
Multiple R-squared:  0.1969,	Adjusted R-squared:  0.1955 
F-statistic: 143.5 on 2 and 1171 DF,  p-value: < 2.2e-16

Interpretation of the regression coefficients

  • \(\hat{\beta}_1=0.45\). This is the estimated regression coefficient for the number of gestational days. It is interpreted as follows: amongst babies whose mothers are of the same height, the expected birthweight increases by 0.45 ounces for each additional gestational day.
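As a quick check of this interpretation, the sketch below plugs the estimates from the summary output above into Model 4 and compares the fitted mean birthweight for two babies whose pregnancies differ by one day but whose mothers are the same height. The helper name `predict_model4` and the example values (280 and 281 days, 64 inches) are illustrative assumptions, not taken from the data.

```r
# Coefficient estimates copied from the summary(model4) output above
b0 <- -88.51993   # intercept
b1 <-   0.45237   # Gestational.Days
b2 <-   1.27598   # Maternal.Height

# Hypothetical helper: fitted mean birthweight (oz) under Model 4
predict_model4 <- function(days, height) {
  b0 + b1 * days + b2 * height
}

# Two babies whose mothers are the same height (64 in, an arbitrary choice),
# with pregnancies that differ by one day (280 vs 281 days, also arbitrary)
w280 <- predict_model4(280, 64)
w281 <- predict_model4(281, 64)

# Difference in fitted means; equals b1 up to floating-point error
w281 - w280
```

The difference does not depend on the height chosen: repeating the calculation with any other maternal height gives the same result, which is what "amongst babies whose mothers are of the same height" means in practice.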

It may be tempting to draw causal conclusions from regression models such as Model 4, e.g. “longer pregnancies cause an increase in birthweight”. However, this is far from straightforward. Based on the results presented above, it would be reasonable to say that “birthweight increases with length of pregnancy”. It is much less reasonable to claim, on the basis of these results alone, that higher birthweight is caused by longer pregnancies, because there may be an unobserved third variable that is the “real” cause of both longer pregnancies and higher birthweight. Causal statements require more than the results of a statistical model to make them plausible; this is a topic that we return to in the next lesson.

Exercise: What is the interpretation of \(\hat{\beta}_2\)?

Interpretation of the intercept

  • \(\hat{\beta}_0=-88.52\). The interpretation is that the estimated mean birthweight for a child who was born after 0 gestational days and whose mother’s height is 0 inches is -88.52 ounces. Clearly this is an absurd value, because no baby is born after 0 days and no mother is 0 inches tall; the intercept is an extrapolation far outside the range of the data. If we wish to obtain a more meaningful intercept, we can use a technique called centering.
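A minimal sketch of one common form of centering is given below, assuming that mean-centering is what is intended: each predictor has its sample mean subtracted before the model is refitted. The helper name `fit_centered` and the new column names `Gest.Centered` and `Height.Centered` are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: refit Model 4 after mean-centering both predictors.
# Expects a data frame with the columns Birth.Weight, Gestational.Days
# and Maternal.Height, as in the data set used above.
fit_centered <- function(d) {
  d$Gest.Centered   <- d$Gestational.Days - mean(d$Gestational.Days)
  d$Height.Centered <- d$Maternal.Height  - mean(d$Maternal.Height)
  lm(Birth.Weight ~ Gest.Centered + Height.Centered, data = d)
}

# Usage with the data frame loaded earlier in this section
# (requires internet access, so it is left commented out here):
# data    <- read.csv('https://www.inferentialthinking.com/data/baby.csv')
# model4c <- fit_centered(data)
# coef(model4c)
```

With mean-centered predictors, the slope estimates are unchanged, and the intercept becomes the estimated mean birthweight for a baby with the average number of gestational days whose mother is of average height. For a least-squares fit this intercept equals the sample mean of the birthweights, which is a far more interpretable quantity than the value at 0 days and 0 inches.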