12.5 Example: continuous independent variable

We now return to our first example, where we are interested in investigating the association between birthweight and length of pregnancy. We will fit a linear model to explore this association.

12.5.1 The model

The outcome is birthweight, \(Y\), which is measured in ounces (oz). The independent variable is length of pregnancy, \(L\) (i.e. the number of gestational days). The following model defines our assumed relationship between the length of pregnancy (\(L\)) and a baby’s birthweight (\(Y\)):

\[ \text{Model 1: }y_i = \beta_0 + \beta_1 l_i + \epsilon_i \]

We will use the lm() function to perform simple linear regression in R. For details of how this function works, see its R help page (?lm).
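To make the role of lm() more concrete, the sketch below fits a simple linear regression to a small toy data set and compares the result with the closed-form least-squares estimates. The data frame toy and its values are hypothetical and are not taken from the birthweight study. The only aim is to illustrate that, for a single independent variable, lm() should return the same intercept and slope as the textbook formulas.

```r
# Hypothetical toy data, invented for illustration (not the birthweight data)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
toy <- data.frame(x = x, y = y)

# Fit y = beta0 + beta1 * x + error using lm()
fit <- lm(y ~ x, data = toy)

# Closed-form least-squares estimates for simple linear regression
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
b0 <- mean(y) - b1 * mean(x)                                     # intercept

# The two approaches should agree up to numerical precision
print(coef(fit))
print(c(intercept = b0, slope = b1))
```

The formula y ~ x reads as "y modelled as a function of x". lm() adds the intercept term automatically.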

The following code can be used to perform this linear regression in R:

# Model 1: Investigating the relationship between birthweight and length of pregnancy
data <- read.csv('https://www.inferentialthinking.com/data/baby.csv')
model1 <- lm(Birth.Weight ~ Gestational.Days, data = data)
summary(model1)

Call:
lm(formula = Birth.Weight ~ Gestational.Days, data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-49.348 -11.065   0.218  10.101  57.704 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)      -10.75414    8.53693   -1.26    0.208    
Gestational.Days   0.46656    0.03054   15.28   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 16.74 on 1172 degrees of freedom
Multiple R-squared:  0.1661,	Adjusted R-squared:  0.1654 
F-statistic: 233.4 on 1 and 1172 DF,  p-value: < 2.2e-16

There is a lot of information contained in this output. For the moment, we will focus on the estimates of the intercept and slope. These can be found under the column heading Estimate.

  • The estimated intercept, \(\hat{\beta}_0\), is equal to -10.75. This is interpreted as: the estimated mean birthweight of a child born after 0 gestational days is -10.75 oz. Since the study contains no observations with 0 gestational days, this is an extrapolation beyond the observed data that relies on the assumption of linearity. Estimates based on extrapolation should be interpreted with caution, and in this case the result makes little sense because the estimated birthweight is negative. Moreover, no child is born after 0 gestational days, so this intercept is of little interest. Later on, we will discuss a technique called centering, which is often used to make the intercept term more interpretable.

  • The estimated slope, \(\hat{\beta}_1\), is equal to 0.47. This is interpreted as: the mean birthweight of a baby is estimated to increase by 0.47 oz for each additional day of gestation.

  • The estimated residual standard error, \(\hat{\sigma}\), is equal to 16.74 oz (so the estimated residual variance is \(16.74^2 \approx 280.2\)). This means that the observed outcomes are scattered around the fitted regression line with a standard deviation of 16.74 oz.
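As a worked illustration of these interpretations, the sketch below plugs the estimates reported in the output above into the fitted equation. The gestation length of 280 days is a hypothetical value chosen only for illustration. The coefficients are typed in by hand from the printed summary, so the results are subject to the rounding in that output.

```r
# Estimates copied by hand from the summary(model1) output above
beta0_hat <- -10.75414   # intercept (oz)
beta1_hat <- 0.46656     # slope (oz per gestational day)
sigma_hat <- 16.74       # residual standard error (oz)

# Hypothetical gestation length, chosen for illustration only
days <- 280

# Estimated mean birthweight at that gestation length
pred <- beta0_hat + beta1_hat * days

# Estimated change in mean birthweight for one additional week of gestation
week_change <- 7 * beta1_hat

# Estimated residual variance
resid_var <- sigma_hat^2

print(c(pred = pred, week_change = week_change, resid_var = resid_var))
```

Under these assumptions, the estimated mean birthweight at 280 gestational days is about 119.9 oz. If the fitted object model1 is available, predict(model1, newdata = data.frame(Gestational.Days = 280)) should give the same value up to rounding.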

It is always useful to look at the data. The code below graphs the data and superimposes the fitted regression line.

options(repr.plot.width=5, repr.plot.height=5)
with(data, plot(Gestational.Days, Birth.Weight))
abline(model1)
[Figure: scatter plot of Birth.Weight against Gestational.Days with the fitted regression line from Model 1 superimposed.]