12.1 Introduction

A parametric statistical model is an algebraic description of how one or more outcome variables are influenced by covariates. Such models are widely used in medical research. Some examples of questions that we can investigate using statistical models include:

  • Does birthweight increase with length of pregnancy?

  • Does taking drug A reduce inflammation more than taking drug B in patients with arthritis?

  • Can we predict the risk of heart disease for our patients?

In the above examples, the outcome variables are birthweight, inflammation and heart disease. In the first two examples, the length of pregnancy and drug use are covariates. In the third example, no covariates are explicitly mentioned. However, when answering the third question, researchers may want to include as covariates a range of patient characteristics that are associated with the risk of heart disease, for example diet, exercise, comorbidities and medications.

Recall that statistical models contain population parameters and representations of uncertainty. The population parameters are unknown quantities that we want to estimate from our sample and the uncertainty is a measure of the variability in the outcome variable that is not explained by the covariates.
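As an illustration only, the first example question could be written as a model with one outcome and one covariate. The symbols \(\beta_0\), \(\beta_1\), \(\varepsilon_i\) and \(\sigma^2\), and the normal distribution for the error term, are assumptions made for this sketch rather than notation taken from the text above:

\[
Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)
\]

Here \(Y_i\) would be the birthweight of baby \(i\) and \(x_i\) the length of the pregnancy. In this form, \(\beta_0\) and \(\beta_1\) play the role of the unknown population parameters, and \(\sigma^2\) represents the variability in birthweight that is not explained by length of pregnancy.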

This is the first of the sessions on linear regression. In this session, we will learn how to define linear regression models, how to estimate their population parameters and how to estimate measures of uncertainty. We begin by introducing the simple linear regression model, which includes one outcome and one covariate. In the second session, we introduce the multivariable linear regression model, which is an extension of the simple linear regression model to situations with multiple covariates. We also explore linear regression models with categorical variables, interactions and non-linear terms. In the third session, we discuss the key assumptions underlying linear regression models and important model diagnostics. The optional material for the session explores how to conduct an analysis of variance of statistical models.
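To give a flavour of what estimating population parameters and uncertainty can look like in practice, the sketch below simulates data for the birthweight example and fits a straight line by ordinary least squares. It is a minimal illustration, not the method used later in these notes: the function name `fit_simple_linear_regression`, the simulated values (intercept, slope and noise level) and the units are all hypothetical choices made for this example.

```python
import math
import random


def fit_simple_linear_regression(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns the estimated intercept, the estimated slope and the
    residual standard deviation (using n - 2 degrees of freedom).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squares of x and sum of cross-products of x and y
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares: variability not explained by the covariate
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    residual_sd = math.sqrt(rss / (n - 2))
    return intercept, slope, residual_sd


# Simulated (made-up) data: length of pregnancy in weeks and birthweight in kg.
# The "true" values -3.0, 0.16 and 0.4 are assumptions for illustration only.
rng = random.Random(1)
weeks = [rng.uniform(34, 42) for _ in range(200)]
birthweight = [-3.0 + 0.16 * w + rng.gauss(0, 0.4) for w in weeks]

intercept_hat, slope_hat, sigma_hat = fit_simple_linear_regression(weeks, birthweight)
print(f"intercept: {intercept_hat:.2f}, slope: {slope_hat:.3f}, residual SD: {sigma_hat:.2f}")
```

Because the data are simulated with known values, the printed estimates can be compared with the values used to generate them. With real data the population parameters are unknown, and the estimates are all we have.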

Before delving in, it is worth noting the different terminology that you may come across in the medical literature. So far, I have used the terms outcome and covariates. Table 1 summarises alternative terms that may be used to describe the same concepts.

Outcome               Covariates
--------------------  ----------------------
\(Y\)-variable        \(x\)-variables
Dependent variable    Independent variables
Response variable     Regressors
Output variable       Input variables
(no direct analogy)   Explanatory variables
(no direct analogy)   Predictor variables

Table 1: Different terminology used for outcome and covariates

Finally, it is important to understand that statistical models make assumptions about the form of the relationships between outcomes and covariates. Although we can examine our data to investigate the validity of these assumptions (using methods covered in a later session), we can never be certain that the model is correct.