16.6 Introduction to Poisson Generalised Linear Modelling (Poisson Regression)
In the previous session we met an important type of GLM: logistic regression. Now we will consider another important GLM: Poisson regression.
16.6.1 Poisson Distribution Recap
The Poisson distribution was first published by Siméon Denis Poisson in 1838. Poisson was a French mathematician, engineer, and physicist; his name is one of 72 engraved on the Eiffel Tower in Paris. The Poisson distribution is a skewed, discrete distribution restricted to the non-negative integers. The shape of the distribution is defined by the single parameter \(\lambda\), which represents the average number of events in the given time interval. As \(\lambda\) increases the distribution looks more and more like the normal distribution; when \(\lambda\) is about 10 or greater, a normal distribution is a good approximation.
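To make this concrete, here is a minimal sketch in Python (using NumPy and SciPy, which are assumptions of this illustration rather than part of the course materials). It checks that the sample mean and variance of Poisson draws both approximate \(\lambda\), and that a normal distribution approximates the Poisson well once \(\lambda\) is around 10 or more:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# For a Poisson variable, the mean and the variance are both equal to lambda
for lam in [1, 5, 10, 30]:
    sample = rng.poisson(lam, size=100_000)
    print(f"lambda={lam:>2}: mean={sample.mean():.2f}, variance={sample.var():.2f}")

# For large lambda (about 10+), N(lambda, lambda) approximates Poisson(lambda)
lam = 30
print(stats.poisson.cdf(35, lam))                         # exact Poisson CDF
print(stats.norm.cdf(35.5, loc=lam, scale=np.sqrt(lam)))  # normal approximation (continuity-corrected)
```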
16.6.2 Why can’t we just use Ordinary Linear Regression?
One of the main assumptions required for fitting an ordinary linear regression (OLR) is that the residual errors must follow a normal distribution. For this to be achieved with data from a skewed distribution, a transformation must be applied. With discrete data, however, this can be very problematic (making the interpretation of the findings unfeasibly difficult) or impossible (for example, a high number of zeros could prevent normality from being achieved). Another issue is that an OLR can produce negative predicted values, which would be theoretically impossible for counts. For these reasons it is better to apply a method which actually reflects the natural distribution of the data, instead of trying to make the distribution reflect the method. This is why Poisson regression is generally better suited to count data than OLR.
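As an illustration of the negative-prediction problem, the following hypothetical simulation (the data-generating model and numbers are assumptions for demonstration only, not from the course) fits an OLR to skewed count data and shows that the fitted line can dip below zero:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 3, size=200)
y = rng.poisson(np.exp(-1.5 + 0.9 * x))  # skewed counts with many zeros

# Fit ordinary least squares directly to the raw counts
ols = sm.OLS(y, sm.add_constant(x)).fit()

# Predicted values at the edges of the covariate range
new_x = sm.add_constant(np.array([0.0, 3.0]))
print(ols.predict(new_x))  # the fitted value at x = 0 is typically negative
```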
16.6.3 Poisson Regression
A GLM for a Poisson-distributed outcome is commonly known as Poisson regression, but is sometimes referred to as a log-linear model.
Random component
Suppose we have an outcome \(Y\) representing counts of events over a fixed time period \(T\). For simplicity, we will let \(T=1\), i.e. we have followed our individuals up for a single unit of time (e.g. one year).
Suppose we are happy to assume that \(Y\) follows a Poisson distribution. We wish to model the relationship between a vector of \(p\) covariates \(\mathbf{X}\) and the expectation of \(Y\) (and therefore also the variance of \(Y\), since the mean is equal to the variance for a Poisson variable).
We let \(\mu = E[Y | \mathbf{X}]\) and assume:

\[ Y | \mathbf{X} \sim \text{Poisson}(\mu) \]
Systematic component (linear predictor)
The linear predictor is a linear function of the covariates \(\mathbf{X}\):

\[ \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p \]
Using vector notation to simplify the maths, this is equivalent to:

\[ \mathbf{X}^T\mathbf{\beta} \]
(Note that we assume the vector \(\mathbf{X}\) contains a constant, so \(\mathbf{X}^T = (1, X_1, X_2, ..., X_p)\) and \(\mathbf{\beta}^T = (\beta_0, \beta_1, ..., \beta_p)\).)
Link function
We want an equation that connects the expected outcome \(\mu\) (which must be positive) to the linear predictor \(\mathbf{X}^T\mathbf{\beta}\) (which can be any real value, positive or negative). The link function is applied to the expected outcome; therefore, in this case an appropriate link function must map the positive real values to all real values. An obvious candidate is the natural logarithm. This is the default (often called the canonical) link function for Poisson variables. Then, applying the link function to the expected value of the outcome, we have:

\[ \ln(\mu) = \mathbf{X}^T\mathbf{\beta} \]
This is the Poisson regression model.
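As a sketch of how this model might be fitted in practice (assuming the statsmodels library; the simulated data and variable names are illustrative, not from the course materials):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
x = rng.binomial(1, 0.4, size=n)  # a single binary covariate
mu = np.exp(0.5 + 0.8 * x)        # true model: ln(mu) = 0.5 + 0.8x
y = rng.poisson(mu)               # Poisson counts

X = sm.add_constant(x)            # design matrix including the constant
model = sm.GLM(y, X, family=sm.families.Poisson())  # log link is the default
result = model.fit()              # fitted by maximum likelihood
print(result.params)              # estimates of (beta_0, beta_1)
print(np.exp(result.params))      # exponentiated: baseline rate and rate ratio
```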
Regression coefficients
In the model above, \(\mathbf{\beta}\) is a vector of regression coefficients. An element of \(\mathbf{\beta}\) represents the expected change in the natural logarithm of the mean per unit change in the corresponding explanatory variable in \(\mathbf{X}\) (holding the other covariates constant).
We can interpret \(\mu\) as the expected rate of the outcome (remember we assume the fixed time period being considered is \(T=1\)). Consider a simpler example with a single binary covariate \(X\). Then the model above becomes:

\[ \ln(\mu) = \beta_0 + \beta_1 X \]
This says that:

\[ \mu = e^{\beta_0 + \beta_1 X} \]
The expected rates in the two groups \((X=0)\) and \((X=1)\) are then:

\[ \mu_0 = e^{\beta_0} \quad \text{and} \quad \mu_1 = e^{\beta_0 + \beta_1} \]
So the rate ratio comparing the rate in the exposed \((X=1)\) with the rate in the unexposed \((X=0)\) is:

\[ \frac{\mu_1}{\mu_0} = \frac{e^{\beta_0 + \beta_1}}{e^{\beta_0}} = e^{\beta_1} \]
Therefore, \(e^{\beta_1}\) can be interpreted as the (incidence) rate ratio comparing the exposed with the unexposed groups. Equivalently, \(\beta_1\) can be interpreted as the log rate ratio.
More generally, a regression coefficient can be interpreted as the log rate ratio associated with a unit change of that covariate. Its exponential is the analogous rate ratio.
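For example, if a fitted coefficient were \(\hat{\beta}_1 = 0.69\) (a purely illustrative value), the corresponding estimated rate ratio would be \(e^{0.69} \approx 2\): a one-unit increase in that covariate is associated with roughly a doubling of the event rate.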
Estimating the regression coefficients
As for all GLMs, the regression coefficients, \(\mathbf{\beta}\), are estimated by maximum likelihood estimation.
The mean and variance
To obtain an equation for the mean of the outcome, we need to apply the inverse link function:

\[ E[Y | \mathbf{X}] = \mu = g^{-1}(\mathbf{X}^T\mathbf{\beta}) \]

where \(g\) denotes the link function.
Here this is equal to:

\[ \mu = e^{\mathbf{X}^T\mathbf{\beta}} \]
Similarly, the variance of \(Y\) is written:

\[ \text{Var}(Y | \mathbf{X}) = \mu = e^{\mathbf{X}^T\mathbf{\beta}} \]
16.6.4 Offsets [optional]
In this short sub-section, we explore an extension of the Poisson regression model above which allows us to take into account the fact that the observations in our data may represent counts from different lengths of observation time. This is a common occurrence in practice. We handle this through something called an offset term.
Above, we simplified the model by assuming each individual was observed for the same period of time. But suppose that was not the case. Suppose, for example, we are counting the number of asthma attacks experienced by school-aged children over time. Some children are followed up for one year and others are followed up for up to five years. Naturally, we would expect those followed up for longer to experience more asthma attacks, on average.
Suppose individual \(i\) is followed up for \(T_i\) years and experiences \(Y_i\) asthma attacks. We have:

\[ Y_i \sim \text{Poisson}(\lambda_i T_i) \]
where \(\lambda_i\) is the annual rate of asthma attacks for individual \(i\). The expected number of attacks for individual \(i\) is \(\mu_i = \lambda_i T_i\). We might wish to propose the following model for the rate (based on the Poisson regression model we met previously):

\[ \ln(\lambda_i) = \mathbf{X}_i^T\mathbf{\beta} \]
This implies the following model for the expected number of attacks:

\[ \ln(\mu_i) = \ln(\lambda_i T_i) = \mathbf{X}_i^T\mathbf{\beta} + \ln(T_i) \]
This is very similar to our previous model. The only difference is that we now have a covariate, \(\ln(T_i)\), appearing in the model without a regression coefficient. In other words, the regression coefficient for \(\ln(T_i)\) is constrained to be equal to 1.
This is exactly what an offset term is: a covariate in a regression model with its regression coefficient constrained to be equal to 1.
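As a closing sketch (again assuming statsmodels; the simulated data and variable names are illustrative assumptions), an offset can be supplied when fitting so that \(\ln(T_i)\) enters the linear predictor with its coefficient fixed at 1:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 400
x = rng.binomial(1, 0.5, size=n)  # binary covariate (e.g. an exposure)
T = rng.integers(1, 6, size=n)    # follow-up time: 1 to 5 years
lam = np.exp(-1.0 + 0.7 * x)      # annual rate: ln(lambda) = -1 + 0.7x
y = rng.poisson(lam * T)          # expected count is mu = lambda * T

X = sm.add_constant(x)
# `offset` enters the linear predictor with its coefficient fixed at 1
model = sm.GLM(y, X, family=sm.families.Poisson(), offset=np.log(T))
print(model.fit().params)         # estimates of (beta_0, beta_1)
```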