16.2 Generalised Linear Model Components

A generalised linear model consists of three components:

A random component

This refers to the probability distribution of the outcome variable \(Y_{i}\) (for the \(i\)th of \(n\) independently sampled observations). It specifies the conditional distribution of the outcome given the values of the predictors (covariates) in the model. \(Y_{i}\) is generally assumed to follow a distribution from the exponential family, although subsequent work has extended GLMs to multivariate exponential families, to certain non-exponential families, and to situations where the distribution of \(Y_{i}\) is not completely specified. Within this chapter we will only explore distributions from the exponential family (e.g. Normal, Gamma, Poisson, Bernoulli).
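For reference, distributions in the (one-parameter) exponential family are often written in the following general form, where \(\theta_i\) is the canonical parameter, \(\phi\) is a dispersion parameter, and \(a(\cdot)\), \(b(\cdot)\) and \(c(\cdot)\) are known functions. This notation is introduced here only to show what the distributions listed above have in common; it is not needed for the rest of this chapter:

\[ f(y_i; \theta_i, \phi) = \exp\left\{ \frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi) \right\} \]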

A systematic component (the linear predictor)

This is a linear function of the predictors (covariates) in the model:

\[ \eta_{i} = \beta_{0} + \beta_{1}X_{i1} + \beta_{2}X_{i2} + \cdots + \beta_{k}X_{ik} \]

Just as in linear and logistic regression models, the predictors (covariates) \(X_{ij}\) may be continuous and/or categorical.

A link function

This is a smooth, invertible function \(g(\cdot)\) that connects the expected value of the outcome, \(\mu_i = E[Y_i]\), to the linear predictor:

\[ g(\mu_i) = \eta_{i} \]
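As a small numerical sketch of how these components fit together (the covariate values and coefficients below are purely illustrative, and for concreteness we use the logit link, which is defined formally in the example that follows):

```python
import numpy as np

# Hypothetical coefficients and covariates (illustrative values only)
beta = np.array([-1.0, 0.5, 2.0])          # beta_0, beta_1, beta_2
X = np.array([[1.0, 0.2, 1.0],             # each row: intercept, X_i1, X_i2
              [1.0, 1.5, 0.0],
              [1.0, 3.0, 1.0]])

# Systematic component: eta_i = beta_0 + beta_1 * X_i1 + beta_2 * X_i2
eta = X @ beta

# Link function (here the logit) and its inverse, connecting mu_i and eta_i
def logit(mu):
    return np.log(mu / (1 - mu))

def inv_logit(eta):
    return 1 / (1 + np.exp(-eta))

mu = inv_logit(eta)      # mu_i = E[Y_i], here a probability in (0, 1)
print(eta)               # linear predictor values (unbounded)
print(mu)                # corresponding means on the probability scale
```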

16.2.1 An example - logistic regression

Suppose we wish to fit a logistic regression model (which we will see is one particular type of GLM) for a binary outcome \(Y\) and a single covariate \(X\). Normally we write \(\pi_i\) for the expected outcome, but here we will use \(\mu_i\) instead, so that you can see how the model we met previously connects with the more general notation above. Thus we let \(\mu_i = E[Y_i]\) (written \(\pi_i\) in previous sessions). We have:

\[ Y_i \overset{\text{indep}}{\sim} \text{Bernoulli}(\mu_i), \qquad \text{for } i=1, 2, \ldots, n \]

The linear predictor is given by:

\[ \eta_i = \beta_0 + \beta_1 X_i \]

The link function can be defined generically: in the equation below, \(z\) has no intrinsic meaning; it is simply the argument used to define the function. For logistic regression, the link function is the logit function:

\[ g(z) = \log \left\{ \frac{z}{1-z} \right\} \]

Setting this equal to \(\eta_i\), as per the definition above, we get:

\[ g(\mu_i) = \log \left\{ \frac{\mu_i}{1-\mu_i} \right\} = \beta_0 + \beta_1 X_i \]

which is the logistic regression model we met previously.
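Equivalently, applying the inverse of the logit function to both sides expresses the mean on the probability scale:

\[ \mu_i = g^{-1}(\eta_i) = \frac{e^{\beta_0 + \beta_1 X_i}}{1 + e^{\beta_0 + \beta_1 X_i}} \]

As a sketch of how such a model might be fitted in practice (assuming the statsmodels package is available; the simulated data and "true" parameter values below are purely illustrative), a logistic regression can be fitted as a GLM with a Binomial family and its default logit link:

```python
import numpy as np
import statsmodels.api as sm

# Simulate illustrative data from the model above (true values are arbitrary)
rng = np.random.default_rng(42)
n = 500
x = rng.normal(size=n)
eta = -0.5 + 1.2 * x                    # linear predictor: beta_0 + beta_1 * X_i
mu = 1 / (1 + np.exp(-eta))             # inverse logit gives mu_i = E[Y_i]
y = rng.binomial(1, mu)                 # Bernoulli outcomes

# Fit the model as a GLM: Binomial family with the (default) logit link
X = sm.add_constant(x)                  # adds the intercept column
fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()
print(fit.summary())                    # estimates should be close to -0.5 and 1.2
```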