16.2 Generalised Linear Model Components

A generalised linear model consists of three components:

A random component

This refers to the probability distribution of the outcome variable \(Y_{i}\) (for the \(i\)th of \(n\) independently sampled observations). It specifies the conditional distribution of the outcome given the values of the predictors (covariates) in the model. \(Y_{i}\) is generally assumed to follow a distribution from the exponential family, although subsequent work has extended GLMs to multivariate exponential families, to certain non-exponential families, and to situations where the distribution of \(Y_{i}\) is not completely specified. Within this chapter we will only explore distributions from the exponential family (e.g. Normal, Gamma, Poisson, Bernoulli).
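For reference, distributions in the (one-parameter) exponential family are often written in the following general form, where \(\theta_i\) is the canonical parameter, \(\phi\) is a dispersion parameter, and \(a(\cdot)\), \(b(\cdot)\) and \(c(\cdot)\) are known functions. This notation is introduced here only to show what the distributions listed above have in common; it is not needed for the rest of this chapter:

\[ f(y_i; \theta_i, \phi) = \exp\left\{ \frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi) \right\} \]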

A systematic component (the linear predictor)

This is a linear function of the predictors (covariates) in the model:

\[ \eta_{i} = \beta_{0} + \beta_{1}X_{i1} + \beta_{2}X_{i2} + \cdots + \beta_{k}X_{ik} \]

Just as in linear and logistic regression models, the predictors (covariates) \(X_{ij}\) may be continuous and/or categorical.

A link function

This is a smooth, invertible function \(g(\cdot)\) that connects the expected value of the outcome, \(\mu_i = E[Y_i]\), to the linear predictor:

\[ g(\mu_i) = \eta_{i} \]
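As a small numerical sketch of how these components fit together (the covariate values and coefficients below are purely illustrative, and for concreteness we use the logit link, which is defined formally in the example that follows):

```python
import numpy as np

# Hypothetical coefficients and covariates (illustrative values only)
beta = np.array([-1.0, 0.5, 2.0])          # beta_0, beta_1, beta_2
X = np.array([[1.0, 0.2, 1.0],             # each row: intercept, X_i1, X_i2
              [1.0, 1.5, 0.0],
              [1.0, 3.0, 1.0]])

# Systematic component: eta_i = beta_0 + beta_1 * X_i1 + beta_2 * X_i2
eta = X @ beta

# Link function (here the logit) and its inverse, connecting mu_i and eta_i
def logit(mu):
    return np.log(mu / (1 - mu))

def inv_logit(eta):
    return 1 / (1 + np.exp(-eta))

mu = inv_logit(eta)      # mu_i = E[Y_i], here a probability in (0, 1)
print(eta)               # linear predictor values (unbounded)
print(mu)                # corresponding means on the probability scale
```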

16.2.1 An example - logistic regression

Suppose we wish to fit a logistic regression model (which we will see is one particular type of GLM) for a binary outcome \(Y\) and a single covariate \(X\). Normally we write \(\pi_i\) for the expected outcome, but here we will use \(\mu_i\) instead, so that you can see how the model we met previously connects with the more general notation above. Thus we let \(\mu_i = E[Y_i]\) (written \(\pi_i\) in previous sessions). We have:

\[ Y_i \overset{\text{indep}}{\sim} \text{Bernoulli}(\mu_i), \qquad \text{for } i=1, 2, \ldots, n \]

The linear predictor is given by:

\[ \eta_i = \beta_0 + \beta_1 X_i \]

The link function can be defined generically: in the equation below, \(z\) has no intrinsic meaning; it is simply the argument used to define the function. For logistic regression, the link function is the logit function:

\[ g(z) = \log \left\{ \frac{z}{1-z} \right\} \]

Setting this equal to \(\eta_i\), as per the definition above, we get:

\[ g(\mu_i) = \log \left\{ \frac{\mu_i}{1-\mu_i} \right\} = \beta_0 + \beta_1 X_i \]

which is the logistic regression model we met previously.
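Equivalently, applying the inverse of the logit function to both sides expresses the mean on the probability scale:

\[ \mu_i = g^{-1}(\eta_i) = \frac{e^{\beta_0 + \beta_1 X_i}}{1 + e^{\beta_0 + \beta_1 X_i}} \]

As a sketch of how such a model might be fitted in practice (assuming the statsmodels package is available; the simulated data and "true" parameter values below are purely illustrative), a logistic regression can be fitted as a GLM with a Binomial family and its default logit link:

```python
import numpy as np
import statsmodels.api as sm

# Simulate illustrative data from the model above (true values are arbitrary)
rng = np.random.default_rng(42)
n = 500
x = rng.normal(size=n)
eta = -0.5 + 1.2 * x                    # linear predictor: beta_0 + beta_1 * X_i
mu = 1 / (1 + np.exp(-eta))             # inverse logit gives mu_i = E[Y_i]
y = rng.binomial(1, mu)                 # Bernoulli outcomes

# Fit the model as a GLM: Binomial family with the (default) logit link
X = sm.add_constant(x)                  # adds the intercept column
fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()
print(fit.summary())                    # estimates should be close to -0.5 and 1.2
```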