15.4 Estimating the parameters
Having specified our model, we now want to use a sample of data to obtain estimates of the model parameters.
15.4.1 Statistical model and observed data
Data: Suppose we have a sample of \(n\) people. Person \(i\) has an observed \(X\) value of \(x_i\) and an observed outcome \(y_i\). Therefore, our sample of data consists of \(\{ (x_i, y_i); i = 1, 2, \ldots, n\}\).
Statistical model: Our statistical model assumes that these observations are independent (between people) and are drawn from the distribution:

\[ Y_i | X_i = x_i \sim \text{Bernoulli}(\pi_i) \]

where

\[ \pi_i = Pr(Y_i = 1 | X_i = x_i). \]

We further have a model relating the outcome to the independent variable:

\[ \log \left( \frac{\pi_i}{1 - \pi_i} \right) = \beta_0 + \beta_1 x_i, \qquad \text{equivalently} \qquad \pi_i = \frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}}. \]

We can put these aspects together to write our statistical model concisely as:

\[ Y_i | X_i = x_i \sim \text{Bernoulli} \left( \frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}} \right), \quad \text{independently for } i = 1, \ldots, n. \]
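To make the model concrete, here is a minimal Python sketch that simulates a sample of \(n\) people from this model. The parameter values, sample size, and distribution of \(X\) are purely illustrative assumptions, not part of the model itself.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical parameter values and sample size, chosen only for illustration
beta0, beta1 = -1.0, 0.5
n = 200

x = rng.normal(size=n)                # each person's observed X value
eta = beta0 + beta1 * x               # linear predictor: beta0 + beta1 * x_i
pi = np.exp(eta) / (1 + np.exp(eta))  # pi_i = e^eta_i / (1 + e^eta_i)
y = rng.binomial(1, pi)               # Y_i | X_i = x_i ~ Bernoulli(pi_i)
```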
Note
In the previous section, we used the notation \(\pi_x\) to emphasise that this probability is conditional on the value of \(X\). Now that we are applying the distribution to a sample of people, we have changed to \(\pi_i\) to emphasise that the probability is conditional on whatever value \(X\) takes for person \(i\).
15.4.2 Maximum likelihood estimation
We first need to derive the likelihood of the model. We assume that observations (people) are independent of each other, thus:

\[ L(\beta_0, \beta_1) = \prod_{i=1}^{n} Pr(Y_i = y_i | X_i = x_i). \]

We can write this as:

\[ L(\beta_0, \beta_1) = \prod_{i=1}^{n} Pr(Y_i = 1 | X_i = x_i)^{y_i} \left\{ 1 - Pr(Y_i = 1 | X_i = x_i) \right\}^{1 - y_i}. \]
This works because, when the observed outcome is \(y_i = 1\), the second term above equals 1 (recall that \(x^0 = 1\) for any \(x\)), so we are left with just \(Pr(Y_i = 1|X_i=x_i)\), which is equal to \(Pr(Y_i = y_i|X_i=x_i)\) when \(y_i = 1\). Conversely, when \(y_i = 0\), the first term becomes 1 and we are left with just the second term.
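A quick numerical check of this device (the value of \(p\) below is arbitrary, chosen only for illustration):

```python
p = 0.7  # an arbitrary value of Pr(Y_i = 1 | X_i = x_i)

for y_i in (1, 0):
    print(y_i, p**y_i * (1 - p)**(1 - y_i))
# y_i = 1 selects the first term, giving p
# y_i = 0 selects the second term, giving 1 - p
```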
Now \(Pr(Y_i = 1 | X_i = x_i)\) is just the fraction within the Bernoulli distribution above, so we can substitute this in to get:

\[ L(\beta_0, \beta_1) = \prod_{i=1}^{n} \left( \frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}} \right)^{y_i} \left( \frac{1}{1 + e^{\beta_0 + \beta_1 x_i}} \right)^{1 - y_i}. \]
Taking the log of the above likelihood, we derive the following log-likelihood:

\[ \ell(\beta_0, \beta_1) = \sum_{i=1}^{n} \left\{ y_i \log \pi_i + (1 - y_i) \log(1 - \pi_i) \right\} = \sum_{i=1}^{n} \left\{ y_i (\beta_0 + \beta_1 x_i) - \log \left( 1 + e^{\beta_0 + \beta_1 x_i} \right) \right\}, \]

where the second equality follows because \(\log \pi_i = \beta_0 + \beta_1 x_i - \log(1 + e^{\beta_0 + \beta_1 x_i})\) and \(\log(1 - \pi_i) = -\log(1 + e^{\beta_0 + \beta_1 x_i})\).
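As a sketch of how this quantity might be computed, here is the log-likelihood written as a Python function. The function name is our own, and `np.logaddexp` is used purely as a numerically stable way to evaluate \(\log(1 + e^{\eta})\):

```python
import numpy as np

def log_likelihood(beta0, beta1, x, y):
    """Log-likelihood of the logistic model evaluated at (beta0, beta1)."""
    eta = beta0 + beta1 * x
    # sum_i [ y_i * eta_i - log(1 + e^{eta_i}) ];
    # np.logaddexp(0, eta) computes log(e^0 + e^eta) = log(1 + e^eta) stably
    return np.sum(y * eta - np.logaddexp(0, eta))
```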
By maximising this log-likelihood over the parameters \((\beta_0, \beta_1)\), we can obtain the maximum likelihood estimates of the parameters: \((\hat{\beta}_0, \hat{\beta}_1)\). There is no closed-form solution to this optimisation problem, so the maximisation over the parameters is done numerically.
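To illustrate this numerical step, here is a minimal sketch (not the specific algorithm used by any particular statistics package) that maximises the log-likelihood by minimising its negative with `scipy.optimize.minimize`, reusing the `log_likelihood` function and the simulated `x` and `y` from the sketches above:

```python
from scipy.optimize import minimize

def negative_log_likelihood(params, x, y):
    beta0, beta1 = params
    return -log_likelihood(beta0, beta1, x, y)

# Numerically maximise the log-likelihood by minimising its negative,
# starting from the arbitrary initial guess (0, 0)
result = minimize(negative_log_likelihood, x0=[0.0, 0.0],
                  args=(x, y), method="BFGS")

beta0_hat, beta1_hat = result.x
print(beta0_hat, beta1_hat)  # the maximum likelihood estimates
```

With the illustrative data above, the resulting estimates should lie close to the values \(\beta_0 = -1\) and \(\beta_1 = 0.5\) used in the simulation. In practice, statistical software such as R's `glm` or Python's `statsmodels` performs an equivalent numerical maximisation internally.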