5.4 Finding the MLE
So far we have obtained the maximum likelihood estimate (MLE) by plotting the likelihood for a grid of parameter values and reading off the value which yields the maximum. Of course, an estimate obtained in this way is only as accurate as the grid of values we choose to evaluate.
A more formal approach is to locate the maximum algebraically, from the likelihood function itself. In this way, we can directly obtain the general form of the MLE in terms of the data.
This is where the log-likelihood comes into its own; we know that a value which maximises the log-likelihood also maximises the likelihood, and the effect of taking logs of products and powers makes the algebra much simpler.
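To see this equivalence concretely, here is a small numerical sketch in Python (the data, \(x = 7\) responders out of \(n = 10\) patients, are hypothetical values chosen purely for illustration): the likelihood and the log-likelihood peak at the same value of \(\pi\).

```python
import numpy as np
from scipy.stats import binom

# Hypothetical data for illustration only: 7 responders out of 10 patients.
x, n = 7, 10

# Evaluate the likelihood and the log-likelihood over a grid of pi values.
pi = np.linspace(0.01, 0.99, 999)
likelihood = binom.pmf(x, n, pi)
log_likelihood = np.log(likelihood)

# Both curves are maximised at the same parameter value (approximately 0.7).
print(pi[np.argmax(likelihood)])
print(pi[np.argmax(log_likelihood)])
```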
We find the maximum likelihood estimator of a parameter from the log-likelihood function through the following steps:
Method for finding MLEs:
Obtain the derivative of the log-likelihood: \(\frac{d l(\theta \mid {x})}{d \theta}\)
Set \(\frac{d l(\theta \mid {x})}{d \theta}=0\) and solve for \(\theta\)
Verify that it is a maximum by showing that the second derivative \(\frac{d ^2 l(\theta \mid {x})}{d \theta ^2 }\) is negative when the MLE is substituted for \(\theta\).
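Before working through the examples by hand, it can help to see these three steps mechanised. The sketch below uses Python's sympy library to run the recipe on the binomial log-likelihood derived in the next subsection; the variable names are ours, not part of the method itself.

```python
import sympy as sp

# pi is the parameter; x and n are the data, kept symbolic.
pi = sp.symbols('pi', positive=True)
x, n = sp.symbols('x n', positive=True)

# Binomial log-likelihood, up to an additive constant:
# the log binomial coefficient does not involve pi, so it is omitted.
loglik = x * sp.log(pi) + (n - x) * sp.log(1 - pi)

# Step 1: differentiate the log-likelihood with respect to pi.
score = sp.diff(loglik, pi)

# Step 2: set the derivative to zero and solve for pi.
mle = sp.solve(sp.Eq(score, 0), pi)[0]
print(mle)  # x/n

# Step 3: the second derivative should be negative at the solution.
second = sp.diff(loglik, pi, 2)
print(sp.simplify(second.subs(pi, mle)))  # negative whenever 0 < x < n
```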
5.4.1 Binomial model
We will derive the MLE for the binomial example described earlier. In general, the likelihood given observed data of \(x\) responders out of \(n\) patients is

\[ L(\pi \mid x) = \binom{n}{x} \pi^{x} (1-\pi)^{n-x} \]

and so the log-likelihood is

\[ l(\pi \mid x) = \log \binom{n}{x} + x \log \pi + (n-x) \log(1-\pi) \]
We can now obtain the maximum likelihood estimate from this function.
Step 1: Differentiate the log-likelihood with respect to our parameter \(\pi\). The term \(\log \binom{n}{x}\) does not involve \(\pi\), so it vanishes:

\[ \frac{d l(\pi \mid x)}{d \pi} = \frac{x}{\pi} - \frac{n-x}{1-\pi} \]
Step 2: We set the derivative equal to zero and solve for \(\pi\):

\[ \frac{x}{\pi} - \frac{n-x}{1-\pi} = 0 \quad \Rightarrow \quad x(1-\pi) = (n-x)\pi \quad \Rightarrow \quad x = n\pi \]
Having solved the equation, we find that the maximum likelihood estimator for \(\pi\) is \(\hat{\pi} = \frac{x}{n}\) (the hat indicates that this is an estimator rather than the true parameter value).
There is one thing left to check: we have found that \(\frac{x}{n}\) is the point where the derivative of the log-likelihood is zero, but such a stationary point could be either a maximum or a minimum of the log-likelihood function. To verify that it is indeed a maximum, we compute the second derivative of the log-likelihood and check that it takes a negative value when \(\pi = \frac{x}{n}\).
Step 3: Find the second derivative:

\[ \frac{d^2 l(\pi \mid x)}{d \pi^2} = -\frac{x}{\pi^2} - \frac{n-x}{(1-\pi)^2} \]
This second derivative must be negative when we plug in \(\frac{x}{n}\) for \(\pi\). Both fractions on the right-hand side have a squared, and hence positive, denominator, so we only have to consider the numerators. We have \(-x \leq 0\) since \(x \geq 0\), and \(-(n-x) \leq 0\) since \(n \geq x \geq 0\); moreover, at least one of the two is strictly negative whenever \(n \geq 1\). Therefore, the value \(\frac{x}{n}\) is indeed a maximum.
Thus the MLE for \(\pi\) is \(\hat{\pi} = \frac{x}{n}\).
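As a sanity check, we can compare this algebraic answer with a direct numerical maximisation of the log-likelihood. The sketch below uses Python's scipy with the same hypothetical data as before (\(x = 7\), \(n = 10\)); minimising the negative log-likelihood is equivalent to maximising the log-likelihood.

```python
from scipy.optimize import minimize_scalar
from scipy.stats import binom

# Hypothetical data for illustration only: 7 responders out of 10 patients.
x, n = 7, 10

# Minimise the negative log-likelihood over pi in (0, 1).
result = minimize_scalar(
    lambda p: -binom.logpmf(x, n, p),
    bounds=(1e-6, 1 - 1e-6),
    method='bounded',
)

print(result.x)  # approximately 0.7
print(x / n)     # 0.7, the algebraic MLE x/n
```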
5.4.2 Exponential model
The likelihood function for the exponential example, with a single observed waiting time \(y\), is

\[ L(\lambda \mid y) = \lambda e^{-\lambda y} \]

Therefore we have

\[ l(\lambda \mid y) = \log \lambda - \lambda y \]
Step 1: Differentiate the log-likelihood with respect to our parameter \(\lambda\):

\[ \frac{d l(\lambda \mid y)}{d \lambda} = \frac{1}{\lambda} - y \]
Step 2: Set the derivative to zero and solve for \(\lambda\):

\[ \frac{1}{\lambda} - y = 0 \quad \Rightarrow \quad \hat{\lambda} = \frac{1}{y} \]
Step 3: Verify that this is a maximum rather than a minimum by considering the second derivative:

\[ \frac{d^2 l(\lambda \mid y)}{d \lambda^2} = -\frac{1}{\lambda^2} \]
This is negative for any value of \(\lambda\), so the stationary point is indeed a maximum. Thus the MLE for \(\lambda\) is \(\hat{\lambda} = \frac{1}{y}\), the reciprocal of the observed waiting time.
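The same symbolic check as for the binomial model confirms this result; again, this is a sketch using Python's sympy, with the observed waiting time \(y\) kept as a symbol.

```python
import sympy as sp

# lam is the rate parameter; y is the single observed waiting time.
lam, y = sp.symbols('lam y', positive=True)

# Exponential log-likelihood for one observation.
loglik = sp.log(lam) - lam * y

# Steps 1 and 2: differentiate, set to zero, and solve for lam.
mle = sp.solve(sp.Eq(sp.diff(loglik, lam), 0), lam)[0]
print(mle)  # 1/y

# Step 3: the second derivative is -1/lam**2, negative for every lam > 0.
print(sp.diff(loglik, lam, 2))
```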