10.4 Predictions

10.4.1 Prior predictive distributions

Finding the predictive distribution for a new patient \(y\) before making any observations involves finding the following distribution:

\[\begin{split} p(y | \sigma^2, \phi, \tau^2) &= \int p(y, \mu | \sigma^2, \phi, \tau^2) \, d\mu\\ &= \int p(y | \mu, \sigma^2, \phi, \tau^2) \, p(\mu | \phi, \tau^2) \, d\mu \end{split}\]

Evaluating this integral directly involves a lot of algebra. We instead use a different approach: note that we can write the observation as \(y = \mu + \epsilon\), where \(\mu \sim N(\phi, \tau^2)\) and \(\epsilon \sim N(0, \sigma^2)\). Then, since \(\mu\) and \(\epsilon\) are independent, we can use this result:

Let \(X\) and \(Y\) be independent random variables that are Normally distributed, \(X\sim N(\mu _{X},\sigma _{X}^{2})\) and \(Y\sim N(\mu _{Y},\sigma _{Y}^{2})\). Then their sum is also Normally distributed: \(X + Y \sim N(\mu _{X}+\mu _{Y},\sigma _{X}^{2}+\sigma _{Y}^{2})\).

Thus we have that \(y \sim N(\phi, \tau^2 + \sigma^2)\).
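As a sanity check on this shortcut, the integral over \(\mu\) can be evaluated numerically and compared with the closed-form \(N(\phi, \tau^2 + \sigma^2)\) density. The sketch below is only illustrative: it assumes the parameter values used in this section's example (\(\phi = 0\), \(\tau^2 = 0.1\), \(\sigma^2 = 0.7\)), and the function names and the grid of \(y\) values are hypothetical choices made for this check.

```r
# Illustrative parameter values (those of the worked example in this section)
phi    <- 0    # prior mean of mu
tau2   <- 0.1  # prior variance of mu
sigma2 <- 0.7  # sampling variance of y given mu

# Integrand p(y | mu, sigma^2) * p(mu | phi, tau^2), as a function of mu
integrand <- function(mu, y) {
  dnorm(y, mean = mu, sd = sqrt(sigma2)) * dnorm(mu, mean = phi, sd = sqrt(tau2))
}

# Prior predictive density at y, integrating mu out numerically
prior_pred_numeric <- function(y) {
  integrate(integrand, lower = -Inf, upper = Inf, y = y)$value
}

# Closed form implied by the sum-of-Normals result
prior_pred_closed <- function(y) {
  dnorm(y, mean = phi, sd = sqrt(tau2 + sigma2))
}

# Arbitrary grid of y values at which to compare the two densities
y_grid       <- c(-1, 0, 0.3, 1)
numeric_vals <- sapply(y_grid, prior_pred_numeric)
closed_vals  <- prior_pred_closed(y_grid)
print(cbind(y = y_grid, numeric = numeric_vals, closed = closed_vals))
```

The two columns should agree up to the tolerance of the numerical integration, which is what the sum-of-Normals argument predicts.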

In our example, before collecting any data, suppose we wish to compute the probability that the difference in cell counts for a new patient is greater than 0.3 (30 \(\text{cells/mm}^3\)). With \(\phi = 0\), \(\tau^2 = 0.1\) and \(\sigma^2 = 0.7\), we have that \(y \sim N(0, 0.1 + 0.7)\). We compute \(p(y > 0.3)\):

# P(y > 0.3) under N(0, 0.8); pnorm takes the standard deviation, hence sqrt()
1 - pnorm(0.3, 0, sqrt(0.8))

0.368657838608209

Given our prior distribution alone, the probability that the change in CD4 count for a new patient will exceed 0.3 (30 \(cells/mm^3\)) is approximately 0.369.
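The same probability can be approximated by simulation, which mirrors the decomposition \(y = \mu + \epsilon\) directly: draw \(\mu\) from the prior, add independent noise, and count how often the result exceeds 0.3. This is a minimal sketch; the seed, the number of draws and the variable names are arbitrary assumptions rather than part of the original analysis.

```r
set.seed(1)      # arbitrary seed, for reproducibility only
n_sim <- 1e6     # number of Monte Carlo draws (arbitrary choice)

# Step 1: draw mu from its prior N(phi = 0, tau^2 = 0.1)
mu <- rnorm(n_sim, mean = 0, sd = sqrt(0.1))

# Step 2: draw independent noise epsilon ~ N(0, sigma^2 = 0.7)
eps <- rnorm(n_sim, mean = 0, sd = sqrt(0.7))

# Step 3: form prior predictive draws y = mu + epsilon
y <- mu + eps

# Monte Carlo estimate of P(y > 0.3), next to the exact Normal tail probability
mc_prob    <- mean(y > 0.3)
exact_prob <- pnorm(0.3, mean = 0, sd = sqrt(0.8), lower.tail = FALSE)
print(c(monte_carlo = mc_prob, exact = exact_prob))
```

The estimate should be close to the exact value, differing only by Monte Carlo error that shrinks as `n_sim` grows.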

10.4.2 Posterior predictive distributions

Suppose that we have observed \(y_1, ..., y_n\), and we want to predict a future observation \(z\), assuming that \(z\) and \(y_i\) are independent for all \(1 \leq i \leq n\), conditional on \(\mu\). The posterior predictive distribution for \(z\) is given by

\[\begin{split} p(z| y_1, ..., y_n, \sigma^2, \phi, \tau^2) &= \int p(z, \mu | y_1, ..., y_n, \sigma^2, \phi, \tau^2) \, d\mu \\ &= \int p(z | y_1, ..., y_n,\mu, \sigma^2) \, p(\mu |y_1, ..., y_n,\sigma^2, \phi, \tau^2 ) \, d\mu. \end{split}\]

Again, evaluating this integral involves some fiddly algebra, but we can use the same approach as for the prior predictive distribution. We wish to find the predictive distribution of the observation \(z\) for a new patient, given the previous observations \(y_1, ..., y_n\). We can write \(z = \mu + \epsilon\), where \(\mu \vert y_1,\dots,y_n \sim N\left\{ \frac{ \tau^2 n\bar{y} + \sigma^2\phi }{\tau^2 n + \sigma^2}, \frac{\sigma^2\tau^2}{\tau^2n+\sigma^2} \right\}\) and \(\epsilon \sim N(0, \sigma^2)\), independently of \(\mu\) and of \(y_1, ..., y_n\).

Using the result for the sum of two independent Normal random variables, the posterior predictive distribution has the form \( N\left\{ \frac{ \tau^2 n\bar{y} + \sigma^2\phi }{\tau^2 n + \sigma^2}, \frac{\sigma^2\tau^2}{\tau^2n+\sigma^2} + \sigma ^2\right\}\).
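The mean and variance above are simple enough to wrap in a small helper. The sketch below is one possible way to do so; the function name `posterior_predictive_params` and its argument names are hypothetical, and the inputs `n = 20` and `ybar = 0.8` in the usage lines are illustrative values only, not the data summaries used in this chapter.

```r
# Mean and variance of the posterior predictive distribution for a new
# observation z, for a Normal likelihood with known variance sigma2 and a
# Normal prior N(phi, tau2) on mu. All names here are illustrative.
posterior_predictive_params <- function(n, ybar, sigma2, phi, tau2) {
  # Posterior mean and variance of mu given y_1, ..., y_n
  post_mean <- (tau2 * n * ybar + sigma2 * phi) / (tau2 * n + sigma2)
  post_var  <- (sigma2 * tau2) / (tau2 * n + sigma2)
  # z = mu + epsilon, so the variances add and the mean is unchanged
  list(mean = post_mean, var = post_var + sigma2)
}

# With no data (n = 0) the formula collapses to the prior predictive
# distribution N(phi, tau2 + sigma2)
no_data <- posterior_predictive_params(n = 0, ybar = 0, sigma2 = 0.7,
                                       phi = 0, tau2 = 0.1)
print(no_data)

# Illustrative (hypothetical) data summary: n = 20 patients, sample mean 0.8
with_data <- posterior_predictive_params(n = 20, ybar = 0.8, sigma2 = 0.7,
                                         phi = 0, tau2 = 0.1)
# Predictive probability that z exceeds 0.3 under these hypothetical inputs
pnorm(0.3, mean = with_data$mean, sd = sqrt(with_data$var), lower.tail = FALSE)
```

Setting `n = 0` is a useful check: the posterior for \(\mu\) reduces to the prior, so the helper should return mean \(\phi\) and variance \(\tau^2 + \sigma^2\).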

In our example, combining the prior with the observed data, the posterior predictive distribution for the difference in cell counts of a new patient is \(N(0.596, 0.0259 + 0.7)\). We can compute the probability that this difference is greater than 0.3 (30 \(\text{cells/mm}^3\)), \(p(z > 0.3 | y_1, ..., y_n)\):

# P(z > 0.3 | data) under N(0.596, 0.7259); pnorm takes the standard deviation
1 - pnorm(0.3, 0.596, sqrt(0.7259))

0.635861643314828

After observing the data, the predictive probability that the next patient will have a difference in CD4 cell counts greater than 0.3 (30 \(\text{cells/mm}^3\)) has increased substantially, from 0.369 under the prior alone to 0.636.

Statistics for Health Data Science
