10.4 Predictions
10.4.1 Prior predictive distributions
Finding the predictive distribution for a new patient \(y\) before making any observations involves integrating the likelihood over the prior distribution of \(\mu\):
\[ p(y) = \int p(y \mid \mu)\, p(\mu)\, d\mu. \]
This calculation involves a lot of algebra. We instead use a different approach: note that we can write the observation as \(y = \mu + \epsilon\), where \(\mu \sim N(\phi, \tau^2)\) and \(\epsilon \sim N(0, \sigma^2)\). Then, since \(\mu\) and \(\epsilon\) are independent, we can use this result:
If \(X\) and \(Y\) are independent random variables that are Normally distributed, \(X\sim N(\mu _{X},\sigma _{X}^{2})\) and \(Y\sim N(\mu _{Y},\sigma _{Y}^{2})\), then their sum is also Normally distributed: \(X + Y \sim N(\mu _{X}+\mu _{Y},\sigma _{X}^{2}+\sigma _{Y}^{2})\).
Thus we have that \(y \sim N(\phi, \tau^2 + \sigma^2)\).
In our example, before collecting any data, suppose we wish to predict the probability that the difference in cell counts is greater than 0.3 (30 \(cells/mm^3\)). We have that \(y \sim N(0, 0.1 + 0.7)\). We compute \(p(y > 0.3)\):
1 - pnorm(0.3, mean = 0, sd = sqrt(0.8))
Given our prior distribution alone, the probability that the change in CD4 count for a new patient will exceed 0.3 (30 \(cells/mm^3\)) is approximately 0.369.
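The decomposition \(y = \mu + \epsilon\) can also be checked by simulation. The following is a minimal sketch, not part of the original analysis: it draws \(\mu\) from the prior, then draws \(y\) around each simulated \(\mu\), and compares the simulated proportion exceeding 0.3 with the closed-form value. The values \(\phi = 0\), \(\tau^2 = 0.1\) and \(\sigma^2 = 0.7\) are read off the example above; the seed and the number of simulations are arbitrary choices.

```r
# Monte Carlo check of the prior predictive probability P(y > 0.3).
set.seed(1)      # arbitrary seed, used only for reproducibility
n_sims <- 1e5    # number of simulated patients (arbitrary choice)

phi    <- 0      # prior mean of mu
tau2   <- 0.1    # prior variance of mu
sigma2 <- 0.7    # observation variance

# Step 1: draw a mean for each simulated patient from the prior.
mu <- rnorm(n_sims, mean = phi, sd = sqrt(tau2))

# Step 2: draw an observation around each simulated mean.
y <- rnorm(n_sims, mean = mu, sd = sqrt(sigma2))

# Compare the simulated proportion with the closed-form tail probability.
p_sim   <- mean(y > 0.3)
p_exact <- pnorm(0.3, mean = phi, sd = sqrt(tau2 + sigma2), lower.tail = FALSE)
print(c(simulated = p_sim, exact = p_exact))
```

The two numbers should agree up to Monte Carlo error, which shrinks as `n_sims` grows.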
10.4.2 Posterior predictive distributions
Suppose that we have observed \(y_1, ..., y_n \), and we want to predict a future observation \(z\), assuming that \(z\) and \(y_i\) are independent for all \(1 \leq i \leq n\), conditional on \(\mu\). The posterior predictive distribution for \(z\) is given by
\[ p(z \mid y_1, ..., y_n) = \int p(z \mid \mu)\, p(\mu \mid y_1, ..., y_n)\, d\mu. \]
Again, this involves some fiddly algebra, but we can use a method similar to the one we used for the prior predictive distribution. We wish to know the predictive distribution of a new patient \(z\), given the previous observations \(y_1, ..., y_n\). We can write \(z = \mu + \epsilon\). We have that \(\mu \vert y_1,\dots,y_n \sim N\left\{ \frac{ \tau^2 n\bar{y} + \sigma^2\phi }{\tau^2 n + \sigma^2}, \frac{\sigma^2\tau^2}{\tau^2n+\sigma^2} \right\} \) and \(\epsilon \sim N(0, \sigma^2)\), with \(\epsilon\) independent of \(\mu\) given the data.
Using the result for the sum of two independent Normal distributions, the posterior predictive distribution has the form \(z \vert y_1,\dots,y_n \sim N\left\{ \frac{ \tau^2 n\bar{y} + \sigma^2\phi }{\tau^2 n + \sigma^2}, \frac{\sigma^2\tau^2}{\tau^2n+\sigma^2} + \sigma ^2\right\}\).
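The posterior predictive mean and variance can be wrapped in a small helper. This is an illustrative sketch only: the function name `posterior_predictive` and its argument names are hypothetical, and the inputs in the usage line are made-up values chosen to exercise the formula, not the data from this chapter.

```r
# Posterior predictive mean and variance for the Normal model with known
# observation variance sigma2 and a N(phi, tau2) prior on mu.
# (Hypothetical helper; names are not from the original text.)
posterior_predictive <- function(n, ybar, phi, tau2, sigma2) {
  denom <- tau2 * n + sigma2

  # Posterior distribution of mu given the data.
  post_mean <- (tau2 * n * ybar + sigma2 * phi) / denom
  post_var  <- sigma2 * tau2 / denom

  # Adding the independent noise term epsilon ~ N(0, sigma2) leaves the
  # mean unchanged and adds sigma2 to the variance.
  list(mean = post_mean, var = post_var + sigma2)
}

# Usage with made-up inputs (NOT the chapter's data).
pp <- posterior_predictive(n = 10, ybar = 1, phi = 0, tau2 = 1, sigma2 = 4)

# Predictive probability that a new observation exceeds 0.3.
pnorm(0.3, mean = pp$mean, sd = sqrt(pp$var), lower.tail = FALSE)
```

Note that the predictive variance can never fall below \(\sigma^2\): more data shrinks the uncertainty about \(\mu\), but not the patient-to-patient variability.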
In our example, combining the prior with the observed data, the posterior predictive distribution for the difference in cell counts of a new patient is \(N(0.596, 0.0259 + 0.7)\). We can compute \(p(z > 0.3 \mid y_1, ..., y_n)\), the probability that this difference exceeds 0.3 (30 \(cells/mm^3\)):
1 - pnorm(0.3, mean = 0.596, sd = sqrt(0.7259))
After observing the data, the predictive probability that the next patient will have a difference in CD4 cell counts greater than 0.3 (30 \(cells/mm^3\)) has increased substantially, from 0.369 under the prior alone to 0.636.
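To see the two predictive probabilities side by side, the calculations can be collected in one place. This is a small sketch using the means and variances quoted in this section; the helper name `exceed_prob` is hypothetical and introduced only for illustration.

```r
# Tail probability P(new observation > threshold) for a Normal predictive
# distribution with the given mean and variance.
# (Hypothetical helper; not from the original text.)
exceed_prob <- function(threshold, pred_mean, pred_var) {
  pnorm(threshold, mean = pred_mean, sd = sqrt(pred_var), lower.tail = FALSE)
}

# Prior predictive: N(0, 0.1 + 0.7).
p_prior <- exceed_prob(0.3, pred_mean = 0, pred_var = 0.1 + 0.7)

# Posterior predictive: N(0.596, 0.0259 + 0.7).
p_post <- exceed_prob(0.3, pred_mean = 0.596, pred_var = 0.0259 + 0.7)

print(round(c(prior = p_prior, posterior = p_post), 3))
```

Most of the change comes from the shift in the predictive mean from 0 to 0.596; the predictive variance drops only from 0.8 to 0.7259, because the \(\sigma^2 = 0.7\) term is unaffected by the data.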