10.2 Calculating the posterior for the mean of a Normal distribution
In this section, we obtain the posterior for the mean of a Normal distribution with known variance, \(\sigma^2\).
Suppose we have \(n\) independent observed data points, each assumed to come from the Normal distribution: \(y_1,\dots,y_n \sim N(\mu,\sigma^2)\). Recall that the Normal distribution has probability density function

\[
p(y \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(y-\mu)^2}{2\sigma^2}\right\}.
\]
Note that some authors parameterise the Normal distribution in terms of the precision, \(\eta = \frac{1}{\sigma^2}\), rather than the variance.
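For reference, substituting \(\eta\) into the density above gives the equivalent form

\[
p(y \mid \mu, \eta) = \sqrt{\frac{\eta}{2\pi}}\, \exp\left\{-\frac{\eta(y-\mu)^2}{2}\right\},
\]

which is the same p.d.f., merely rewritten in terms of the precision.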
10.2.1 Likelihood
For convenience, we will drop the conditioning on \(\sigma^2\), since we are assuming this is a known number. Since we assume all observations are independent, the likelihood is the product of the \(n\) individual p.d.f.s:

\[
p(y_1,\dots,y_n \mid \mu) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(y_i-\mu)^2}{2\sigma^2}\right\} = \left(2\pi\sigma^2\right)^{-n/2} \exp\left\{-\frac{1}{2\sigma^2}\sum_{i=1}^n (y_i-\mu)^2\right\}.
\]
Notice that

\[
\sum_{i=1}^n (y_i-\mu)^2 = \sum_{i=1}^n (y_i-\bar y)^2 + n(\bar y-\mu)^2 = (n-1)s^2 + n(\bar y-\mu)^2,
\]

where (as usual) \(s^2 = \sum_{i=1}^n (y_i - \bar y)^2 /(n-1)\); the cross term vanishes because the deviations \(y_i - \bar y\) sum to zero.
Thus the likelihood can be written

\[
p(y_1,\dots,y_n \mid \mu) = \left(2\pi\sigma^2\right)^{-n/2} \exp\left\{-\frac{(n-1)s^2}{2\sigma^2}\right\} \exp\left\{-\frac{n(\bar y-\mu)^2}{2\sigma^2}\right\}.
\]

Since we are interested in the posterior for \(\mu\), we can drop all factors not involving \(\mu\), so the likelihood is proportional to

\[
\exp\left\{-\frac{n(\bar y-\mu)^2}{2\sigma^2}\right\}.
\]
Notice that this has the same form as the p.d.f. of a Normal distribution for the sample mean \(\bar{y}\); specifically, \(\bar{y} \sim N(\mu, \frac{\sigma^2}{n})\).
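As a quick numerical check of this proportionality, the following sketch (simulated data with arbitrary illustrative values, assuming numpy and scipy are available) evaluates the joint likelihood at several values of \(\mu\) and divides it by the \(N(\mu, \frac{\sigma^2}{n})\) density at \(\bar y\); the ratio is constant in \(\mu\), as the algebra above predicts.

```python
# Minimal sketch: the joint likelihood of y_1..y_n, viewed as a function of mu,
# is proportional to the N(mu, sigma^2/n) density evaluated at the sample mean.
# All numbers are illustrative.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
sigma = 2.0                                   # known standard deviation
y = rng.normal(5.0, sigma, size=20)           # simulated observations
n, ybar = len(y), y.mean()

for mu in (3.0, 5.0, 7.0):
    joint = norm.pdf(y, loc=mu, scale=sigma).prod()             # product of n p.d.f.s
    at_ybar = norm.pdf(ybar, loc=mu, scale=sigma / np.sqrt(n))  # N(mu, sigma^2/n) at ybar
    print(f"mu = {mu}: joint / at_ybar = {joint / at_ybar:.6g}")  # same for every mu
```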
10.2.2 Prior
We noted in the previous session that the Normal distribution is a conjugate prior when the likelihood is Normal. Thus, for convenience, we will use a Normal distribution as a prior for \(\mu\),

\[
\mu \sim N(\phi, \tau^2),
\]

as the posterior distribution will then conveniently be a Normal distribution as well. The prior parameters \(\phi\) and \(\tau^2\) should be specified based on prior knowledge of \(\mu\) and the uncertainty around that knowledge; they may come from previous research or be formally elicited from investigators. If no prior evidence is available, we assign an appropriately large value to \(\tau\), so that the prior is nearly flat over all plausible values of \(\mu\).
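As a concrete illustration (the numbers here are hypothetical, assuming scipy is available), an informative prior encodes a previous estimate of \(\mu\) with a small \(\tau\), while a vague prior uses a large \(\tau\) so that its density is nearly constant wherever the data could plausibly place \(\mu\):

```python
# Minimal sketch of two hypothetical prior choices for mu.
from scipy.stats import norm

informative = norm(loc=50, scale=2)   # phi = 50, tau = 2: e.g. from a previous study
vague = norm(loc=50, scale=1000)      # large tau: nearly flat over plausible values

# The informative prior distinguishes sharply between candidate values of mu,
# whereas the vague prior assigns them almost identical density.
for mu in (40.0, 50.0, 60.0):
    print(f"mu = {mu}: informative {informative.pdf(mu):.4g}, vague {vague.pdf(mu):.4g}")
```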
10.2.3 Posterior
To derive the posterior for the mean \(\mu\), we need the distribution of that parameter conditional on the observed data, obtained by combining the likelihood with the prior. In the following calculation, we are only interested in the parts of the p.d.f. that depend on \(\mu\); any terms not involving \(\mu\) form part of the normalisation constant, which is part of the p.d.f. but does not affect the shape of the density.
The posterior is given by

\[
p(\mu \mid y) \propto p(y \mid \mu)\, p(\mu) \propto \exp\left\{-\frac{n(\bar y-\mu)^2}{2\sigma^2}\right\} \exp\left\{-\frac{(\mu-\phi)^2}{2\tau^2}\right\} = \exp\left\{-\frac{n(\bar y-\mu)^2}{2\sigma^2} - \frac{(\mu-\phi)^2}{2\tau^2}\right\}.
\]
Expanding the brackets and retaining only terms containing \(\mu\) (the rest are absorbed into the proportionality constant):

\[
p(\mu \mid y) \propto \exp\left\{-\frac{(n\tau^2 + \sigma^2)\mu^2 - 2\left(n\tau^2 \bar y + \sigma^2 \phi\right)\mu}{2\sigma^2\tau^2}\right\}.
\]
Completing the square in \(\mu\) (again discarding \(\mu\)-free terms):

\[
p(\mu \mid y) \propto \exp\left\{-\frac{n\tau^2 + \sigma^2}{2\sigma^2\tau^2}\left(\mu - \frac{n\tau^2 \bar y + \sigma^2 \phi}{n\tau^2 + \sigma^2}\right)^2\right\}.
\]
We can recognise this as having the form of the p.d.f. of a Normal distribution, therefore we see that

\[
\mu \mid y \sim N\!\left(\frac{n\tau^2 \bar y + \sigma^2 \phi}{n\tau^2 + \sigma^2},\; \frac{\sigma^2\tau^2}{n\tau^2 + \sigma^2}\right).
\]
We see that:
the Normal prior is conjugate for a Normal likelihood, as the posterior is also Normal.
The posterior mean, \(\frac{\tau^2 n\bar{y} + \sigma^2\phi}{\tau^2 n + \sigma^2}\), is a weighted average of the sample mean \(\bar y\) and the prior mean \(\phi\): we can write it as \(w\bar{y} + (1-w)\phi\), where \(w = \frac{\tau^2 n}{\tau^2 n + \sigma^2}\). Hence the posterior combines the information from the likelihood (the data) and the prior (a priori belief).
The posterior variance is \(\frac{\sigma^2\tau^2}{\tau^2 n + \sigma^2}\). In a large study, \(n\) becomes very large, so \(\tau^2 \gg \frac{\sigma^2}{n}\): the weight \(w\) is close to 1, the posterior mean is close to \(\bar y\), and the posterior variance is approximately the sampling variance \(\frac{\sigma^2}{n}\), which tends to zero. The data dominate the prior.
In a small study, where \(\tau^2 \ll \frac{\sigma^2}{n}\), the weight \(w\) is close to 0, so the posterior mean is close to the prior mean \(\phi\), and the posterior variance \(\frac{\sigma^2\tau^2}{\tau^2 n + \sigma^2}\), which is approximately \(\tau^2\), is driven mainly by the prior. These limiting behaviours are illustrated in the numerical sketch below.
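The following sketch (simulated data and illustrative parameter values, assuming numpy and scipy are available) computes the posterior mean and variance from the closed-form expressions above and checks them against a brute-force grid approximation of prior \(\times\) likelihood; it also prints the weight \(w\).

```python
# Minimal sketch: conjugate Normal-Normal posterior for mu versus a
# brute-force grid approximation. All numbers are illustrative.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
sigma, phi, tau = 2.0, 0.0, 3.0          # known sd, prior mean, prior sd
y = rng.normal(1.5, sigma, size=10)      # simulated data
n, ybar = len(y), y.mean()

# Closed-form posterior from the derivation above
w = n * tau**2 / (n * tau**2 + sigma**2)                 # weight on the data
post_mean = w * ybar + (1 - w) * phi
post_var = sigma**2 * tau**2 / (n * tau**2 + sigma**2)

# Grid approximation: normalise prior(mu) * likelihood(mu) numerically
mu = np.linspace(phi - 10 * tau, phi + 10 * tau, 40001)
dmu = mu[1] - mu[0]
unnorm = norm.pdf(mu, phi, tau) * np.exp(-n * (ybar - mu) ** 2 / (2 * sigma**2))
dens = unnorm / (unnorm.sum() * dmu)

grid_mean = (mu * dens).sum() * dmu
grid_var = ((mu - grid_mean) ** 2 * dens).sum() * dmu

print(f"w = {w:.3f}")
print(f"posterior mean: closed form {post_mean:.4f}, grid {grid_mean:.4f}")
print(f"posterior var : closed form {post_var:.4f}, grid {grid_var:.4f}")
```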