9.4 Bayes' theorem for discrete and continuous data

So far this session, we have looked at Bayes' theorem in the discrete case. We now turn to the more general form of Bayes' theorem, used to make inferences about an unknown parameter \(\theta\), which may be discrete or continuous.

The probability distribution for \(\theta\) reflects our uncertainty about it before seeing the data; this is the prior distribution, \(p(\theta)\). Once the data \(y\) are known, we condition on them. Using Bayes' theorem, we obtain a conditional probability distribution for the unobserved quantities of interest given the data. If \(\theta\) is continuous, we have:

\[ p(\theta \mid y)= \frac{ p(\theta)\, p(y \mid \theta)}{\int p(\theta)\,p(y \mid \theta)\,d\theta}, \]

and if \(\theta\) is discrete and takes values in the set \(\Theta\), we have:

\[ p(\theta \mid y)= \frac{ p(\theta)\, p(y \mid \theta)}{\sum_{\theta \in \Theta} p(\theta) p(y \mid \theta) }. \]

We call \(p(\theta \mid y)\) the posterior distribution.
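The discrete form can be made concrete with a small numerical sketch. The setup below is hypothetical (not from the text): \(\theta\) is the success probability of a coin restricted to three candidate values, the prior is uniform, and the data are 7 successes in 10 binomial trials.

```python
# Sketch of the discrete form of Bayes' theorem (hypothetical numbers).
from math import comb

thetas = [0.25, 0.50, 0.75]   # candidate parameter values (assumed)
prior = [1/3, 1/3, 1/3]       # uniform prior p(theta)

y, n = 7, 10                  # observed data: 7 successes in 10 trials

# Likelihood p(y | theta) under a binomial model
likelihood = [comb(n, y) * t**y * (1 - t)**(n - y) for t in thetas]

# Numerator of Bayes' theorem, then normalise by the sum over Theta
numer = [p * l for p, l in zip(prior, likelihood)]
evidence = sum(numer)         # the denominator, constant in theta
posterior = [x / evidence for x in numer]
```

With 7 successes in 10 trials, the posterior mass shifts toward \(\theta = 0.75\), as one would expect.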

Note that the Bayesian approach is naturally synthetic, in that it allows data from different sources to be combined via Bayes' theorem. The approach is most useful when there is informative prior information. It can also be applied recursively: \(p(\theta \mid y)\) may be used as a prior when calculating \(p(\theta \mid y, z)\) for a second data set \(z\).
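The recursive property can be checked numerically. The sketch below (assumed setup: the same three-point discrete \(\theta\), with two binomial batches of data) shows that updating on the first batch and then using that posterior as the prior for the second batch gives the same answer as updating once on the pooled data.

```python
# Sketch of recursive Bayesian updating (hypothetical numbers).
from math import comb

thetas = [0.25, 0.50, 0.75]
prior = [1/3, 1/3, 1/3]

def update(prior, y, n):
    """Posterior over thetas after observing y successes in n binomial trials."""
    numer = [p * comb(n, y) * t**y * (1 - t)**(n - y)
             for p, t in zip(prior, thetas)]
    total = sum(numer)
    return [x / total for x in numer]

# Batch 1 (y), then batch 2 (z), using the first posterior as the new prior
post_y = update(prior, 3, 5)
post_yz = update(post_y, 4, 5)

# Updating once with all the data pooled gives the same posterior
post_all = update(prior, 7, 10)
```

The two routes agree because the binomial normalising constants cancel on normalisation, leaving the same product of likelihood kernels either way.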

The denominator, \({\int p(\theta)\,p(y \mid \theta)\,d\theta}\) or \(\sum_{\theta \in \Theta} p(\theta) p(y \mid \theta)\), is a constant with respect to \(\theta\). One of the challenges of the Bayesian approach is that this integral can be analytically intractable, so that numerical methods are needed (for example, numerical integration or Markov chain Monte Carlo methods). These methods are beyond the scope of the current module. In this introductory course, we will only look at examples where the constant need not be calculated: the form of the posterior can be inferred by inspection once we observe that the posterior is proportional to the product of the prior and the likelihood:

\[p(\theta \mid y) \propto p(\theta)\,p(y \mid \theta).\]
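When the normalising integral cannot (or need not) be done analytically, it can be approximated on a grid: evaluate the unnormalised product \(p(\theta)\,p(y \mid \theta)\) at many values of \(\theta\) and divide by the sum. The sketch below uses assumed numbers (a uniform prior on \([0,1]\) and 7 successes in 10 binomial trials); with that prior the exact posterior is known to be a Beta(8, 4) distribution, whose mean \(8/12 \approx 0.667\) the grid approximation recovers.

```python
# Sketch: grid approximation of a continuous posterior (hypothetical numbers).
N = 10_001
grid = [i / (N - 1) for i in range(N)]   # grid of theta values over [0, 1]
y, n = 7, 10

# Unnormalised posterior: uniform prior (constant) times the binomial
# likelihood kernel theta^y * (1 - theta)^(n - y)
unnorm = [t**y * (1 - t)**(n - y) for t in grid]

# Normalise so the grid values sum to 1 (a discrete stand-in for the integral)
total = sum(unnorm)
posterior = [u / total for u in unnorm]

# Posterior mean; exact answer under these assumptions is 8 / 12
mean = sum(t * p for t, p in zip(grid, posterior))
```

Note that any constant factor in the prior or likelihood (such as the binomial coefficient) cancels in the normalisation, which is exactly why the proportionality statement above suffices.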

We will see how this works for the inference of proportions.