9.3 The Bayesian paradigm in Health data science problems
In this section we discuss the Bayesian approach in Health data science problems. Some features of the Bayesian paradigm are particularly useful in this context:
Bayes' theorem provides a statistically principled method for combining data with prior information, so we can take into account the context within which the data are generated. For example, the result of a diagnostic test may have a different interpretation or consequence in a symptomatic patient than in a general screening programme, because the prior probability of disease is higher in the former than in the latter. The prior is then updated by the test result to give an assessment of disease risk specific to the local prevalence.
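The diagnostic test example can be made concrete with a short calculation. The sketch below applies Bayes' theorem to a positive test result under two different priors. The sensitivity, specificity and prevalence values are illustrative assumptions, not figures for any real test, and the function name is hypothetical.

```python
def posterior_disease_probability(prior, sensitivity, specificity):
    """Return P(disease | positive test) using Bayes' theorem."""
    true_positive = sensitivity * prior
    false_positive = (1.0 - specificity) * (1.0 - prior)
    return true_positive / (true_positive + false_positive)


# Assumed test characteristics (illustrative only).
SENSITIVITY = 0.90
SPECIFICITY = 0.95

# Assumed prior probabilities of disease in the two settings.
screening = posterior_disease_probability(0.01, SENSITIVITY, SPECIFICITY)
symptomatic = posterior_disease_probability(0.30, SENSITIVITY, SPECIFICITY)

print(f"Screening programme: {screening:.3f}")
print(f"Symptomatic patient: {symptomatic:.3f}")
```

Under these assumed values the same positive result gives a posterior probability of disease of about 0.15 in the screening setting and about 0.89 in the symptomatic setting, which is the sense in which the interpretation depends on the prior.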
For problems where there are multiple or diverse sources of data which must be combined, the Bayesian framework provides a natural environment for doing so. Examples where Bayesian synthesis of information is common are:
• models of biological systems, for example genetic and genomic pathways,
• models of the natural history of diseases over time and relationships with clinical events,
• economic models of disease trajectories and cost-effect trade-offs for interventions that interrupt the trajectories,
• ecological studies of pollutant emissions and effects on population health,
• demographic studies, for example to study migration,
• speech recognition software,
• other pattern recognition models such as medical imaging or search engines,
• epidemic modelling.
In all these examples complex data are synthesised and/or used to update outputs.
Bayesian models fit well into decision theory methodology, provided we can also specify the consequences of model outputs.
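As a rough illustration of how a posterior probability feeds into a decision, the sketch below chooses the action with the smallest expected loss. The two actions and the loss values are hypothetical and in arbitrary units. In practice the consequences would have to be specified for the problem at hand.

```python
# Hypothetical losses (arbitrary units) for each action and true disease state.
LOSSES = {
    "treat": {"diseased": 1.0, "healthy": 2.0},
    "do_not_treat": {"diseased": 10.0, "healthy": 0.0},
}


def expected_losses(p_disease):
    """Expected loss of each action, averaging over the posterior disease state."""
    return {
        action: p_disease * loss["diseased"] + (1.0 - p_disease) * loss["healthy"]
        for action, loss in LOSSES.items()
    }


def best_action(p_disease):
    """Return the action that minimises expected loss."""
    scores = expected_losses(p_disease)
    return min(scores, key=scores.get)


for p in (0.10, 0.50):
    print(p, best_action(p), expected_losses(p))
```

With these assumed losses, treatment becomes the preferred action once the posterior probability of disease exceeds 2/11 (about 0.18). A different loss table would move that threshold.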
In many examples, especially those that aim to model complicated processes, some of the data inputs are very sparse, or even non-existent. In such cases, prior information may be formally elicited from an expert panel and incorporated in a Bayesian analysis. Examples include multiple evidence synthesis and identification of latent groups.
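One simple way to turn elicited opinion into a prior is sketched below, assuming the panel can state a best guess for a proportion and how many observations that opinion is "worth". This is only one of several possible elicitation schemes, and the numbers and function names are hypothetical. The prior is a Beta distribution, which updates in closed form with binomial data.

```python
def beta_from_elicitation(mean, equivalent_n):
    """Beta(a, b) prior with the elicited mean and a + b = equivalent_n."""
    a = mean * equivalent_n
    b = (1.0 - mean) * equivalent_n
    return a, b


def update_beta(a, b, events, trials):
    """Conjugate update of a Beta prior with binomial data."""
    return a + events, b + (trials - events)


# Assumed elicitation: event rate about 20%, worth roughly 10 observations.
prior_a, prior_b = beta_from_elicitation(mean=0.2, equivalent_n=10)

# Assumed sparse data: 1 event in 3 patients.
post_a, post_b = update_beta(prior_a, prior_b, events=1, trials=3)
posterior_mean = post_a / (post_a + post_b)

print(f"Prior: Beta({prior_a:.1f}, {prior_b:.1f})")
print(f"Posterior: Beta({post_a:.1f}, {post_b:.1f}), mean {posterior_mean:.3f}")
```

Because the data are so sparse, the posterior mean (3/13, about 0.23) stays close to the elicited prior mean rather than the raw proportion of 1/3.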
The Bayesian framework permits direct probability statements about unknown quantities, for example the probability that a response rate exceeds a clinically relevant threshold. Frequentist methods cannot make such statements because the unknown model parameters are treated as fixed rather than random.
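A direct probability statement of this kind can be read straight off the posterior distribution. The sketch below assumes binomial data (14 responders out of 20, an invented example) and a uniform Beta(1, 1) prior, and uses simulation from the resulting Beta posterior to estimate the probability that the response rate exceeds 0.5, together with a 95% credible interval.

```python
import random


def posterior_samples(events, trials, prior_a=1.0, prior_b=1.0,
                      n_draws=100_000, seed=1):
    """Draw from the Beta posterior for a binomial proportion."""
    rng = random.Random(seed)
    a = prior_a + events
    b = prior_b + (trials - events)
    return [rng.betavariate(a, b) for _ in range(n_draws)]


# Assumed data: 14 responders out of 20 patients.
draws = posterior_samples(events=14, trials=20)

# Direct statement: P(response rate > 0.5 | data).
prob_above_half = sum(d > 0.5 for d in draws) / len(draws)

# Equal-tailed 95% credible interval from the sorted draws.
ordered = sorted(draws)
lower = ordered[int(0.025 * len(ordered))]
upper = ordered[int(0.975 * len(ordered))]

print(f"P(rate > 0.5 | data) = {prob_above_half:.3f}")
print(f"95% credible interval: ({lower:.3f}, {upper:.3f})")
```

For these assumed data the estimated probability is roughly 0.96. The interval is a statement about the parameter itself, given the data and the prior, which is the interpretation a frequentist confidence interval does not allow.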
In recent years the resources available for carrying out Bayesian analyses have increased, including bespoke software and packages within commercial statistical software.
However, Bayesian methods are still less widely used in statistics than more classical approaches, in part because of the following limitations:
The need for a prior distribution can be a barrier. If little is known about a parameter, researchers often fall back on weakly informative priors, and it is then not easy to see how much benefit comes from a Bayesian analysis.
Because of the need to use Bayesian updating via a prior distribution, the analysis almost always requires a parametric approach. This limits the structure of the analysis models. Although non-parametric Bayesian methods are available for some situations, they often have underlying parametric assumptions.
The numerical integration methods usually required for realistic problems are often computationally expensive. This is especially true if there are multiple sources of evidence to be combined.
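To give a feel for where the computational cost comes from, the sketch below approximates a posterior mean by brute-force integration over a grid. It assumes a binomial likelihood with a normal prior on the log-odds, a combination with no closed-form posterior. The data, prior settings and function names are illustrative assumptions. Realistic analyses usually rely on simulation methods such as Markov chain Monte Carlo rather than a grid.

```python
import math


def log_unnormalised_posterior(theta, events, trials,
                               prior_mean=0.0, prior_sd=1.5):
    """Binomial log-likelihood plus normal log-prior on the log-odds theta."""
    p = 1.0 / (1.0 + math.exp(-theta))
    log_lik = events * math.log(p) + (trials - events) * math.log(1.0 - p)
    log_prior = -0.5 * ((theta - prior_mean) / prior_sd) ** 2
    return log_lik + log_prior


def grid_posterior_mean_probability(events, trials,
                                    lo=-8.0, hi=8.0, n_points=2001):
    """Posterior mean of the event probability by integration on a grid."""
    step = (hi - lo) / (n_points - 1)
    grid = [lo + i * step for i in range(n_points)]
    log_w = [log_unnormalised_posterior(t, events, trials) for t in grid]
    largest = max(log_w)  # subtract the maximum for numerical stability
    weights = [math.exp(lw - largest) for lw in log_w]
    total = sum(weights)  # normalising constant (grid step cancels in the ratio)
    return sum(w / (1.0 + math.exp(-t)) for w, t in zip(weights, grid)) / total


# Assumed data: 3 events in 10 trials.
estimate = grid_posterior_mean_probability(events=3, trials=10)
print(f"Posterior mean event probability: {estimate:.3f}")
```

One parameter needs only 2001 evaluations here, but a grid with the same resolution over d parameters needs 2001 to the power d, which is why this direct approach quickly becomes infeasible as models grow or as more evidence sources, each with its own parameters, are combined.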
Many statisticians are unfamiliar with the methods and associated software.