Welcome to Statistics for Health Data Science

These notes provide the core material for the MSc module, Statistics for Health Data Science.

This is a compulsory module for the MSc Health Data Science programme. It introduces the key statistical concepts and methods for health data science. Topics covered include probability, initial data description and exploration, frequentist and Bayesian approaches to statistical inference, and regression modelling. These topics provide the framework needed for subsequent modules. The module focuses on learning through practical examples and incorporates directed learning, lectures, group discussion, and computer practical exercises.

1.1 Overall aim of the module

The overall module aims are to introduce:

  • the motivation for, and the critical thinking involved in, answering a health science question by interrogating data and drawing conclusions from evidence;

  • the principles of probability, regression modelling and statistical inference within frequentist and Bayesian frameworks.

1.2 Module Intended Learning Outcomes

Upon successful completion of the module you will be able to:

  • evaluate the application of different probability distributions to model health data (including Poisson, Binomial and Normal);

  • critically analyse frameworks for frequentist and Bayesian inference and evaluate their strengths, limitations and differences;

  • examine the concepts of sampling variability, estimators, bias, confidence intervals and credible intervals;

  • examine the theoretical basis of linear regression and generalized linear models;

  • assess the application of regression modelling to address specific health data science questions;

  • critically evaluate strengths and limitations of different statistical methods, including regression models, within a health data science project;

  • draw conclusions from the results of a data analysis and justify those conclusions, appropriately acknowledging uncertainty in the results.

1.3 Module Content

The module is split into 16 taught sessions, each building statistical knowledge for health data science. The sessions are:

  1. Introduction

  2. Probability and Discrete Probability Distributions

  3. Continuous Probability Distributions

  4. Populations and Sampling

  5. Likelihood

  6. Maximum Likelihood Estimation

  7. Frequentist Inference I

  8. Frequentist Inference II

  9. Bayesian Statistics I

  10. Bayesian Statistics II

  11. Types of Investigation

  12. Linear Regression I

  13. Linear Regression II

  14. Linear Regression III

  15. Logistic Regression

  16. GLMs and Poisson Regression

A final short section (17) links the regression models back to session 11 on types of investigation. This section is optional reading and has no accompanying taught session.