Welcome to Statistics for Health Data Science

These notes provide the core material for the MSc module, Statistics for Health Data Science.

This is a compulsory module for the MSc Health Data Science programme. The module introduces the key statistical concepts and methods for health data science. Topics covered include probability, initial data description and exploration, frequentist and Bayesian approaches to statistical inference, and regression modelling. These topics provide the framework needed for subsequent modules. The module emphasises learning through practical examples and incorporates directed learning, lectures, group discussion, and computer practical exercises.

1.1 Overall aim of the module

The overall module aims are to introduce:

the motivation and critical thinking needed to answer a question in health science by interrogating data and drawing conclusions from evidence;
the principles of probability, regression modelling and statistical inference within frequentist and Bayesian frameworks.

1.2 Module Intended Learning Outcomes

Upon successful completion of the module you will be able to:

evaluate the application of different probability distributions to model health data (including Poisson, Binomial and Normal);
critically analyse frameworks for frequentist and Bayesian inference and evaluate their strengths, limitations and differences;
examine the concepts of sampling variability, estimators, bias, confidence intervals and credible intervals;
examine the theoretical basis of linear regression and generalized linear models;
assess the application of regression modelling to address specific health data science questions;
critically evaluate strengths and limitations of different statistical methods, including regression models, within a health data science project;
draw conclusions from the results of a data analysis and justify those conclusions, appropriately acknowledging uncertainty in the results.

1.3 Module Content

The module is split into 16 taught sessions, each building statistical knowledge for health data science. The sessions are:

Introduction
Probability and Discrete Probability Distributions
Continuous Probability Distributions
Populations and Sampling
Likelihood
Maximum Likelihood Estimation
Frequentist Inference I
Frequentist Inference II
Bayesian Statistics I
Bayesian Statistics II
Types of Investigation
Linear Regression I
Linear Regression II
Linear Regression III
Logistic Regression
GLMs and Poisson Regression

A final short section (17) links the regression models back to the session on types of investigation. This is optional reading and does not have an accompanying taught session.