Statistical Inference
This section of the notes concerns a really important part of statistics: statistical inference.
Statistical analysis is often separated into two types: descriptive and inferential. Descriptive statistics attempt to describe the data at hand (the sample). Inferential statistics go further, attempting to use the data at hand to make statements about a wider population.
There is more than one framework for statistical inference. The traditional and most widely used is the frequentist or “classical” approach. An important alternative, the Bayesian approach, is increasingly influential.
Overview of the statistical inference sessions
This section of the notes comprises 7 sessions:
Population and samples
Likelihood (x2)
Frequentist inference (x2)
Bayesian inference (x2)
The first session introduces the concept of statistical inference, defining populations, samples and estimators. The second half of the session introduces sampling distributions, a fundamental building block for frequentist inference. The sampling distribution describes how our estimate of the unknown population quantity behaves under repeated sampling: how different our estimate might have been, had we selected a different sample. One particular feature of the sampling distribution, the standard error, quantifies this variability: the amount by which we might expect our estimate to change if we took a different sample.
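As a sketch of this idea, the simulation below uses a hypothetical population (exponential with mean 2, chosen purely for illustration). It draws many samples, computes the sample mean of each, and compares the spread of those estimates with the theoretical standard error sigma / sqrt(n):

```python
import numpy as np

# Hypothetical population for illustration: exponential with mean 2
rng = np.random.default_rng(0)
n = 50          # size of each sample
reps = 10_000   # number of repeated samples

# Draw many samples and compute the estimate (the sample mean) for each;
# the resulting collection of estimates approximates the sampling distribution
estimates = rng.exponential(scale=2.0, size=(reps, n)).mean(axis=1)

# The standard deviation of the estimates across samples is the (empirical)
# standard error; theory gives sigma / sqrt(n) = 2 / sqrt(50) ~ 0.283
print("empirical standard error:", round(estimates.std(ddof=1), 3))
print("theoretical standard error:", round(2.0 / np.sqrt(n), 3))
```

The two printed values should agree closely, illustrating that the standard error describes between-sample variability of the estimator.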
The idea of sampling distributions is crucial to frequentist inference, but while it gives us important information about how our estimate behaves under repeated sampling, it does not provide a recipe for choosing an estimator. Maximum likelihood estimation (MLE), the subject of the following two sessions, does exactly this. Given a statistical model for the data, MLE provides a method for choosing an estimator with desirable statistical properties.
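As a minimal sketch of MLE (the data and the Bernoulli model here are assumptions for illustration), the code below maximises the log-likelihood numerically over a grid of candidate values and checks the result against the closed-form MLE for this model, the sample proportion:

```python
import numpy as np

# Hypothetical data for illustration: 100 Bernoulli trials with true p = 0.3
rng = np.random.default_rng(1)
data = rng.binomial(1, 0.3, size=100)

def neg_log_lik(p):
    # Negative Bernoulli log-likelihood: -sum of x*log(p) + (1-x)*log(1-p)
    return -np.sum(data * np.log(p) + (1 - data) * np.log(1 - p))

# Maximise the likelihood (minimise its negative) over a fine grid
grid = np.linspace(0.001, 0.999, 999)
p_mle = grid[np.argmin([neg_log_lik(p) for p in grid])]

# For the Bernoulli model the MLE has a closed form: the sample proportion
print("grid-search MLE:", round(p_mle, 3))
print("sample proportion:", round(data.mean(), 3))
```

In practice the maximisation is usually done analytically or with a numerical optimiser rather than a grid, but the grid makes the "choose the parameter value that makes the observed data most likely" idea explicit.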
Having explored MLE to obtain our estimator, we return to the idea of sampling distributions in the following two frequentist inference sessions. We see how the idea of sampling distributions allows us to create confidence intervals, which are ranges of values of the population quantity which we believe are consistent with the observed data. A complementary frequentist inference tool, hypothesis testing, allows us to assess the evidence against a null hypothesis, which proposes a specific value (or range of values) for the unknown population parameter.
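As a rough sketch of these two tools (the sample and the null value below are hypothetical, and the normal approximation is assumed), the following computes a 95% confidence interval for a population mean and the z statistic for testing a null hypothesis about that mean:

```python
import numpy as np

# Hypothetical sample for illustration: 40 measurements
rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=1.5, size=40)

est = x.mean()                          # estimate of the population mean
se = x.std(ddof=1) / np.sqrt(len(x))    # estimated standard error

# 95% confidence interval using the normal approximation: estimate +/- 1.96 * SE
ci = (est - 1.96 * se, est + 1.96 * se)

# Hypothesis test of H0: mu = 5 via the z statistic (est - mu0) / se;
# values of |z| above about 1.96 indicate evidence against H0 at the 5% level
z = (est - 5.0) / se
print("95% CI:", (round(ci[0], 2), round(ci[1], 2)), "z:", round(z, 2))
```

Note how both tools are built from the same ingredients: the estimate and its standard error, i.e. features of the sampling distribution.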
Thus far, our attention has been largely on the frequentist paradigm. The last two sessions focus instead on an important alternative approach, Bayesian inference. In this paradigm we do not base our inference on the idea of repeated sampling. Instead, we use the likelihood to update prior information (in the form of a probability distribution) about the unknown parameter, to provide a posterior distribution for the unknown parameter. The posterior can be summarised by obtaining its mean, or a credible interval (an interval within which the unknown parameter lies with a particular probability).
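A minimal sketch of this updating step, using the conjugate Beta-Binomial model with hypothetical data (7 successes in 20 trials, chosen for illustration): a uniform Beta(1, 1) prior combined with binomial data gives a Beta posterior, from which the posterior mean and a 95% credible interval follow directly:

```python
from scipy import stats

# Hypothetical data for illustration: 7 successes in 20 trials
successes, n = 7, 20

# Beta(1, 1) prior (uniform) on the unknown proportion; with binomial data
# the posterior is conjugate: Beta(1 + successes, 1 + failures)
a_post = 1 + successes
b_post = 1 + (n - successes)

# Posterior mean of a Beta(a, b) distribution is a / (a + b)
post_mean = a_post / (a_post + b_post)

# 95% credible interval: the central interval containing 95% posterior probability
ci = stats.beta.ppf([0.025, 0.975], a_post, b_post)

print("posterior mean:", round(post_mean, 3))
print("95% credible interval:", ci.round(3))
```

Unlike a confidence interval, the credible interval has a direct probability interpretation: given the prior and the model, the unknown parameter lies in this interval with probability 0.95.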