4.2 Statistical models

To extract information about population quantities from sample statistics, we need a precise and formal description of the whole sampling process from population to sample. This description is called the statistical model. Relevant features of the population are represented by parameters, such as the mean, variance, or correlation. The structure of the population, together with the sampling process, allows a model to be formulated that describes the statistical behaviour of the sample.

The crucial importance of the statistical model is that, given a certain value of the population parameter (in the simple case where there is only one parameter of interest), it allows us to calculate the probability of drawing a sample with the properties we observe. This in turn lets us quantify the compatibility between the observed data and possible values of the population parameter.

4.2.1 Example: a statistical model

We will now write down a formal statistical model for the (sub-sample from the) emotional distress study. Remember that \(X_1, \dots, X_{10}\) are random variables representing the ages of 10 sampled researchers and \(x_1, \dots, x_{10}\) are the realised values of these random variables (i.e. the observed ages).

We will assume that each random variable is drawn from the same population distribution, and that the observations are independent of each other. We use the term independent and identically distributed, often abbreviated as iid, as a succinct way of describing these assumptions.
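
One practical consequence of these two assumptions is that the joint density of the sample factorises into a product of identical terms. As a sketch, writing \(f\) for the common population density (a symbol introduced here purely for illustration):

\[ f_{X_1, \dots, X_{10}}(x_1, \dots, x_{10}) = \prod_{i=1}^{10} f(x_i) \]

Independence gives the product, and identical distribution means the same \(f\) appears in every factor.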

Finally, we will assume that the ages of violence researchers in the wider population follow a normal distribution with population mean \(\mu\) and population variance \(\sigma^2\).

This model can be compactly written as follows:

\[ X_i \overset{\small{iid}}{\sim} N(\mu, \sigma^2), \qquad i=1,2,\dots,10 \]
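
To make the model more concrete, the following minimal Python sketch simulates one sample of 10 ages from it and then evaluates how compatible that sample is with a few candidate values of \(\mu\). The parameter values (`MU = 40.0`, `SIGMA = 8.0`), the candidate means, and the helper names `simulate_sample` and `log_density` are all assumptions made for illustration; they do not come from the emotional distress study.

```python
import math
import random
from statistics import NormalDist

# Hypothetical parameter values, chosen only for illustration.
MU = 40.0      # assumed population mean age (years)
SIGMA = 8.0    # assumed population standard deviation (years)
N = 10         # sample size, as in the model above


def simulate_sample(mu, sigma, n, seed=0):
    """Draw n independent values from N(mu, sigma^2)."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]


def log_density(sample, mu, sigma):
    """Joint log-density of an iid normal sample.

    Under the iid assumption this is the sum of the
    log-densities of the individual observations.
    """
    dist = NormalDist(mu, sigma)
    return sum(math.log(dist.pdf(x)) for x in sample)


# One simulated realisation x_1, ..., x_10 of X_1, ..., X_10.
ages = simulate_sample(MU, SIGMA, N)

# Compare the same sample against several candidate population means.
for candidate_mu in (30.0, 40.0, 50.0):
    print(candidate_mu, round(log_density(ages, candidate_mu, SIGMA), 2))
```

Larger (less negative) values of the log-density indicate candidate means that are more compatible with the simulated sample. Changing the `seed` argument gives a different sample from the same model, which illustrates the sample-to-sample variability that the model describes.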

Statistics for Health Data Science
