7.2 Confidence intervals for the mean

7.2.1 Example

In the sample of 10 researchers, the estimate of the population mean age is the sample mean age, \(\hat{\mu} = 29.57\). Treating the population standard deviation \(\sigma = 4.8\) as known, the standard error of the mean is

\[ SE(\hat{\mu}) = \frac{\sigma}{\sqrt{n}} = \frac{4.8}{\sqrt{10}} = 1.52 \]
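The same arithmetic can be checked in R. This is a minimal sketch. The variable names `sigma`, `n` and `se` are illustrative choices, and \(\sigma = 4.8\) is treated as known, as in the calculation above.

```r
# Population standard deviation (treated as known) and sample size
sigma <- 4.8
n <- 10

# Standard error of the sample mean: sigma / sqrt(n)
se <- sigma / sqrt(n)

# Rounded to two decimal places, as in the text
round(se, 2)
```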

We have seen that the 95% confidence interval for the mean is calculated as

\[ \hat{\mu} \pm 1.96 \times SE(\hat{\mu}) \]

Substituting in the sample mean and the standard error gives

\[ 29.57 \pm 1.96 \times 1.52 \]

This gives the 95% confidence interval for the population mean age: \((26.6, 32.5)\).

The code below stores the data, prints the sample mean age and then calculates the lower and upper limits of the 95% confidence interval for the population mean age.

```r
# Our sample of data (ages for 10 sampled researchers)
ages <- c(28.1, 27.5, 25, 29.9, 29.7, 29.9, 39.9, 33.6, 21.3, 30.8)

# Sample mean (estimate of the population mean)
mean(ages)

# Display the lower and upper limits of the confidence interval,
# using the standard error of 1.52 calculated above
mean(ages) - 1.96 * 1.52
mean(ages) + 1.96 * 1.52
```

```
29.57
26.5908
32.5492
```

7.2.2 95% confidence interval for a mean

For random variables \(X_1, \ldots, X_n\) with \(X_i \overset{\text{iid}}{\sim} N(\mu, \sigma^2)\) for \(i = 1, \ldots, n\), where \(\sigma\) is known, a 95% confidence interval for \(\mu\) is given by:

\[ \hat{\mu} \pm 1.96 \ SE(\hat{\mu}) \]

where \(\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} X_i\) is the sample mean and its standard error is given by

\[ SE(\hat{\mu}) = \frac{\sigma}{\sqrt{n}} \]
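The general formula can be wrapped in a small R function. This is a sketch only. `ci_mean_known_sigma` is a hypothetical helper name, not part of any package. It uses the exact normal quantile `qnorm(0.975)` (about 1.959964) rather than the rounded 1.96, and an unrounded standard error. Its limits therefore differ from the hand calculation only in the second decimal place.

```r
# Hypothetical helper: confidence interval for a mean when sigma is known
ci_mean_known_sigma <- function(x, sigma, level = 0.95) {
  n <- length(x)
  se <- sigma / sqrt(n)               # standard error of the sample mean
  z <- qnorm(1 - (1 - level) / 2)     # normal quantile (about 1.96 for 95%)
  est <- mean(x)                      # sample mean, the estimate of mu
  c(lower = est - z * se, upper = est + z * se)
}

# Apply it to the ages example, treating sigma = 4.8 as known
ages <- c(28.1, 27.5, 25, 29.9, 29.7, 29.9, 39.9, 33.6, 21.3, 30.8)
ci <- ci_mean_known_sigma(ages, sigma = 4.8)
round(ci, 1)
```

Rounded to one decimal place, this reproduces the interval \((26.6, 32.5)\) from the example.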

The calculation of this confidence interval relies on two assumptions:

- the original random variables follow a normal distribution, and
- the value of \(\sigma\) is known.
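One way to see what "95% confidence" means under these two assumptions is a short simulation. The sketch below draws many samples from a normal population and records how often the interval \(\hat{\mu} \pm 1.96 \times SE(\hat{\mu})\) contains \(\mu\). All settings are illustrative choices for the sketch, not values taken from the text: a hypothetical population mean of 30, \(\sigma = 4.8\) treated as known, \(n = 10\), and 10,000 simulated samples.

```r
set.seed(1)

# Illustrative settings for the simulation (hypothetical population)
mu <- 30
sigma <- 4.8
n <- 10
n_sims <- 10000

# For each simulated sample, check whether the 95% interval contains mu
covered <- replicate(n_sims, {
  x <- rnorm(n, mean = mu, sd = sigma)   # normal data: first assumption holds
  se <- sigma / sqrt(n)                  # sigma known: second assumption holds
  lower <- mean(x) - 1.96 * se
  upper <- mean(x) + 1.96 * se
  lower <= mu && mu <= upper
})

# Proportion of intervals that contain mu; expected to be close to 0.95
mean(covered)
```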

However, even when these assumptions do not hold, we can usually still obtain valid confidence intervals:

- If the original random variables do not follow a normal distribution but the sample size is large, the Central Limit Theorem tells us that the sampling distribution of the mean is approximately normal. The formula above therefore still gives an approximately valid 95% confidence interval.
- If \(\sigma\) is unknown (which is typically the case), a modified confidence interval based on the t-distribution gives a correct interval. Essentially, we replace the number 1.96 above with a slightly larger number, to allow for the extra uncertainty from estimating \(\sigma\) by the sample standard deviation. For large sample sizes (\(n > 30\) or so), using the estimated standard deviation makes little difference. More detail is provided later in this session.
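Both points can be illustrated in R. This is a sketch under stated assumptions, not the method developed later in the session. Part (1) uses exponential data with rate 1 (mean 1, standard deviation 1), chosen here only as an example of a skewed distribution, with \(n = 100\). Part (2) uses the 97.5th percentile of the t-distribution with \(n - 1\) degrees of freedom, which is the standard multiplier for a one-sample t interval. It then applies that multiplier to the ages data with the sample standard deviation.

```r
set.seed(2)

# (1) Non-normal data with a large sample: exponential(rate = 1) has mean 1, sd 1
n <- 100
n_sims <- 10000
covered <- replicate(n_sims, {
  x <- rexp(n, rate = 1)
  se <- 1 / sqrt(n)                  # sigma = 1 treated as known
  abs(mean(x) - 1) <= 1.96 * se      # TRUE if mean(x) +/- 1.96*se contains 1
})
mean(covered)                        # close to 0.95 despite the skewed data

# (2) sigma unknown: t multipliers with n - 1 degrees of freedom exceed 1.96
n_values <- c(5, 10, 30, 100)
t_mult <- qt(0.975, df = n_values - 1)
round(t_mult, 3)                     # shrinks towards 1.96 as n grows

# t-based 95% interval for the ages sample, using the sample standard deviation
ages <- c(28.1, 27.5, 25, 29.9, 29.7, 29.9, 39.9, 33.6, 21.3, 30.8)
se_hat <- sd(ages) / sqrt(length(ages))
m <- qt(0.975, df = length(ages) - 1)
ci_t <- c(lower = mean(ages) - m * se_hat, upper = mean(ages) + m * se_hat)
round(ci_t, 1)
```

With only 10 observations, the t multiplier is noticeably larger than 1.96. The sample standard deviation also differs from the value 4.8 used earlier. As a result, this interval is wider than the known-\(\sigma\) interval \((26.6, 32.5)\).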

Statistics for Health Data Science
