8.3 Connection between p-values and confidence intervals

Recall that we previously used the following fact:

For a normal distribution, approximately 95% of observations are contained within 1.96 standard deviations of the mean.

Which, applied to sampling distributions, tells us that:

For a normally distributed sampling distribution that is centred around the true population value, 95% of the estimates obtained under repeated sampling would be contained within 1.96 standard errors of the true population value.
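The repeated-sampling statement can be checked with a small simulation. The sketch below assumes a hypothetical normal population and uses the sample mean as the estimator; the names `true_mean`, `true_sd` and `n`, and their values, are illustrative and not taken from the text.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
true_mean <- 10                   # hypothetical population mean
true_sd   <- 2                    # hypothetical population standard deviation
n         <- 50                   # hypothetical sample size
se        <- true_sd / sqrt(n)    # standard error of the sample mean

# Sample means from 10,000 repeated samples of size n
estimates <- replicate(10000, mean(rnorm(n, true_mean, true_sd)))

# Proportion of estimates within 1.96 standard errors of the true value
prop_within <- mean(abs(estimates - true_mean) <= 1.96 * se)
print(prop_within)

# Theoretical counterpart from the normal distribution
theoretical <- pnorm(1.96) - pnorm(-1.96)
print(theoretical)
```

The simulated proportion should land close to the theoretical value, with some simulation noise.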

Applying this to the estimator \(\hat{\delta}\) leads to a 95% confidence interval of

\[ \hat{\delta} \pm 1.96 \times SE(\delta) \]
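As a minimal sketch of this formula, the R snippet below computes the interval for a single estimate. The values of `delta_hat` and `se` are hypothetical, chosen only for illustration.

```r
delta_hat <- 2.15   # hypothetical estimate of delta
se        <- 1      # hypothetical standard error of the estimate

# 95% confidence interval: estimate plus or minus 1.96 standard errors
ci <- delta_hat + c(-1, 1) * 1.96 * se
print(ci)
```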

The graph below shows some possible values of \(\hat{\delta}\), along with their 95% confidence intervals. We see that:

if \(\hat{\delta}\) is exactly equal to the number \(1.96 \times SE(\delta)\) then the 95% confidence interval just touches zero.
if \(\hat{\delta} > 1.96 \times SE(\delta)\) then the 95% confidence interval does not include zero - the whole interval lies above zero.
if \(0 < \hat{\delta} < 1.96 \times SE(\delta)\) then the 95% confidence interval does include zero.
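The three cases can be checked numerically. The sketch below assumes a standard error of 1, so that estimates are expressed in standard-error units, and uses three illustrative estimates, one for each case.

```r
se        <- 1                    # assumed standard error (estimates in SE units)
delta_hat <- c(0.2, 1.96, 2.15)   # illustrative estimates, one per case

lower <- delta_hat - 1.96 * se    # lower limits of the 95% intervals
upper <- delta_hat + 1.96 * se    # upper limits of the 95% intervals

# An interval includes zero when its lower limit is at or below zero
# and its upper limit is at or above zero
includes_zero <- lower <= 0 & upper >= 0
print(data.frame(delta_hat, lower, upper, includes_zero))
```

The middle estimate is the boundary case: its lower limit is zero, so the interval just touches the null value.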

So what p-values would these values of \(\hat{\delta}\) result in?

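if \(\hat{\delta} = 1.96 \times SE(\delta)\) then, under the null hypothesis \(\delta = 0\), 2.5% of the estimates lie above that point and, by symmetry, a further 2.5% lie below \(-1.96 \times SE(\delta)\), so the two-sided p-value is p=0.05.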
if \(\hat{\delta} > 1.96 \times SE(\delta)\) then fewer than 2.5% of estimates lie above \(\hat{\delta}\), so p<0.05
if \(0 < \hat{\delta} < 1.96 \times SE(\delta)\) then more than 2.5% of estimates lie above \(\hat{\delta}\), so p>0.05
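These p-values can be computed directly from the normal distribution. The sketch below assumes a null value of zero, a standard error of 1, and three illustrative estimates; the two-sided p-value doubles the area in one tail beyond the observed estimate.

```r
se        <- 1                    # assumed standard error
delta_hat <- c(0.2, 1.96, 2.15)   # illustrative estimates, one per case

# Distance of each estimate from the null value (zero), in standard errors
z <- delta_hat / se

# Two-sided p-value: twice the tail area beyond |z| under the null
p_value <- 2 * pnorm(-abs(z))
print(round(p_value, 3))
```

Because 1.96 is a rounded multiplier, the p-value for the middle estimate is only approximately 0.05.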

This leads us to the connection between 95% confidence intervals and p-values. When a 95% confidence interval and p-value are obtained from the same sampling distribution (which is typically the case when both are presented), they are related as follows:

P-value	95% confidence interval
\(<0.05\)	Excludes the null value
\(\geq 0.05\)	Contains the null value
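The correspondence in the table can be verified over a range of estimates. The sketch below assumes a null value of zero and a standard error of 1, and uses the unrounded multiplier `qnorm(0.975)` in place of 1.96 so that the two criteria agree exactly; the grid of estimates is illustrative.

```r
se        <- 1                      # assumed standard error
z_crit    <- qnorm(0.975)           # unrounded 95% multiplier (about 1.96)
delta_hat <- seq(-4, 4, by = 0.05)  # illustrative grid of estimates

# Two-sided p-values against a null value of zero
p_value <- 2 * pnorm(-abs(delta_hat / se))

# Does each 95% confidence interval exclude the null value?
lower <- delta_hat - z_crit * se
upper <- delta_hat + z_crit * se
excludes_null <- lower > 0 | upper < 0

# TRUE if "p < 0.05" and "interval excludes the null value" always coincide
agree <- all(excludes_null == (p_value < 0.05))
print(agree)
```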

# Labels for graph
lab1 <- expression(- 2*SE)
lab2 <- expression(- 1*SE)
lab3 <- expression(1*SE)
lab4 <- expression(2*SE)

# Draw sampling distribution
options(repr.plot.width=6, repr.plot.height=5)
x <- seq(-4, 4, by=.05)
plot(x, dnorm(x, 0, 1), xaxt="n", xlab=" ", ylab="Density",
     col="blue", type="l")
axis(1, seq(-2, 2, by=1), labels=c(lab1, lab2, 0, lab3, lab4))

# True population value
abline(v=0, col="red")
# 1.96 SE from population value
abline(v=c(-1.96, 1.96), col="green", lty=2)

# Some 95% confidence intervals
points(c(0.2, 1.96, 2.15), c(0.13, 0.03, 0.18), col = "orange")

lines(c(-1.76, 2.16), c(0.13, 0.13), col="orange")
lines(c(0, 3.92), c(0.03, 0.03), col ="orange")
lines(c(0.19, 4.11), c(0.18, 0.18), col ="orange")

text(2.75, 0.08, expression(hat(delta)==1.96*SE))
text(-2.6, 0.25, expression(hat(delta)<1.96*SE))
text(2.95, 0.23, expression(hat(delta)>1.96*SE))

lines(c(2.25, 3), c(0.185, 0.215), col="black")
lines(c(2.05, 2.8), c(0.035, 0.065), col="black")
lines(c(-2.4, 0.2), c(0.23, 0.14), col="black")
