8.3 Connection between p-values and confidence intervals
Recall that we previously used the following fact:
For a normal distribution, approximately 95% of observations are contained within 1.96 standard deviations of the mean.
Applied to sampling distributions, this tells us that:
For a normally distributed sampling distribution that is centred around the true population value, 95% of the estimates obtained under repeated sampling would be contained within 1.96 standard errors of the true population value.
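As a quick numerical check of both statements, the sketch below uses base R's `pnorm` to compute the probability of lying within 1.96 standard deviations of the mean, and then simulates repeated sampling of a sample mean. The simulation settings (`mu`, `sigma`, `n`, `n_rep` and the seed) are illustrative assumptions, not values taken from these notes.

```r
# Probability that a normal observation lies within 1.96 SDs of its mean
prob_within <- pnorm(1.96) - pnorm(-1.96)
prob_within

# Simulate repeated sampling (all settings below are illustrative assumptions)
set.seed(1)
mu    <- 10      # hypothetical true population mean
sigma <- 2       # hypothetical population standard deviation
n     <- 50      # hypothetical sample size
n_rep <- 10000   # number of repeated samples

# One estimate (the sample mean) from each repeated sample
estimates <- replicate(n_rep, mean(rnorm(n, mean = mu, sd = sigma)))

# Standard error of the sample mean (sigma is known here by construction)
se <- sigma / sqrt(n)

# Proportion of estimates within 1.96 standard errors of the true value
prop_within <- mean(abs(estimates - mu) <= 1.96 * se)
prop_within
```

Both quantities should come out close to 0.95. The simulated proportion varies slightly with the seed and the number of repetitions.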
Applying this to the estimator $\hat{\delta}$ leads to a 95% confidence interval of

$$\hat{\delta} \pm 1.96 \times SE(\hat{\delta})$$
The graph below shows some possible values of $\hat{\delta}$, along with their 95% confidence intervals. Taking zero as the null value, we see that:
- if $\hat{\delta}$ is exactly equal to $1.96 \times SE(\hat{\delta})$, then the 95% confidence interval just touches zero.
- if $\hat{\delta} > 1.96 \times SE(\hat{\delta})$, then the 95% confidence interval does not include zero: the whole interval lies above zero.
- if $0 < \hat{\delta} < 1.96 \times SE(\hat{\delta})$, then the 95% confidence interval does include zero.
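The three cases can be reproduced numerically. The sketch below uses the three estimates marked in the graph code for this section (0.2, 1.96 and 2.15) and assumes a standard error of 1, so that everything is measured in standard-error units. The variable names are only illustrative.

```r
se        <- 1                     # standard error (the graph works in SE units)
delta_hat <- c(0.2, 1.96, 2.15)    # the three estimates marked in the graph

# 95% confidence limits: estimate -/+ 1.96 standard errors
lower <- delta_hat - 1.96 * se
upper <- delta_hat + 1.96 * se

# TRUE when zero lies strictly inside the interval
zero_inside <- lower < 0 & upper > 0

data.frame(delta_hat, lower, upper, zero_inside)
```

Only the first interval has zero strictly inside it. The second has a lower limit of exactly zero, and the third lies entirely above zero.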
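So what p-values would these values of $\hat{\delta}$ result in? Under the null hypothesis the sampling distribution is centred on zero, and the two-sided p-value counts estimates at least as extreme as $\hat{\delta}$ in either tail:
- if $\hat{\delta} = 1.96 \times SE(\hat{\delta})$, then we know that 2.5% of the estimates lie above that point and a further 2.5% lie below $-1.96 \times SE(\hat{\delta})$, so $p = 0.05$.
- if $\hat{\delta} > 1.96 \times SE(\hat{\delta})$, then fewer than 2.5% of estimates lie above $\hat{\delta}$, so $p < 0.05$.
- if $0 < \hat{\delta} < 1.96 \times SE(\hat{\delta})$, then more than 2.5% of estimates lie above $\hat{\delta}$, so $p > 0.05$.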
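A minimal sketch of the same calculation in R, again assuming a standard error of 1, a null value of zero, and the three estimates marked in the graph code. The upper-tail area comes from `pnorm(..., lower.tail = FALSE)` and is doubled to give the two-sided p-value.

```r
se        <- 1                     # standard error (SE units, as in the graph)
delta_hat <- c(0.2, 1.96, 2.15)    # the three estimates marked in the graph

# Number of standard errors between each estimate and the null value (zero)
z <- delta_hat / se

# Proportion of the null sampling distribution lying above each estimate
upper_tail <- pnorm(z, lower.tail = FALSE)

# Two-sided p-value: both tails count as "at least as extreme"
p_value <- 2 * upper_tail

round(data.frame(delta_hat, upper_tail, p_value), 4)
```

The p-values are roughly 0.84, 0.05 and 0.03. The middle value is marginally below 0.05 only because 1.96 is a rounded version of the exact critical value `qnorm(0.975)`.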
This leads us to the connection between 95% confidence intervals and p-values. When a 95% confidence interval and p-value are obtained from the same sampling distribution (which is typically the case when both are presented), the following correspondence holds:
| P-value | 95% confidence interval |
|---|---|
| <0.05 | Excludes the null value |
| ≥0.05 | Contains the null value |
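The correspondence in the table can be checked over a range of estimates. The sketch below assumes a normal sampling distribution, a null value of zero and a standard error of 1. The grid of estimates is hypothetical, and the unrounded critical value `qnorm(0.975)` is used in place of 1.96 so that the two criteria match exactly.

```r
z_crit    <- qnorm(0.975)              # unrounded critical value, about 1.96
se        <- 1                         # assumed standard error
delta_hat <- seq(-4, 4, by = 0.05)     # hypothetical grid of estimates

# Two-sided p-value for each estimate against a null value of zero
p_value <- 2 * pnorm(abs(delta_hat) / se, lower.tail = FALSE)

# 95% confidence limits for each estimate
lower <- delta_hat - z_crit * se
upper <- delta_hat + z_crit * se

# The interval excludes the null value when it lies wholly above or below zero
excludes_null <- lower > 0 | upper < 0

# Does "p < 0.05" agree with "interval excludes the null value" everywhere?
agree <- all((p_value < 0.05) == excludes_null)
agree
```

Both criteria reduce to the same condition, namely that the estimate lies more than `z_crit` standard errors from zero, which is why they agree at every point on the grid.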
# Labels for graph
lab1 <- expression(- 2*SE)
lab2 <- expression(- 1*SE)
lab3 <- expression(1*SE)
lab4 <- expression(2*SE)
# Draw sampling distribution (x-axis in units of standard errors)
options(repr.plot.width=6, repr.plot.height=5)  # plot size when run in a Jupyter R kernel
x <- seq(-4, 4, by=.05)
plot(x, dnorm(x, 0, 1), col="blue", type="l",
     xaxt="n", xlab=" ", ylab="Density")
# Custom x-axis labelled in multiples of the standard error
axis(1, seq(-2, 2, by=1), labels=c(lab1, lab2, 0, lab3, lab4))
# True population value
abline(v=0, col="red")
# 1.96 SE from population value
abline(v=c(-1.96, 1.96), col="green", lty=2)
# Some 95% confidence intervals: point estimates, then estimate -/+ 1.96 SE
points(c(0.2, 1.96, 2.15), c(0.13, 0.03, 0.18), col = "orange")
lines(c(-1.76, 2.16), c(0.13, 0.13), col="orange")
lines(c(0, 3.92), c(0.03, 0.03), col ="orange")
lines(c(0.19, 4.11), c(0.18, 0.18), col ="orange")
# Labels for the three estimates
text(2.75, 0.08, expression(hat(delta)==1.96*SE))
text(-2.6, 0.25, expression(hat(delta)<1.96*SE))
text(2.95, 0.23, expression(hat(delta)>1.96*SE))
# Pointer lines joining each label to its estimate
lines(c(2.25, 3), c(0.185, 0.215), col="black")
lines(c(2.05, 2.8), c(0.035, 0.065), col="black")
lines(c(-2.4, 0.2), c(0.23, 0.14), col="black")
