8.5 Calculating p-values

8.5.1 Example: Calculation of the p-value

In the emotional distress example, our difference in sample means is \(\hat{\delta} = -0.892\). We are interested in what the distribution of the difference in sample means would look like under repeated sampling if the null hypothesis were true. The null hypothesis states that \(\delta = 0\). Therefore, under the null hypothesis,

\[ \hat{\delta} \sim N(0, 0.507^2) \]

The easiest way to calculate the p-value is to standardise the estimator so that it follows a standard normal distribution, i.e.

\[ Z = \frac{\hat{\delta}}{0.507} \sim N(0, 1) \]

In our sample, we get a value of \(Z=-0.892/0.507 = -1.76\). The p-value is defined as

\[ p = Pr( | \hat{\delta} | \geq 0.892) = Pr( | Z | \geq 1.76) \]

The standard normal distribution is symmetric, so this is equal to \(2 \times Pr(Z \geq 1.76)\). This probability can be obtained from the standard normal cumulative distribution function, which is built into all standard statistical software.

# Manual calculation of p-value: 
2*(1-pnorm(1.76))

0.0784078065749654
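
As a minimal sketch of why standardising is only a convenience, the same p-value can be computed either from \(Z\) or directly from the null distribution \(N(0, 0.507^2)\). The variable names below are illustrative. The result differs very slightly from the value above because \(Z\) is not rounded to 1.76 here.

```r
# Observed difference in sample means and its standard error under the null
delta_hat <- -0.892
se <- 0.507

# Route 1: standardise, then use the standard normal distribution
z <- delta_hat / se
p_standardised <- 2 * pnorm(-abs(z))

# Route 2: use the N(0, se^2) null distribution directly
p_direct <- 2 * pnorm(-abs(delta_hat), mean = 0, sd = se)

c(z = z, p_standardised = p_standardised, p_direct = p_direct)
```

Both routes give the same two-sided p-value, because dividing by the standard error only rescales the axis on which the tail areas are measured.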

8.5.2 Approximate tests in large samples

More generally, suppose that the random variable used to calculate our p-value (above, the random variable was the difference in sample means) is denoted by \(R\) and that it has an expected value and variance (under the null hypothesis) denoted by \(E(R)\) and \(Var(R)\). Then define:

\[ Z = \frac{R - E(R)}{\sqrt{Var(R)}} = \frac{R - E(R)}{SE(R)} \]

where \(SE(R)\) is the standard error of \(R\) (the standard deviation of the sampling distribution; alternatively the square root of the variance of \(R\)). To simplify this even further, in many cases, as for the difference in sample means, \(E(R) = 0\).

Thanks to the Central Limit Theorem, in many situations (for example, when \(R\) is a sample mean or a difference in sample means), as the sample size \(n\) becomes large, the distribution of \(Z\) tends towards a standard normal distribution:

\[ Z \xrightarrow{d} N(0, 1) \quad \text{as } n \rightarrow \infty. \]

The standard normal distribution can then be used to calculate the two-sided p-value, as above.
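
The large-sample approximation can be checked with a small simulation. The sketch below is illustrative only: the choice of an exponential distribution (a skewed distribution with mean 1 and variance 1), the sample size and the number of simulations are all assumptions made for this example, not part of the emotional distress study.

```r
set.seed(1)

n <- 200          # sample size in each simulated study (illustrative)
n_sims <- 10000   # number of simulated studies (illustrative)

# R is the sample mean of n draws from Exp(1), so E(R) = 1 and SE(R) = 1 / sqrt(n)
z <- replicate(n_sims, {
  x <- rexp(n, rate = 1)
  (mean(x) - 1) / (1 / sqrt(n))
})

# If Z is approximately N(0, 1), about 5% of |Z| values should be at least 1.96
prop_extreme <- mean(abs(z) >= 1.96)
prop_extreme
```

Even though the individual observations are far from normal, the proportion of standardised means beyond \(\pm 1.96\) should be close to 0.05.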

8.5.3 The two-sample t-test

Let us return to the comparison in population means between two groups. When, as is more typical, we do not know the value of \(\sigma\), we need to replace it with an estimate from our sample, \(\hat{\sigma}\). Typically we use an estimate based on the sample standard deviations in the two groups, \(s_1\) and \(s_0\):

\[ \hat{\sigma}^2 = \frac{(n_1 - 1) s_1^2 + (n_0 - 1) s_0^2}{n_1 + n_0 - 2} \]
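
As a sketch, the pooled estimate can be computed in R directly from this formula. The data vectors are copied from the R code later in this section; the other variable names are illustrative.

```r
# Emotional distress scores (copied from the data used later in this section)
dist0 <- c(5, 2, 5, 7, 6, 7, 7, 5, 8, 6, 6, 9, 4, 5, 9, 7, 9, 5, 6, 10, 9, 4, 6, 6, 5, 7)
dist1 <- c(5, 5, 6, 6, 1, 5, 10, 7, 3, 6, 7, 8, 6, 7, 5, 4, 5, 6, 4, 6, 3, 5)

n0 <- length(dist0)
n1 <- length(dist1)

# Pooled variance: weighted average of the two sample variances
sigma2_hat <- ((n1 - 1) * var(dist1) + (n0 - 1) * var(dist0)) / (n1 + n0 - 2)
sigma_hat <- sqrt(sigma2_hat)
sigma_hat
```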

For our sample of data, \(\hat{\sigma} = 1.873\). The sampling distribution we used above involves the true population standard deviation \(\sigma\):

\[ \hat{\delta} \sim N\left(\delta, \sigma^2 \left(\frac{1}{n_1} + \frac{1}{n_0} \right) \right) \]

Similarly, the equivalent standardised version of the sampling distribution (which is easier to modify for our current purposes) involves \(\sigma\):

\[ \frac{\hat{\delta} - \delta}{\sigma \sqrt{\left(\frac{1}{n_1} + \frac{1}{n_0}\right) }}\sim N(0,1) \]

This is only approximately true if we substitute the sample estimate \(\hat{\sigma}\) into the equation. A little more algebra (not shown here), however, gives us an exact distribution when the outcome is normally distributed with the same variance in each group:

\[ \frac{\hat{\delta} -\delta}{\hat{\sigma} \sqrt{\frac{1}{n_1} + \frac{1}{n_0}}} \sim t_{n_1 + n_0 - 2} \]

Under the null hypothesis, \(\delta = 0\), giving

\[ T = \frac{\hat{\delta}}{\hat{\sigma} \sqrt{\frac{1}{n_1} + \frac{1}{n_0}}} \sim t_{n_1 + n_0 - 2} \]

Substituting in the numbers from our sample of data,

\[ T = \frac{-0.892}{1.873 \sqrt{\frac{1}{22} + \frac{1}{26}}} \]

gives \(t = -1.644\) (remembering that \(T\) is the random variable and \(t\) is the realised (observed) value of that statistic). T-distributions are symmetric around zero, so we take "at least as extreme as" to mean less than -1.64 or greater than +1.64, which has twice the probability of being less than -1.64. We simply need to calculate this probability for a t-distribution with 46 degrees of freedom (where we obtained 46 as \(n_1 + n_0 - 2\)).
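
The arithmetic for the statistic can be sketched in R from the summary numbers quoted in the text. The variable names are illustrative. The last line shows, for comparison only, the p-value that the standard normal approximation would give for the same statistic.

```r
# Summary numbers quoted in the text
delta_hat <- -0.892
sigma_hat <- 1.873
n1 <- 22
n0 <- 26

# Estimated standard error and observed t statistic
se_hat <- sigma_hat * sqrt(1 / n1 + 1 / n0)
t_obs <- delta_hat / se_hat
df <- n1 + n0 - 2

# Two-sided p-values: exact t-distribution versus normal approximation
p_t <- 2 * pt(-abs(t_obs), df)
p_normal <- 2 * pnorm(-abs(t_obs))

c(t = t_obs, df = df, p_t = p_t, p_normal = p_normal)
```

The t-distribution has heavier tails than the standard normal, so its p-value is slightly larger; the gap shrinks as the degrees of freedom increase.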

The code below performs this calculation and then uses R's built-in t.test function to obtain the same p-value.

# Manual calculation of p-value
2 * pt(-1.644, 46)

# Read in data (emotional distress scores in control and intervention group)
dist0 <- c(5, 2, 5, 7, 6, 7, 7, 5, 8, 6, 6, 9, 4, 5, 9, 7, 9, 5, 6, 10, 9, 4, 6, 6, 5, 7)
dist1 <- c(5, 5, 6, 6, 1, 5, 10, 7, 3, 6, 7, 8, 6, 7, 5, 4, 5, 6, 4, 6, 3, 5)

# t-test using R's built-in t.test function
dist <- c(dist0, dist1)
gp <- c(rep(0, 26), rep(1, 22))

t.test(dist ~ gp, var.equal = TRUE)

0.106994541315052

	Two Sample t-test

data:  dist by gp
t = 1.6435, df = 46, p-value = 0.1071
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.2004223  1.9836391
sample estimates:
mean in group 0 mean in group 1 
       6.346154        5.454545 

Rounding to 2 decimal places, the p-value is 0.11.

In the output from t.test, the line

t = 1.6435, df = 46, p-value = 0.1071

tells us that the value of the statistic \(T\) above is \(t=1.64\) in this sample and that the reference distribution is a t-distribution on 46 degrees of freedom. The sign is positive, rather than the \(-1.64\) we calculated by hand, because R has taken the difference as group 0 minus group 1; the two-sided p-value is unaffected. We are also given a 95% confidence interval for the population difference in means (again, group 0 minus group 1): (-0.20 to 1.98). As we noted above, when the p-value is >0.05 the null value (here, zero) will be included in the 95% confidence interval.
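
As a sketch of how the pieces fit together, the confidence interval can be rebuilt by hand from the estimate, the estimated standard error and a t-distribution quantile. Here the two groups are passed to t.test as separate vectors, with group 1 first, so that the difference has the same direction as \(\hat{\delta}\) in the text; the variable names are illustrative.

```r
dist0 <- c(5, 2, 5, 7, 6, 7, 7, 5, 8, 6, 6, 9, 4, 5, 9, 7, 9, 5, 6, 10, 9, 4, 6, 6, 5, 7)
dist1 <- c(5, 5, 6, 6, 1, 5, 10, 7, 3, 6, 7, 8, 6, 7, 5, 4, 5, 6, 4, 6, 3, 5)

# t-test for (mean of group 1) minus (mean of group 0)
fit <- t.test(dist1, dist0, var.equal = TRUE)

# Rebuild the 95% confidence interval by hand
n1 <- length(dist1)
n0 <- length(dist0)
df <- n1 + n0 - 2
delta_hat <- mean(dist1) - mean(dist0)
sigma_hat <- sqrt(((n1 - 1) * var(dist1) + (n0 - 1) * var(dist0)) / df)
se_hat <- sigma_hat * sqrt(1 / n1 + 1 / n0)
ci_manual <- delta_hat + c(-1, 1) * qt(0.975, df) * se_hat

fit$statistic            # t statistic, now negative
fit$p.value              # same two-sided p-value as before
as.numeric(fit$conf.int) # interval reported by t.test
ci_manual                # interval rebuilt by hand
```

Reversing the direction of the difference flips the sign of the statistic and of both confidence limits, but leaves the p-value unchanged.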

8.5.4 Other hypothesis tests

You will meet many types of hypothesis tests over your statistical studies. Many, like the t-test above, are constructed around a particular estimator and so there is a nice connection between the estimate, the 95% confidence interval and the p-value from the hypothesis test. Where this is the case, it is good practice to present the estimate and confidence interval alongside the p-value, since they contain much more information than the p-value alone.

In other cases, tests can be constructed without a specific parameter being estimated. A very commonly used example is the chi-squared test, which tests the null hypothesis of no association between two unordered categorical variables. This test does not directly invoke the sampling distribution of an estimator, so typically only the p-value is presented, rather than an estimate and confidence interval.
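
A minimal sketch of a chi-squared test in R is shown below. The counts in the table are invented purely for illustration and do not come from any study in these notes. Setting `correct = FALSE` gives the plain Pearson chi-squared statistic; by default, R's chisq.test applies a continuity correction to 2x2 tables.

```r
# Hypothetical 2x2 table of counts (invented for illustration only)
tab <- matrix(c(20, 10, 15, 25), nrow = 2,
              dimnames = list(exposure = c("yes", "no"),
                              outcome = c("yes", "no")))

# Pearson chi-squared test of no association between the two variables
chi_fit <- chisq.test(tab, correct = FALSE)

chi_fit$expected   # counts expected under the null hypothesis
chi_fit$statistic  # chi-squared statistic
chi_fit$parameter  # degrees of freedom
chi_fit$p.value    # p-value
```

Note that the output contains a test statistic and a p-value but no estimate or confidence interval, in contrast to the t-test output above.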

In general, hypothesis testing is a controversial and widely misunderstood area of frequentist statistics. Where possible, focusing on estimating parameters along with confidence intervals can avoid some of the more damaging misuses of p-values.

Statistics for Health Data Science
