8.4 Other (mis-)interpretations of p-values

8.4.1 P-values as decision rules

Traditionally, hypothesis tests have been thought of as a means to make decisions. In this paradigm, a cut-off (typically 0.05) is chosen. If the p-value is smaller than the chosen cut-off, the null hypothesis is rejected; otherwise, the null hypothesis is accepted. This leads to the terminology of:

  • “Type I error”, rejecting the null hypothesis when it is true

  • “Type II error”, accepting the null hypothesis when it is false

Linked to this approach is the habit of labelling p-values < 0.05 as “significant” and those larger as “non-significant”.
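The decision rule described above can be explored with a small simulation. The following is a minimal sketch, not a definitive implementation: it assumes a one-sample z-test with a known standard deviation, and the function names, sample size, number of simulated studies and seed are all our own illustrative choices. Every simulated study is generated with the null hypothesis true, so each rejection is a Type I error.

```python
import random
from statistics import NormalDist, fmean

def two_sided_p(sample, null_mean=0.0, sigma=1.0):
    """Two-sided p-value from a one-sample z-test (sigma assumed known)."""
    n = len(sample)
    z = (fmean(sample) - null_mean) / (sigma / n ** 0.5)
    return 2 * NormalDist().cdf(-abs(z))

def type_i_error_rate(alpha=0.05, n=25, n_studies=5000, seed=1):
    """Proportion of simulated studies that reject a true null hypothesis."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_studies):
        # Data are drawn with the null hypothesis true (population mean 0).
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if two_sided_p(sample) < alpha:
            rejections += 1
    return rejections / n_studies

rate = type_i_error_rate()
print(f"Proportion of true null hypotheses rejected: {rate:.3f}")
```

Under these assumptions the proportion of rejections should be close to the chosen cut-off of 0.05, which is what it means for the cut-off to control the Type I error rate.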

There are some instances where this decision-making paradigm seems appropriate. Some health data science research is indeed concerned with decision making. For example, we may wish to carry out a trial to assess whether a particular clinical decision support system improves clinicians’ ability to detect malignant tumours. However, much health data science research is not, at least directly, concerned with decision making. For example, if we carry out an epidemiological study in which we relate the risk of a particular disease to gender, we do this because we are interested in understanding the aetiology of the disease, not because we want to assess whether to modify gender! For this reason, many researchers regard p-values as a measure of strength of evidence against the null hypothesis, rather than as an aid to decision making.

In general, we do not advocate any approach which dichotomises p-values. There is very little difference, in terms of the information contained about the population parameter, between the two p-values of \(p=0.049\) and \(p=0.051\). Therefore, it seems counter-intuitive to make very different decisions based on these p-values.
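One way to see how similar these two p-values are is to convert each back to the test statistic that would have produced it. The sketch below assumes a two-sided test based on a standard normal test statistic; the helper name `z_from_two_sided_p` is our own.

```python
from statistics import NormalDist

def z_from_two_sided_p(p):
    """Absolute z-statistic whose two-sided p-value equals p."""
    return NormalDist().inv_cdf(1 - p / 2)

z_049 = z_from_two_sided_p(0.049)
z_051 = z_from_two_sided_p(0.051)

for p, z in [(0.049, z_049), (0.051, z_051)]:
    print(f"p = {p:.3f} corresponds to |z| = {z:.3f}")
print(f"Difference in |z|: {z_049 - z_051:.3f}")
```

Under this assumption the two test statistics are roughly 1.97 and 1.95. The observed estimates sit almost exactly the same number of standard errors from the null value, yet a strict 0.05 cut-off would label one "significant" and the other "non-significant".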

P-values represent an area of substantial philosophical controversy in statistics. We choose to interpret the p-value as a measure of strength of evidence against the null hypothesis. It should, however, be pointed out that some statisticians advocate strongly against this interpretation.

In much health data science research, we are interested in knowing more about a particular population parameter. Many health data scientists, therefore, choose to focus on obtaining and interpreting estimates and confidence intervals rather than calculating p-values.
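As an illustration of reporting an estimate and confidence interval alongside a p-value, the sketch below computes a risk difference between two groups. It assumes a simple Wald (normal approximation) interval with an unpooled standard error, which is only one of several possible methods. The function name and the counts are hypothetical and invented for illustration.

```python
from statistics import NormalDist

def risk_difference_summary(events_a, n_a, events_b, n_b, level=0.95):
    """Risk difference (A minus B) with a Wald CI and two-sided p-value."""
    risk_a = events_a / n_a
    risk_b = events_b / n_b
    diff = risk_a - risk_b
    # Unpooled standard error of the difference between two proportions.
    se = (risk_a * (1 - risk_a) / n_a + risk_b * (1 - risk_b) / n_b) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p_value = 2 * NormalDist().cdf(-abs(diff / se))
    return diff, (diff - z_crit * se, diff + z_crit * se), p_value

# Hypothetical counts: 30/200 events in group A, 18/200 in group B.
diff, (lower, upper), p_value = risk_difference_summary(30, 200, 18, 200)
print(f"Risk difference: {diff:.3f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
print(f"p-value: {p_value:.3f}")
```

With these made-up counts the p-value is a little above 0.05, but the interval shows that the data are compatible with anything from essentially no difference to a risk difference of around 12 percentage points. The p-value alone does not convey this.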

8.4.2 Misinterpretations of p-values

The p-value is the subject of a lot of argument, debate and controversy, both within the statistical world and beyond. The following points warn against some common misinterpretations and misuses of p-values:

Do not:
- believe that an association or effect exists just because it was statistically significant.
- conclude that an association or effect is absent just because it was not statistically significant.
- base conclusions solely on whether an association or effect was statistically significant or not.
- conclude anything about scientific or practical importance based on statistical significance (or lack thereof).
- interpret a p-value as the probability that chance alone produced the observed association or effect, or as the probability that the null hypothesis is true.

Importantly, statistical significance was never meant to imply scientific or clinical importance. As well as the p-value, always consider the estimate of the population parameter of interest and its confidence interval. These will often provide more insight than the p-value alone.
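The gap between statistical significance and practical importance can be illustrated with two hypothetical studies. The sketch below assumes two equal-sized groups with a common, known standard deviation and uses a normal approximation. The function name and all numbers (differences in blood pressure in mmHg) are invented for illustration only.

```python
from statistics import NormalDist

def mean_difference_summary(diff, sd, n_per_group, level=0.95):
    """Estimate, Wald CI and two-sided p-value for a difference in means.

    Assumes two equal-sized groups with a common, known standard deviation.
    """
    se = sd * (2 / n_per_group) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p_value = 2 * NormalDist().cdf(-abs(diff / se))
    return diff, (diff - z_crit * se, diff + z_crit * se), p_value

# Hypothetical summary data (mmHg), invented for illustration only.
large_study = mean_difference_summary(diff=0.5, sd=15, n_per_group=100_000)
small_study = mean_difference_summary(diff=5.0, sd=15, n_per_group=50)

for name, (est, (lo, hi), p) in [("large", large_study), ("small", small_study)]:
    print(f"{name} study: estimate = {est:.1f}, "
          f"95% CI = ({lo:.2f}, {hi:.2f}), p = {p:.2g}")
```

In this sketch the large study gives a very small p-value, yet its confidence interval lies entirely below 1 mmHg, a difference that many would regard as too small to matter clinically. The small study is "non-significant" at the 0.05 level, but its interval includes differences of more than 10 mmHg, so an important effect cannot be ruled out.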