2.1 Application of Bayes’ Theorem

Bayes’ Theorem has important and powerful applications in medical statistics. One important link, which you will return to later, is its connection to Bayesian statistics. In this session, we focus on another common application of Bayes’ theorem, in the area of assessing the accuracy of screening tests and prognostic scores.

2.1.1 Screening tests and prognostic scores

Screening tests attempt to identify people with a particular condition or disease of interest. Babies are often screened for cystic fibrosis at birth, for example. Sometimes, screening tests attempt to identify high-risk people rather than those who already have the condition of interest. Cervical screening, for example, which is offered to all women and people with a cervix aged 25 to 64 in the UK by the National Health Service, is a test to help prevent cancer. It doesn’t look for existing cancer, but instead looks for certain viruses which can increase the subsequent risk of cancer. Similarly, prognostic tests or prognostic scores are used to identify a high-risk group.

Screening tests or prognostic scores can be based on one genetic marker, as in our example below, or many. Or they might incorporate information from other sources (e.g. biomarkers, family history of the disease). These processes typically result in a binary classification of “positive” or “negative”.

2.1.2 Bayes’ Theorem

Suppose we have an event \(A\) and a set of events \(B_1, B_2, \ldots, B_n\) that partition the sample space. Suppose we know the conditional probability of \(A\) given each event \(B_j\), i.e. we know \(P(A | B_j)\) for each \(j\). However, what we actually want to know is \(P(B_j | A)\).

Bayes’ Theorem provides a way of reversing the conditioning.

Bayes’ Theorem:

\[ P(B_{j}|A) = \frac{P(A|B_{j}) P(B_{j})}{P(A)} = \frac{P(A|B_{j}) P(B_{j})}{\sum^{n}_{k=1} P(A|B_{k}) P(B_{k})}. \]
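The theorem translates directly into code. The sketch below is illustrative (the function name `bayes` and the example probabilities are our own, not from the text): it computes the posterior \(P(B_j|A)\) for every \(j\) in one pass, using the law of total probability for the denominator.

```python
# Bayes' Theorem: reverse the conditioning P(A|B_j) -> P(B_j|A).
# prior[j] = P(B_j) and likelihood[j] = P(A|B_j), where the B_j partition
# the sample space (so the priors sum to 1).
def bayes(prior, likelihood):
    """Return the posterior probabilities P(B_j | A) for each j."""
    # P(A) by the law of total probability: sum_k P(A|B_k) P(B_k)
    p_a = sum(l * p for l, p in zip(likelihood, prior))
    return [l * p / p_a for l, p in zip(likelihood, prior)]

# A two-event partition (B_1 and its complement) with made-up probabilities
posterior = bayes(prior=[0.3, 0.7], likelihood=[0.8, 0.4])
print(posterior)  # the posterior probabilities sum to 1
```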

2.1.3 Example: Genetic marker in childhood cancer

In a population, 10% of people develop a particular childhood cancer. Of those who develop the cancer (\(C\)), 1 in 4 carry a genetic marker, \(M\), whereas of those who don’t develop the cancer, 1 in 10 carry \(M\). A newly born infant is tested for the genetic marker and is found to carry it. What is the probability that this infant will develop cancer?

The first couple of sentences tell us that \(P(C) = 0.1\), \(P(M|C) = 0.25\) and \(P(M | \bar{C})=0.1.\) Our interest lies in \(P(C|M)\). So we wish to reverse the conditioning. We can obtain this by applying Bayes’ Theorem:

\[ P(C|M) = \frac{P(M|C)P(C)}{P(M|C)P(C) + P(M|\bar{C})P(\bar{C})} \]

Substituting in the values above gives

\[ P(C|M) = \frac{0.25\times 0.1}{0.25\times 0.1 + 0.1\times 0.9}\approx 0.22 \]

\(P(C|M)\) is called the positive predictive value (PPV) of the test. It is the probability, given a positive test result, that the individual will actually develop the disease; i.e. in this case, there is a 22% chance that the infant will develop the disease, given that they tested positive.
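The calculation above can be checked numerically. This short sketch just substitutes the stated probabilities into Bayes’ Theorem (variable names are our own):

```python
# Genetic-marker example: P(C) = 0.1, P(M|C) = 0.25, P(M|not C) = 0.1
p_c = 0.1
p_m_given_c = 0.25
p_m_given_not_c = 0.1

# Denominator P(M) by the law of total probability
p_m = p_m_given_c * p_c + p_m_given_not_c * (1 - p_c)

# Bayes' Theorem gives P(C|M), the positive predictive value
ppv = p_m_given_c * p_c / p_m
print(round(ppv, 2))  # 0.22
```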

2.1.4 The confusion matrix

More generally, suppose we have a procedure that results in a binary classification (a binary prediction). This might be a screening test, which could be based on one or more genetic markers or biomarkers. It could be based on the output from a prognostic risk score or a label derived from an algorithm. Whatever the procedure, suppose we end up with a binary classification: “Positive” or “Negative”. In the health context, this terminology (positive/negative) might represent pairs such as: “Diseased” and “Undiseased”; “Dead” and “Alive” or “Hospitalised” and “Not hospitalised”. We can contrast the binary classification with the (binary) true status. In the general discussion below, we will also use the terms positive and negative to denote the two possible true statuses.

The following table is often called a confusion matrix, or sometimes an error matrix. The values \(A\), \(B\), \(C\) and \(D\) are the numbers in each category. The name comes from the fact that the table allows you to see whether the classification procedure is “confusing” two categories.

| Classification | Truth: Positive | Truth: Negative |
|----------------|-----------------|-----------------|
| Positive       | \(A\)           | \(B\)           |
| Negative       | \(C\)           | \(D\)           |

Two groups of people were correctly classified:

  • True Positives. The \(A\) individuals are people who are, in truth, positive (for the disease or outcome of interest) and were classified as positive. So they are often called true positives.

  • True Negatives. The \(D\) individuals are people who are, in truth, negative and were classified as negative.

Two groups of people were incorrectly classified:

  • False Positives. The \(B\) individuals are people who are, in truth, negative but were incorrectly classified as positive. These are sometimes called Type I errors.

  • False Negatives. The \(C\) individuals are people who are, in truth, positive but were incorrectly classified as negative. These are sometimes called Type II errors.
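Given paired true and predicted labels, the four counts can be tallied directly. A minimal sketch (the labels below are made up for illustration, with 1 = positive and 0 = negative):

```python
# Building the confusion-matrix counts A, B, C, D from paired labels.
truth      = [1, 1, 0, 0, 1, 0, 0, 1]
classified = [1, 0, 0, 1, 1, 0, 0, 0]

pairs = list(zip(truth, classified))
A = sum(t == 1 and c == 1 for t, c in pairs)  # true positives
B = sum(t == 0 and c == 1 for t, c in pairs)  # false positives (Type I)
C = sum(t == 1 and c == 0 for t, c in pairs)  # false negatives (Type II)
D = sum(t == 0 and c == 0 for t, c in pairs)  # true negatives
print(A, B, C, D)  # 2 1 2 3
```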

Now let us imagine the same table but with joint probabilities rather than numbers from a sample. So, for instance, \(p_A\) is the joint probability of being classified as positive and being truly positive for the outcome.

| Prediction | Truth: Positive | Truth: Negative |
|------------|-----------------|-----------------|
| Positive   | \(p_A\)         | \(p_B\)         |
| Negative   | \(p_C\)         | \(p_D\)         |

We can obtain estimates of various useful quantities from this matrix. We will use the following notation: \(O\) represents being, in truth, positive for the outcome of interest, and \(\bar{O}\) represents being truly negative. \(P\) represents being classified as positive and \(\bar{P}\) being classified as negative.

Below, we define various useful quantities.

The prevalence of the outcome is:

\[ P(O) = \frac{p_A+p_C}{p_A+p_B+p_C+p_D} \]

Prevalence is another word for risk or proportion. It tells us the fraction of the population of interest who have the outcome.

The sensitivity is:

\[ P(P|O)=\frac{p_A}{p_A+p_C} \]

This is a property of the test: the sensitivity remains the same irrespective of how common or rare the outcome is. The terminology comes from the setting of clinical tests, i.e. how sensitive the test is to the presence of the disease. In the fields of machine learning and computer science it is often called the recall, and it is also sometimes referred to as the true positive rate.

The specificity is:

\[ P(\bar{P}|\bar{O})=\frac{p_D}{p_B+p_D} \]

Like the sensitivity, this is a property of the test: the specificity remains the same irrespective of how common or rare the outcome is. The terminology again comes from the setting of clinical tests, i.e. how specific the test is to this disease (versus other diseases). A test which is only positive for this specific disease is very specific; a test which also picks up the presence of other similar diseases is not very specific. The specificity is sometimes called the selectivity or the true negative rate.

The PPV is:

\[ P(O|P) = \frac{p_A}{p_A+p_B} \]

PPV stands for Positive Predictive Value, shorthand for the predictive value of a positive classification; this is very common terminology in clinical settings. In machine learning it is typically called the precision. This quantity is often of most interest to the person having the test: it answers the question “what is the probability that I have (or will have) the outcome, given that I have just received a positive classification?”

The accuracy is:

\[ P(\mbox{correct classification}) = \frac{p_A + p_D}{p_A+p_B+p_C+p_D} \]

The accuracy is less commonly used in medical settings but is widely used elsewhere.
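All five quantities follow directly from the confusion-matrix entries. Since each formula is a ratio, it can be computed equally well from counts or from joint probabilities. A sketch with illustrative counts (not taken from the text):

```python
# Quantities from a confusion matrix with counts:
# A = true positives, B = false positives, C = false negatives, D = true negatives.
# These counts are made up for illustration.
A, B, C, D = 80, 90, 20, 810
total = A + B + C + D

prevalence  = (A + C) / total  # P(O)
sensitivity = A / (A + C)      # P(P|O): recall / true positive rate
specificity = D / (B + D)      # P(not-P | not-O): true negative rate
ppv         = A / (A + B)      # P(O|P): precision
accuracy    = (A + D) / total  # P(correct classification)

print(prevalence, sensitivity, specificity, ppv, accuracy)
```

Note how a test with high sensitivity (0.8) and high specificity (0.9) can still have a modest PPV (about 0.47) when the outcome is rare, echoing the genetic-marker example above.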