10.1 Example: CD4 cell counts

In this session, we will use a dataset on CD4 cell counts, which is available in R through the boot package. CD4 cells are white blood cells that form part of our immune system. Because HIV infection destroys these cells, CD4 cell counts are used in patients with HIV to assess the health of their immune system and their susceptibility to opportunistic infections.

In this dataset, there are 20 patients with HIV. Their CD4 cell counts are recorded before and after they were put on treatment. We wish to investigate whether this treatment increased their CD4 cell counts.

We load the boot package, where the data are stored, and look at the data. Note that CD4 cell counts are recorded in units of 100 \(\text{cells/mm}^3\). We are interested in the difference in CD4 cell counts before and after treatment, so we compute this difference for each patient and look at its summary statistics.

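# Load the boot package, which provides the cd4 data frame
library(boot)
# Change in CD4 count for each patient: one year minus baseline
ydata <- cd4$oneyear - cd4$baseline
# Append the differences to the original data as column y
data <- cbind(cd4, y=ydata)
# Print the full dataset, then the summary statistics of the differences
data
summary(ydata)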

baseline	oneyear	y
2.12	2.47	0.35
4.35	4.61	0.26
3.39	5.26	1.87
2.51	3.02	0.51
4.04	6.36	2.32
5.10	5.93	0.83
3.77	3.93	0.16
3.35	4.09	0.74
4.10	4.88	0.78
3.35	3.81	0.46
4.15	4.74	0.59
3.56	3.29	-0.27
3.39	5.55	2.16
1.88	2.82	0.94
2.56	4.23	1.67
2.96	3.23	0.27
2.49	2.56	0.07
3.03	4.31	1.28
2.66	4.37	1.71
3.00	2.40	-0.60

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-0.6000  0.2675  0.6650  0.8050  1.3775  2.3200 

In the classical framework, we could use a paired t-test to see if the mean change in CD4 cell counts is significantly different from the null hypothesis value of zero (\(H_0: \mu = E[Y] = 0\)).
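As a point of comparison for the Bayesian analysis, the paired t-test can be run directly in R. The sketch below assumes the cd4 data frame from the boot package as loaded earlier; the object names fit and fit_diff are our own choices for illustration.

```r
library(boot)  # provides the cd4 data frame

# Paired t-test of H0: the mean of (oneyear - baseline) is zero
fit <- t.test(cd4$oneyear, cd4$baseline, paired = TRUE)
fit

# The same test expressed as a one-sample t-test on the differences
y <- cd4$oneyear - cd4$baseline
fit_diff <- t.test(y, mu = 0)
```

Both calls should give the same test statistic and p-value, since a paired t-test is a one-sample t-test applied to the within-patient differences.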

For our Bayesian analysis, we will assume that the differences in CD4 counts come from a Normal distribution with an unknown mean \(\mu\), which represents the mean change in CD4 counts. We will assume that the variance is known to be \(\sigma^2 = 0.7\). This is slightly artificial since, in a real example, we may not know the true variance; however, we might be able to infer the variability of CD4 counts from earlier studies. Having both \(\mu\) and \(\sigma^2\) unknown requires a more complicated analysis, which we will not cover in this course.
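To make the modelling assumption concrete, here is a minimal sketch of the Normal log-likelihood for \(\mu\) with \(\sigma^2 = 0.7\) held fixed, evaluated over a grid. It assumes the differences are treated as independent draws; the grid range and the names log_lik, mu_grid and mu_hat are illustrative choices, not part of the course material.

```r
library(boot)  # provides the cd4 data frame

y <- cd4$oneyear - cd4$baseline  # observed changes in CD4 count
sigma2 <- 0.7                    # variance assumed known, as in the text

# Log-likelihood of mu, assuming independent Normal(mu, sigma2) observations
log_lik <- function(mu) {
  sum(dnorm(y, mean = mu, sd = sqrt(sigma2), log = TRUE))
}

# Evaluate on a grid of candidate values for mu (range chosen for illustration)
mu_grid <- seq(-1, 3, by = 0.005)
ll <- sapply(mu_grid, log_lik)

# Grid value that maximises the log-likelihood
mu_hat <- mu_grid[which.max(ll)]
mu_hat
mean(y)
```

With the variance fixed, the log-likelihood is a quadratic in \(\mu\) that peaks at the sample mean, so mu_hat should agree with mean(y) up to the grid spacing.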

The Bayesian analysis involves constructing a likelihood for the data, specifying an appropriate prior distribution and combining them to obtain a posterior distribution. We will then describe how credible intervals for \(\mu\), and prior and posterior predictive distributions can be found.
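The steps above can be sketched in a few lines of R under one extra assumption: a Normal prior for \(\mu\), which is conjugate to a Normal likelihood with known variance. The prior mean and variance used here (0 and 1) are placeholders chosen purely for illustration, not the prior adopted in this course, and all object names are our own.

```r
library(boot)  # provides the cd4 data frame

y <- cd4$oneyear - cd4$baseline
n <- length(y)
ybar <- mean(y)
sigma2 <- 0.7          # variance assumed known, as in the text

# Hypothetical prior: mu ~ Normal(prior_mean, prior_var)
prior_mean <- 0
prior_var <- 1

# Conjugate update: precisions add, and the posterior mean is a
# precision-weighted average of the prior mean and the sample mean
post_prec <- 1 / prior_var + n / sigma2
post_var <- 1 / post_prec
post_mean <- (prior_mean / prior_var + n * ybar / sigma2) / post_prec

# 95% equal-tailed credible interval for mu
cred_int <- qnorm(c(0.025, 0.975), mean = post_mean, sd = sqrt(post_var))

# Posterior predictive for a new observation: Normal with the posterior
# mean and variance equal to post_var + sigma2
pred_var <- post_var + sigma2

post_mean
cred_int
```

The posterior mean lies between the prior mean and the sample mean, and with 20 observations it is pulled strongly towards the data. A different prior would shift these numbers, which is why the choice of prior deserves careful discussion.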

Statistics for Health Data Science
