# Discrete Random Variables

A random variable \(X\) is a variable whose numerical value depends on the outcome of the experiment under consideration. Random variables which take values in a *finite* or *countably infinite* set (e.g. the positive integers) are known as **discrete random variables**. In contrast, continuous random variables take values in an uncountable set (e.g. the positive real numbers). For now we will consider only discrete random variables.

## Probability Distribution Function

A random variable is characterised by its **probability distribution function** (often abbreviated **PDF**). For discrete random variables the probability distribution function gives the probability that the variable takes each of the values which it might take:

\[ P(X = x) \quad \text{for each possible value } x. \]

Suppose \(X\) represents the number of boys within a set of quadruplets. It is important to understand the distinction between \(X\) and \(x\).

Capital \(X\) denotes the random variable whose value is the number of boys in a set of quadruplets.

Little \(x\) is used to denote a (generic) realised value which the random variable \(X\) takes. So \(X\) represents the random variable, but once we have randomly selected a set of quadruplets and counted the number of boys, we have a realised value \(x\), which is the observed value. Suppose the set of quadruplets selected contained two boys and two girls. Then we can say that in this instance the random variable has taken the value 2, i.e. \(X=2\).

So the expression \(P(X=x)\) can be read as “the probability that the random variable \(X\) takes value little \(x\)”.

### Example: PDF

Suppose we are doing a study of quadruplets (i.e. four children born within the same pregnancy). Suppose \(X\) is the number of children, among the four siblings, who are male. Then the set of values which \(X\) can take is \(\{ 0, 1, 2, 3, 4 \}\). The probability distribution function is shown in the table below.

| \(x\) | \(P(X = x)\) |
|---|---|
| 0 | 0.06 |
| 1 | 0.24 |
| 2 | 0.37 |
| 3 | 0.26 |
| 4 | 0.07 |
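As a minimal sketch, the table above can be stored as a Python dictionary mapping each value \(x\) to \(P(X=x)\). The names `pmf` and `is_valid_pmf` are illustrative choices, not part of these notes; the check simply confirms that the probabilities are non-negative and sum to 1.

```python
import math

# Distribution of X = number of boys in a set of quadruplets,
# copied from the table above. The name `pmf` is an illustrative choice.
pmf = {0: 0.06, 1: 0.24, 2: 0.37, 3: 0.26, 4: 0.07}


def is_valid_pmf(dist):
    """Return True if all probabilities are non-negative and sum to 1."""
    non_negative = all(p >= 0 for p in dist.values())
    sums_to_one = math.isclose(sum(dist.values()), 1.0)
    return non_negative and sums_to_one


print(is_valid_pmf(pmf))  # checks the table is a proper distribution
print(pmf[2])             # P(X = 2)
```

Looking up `pmf[x]` plays the role of evaluating \(P(X=x)\) for a realised value \(x\).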

## Cumulative distribution function

For a random variable \(X\), its **cumulative distribution function** (CDF) is given by:

\[ F(x) = P(X \leq x). \]

Given the probability distribution function for \(X\) we can derive the CDF, and vice-versa. They are two different ways of encoding the same information.

### Example: CDF

In the study of quadruplets above, the CDF is:

| \(x\) | \(P(X = x)\) | \(P(X \leq x)\) |
|---|---|---|
| 0 | 0.06 | 0.06 |
| 1 | 0.24 | 0.30 |
| 2 | 0.37 | 0.67 |
| 3 | 0.26 | 0.93 |
| 4 | 0.07 | 1 |
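The CDF column is a running sum of the probabilities, and successive differences of the CDF give the probabilities back. The sketch below illustrates both directions for the quadruplets example; the variable names (`pmf`, `cdf`, `recovered`) are assumptions made for illustration.

```python
from itertools import accumulate

# Distribution of X from the quadruplets example.
pmf = {0: 0.06, 1: 0.24, 2: 0.37, 3: 0.26, 4: 0.07}

values = sorted(pmf)

# Running sums of P(X = x) give P(X <= x).
cdf = dict(zip(values, accumulate(pmf[x] for x in values)))

for x in values:
    print(x, round(cdf[x], 2))

# Going the other way: successive differences of the CDF recover P(X = x).
recovered = {}
previous = 0.0
for x in values:
    recovered[x] = cdf[x] - previous
    previous = cdf[x]
```

This is one way of seeing that the two functions encode the same information.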

## Expectation of a random variable

The expectation (or mean) of a random variable \(X\) is one measure of the centre of its distribution (another is the median). For a discrete random variable \(X\), it is defined as:

\[ E(X) = \sum_{x} x \, P(X=x), \]

where the summation is over all possible \(x\) values that \(X\) can take.

One way to think of \(E(X)\) is the average value of \(X\) over a large number of repetitions of the experiment or random process that produces \(X\). The Greek letter \(\mu\) is often used for \(E(X)\). Note that here we are defining the population mean, which is not the same as the sample mean.

### Example: Mean

Returning to the example of the number of boys within sets of quadruplets, for each value \(x\) we can calculate \(x \times P(X=x)\).

| \(x\) | \(P(X = x)\) | \(x \times P(X = x)\) |
|---|---|---|
| 0 | 0.06 | \(0 \times 0.06\) |
| 1 | 0.24 | \(1 \times 0.24\) |
| 2 | 0.37 | \(2 \times 0.37\) |
| 3 | 0.26 | \(3 \times 0.26\) |
| 4 | 0.07 | \(4 \times 0.07\) |

To find \(E(X)\) we then simply take the sum of these values across all values of \(x\):

\[ E(X) = 0 \times 0.06 + 1 \times 0.24 + 2 \times 0.37 + 3 \times 0.26 + 4 \times 0.07 = 2.04. \]

Note that we do not actually expect to find 2.04 boys in a set of quadruplets!

Rather, if we repeatedly sample sets of quadruplets, and then take the average of the values of \(X\) across the samples, the value we expect to get is 2.04, with the value getting closer and closer the more samples we take.
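The following sketch computes \(E(X)\) exactly from the table and then compares it with the average of a large number of simulated draws. The sample size, the seed and the function name `expectation` are arbitrary choices made for illustration; the simulated average will vary with the seed but should be close to 2.04.

```python
import random

# Distribution of X from the quadruplets example.
pmf = {0: 0.06, 1: 0.24, 2: 0.37, 3: 0.26, 4: 0.07}


def expectation(dist):
    """E(X): sum of x * P(X = x) over all possible values x."""
    return sum(x * p for x, p in dist.items())


mu = expectation(pmf)
print(round(mu, 2))  # 2.04

# Simulate many sets of quadruplets and average the observed values of X.
rng = random.Random(2024)  # fixed seed so the run is reproducible
n_samples = 100_000        # arbitrary, illustrative sample size
draws = rng.choices(list(pmf), weights=list(pmf.values()), k=n_samples)
sample_mean = sum(draws) / n_samples
print(sample_mean)  # close to, but not exactly, 2.04
```

Here `mu` is the population mean, whereas `sample_mean` is a sample mean computed from one particular set of simulated draws.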

### Expectation of functions of random variables

Expectations of functions of random variables satisfy certain rules.

For now, we will consider the effects (on the expectation) of multiplying a random variable by a constant \(a\) or adding a constant \(b\) to the random variable.

If we add a constant \(b\) to \(X\), the expectation of the newly obtained random variable is simply \(E(X)+b\). This is because adding \(b\) just shifts the distribution of \(X\) by \(b\). Similarly, if we multiply \(X\) by a constant \(a\), the new random variable \(aX\) has expectation \(aE(X)\), since for each value \(x\) which \(X\) takes, \(aX\) takes the value \(ax\). Combining these two results, we have that \(E(aX+b)=aE(X)+b\).

In summary, for constants \(a\) and \(b\),

\(E(X+b) = E(X) + b\)

\(E(aX) = a E(X)\)

\(E(aX + b) = aE(X) + b\)
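These rules can be checked numerically on the quadruplets example. In the sketch below the constants `a = 3` and `b = 2` are arbitrary illustrative values, and the distribution of \(aX+b\) is built by moving each value \(x\) to \(ax+b\) while keeping its probability.

```python
import math

# Distribution of X from the quadruplets example.
pmf = {0: 0.06, 1: 0.24, 2: 0.37, 3: 0.26, 4: 0.07}


def expectation(dist):
    """E(X): sum of x * P(X = x) over all possible values x."""
    return sum(x * p for x, p in dist.items())


a, b = 3, 2  # arbitrary constants chosen for illustration

# Distribution of aX + b: each value x moves to a*x + b with the same
# probability. (Assumes a != 0, so distinct x values stay distinct.)
pmf_transformed = {a * x + b: p for x, p in pmf.items()}

lhs = expectation(pmf_transformed)  # E(aX + b), computed directly
rhs = a * expectation(pmf) + b      # a E(X) + b, using the rule
print(math.isclose(lhs, rhs))
```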

In general, the expectation of a function of \(X\), \(g(X)\), is defined as

\[ E(g(X)) = \sum_{x} g(x) \, P(X=x). \]

## Variance of a random variable

Expectation is a measure of the centre of a distribution. In contrast, the variance of a random variable measures the magnitude of the dispersion in the distribution around its expectation. The variance of a discrete random variable \(X\) is defined as

\[ Var(X) = E((X-\mu)^{2}) = \sum_{x} (x-\mu)^{2} \, P(X=x), \]

where \(\mu=E(X)\). The variance uses the square of the distance from each value to the mean because this is never negative, so deviations above and below the mean cannot cancel out. If we were to define variance instead as \(E(X-\mu)\), this would always be equal to zero, since \(E(X-\mu) = E(X) - \mu = 0\).

An alternative expression for the variance is \(Var(X) = E(X^{2})-\mu^{2}\). This expression is equivalent to the previous one and is often more useful for performing calculations involving variances.

The equivalence of the two expressions is easily proved:

\begin{eqnarray*}
Var(X) &=& E((X-\mu)^{2}) \\
&=& E(X^{2}-2X\mu + \mu^{2}) \\
&=& E(X^{2}) - 2 \mu^{2} + \mu^{2} \\
&=& E(X^{2}) - \mu^{2}.
\end{eqnarray*}
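As a numerical check on the quadruplets example, the sketch below computes the variance both from the definition \(E((X-\mu)^{2})\) and from the alternative expression \(E(X^{2})-\mu^{2}\). The variable names are illustrative assumptions.

```python
import math

# Distribution of X from the quadruplets example.
pmf = {0: 0.06, 1: 0.24, 2: 0.37, 3: 0.26, 4: 0.07}

mu = sum(x * p for x, p in pmf.items())  # E(X)

# Definition: expected squared distance from the mean.
var_definition = sum((x - mu) ** 2 * p for x, p in pmf.items())

# Alternative expression: E(X^2) - mu^2.
e_x_squared = sum(x ** 2 * p for x, p in pmf.items())
var_shortcut = e_x_squared - mu ** 2

print(round(var_definition, 4))
print(math.isclose(var_definition, var_shortcut))
```

The two results agree up to floating-point rounding, which is why the comparison uses `math.isclose` rather than `==`.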

### Variance of functions of random variables

Variances of functions of random variables also satisfy certain rules.

In contrast to expectation, adding a constant \(b\) to a random variable does not affect its variance. This makes sense intuitively: shifting a distribution does not affect how dispersed the distribution is around its (newly shifted) mean. Multiplication of \(X\) by a constant \(a\) does affect the variability: every distance from the mean is scaled by \(a\), so the squared distances, and hence the variance, are scaled by \(a^{2}\).

In summary, for constants \(a\) and \(b\),

\(Var(X+b) = Var(X)\)

\(Var(aX) = a^2 Var(X)\)

\(Var(aX + b) = a^2 Var(X)\)
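The variance rules can be checked in the same way as the expectation rules. The sketch below uses the quadruplets distribution with arbitrary illustrative constants `a = 3` and `b = 2`; the helper name `variance` is an assumption.

```python
import math

# Distribution of X from the quadruplets example.
pmf = {0: 0.06, 1: 0.24, 2: 0.37, 3: 0.26, 4: 0.07}


def variance(dist):
    """Var(X) = E((X - mu)^2) for a discrete distribution."""
    mu = sum(x * p for x, p in dist.items())
    return sum((x - mu) ** 2 * p for x, p in dist.items())


a, b = 3, 2  # arbitrary constants chosen for illustration

pmf_shifted = {x + b: p for x, p in pmf.items()}     # distribution of X + b
pmf_scaled = {a * x: p for x, p in pmf.items()}      # distribution of aX
pmf_linear = {a * x + b: p for x, p in pmf.items()}  # distribution of aX + b

print(math.isclose(variance(pmf_shifted), variance(pmf)))
print(math.isclose(variance(pmf_scaled), a ** 2 * variance(pmf)))
print(math.isclose(variance(pmf_linear), a ** 2 * variance(pmf)))
```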

In general, the variance of a function of \(X\), \(g(X)\), is defined as

\[ Var(g(X)) = E\left( \left( g(X) - E(g(X)) \right)^{2} \right). \]

## Joint distributions

So far we have considered a single random variable \(X\). Often in medical statistics we are concerned with the associations between two or more random variables, and so we need to be able to characterise how one variable depends on another. The starting point for this is to define the joint distribution of two random variables \(X\) and \(Y\).

Let \(X\) and \(Y\) be two discrete random variables. We define their **joint** distribution function \(P(X=x,Y=y)\), for values \(x,y\) which \(X\) and \(Y\) can take, as the probability \(P(X=x \cap Y=y)\). (Note that \(X=x\) and \(Y=y\) are events, as defined previously.) The joint distribution function must satisfy:

\[ P(X=x, Y=y) \geq 0 \quad \text{for all } x, y \]

and

\[ \sum_{x} \sum_{y} P(X=x, Y=y) = 1, \]

where the summation above is over all possible values \(X\) can take and all possible values \(Y\) can take.

### Marginal and conditional distributions

The **marginal distribution** of either \(X\) or \(Y\) can be found from the joint distribution by summing over the values of the other variable, e.g.

\[ P(X=x) = \sum_{y} P(X=x, Y=y). \]

The **conditional distribution** of one variable given the other can then be found as

\[ P(X=x \mid Y=y) = \frac{P(X=x, Y=y)}{P(Y=y)}, \]

provided \(P(Y=y) > 0\).

Lastly, the joint cumulative distribution function is defined as

\[ F(x, y) = P(X \leq x, Y \leq y). \]

### Independence between two random variables

Two random variables \(X\) and \(Y\) are independent if

\[ P(X=x, Y=y) = P(X=x) \, P(Y=y) \]

for all possible values \(x\) and \(y\) that \(X\) and \(Y\) take.
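To make marginal distributions, conditional distributions and independence concrete, here is a minimal sketch using a made-up joint distribution for two binary variables. The probabilities in `joint` and all function names are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical joint distribution P(X=x, Y=y) for two binary variables,
# keyed by (x, y). These numbers are made up for illustration.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}


def marginal(joint_dist, axis):
    """Sum the joint distribution over the other variable (axis 0: X, 1: Y)."""
    result = {}
    for pair, p in joint_dist.items():
        result[pair[axis]] = result.get(pair[axis], 0.0) + p
    return result


def conditional_x_given_y(joint_dist, y):
    """Return P(X=x | Y=y) for every x, assuming P(Y=y) > 0."""
    p_y = marginal(joint_dist, 1)[y]
    return {x: p / p_y for (x, y_val), p in joint_dist.items() if y_val == y}


def is_independent(joint_dist):
    """Check that P(X=x, Y=y) equals P(X=x) P(Y=y) for every pair (x, y)."""
    p_x, p_y = marginal(joint_dist, 0), marginal(joint_dist, 1)
    return all(
        math.isclose(joint_dist.get((x, y), 0.0), p_x[x] * p_y[y], abs_tol=1e-12)
        for x in p_x
        for y in p_y
    )


p_x = marginal(joint, 0)
p_y = marginal(joint, 1)
print(p_x, p_y)
print(conditional_x_given_y(joint, 1))
print(is_independent(joint))
```

In this made-up table \(P(X=0, Y=0) = 0.3\) while \(P(X=0)P(Y=0) = 0.5 \times 0.4 = 0.2\), so the two variables are not independent.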

### Variance and expectation of linear combinations and products

In general, for random variables \(X\) and \(Y\), and constants \(a, b\) and \(c\),

\[ E(aX + bY + c) = aE(X) + bE(Y) + c. \]

If random variables \(X\) and \(Y\) are independent we can find the expectation of their product as:

\[ E(XY) = E(X) \, E(Y). \]

Again suppose that \(X\) and \(Y\) are independent. Then the variance of their sum is:

\[ Var(X+Y) = Var(X) + Var(Y). \]
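The two standard results for independent variables, \(E(XY) = E(X)E(Y)\) and \(Var(X+Y) = Var(X) + Var(Y)\), can be checked numerically. In the sketch below the marginal distributions `p_x` and `p_y` are made up for illustration, and independence is imposed by constructing the joint distribution as the product of the marginals; the helper names are assumptions.

```python
import math
from collections import defaultdict

# Hypothetical marginal distributions, made up for illustration.
p_x = {0: 0.2, 1: 0.8}
p_y = {1: 0.5, 2: 0.3, 3: 0.2}

# Under independence the joint distribution is the product of the marginals.
joint = {(x, y): px * py for x, px in p_x.items() for y, py in p_y.items()}


def expectation(dist):
    """Sum of value * probability over a discrete distribution."""
    return sum(v * p for v, p in dist.items())


def variance(dist):
    """Expected squared distance from the mean."""
    mu = expectation(dist)
    return sum((v - mu) ** 2 * p for v, p in dist.items())


def distribution_of(func, joint_dist):
    """Distribution of func(X, Y), pooling probability over equal values."""
    out = defaultdict(float)
    for (x, y), p in joint_dist.items():
        out[func(x, y)] += p
    return dict(out)


dist_product = distribution_of(lambda x, y: x * y, joint)  # distribution of XY
dist_sum = distribution_of(lambda x, y: x + y, joint)      # distribution of X + Y

print(math.isclose(expectation(dist_product), expectation(p_x) * expectation(p_y)))
print(math.isclose(variance(dist_sum), variance(p_x) + variance(p_y)))
```

Both checks rely on the joint distribution factorising; with a dependent joint distribution they would not hold in general.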