12.4 Estimation of the population parameters

In the specification of the simple linear regression model there are three population parameters (\(\beta_0\), \(\beta_1\), and \(\sigma\)). Since we do not know these parameters, we need to estimate them based on a sample from our population. We will use the symbols \(\hat{\beta}_0\), \(\hat{\beta}_1\), and \(\hat{\sigma}\) to represent the sample estimates of the true population parameters.

There are many different methods available for obtaining estimates of the parameters \(\beta_0\) and \(\beta_1\). In this section, we focus on an approach that works by minimising the amount of error in the model. These estimates are called the ordinary least squares estimates (the reason for this name will become clear in the next section).

12.4.1 Fitted values and residuals

Fitted values: Once we have estimates \(\hat{\beta}_0\) and \(\hat{\beta}_1\), the fitted value for the \(i^{th}\) observation (in other words, the predicted value of the outcome for that individual) is:

\[ \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i \]

Residuals: The residual for the \(i^{th}\) observation is:

\[ \hat{\epsilon}_i = y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i) = y_i - \hat{y}_i \]

Sometimes the word residual is used to refer to both the residual defined here and the error term, in which case it is necessary to distinguish between the true (error) and fitted/estimated (residual) quantities. Here, we use the term error to refer to the deviation of the observed value from the true mean outcome, and the term residual to refer to the deviation of the observed value from the fitted value, as defined above.
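To make these definitions concrete, here is a minimal Python sketch (our illustration, using made-up data and assumed values for \(\hat{\beta}_0\) and \(\hat{\beta}_1\); how the estimates are actually obtained is the subject of the next subsection):

```python
import numpy as np

# Made-up example data (for illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # explanatory variable
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])  # observed outcomes

# Assumed (not estimated) values of the intercept and slope
beta0_hat = 0.1
beta1_hat = 2.0

# Fitted values: the predicted outcome for each observation
y_hat = beta0_hat + beta1_hat * x

# Residuals: observed value minus fitted value
residuals = y - y_hat

print(y_hat)
print(residuals)
```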

12.4.2 Ordinary least squares estimates

The ordinary least squares (OLS) estimates are those which minimise the sum of squared deviations of the observed outcomes from the fitted regression line. Since the residuals, \(\hat{\epsilon}_i\), measure these deviations, this sum is called the residual sum of squares. It is often denoted by \(SS_{RES}\) (where “SS” stands for Sum of Squares and “RES” is shorthand for RESiduals).

Formally, the OLS estimators are the values of \(\hat{\beta}_0\) and \(\hat{\beta}_1\) that minimise:

\[ SS_{RES} = \sum_{i=1}^n \hat{\epsilon}_i^2 = \sum_{i=1}^n (y_i - \hat{\beta}_0 -\hat{\beta}_1 x_i)^2. \]
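To make the idea of minimising \(SS_{RES}\) concrete, the sketch below (our illustration, with made-up data) treats the criterion as an ordinary function of two unknowns and minimises it numerically with a general-purpose optimiser. The closed-form expressions that follow give the same answer directly.

```python
import numpy as np
from scipy.optimize import minimize

# Made-up example data (for illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def ss_res(params):
    """Residual sum of squares for candidate values of (beta0, beta1)."""
    beta0, beta1 = params
    return np.sum((y - beta0 - beta1 * x) ** 2)

# Numerically search for the intercept and slope that minimise SS_RES
result = minimize(ss_res, x0=[0.0, 0.0])
beta0_hat, beta1_hat = result.x
print(beta0_hat, beta1_hat)
```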

The ordinary least squares estimates of \(\beta_0\) and \(\beta_1\) are given by the following:

\[\begin{split} \begin{align} \hat{\beta}_0 &= \bar{y} - \hat{\beta}_1 \bar{x} \\ \hat{\beta}_1 &= \frac{\sum_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^n (x_i-\bar{x})^2} \end{align} \end{split}\]

where \(\bar{y}=\frac{\sum_{i=1}^n y_i}{n}\) and \(\bar{x} = \frac{\sum_{i=1}^n x_i}{n}\). A proof of this result is given at the end of this session.
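These expressions translate directly into code. The sketch below (our illustration, reusing the made-up data from above) computes the two estimates from the closed-form formulas; the result should agree, up to numerical tolerance, with the numerical minimisation above and with library routines such as np.polyfit(x, y, 1) or scipy.stats.linregress(x, y).

```python
import numpy as np

def ols_estimates(x, y):
    """Closed-form ordinary least squares estimates of the intercept and slope."""
    x_bar, y_bar = x.mean(), y.mean()
    # Slope: sum of cross-products divided by the sum of squared deviations of x
    beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # Intercept: the fitted line passes through the point (x_bar, y_bar)
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat

# Made-up example data (for illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
beta0_hat, beta1_hat = ols_estimates(x, y)
print(beta0_hat, beta1_hat)
```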

12.4.3 Estimation of the error variance

The residual sum of squares can be thought of as the remaining unexplained variation in the outcome. Therefore, an intuitively appealing estimator of \(\sigma^2\) is given by dividing the residual sum of squares by the number of observations:

\[ \hat{\sigma}^2 = \sum_{i=1}^n \frac{\hat{\epsilon}_i^2}{n} = \sum_{i=1}^n \frac{(y_i-\hat{\beta}_0-\hat{\beta}_1 x_i)^2}{n} \]

However, this is a biased estimator. The bias arises because the observed values tend, on average, to lie closer to the fitted line (defined by \(\hat{\beta}_0\) and \(\hat{\beta}_1\)) than they do to the true regression line (defined by \(\beta_0\) and \(\beta_1\)). This is an exact parallel to the way the variability of a sample around its sample mean underestimates the variability around the population mean.

It can be shown that an unbiased estimator of the residual variance in the simple linear regression model is given by:

\[ \hat{\sigma}^2 = \sum_{i=1}^n \frac{\hat{\epsilon}_i^2}{n-2}=\sum_{i=1}^n \frac{(y_i-\hat{\beta}_0-\hat{\beta}_1 x_i)^2}{n-2} \]

This quantity is referred to as the residual mean square. It is often denoted by \(MS_{RES}\), where “MS” stands for Mean Square and “RES” is shorthand for residual. The denominator is \((n-2)\) because fitting the model first requires the estimation of two parameters (\(\beta_0\) and \(\beta_1\)) and the estimation of these parameters is said to reduce the information about the variance by two degrees of freedom.
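The calculation is straightforward once the model has been fitted. The sketch below (our illustration, again with made-up data) computes both the biased estimator (dividing by \(n\)) and the residual mean square (dividing by \(n-2\)):

```python
import numpy as np

# Made-up example data (for illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(y)

# Closed-form OLS estimates, as in the previous subsection
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()

# Residual sum of squares
residuals = y - (beta0_hat + beta1_hat * x)
ss_res = np.sum(residuals ** 2)

sigma2_biased = ss_res / n   # intuitive but biased estimator of sigma^2
ms_res = ss_res / (n - 2)    # residual mean square MS_RES: unbiased estimator of sigma^2
print(sigma2_biased, ms_res)
```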

12.4.4 Maximum likelihood estimation

An alternative approach to estimating the model parameters is maximum likelihood estimation. This approach selects the parameter values which maximise the likelihood (or, equivalently, the log-likelihood) of the observed data. It can be shown that the ordinary least squares estimates of \(\beta_0\) and \(\beta_1\) are also the maximum likelihood estimates (a proof of this result is at the end of the session).
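To see why the two approaches coincide for \(\beta_0\) and \(\beta_1\), note that under the usual assumption that the error terms are independent and normally distributed with mean 0 and variance \(\sigma^2\), the log-likelihood of the observed data is:

\[ \ell(\beta_0, \beta_1, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2 \]

For any fixed value of \(\sigma^2\), the parameters \(\beta_0\) and \(\beta_1\) enter only through the sum of squared deviations, which appears with a negative sign. Maximising the log-likelihood over \(\beta_0\) and \(\beta_1\) is therefore equivalent to minimising \(SS_{RES}\), which is exactly the ordinary least squares criterion.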

The maximum likelihood estimate of \(\sigma^2\) is equal to the biased estimate given above, obtained by dividing the residual sum of squares by the number of observations.