11.6 Summary

We have placed some emphasis on how the investigation type affects what variables should be included in the analysis and on how the results might be interpreted. There are naturally many other things to consider which are beyond the scope of this session. The above example focused on regression. The next few sessions in this module will focus on regression models of different types. They are a fundamental part of the statistician’s toolbox and are used in investigations of different types. However, there are many other specialised methods available for specific tasks. For example, in descriptive analyses we may use clustering methods and principal components analysis. In prediction tasks, machine learning methods not based on regression are increasingly used. In studies of causal effects many specialised methods have been developed over recent years. Some of these involve regression and others not.

The type of investigation affects how we should assess the performance and assumptions of a model/analysis. For example, in prediction tasks we should assess how well the prediction model performs in terms of predicting the outcome for a new individual. This requires tools such as cross validation, and measures of predictive performance such as \(R^2\) , area under the curve, sensitivity and specificity. In causal analyses we are concerned with whether the assumptions of the models used are valid and whether the model is correctly specified, alongside the validity of untestable assumptions such as whether there are any important confounders that have not been accounted for in the analysis.

This session aimed to provide a broad overview of different types of investigation used in medical statistics/health data science, and which you are likely to encounter in your future careers. This topic has seen some recent emphasis in the literature. The statistical and epidemiological community is increasingly emphasising the need for researchers to ensure they conduct meaningful studies and interpret findings appropriately, particularly relating to the use of observational data. It is a wide topic, and we have only touched on some aspects here.