11.2 Different types of investigation
11.2.1 Classification
The research question informs what type of investigation is required. Investigations can be divided broadly into the following types:
Description
Prediction
Causality and explanation
Hernán, Hsu & Healy (Chance, 2019) proposed a classification of data science tasks into three types: Description, Prediction, and Counterfactual prediction (i.e. causal inference). Shmueli (Statistical Science, 2010) described a similar classification: Descriptive modelling, Predictive modelling, and Explanatory modelling. See also Hand (Harvard Data Science Review, 2019) for a useful discussion of this topic.
11.2.2 Implications of investigation type
The distinction between the types of investigation is crucial because it affects every step of the analysis and what happens after it. For example, the investigation type influences:
How we decide what variables are to be included in the analysis
What analysis methods to use
How we assess the fit/performance of the model or other analysis approach used
How we present the results from the analysis
How the findings might be used in practice
How we need to work with other experts at different stages
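To make the list above more concrete, here is a minimal sketch on simulated data showing how the same dataset is analysed and judged differently depending on the investigation type. Everything in it is a hypothetical illustration, not the authors' method: the data-generating process (a confounder `c` affecting both a binary exposure `x` and an outcome `y`, with a true effect of `x` equal to 1.0), the variable names, and the use of ordinary least squares via NumPy are all assumptions chosen for simplicity.

```python
import numpy as np

rng = np.random.default_rng(2019)
n = 10_000

# Hypothetical data-generating process (an assumption for illustration):
# a confounder c influences both the exposure x and the outcome y.
c = rng.normal(size=n)
x = (c + rng.normal(size=n) > 0).astype(float)  # exposure more likely when c is high
y = 1.0 * x + 2.0 * c + rng.normal(size=n)      # true causal effect of x on y is 1.0

def ols(design, outcome):
    """Least-squares coefficients for outcome ~ design (design includes an intercept column)."""
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef

ones = np.ones(n)

# Description: summarise the data as observed, e.g. mean outcome by exposure group.
desc = {
    "mean_y_exposed": y[x == 1].mean(),
    "mean_y_unexposed": y[x == 0].mean(),
}

# Prediction: performance is judged by error on data not used to fit the model.
idx = np.arange(n)
train, test = idx < n // 2, idx >= n // 2
design = np.column_stack([ones, x, c])
coef_pred = ols(design[train], y[train])
test_mse = np.mean((y[test] - design[test] @ coef_pred) ** 2)

# Causality: the target is the effect of x on y, which under this hypothetical
# process requires adjusting for the confounder c.
naive_effect = ols(np.column_stack([ones, x]), y)[1]
adjusted_effect = ols(np.column_stack([ones, x, c]), y)[1]

print(desc)
print(f"held-out MSE: {test_mse:.2f}")
print(f"unadjusted x coefficient: {naive_effect:.2f}; adjusted: {adjusted_effect:.2f}")
```

In this simulation the unadjusted coefficient on `x` is far from the true effect of 1.0 even though `x` is useful for prediction, which illustrates why the choice of variables, the methods and the criteria for judging a model all depend on the investigation type.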
11.2.3 The role of study design
The different types of investigation may be performed using data from studies of different designs. Having posed a research question, we can consider (with input from collaborators) what data are required to answer it robustly, including whether new data collection is needed or whether existing data could be used to address the question. This process needs to take into account cost, timeliness, feasibility and ethics. For example, for some questions the ideal study would be a randomized controlled trial, but performing one would require such long follow-up that it would be infeasible and unethical, so we would instead turn to observational data to address the research question.
Recent biostatistical and epidemiological literature places major emphasis on the use of ‘found’ data from sources such as electronic health records. These data present great opportunities to answer research questions using information on large numbers of individuals, but they also pose challenges for analysis and interpretation.
All three types of investigation may make use of observational data. Randomized controlled trials are designed to estimate treatment effects (i.e. they serve causal investigations), but secondary analyses of trial data can support other types of investigation, for example the development of a prediction model.