Development of miDOC: an expert system and methodology for multiple imputation

Lead Research Organisation: University of Bristol
Department Name: Bristol Medical School

Abstract

Much health and social research is done using studies of people - e.g. randomised trials (comparing people who receive a treatment with those who do not), cohort studies (examining how the health of a group of people changes over time, and what causes these changes) or case-control studies (examining the risk factors for getting a relatively rare disease). All these studies can suffer from missing data - when people drop out completely, do not answer some questions, or forget to give some information. Missing data can mean that the results of the study are wrong ("biased"), less precise than they should be, or both.

Much research has been done into how to deal with missing data, and one commonly used method is multiple imputation (MI). In MI, other information (e.g. details of someone's previous health, and medications they are currently using) is used to predict ("impute") the missing information. The success of this technique depends crucially on why the information is missing in the first place, and how well the missing information can be predicted. There are guidelines for researchers carrying out MI, but some of the guidelines are not correct, and some are complex and hard to follow. Different researchers use MI in different ways, and do not usually document what they did - so it is hard to replicate analyses, or to see whether analysts have followed best practice.
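The idea in the paragraph above can be illustrated with a minimal sketch in Python. It is not the miDOC method or code: the simulated data, the function names (`simulate`, `impute_and_pool`, `rubin_pool`, `fit_line`) and the simple bootstrap-based regression imputation are all assumptions made for illustration. The sketch shows an outcome that is more often missing for people with high values of a predictor, imputes the missing values several times from that predictor, and combines the results using Rubin's rules, the standard way of pooling multiply imputed estimates.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return a, b, sd


def rubin_pool(estimates, variances):
    """Rubin's rules: pooled estimate and total variance from m imputations."""
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)      # average sampling variance
    between = statistics.variance(estimates)  # variance across imputations
    return pooled, within + (1 + 1 / m) * between


def impute_and_pool(x, y, m=20, seed=0):
    """Estimate the mean of y, imputing missing (None) values from x, m times."""
    rng = random.Random(seed)
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    estimates, variances = [], []
    for _ in range(m):
        # Refit on a bootstrap sample so each imputation reflects
        # uncertainty in the imputation model's parameters.
        boot = [rng.choice(observed) for _ in observed]
        a, b, sd = fit_line([p[0] for p in boot], [p[1] for p in boot])
        # Keep observed values; draw missing ones from the fitted model.
        completed = [
            yi if yi is not None else a + b * xi + rng.gauss(0, sd)
            for xi, yi in zip(x, y)
        ]
        estimates.append(statistics.fmean(completed))
        variances.append(statistics.variance(completed) / len(completed))
    return rubin_pool(estimates, variances)


def simulate(n=2000, seed=1):
    """Hypothetical data: y depends on x, and y is more often missing when x > 0."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y_full = [2 + xi + rng.gauss(0, 1) for xi in x]
    y = [None if (xi > 0 and rng.random() < 0.7) else yi
         for xi, yi in zip(x, y_full)]
    return x, y_full, y


if __name__ == "__main__":
    x, y_full, y = simulate()
    complete_case = statistics.fmean(v for v in y if v is not None)
    mi_mean, mi_var = impute_and_pool(x, y)
    print("full-data mean:    ", round(statistics.fmean(y_full), 3))
    print("complete-case mean:", round(complete_case, 3))
    print("MI mean (SE):      ", round(mi_mean, 3), round(mi_var ** 0.5, 3))
```

In this simulated setting the complete-case mean is pulled downwards, because people with high values of `x` (and therefore high `y`) are under-represented among the observed outcomes, whereas the imputed analysis uses `x` to recover them. The sketch works only because the imputation model matches how the data were generated and missingness depends only on the observed `x`; when either assumption fails, MI can itself be biased.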

We aim to develop methods to address some remaining issues about how to carry out MI. We will assess what problems are caused when people use the wrong analysis model; what problems may arise from including some variables in the imputation model that do not predict the missing information very well; and how to choose which other variables to include in the imputation model. We will also investigate how researchers can best check whether their MI is working well. We will then pull these new methods together with existing knowledge into a new automated expert system, the 'multiple imputation Doctor' (miDOC). miDOC will guide researchers through their analyses, examining the structure of the dataset to advise on whether multiple imputation is needed, and if so how to perform it. miDOC will be useful for all researchers using incomplete data, but will be particularly aimed at those who may have relatively little formal training in statistical analysis of missing data. Not only will miDOC give users access to expert advice on their analysis, but by providing documented decisions and code it will increase reproducibility and transparency of analyses.

We will run focus groups with researchers to help us develop miDOC, and refine it on the basis of feedback. We will make miDOC freely available, and also include the methods and information about miDOC in courses we already run on how to deal with missing data, on www.missingdata.org.uk and in the second edition of a textbook authored by one of the co-applicants. We will run two free workshops (which will be made permanently available online), in order to help as many people as possible benefit from these methods and miDOC. The methods and miDOC will be useful for all types of study - randomised trials, cohort studies, case-control studies - and thus have the potential to improve much research in both health and medicine, and beyond.

We will use our links with other cohorts, academic and non-academic agencies to ensure that our methods are widely used, and thus improve the level of evidence informing policy and practice in the UK and worldwide.

Technical Summary

Missing data are common in health research and are increasingly addressed by multiple imputation (MI). There are unresolved methodological questions around how to choose the best imputation model for each incomplete variable. Applying MI can be complex and involves multiple decisions that are rarely justified (e.g. which variables to include in each imputation model, how to specify the functional form of each imputation model, and which diagnostics to use for the MI procedure).

We will tackle these outstanding issues, and combine our insights with current knowledge into a new automated expert system, the 'multiple imputation Doctor' (miDOC). miDOC will guide researchers through their analyses and, by providing documented decisions and code, will increase reproducibility and transparency of analyses.
Objectives:
1-3: Resolve outstanding questions around bias due to incorrect specification of the imputation model (even when compatible with the analysis model), and bias due to including variables that are strongly predictive of missingness, or due to over-fitting of the imputation model
4: Develop methodology and an associated algorithm to identify the optimum choice of variables to include in an imputation model, for imputation of different types and roles of variables
5: Extend diagnostics to address issues in (1), including diagnostics for over-fitting of imputation models.
6: Incorporate the results of (1) - (5), together with current knowledge, into an expert system, miDOC, developed in R. miDOC will take the scientific model and data, then (i) identify whether MI is likely to be biased, (ii) implement a sensible MI strategy and (iii) provide diagnostics, including a summary of the MI assumptions.
7: Apply miDOC to exemplar analyses
8: Disseminate the results through conference presentations, research articles, workshops, courses and guidance for researchers, our established website, www.missingdata.org.uk, and the next edition of Carpenter & Kenward.
 
Description: Better Methods Better Research: Developing guidance for multiple imputation
Amount: £61,675 (GBP)
Organisation: Medical Research Council (MRC)
Sector: Public
Country: United Kingdom
Start: 03/2025
End: 09/2026
 
Description: Population Health Science Institute (PHSI) Knowledge Mobilisation Catalyst Award
Amount: £1,000 (GBP)
Organisation: University of Bristol
Sector: Academic/University
Country: United Kingdom
Start: 04/2024
End: 07/2024
 
Title: midoc: A Decision-Making System for Multiple Imputation
Description: A guidance system for analysis with missing data. It incorporates expert, up-to-date methodology to help researchers choose the most appropriate analysis approach when some data are missing. You provide the available data and the assumed causal structure, including the likely causes of missing data. 'midoc' will advise which analysis approaches can be used, and how best to perform them.
Type Of Technology: Software
Year Produced: 2024
Open Source License?: Yes
Impact: 968 downloads since publication on the Comprehensive R Archive Network (CRAN) in October 2024.
 
Description: Data science users workshop
Form Of Engagement Activity: Participation in an activity, workshop or similar
Part Of Official Scheme?: No
Geographic Reach: International
Primary Audience: Postgraduate students
Results and Impact: 20 people, who were either postgraduate students or postdoctoral researchers in computing and information sciences at Radboud University, The Netherlands, attended a workshop introducing the Multiple Imputation DOCtor (midoc) R package. The workshop trained attendees in missing data methods, raised awareness of our research, and gave us feedback on the functionality and end-user experience of our R package. It also enabled us to develop collaborative relationships with researchers in this group.
Year(s) Of Engagement Activity: 2024
 
Description: Epidemiology researchers workshop
Form Of Engagement Activity: Participation in an activity, workshop or similar
Part Of Official Scheme?: No
Geographic Reach: Local
Primary Audience: Other audiences
Results and Impact: 20 people involved in epidemiology research at the University of Bristol attended a workshop introducing the Multiple Imputation DOCtor (midoc) R package. The workshop trained attendees in missing data methods, raised awareness of our research, and gave us feedback on the functionality and end-user experience of our R package.
Year(s) Of Engagement Activity: 2024
 
Description: Health users workshop
Form Of Engagement Activity: Participation in an activity, workshop or similar
Part Of Official Scheme?: No
Geographic Reach: Regional
Primary Audience: Professional Practitioners
Results and Impact: 12 people working in statistics and clinical studies in the NHS attended a workshop introducing the Multiple Imputation DOCtor (midoc) R package. The workshop trained attendees in missing data methods, raised awareness of our research, and gave us feedback on the functionality and end-user experience of our R package.
Year(s) Of Engagement Activity: 2024
 
Description: Statistical researchers workshop
Form Of Engagement Activity: Participation in an activity, workshop or similar
Part Of Official Scheme?: No
Geographic Reach: International
Primary Audience: Other audiences
Results and Impact: 40 people (15 in person and 25 online) attended a workshop introducing the Multiple Imputation DOCtor (midoc) R package, hosted by the London School of Hygiene and Tropical Medicine Centre for Data and Statistical Science for Health. The workshop raised awareness of our research and R package, and gave us feedback on the package's functionality and end-user experience.
Year(s) Of Engagement Activity: 2024