Partial recovery of missing responses - a toolbox for efficient design and analysis when data may be missing not at random

Lead Research Organisation: CARDIFF UNIVERSITY
Department Name: Sch of Mathematics

Abstract

Missing data are a common problem in many application areas. The presence of missing values complicates analyses, and if not dealt with properly can result in incorrect conclusions being drawn from the data. It is often helpful to assume there is a process that produces the missing values, typically called a missing data mechanism. A particularly problematic scenario is when this mechanism is in part determined by some other unknown variables, such as the missing values themselves. This is known as a missing not at random (MNAR) mechanism.

If missing values arise from an MNAR mechanism, conclusions drawn from the data will typically be biased. Importantly, it is not possible to tell from the observed data alone whether this problem occurs. This is the challenging problem that this proposal seeks to address: developing procedures that can best test whether or not MNAR occurs in the data.
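The bias described above can be illustrated with a small simulation. The following is a minimal sketch, not part of the proposal: the standard normal responses, the logistic form of the MNAR mechanism and the function name `simulate_observed` are all assumptions chosen for illustration. It compares the mean of the observed values when missingness is unrelated to the response with the mean when larger responses are more likely to be missing.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the illustration is reproducible


def simulate_observed(values, prob_missing):
    """Keep each value y with probability 1 - prob_missing(y).

    prob_missing is a hypothetical missing data mechanism: a function
    mapping a response value to its probability of being missing.
    """
    return [y for y in values if random.random() >= prob_missing(y)]


# Hypothetical complete data: 10,000 draws from a standard normal.
full = [random.gauss(0.0, 1.0) for _ in range(10_000)]

# Missingness unrelated to the response: every value has a 50% chance
# of being missing.
mcar_observed = simulate_observed(full, lambda y: 0.5)

# MNAR (assumed logistic form): the larger the response, the more
# likely it is to be missing.
mnar_observed = simulate_observed(
    full, lambda y: 1.0 / (1.0 + math.exp(-2.0 * y))
)

full_mean = statistics.fmean(full)
mcar_mean = statistics.fmean(mcar_observed)
mnar_mean = statistics.fmean(mnar_observed)

print(f"mean of complete data:              {full_mean:.3f}")
print(f"mean of observed data (unrelated):  {mcar_mean:.3f}")
print(f"mean of observed data (MNAR):       {mnar_mean:.3f}")
```

In this sketch the observed mean under the MNAR mechanism falls well below the mean of the complete data, whereas the mean under the unrelated mechanism stays close to it. An analyst who sees only the observed values has no way to tell which of the two situations applies.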

The proposal will consider scenarios where it is possible to recover some of the missing values through a follow-up sample. The main purpose of this is to learn about the missing data mechanism and, specifically, to test whether or not MNAR is present. The recovered data will also help to correct for the effect the missing data have on conclusions. The proposal makes use of optimal design techniques to decide which missing values to follow up. Certain missing values might yield more information about the type of missing data mechanism than others, and some values might be more likely than others to be recovered. Choosing the follow-up sample with both considerations in mind ensures that the recovered data are as informative as possible. This will allow data analysts to determine whether the presence of MNAR is likely and to take appropriate action.
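The idea of using recovered values to test for MNAR can be sketched in a deliberately simplified setting. The code below is an illustration under stated assumptions, not the project's method: there are no covariates, the follow-up sample is a simple random sample of the missing units (rather than an optimally designed one), every followed-up value is assumed to be recovered, and the helper name `welch_z` is hypothetical. It compares the recovered values with the originally observed values using a Welch-type two-sample z statistic.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the illustration is reproducible


def welch_z(sample_a, sample_b):
    """Welch-type z statistic for the difference in means of two samples."""
    mean_diff = statistics.fmean(sample_a) - statistics.fmean(sample_b)
    std_error = math.sqrt(
        statistics.variance(sample_a) / len(sample_a)
        + statistics.variance(sample_b) / len(sample_b)
    )
    return mean_diff / std_error


# Hypothetical responses and an assumed logistic MNAR mechanism in which
# larger responses are more likely to be missing.
responses = [random.gauss(0.0, 1.0) for _ in range(10_000)]
observed, missing = [], []
for y in responses:
    p_missing = 1.0 / (1.0 + math.exp(-2.0 * y))
    (missing if random.random() < p_missing else observed).append(y)

# Follow up a simple random sample of 200 missing units; assume that
# all 200 values are successfully recovered.
followed_up = random.sample(missing, 200)

z = welch_z(followed_up, observed)
print(f"z statistic (recovered vs observed): {z:.2f}")
```

In this covariate-free setting, a z statistic far from zero indicates that the recovered values differ systematically from the observed ones, which is evidence that missingness depends on the response itself. A simple random follow-up treats every missing value as equally informative and equally recoverable; the optimal design approach described above is intended to improve on exactly that.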

We will collaborate with our project partners, the Office for National Statistics and NHS Blood and Transplant, in the development of these methods. Our project partners will provide relevant data so that we can consider realistic scenarios, and we will discuss interim results with them to ensure our methods are as useful as possible for practitioners. We will also present the work as part of a missing data course at the African Institute for Mathematical Sciences (AIMS) to maximise the global benefit of the work.

The methods developed in this proposal will be disseminated through papers and presentations. In addition, we will create a free to use R package that will implement the methods to allow easy uptake by users. We will provide training in using this R package as part of a two-day workshop where we will describe our methods to users. A dedicated website will be updated throughout the project to describe developments and facilitate engagement with interested parties.

Planned Impact

The proposed research will have wide ranging benefits with the potential for improving how data is collected, analysed and subsequently used to make important decisions. We can broadly speaking categorise the proposed beneficiaries into three groups: 1) Academic Researchers, 2) Non Academic Researchers e.g. those in government and industry, and 3) Wider public. The benefits of the research are listed below in point form, with each indicating which of the above groups would benefit.

1. New methodology in handling the problem of Missing Not at Random (MNAR)

Academics working in the field would clearly benefit from learning of these developments (as detailed in the academic beneficiaries section). Non-academic researchers seeking to analyse data that might be affected by MNAR would benefit similarly. The involvement of our project partners, the Office for National Statistics and NHS Blood and Transplant, shows that interest in handling this problem extends well beyond academia. Groups to benefit from this component of the proposal are thus 1) and 2). The publications, presentations and workshop will help to facilitate the transfer of knowledge, and the free R package will allow fast uptake of the methods.

2. A more efficient and appropriate analysis of data with the potential for MNAR

Being able to analyse data with increased certainty about whether MNAR is present, and to correct for it where necessary, will allow researchers to draw conclusions that are more appropriate than would otherwise be the case. This would clearly be beneficial for researchers (both academic and non-academic). In addition, members of the wider public who are affected by the results of such analyses would benefit from greater certainty in their validity. Thus group 3) will benefit here in addition to groups 1) and 2).

3. A greater appreciation of the importance of handling missing data appropriately

Our dissemination strategy is designed to be as wide reaching as possible. In particular, the course we plan to teach at the African Institute for Mathematical Sciences (AIMS) will aim to educate students in Africa about the importance of careful data collection procedures and how to deal with missing data appropriately. In many parts of the world, poor quality data and analysis practices obstruct effective decision making. Teaching these methods through a course will build capability in regions where it is greatly needed. In addition, as important policy decisions are often based on data analysis, wider society in these areas will also benefit from the implementation of appropriate methods to handle MNAR and missing values in general. Thus group 3) will benefit from this component of the proposal.

Publications

 
Description We have derived new theory and methods that enable follow-up samples to be designed so that they better detect the presence of missing not at random values. A paper has been submitted to one of the top journals in Statistics, with a version also posted on arXiv.
Exploitation Route The methods could in principle be adapted to suit any data collection strategy that permits the possibility of follow up sampling to recover missing values.
Sectors Environment; Healthcare; Government, Democracy and Justice; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology

URL https://arxiv.org/abs/2208.07813
 
Description Conference presentation at CMStatistics in December 2021 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact An invited presentation was given at CMStatistics in December 2021. There was substantial interest from the audience, which comprised a diverse international mix of academics and researchers from multiple institutions and countries.
Year(s) Of Engagement Activity 2021
URL http://www.cmstatistics.org/CMStatistics2021/schedule_slot.php?slot=Q
 
Description Conference presentation at the 34th Panhellenic Statistics Conference of the Greek Statistical Institute 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Conference presentation at the 34th Panhellenic Statistics Conference of the Greek Statistical Institute in May 2022.
Year(s) Of Engagement Activity 2022
URL http://www.gsi-conference.gr
 
Description Plenary talk at the Research Students' Conference in Statistics, University of Nottingham 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact Plenary talk at the Research Students' Conference in Statistics, University of Nottingham.
Year(s) Of Engagement Activity 2022
URL https://www.nottingham.ac.uk/mathematics/events/research-student-conference-2022.aspx
 
Description Poster presentation at the International Workshop on Statistical Modelling 2022 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Poster presentation at the International Workshop on Statistical Modelling 2022.
Year(s) Of Engagement Activity 2022
URL https://www.iwsm2022.com
 
Description Presentation at CMStatistics 2022 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presentation at the 15th International Conference of the ERCIM WG on Computational and Methodological Statistics (CMStatistics 2022).
Year(s) Of Engagement Activity 2022
URL http://www.cmstatistics.org/CMStatistics2022/
 
Description Presentation at the 24th International Conference on COMPUTATIONAL STATISTICS (COMPSTAT 2022) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presentation at the 24th International Conference on COMPUTATIONAL STATISTICS (COMPSTAT 2022).
Year(s) Of Engagement Activity 2022
URL http://www.compstat2022.org
 
Description Presentation at the 73rd British Mathematical Colloquium, KCL 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Presentation given at the 73rd British Mathematical Colloquium, KCL.
Year(s) Of Engagement Activity 2022
URL https://www.bmc2022.co.uk
 
Description Seminar at the University of Bath 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Seminar given at the University of Bath.
Year(s) Of Engagement Activity 2022
 
Description Seminar at the University of Hull 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Seminar given at the University of Hull.
Year(s) Of Engagement Activity 2023