Mathematical methods for differential privacy in clinical research

Lead Research Organisation: University of Bath
Department Name: Mathematical Sciences

Abstract

Machine learning has great potential to improve data-driven decision-making in healthcare and drug development by enabling deep analytics on complex data types (for example, medical images) when these are collected at scale. Such data can greatly enrich our understanding of the mechanisms behind disease and treatment. This PhD project is set up in collaboration with Novartis, a global pharmaceutical company, and provides a unique opportunity both to develop new methods and to take part in applying them to advance the science of medicine.
Respecting patient privacy is imperative and subject to strong legal and regulatory constraints; however, not all data types can be used equally readily in a privacy-preserving fashion. For example, only a small number of genetic markers, or peripheral information present in medical images such as MRIs, may be sufficient to uniquely re-identify individuals contributing information to a dataset. Differential privacy defines a framework for protecting individual privacy when a specific set of summaries is computed, through a randomized algorithm, from a dataset containing sensitive personal information. The level of privacy guaranteed can be characterized through the level of randomization applied. This project focuses on applying differential privacy to interrogate large pooled and multi-modal patient-level datasets to compute, for example, clinical endpoints or treatment outcomes. Pools of clinical studies provide both unique challenges and unique opportunities in this setting. On the one hand, knowledge about randomization through study design may allow us to suppress columns with little impact on the summary to be calculated. On the other hand, patient datasets are much smaller and more expensive to generate than other types of data commonly considered in the differential privacy setting, with only hundreds or thousands of participants.
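As a minimal illustration of how the level of randomization characterizes the privacy guarantee, the sketch below releases a mean through the Laplace mechanism, a standard differential privacy technique (not necessarily the approach this project will adopt). It assumes that the number of participants n is public, that values are clipped to known bounds [lower, upper], and that neighbouring datasets differ by replacing one participant's record, so the mean has sensitivity (upper - lower) / n. The function names dp_mean and laplace_scale are hypothetical.

```python
import numpy as np


def laplace_scale(n, lower, upper, epsilon):
    """Laplace noise scale b = sensitivity / epsilon for a bounded mean.

    Assumes replace-one neighbouring datasets with public size n, so the
    sensitivity of the mean is (upper - lower) / n.
    """
    return (upper - lower) / (n * epsilon)


def dp_mean(values, lower, upper, epsilon, rng=None):
    """Release an epsilon-differentially private mean (Laplace mechanism)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = np.random.default_rng() if rng is None else rng
    # Clip so that no single record can move the mean by more than the sensitivity.
    x = np.clip(np.asarray(values, dtype=float), lower, upper)
    scale = laplace_scale(x.size, lower, upper, epsilon)
    # Smaller epsilon -> larger scale -> more randomization -> stronger privacy.
    return float(x.mean() + rng.laplace(loc=0.0, scale=scale))
```

With values bounded in [0, 100] and epsilon = 1, laplace_scale gives a noise scale of 0.5 for a cohort of 200 participants but 0.001 for 100,000 participants, which illustrates why the small size of clinical datasets makes the trade-off between privacy and utility harder.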
The aim is to develop methods that can adapt their level of randomization (privacy budget) to healthcare scenarios, optimizing the trade-off between privacy and data utility. We will consider (i) methods based on random projections, which represent information in a way that makes individuals harder to identify, (ii) machine learning algorithms, and (iii) other probabilistic/statistical approaches.
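A minimal sketch of the random-projection idea in (i), assuming a Gaussian (Johnson-Lindenstrauss style) projection of an n x d feature matrix down to k dimensions; the function name gaussian_random_projection is hypothetical. Pairwise distances, and hence many downstream analyses, are approximately preserved while the original features are no longer directly exposed. A projection on its own does not give a formal differential privacy guarantee; differentially private variants of this idea add calibrated noise before release.

```python
import numpy as np


def gaussian_random_projection(X, k, rng=None):
    """Map an (n, d) data matrix to (n, k) with a random Gaussian matrix.

    Entries of the projection matrix are i.i.d. N(0, 1/k), so squared
    Euclidean norms (and pairwise distances) are preserved in expectation
    and approximately preserved with high probability for large enough k.
    Note: this step alone is NOT differentially private.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    d = X.shape[1]
    R = rng.normal(loc=0.0, scale=1.0 / np.sqrt(k), size=(d, k))
    return X @ R
```

Choosing k trades off distance distortion (which shrinks roughly like 1/sqrt(k)) against how much of the original structure remains recoverable; any privacy claim would rest on the noise added afterwards, not on the projection itself.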

Planned Impact

Combining specialised modelling techniques with complex data analysis to deliver predictions with quantified uncertainties lies at the heart of many of the major challenges facing UK industry and society over the coming decades. Indeed, the recent Government Office for Science report "Computational Modelling, Technological Futures, 2018" names putting the UK at the forefront of the data revolution as one of its Grand Challenges.

The beneficiaries of our research portfolio will include a wide range of UK industrial sectors such as the pharmaceutical industry, risk consultancy, telecommunications and advanced materials, as well as government bodies, including the NHS, the Met Office and the Environment Agency.

Examples of current impactful projects pursued by students in collaboration with stakeholders include:

- Using machine learning techniques to develop automated assessment of psoriatic arthritis from hand X-Rays, freeing up consultants' time (with the NHS).

- Uncertainty quantification for the Neutron Transport Equation improving nuclear reactor safety (co-funded by Wood).

- Optimising the resilience and self-configuration of communication networks with the help of random graph colouring problems (co-funded by BT).

- Risk quantification of failure cascades on oil platforms by using Bayesian networks to improve safety assessment for certification (co-funded by DNV GL).

- Krylov regularisation in a Bayesian framework for low-resolution Nuclear Magnetic Resonance to assess properties of porous media for real-time exploration (co-funded by Schlumberger).

- Machine learning methods to untangle oceanographic sound data for a variety of goals, including the protection of wildlife in shipping lanes (with the Department of Physics).

Future committed partners for SAMBa 2.0 are: BT, Syngenta, Schlumberger, DNV GL, Wood, ONS, AstraZeneca, Roche, Diamond Light Source, GKN, NHS, NPL, Environment Agency, Novartis, Cytel, Mango, Moogsoft, Willis Towers Watson.

SAMBa's core mission is to train the next generation of academic and industrial researchers with the breadth and depth of skills necessary to address these challenges. SAMBa's most sustained impact will come through the contributions these researchers make over the longer term of their careers. To equip students with the skills needed to maximise this impact, SAMBa has placed a bespoke training experience, developed in collaboration with industry, at the heart of its activities. Integrative Think Tanks (ITTs) are week-long workshops in which industrial partners present high-level research challenges to students and academics. All participants work collaboratively to formulate mathematical models and questions that address the challenges. These outputs are meaningful to the non-academic partner and also serve as a mechanism for identifying mathematical topics suitable for PhD research. Through the co-ownership of collaboratively developed projects, SAMBa has the capacity to lead industry in capitalising on recent advances in mathematics. ITTs occur twice a year and excel at problem distillation and formulation, creating an exemplary environment for developing impactful projects.

SAMBa's impact on the student experience will be profound, with training in a broad range of mathematical areas, in team working, in academic-industrial collaborations, and in communicating research to specialist and generalist audiences. Experience with current SAMBa students has shown that these skills are highly prized: "The SAMBa approach was a great template for setting up a productive, creative and collaborative atmosphere. The commitment of the students in getting involved with unfamiliar areas of research and applying their experience towards producing solutions was very impressive." - Dr Mike Marsh, space weather researcher, Met Office.


Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
EP/S022945/1      |              |              | 01/10/2019 | 31/03/2028 |
2597523           | Studentship  | EP/S022945/1 | 01/10/2021 | 30/09/2025 | Ruchen LIU