New Approaches to Bayesian Data Science: Tackling Challenges from the Health Sciences

Lead Research Organisation: Lancaster University
Department Name: Mathematics and Statistics


The health sciences have seen an explosion in the amount of data collected at both individual and population levels. This data can be varied, including genetic information, health records, data on activity levels obtained from wearable devices, and image data from scans. There is huge potential for improved diagnoses, timely interventions and more effective treatments if we can fully extract understanding from this data. Example applications include real-time monitoring of patients, developing personalised treatment, and real-time monitoring and decision-making for epidemics. However, the data science challenges in extracting these insights are vast.

Features of these challenges include the need to make inferences about and decisions for individuals from within a population, and the need to synthesise information from disparate data sources and data types. Whilst we have substantial data collected at a population level, the amount of information on any given individual may still be limited. Appropriately quantifying uncertainty is crucial for making decisions, with the optimal decision often being driven by the probability of relatively rare events (e.g. extreme reaction to a drug). We need model-based approaches to data science that can leverage scientific understanding, but we need the statistical analyses to be robust to unavoidable inadequacies of these models. Underpinning many of these applications is the requirement to develop new understanding, and this differs from the focus on making predictions that is most common among current statistical or machine learning methods.

Bayesian data science provides a natural framework for tackling these challenges. Bayesian methods are model-based, can appropriately quantify and propagate uncertainty, and through hierarchical models are able to use population-level information when making inferences about individuals. Repeated application of Bayes' theorem gives a natural paradigm for synthesising information across multiple data sources. However, current Bayesian data science methods are not feasible for many modern big-data applications in the health sciences. Bayesian methods require integrating over uncertainty. Such high-dimensional integration carries a substantial computational overhead when compared to alternative, often optimisation-based, data science methods. So while the motivation for Bayesian analysis is clear, this computational overhead means that, currently, implementing Bayesian approaches is often not feasible.
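
The repeated application of Bayes' theorem mentioned above can be illustrated with a minimal sketch. It assumes a conjugate Beta-Binomial model, chosen here only because its update has a closed form, and two data sources that are conditionally independent given the unknown rate. Under those assumptions the posterior from the first source serves as the prior for the second, and the result matches a single update on the pooled data. The function names, the uniform prior and the counts are all hypothetical and are not taken from the project.

```python
from fractions import Fraction


def update_beta(prior, events, non_events):
    """Conjugate Beta-Binomial update: return the posterior (alpha, beta)."""
    alpha, beta = prior
    return (alpha + events, beta + non_events)


def beta_mean(params):
    """Mean of a Beta(alpha, beta) distribution, kept exact as a fraction."""
    alpha, beta = params
    return Fraction(alpha, alpha + beta)


# Assumption: uniform Beta(1, 1) prior on the rate of a rare adverse reaction.
prior = (1, 1)

# Hypothetical counts: (patients with a reaction, patients without).
source_a = (3, 47)  # e.g. a small trial
source_b = (1, 19)  # e.g. a set of health records

# Sequential use of Bayes' theorem: the posterior after source A
# becomes the prior when source B arrives.
after_a = update_beta(prior, *source_a)
sequential = update_beta(after_a, *source_b)

# Single update on the pooled data, for comparison.
pooled = update_beta(
    prior,
    source_a[0] + source_b[0],
    source_a[1] + source_b[1],
)

print("sequential posterior:", sequential)
print("pooled posterior:    ", pooled)
print("posterior mean rate: ", float(beta_mean(sequential)))
```

In this conjugate setting no numerical integration is needed. The computational overhead described above arises in models where such closed-form updates are unavailable and the integration has to be approximated, for example by Monte Carlo methods.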

This programme of research will develop the new approaches to Bayesian data science that are needed both within the health sciences and more widely. It builds on recent breakthroughs in Monte Carlo integration methods that show great promise for being efficient for large data; and on new paradigms for Bayesian-like updates that are suitable for complex models and which focus modelling effort just on the aspects of these models that are most important. It will address key research challenges in the health sciences, directly developing new insights and understanding in these areas.

Planned Impact

Who will benefit?
This proposal will benefit a variety of different stakeholders including:
(a) A range of public bodies, academic groups and companies within the health sciences;
(b) Society more generally through the application of this research;
(c) The academic data science research community;
(d) Project personnel: the PDRAs and PhD students.

How will they benefit?

Solution to current health science challenges [groups a,b]
The research project will tackle current key health science challenges, such as real-time decision-making for epidemics, structured association studies and personalised medicine. This will involve working directly with scientists within each of these areas to develop and apply new data science methods. Impact will arise immediately from new insights found, for example within association studies; and from a suite of new methods that can be applied more widely. Direct work with project partners will see the quick uptake of new methods in practice. The wider impact will be supported through the development, by one of our project partners, of software for specific applications aimed at health scientists and practitioners.

New Bayesian data science techniques [groups a,b,c]
We will generalise the new data science methods developed to address specific health science challenges. This will lead to a new suite of Bayesian data science techniques, together with associated theory and insight. These methods will cover generic challenges such as scalable computational methods, robust Bayesian procedures and how to fuse information from disparate data sources and types. The impact of this work will be supported by making the code developed freely available, and by dissemination at international data science conferences and in journals that span a range of data science related disciplines. Part of this dissemination will be through an annual workshop linked to the research project.

Targeted knowledge exchange [group a]
Our project partners will benefit directly from this research project. To maximise this benefit, we plan two research retreats per year, each focused on a different specific health science challenge and with appropriate project partner involvement. PDRAs and PhD students on the grant will spend periods of time working at and directly with our partners. The grant will also have an external advisory board with strong end-user involvement.

Developing good people [groups a,c,d]
We will develop highly skilled researchers with expertise in Bayesian data science and its applications in health science. All PDRAs and PhD students will spend substantial time within research groups specialising both in fundamental data science and in health science applications. They will benefit from supportive training environments and opportunities provided by the five participating institutions, for example bespoke training courses run by the STOR-i and OxWasp doctoral training centres.

This grant will lead to an increase in the number of high-quality researchers working in a skill-shortage area, and able to seek future employment within both academia and industry.


Description: Buchholz - Contributions to approximate Bayesian Inference, Glasgow
Form Of Engagement Activity: A talk or presentation
Part Of Official Scheme? No
Geographic Reach: Regional
Primary Audience: Professional Practitioners
Results and Impact: Alexander Buchholz, Contributions to approximate Bayesian Inference, University of Glasgow, UK, 6/12/2020
Year(s) Of Engagement Activity: 2020

Description: Buchholz - Contributions to approximate Bayesian Inference, Warwick
Form Of Engagement Activity: A talk or presentation
Part Of Official Scheme? No
Geographic Reach: Regional
Primary Audience: Professional Practitioners
Results and Impact: Alexander Buchholz, Contributions to approximate Bayesian Inference, University of Warwick, UK, 17/1/2020
Year(s) Of Engagement Activity: 2020