Interpretable representation learning

Lead Research Organisation: University of Oxford

Department Name: Computer Science

Abstract

Deep learning approaches have had tremendous success across a wide range of domains in recent years, from image segmentation and classification to speech recognition and language translation. They have also started to demonstrate promising results in healthcare applications, supported by the growing size and diversity of available patient data.

However, despite clear performance improvements, their adoption by the healthcare community is hindered by two factors: many perceive these deep learning models as indecipherable black boxes, and current state-of-the-art approaches in the medical domain do not offer a good handle on the uncertainty of model predictions.

The objective of my research will be to fill these gaps by developing novel representation learning approaches that are more interpretable and robust. Of particular interest will be the extension of these methods to the case of heterogeneous and/or non-stationary data inputs, building for example on some of the early work on Bayesian Recurrent Neural Networks applied to language modeling and image captioning.
The research will aim to demonstrate how these can be leveraged for healthcare applications, where patient data may come from a wide range of different sources (e.g., medical imaging, high-throughput sequencing, electronic medical records) and vary over time under the influence of disease progression and treatment effects.
The goal will be to show concretely how they can help provide a better understanding of the elements that underpin the predictions of a machine learning model, as well as lead to new insights into disease (e.g., identification of patient subgroups for a given pathology).

This cross-disciplinary project, combining theoretical machine learning developments with their applications to healthcare, is at the intersection of two of EPSRC's core research areas, namely "Artificial Intelligence Technologies" and "Healthcare Technologies".
The project will be supervised by Professor Yarin Gal (Oxford Applied & Theoretical Machine Learning Group, Department of Computer Science, University of Oxford) and Dr. Lindsay Edwards (Vice President, AI/ML Engineering, GlaxoSmithKline).

Student:

Pascal Notin

Period of Study:

Sep 19 - Sep 23

Funder:

EPSRC

Project Status:

Closed

Project Category:

Studentship

Project Reference:

2287801

Research Topic:

Unclassified

People	ORCID iD
Yarin Gal (Primary Supervisor)
Pascal Notin (Student)

Publications

Frazer J (2021) Disease variant prediction with deep generative models of evolutionary data. in Nature

Frazer J (2020) Large-scale clinical interpretation of genetic variants using evolutionary data and deep learning

Gohil C (2022) Mixtures of large-scale dynamic functional brain network modes. in NeuroImage

Notin P (2022) TranceptEVE: Combining Family-specific and Family-agnostic Models of Protein Sequences for Improved Fitness Prediction

Thadani N (2022) Learning from pre-pandemic data to forecast viral escape

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/S513866/1			30/09/2018	31/03/2024
2287801	Studentship	EP/S513866/1	30/09/2019	29/09/2023	Pascal Notin

Key Findings

Description	Disease variant prediction with deep generative models of evolutionary data: Quantifying the pathogenicity of protein variants in human disease-related genes would have a marked effect on clinical decisions, yet the overwhelming majority (over 98%) of these variants still have unknown consequences. In principle, computational methods could support the large-scale interpretation of genetic variants. However, state-of-the-art methods have relied on training machine learning models on known disease labels. As these labels are sparse, biased and of variable quality, the resulting models have been considered insufficiently reliable. Here we propose an approach that leverages deep generative models to predict variant pathogenicity without relying on labels. By modelling the distribution of sequence variation across organisms, we implicitly capture constraints on the protein sequences that maintain fitness. Our model EVE (evolutionary model of variant effect) not only outperforms computational approaches that rely on labelled data but also performs on par with, if not better than, predictions from high-throughput experiments, which are increasingly used as evidence for variant classification. We predict the pathogenicity of more than 36 million variants across 3,219 disease genes and provide evidence for the classification of more than 256,000 variants of unknown significance. Our work suggests that models of evolutionary information can provide valuable independent evidence for variant interpretation that will be widely useful in research and clinical settings.
Exploitation Route	Our data and results, available at evemodel.org, are organised gene by gene so that researchers and physicians can examine individual variants in detail, including model predictions for each variant across roughly 3,000 proteins. We are working to extend our predictions to the full proteome and are collaborating closely with several research teams and private institutions to integrate our models and predictions into their workflows and analyses. Our objective is to support the early diagnosis of genetic diseases by clinical geneticists and to solidify our understanding of the mechanisms underlying genetic disorders.
Sectors	Healthcare
URL	https://www.nature.com/articles/s41586-021-04043-8

Impact Summary

Description	Models developed as part of this grant (in particular the EVE models, discussed in the paper "Disease variant prediction with deep generative models of evolutionary data") have started to be used in hospitals to identify genes potentially responsible for genetic pathologies.
First Year Of Impact	2022
Sector	Healthcare
Impact Types	Societal
