Building machine learning models and neural networks trained on structural information of drug targets to predict antimicrobial resistance

Lead Research Organisation: University of Oxford
Department Name: Sustain Approach to Biomedical Sci CDT

Abstract

The proposed project focuses on training machine learning models using protein structural, chemical and evolutionary features of relevant antibiotic targets to predict antimicrobial resistance (AMR) in Mycobacterium tuberculosis (Mtb). Whilst many researchers are using genetic features to predict resistance, we have previously demonstrated that traditional machine-learning models trained on structural and biophysical features of RNA polymerase can robustly and accurately predict the effect that a missense mutation has on rifampicin susceptibility. However, these models are inherently unable to predict the effect that multiple mutations can have, which restricts the usable data to a subset of the available mutation data and thus limits the clinical applicability of the models. The primary goal of the DPhil project is to address this. The student will have access to the dataset of around 70,000 clinical TB samples amassed by the international CRyPTIC project, which was led by Oxford and is reporting its main findings through a series of publications. CRyPTIC collected 15,211 samples, each of which was whole genome sequenced and had its susceptibility to 13 antibiotics measured using a 96-well broth microdilution plate. A limitation of this dataset was the lack of resistance to new compounds, such as bedaquiline. One of the CRyPTIC partners has recently provided c. 1000 high-value samples that are extensively resistant, and the initial aims (Y1) of this DPhil are to analyse this additional dataset, including retraining previously developed machine learning models, and to develop a rigorous statistical analysis pipeline that enables continuous, robust and easily accessible performance assessment and benchmarking.
This will facilitate the primary objective of the project: developing graph convolutional neural networks (gCNNs) featurised with structural and chemical data to predict AMR conferred by multiple mutations against first- and second-line anti-TB compounds. The hypothesis underlying this approach is that the topology of gCNNs can accurately capture all the information from a resistant allele, thereby allowing machine learning models to be efficiently trained and permitting protein targets with high levels of genetic variability to be considered for the first time. A logical extension, time permitting, would be to incorporate dynamic data extracted from molecular dynamics trajectories into the feature sets and assess the impact on model performance. Aside from being a more intuitive architecture for representing structural data than conventional convolutional neural networks (CNNs), gCNNs also preserve the concepts of the atom and the chemical bond until the final layers of the network, thereby preserving spatial information about the drug target. This allows atom embeddings to be interrogated to improve model attribution, a concept particularly relevant in clinically applicable molecular diagnostics. Although the use of structural data to predict AMR is still a relatively new approach, the real novelty of this methodology is that, to date, the field of AMR prediction has been largely unable to benefit from neural networks trained on sufficiently large datasets, and particularly from neural networks trained on structural, physicochemical and spatial information about the drug target. Furthermore, equivariant graph neural networks (which arguably show the most potential) are extremely new (2021) and, with regard to structural modelling problems, have mostly been adopted by groups focussing on binding affinity prediction, not AMR prediction.
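As a rough illustration of the graph-convolution idea described above, the following minimal Python sketch applies one mean-aggregation layer to a tiny graph and then pools the per-node embeddings into a single score. It is not the project's model: the function names (`graph_conv`, `readout`), the two-value node features, the chain-shaped graph and the untrained weights are all hypothetical and chosen only to show that each node (for example an atom or residue) keeps its own embedding until the final pooling step.

```python
import math


def graph_conv(features, edges, weight):
    """One mean-aggregation graph convolution layer followed by ReLU.

    features: list of per-node feature vectors (hypothetical descriptors)
    edges:    list of undirected (i, j) index pairs (e.g. bonds or contacts)
    weight:   matrix with shape [input_dim][output_dim]
    """
    n = len(features)
    # Neighbour lists include the node itself (a self-loop).
    neighbours = [[i] for i in range(n)]
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)

    in_dim = len(weight)
    out_dim = len(weight[0])
    embeddings = []
    for i in range(n):
        # Average the features of the node and its neighbours.
        mean = [
            sum(features[j][k] for j in neighbours[i]) / len(neighbours[i])
            for k in range(in_dim)
        ]
        # Linear transform, then ReLU.
        embeddings.append([
            max(0.0, sum(mean[k] * weight[k][c] for k in range(in_dim)))
            for c in range(out_dim)
        ])
    return embeddings


def readout(embeddings, w_out):
    """Mean-pool node embeddings and map the result to a (0, 1) score."""
    n = len(embeddings)
    pooled = [sum(e[c] for e in embeddings) / n for c in range(len(w_out))]
    score = sum(p * w for p, w in zip(pooled, w_out))
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical 4-node chain; each node carries [descriptor, mutated_flag].
features = [[0.2, 0.0], [0.8, 1.0], [0.5, 0.0], [0.1, 1.0]]
edges = [(0, 1), (1, 2), (2, 3)]
weight = [[1.0, -0.5], [0.5, 1.0]]   # untrained, illustrative values
w_out = [0.7, -0.3]                  # untrained, illustrative values

node_embeddings = graph_conv(features, edges, weight)
resistance_score = readout(node_embeddings, w_out)
```

Because `node_embeddings` still has one row per node before `readout` is called, the per-node values can be inspected individually, which is the property the abstract relies on for attribution. A real model would stack several such layers, learn the weights from data, and use an established graph learning library rather than hand-written loops.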
This project would fall within the following EPSRC research themes: AI and data science; Antimicrobial resistance; Biological informatics; Biophysics; Clinical technologies; Software engineering.

Planned Impact

The main impact of the SABS CDT will be the difference made by the scientists trained within it, both during their DPhils and throughout their future careers.

The impact of the students during their DPhil should be measured by the culture change that the centre engenders in graduate training, in working at the interface between the mathematical/physical sciences and the biomedical sciences, and in cross-sector industry/academia working practices.

Current SABS projects are already changing the mechanisms of industry-academic collaboration, for example as described by one of our Industrial Partners:

"UCB and Roche are currently supervising a joint DPhil project and have put in two more joint proposals, which would have not been possible without the connections and the operational freedom offered by SABS-IDC and its open innovation culture, a one-of-the-kind in UK's CDTs."

New collaborations are also being generated: over 25% of current research projects are entirely new partnerships brokered by the Centre. The renewal of SABS will allow it to continue to strengthen and broaden this effect, building new bridges, starting new collaborations, and changing the culture of academic-industrial partnerships. It will also continue to ensure that all of its research is made publicly available through its Open Innovation structure, and help to create other centres with similar aims.

For all of our partners, however, the students themselves are considered to be the ultimate output. As one of our partners describes it:

"I believe the current SABS-IDC has met our original goals of developing young research scientists in a multidisciplinary environment with direct industrial experience and application. As a result, the graduating students have training and research experience that is directly applicable to the needs of modern lifescience R&D, in areas such as pharmaceuticals and biotechnology."

However, it is not only within the industrial realm that students have impact; in the later years of their DPhils, over 40% of SABS students, facilitated by the Centre, have undertaken various forms of public engagement. This includes visiting schools, working alongside Zooniverse to develop citizen science projects, and producing educational resources in the area of crystal images. In the new Centre, all students will be required to undertake outreach activities in order to increase engagement with the public.

The impact of the students after they have finished should be measured by how they carry on this novel approach to research, be it in the sector or outside it. As our industrial letters of support make clear, though no SABS students have yet completed their DPhils, there is a clear expectation that they will play a significant role in shaping the UK economy in the future. For example, one of our partners comments about our students:

"UCB has been in constant search for such talents, who would thrive in pharmaceutical research, but they are rare to find in conventional postgraduate programmes. Personally I am interested in recruiting SABS-IDC students to my group once they are ready for the job market."

To demonstrate the type of impact that SABS alumni will have, we consider the impact being made by the alumni of the i-DTC programmes from which this proposal has grown. Examples include two start-up companies, both of which already have investment in the millions. Several students also now hold senior positions in industry and in research facilities and institutes. They have also been named on 30 granted or pending patents, 15 of these arising directly from their DPhil work.

The examples of past success given above indicate the types of impact we expect the graduates from SABS to achieve, and offer clear evidence that SABS students will become future research leaders, driving innovation and changing research culture.

Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
EP/S024093/1      |              |              | 01/10/2019 | 31/03/2028 |
2597363           | Studentship  | EP/S024093/1 | 01/10/2021 | 30/09/2025 | Dylan Adlard