Decoding Resistance: Statistical and Machine Learning Paradigms for Predicting Antimicrobial Resistance in Mycobacterium tuberculosis

Lead Research Organisation: UNIVERSITY OF OXFORD
Department Name: Sustain Approach to Biomedical Sci CDT

Abstract

The proposed project focuses on predicting the effects of mutations on drug efficacy against Mycobacterium tuberculosis. Current gold-standard prediction methods rely on look-up tables of mutation effects, referred to as mutation catalogues. These catalogues are created using highly complex and irreproducible statistical pipelines that make assumptions of clinical concern. Half of this PhD project aims to simplify and enhance these methods by exploring approaches grounded in traditional frequentist statistics and regression analysis. The resulting solutions will be implemented as rigorously tested, publicly available, pip-installable software tools, enabling researchers to apply them to their specific use cases and datasets. A new set of accurate, reproducible catalogues for all anti-TB compounds will be released.
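As a rough illustration of the kind of frequentist rule a simplified catalogue could rest on, the Python sketch below applies a two-sided Fisher exact test to a 2x2 table of isolates (mutation present or absent, resistant or susceptible). The function names, the 0.05 threshold, and the R/S/U labels are assumptions made for illustration only; they are not the project's actual pipeline.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    a: resistant isolates carrying the mutation
    b: susceptible isolates carrying the mutation
    c: resistant isolates without the mutation
    d: susceptible isolates without the mutation
    """
    row1 = a + b          # isolates carrying the mutation
    col1 = a + c          # resistant isolates
    n = a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    lo = max(0, col1 - (n - row1))
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables at least as unlikely as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def classify_mutation(a, b, c, d, alpha=0.05):
    """Hypothetical catalogue rule: 'R', 'S', or 'U' (unclassified).

    Assumes both rows of the table are non-empty.
    """
    if fisher_exact_two_sided(a, b, c, d) >= alpha:
        return "U"
    # Compare the resistant fraction with and without the mutation.
    return "R" if a / (a + b) > c / (c + d) else "S"


# Made-up counts: 9 of 10 isolates with the mutation are resistant,
# against 3 of 14 isolates without it.
label = classify_mutation(9, 1, 3, 11)
print(label)
```

A real catalogue would also need to handle multiple testing, lineage structure, and isolates carrying several candidate mutations, none of which this sketch attempts.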
The second half of the project aims to address an inferential gap by developing generalizable machine learning (ML) solutions capable of predicting the effects of mutations not included in existing catalogues. While many researchers leverage genetic features to predict resistance, we demonstrate that traditional ML models trained on structural and biophysical features of drug-target (protein) structures can robustly and accurately predict the impact of a missense mutation on drug susceptibility. However, these models cannot account for the combined effects of multiple mutations. This limitation reduces the usable mutation data to a subset of the available dataset, restricting the clinical applicability of such models.
One potential solution involves developing graph convolutional neural networks (gCNNs) featurized with structural and chemical data to predict antimicrobial resistance (AMR) conferred by multiple mutations for first- and second-line anti-TB compounds. The underlying hypothesis is that the topology of gCNNs can comprehensively capture resistant allele information, enabling efficient machine learning model training. This approach would, for the first time, accommodate protein targets with high genetic variability. Additionally, it allows for interrogation of atom embeddings to enhance model attribution, a concept particularly relevant to clinically applicable molecular diagnostics.
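To make the graph idea concrete, the sketch below implements a single graph-convolution step in plain Python, using the common normalised-adjacency form H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). The three-node "protein" graph, the single is-mutated feature, and the weight matrix are all toy assumptions; the project's actual architecture and featurization are not specified here.

```python
from math import sqrt


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def gcn_layer(adj, feats, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalisation.
    norm = [[a_hat[i][j] / sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, feats), weights)
    return [[max(0.0, v) for v in row] for row in out]


def mean_pool(node_embeddings):
    """Average node embeddings into one graph-level vector."""
    n = len(node_embeddings)
    return [sum(col) / n for col in zip(*node_embeddings)]


# Toy graph: three nodes in a chain (0 - 1 - 2), e.g. residues in contact.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
weights = [[1.0]]                         # hypothetical, untrained weight matrix

wild_type = [[0.0], [0.0], [0.0]]         # feature: 1.0 if the node is mutated
double_mutant = [[1.0], [0.0], [1.0]]     # mutations at nodes 0 and 2

wt_nodes = gcn_layer(adj, wild_type, weights)
dm_nodes = gcn_layer(adj, double_mutant, weights)
print(mean_pool(wt_nodes), mean_pool(dm_nodes))
```

In this toy case the unmutated middle node receives signal from both mutated neighbours, which is the mechanism by which a graph model can represent several mutations in one sample; the per-node outputs are also the quantities one would inspect for attribution.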
Although using structural data for AMR prediction is a relatively new approach, the true innovation of this methodology lies in addressing the field's historical inability to leverage neural networks trained on large, diverse datasets, particularly datasets incorporating structural, physicochemical, and spatial information about drug targets.
The student will have access to approximately 70,000 clinical TB samples collected by the international CRyPTIC project, led by Oxford. CRyPTIC's dataset includes 15,211 samples, each subjected to whole-genome sequencing and susceptibility testing for 13 antibiotics using a 96-well broth microdilution plate. A notable addition to this dataset is approximately 1,000 high-value samples with extensive drug resistance to Bedaquiline, the newest anti-TB compound.

This project falls within the following EPSRC research themes: AI and data science; Antimicrobial resistance; Biological informatics; Biophysics; Clinical technologies; Software engineering.

Planned Impact

The main impact of the SABS CDT will be the difference made by the scientists trained within it, both during their DPhils and throughout their future careers.

The impact of the students during their DPhil should be measured by the culture change that the centre engenders in graduate training, in working at the interface between mathematical/physical sciences and the biomedical sciences, and in cross-sector industry/academia working practices.

Current SABS projects are already changing the mechanisms of industry-academic collaboration, as described by one of our Industrial Partners:

"UCB and Roche are currently supervising a joint DPhil project and have put in two more joint proposals, which would have not been possible without the connections and the operational freedom offered by SABS-IDC and its open innovation culture, a one-of-the-kind in UK's CDTs."

New collaborations are also being generated: over 25% of current research projects are entirely new partnerships brokered by the Centre. The renewal of SABS will allow it to continue to strengthen and broaden this effect, building new bridges and starting new collaborations, and changing the culture of academic-industrial partnerships. It will also continue to ensure that all of its research is made publicly available through its Open Innovation structure, and help to create other centres with similar aims.

For all of our partners, however, the students themselves are considered to be the ultimate output. As one of our partners describes it:

"I believe the current SABS-IDC has met our original goals of developing young research scientists in a multidisciplinary environment with direct industrial experience and application. As a result, the graduating students have training and research experience that is directly applicable to the needs of modern life science R&D, in areas such as pharmaceuticals and biotechnology."

However, it is not only within the industrial realm that students have impact; in the later years of their DPhils, over 40% of SABS students, facilitated by the Centre, have undertaken various forms of public engagement. This includes visiting schools, working alongside Zooniverse to develop citizen science projects, and producing educational resources in the area of crystal images. In the new Centre all students will be required to undertake outreach activities in order to increase engagement with the public.

The impact of the students after they have finished should be measured by how they carry on this novel approach to research, be it in the sector or outside it. As our industrial letters of support make clear, though no SABS students have yet completed their DPhils, there is a clear expectation that they will play a significant role in shaping the UK economy in the future. For example, one of our partners comments about our students:

"UCB has been in constant search for such talents, who would thrive in pharmaceutical research, but they are rare to find in conventional postgraduate programmes. Personally I am interested in recruiting SABS-IDC students to my group once they are ready for the job market."

To demonstrate the type of impact that SABS alumni will have, we consider the impact being made by the alumni of the i-DTC programmes from which this proposal has grown. Examples include two start-up companies, both of which already have investment in the millions. Several students also now hold senior positions in industry and in research facilities and institutes. They have also been named on 30 granted or pending patents, 15 of these arising directly from their DPhil work.

The examples of past success given above indicate the types of impact we expect the graduates from SABS to achieve, and offer clear evidence that SABS students will become future research leaders, driving innovation and changing research culture.

Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
EP/S024093/1      |              |              | 30/09/2019 | 30/03/2028 |
2597363           | Studentship  | EP/S024093/1 | 30/09/2021 | 29/09/2025 | Dylan Adlard