Understanding bacterial resistance by machine learning from genetic data

Lead Research Organisation: University of Liverpool
Department Name: Molecular and Clinical Pharmacology

Abstract

Motivation-Bacterial resistance to antibiotics remains one of the biggest challenges in medicine. Although more drugs have become available over the past decade, it is crucial that the use of these drugs is optimised through novel laboratory tests that allow individualised antibiotic therapy for patients.
State-of-the-art-Current treatment of patients with infection usually starts with a best-guess antibiotic. At the same time, samples are taken and the laboratory attempts to culture the infecting bacterium. If this is successful, the bacterium is exposed in vitro to various antibiotics to determine which of them are likely to treat the infection effectively. Clinicians use these results to change antibiotic treatment as appropriate. Although these tests are cheap and easy, they provide only a narrow representation of the complex underlying resistance mechanisms.
Whole genome sequencing continues to become more accessible and is now routinely used in the NHS in the management of certain infections. This technology allows laboratories to read the whole genetic code of bacteria and is used for the management of outbreaks and, in some cases, to detect antibiotic resistance. However, there is a poor understanding of how genomic data are linked to traditional bacterial culture techniques (phenotypic tests), which have a long track record of predicting how likely an antibiotic is to cure an infection.
As data produced by novel genomic tests become more accessible, we need a better understanding of how they can be used to optimise which antibiotics to give to patients. Past research has shown promising results and established the plausibility of this approach; however, datasets are limited to certain organisms, such as Mycobacterium tuberculosis. Most studies have represented antibiotic resistance in binary form rather than as a range of susceptibility.
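The difference between a binary representation and a range of susceptibility can be illustrated with a minimal sketch. The breakpoint and the MIC values below are made up for illustration and are not taken from the project; real breakpoints are specific to each organism and drug, and the single-threshold rule here ignores any intermediate category.

```python
import math

# Hypothetical breakpoint in mg/L (illustrative only, not a clinical value).
BREAKPOINT_MG_PER_L = 4.0


def binary_label(mic_mg_per_l, breakpoint=BREAKPOINT_MG_PER_L):
    """Collapse an MIC into a resistant ("R") or susceptible ("S") call.

    An MIC at or below the breakpoint is called susceptible.
    """
    return "R" if mic_mg_per_l > breakpoint else "S"


def log2_mic(mic_mg_per_l):
    """Express an MIC on a log2 scale.

    MICs are typically read from two-fold dilution series, so one unit
    on this scale corresponds to one dilution step.
    """
    return math.log2(mic_mg_per_l)


mics = [0.25, 4.0, 8.0, 64.0]  # made-up MIC values in mg/L
labels = [binary_label(m) for m in mics]
log_mics = [log2_mic(m) for m in mics]

for mic, label, log_mic in zip(mics, labels, log_mics):
    print(f"MIC {mic:>6} mg/L  binary: {label}  log2(MIC): {log_mic:+.1f}")
```

In this toy data the isolates at 8 mg/L and 64 mg/L receive the same binary label even though they are three dilution steps apart, which is the information a range-based representation keeps.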
Problem statement-The aim is to discover how certain parts of bacterial genomes affect bacterial resistance to antibiotics. Since any genome is a long sequence and we already have many bacteria of interest, a machine learning approach is needed to find exact correlations between relevant entries of every genome and the minimum inhibitory concentration of a given antibiotic, that is, the lowest concentration that suppresses bacterial growth. The ultimate goal is to replace slow experiments in the lab with faster and mathematically justified algorithmic predictions of susceptibility to antibiotics, using only the bacterial genome.
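As a rough illustration of what a correlation between a genome entry and a minimum inhibitory concentration might look like, the sketch below computes a Pearson correlation between the presence of one hypothetical gene and log2(MIC) across six made-up isolates. The data, the choice of Pearson correlation and the function name are assumptions for illustration; the project does not specify which measure of association it will use.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up data: 1 means the hypothetical gene was detected in the isolate.
gene_present = [0, 0, 1, 1, 1, 0]
# Made-up log2(MIC) values for the same six isolates, in the same order.
log2_mic_values = [-1.0, 0.0, 3.0, 4.0, 5.0, 1.0]

r = pearson(gene_present, log2_mic_values)
print(f"correlation between gene presence and log2(MIC): {r:.3f}")
```

A single-gene correlation of this kind ignores interactions between genes, which is one reason a model over many genome features is proposed rather than gene-by-gene statistics.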
Work plan-In the early stages of the project, the candidate will collaborate with Liverpool Clinical Laboratories (LCL) and Liverpool University Foundation Trust to collect a representative bank of approximately 500 bacterial isolates that are important causes of infection.
Through collaboration with the Centre for Genomic Research, the bacterial isolates will be sequenced using state-of-the-art Next Generation Sequencing techniques.
The isolates will be tested for susceptibility to a panel of antibiotics using traditional techniques, which will serve as the gold-standard test.
One of the challenges will be the high dimensionality of the data produced for each bacterial isolate by genome sequencing. Therefore, the data will need to undergo dimensionality reduction using information about genes known to confer resistance to the antibiotics of interest.
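One simple way to picture this kind of knowledge-driven dimensionality reduction is to keep, for each isolate, only a presence/absence flag for every gene on a curated panel. The sketch below assumes that gene detection has already been done upstream; the panel, gene names and isolate names are placeholders, and the project may use a different reduction.

```python
# Hypothetical curated panel of resistance-associated genes (placeholder names).
PANEL = ["res_gene_a", "res_gene_b", "res_gene_c"]


def presence_absence_vector(detected_genes, panel=PANEL):
    """Reduce an isolate's detected genes to one 0/1 flag per panel gene.

    Genes that are not on the panel are discarded, so the feature vector
    has a fixed length no matter how many genes were detected.
    """
    detected = set(detected_genes)
    return [1 if gene in detected else 0 for gene in panel]


# Made-up gene calls per isolate; in practice these would come from
# annotating the sequenced genomes.
isolates = {
    "isolate_01": ["res_gene_a", "housekeeping_1", "res_gene_c"],
    "isolate_02": ["housekeeping_1", "housekeeping_2"],
}

features = {
    name: presence_absence_vector(genes) for name, genes in isolates.items()
}

for name, vector in features.items():
    print(name, vector)
```

Each isolate is reduced from an arbitrarily long gene list to a vector with one entry per panel gene, at the cost of discarding anything not already known to be relevant.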
The data produced from this process will be used as input to a machine learning model that will predict the susceptibility of bacteria to a panel of antibiotics, with the output being the antibiotic concentration required to suppress bacterial growth (the minimum inhibitory concentration).
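The input and output of such a model can be sketched with a deliberately simple stand-in: k-nearest-neighbour regression over gene presence/absence vectors, predicting log2(MIC). This is not the project's model, which the abstract does not specify; the training data, the feature encoding and the choice of k are all assumptions made for illustration.

```python
import math


def hamming(a, b):
    """Number of positions at which two equal-length 0/1 vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def predict_log2_mic(query, training, k=3):
    """Average the log2(MIC) of the k training isolates closest to the query.

    `training` is a list of (feature_vector, log2_mic) pairs. Ties in
    distance are broken by training order because sorted() is stable.
    """
    ranked = sorted(training, key=lambda item: hamming(query, item[0]))
    nearest = ranked[:k]
    return sum(log_mic for _, log_mic in nearest) / len(nearest)


# Made-up training set: presence/absence of three hypothetical genes,
# paired with a made-up measured MIC (mg/L) converted to log2.
training = [
    ([0, 0, 0], math.log2(0.5)),
    ([0, 0, 1], math.log2(1.0)),
    ([1, 0, 0], math.log2(8.0)),
    ([1, 1, 0], math.log2(32.0)),
    ([1, 1, 1], math.log2(64.0)),
]

query = [1, 1, 0]  # feature vector of a new, untested isolate
predicted_log2 = predict_log2_mic(query, training, k=3)
predicted_mic = 2 ** predicted_log2  # back to mg/L

print(f"predicted log2(MIC): {predicted_log2:.2f}")
print(f"predicted MIC: {predicted_mic:.1f} mg/L")
```

Predicting on the log2 scale and converting back keeps errors expressed in dilution steps, which matches how the gold-standard measurements are read. A real model would also need held-out validation against the phenotypic tests.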
The study will explore the relationship of this model to patient clinical features and outcomes, through data available from LCL.
Expected Deliverables-This project will reveal an explicit relationship between genomes of bacteria and their resistance to antibiotics. A practical outcome will be an implemented algorithm to reliably predict a minimum inhibitory concent

Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
EP/T517975/1      |              |              | 01/10/2020 | 30/09/2025 |
2599501           | Studentship  | EP/T517975/1 | 01/10/2021 | 31/03/2025 | Alessandro Gerada