Combatting antimicrobial resistance through new software for natural product discovery

Lead Research Organisation: University of Glasgow

Department Name: School of Computing Science

Abstract

The rate of chemical discovery of new antibiotics is too slow. This has resulted in bacteria evolving resistance to current medicine at a faster rate than new chemistry is being discovered. Globally, antimicrobial resistance is already thought to be responsible for 700,000 deaths per year, and, in the absence of new solutions, this is estimated to rise to 10 million by 2050.

Bacteria themselves are excellent producers of compounds with biologically active properties. In fact, over 70% of the antibiotics approved between 1981 and 2016 are bacterially produced natural products or derivatives thereof. Many of these compounds are assembled by groups of enzymes that are themselves encoded in areas of the bacterial genome known as biosynthetic gene clusters. Technological advances have increased the number, quality and availability of bacterial genome sequences. This wealth of data has revealed that both the number and diversity of predicted biosynthetic gene clusters greatly exceed expectations.

The knowledge that bacteria have the potential to produce this vast reservoir of undiscovered chemistry has re-invigorated the research community. Bacterial strains are often genome sequenced and cultured in an attempt to detect the molecules produced by the biosynthetic gene clusters identified in the sequence. Whilst mature computational tools exist to analyse the resulting mass spectrometry and sequence data sets independently, the community lacks a platform to bring these two data types together. This absence results in a severe bottleneck in the analysis pipeline, as researchers are forced to manually link the predicted gene clusters with their products, which are hidden somewhere in the mass spectrometry data. Given that a typical strain can easily contain around 100 biosynthetic gene clusters, and mass spectrometry of the cultured strain can easily result in fragment spectra for 2000 molecules, there are on the order of 100 x 2000 = 200,000 candidate links for a single strain, far too many for manual investigation.

We will develop and implement the computational tools that can link the gene clusters and their products in these large datasets in an automated way. The tools will allow import of the output of popular spectral and genomic analysis software. Our platform will then predict links and allow users to interactively explore the results, for example by inspecting the content of the gene clusters and spectra that have been linked together to judge whether the link is likely to be genuine. Crucially, this software will be built in a modular manner, with future development in mind. It will therefore be the vehicle in which future tools (e.g. more advanced linking tools optimised for particular natural product gene clusters) can be developed, deployed and benchmarked.

Technical Summary

Bacterial natural products (specialized metabolites) have high potential as future antibiotics. Genome sequencing is revealing that the number and diversity of biosynthetic gene clusters (BGCs; groups of genes encoding enzymes that assemble specialized metabolites) exceeds previous expectations. A key challenge is matching the predicted BGCs to their products measured in bacterial culture via mass spectrometry (MS).
Computational tools such as GNPS molecular networking (for MS data) and antiSMASH (for predicting BGCs) are maturing and provide the means to analyse these data types separately. However, no tools exist to help identify which of the ~2000 fragment spectra observed in a typical MS analysis of a single strain (under one fermentation/extraction condition) corresponds to which of the tens of BGCs predicted from the genome. Researchers perform this matching manually, resulting in a severe analysis bottleneck.
Large chemical and sequence datasets are becoming common: e.g. a combined genomic and mass spectral analysis of 146 strains was recently published. As we enter this data-rich era, the development of tools based on statistical and machine learning approaches is urgently required. We propose the development of software that will predict biosynthetic-chemical links by mining the data for shared patterns. Groups of similar spectra can be matched to groups of similar BGCs based upon the strains present/absent in the two groups, or biosynthetic features present in the spectra (predicted from the BGCs), or combinations of both.
Various research challenges exist in this area: how to build the link scoring methods, how to group spectra across strains, how to group BGCs across strains, etc. Development and benchmarking of the tools to answer these challenges requires the two data sets to be accessible in a shared analysis space. We propose the development of software that will bring together these two omic data types including the first automatic link prediction approaches.
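
The strain presence/absence idea described above can be illustrated with a small sketch. It is not the project's scoring method: it assumes, purely for illustration, that each group of spectra and each group of BGCs is reduced to the set of strains in which it occurs, and it uses the Jaccard index as a stand-in overlap measure. The function names, group identifiers and strain names are all hypothetical.

```python
from itertools import product


def jaccard(a, b):
    """Jaccard overlap of two strain sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_links(spectral_groups, bgc_groups):
    """Score every (spectral group, BGC group) pair by shared strains.

    Both arguments map a group identifier to the set of strains in which
    that group was observed. Returns (score, spectral_id, bgc_id) tuples
    sorted from the highest score to the lowest.
    """
    scored = [
        (jaccard(s_strains, g_strains), s_id, g_id)
        for (s_id, s_strains), (g_id, g_strains) in product(
            spectral_groups.items(), bgc_groups.items()
        )
    ]
    return sorted(scored, key=lambda t: t[0], reverse=True)


# Hypothetical toy data: two groups of spectra and two groups of BGCs.
spectral_groups = {"MF1": {"strainA", "strainB"}, "MF2": {"strainC"}}
bgc_groups = {"GCF1": {"strainA", "strainB", "strainD"}, "GCF2": {"strainC"}}

ranked = rank_links(spectral_groups, bgc_groups)
for score, s_id, g_id in ranked:
    print(f"{s_id} <-> {g_id}: {score:.2f}")
```

A pair whose spectra and BGCs occur in exactly the same strains scores 1.0, and a pair with no strains in common scores 0.0. A practical scoring function would also need to account for group size and chance co-occurrence, which this sketch ignores.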

Planned Impact

This research has a wide range of stakeholders: researchers, whose pipelines will be made more efficient; the pharmaceutical industry, which will be able to accelerate the drug discovery process; government policy makers (particularly with respect to strategies to overcome antibiotic resistance); research funders; and the public (through the health benefits that can be conferred).

Expanding the chemical resources (and knowledge of them and their biosynthesis) from microorganisms is a global fundamental research goal because of the urgency with which solutions are required to combat antimicrobial resistance. This has clear relevance within both academia and industry. The proposed research will identify links between biosynthetic gene clusters and the specialized metabolites they produce. When such predictions are experimentally validated with synthetic biology approaches, the result is a powerful tool for the discovery and prioritization of new antibiotics. Our software will be the first solution to this problem, but also a platform that can become the basis for the development of the next generation of tools in this area, based on machine learning and data science techniques.

We will combine metabolomics, genomics and software development in a manner that will provide a method to assess the potential of bacteria to address worldwide health issues. Ultimately, the public will indirectly benefit from the efficient discovery of new chemistry to address the antimicrobial resistance crisis.

To maximise uptake, we will provide a Docker image of our software that is easy to install and use. To maximise community involvement, all source code will be available and the software will be designed in a modular way, so that new modules for link prediction can easily be added and benchmarked.

All staff involved in this project will receive excellent exposure (and therefore training) in a vital multidisciplinary area.

Funded Value:

£141,400

Funded Period:

Nov 18 - Apr 20

Funder:

BBSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

BB/R022054/1

Principal Investigator:

Simon Rogers

Research Subject:

Microbial sciences (42%)

Omic sciences & technologies (14%)

Tools, technologies & methods (42%)

Research Topic:

Biochemistry & physiology (14%)

Bioinformatics (42%)

Genomics (14%)

Microbiology (14%)

Microorganisms (14%)

Organisations

People	ORCID iD
Simon Rogers (Principal Investigator)
Rónán Daly (Co-Investigator)	http://orcid.org/0000-0002-1275-6820
Paul Hoskisson (Co-Investigator)
Katherine Duncan (Co-Investigator)

Publications

Beniddir MA (2021) Advances in decomposing complex metabolite mixtures using substructure- and network-based computational metabolomics approaches. in Natural product reports

Ernst M (2019) MolNetEnhancer: Enhanced Molecular Networks by Integrating Metabolome Mining and Annotation Tools. in Metabolites

Feeney MA (2022) ActinoBase: tools and protocols for researchers working on Streptomyces and other filamentous actinobacteria. in Microbial genomics

Hjörleifsson Eldjárn G (2021) Ranking microbial metabolomic and genomic links in the NPLinker framework using complementary scoring functions. in PLoS computational biology

Mullowney MW (2023) Artificial intelligence for natural product drug discovery. in Nature reviews. Drug discovery

Rogers S (2019) Deciphering complex metabolite mixtures by unsupervised and supervised substructure discovery and semi-automated annotation from MS/MS spectra. in Faraday discussions

Rogers S (2018) Deciphering complex metabolite mixtures by unsupervised and supervised substructure discovery and semi-automated annotation from MS/MS spectra

Schorn MA (2021) A community resource for paired genomic and metabolomic data mining. in Nature chemical biology

Soldatou S (2019) Linking biosynthetic and chemical space to accelerate microbial secondary metabolite discovery. in FEMS microbiology letters

Soldatou S (2021) Comparative Metabologenomics Analysis of Polar Actinomycetes. in Marine drugs

Key Findings
Further Funding
Research Databases and Models
Collaboration
Engagement Activities


Description	- We have developed methods for improved linking of genomic and metabolomic data that will aid in the discovery of novel antimicrobials. - We have developed an open source software framework that implements these methods and can be extended in future by other researchers developing methods in this area.
Exploitation Route	The software will be of use to researchers who have generated large linked metabolomic and genomic data for the purpose of finding novel antimicrobials
Sectors	Healthcare, Manufacturing (including Industrial Biotechnology), Pharmaceuticals and Medical Biotechnology


Description	University of Strathclyde PhD Research Excellence Award
Amount	£60,000 (GBP)
Organisation	University of Strathclyde
Sector	Academic/University
Country	United Kingdom
Start	10/2019
End	09/2022


Title	Metabolomics Data
Description	Metabolomics data, HR-MS/MS profiles of all 26 strains in five media = 130 metabolomics profiles plus controls
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	No
Impact	The dataset will become publicly available once we have finished analysing the data and the publication is complete


Title	NPLinker tool
Description	The NPLinker tool is the main output of this research. It is a platform in which users can mine paired metabolomic and genomic data for links between metabolites and the gene clusters that produce them. The code is currently available on GitHub and a paper is in preparation.
Type Of Material	Computer model/algorithm
Year Produced	2019
Provided To Others?	Yes
Impact	None yet.
URL	http://github.com/sdrogers/nplinker


Description	Prof. Juho Rousu
Organisation	Aalto University
Department	Department of Computer Science
Country	Finland
Sector	Academic/University
PI Contribution	Prof. Rousu and I obtained funding from SICSA for him to visit my group in Glasgow for three months in Summer 2019. Prof Rousu is an expert in the analysis of metabolomic data. Collaboration from his visit has two strands: one stemming from the BBSRC project (Combatting...), in which we are working together on new IOKR methods for predicting the products of Biosynthetic Gene Clusters and a second strand stemming from the EPSRC project (Closed-loop...), in which we are building probabilistic models that incorporate retention time into annotation, that could be used in a closed-loop context to prioritise MS acquisition.
Collaborator Contribution	Prof Rousu has provided expertise in kernel methods for metabolite ID, and retention time prediction. His group also funded a visit by one of his PGR students (Eric Bach) to my group for several weeks in Summer 2019 (a direct result of Prof. Rousu's visit)
Impact	1 Draft publication awaiting submission
Start Year	2019


Description	ScotChem Natural Products in the Bioeconomy
Organisation	Robert Gordon University
Country	United Kingdom
Sector	Academic/University
PI Contribution	Dr Duncan was invited to co-organize the inaugural ScotChem Natural Products in the Bioeconomy workshop as a result of ongoing work from this project.
Collaborator Contribution	The workshop was over-subscribed and held at the University of Aberdeen. It provided an opportunity to engage with industry and academic partners and to foster initial collaborations. Research feedback and the exchange of ideas were valuable for the development of NPLinker.
Impact	A subsequent grant (funded by the Scottish Universities Life Sciences Alliance) was secured between Robert Gordon University (RGU, PI) and K. Duncan (co-I) - multidisciplinary (microbiology, environmental science, chemistry, molecular biology)
Start Year	2019


Description	ScotPEN Wellcome Trust Public Engagement Grant
Organisation	University of St Andrews
Country	United Kingdom
Sector	Academic/University
PI Contribution	ScotPEN Wellcome Trust Public Engagement Grant "Antibiotics under our feet".
Collaborator Contribution	PI: Clarissa Melo Czekster (University of St Andrews), Co-Is: K. Duncan (University of Strathclyde), Karen Doherty (Fife Council - Primary Science Development Officer), Eulyn Pagaling (James Hutton Institute). £68,000 (no funds to Strathclyde).
Impact	"Antibiotics under our feet" aims to raise science capital in the P5-7 age group and their teachers through co-ownership of a citizen science project seeking new chemical compounds from soil microbes to treat drug-resistant infections. - multidisciplinary
Start Year	2019


Description	Glasgow Science Centre - Curiosity Live
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Schools
Results and Impact	Glasgow Science Centre runs a biannual regional event called "Curiosity Live", which is attended by many high schools over the course of several days. The Duncan group ran an activity stand, "medicines from microorganisms", at the events on November 7th 2019 and March 12th 2020, reaching over 1000 students at each event (over 2000 combined). The stands featured several interactive activities for the students to engage with directly, including "isolating their own bacteria from soil/sediment", a "Where do medicines come from" game and "match the drug to the organism". There were some great questions from students of all ages about careers, drug discovery, microbiology and chemistry. The stands were run by undergraduate and postgraduate members of the Duncan group, 10 individuals in total over the two events. Both events were additionally profiled online by the Glasgow Science Centre @gsk1 (twitter and instagram), reaching the wider public, and also on our own social media (twitter @kate_duncan, @medicinesfromthesea instagram). Due to the success of our first event, we were invited back in March 2020, and we look forward to contributing to further events.
Year(s) Of Engagement Activity	2019,2020
URL	https://www.glasgowsciencecentre.org/discover/family-events/curiosity-live


Description	University of Strathclyde - Open Day
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Schools
Results and Impact	On the 5th October, we ran a stand called "microbiology and molecular biology" at the University of Strathclyde Open Day. This engaged approx. 400 senior high school students (prospective university students) and their parents in hands-on activities such as "actinomycetes - a source of medicines" and "chemical extraction". This resulted in multiple questions about postgraduate and undergraduate study of microbiology and career choices. The activities were run by current postgraduate students.
Year(s) Of Engagement Activity	2019
