SENSE - Screening of ENvironmental SEquences to discover novel protein functions, using informatics target selection and high-throughput validation

Lead Research Organisation: University College London
Department Name: Structural Molecular Biology

Technical Summary

This project will enable very large-scale discovery of novel enzymes and bacteriocins from assembled metagenomics sequence data by developing new computational and experimental platforms. Significant technical improvements will emerge from cycles of computational/experimental work, as results from experimental validation will inform algorithm refinements. Our predictions will be concomitantly captured in widely used databases. The scale of experimental validation will be extremely large compared to conventional approaches, enabling increased sampling of sequence space to identify functional novelty.
To sample metagenomic sequences, we will exploit existing and new assemblies to extract sequence data from a variety of sampled biomes. We will make major adaptations to existing bioinformatic platforms to functionally sub-classify metagenomic sequences and apply them to two cases, alpha/beta-hydrolases and bacteriocins. We will develop algorithms characterising key functional determinant residues to score the likelihood of new families having substantially different functionality. Transferring this to bacteriocins will be more challenging as these are often small peptides requiring accessory genes, which can be hard to detect and/or are functionally uncharacterised. Providing sensitive and accurate bacteriocin gene cluster identification and classification will require new methods to identify all components of the gene cluster through expanded homology and contextual models, prior to sub-family classification.
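The idea of scoring how far a new family departs from a known one at its key functional determinant residues can be illustrated with a minimal sketch. This is not the project's algorithm: the function names, the toy alignments and the choice of determinant columns are all hypothetical, and the score is simply the fraction of determinant columns whose consensus residue differs between two aligned families.

```python
from collections import Counter


def column_consensus(seqs, pos):
    """Return the most common non-gap residue at one alignment column."""
    counts = Counter(s[pos] for s in seqs if s[pos] != "-")
    return counts.most_common(1)[0][0] if counts else "-"


def determinant_divergence(reference_family, candidate_family, determinant_positions):
    """Fraction of determinant columns whose consensus residue differs.

    Both families must be aligned to the same columns. A value near 1.0
    suggests the candidate family may differ functionally; a value near
    0.0 suggests it probably shares the reference function.
    """
    differing = sum(
        column_consensus(reference_family, p) != column_consensus(candidate_family, p)
        for p in determinant_positions
    )
    return differing / len(determinant_positions)


# Toy aligned sequences (hypothetical, for illustration only).
reference = ["GHSMGG", "GHSMGA", "GHSLGG"]
candidate = ["GYSQGG", "GYSQGA", "GWSQGG"]
positions = [1, 2, 3]  # hypothetical determinant columns (0-based)

score = determinant_divergence(reference, candidate, positions)
print(f"divergence score: {score:.2f}")
```

A realistic scorer would weight columns by conservation and residue similarity rather than counting exact mismatches; the sketch only shows the shape of the computation.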
Key to our proposal will be the ability to perform very large-scale experimental validation of the bioinformatics predictions. This will be facilitated by novel gene synthesis platforms that can synthesise thousands of genes for screening, so as to test both the target sequences provided by the bioinformatics predictions and their sequence neighbours. Furthermore, the use of high-throughput microfluidic droplet technology permits testing in a very cost-effective and timely way.

Planned Impact

This project will enable large-scale detection of functional biomolecules (proteins), the discovery of which impacts diverse spheres, including biotechnology and biocatalysis, development of new materials, food security and medical applications. It will impact four BBSRC strategic areas related to metagenomes, synthetic biology, antibiotic resistance and data-driven biology.

Firstly, we will analyse the available sequence data more efficiently using a combination of novel bioinformatic and experimental platforms allowing unprecedented throughput. Secondly, newly identified hydrolases and bacteriocins may be valorised as novel functional proteins for the benefit of academic and industry communities. Thirdly, we hope to have educational impact by training researchers in this project in a consortium that will traverse traditional boundaries between in silico biology, microengineering, high-throughput screening and classical enzymology.

The first objective will develop powerful new methods for exploring the vast sequence data being captured by metagenome initiatives. The Finn team manage EMBL-EBI's MGnify resource and have developed robust platforms for handling data on this scale and providing high quality sequence outputs. Leveraging RF and CO's extensive experience in family classification, we will develop new techniques to detect relatives with a high likelihood of functional novelty. Putative targets will be experimentally validated by novel experimental platforms that allow unprecedented throughput and additionally probe neighbours in sequence space to detect more stable mutants and further expand knowledge of functional determinants. Importantly, there will be cycles of bioinformatic analysis and prediction followed by experimental validation.

Although we will develop the protocols using two important classes of biomolecules, i.e. enzymes and bacteriocins, the methods will be generic and publicly available to apply to other families expanded by metagenomic data. Our tools will be made widely available to the large community of groups analysing this data, increasing impact. RF and CO coordinate different ELIXIR communities and will have opportunities to publicise the work and promote adoption of these techniques.

The novel hydrolases and bacteriocins have commercial value and relevance to human health. Bacterial alpha/beta-hydrolases are widely used in many industries, including dairy, pharmaceutical, and laundry, as they are easy to cultivate, nontoxic, and eco-friendly. Bacteriocins have value in both food security and human health, e.g. producing strains can be applied in food to extend preservation times. Bacteriocins can also be added directly to foods as a preservative, incorporated into bioactive packaging, added to animal feed as an anti-pathogen additive to protect livestock against pathogen damage, or used to help balance the bacteria in the digestive tract of livestock and humans to reduce gastrointestinal diseases. They have the potential to replace existing antibiotics (especially those to which resistance has emerged) and have been indicated as novel anticancer drugs.

The interdisciplinary aspect of our project will provide additional training opportunities and distinguish the staff development in this project from more conventional training, expanding interdisciplinary skills in the UK. Thus, the postdocs in this project will receive training that positions them to obtain jobs in small or large biotechnology enterprises. This aspect of the project will be accompanied by interactions with the institutions' technology transfer offices (CE, EMBLEM, and UCL Business) and industrial stakeholders, so that information is initially protected and then shared and commercialised.

Description We have developed a new algorithm, CATH-FunFam-Fran, which allows us to analyse large datasets of protein sequences from metagenomes. This allowed us to search for novel enzymes able to degrade plastics, in particular a family of enzymes called PETases. We have mined the metagenome data in MGnify and identified more than 20,000 putative novel PETases.
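As a rough illustration of what a first-pass screen of metagenomic protein sequences can look like, the sketch below keeps only sequences containing the G-x-S-x-G "nucleophile elbow" pentapeptide that is typical of alpha/beta-hydrolases, the fold to which PETases belong. This is not CATH-FunFam-Fran, which is not described here in enough detail to reproduce; the function names and the toy FASTA records are hypothetical, and a real pipeline would rely on family-specific profile models rather than a single regular expression.

```python
import re

# G-x-S-x-G pentapeptide around the catalytic serine of alpha/beta-hydrolases.
NUCLEOPHILE_ELBOW = re.compile(r"G.S.G")


def parse_fasta(text):
    """Parse FASTA text into a dict mapping record id to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # id is the first token of the header
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def candidates_with_motif(records):
    """Return {id: 0-based motif start} for sequences containing the motif."""
    hits = {}
    for name, seq in records.items():
        match = NUCLEOPHILE_ELBOW.search(seq)
        if match:
            hits[name] = match.start()
    return hits


# Toy records (hypothetical sequences, for illustration only).
toy_fasta = """>seq_a hypothetical
MKLAVGWSMGGTTA
>seq_b hypothetical
MKLAVAAPPLTTA
"""

hits = candidates_with_motif(parse_fasta(toy_fasta))
print(hits)
```

A motif match of this kind is only a coarse filter: it will admit many hydrolases that are not PETases and says nothing about activity on plastics.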

We have also built computational workflows (SiteTuner) that allow us to examine sequence and structure features of the active sites to identify residues most likely to enhance the activity of the enzyme. We have selected a number of enzymes that our experimental collaborators have tested. Preliminary results have shown activity in cells but some problems with solubility.

We have therefore developed an AI-based approach for detecting which residues should be mutated to improve solubility. Solubility matters because the proteins must be soluble to be considered for industrial application.

We have also revised the selection of putative PETases to include enzymes with surface site properties that should improve solubility.
Exploitation Route The data from our analyses would allow researchers in the biotech industries to design novel enzymes that are more effective at degrading plastics.
Sectors Environment, Manufacturing, including Industrial Biotechnology

 
Description The poster titled "Identifying novel plastic degrading enzymes using computational methods" was presented at UCL ISCB 2021. The work won the best poster prize.
First Year Of Impact 2023
 
Description Poster: Identifying novel plastic degrading enzymes using computational methods
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact This work was presented at the UCL ISCB 2021 conference.
Year(s) Of Engagement Activity 2021