Unlocking the chemical potential of plants: Predicting function from DNA sequence for complex enzyme superfamilies

Lead Research Organisation: University College London
Department Name: Structural Molecular Biology


Technical Summary

Our strategy is to integrate powerful data-driven computational approaches with experimental investigation of enzyme function to understand the functions and kingdom-specific expansion of an exemplar complex enzyme superfamily - the triterpene synthases (TTSs). The TTS enzyme superfamily is an ideal test case for our purposes, since these enzymes are able to generate an enormous diversity of cyclized triterpene scaffolds from a single common precursor molecule. Through iterative cycles of computational and experimental investigations we aim to develop sophisticated predictive analytic approaches that will enable us to relate DNA sequence to enzyme function with ever-increasing power and resolution, and in so doing to generate and test hypotheses about enzyme function, mechanisms and evolution. Our aims are to: (1) experimentally determine the chemical diversity encoded by diverse members of the TTS superfamily selected based on our initial CATH-FunFam classification; (2) expand the sequence data for the CATH TTS superfamily and integrate sequence- and structure-based computational approaches to refine our strategies for identifying TTS features implicated in determination of product specificity and for functional classification, and test TTS function predictions; (3) exploit a novel machine learning approach to predict known and novel TTSs; (4) understand TTS function and diversification by determining the product specificities of natural and engineered TTS variants, guided by computational predictions from (1)-(3).
 
Description We have identified residues that are differentially conserved between proteins with different triterpene synthase products. This has helped identify a set of mutations to convert a cycloartenol-producing enzyme into a cucurbitadienol-producing enzyme, which will be experimentally tested by the Osbourne group.
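
As a minimal sketch of what "differentially conserved" can mean in practice, the Python example below scans a multiple sequence alignment column by column and reports positions where each product group is conserved internally but the groups disagree on the conserved residue. The toy alignment, the function name and the 90% conservation threshold are illustrative assumptions, not the project's actual data or method.

```python
from collections import Counter


def differentially_conserved(alignment, labels, min_fraction=0.9):
    """Return (column, {label: residue}) for alignment columns where every
    product group is internally conserved but the groups' consensus residues
    differ. `min_fraction` is an assumed conservation threshold."""
    groups = {}
    for seq, label in zip(alignment, labels):
        groups.setdefault(label, []).append(seq)

    hits = []
    for col in range(len(alignment[0])):
        consensus = {}
        for label, seqs in groups.items():
            residue, count = Counter(s[col] for s in seqs).most_common(1)[0]
            # Skip columns that are gapped or not conserved within a group.
            if residue == "-" or count / len(seqs) < min_fraction:
                break
            consensus[label] = residue
        else:
            # Every group is conserved; keep the column only if groups differ.
            if len(set(consensus.values())) > 1:
                hits.append((col, consensus))
    return hits


# Hypothetical toy alignment (not real TTS sequences).
alignment = ["MKTAYLW", "MKTAYLW", "MKSAHLW", "MKSAHLW"]
labels = ["cycloartenol", "cycloartenol", "cucurbitadienol", "cucurbitadienol"]

hits = differentially_conserved(alignment, labels)
for col, consensus in hits:
    print(col, consensus)
```

Positions reported this way are candidates for mutation when trying to switch one product type to another; in a real analysis the columns would come from a curated alignment of sequences with experimentally known products.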

We have also examined pocket characteristics such as localized electric effect, flexibility, hydrophobicity, side chain interaction parameters and number of hydrogen bonds to identify how the properties of the pockets vary between product types. In addition, we used APBS to calculate the electrostatic potential of the binding pockets and to identify how it differs by product type.

We have completed protein-ligand molecular dynamics simulations of a cycloartenol-producing enzyme, a mutated enzyme (carrying the mutations predicted earlier to convert it into a cucurbitadienol-producing enzyme) and a cucurbitadienol-producing enzyme. Each protein was simulated with the substrate 2,3-oxidosqualene, with the product (cycloartenol or cucurbitadienol) and with intermediates of the reaction pathway from substrate to product. We are currently analyzing these molecular dynamics trajectories to characterize the ligand conformations, the protein-ligand interactions, and the root mean square fluctuations of the binding site residues and ligands. These analyses will help identify differences in interaction profiles between product types and show how the protein might induce different substrate conformations that lead to different products.
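
For reference, the root mean square fluctuation (RMSF) of an atom is the square root of its mean squared deviation from its own time-averaged position. The Python sketch below computes it for a small hand-made trajectory. It assumes the frames have already been superimposed on a reference structure, and the coordinates are invented for illustration; a real analysis would read them from the simulation output with a trajectory analysis package.

```python
import math


def rmsf(frames):
    """Per-atom RMSF for a trajectory given as a list of frames, where each
    frame is a list of (x, y, z) tuples in the same atom order.
    Assumes the frames are already aligned to a common reference."""
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for i in range(n_atoms):
        # Time-averaged position of atom i.
        mean_pos = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        # Mean squared deviation from that average position.
        msd = sum(
            sum((f[i][k] - mean_pos[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        result.append(math.sqrt(msd))
    return result


# Hypothetical two-frame, two-atom trajectory: atom 0 is fixed, atom 1 moves.
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
values = rmsf(frames)
print(values)
```

Comparing such per-residue values between the wild-type, mutated and cucurbitadienol-producing enzymes is one way to see which binding site residues become more or less mobile.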
Exploitation Route These analyses will enable other plant biologists to predict the product type of a novel plant TTS enzyme.
Sectors Agriculture, Food and Drink; Manufacturing, including Industrial Biotechnology

 
Title PocketFeatures: Workflow for characterizing pocket features based on TTS product type
Description We used amino acid indices from AAindex (https://www.genome.jp/aaindex/), such as localized electric effect, flexibility, hydrophobicity, side chain interaction parameter and number of hydrogen bond donors, to characterize the ligand-binding pocket of the triterpene synthase (TTS) proteins. We also characterized the binding pocket by its electrostatic potential, calculated by solving the Poisson-Boltzmann equation using the APBS server (https://server.poissonboltzmann.org). The distributions of these properties varied with product type, so we developed a workflow to identify how these physico-chemical characteristics of the pockets differ between product types.
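
The core step of such a workflow can be sketched as follows: map each pocket residue to a value on an amino acid scale, average over the pocket, then compare the averages between product groups. The Python example below uses the Kyte-Doolittle hydropathy scale as a stand-in for the indices named above; the pocket residue strings, product labels and function names are hypothetical and do not come from the project.

```python
from statistics import mean

# Kyte-Doolittle hydropathy values, used here as one example scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def pocket_mean(pocket_residues, scale):
    """Average a per-residue scale over the residues lining one pocket."""
    return mean(scale[aa] for aa in pocket_residues)


def summarize_by_product(pockets, scale):
    """pockets: list of (product_type, pocket_residue_string).
    Returns the mean pocket score for each product type."""
    by_product = {}
    for product, residues in pockets:
        by_product.setdefault(product, []).append(pocket_mean(residues, scale))
    return {product: mean(scores) for product, scores in by_product.items()}


# Hypothetical pockets; real ones would be extracted from structures or models.
pockets = [
    ("product_A", "ILVF"),
    ("product_A", "ILVA"),
    ("product_B", "DKSY"),
]
summary = summarize_by_product(pockets, KYTE_DOOLITTLE)
print(summary)
```

The same two functions work with any other per-residue scale supplied as a dictionary, so several indices can be profiled by looping over scales.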
Type Of Material Improvements to research infrastructure 
Year Produced 2024 
Provided To Others? No  
Impact The pocket characterization workflow can be used to predict the product of TTS proteins whose product type is unknown, based on the similarity of their pocket properties to those of TTS groups with known products.
 
Title Plant TTS models and embeddings 
Description Plant triterpene synthase (TTS) sequences were identified from annotated genomes/plant repositories and from unannotated genomes, based on HMM scans of the known TTS sequences. The sequences were clustered at 99% sequence identity to remove isoforms, and only those 650-850 amino acids long were retained. This gave 21,323 sequences, which were modelled using ColabFold (based on AlphaFold2). We also calculated ESM sequence embeddings for the 175 plant TTS sequences with known products.
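
The length cutoff is the simplest of these steps to illustrate. The Python sketch below parses FASTA text and keeps sequences of 650-850 residues, treating both bounds as inclusive, which is an assumption. The record names and sequences are placeholders, and the HMM scan, the 99% identity clustering and the structure modelling are not reproduced here because they rely on external tools.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def filter_by_length(records, min_len=650, max_len=850):
    """Keep records whose sequence length lies in [min_len, max_len]."""
    return [(h, s) for h, s in records if min_len <= len(s) <= max_len]


# Placeholder records: one too short, one in range (split over two lines),
# one too long.
fasta_text = (
    ">seq_short\n" + "A" * 300 + "\n"
    ">seq_ok\n" + "M" * 400 + "\n" + "K" * 360 + "\n"
    ">seq_long\n" + "G" * 900 + "\n"
)
kept = filter_by_length(parse_fasta(fasta_text))
print([(header, len(seq)) for header, seq in kept])
```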
Type Of Material Data analysis technique 
Year Produced 2024 
Provided To Others? No  
Impact The models of plant proteins can be used for various studies of TTS, such as identifying structural diversity either globally or in the ligand binding site. They can also be used to identify how the physical properties of the ligand binding site vary across plant TTSs, which in turn may help predict product type from differences in the structure and physico-chemical properties of the binding pocket. The ESM embeddings can be used to identify remote homologues of the TTS sequences in metagenomes and to predict the product type of TTSs with unknown function.
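
One simple way embeddings could be used for product prediction is nearest-neighbour annotation transfer: assign a query sequence the product of the labelled sequence whose embedding is most similar. The Python sketch below does this with cosine similarity. The three-dimensional vectors are toy stand-ins (real ESM embeddings have far more dimensions), and nearest-neighbour transfer is an illustrative choice rather than the project's stated method.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def predict_product(query, labelled):
    """labelled: list of (product_type, embedding).
    Returns (product_type, similarity) of the most similar labelled entry."""
    best = max(labelled, key=lambda item: cosine(query, item[1]))
    return best[0], cosine(query, best[1])


# Hypothetical toy embeddings for two sequences with known products.
labelled = [
    ("cycloartenol", [0.9, 0.1, 0.0]),
    ("cucurbitadienol", [0.1, 0.9, 0.2]),
]
query = [0.8, 0.2, 0.1]

product, similarity = predict_product(query, labelled)
print(product, round(similarity, 3))
```

Returning the similarity alongside the label makes it possible to withhold a prediction when the best match is weak.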
 
Description ProtFunAI 
Organisation Technical University of Munich
Country Germany 
Sector Academic/University 
PI Contribution Development of deep learning algorithms for protein function prediction, protein classification and analysis
Collaborator Contribution Training in deep learning protocols and protein language models. Contributions to project design. Novel protein language models to generate protein embeddings for protein function prediction and other protein-based prediction tasks.
Impact Project has just started so no outputs yet
Start Year 2024
 
Description A talk at ISMB/ECCB 2023 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact NA
Year(s) Of Engagement Activity 2023