Unlocking the chemical potential of plants: Predicting function from DNA sequence for complex enzyme superfamilies
Lead Research Organisation:
University College London
Department Name: Structural Molecular Biology
Abstract
Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.
Technical Summary
Our strategy is to integrate powerful data-driven computational approaches with experimental investigation of enzyme function to understand the functions and kingdom-specific expansion of an exemplar complex enzyme superfamily - the triterpene synthases (TTSs). The TTS enzyme superfamily is an ideal test case for our purposes, since these enzymes are able to generate an enormous diversity of cyclized triterpene scaffolds from a single common precursor molecule. Through iterative cycles of computational and experimental investigations we aim to develop sophisticated predictive analytic approaches that will enable us to relate DNA sequence to enzyme function with ever-increasing power and resolution, and in so doing to generate and test hypotheses about enzyme function, mechanisms and evolution. Our aims are to: (1) experimentally determine the chemical diversity encoded by diverse members of the TTS superfamily selected based on our initial CATH-FunFam classification; (2) expand the sequence data for the CATH TTS superfamily and integrate sequence- and structure-based computational approaches to refine our strategies for identifying TTS features implicated in determination of product specificity and for functional classification, and test TTS function predictions; (3) exploit a novel machine learning approach to predict known and novel TTSs; (4) understand TTS function and diversification by determining the product specificities of natural and engineered TTS variants, guided by computational predictions from (1)-(3).
People | Christine Orengo (Principal Investigator) |
Publications
Bordin N (2023) AlphaFold2 reveals commonalities and novelties in protein structure space for 21 model organisms. Communications Biology.
Goldtzvik Y (2023) Protein diversification through post-translational modifications, alternative splicing, and gene duplication. Current Opinion in Structural Biology.
Nallapareddy V (2023) CATHe: detection of remote homologues for CATH superfamilies using embeddings from protein language models. Bioinformatics (Oxford, England).
Description | We have identified residues that are differentially conserved between TTS proteins with different triterpene products. This has helped us identify a set of mutations predicted to convert a cycloartenol-producing enzyme into a cucurbitadienol-producing enzyme, which will be tested experimentally by the Osbourn group. We have also examined pocket characteristics such as localized electric effect, flexibility, hydrophobicity, side-chain interaction parameters and number of hydrogen bonds, to identify how pocket properties vary between product types. In addition, we used APBS to calculate the electrostatic potential of the binding pockets and to identify differences in electrostatic potential according to product type. We have completed protein-ligand molecular dynamics simulations of a cycloartenol-producing enzyme, a mutated enzyme (carrying the mutations predicted to convert it into a cucurbitadienol-producing enzyme) and a cucurbitadienol-producing enzyme. Each protein was simulated with the substrate 2,3-oxidosqualene, with the product (cycloartenol or cucurbitadienol) and with intermediates along the reaction pathway from substrate to product. We are currently analyzing these trajectories to characterize ligand conformations, protein-ligand interactions, and the root-mean-square fluctuations (RMSF) of binding-site residues and ligands. This analysis should reveal differences in interaction profiles between product types and show how the protein might induce different substrate conformations that lead to different products. |
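A minimal sketch of one way to flag differentially conserved positions, assuming a multiple sequence alignment that is already grouped by product type. The toy alignment, the 0.8 conservation threshold and the function names are hypothetical illustrations, not the project's actual method or data: a position is flagged when each group is conserved at or above the threshold but the two groups' consensus residues differ.

```python
from collections import Counter

# Hypothetical toy alignment (equal-length aligned sequences, "-" for gaps),
# grouped by product type. These are NOT real TTS sequences.
ALIGNMENT = {
    "cycloartenol": ["MKTAYHF", "MKTAYHW", "MRTAYHF"],
    "cucurbitadienol": ["MKTLYNF", "MKSLYNF", "MKTLYNW"],
}


def consensus(column):
    """Return the most common residue in a column and its frequency (0-1)."""
    residue, count = Counter(column).most_common(1)[0]
    return residue, count / len(column)


def differentially_conserved(alignment, group_a, group_b, min_freq=0.8):
    """Return (position, residue_a, residue_b) for 0-based alignment positions
    where both groups are conserved at >= min_freq but their consensus differs."""
    seqs_a, seqs_b = alignment[group_a], alignment[group_b]
    hits = []
    for pos in range(len(seqs_a[0])):
        res_a, freq_a = consensus([s[pos] for s in seqs_a])
        res_b, freq_b = consensus([s[pos] for s in seqs_b])
        if freq_a >= min_freq and freq_b >= min_freq and res_a != res_b:
            hits.append((pos, res_a, res_b))
    return hits


print(differentially_conserved(ALIGNMENT, "cycloartenol", "cucurbitadienol"))
# → [(3, 'A', 'L'), (5, 'H', 'N')]
```

Flagged positions would still need mapping onto a reference structure. Substituting the cucurbitadienol consensus residues into a cycloartenol-producing enzyme is one way a candidate mutation set could be proposed from output like this.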
Exploitation Route | The approach will enable other plant biologists to predict the product type of a novel plant TTS enzyme. |
Sectors | Agriculture, Food and Drink; Manufacturing, including Industrial Biotechnology |
Title | PocketFeatures: Workflow for characterizing pocket features based on TTS product type |
Description | We used various amino acid indices from AAindex (https://www.genome.jp/aaindex/), such as localized electric effect, flexibility, hydrophobicity, side-chain interaction parameter and number of hydrogen-bond donors, to characterize the ligand-binding pocket of triterpene synthase (TTS) proteins. We also characterized the binding pocket by its electrostatic potential, calculated by solving the Poisson-Boltzmann equation with the APBS server (https://server.poissonboltzmann.org). We noticed that the distributions of these properties varied with product type, and we then developed a workflow to identify how these physico-chemical characteristics of the pockets vary according to product type. |
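The per-residue scoring step can be sketched as follows, assuming each pocket is given as a string of one-letter residue codes. The Kyte-Doolittle hydropathy scale (AAindex entry KYTJ820101) stands in for whichever hydrophobicity index was actually used, and the toy pockets and function names are hypothetical; other AAindex scales, such as flexibility, could be swapped in the same way.

```python
from statistics import mean

# Kyte-Doolittle hydropathy scale (AAindex entry KYTJ820101), used here only
# as an example hydrophobicity index.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def pocket_score(pocket_residues, scale):
    """Average a per-residue AAindex-style scale over the pocket residues."""
    return mean(scale[res] for res in pocket_residues)


def summarize_by_product(pockets, scale):
    """Group pocket scores by product and return (mean, min, max) per product.

    pockets: list of (product_type, pocket_residue_string) pairs.
    """
    scores = {}
    for product, residues in pockets:
        scores.setdefault(product, []).append(pocket_score(residues, scale))
    return {p: (mean(v), min(v), max(v)) for p, v in scores.items()}


# Hypothetical toy pockets (not real TTS binding sites).
TOY_POCKETS = [
    ("cycloartenol", "FWYLIVH"),
    ("cycloartenol", "FWYLIVD"),
    ("cucurbitadienol", "FWYNDVH"),
]
for product, stats in summarize_by_product(TOY_POCKETS, KYTE_DOOLITTLE).items():
    print(product, "mean=%.2f min=%.2f max=%.2f" % stats)
```

Comparing full distributions per product type, rather than only the mean, min and max shown here, would be a natural extension.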
Type Of Material | Improvements to research infrastructure |
Year Produced | 2024 |
Provided To Others? | No |
Impact | The pocket-characterization workflow can be used to predict the products of TTS proteins whose product type is unknown, based on their similarity to groups of TTS proteins with known products. |
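One simple way to turn pocket descriptors into a product prediction is a nearest-centroid classifier, sketched below. This is an assumed choice of similarity method, not necessarily the one used in the workflow. The two features (mean hydrophobicity and mean electrostatic potential) and all values are hypothetical, and in practice features on different scales would be standardized first.

```python
import math


def centroids(labelled):
    """Mean feature vector per product; labelled maps product -> list of vectors."""
    return {
        product: [sum(col) / len(vectors) for col in zip(*vectors)]
        for product, vectors in labelled.items()
    }


def predict_product(features, cents):
    """Return the product whose centroid is nearest (Euclidean distance)."""
    return min(cents, key=lambda product: math.dist(features, cents[product]))


# Hypothetical pocket features: [mean hydrophobicity, mean electrostatic potential].
KNOWN = {
    "cycloartenol": [[2.0, -1.0], [2.2, -0.8]],
    "beta-amyrin": [[0.5, 1.0], [0.7, 1.2]],
}
CENTS = centroids(KNOWN)
print(predict_product([1.9, -0.5], CENTS))  # → cycloartenol
```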
Title | Plant TTS models and embeddings |
Description | Plant triterpene synthase (TTS) sequences were identified from annotated genomes and plant repositories, and from unannotated genomes using HMM scans based on known TTS sequences. These were clustered at 99% sequence identity to remove isoforms, and sequences 650-850 amino acids long were retained. This yielded 21,323 sequences, which were modelled using ColabFold (based on AlphaFold2). We also calculated ESM sequence embeddings for the 175 plant TTS sequences with known products. |
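The redundancy-reduction and length-filtering steps can be sketched as a greedy, CD-HIT-style pass, assuming sequences held in a dict keyed by ID. The source does not say which clustering tool was used; dedicated tools such as CD-HIT or MMseqs2 are the usual choice at this scale. Here identity is crudely approximated as longest-common-subsequence length over the shorter sequence length, an optimistic stand-in for alignment-based identity, and the function names are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence (simple O(len(a)*len(b)) DP)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ca == cb else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def identity(a, b):
    """Crude identity proxy: LCS length divided by the shorter sequence length."""
    return lcs_length(a, b) / min(len(a), len(b))


def cluster_then_filter(seqs, threshold=0.99, min_len=650, max_len=850):
    """Greedy clustering (longest sequence first): a sequence becomes a new
    representative only if it is below `threshold` identity to every existing
    one. Representatives are then kept only if min_len <= length <= max_len."""
    reps = []
    for sid in sorted(seqs, key=lambda k: len(seqs[k]), reverse=True):
        if all(identity(seqs[sid], seqs[r]) < threshold for r in reps):
            reps.append(sid)
    return [r for r in reps if min_len <= len(seqs[r]) <= max_len]


# Hypothetical toy sequences; length bounds relaxed to suit them.
TOY = {"x1": "ACDEFGHIK", "x2": "ACDEFGHIK", "x3": "WWWWWWWW", "x4": "AC"}
print(cluster_then_filter(TOY, min_len=5, max_len=10))  # → ['x1', 'x3']
```

Because identity is measured over the shorter sequence, the fragment "x4" is absorbed into the cluster of "x1" rather than kept as its own representative.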
Type Of Material | Data analysis technique |
Year Produced | 2024 |
Provided To Others? | No |
Impact | The models of plant TTS proteins can be used for various studies, such as identifying their structural diversity, either globally or in the ligand-binding site. This can in turn be used to identify how the physical properties of the ligand-binding site vary across plant TTSs, with implications for predicting product type from differences in the structure and physico-chemical properties of the binding pocket. The ESM embeddings can be used to identify remote homologues of TTS sequences in metagenomes, and to predict the product type of TTSs with unknown function. |
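Predicting product type from embeddings can be sketched as a cosine-similarity k-nearest-neighbour vote. This assumes each sequence has already been reduced to one fixed-length vector, for example by mean-pooling per-residue ESM embeddings. The 3-dimensional toy vectors, labels and k=3 are hypothetical stand-ins for real ESM embeddings, and this classifier is an illustrative choice rather than the project's method. The same similarity ranking could also serve as a first-pass screen for remote homologues.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def knn_product(query, reference, k=3):
    """Majority vote over the k reference embeddings most similar to query.

    reference: list of (product_type, embedding) pairs. Counter.most_common
    keeps first-encountered order for ties, so a tie goes to the product that
    appears first in the similarity ranking.
    """
    ranked = sorted(reference, key=lambda item: cosine(query, item[1]), reverse=True)
    votes = Counter(product for product, _ in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy "embeddings" standing in for pooled ESM vectors.
REFERENCE = [
    ("cycloartenol", [1.0, 0.0, 0.0]),
    ("cycloartenol", [0.9, 0.1, 0.0]),
    ("beta-amyrin", [0.0, 1.0, 0.0]),
    ("lupeol", [0.0, 0.0, 1.0]),
]
print(knn_product([0.8, 0.2, 0.1], REFERENCE))  # → cycloartenol
```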
Description | ProtFunAI |
Organisation | Technical University of Munich |
Country | Germany |
Sector | Academic/University |
PI Contribution | Development of deep learning algorithms for protein function prediction, protein classification and analysis |
Collaborator Contribution | Training in deep learning protocols and protein language models. Contributions to project design. Novel protein language models to generate protein embeddings for protein function prediction and other protein based prediction tasks. |
Impact | The project has just started, so there are no outputs yet. |
Start Year | 2024 |
Description | A talk at ISMB/ECCB 2023 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | NA |
Year(s) Of Engagement Activity | 2023 |