Unlocking the chemical potential of plants: Predicting function from DNA sequence for complex enzyme superfamilies

Lead Research Organisation: UNIVERSITY COLLEGE LONDON

Department Name: Structural Molecular Biology

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Technical Summary

Our strategy is to integrate powerful data-driven computational approaches with experimental investigation of enzyme function to understand the functions and kingdom-specific expansion of an exemplar complex enzyme superfamily - the triterpene synthases (TTSs). The TTS enzyme superfamily is an ideal test case for our purposes, since these enzymes are able to generate an enormous diversity of cyclized triterpene scaffolds from a single common precursor molecule. Through iterative cycles of computational and experimental investigations we aim to develop sophisticated predictive analytic approaches that will enable us to relate DNA sequence to enzyme function with ever-increasing power and resolution, and in so doing to generate and test hypotheses about enzyme function, mechanisms and evolution. Our aims are to: (1) experimentally determine the chemical diversity encoded by diverse members of the TTS superfamily selected based on our initial CATH-FunFam classification; (2) expand the sequence data for the CATH TTS superfamily and integrate sequence- and structure-based computational approaches to refine our strategies for identifying TTS features implicated in determination of product specificity and for functional classification, and test TTS function predictions; (3) exploit a novel machine learning approach to predict known and novel TTSs; (4) understand TTS function and diversification by determining the product specificities of natural and engineered TTS variants, guided by computational predictions from (1)-(3).

Funded Value:

£307,853

Funded Period:

Jan 22 - Jun 25

Funder:

BBSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

BB/V014722/1

Principal Investigator:

Christine Orengo

Research Subject:

Bioengineering (40%)

Biomolecules & biochemistry (40%)

Plant & crop science (20%)

Research Topic:

Biochemistry & physiology (20%)

Catalysis & enzymology (20%)

Metabolic engineering (20%)

Novel industrial products (20%)

Plant responses to environment (20%)

Organisations

People	ORCID iD
Christine Orengo (Principal Investigator)

Publications

Bordin N (2023) AlphaFold2 reveals commonalities and novelties in protein structure space for 21 model organisms. in Communications biology

Goldtzvik Y (2023) Protein diversification through post-translational modifications, alternative splicing, and gene duplication in Current Opinion in Structural Biology

Nallapareddy V (2023) CATHe: detection of remote homologues for CATH superfamilies using embeddings from protein language models. in Bioinformatics (Oxford, England)

Yin J (2025) Understanding the structural and functional diversity of ATP-PPases using protein domains and functional families in the CATH database in Structure

Key Findings
Research Databases and Models
Research Tools and Methods
Collaboration
Engagement Activities


Description	We identified residues that are differentially conserved between triterpene synthase (TTS) proteins with different products. This helped identify a set of mutations to convert a cycloartenol-producing enzyme into a cucurbitadienol-producing enzyme, which will be tested experimentally by the Osbourn group. We characterized the ligand-binding pocket of the TTS proteins using amino acid features from AAindex (https://www.genome.jp/aaindex/), such as localized electric effect, flexibility, hydrophobicity, side chain interaction parameter and number of hydrogen bond donors. We also characterized the binding pocket by its electrostatic potential, calculated by solving the Poisson-Boltzmann equation using the APBS server (https://server.poissonboltzmann.org). The distribution of these properties varied with product type, and we have recently developed a workflow to identify how these physico-chemical characteristics of the pockets vary with product type. We have completed protein-ligand molecular dynamics simulations of a cycloartenol-producing enzyme, a mutated enzyme (the variant predicted earlier to convert it into a cucurbitadienol-producing enzyme) and a cucurbitadienol-producing enzyme. These proteins were simulated with the substrate 2,3-oxidosqualene, the product (cycloartenol or cucurbitadienol) and intermediates of the reaction pathway from substrate to product. 
We are currently analyzing these molecular dynamics trajectories to identify the ligand conformations, the protein-ligand interactions, and the root mean square fluctuations of the binding-site residues and ligands. These analyses will help identify differences in interaction profiles between product types and show how the protein might induce different substrate conformations that lead to different products.
Exploitation Route	These results will enable other plant biologists to predict the product type for a novel plant TTS enzyme. The models of plant proteins can be used for various studies of TTS, such as identifying their structural diversity either globally or in the ligand-binding site. This can be further used to identify how the physical properties of the ligand-binding site vary across plant TTSs, which in turn could support prediction of the product type from differences in the structure and physico-chemical properties of the binding pocket. The ESM embeddings can be used to identify remote homologues of the TTS sequences in metagenomes, and to predict the product type for TTSs of unknown function.
Sectors	Agriculture, Food and Drink; Chemicals; Manufacturing, including Industrial Biotechnology
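
The root mean square fluctuation (RMSF) analysis mentioned in the description above can be illustrated with a minimal sketch. It assumes the trajectory frames have already been aligned to a common reference and are available as plain lists of (x, y, z) tuples; the function name `rmsf` and the toy coordinates are hypothetical and are not taken from the project's own analysis scripts.

```python
import math


def rmsf(frames):
    """Per-atom RMSF for a list of frames.

    Each frame is a list of (x, y, z) tuples, one per atom, and all
    frames are assumed to be pre-aligned to the same reference.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for i in range(n_atoms):
        # Mean position of atom i over all frames
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        # Mean squared deviation of atom i from its mean position
        msd = sum(
            sum((f[i][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        result.append(math.sqrt(msd))
    return result


# Toy trajectory (hypothetical): atom 0 is fixed, atom 1 moves along x
toy_frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
toy_rmsf = rmsf(toy_frames)
```

In practice the per-atom values would be grouped by binding-site residue or by ligand before comparing enzymes with different product types.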


Title	PocketFeatures: Workflow for characterizing pocket features based on TTS product type
Description	We used various amino acid features from AAindex (https://www.genome.jp/aaindex/), such as localized electric effect, flexibility, hydrophobicity, side chain interaction parameter and number of hydrogen bond donors, to characterize the ligand-binding pocket of the triterpene synthase (TTS) proteins. We also characterized the binding pocket by its electrostatic potential, calculated by solving the Poisson-Boltzmann equation using the APBS server (https://server.poissonboltzmann.org). We noticed that the distribution of these properties varied with product type. We then developed a workflow to identify how these physico-chemical characteristics of the pockets vary with product type.
Type Of Material	Improvements to research infrastructure
Year Produced	2024
Provided To Others?	No
Impact	The pocket characterization workflow can be used to predict the product of TTS proteins whose product type is unknown, based on their similarity to groups of TTS proteins with known products.
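
As a rough illustration of the kind of per-pocket property profile described above, the sketch below averages one amino acid scale over the residues lining a pocket and then over all pockets of a product type. It uses the Kyte-Doolittle hydropathy scale as a stand-in for a single AAindex entry; the function names and the pocket residue strings are hypothetical, and this is not the PocketFeatures implementation.

```python
from statistics import mean

# Kyte-Doolittle hydropathy values (one of many scales catalogued in AAindex)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def pocket_mean(residues, scale=KYTE_DOOLITTLE):
    """Mean scale value over the one-letter residues lining a pocket."""
    return mean(scale[r] for r in residues)


def profile_by_product(pockets_by_product, scale=KYTE_DOOLITTLE):
    """Average the per-pocket means within each product type."""
    return {
        product: mean(pocket_mean(p, scale) for p in pockets)
        for product, pockets in pockets_by_product.items()
    }


# Hypothetical pocket residue strings, for illustration only
toy_pockets = {
    "product_A": ["ILVF", "ILVA"],
    "product_B": ["DSTK", "DSTN"],
}
toy_profile = profile_by_product(toy_pockets)
```

The same pattern could be repeated for each property of interest (flexibility, side chain interaction parameter, and so on) by swapping in a different scale dictionary.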


Title	Select-Sites: TTS-Computational-Analysis-Workflow
Description	We designed a computational workflow (Select-Sites) comprising the following steps. We first identify likely function-determining (FD) residues using our in-house FunTuner method. We then calculate characteristics of the functional pocket using the in-house PocketFeatures method. Subsequently, for each of the protosteryl-derived product types (cycloartenol, cucurbitadienol and lanosterol), we compare the physicochemical properties of the amino acids that line the binding pocket and of the predicted FD residues. For these comparisons we further split the binding pocket based on proximity to each ring of the product. The analysis uses AAindex (https://www.genome.jp/aaindex/), which contains numeric values for various physicochemical and biochemical properties of the amino acids. When comparing cycloartenol- and cucurbitadienol-producing enzymes, we found differences in residue properties that could contribute to differences in product specificity. Cycloartenol-producing enzymes had more hydrophobic FD residues than cucurbitadienol-producing enzymes, affecting electrostatics in the pocket. In particular, analysis of the binding pocket showed that the amino acids around rings 2 and 3 were more hydrophobic in cycloartenol-producing enzymes. Furthermore, the flexibility of the amino acids around all the rings was higher in cycloartenol-producing enzymes than in cucurbitadienol-producing enzymes. Differences in these properties might contribute to the stabilization of various intermediates and the quenching of different carbocations, leading to different products. Additionally, triplicate 300 ns molecular dynamics simulations showed that in cucurbitadienol-producing enzymes, water molecules were positioned closer to the carbon atom responsible for cucurbitadienol production than in cycloartenol-producing enzymes. 
This might aid abstraction of hydrogen from that carbon, producing the double bond in cucurbitadienol and hence influencing product specificity. Comparing lanosterol- and cycloartenol-producing enzymes, we noticed that lanosterol-producing enzymes had smaller amino acids, increasing the size of the binding pocket. Molecular dynamics simulations showed that these differences in volume allowed more water molecules into the pocket, close to the carbocation and the residues involved in its quenching, leading to the production of lanosterol. Furthermore, we used APBS (https://www.poissonboltzmann.org) to visualise the electrostatic surface of the binding pockets of these enzymes. We observed that the pockets of lanosterol- and cucurbitadienol-producing enzymes were more electronegative than those of cycloartenol-producing enzymes, potentially supporting a different quenching mechanism for the reaction. We are now extending this analysis to other triterpene products, such as beta-amyrin, lupeol and friedelin, aiming to improve yield and specificity.
Type Of Material	Improvements to research infrastructure
Year Produced	2024
Provided To Others?	No
Impact	The workflow is part of a protocol that detects and characterises differences in amino acid residues and pocket features between enzymes associated with different product specificities. It enabled detection of key specificity-determining residues that could be mutated to switch product type. The protocol has been validated experimentally for a switch from a cycloartenol-producing enzyme to a cucurbitadienol-producing enzyme.
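
The step that splits the binding pocket by proximity to each ring of the product could be sketched as below. This is an assumption about one simple way to do it (assigning each pocket residue to the ring whose centroid is nearest); the function names, the choice of one representative coordinate per residue, and the toy coordinates are all hypothetical and not taken from the Select-Sites workflow itself.

```python
import math


def centroid(points):
    """Geometric centre of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def assign_to_rings(residue_coords, ring_atoms):
    """Group residues by the nearest ring centroid.

    residue_coords: {residue_id: (x, y, z)}, one representative point per
                    residue (for example a side-chain centroid; an assumption).
    ring_atoms:     {ring_name: [(x, y, z), ...]} atoms making up each ring.
    """
    ring_centres = {name: centroid(atoms) for name, atoms in ring_atoms.items()}
    groups = {name: [] for name in ring_atoms}
    for res_id, coord in residue_coords.items():
        nearest = min(
            ring_centres, key=lambda name: math.dist(coord, ring_centres[name])
        )
        groups[nearest].append(res_id)
    return groups


# Toy geometry (hypothetical): two "rings" on the x axis, three residues
toy_rings = {
    "ring_1": [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    "ring_2": [(10.0, 0.0, 0.0), (12.0, 0.0, 0.0)],
}
toy_residues = {"R1": (0.0, 1.0, 0.0), "R2": (12.0, 1.0, 0.0), "R3": (5.0, 0.0, 0.0)}
toy_groups = assign_to_rings(toy_residues, toy_rings)
```

Each resulting group of residues can then be scored separately with amino acid property scales, which is how a ring-specific difference such as higher hydrophobicity around rings 2 and 3 would show up.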


Title	Plant TTS models and embeddings
Description	Plant triterpene synthase (TTS) sequences were identified from annotated genomes/plant repositories and from unannotated genomes, based on HMM scans of the known TTS sequences. These were clustered at 99% sequence identity to remove isoforms, and sequences between 650 and 850 amino acids in length were retained. This led to 21323 sequences, which were modelled using ColabFold (based on AlphaFold2). We also calculated ESM sequence embeddings for the 175 plant TTS sequences with known products.
Type Of Material	Database/Collection of data
Year Produced	2024
Provided To Others?	No
Impact	The models of plant proteins can be used for various studies of TTS, such as identifying their structural diversity either globally or in the ligand-binding site. This can be further used to identify how the physical properties of the ligand-binding site vary across plant TTSs, which in turn could support prediction of the product type from differences in the structure and physico-chemical properties of the binding pocket. The ESM embeddings can be used to identify remote homologues of the TTS sequences in metagenomes, and to predict the product type for TTSs of unknown function. A manuscript is in preparation and the dataset will be supplied together with the manuscript.
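
One simple way embeddings could be used to suggest a product type for an uncharacterised TTS is a nearest-neighbour lookup by cosine similarity against the sequences with known products. The sketch below assumes the embeddings have already been computed and are available as plain numeric vectors; the function names and the three-dimensional toy vectors are hypothetical, and the record does not state which prediction method the project intends to use.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def predict_product(query, labelled):
    """Return (label, similarity) of the most similar labelled embedding.

    labelled: list of (product_label, embedding_vector) pairs.
    """
    best_label, best_vec = max(labelled, key=lambda item: cosine(query, item[1]))
    return best_label, cosine(query, best_vec)


# Toy embeddings (hypothetical; real ESM embeddings have far more dimensions)
toy_labelled = [
    ("cycloartenol", (1.0, 0.0, 0.0)),
    ("cucurbitadienol", (0.0, 1.0, 0.0)),
]
toy_label, toy_similarity = predict_product((0.9, 0.1, 0.0), toy_labelled)
```

A similarity threshold would normally be needed before trusting such a call, since a distant query still receives the label of its nearest neighbour.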


Title	Understanding structural and functional diversity of ATP-PPases using protein domains and functional families in CATH database
Description	The dataset of AF2-predicted HUP domains with overall pLDDT > 90, culled at 90% identity.
Type Of Material	Database/Collection of data
Year Produced	2023
Provided To Others?	Yes
Impact	We designed a protocol to analyse AlphaFold2 domains in order to understand the functional diversity of a protein superfamily called ATP-PPases. The computational protocol designed in this study will be used to analyse other important superfamilies using data obtained from TED analyses.
URL	https://zenodo.org/record/8346481


Description	Collaborators on TTS project: Anne Osbourn (and Janet Thornton)
Organisation	John Innes Centre
Country	United Kingdom
Sector	Academic/University
PI Contribution	Based on differential conservation using a tool called GroupSim (cycloartenol to cucurbitadienol, cycloartenol to lanosterol, beta-amyrin to lupeol and beta-amyrin to friedelin). We showed how the physico-chemical properties of the amino acids lining the pockets varied with the product produced, e.g. lanosterol-producing enzymes had smaller amino acids than the cycloartenol-producing ones. We showed that the pockets of cycloartenol-producing enzymes were more electropositive than those of cucurbitadienol- and lanosterol-producing enzymes. We followed this up with molecular dynamics simulations to show how different intermediates on the way to the products would be stabilized depending on the enzyme type. We also looked at sequence conservation for enzymes producing one product (monofunctional) versus multiple products (multifunctional). We observed that monofunctional enzymes were more conserved than multifunctional enzymes.
Collaborator Contribution	Janet's group validated the functional residues predicted by GroupSim using WebLogo. Janet's group explored the molecular dynamics simulation trajectories in detail to examine how water molecules might help in the final product formation. Janet's group also characterized the flexibility of the binding pocket and observed that multifunctional enzymes have higher flexibility in the binding pocket than monofunctional enzymes. Anne's group carried out the experimental characterization of the suggested mutations and showed which residue mutations, singly or in groups, could completely switch the enzyme from one product to another.
Impact	A publication is in preparation. The work will be presented at the upcoming conference in Barcelona (3D-SIG 2025).
Start Year	2022
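
The idea of differential conservation between two product groups can be illustrated with a deliberately simplified sketch. It flags alignment columns where each group is internally conserved but the two groups prefer different residues. This is not the GroupSim scoring scheme; the function names, the 0.8 threshold and the toy alignments are assumptions made for illustration.

```python
from collections import Counter


def consensus(column):
    """Most common residue in an alignment column and its frequency."""
    residue, count = Counter(column).most_common(1)[0]
    return residue, count / len(column)


def differential_sites(group_a, group_b, min_conservation=0.8):
    """Columns conserved within each group but differing between groups.

    group_a, group_b: lists of aligned sequences of equal length.
    Returns a list of (column_index, residue_in_a, residue_in_b).
    """
    sites = []
    for pos in range(len(group_a[0])):
        res_a, frac_a = consensus([seq[pos] for seq in group_a])
        res_b, frac_b = consensus([seq[pos] for seq in group_b])
        if (
            res_a != res_b
            and "-" not in (res_a, res_b)  # ignore gap-dominated columns
            and frac_a >= min_conservation
            and frac_b >= min_conservation
        ):
            sites.append((pos, res_a, res_b))
    return sites


# Toy alignments (hypothetical): column 1 separates the groups cleanly,
# column 3 differs but is not well conserved within either group
toy_group_a = ["ACDE", "ACDE", "ACDF"]
toy_group_b = ["AGDE", "AGDE", "AGDK"]
toy_sites = differential_sites(toy_group_a, toy_group_b)
```

Columns returned by such a filter are candidates for mutation when attempting to switch an enzyme from one product to another, subject to experimental testing.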


Description	A talk at ISMB/ECCB 2023
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Other audiences
Results and Impact	NA
Year(s) Of Engagement Activity	2023


Description	ELIXIR-3D BioInfo Community Webinar series
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Other audiences
Results and Impact	Co-organizer of ELIXIR-3D BioInfo Community Webinar, Steering Committee of ELIXIR-3D-Bioinfo meeting at Barcelona
Year(s) Of Engagement Activity	2025
