Unlocking the chemical potential of plants: Predicting function from DNA sequence for complex enzyme superfamilies

Lead Research Organisation: John Innes Centre
Department Name: Metabolic Biology

Abstract

Plants are a rich source of drugs and other useful molecules. Examples include the anticancer drugs taxol (from yew trees) and vinblastine and vincristine (from Madagascan periwinkle); the antimalarial compound artemisinin from wormwood; the sweetener stevioside from sweetleaf; and flavours and fragrances such as menthol and limonene, from mint and citrus, respectively. While the biosynthetic pathways for ~50 plant natural products have so far been characterised, plants are known to make hundreds of thousands of diverse chemicals for which the biosynthetic pathways are unknown. Based on our knowledge of the overall classes of enzymes that we associate with plant natural product biosynthesis, it has become clear from studying the sequences of plant genomes that plants have the potential to encode far more chemical diversity than has previously been appreciated. However, although we can identify genes in genomes that 'look guilty' because they are predicted to encode enzymes belonging to certain major enzyme classes, this does not tell us exactly what specific chemical transformations these individual enzymes carry out. This project brings expertise in plant natural product pathway discovery and elucidation together with powerful computational approaches to tackle the challenges of decoding the information hidden in plant genomes, deducing the relationship between the structure and function of large enzyme superfamilies, and understanding mechanisms of metabolic diversification in the Plant Kingdom. To achieve this, we will focus on a major class of plant natural products known as the triterpenes, since they are one of the largest and most structurally diverse classes of plant natural products with many health, agronomic and industrial applications.

Technical Summary

Our strategy is to integrate powerful data-driven computational approaches with experimental investigation of enzyme function to understand the functions and kingdom-specific expansion of an exemplar complex enzyme superfamily - the triterpene synthases (TTSs). The TTS enzyme superfamily is an ideal test case for our purposes, since these enzymes are able to generate an enormous diversity of cyclized triterpene scaffolds from a single common precursor molecule. Through iterative cycles of computational and experimental investigations we aim to develop sophisticated predictive analytic approaches that will enable us to relate DNA sequence to enzyme function with ever-increasing power and resolution, and in so doing to generate and test hypotheses about enzyme function, mechanisms and evolution. Our aims are to: (1) experimentally determine the chemical diversity encoded by diverse members of the TTS superfamily selected based on our initial CATH-FunFam classification; (2) expand the sequence data for the CATH TTS superfamily and integrate sequence- and structure-based computational approaches to refine our strategies for identifying TTS features implicated in determination of product specificity and for functional classification, and test TTS function predictions; (3) exploit a novel machine learning approach to predict known and novel TTSs; (4) understand TTS function and diversification by determining the product specificities of natural and engineered TTS variants, guided by computational predictions from (1)-(3).
 
Description Plants are a rich source of drugs and other useful molecules. Examples include the anticancer drugs taxol (from yew trees) and vinblastine and vincristine (from Madagascan periwinkle); the antimalarial compound artemisinin from wormwood; the sweetener stevioside from sweetleaf; and flavours and fragrances such as menthol and limonene, from mint and citrus, respectively. While the biosynthetic pathways for ~50 plant natural products have so far been characterised, plants are known to make hundreds of thousands of diverse chemicals for which the biosynthetic pathways are unknown. Based on our knowledge of the overall classes of enzymes that we associate with plant natural product biosynthesis, it has become clear from studying the sequences of plant genomes that plants have the potential to encode far more chemical diversity than has previously been appreciated. However, although we can identify genes in genomes that 'look guilty' because they are predicted to encode enzymes belonging to certain major enzyme classes, this does not tell us exactly what specific chemical transformations these individual enzymes carry out. This project brings expertise in plant natural product pathway discovery and elucidation together with powerful computational approaches to tackle the challenges of decoding the information hidden in plant genomes, deducing the relationship between the structure and function of large enzyme superfamilies, and understanding mechanisms of metabolic diversification in the Plant Kingdom. To achieve this, we are focussing on a major class of plant natural products known as the triterpenes, since they are one of the largest and most structurally diverse classes of plant natural products with many health, agronomic and industrial applications.

The first step in triterpene biosynthesis involves the biosynthesis of complex hydrocarbon triterpene scaffolds by enzymes known as triterpene synthases (TTSs). Our strategy is to integrate powerful data-driven computational approaches with experimental investigation of enzyme function to understand the functions and kingdom-specific expansion of the TTS enzyme superfamily within the Plant Kingdom. The TTS enzyme superfamily is an ideal test case for our purposes, since these enzymes are able to generate an enormous diversity of cyclized triterpene scaffolds from a single common precursor molecule. Through iterative cycles of computational and experimental investigations we are developing sophisticated predictive analytic approaches that will enable us to relate DNA sequence to enzyme function with ever-increasing power and resolution, and in so doing to generate and test hypotheses about enzyme function, mechanisms and evolution. The project aims are to: (1) experimentally determine the chemical diversity encoded by diverse members of the TTS superfamily selected based on our initial CATH-FunFam classification; (2) expand the sequence data for the CATH TTS superfamily and integrate sequence- and structure-based computational approaches to refine our strategies for identifying TTS features implicated in determination of product specificity and for functional classification, and test TTS function predictions; (3) exploit a novel machine learning approach to predict known and novel TTSs; (4) understand TTS function and diversification by determining the product specificities of natural and engineered TTS variants, guided by computational predictions from (1)-(3).

Since the start of this project we have carried out in-depth analysis of characterised TTS sequences, structures and functions. We previously observed that the binding site is highly conserved for a given product. From structure-based sequence alignments, we have proposed specificity-determining positions (SDPs) in the binding pocket for committed TTSs. Based on this we have proposed a set of mutations to change TTS function (from cycloartenol to cucurbitadienol) and have designed a series of expression constructs to enable these mutant variants to be taken forward for functional analysis. These will be tested experimentally within the next few months. We have further built a computational workflow for high-throughput, systematic mining of TTS genes from across the Plant Kingdom, including those that are unannotated. Based on the binding sites of committed and multifunctional TTSs, we have developed a scoring matrix and protocol to predict likely products for each sequence.
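
As an illustration only, a binding-site scoring protocol of this general kind could be sketched as below. This is not the project's scoring matrix: the pocket strings, the four pocket positions, the uniform background and the pseudocount are all hypothetical choices made for the example. The sketch builds a per-product log-odds profile from the pocket residues of characterised enzymes and assigns a new pocket to the product whose profile scores highest.

```python
from collections import Counter
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical toy data: residues at four assumed pocket positions for
# characterised enzymes, grouped by product. These are NOT real TTS pockets.
TRAINING = {
    "cycloartenol": ["YHIV", "YHIL", "YHIV"],
    "cucurbitadienol": ["FNLV", "FNLI", "FNLV"],
}

def build_profile(pockets, pseudocount=1.0):
    """Per-position log-odds of each residue versus a uniform background."""
    total = len(pockets) + pseudocount * len(ALPHABET)
    background = 1.0 / len(ALPHABET)
    profile = []
    for pos in range(len(pockets[0])):
        counts = Counter(p[pos] for p in pockets)
        profile.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return profile

def score(pocket, profile):
    """Sum the log-odds of the observed residue at every pocket position."""
    return sum(column[aa] for aa, column in zip(pocket, profile))

def predict_product(pocket, profiles):
    """Return the best-scoring product and the full score table."""
    scores = {product: score(pocket, prof) for product, prof in profiles.items()}
    return max(scores, key=scores.get), scores

PROFILES = {product: build_profile(p) for product, p in TRAINING.items()}
best, scores = predict_product("FNLV", PROFILES)
print(best, scores)
```

A real protocol would need to handle multifunctional enzymes, gaps and non-standard residues, none of which this sketch attempts.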

We have also performed flexibility analysis of committed and multifunctional TTSs using the pLDDT scores of their AlphaFold models. Further, to understand the specificity of the substrate configuration in the TTS binding site that leads to different product types, we have performed a test Molecular Dynamics (MD) simulation of human lanosterol synthase with the substrate 2,3-oxidosqualene (for which we have an experimental structure available with the product lanosterol). To interpret the simulation, we have developed a method to map the substrate conformations generated from MD simulations onto different substrate configurations, to identify and annotate each conformation by 'fold type'. We are now performing the simulations on plant TTSs to understand the factors governing this fold specificity. We have also examined various pocket characteristics, such as localized electric effects, flexibility, hydrophobicity, side chain interaction parameters and number of hydrogen bonds, to identify how the properties of pockets vary between the product types. We have also used APBS to calculate the electrostatic potential of the binding pockets and identify how it differs between product types.
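
A minimal sketch of how per-residue pLDDT can be read from an AlphaFold model and summarised over a pocket is given below. It relies on the fact that AlphaFold model files in PDB format store pLDDT in the B-factor column; everything else (the toy five-residue model, the pocket residue numbers and the helper names) is hypothetical and not taken from the project. Lower pLDDT is often read as a sign of flexibility or disorder, but it is only a proxy.

```python
def ca_plddt(pdb_text):
    """Return {residue_number: pLDDT} using the C-alpha atom of each residue.

    pLDDT is read from the B-factor field (columns 61-66 of an ATOM record).
    """
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores

def mean_plddt(scores, residues):
    """Average pLDDT over the given residue numbers."""
    values = [scores[r] for r in residues]
    return sum(values) / len(values)

def demo_line(serial, resname, resseq, plddt):
    """Build a fixed-width ATOM record for the toy example (hypothetical data)."""
    return ("ATOM  {:5d}  CA  {:>3s} A{:4d}    {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}"
            .format(serial, resname, resseq, 0.0, 0.0, 0.0, 1.0, plddt))

# Toy model: five residues with made-up pLDDT values.
TOY_PDB = "\n".join(
    demo_line(i, "ALA", i, p)
    for i, p in enumerate([95.0, 90.0, 60.0, 55.0, 92.0], start=1)
)
POCKET = [3, 4]  # hypothetical pocket residue numbers

scores = ca_plddt(TOY_PDB)
pocket_mean = mean_plddt(scores, POCKET)
overall_mean = mean_plddt(scores, sorted(scores))
print(pocket_mean, overall_mean)
```

Comparing the pocket mean with the whole-chain mean gives a quick, coarse indication of whether the pocket is predicted with lower confidence than the rest of the model.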

We have completed protein-ligand molecular dynamics simulations of a cycloartenol-producing enzyme, a mutated enzyme (the variant predicted earlier to convert the protein into a cucurbitadienol-producing enzyme) and a cucurbitadienol-producing enzyme. These proteins were simulated with the substrate 2,3-oxidosqualene, the product (cycloartenol or cucurbitadienol) and intermediates of the reaction pathway from the substrate to the product. We are currently analysing these molecular dynamics trajectories to identify the conformations of the ligands, the interactions between the proteins and ligands, and the root mean square fluctuations of the binding site residues and ligands, among other properties. These analyses will help identify differences in interaction profiles between product types and how the protein might induce different substrate conformations, leading to the generation of different products.
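
For readers unfamiliar with the root mean square fluctuation (RMSF) mentioned above, the quantity can be sketched as follows. For atom i it is the square root of the time-averaged squared displacement from that atom's mean position. The sketch assumes the frames have already been aligned to a common reference, and the two-atom toy trajectory is hypothetical; it does not reproduce the project's analysis pipeline.

```python
import math

def rmsf(frames):
    """RMSF per atom from a list of frames.

    Each frame is a list of (x, y, z) tuples, one per atom, and all frames
    are assumed to be pre-aligned to the same reference structure.
    """
    n_frames = len(frames)
    n_atoms = len(frames[0])
    result = []
    for i in range(n_atoms):
        # Mean position of atom i over the trajectory.
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        # Mean squared displacement from that mean position.
        msd = sum(
            sum((f[i][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        result.append(math.sqrt(msd))
    return result

# Hypothetical toy trajectory: atom 0 is static, atom 1 oscillates along x.
TOY_FRAMES = [
    [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
]
values = rmsf(TOY_FRAMES)
print(values)
```

In practice the same calculation would be run over the binding site residues and ligand atoms of each simulated system so that the fluctuation profiles can be compared between product types.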
Exploitation Route This project will constitute a major step towards understanding the function and diversification of enzymes of a major plant enzyme superfamily. It will provide a framework for understanding the natural roles of triterpenes in plants and for harnessing plant chemicals and enzymes for medicine, agriculture and industrial biotechnology applications. It will further reveal how the ability to synthesise different types of triterpenes has arisen in different plant lineages suggestive of adaptation to environmental niches, opening up opportunities to study their ecological functions. Demonstrating the use of our approaches with the TTS superfamily will provide a foundation to adapt and apply these strategies to other enzyme superfamilies. It will further help to equip the UK and international research communities to unlock the deluge of plant genome sequence data emerging from major sequencing initiatives such as the Earth BioGenome and Darwin Tree of Life Projects and to create a more sustainable environment by exploiting plant enzymes for production of designer molecules for diverse applications.

The tools and resources that we generate will be made available to the wider research community through public data resources, institutional websites and the non-profit repository, Addgene (plasmid DNA). The PI already has established collaborations with industry. We will work to expand our interactions with industry through targeted approaches, networking and meetings (e.g. NIBBS meetings, EMBL-EBI's Industry Programme quarterly meetings) to ensure that any potential commercial opportunities arising from our fundamental programme of research will benefit from early engagement with interested stakeholders, who will bring expertise in idea screening, concept development and testing, marketing strategy development, business analysis and downstream routes to commercialization.
Sectors Agriculture

Food and Drink

Chemicals

Education

Manufacturing, including Industrial Biotechnology

Pharmaceuticals and Medical Biotechnology

 
Description The Norwich-based PDRA on this project contributed to designing and delivering an interactive science education stand for the general public for the Norwich Science Festival, on the theme of 'Making Molecules'.
First Year Of Impact 2024
Sector Education
Impact Types Cultural

Societal

 
Description Attended Roundtable meeting to discuss how Government might further help in the UK's engineering biology sector.
Geographic Reach National 
Policy Influence Type Contribution to a national consultation/review
 
Description Visit by Minister of State in the Department for Science, Innovation and Technology
Geographic Reach National 
Policy Influence Type Participation in a guidance/advisory committee
 
Description 21EBTA Engineering specialised metabolism and new cellular architectures in plants
Amount £1,517,514 (GBP)
Funding ID BB/W014173/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 01/2022 
End 01/2024
 
Description Harnessing plant metabolic diversity for human health
Amount £4,817,214 (GBP)
Funding ID 227375/Z/23/Z 
Organisation Wellcome Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 01/2024 
End 12/2031
 
Description ProtFunAI 
Organisation Technical University of Munich
Country Germany 
Sector Academic/University 
PI Contribution Development of deep learning algorithms for protein function prediction, protein classification and analysis
Collaborator Contribution Training in deep learning protocols and protein language models. Contributions to project design. Novel protein language models to generate protein embeddings for protein function prediction and other protein based prediction tasks.
Impact Project has just started so no outputs yet
Start Year 2024
 
Description "A million shades of green: Harnessing plant metabolic diversity for therapeutic applications" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact SCI Engineering Biology Conference, theme of translation out of academia into industry.
Year(s) Of Engagement Activity 2024
 
Description Anne Osbourn meeting with George Freeman MP during visit to Norwich Research Park 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Anne Osbourn met George Freeman MP (Minister of State in the Department for Science, Innovation and Technology) as one of the NRP Entrepreneurial Researchers.
Professor Anne Osbourn, Founder of Hothouse Therapeutics, spoke about Unlocking Nature's Inaccessible Chemistry.
Year(s) Of Engagement Activity 2023
 
Description Cracking Natures Code, Engineering Biology (House Magazine) 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Policymakers/politicians
Results and Impact Content in magazine following an interview, which is targeted at the Houses of Parliament. Article Title: "Cracking Natures Code"
Year(s) Of Engagement Activity 2024
 
Description Magazine Interview 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Policymakers/politicians
Results and Impact Interview with a journalist writing a piece for House Magazine about the Department for Science, Innovation and Technology's vision for engineering biology, focusing on the vision and the government's aspirations for the field. There was particular interest in the way this work is becoming increasingly important to medicine.
Year(s) Of Engagement Activity 2024
 
Description Norwich Science Festival 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Norwich Science Festival; interacting with the general public, Making Molecules
Year(s) Of Engagement Activity 2024
 
Description Norwich Science Festival satellite event at Diss Corn Hall 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact We took an activity stand to a science discovery day at Diss Corn Hall. This event was set up as a satellite venue for the very popular Norwich Science Festival to try and reach a broader audience. There were 3 workshop sessions throughout the day for 50 children per workshop and their families, all of which were fully booked! We took a stand that focused on the instructions held within DNA to 'make stuff' which was explained by inviting people to engage with our robot DNA Dave, pushing buttons and turning cogs to complete transcription and translation to make new products. We used examples from plants that people would be familiar with such as menthol, limonoids, vanillin and anthocyanins and then invited children to extract anthocyanins from red cabbage to use to make colour-changing paint. Many of the parents were amazed how easy the process was and were keen to build on the experiment at home with their children to make a colour palette of paints using pigments from plants and acids and bases.
Year(s) Of Engagement Activity 2023
 
Description Novozymes Prize Lecture - From plant defence to therapeutics: The metabolic poetry of science 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact From plant defence to therapeutics: The metabolic poetry of science - Anne Osbourn presented the 2023 Novozymes Prize Lecture at John Innes Centre on Monday 12 June 2023
Year(s) Of Engagement Activity 2023
 
Description Presentation - "A million shades of green: understanding and harnessing plant metabolic diversity" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact "Using Bioinformatics to Guide Engineering Strategies in Crops" Masterclass (held Virtually). Audience mix of Research Scientists, plant breeders and agronomists (including some industry partners) from all over the world.
Year(s) Of Engagement Activity 2024
 
Description School Visit (Cambourne) - SAW Trust 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Schools
Results and Impact Two-day hands-on SAW workshop for Year 6 children at a primary school in Cambourne. The aim was to inspire and inform school children; there were 4 science activity stations followed by a creative writing activity and an art activity.
Year(s) Of Engagement Activity 2023
 
Description Seminar - "Harnessing plant metabolic diversity for food and health applications" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Seminar given at VIB-UGent Center for Plant Systems Biology - Title: "Harnessing plant metabolic diversity for food and health applications" on 22nd February 2024 as part of a wider visit to the Center.
Year(s) Of Engagement Activity 2024
 
Description Seminar: "Finding drugs in the garden: Harnessing plant metabolic diversity" in the @IPS2ParisSaclay amphitheater on Tuesday 22nd November at 2 pm 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Professor Anne Osbourn gave a seminar about "Finding drugs in the garden: Harnessing plant metabolic diversity" in the @IPS2ParisSaclay amphitheater on Tuesday 22nd November at 2 pm
Year(s) Of Engagement Activity 2023