What determines protein abundance in plants?

Lead Research Organisation: Rothamsted Research
Department Name: Plant Sciences and the Bioeconomy

Abstract

Proteins are the workhorses of the cell: they facilitate chemical reactions, act as gene switches and have structural roles. For cells to work efficiently, proteins need to be produced in the right place, at the right time and in the right amount. They also need to be removed when no longer needed. Crick's Central Dogma states that coding sequences of DNA are transcribed into mRNAs, which in turn are translated into proteins. There are many levels at which this process is regulated and there are still many gaps in our knowledge. We expect both inherited and environmental differences between individuals to play important roles in the control of proteins.

This project seeks to use the model plant, Arabidopsis thaliana, to answer fundamental questions about the control of protein expression, including which mechanisms are important and how they interact in a complex multi-cellular organism. We also aim to determine to what extent the protein content of a given cell, tissue or organ predicts observable traits (the phenotype) of the plant. To address these questions, we have designed an integrated programme of experiments and sophisticated mathematical analysis around a genetically variable population of Arabidopsis (known as the MAGIC population). This is a powerful genetic resource for mapping sections of DNA that correlate with variation in a trait (known as quantitative trait loci, QTL), to identify causal variants and dissect the regulation of genome expression. We will characterise and compare the following different processes that potentially influence protein expression in the MAGIC lines:

1. Structural variation within the genome (including small-scale variation and large-scale structural rearrangements).
2. Chromatin accessibility, a measure of the availability of a given region of DNA for transcription.
3. Chemical modifications to DNA that do not involve a change in DNA sequence, known as epigenetic marks, which often indicate environmental perturbation.
4. mRNA abundance.
5. Protein abundance.

It is important to take a holistic approach, because the amount of any given protein in an individual is determined by the balance of these processes. Much effort has been spent studying gene transcription, because it is relatively easy to measure on a genome-wide scale. However, evidence suggests that transcription is a poor predictor of protein abundance, because the control of translation and protein degradation is important, particularly in plants. Less research has been done on measuring translation, protein amount and protein breakdown, but advances in technology now let us do so. Although it is relatively straightforward to measure genomic structural variation and epigenetic marks such as DNA methylation, their impact on protein expression is unclear.

Therefore, we are in an exciting position to provide enormous insight into protein regulation. The power of this project derives from innovative computational analysis that will enable us to apportion the relative contributions of genotype, transcription, protein synthesis and protein degradation and identify networks controlling protein expression. Because collecting genome-scale data from many samples is expensive and time-consuming, we will use novel statistical methods to get more information without significantly increasing sample size, including combining different layers of information. This will be the first study of this kind on this scale. As well as depositing our data in public repositories, our findings will be made available to the academic community via a user-friendly knowledge discovery and gene mining resource. The approaches developed in this project will provide valuable fundamental insights that will be applicable to other organisms and which will also pave the way to future crop improvement.

Technical Summary

This project aims to understand how protein abundance is controlled in plants and to determine the phenotypic consequences of proteomic variation, together with genotypic, structural, epigenotypic and transcriptomic variation. We propose an integrated programme of quantitative trait loci (QTL) analysis of an Arabidopsis multiparental advanced generation intercross (MAGIC) population. Firstly, we will determine all variation in the 19 MAGIC founders, and interactions between different 'omic layers, via a comprehensive set of assays. Long-read sequencing of 18 founders' genomes will be performed for comparison of structural variation relative to the 19th founder, the Col-0 reference. We will measure epigenetic marks of cytosine DNA methylation and chromatin accessibility by ATAC-seq. Transcript abundance and regulatory RNA species will be analysed by RNA-seq and protein translation and abundance quantified by Ribo-seq and proteomics, respectively. Next, a holistic experimental and computational analysis of 400 Arabidopsis MAGIC RILs (recombinant inbred lines) will be used to understand the regulatory networks controlling protein expression and dissect the relative contributions of genotype (including small-scale variation and large-scale structural rearrangements), epigenotype, RNA transcription, protein synthesis and protein degradation. We will use statistical and machine learning (ML) approaches to construct different types of molecular networks and identify causal mediators. Co-expression analysis will also identify novel physical complexes and sets of proteins that participate in common processes. Selected networks and complexes will be tested experimentally. Whole plant phenotyping of the MAGIC lines will be performed and used together with the molecular data to interrogate the predictive ability of different 'omic layers across a range of phenotypes. Finally, data and knowledge generated will be shared with the community through a user-friendly web resource.
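
The QTL analysis described above can be illustrated with a minimal sketch. It is not the project's method: it assumes a simplified setting with biallelic markers coded 0/1 in inbred lines and simulated data, whereas a real MAGIC analysis would typically work with founder haplotype probabilities across all 19 founders. The function name `single_marker_scan`, the marker count and the effect size are all hypothetical choices made for illustration.

```python
import numpy as np


def single_marker_scan(genotypes, phenotype):
    """Return the squared correlation (R^2) between each marker and the trait.

    genotypes: (n_lines, n_markers) array of allele codes (0 or 1).
    phenotype: (n_lines,) array of trait values, e.g. abundance of one protein.
    """
    g = np.asarray(genotypes, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    gc = g - g.mean(axis=0)          # centre each marker column
    yc = y - y.mean()                # centre the trait
    cov = gc.T @ yc                  # unnormalised covariance per marker
    denom = np.sqrt((gc ** 2).sum(axis=0) * (yc ** 2).sum())
    with np.errstate(divide="ignore", invalid="ignore"):
        r = np.where(denom > 0, cov / denom, 0.0)  # monomorphic markers get 0
    return r ** 2


def simulate(n_lines=400, n_markers=50, causal=17, effect=2.0, seed=0):
    """Simulate independent 0/1 markers and a trait driven by one causal marker."""
    rng = np.random.default_rng(seed)
    genotypes = rng.integers(0, 2, size=(n_lines, n_markers))
    noise = rng.normal(0.0, 1.0, size=n_lines)
    phenotype = effect * genotypes[:, causal] + noise
    return genotypes, phenotype


genotypes, phenotype = simulate()
r2 = single_marker_scan(genotypes, phenotype)
print("strongest marker:", int(np.argmax(r2)))
```

In this toy setting the marker with the highest R^2 is the candidate QTL. The same scan could in principle be run with transcript, translation or protein abundance as the trait, which is one simple way to compare signals across 'omic layers.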

Planned Impact

The project's immediate beneficiary will be the academic community. The project will deliver academic impact through fundamental discoveries, big data, novel methodology and trained personnel. The knowledge base and technology developed will benefit projects across the range of BBSRC's strategic priorities, especially "agriculture and food security", "bioenergy and industrial biotechnology" and "exploiting new ways of working". "Application of computational and mathematical techniques to high-quality, quantitative biological data" is at the heart of the project. Although this proposal employs the model plant, Arabidopsis thaliana, the knowledge and techniques we develop apply to other plants and indeed to other organisms. Even in models such as Arabidopsis, much of the protein-coding genome is not characterised functionally; we anticipate that this project will play an important role in assigning new gene functions. Academic beneficiaries include not only plant scientists and researchers with an interest in proteostasis but also synthetic biologists, via provision of a knowledge base for selective manipulation of proteins and pathways. The project will provide new concepts and rich data sets for the genetics, epigenetics, proteomics and plant science communities, which will be made available through public repositories and peer-reviewed journals. In addition, a key goal is to make a comprehensive, integrated and interoperable data resource accessible to the community in a timely fashion. Therefore, we will build a knowledge graph for gene mining and knowledge discovery in Arabidopsis that can be accessed via a user-friendly webapp (KnetMiner) or programmatically (RDF and Neo4j servers), and integrate this with published Arabidopsis data. A collateral benefit is the generation of high-quality proteogenomic data.
Information about structural variation, alternative splicing, alternative start sites and novel proteoforms will improve genome annotation and inform the development of databases for searching peptide mass spectrometric data. Thus, we anticipate that the work will drive innovations in bioinformatics. Arabidopsis has been important in establishing statistical genetics methodology and we will develop novel predictive methodology integrating multi-'omic data, using information about both predicted and validated molecular interactions across 'omic layers. These analyses will inform future translational research.

This collaborative project combines a unique skills base, providing a framework within which early career researchers can be cross-trained in a range of key laboratory and data science skills. Such trained researchers will be of benefit to the academic and industrial sectors. Skills and new methodology will also be shared through running workshops and training courses.

Longer term, the project has potential to contribute towards wealth creation and deliver environmental benefits, for example through plant breeding. In addition to identifying candidate genes, pathways and regulatory networks for manipulation in crop species, the cross-platform QTL analysis in Arabidopsis leads the way to comparable approaches in crops that will feed directly into breeding pipelines and establish the UK bioscience community as a leader in translating genomics into crop improvement solutions. This will be expedited by the existence of MAGIC populations for a number of important crop species (rice, wheat, chickpea), several of which have been co-developed by RM. The annual turnover from UK plant breeding is estimated to be in the region of £200 to £230 million ("The UK Plant Breeding Sector and Innovation" The Intellectual Property Office, 2016), and the worldwide value of plant breeding is far greater, particularly when considering the indirect benefits of yield and quality improvements and increased resilience to changing climatic conditions.

Publications

 
Description The key plant resource that underpins this project (the 19 Arabidopsis MAGIC founder lines) has been validated by genome re-sequencing using "short-read" technology. We have also bulked up seed stocks for the 400 MAGIC lines derived from the founders and carried out trials to design a staggered planting scheme that takes account of the physical differences between these lines. This will ensure robust data can be obtained later in the project.

We have optimised growth conditions that enable us to collect high-quality plant material such that it can be used for different tests (measuring DNA methylation, RNA abundance, and protein abundance from the same samples). We have also designed and built a low-cost "phenotyping platform" within our plant growth rooms - apparatus that allows us to monitor and measure plant growth and health in real time, without entering the growth chamber.

We have sequenced the genomes of 17 of the Arabidopsis MAGIC founder lines using modern "long-read" technologies, which give unprecedented insight into the differences between genomes of Arabidopsis plants from different geographical areas and how they have evolved to adapt to local conditions. A large amount of work has been done to obtain and annotate very high-quality genome assemblies. This information is also crucial to help us interpret all the other data that we plan to collect in the project.

We have done a lot of work to develop or refine methods to measure the abundance of as many proteins as possible (the "proteome") and to measure proteins that are actively being synthesised (the "translatome"). It was important to make sure that the methods are highly reproducible and suitable for large numbers of samples.

We have conducted a large trial experiment using all the assays that we have developed to compare the two most divergent MAGIC founders. The data that we have obtained are very interesting in their own right, but the experiment has also helped us to design a much larger one for all 19 founders.

The 19 founders have now been grown under carefully controlled conditions and tissue harvested for multiple molecular analyses. Measurement of messenger RNA, RNA actively being translated into proteins, and protein abundance is currently in progress. Methods to measure the accessibility of DNA for transcription into RNA (ATAC-seq) are currently being optimised. In addition, via a leverage-funded project and secondment, we have obtained data for the abundance of transfer RNAs (tRNAs). These molecules facilitate the decoding of mRNA through delivery of amino acids to the protein synthesis machinery and add an extra layer of information to the project.
Exploitation Route We now have validated stocks of Arabidopsis seeds that may be of use to other researchers.
The genome assemblies will be of great use to other plant researchers, and we anticipate wide re-use once we have published these. We have shared these data sets pre-publication with selected collaborators, which is already providing new biological insights.
The different assays will help us to understand the "rules of life" that control protein abundance (and other traits). This information is very widely applicable and useful to all biologists. Longer-term, the information could help to inform crop improvement by breeding or engineering.
We will publish details of the phenotyping platform that can be used by other researchers to build their own apparatus. We will also share our data so that others can re-use it.
Sectors Agriculture, Food and Drink

 
Description 21ROMITIGATIONFUND Rothamsted
Amount £924,000 (GBP)
Funding ID BB/W510543/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 09/2021 
End 03/2022
 
Description FTMA4: Training and mobility to support multi-omic analysis
Amount £27,392 (GBP)
Funding ID BB/X017877/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 01/2023 
End 03/2023
 
Description High performance mass spectrometry: applications for the Cambridge biological sciences community
Amount £295,395 (GBP)
Funding ID BB/W019620/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 08/2022 
End 08/2023
 
Description LC-MS system for proteomics
Amount £2,082,587 (GBP)
Funding ID IGP22-035 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 10/2022 
End 09/2024
 
Description Provision of a replacement ultracentrifuge and rotors for fundamental biochemical studies at Rothamsted
Amount £144,284 (GBP)
Funding ID IGP22-015 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 04/2022 
End 03/2023
 
Title Measurement of tRNA abundance in plants 
Description Transfer RNAs (tRNAs) facilitate the decoding of mRNA through delivery of amino acids to the ribosome during protein synthesis. The regulation of tRNA abundance has emerged as a mechanism that shapes gene expression during stress response and recovery in yeast and animals, in addition to playing regulatory roles beyond protein synthesis as non-coding RNAs. Modification-induced misincorporation tRNA sequencing (mim-tRNAseq) is a novel method for the quantification of tRNAs that has been developed for yeast and animals. Adapting this technique for plants is complicated by the presence of chloroplast-specific tRNAs. In collaboration with Dr Dany Nedialkova (Max Planck Institute of Biochemistry, Martinsried, Germany), we have developed both a lab protocol for plants and a bioinformatic pipeline to accommodate plastid tRNAs. This work was supported through leveraged funding (ODA mitigation fund; see "Further Funding").
Type Of Material Technology assay or reagent 
Year Produced 2022 
Provided To Others? No  
Impact Development of this method has enabled us to add a further 'omic layer to our datasets for the 19 MAGIC founders. This paves the way to understanding the relationship between tRNA abundance and protein expression. Upon publication, this method will be available to the research community at large and will be an important tool for plant protein translation research. 
 
Title Raspberry Pi plant phenotyping
Description A low-cost plant phenotyping platform for use in plant growth chambers has been developed. This comprises internet-connected Raspberry Pi cameras and bespoke scripts that enable real-time, remote imaging of plants. Images are automatically downloaded and analysed using the PlantCV package.
Type Of Material Improvements to research infrastructure 
Year Produced 2021 
Provided To Others? No  
Impact The platform is being used in a spin-off project, supported by the ODA mitigation fund (see Further Funding). Here, the phenotyping platform is being used to quantify the responses of the Arabidopsis MAGIC population founders to hypoxia/submergence-recovery stress. Ultimately this will enable screening of the MAGIC lines for this trait to obtain mechanistic understanding. This adds considerable value to the large multi-omic data sets that will be generated in this award. 
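
As an illustration of the kind of image-derived measurement such a platform can provide, the sketch below estimates the fraction of an image covered by green tissue. It is an assumption-laden stand-in written in plain NumPy, not the project's scripts and not the PlantCV API; the function name `green_fraction`, the `margin` threshold and the synthetic test image are all hypothetical.

```python
import numpy as np


def green_fraction(rgb, margin=20):
    """Return the fraction of pixels classed as green plant tissue.

    rgb: (height, width, 3) array of 8-bit RGB values.
    A pixel counts as plant if its green channel exceeds both the red and
    the blue channel by more than `margin` (an arbitrary illustrative cut-off).
    """
    img = np.asarray(rgb, dtype=np.int32)   # avoid uint8 wrap-around on subtraction
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    mask = (g - r > margin) & (g - b > margin)
    return float(mask.mean())


def synthetic_tray(height=100, width=100):
    """Build a brown 'soil' image containing one 20 x 30 pixel green patch."""
    img = np.empty((height, width, 3), dtype=np.uint8)
    img[...] = (120, 90, 60)                 # soil-coloured background
    img[10:30, 40:70] = (40, 160, 50)        # green rosette stand-in
    return img


image = synthetic_tray()
print(f"green cover: {green_fraction(image):.2%}")
```

Applied to a time series of images from one camera, a measure of this kind gives a simple growth curve; real images would need colour calibration and a more robust segmentation step than a fixed threshold.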
 
Description Arabidopsis 1001 Genomes
Organisation Salk Institute for Biological Studies
Country United States 
Sector Charity/Non Profit 
PI Contribution We have collaborated with members of the 1001 Arabidopsis genomes consortium. We have assembled all the genomes sequenced by the initiative using the assembly pipeline we developed for our project. Recently we have completed long-read sequencing of 15 Arabidopsis genomes and assembled them. We are in discussions to integrate these assemblies with those produced by other partners in the 1001 Arabidopsis genomes effort.
Collaborator Contribution Our partners have sequenced over 1000 Arabidopsis genomes with short reads and several hundred with long reads.
Impact Short read data have been publicly released. Long read data will be released once publication has occurred.
Start Year 2012
 
Description Measuring tRNA abundance in plants 
Organisation Max Planck Society
Department Max Planck Institute of Biochemistry
Country Germany 
Sector Academic/University 
PI Contribution We posed a biological problem and sent a PDRA on secondment to the collaborator's lab.
Collaborator Contribution Provision of knowhow to enable adoption of their methods for measurement of tRNA abundance to plants. This includes both experimental and bioinformatic workflows.
Impact The data generated will eventually be published in peer-reviewed journals.
Start Year 2022
 
Description Understanding the consequences of genetic variation on protein function 
Organisation University College London
Country United Kingdom 
Sector Academic/University 
PI Contribution We shared genome assemblies for the Arabidopsis MAGIC population founders to enable bespoke bioinformatic analysis, and provided biological insight for data interpretation.
Collaborator Contribution They performed protein structural modelling with our data to predict changes to protein function due to natural variation in DNA sequence.
Impact The findings will eventually be published in peer-reviewed papers.
Start Year 2022
 
Description Use of Arabidopsis MAGIC Founder genome assemblies to facilitate identification of genes involved in nematode-plant interactions 
Organisation University of Cambridge
Department Department of Plant Sciences
Country United Kingdom 
Sector Academic/University 
PI Contribution We shared our genome assemblies to facilitate identification of genes underpinning quantitative trait loci (QTLs) that influence plant-nematode interactions.
Collaborator Contribution They performed a comprehensive screen of the Arabidopsis MAGIC population to identify QTLs that influence plant-nematode interactions.
Impact The work will eventually be published in peer-reviewed journals.
Start Year 2022
 
Description Poster for the US HUPO conference 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Yong-In Kim co-authored a poster: 'Deep proteome coverage in high throughput diaPASEF for complex Arabidopsis thaliana samples'. This was a collaboration with industry; Bruker, the company involved, used his data to promote applications of its new range of mass spectrometers.
Year(s) Of Engagement Activity 2022
 
Description Presentation at the American Society of Plant Biologists Annual Meeting, Portland, Oregon, USA
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact PDRA Yong-In Kim gave a talk about the development of robust high-throughput proteomics workflows. As well as helping to build his career network, it generated community interest in the methodology.
Year(s) Of Engagement Activity 2022