What determines protein abundance in plants?
Lead Research Organisation:
Rothamsted Research
Department Name: Plant Sciences and the Bioeconomy
Abstract
Proteins are the workhorses of the cell: they facilitate chemical reactions, act as gene switches and have structural roles. For cells to work efficiently, proteins need to be produced in the right place, at the right time and in the right amount. They also need to be removed when no longer needed. Crick's Central Dogma states that coding sequences of DNA are transcribed into mRNAs, which in turn are translated into proteins. There are many levels at which this process is regulated and there are still many gaps in our knowledge. We expect both inherited and environmental differences between individuals to play important roles in the control of proteins.
This project seeks to use the model plant, Arabidopsis thaliana, to answer fundamental questions about the control of protein expression, including which mechanisms are important and how they interact in a complex multi-cellular organism. We also aim to determine to what extent the protein content of a given cell, tissue or organ predicts observable traits (the phenotype) of the plant. To address these questions, we have designed an integrated programme of experiments and sophisticated mathematical analysis around a genetically variable population of Arabidopsis (known as the MAGIC population). This is a powerful genetic resource for mapping sections of DNA that correlate with variation in a trait (known as quantitative trait loci, QTL), to identify causal variants and dissect the regulation of genome expression. We will characterise and compare the following different processes that potentially influence protein expression in the MAGIC lines:
1. Structural variation within the genome (including small-scale variation and large-scale structural rearrangements)
2. Chromatin accessibility, a measure of the availability of a given region of DNA for transcription.
3. Chemical modifications to DNA that do not involve a change in DNA sequence, known as epigenetic marks, which often indicate environmental perturbation.
4. mRNA abundance.
5. Protein abundance.
It is important to take a holistic approach, because the amount of any given protein in an individual is determined by the balance of these processes. Much effort has been spent studying gene transcription, because it is relatively easy to measure on a genome-wide scale. However, evidence suggests that transcription is a poor predictor of protein abundance, because the control of translation and of protein degradation is also important, particularly in plants. Less research has been done on measuring translation, protein amount and protein breakdown, but advances in technology now make this possible. Although it is relatively straightforward to measure genomic structural variation and epigenetic marks such as DNA methylation, their impact on protein expression is unclear.
Therefore, we are in an exciting position to provide enormous insight into protein regulation. The power of this project derives from innovative computational analysis that will enable us to apportion the relative contributions of genotype, transcription, protein synthesis and protein degradation and identify networks controlling protein expression. Because collecting genome-scale data from many samples is expensive and time-consuming, we will use novel statistical methods to get more information without significantly increasing sample size, including combining different layers of information. This will be the first study of this kind on this scale. As well as depositing our data in public repositories, our findings will be made available to the academic community via a user-friendly knowledge discovery and gene mining resource. The approaches developed in this project will provide valuable fundamental insights that will be applicable to other organisms and which will also pave the way to future crop improvement.
Technical Summary
This project aims to understand how protein abundance is controlled in plants and to determine the phenotypic consequences of proteomic variation, together with genotypic, structural, epigenotypic and transcriptomic variation. We propose an integrated programme of quantitative trait loci (QTL) analysis of an Arabidopsis multiparental advanced generation intercross (MAGIC) population. Firstly, we will determine all variation in the 19 MAGIC founders, and interactions between different 'omic layers, via a comprehensive set of assays. Long-read sequencing of 18 founders' genomes will be performed for comparison of structural variation relative to the 19th founder, the Col-0 reference. We will measure epigenetic marks of cytosine DNA methylation and chromatin accessibility by ATAC-seq. Transcript abundance and regulatory RNA species will be analysed by RNA-seq and protein translation and abundance quantified by Ribo-seq and proteomics, respectively. Next, a holistic experimental and computational analysis of 400 Arabidopsis MAGIC RILs (recombinant inbred lines) will be used to understand the regulatory networks controlling protein expression and dissect the relative contributions of genotype (including small-scale variation and large-scale structural rearrangements), epigenotype, RNA transcription, protein synthesis and protein degradation. We will use statistical and machine learning (ML) approaches to construct different types of molecular networks and identify causal mediators. Co-expression analysis will also identify novel physical complexes and sets of proteins that participate in common processes. Selected networks and complexes will be tested experimentally. Whole plant phenotyping of the MAGIC lines will be performed and used together with the molecular data to interrogate the predictive ability of different 'omic layers across a range of phenotypes. Finally, data and knowledge generated will be shared with the community through a user-friendly web resource.
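As a rough illustration of the single-locus idea behind QTL analysis in a multiparental population, the sketch below computes the share of variation in one trait (for example, the abundance of one protein) that is associated with the founder haplotype carried at one locus. It is a deliberately simplified one-way variance decomposition, not the project's method: real MAGIC analyses typically work with haplotype probabilities and account for population structure. The function name, the haplotype labels and the trait values are all hypothetical toy inputs.

```python
from collections import defaultdict
from statistics import fmean


def variance_explained_by_haplotype(haplotypes, trait):
    """Return SS_between / SS_total for a trait grouped by founder haplotype.

    haplotypes: one founder label per line at a single locus (toy input).
    trait: one measurement per line, e.g. abundance of one protein.
    """
    if len(haplotypes) != len(trait) or not trait:
        raise ValueError("haplotypes and trait must be non-empty and equal length")

    grand_mean = fmean(trait)
    ss_total = sum((y - grand_mean) ** 2 for y in trait)
    if ss_total == 0:
        return 0.0  # no variation to explain

    # Collect trait values by the founder haplotype each line carries.
    groups = defaultdict(list)
    for label, y in zip(haplotypes, trait):
        groups[label].append(y)

    # Between-group sum of squares, weighted by group size.
    ss_between = sum(
        len(values) * (fmean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    return ss_between / ss_total


# Toy data (assumed, not from the project): six lines, three haplotypes.
haplotypes = ["Col-0", "Col-0", "Can-0", "Can-0", "Ler-0", "Ler-0"]
protein = [10.0, 12.0, 20.0, 22.0, 30.0, 32.0]
r2 = variance_explained_by_haplotype(haplotypes, protein)
print(f"Share of variance associated with this locus: {r2:.3f}")
```

Repeating this calculation across loci and across 'omic layers (transcript, ribosome-associated transcript, protein) would give a crude picture of where a trait's genetic signal sits; a statistic this simple says nothing about causality or mediation, which the summary above addresses with network and machine learning approaches.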
Planned Impact
The project's immediate beneficiary will be the academic community. The project will deliver academic impact through fundamental discoveries, big data, novel methodology and trained personnel. The knowledge base and technology developed will benefit projects across the range of BBSRC's strategic priorities, especially "agriculture and food security", "bioenergy and industrial biotechnology" and "exploiting new ways of working". "Application of computational and mathematical techniques to high-quality, quantitative biological data" is at the heart of the project. Although this proposal employs the model plant, Arabidopsis thaliana, the knowledge and techniques we develop apply to other plants and indeed to other organisms. Even in models such as Arabidopsis, much of the protein-coding genome is not characterised functionally; we anticipate that this project will play an important role in assigning new gene functions. Academic beneficiaries include not only plant scientists and researchers with an interest in proteostasis but also synthetic biologists, via provision of a knowledge base for selective manipulation of proteins and pathways. The project will provide new concepts and rich data sets for the genetics, epigenetics, proteomics and plant science communities, which will be made available through public repositories and peer-reviewed journals. In addition, a key goal is to make a comprehensive, integrated and interoperable data resource accessible to the community in a timely fashion. Therefore, we will build a knowledge graph for gene mining and knowledge discovery in Arabidopsis that can be accessed via a user-friendly webapp (KnetMiner) or programmatically (RDF and Neo4j servers) and integrate this with published Arabidopsis data. A collateral benefit is the generation of high-quality proteogenomic data.
Information about structural variation, alternative splicing, alternative start sites and novel proteoforms will improve genome annotation and inform the development of databases for searching peptide mass spectrometric data. Thus, we anticipate that the work will drive innovations in bioinformatics. Arabidopsis has been important in establishing statistical genetics methodology and we will develop novel predictive methodology integrating multi-'omic data, using information about both predicted and validated molecular interactions across 'omic layers. These analyses will inform future translational research.
This collaborative project combines a unique skills base, providing a framework within which early career researchers can be cross-trained in a range of key laboratory and data science skills. Such trained researchers will be of benefit to the academic and industrial sectors. Skills and new methodology will also be shared through running workshops and training courses.
Longer term, the project has potential to contribute towards wealth creation and deliver environmental benefits, for example through plant breeding. In addition to identifying candidate genes, pathways and regulatory networks for manipulation in crop species, the cross-platform QTL analysis in Arabidopsis leads the way to comparable approaches in crops that will feed directly into breeding pipelines and establish the UK bioscience community as a leader in translating genomics into crop improvement solutions. This will be expedited by the existence of MAGIC populations for a number of important crop species (rice, wheat, chickpea), several of which have been co-developed by RM. The annual turnover from UK plant breeding is estimated to be in the region of £200 to £230 million ("The UK Plant Breeding Sector and Innovation" The Intellectual Property Office, 2016), and the worldwide value of plant breeding is far greater, particularly when considering the indirect benefits of yield and quality improvements and increased resilience to changing climatic conditions.
Publications
Eckardt NA
(2024)
The lowdown on breakdown: Open questions in plant proteolysis.
in The Plant cell
Gibbs DJ
(2024)
Primed to persevere: Hypoxia regulation from epigenome to protein accumulation in plants.
in Plant physiology
Gibbs DJ
(2022)
A stable start: cotranslational Nt-acetylation promotes proteome stability across kingdoms.
in Trends in cell biology
Theodoulou FL
(2022)
Plant proteostasis: a proven and promising target for crop improvement.
in Essays in biochemistry
| Description | We have generated and annotated high-quality genome assemblies for the 19 MAGIC population founder varieties (known as accessions). This information is crucial for interpreting all the other data that we plan to collect in the project and will be a valuable resource for the wider research community. We have done extensive work to develop or refine methods to measure the abundance of as many proteins as possible (the "proteome") and to measure proteins that are actively being synthesised (the "translatome"). It was important to make sure that the methods are highly reproducible and suitable for large numbers of samples. The 19 founders have now been grown under carefully controlled conditions and multiple molecular analyses completed. These include DNA methylation, accessibility of DNA for transcription into RNA (ATAC-seq), messenger RNA (the "transcriptome"), RNA actively being translated into proteins (the "translatome") and protein abundance (the "proteome"). In addition, via a leverage-funded project and secondment, we have obtained data on the abundance of transfer RNAs (tRNAs). These molecules facilitate the decoding of mRNA through delivery of amino acids to the protein synthesis machinery and add an extra layer of information to the project. Importantly, we have also grown over 400 MAGIC recombinant lines and have generated high-quality transcriptome and proteome data sets. Analysed with sophisticated statistical approaches, these data are helping us to understand what controls the abundance of RNAs and proteins and how they relate to the biology of the plant. We have also developed new functionality for KnetMiner, a knowledge graph resource that will help others to analyse and use our data. |
| Exploitation Route | We now have validated stocks of Arabidopsis seeds that may be of use to other researchers. The genome assemblies will be of great use to other plant researchers, and we anticipate wide re-use once we have published these, including as part of the "1001 Arabidopsis long read genomes" project. We have shared these data sets pre-publication with selected collaborators, and this is already providing new biological insights. The different assays will help us to understand the "rules of life" that control protein abundance (and other traits). This information is widely applicable and useful to biologists working on other organisms. Longer-term, the information could help to inform crop improvement by breeding or engineering. We will publish details of the phenotyping platform so that other researchers can build their own apparatus. We will also share our data in accessible formats so that others can re-use it. |
| Sectors | Agriculture Food and Drink |
| Description | 21ROMITIGATIONFUND Rothamsted |
| Amount | £924,000 (GBP) |
| Funding ID | BB/W510543/1 |
| Organisation | Biotechnology and Biological Sciences Research Council (BBSRC) |
| Sector | Public |
| Country | United Kingdom |
| Start | 08/2021 |
| End | 03/2022 |
| Description | FTMA4: Training and mobility to support multi-omic analysis |
| Amount | £27,392 (GBP) |
| Funding ID | BB/X017877/1 |
| Organisation | Biotechnology and Biological Sciences Research Council (BBSRC) |
| Sector | Public |
| Country | United Kingdom |
| Start | 01/2023 |
| End | 03/2023 |
| Description | High performance mass spectrometry: applications for the Cambridge biological sciences community |
| Amount | £295,395 (GBP) |
| Funding ID | BB/W019620/1 |
| Organisation | Biotechnology and Biological Sciences Research Council (BBSRC) |
| Sector | Public |
| Country | United Kingdom |
| Start | 07/2022 |
| End | 08/2023 |
| Description | LC-MS system for proteomics |
| Amount | £2,082,587 (GBP) |
| Funding ID | IGP22-035 |
| Organisation | Biotechnology and Biological Sciences Research Council (BBSRC) |
| Sector | Public |
| Country | United Kingdom |
| Start | 09/2022 |
| End | 09/2024 |
| Description | Provision of a replacement ultracentrifuge and rotors for fundamental biochemical studies at Rothamsted |
| Amount | £144,284 (GBP) |
| Funding ID | IGP22-015 |
| Organisation | Biotechnology and Biological Sciences Research Council (BBSRC) |
| Sector | Public |
| Country | United Kingdom |
| Start | 03/2022 |
| End | 03/2023 |
| Title | Measurement of tRNA abundance in plants |
| Description | Transfer RNAs (tRNAs) facilitate the decoding of mRNA through delivery of amino acids to the ribosome during protein synthesis. The regulation of tRNA abundance has emerged as a mechanism that shapes gene expression during stress response and recovery in yeast and animals, in addition to playing regulatory roles beyond protein synthesis as non-coding RNAs. Modification-induced misincorporation tRNA sequencing (mim-tRNAseq) is a novel method for the quantification of tRNAs that has been developed for yeast and animals. Adapting this technique for plants is complicated by the presence of chloroplast-specific tRNAs. In collaboration with Dr Dany Nedialkova (Max Planck Institute of Biochemistry, Martinsried, Germany), we have developed both a lab protocol for plants and a bioinformatic pipeline that accommodates plastid tRNAs. This work was supported through leveraged funding (ODA mitigation fund; see "Further Funding"). |
| Type Of Material | Technology assay or reagent |
| Year Produced | 2022 |
| Provided To Others? | No |
| Impact | Development of this method has enabled us to add a further 'omic layer to our datasets for the 19 MAGIC founders. This paves the way to understanding the relationship between tRNA abundance and protein expression. Upon publication, this method will be available to the research community at large and will be an important tool for plant protein translation research. |
| Title | Raspberry Pi plant phenotyping |
| Description | A low-cost plant phenotyping platform for use in plant growth chambers has been developed. This comprises internet-connected Raspberry Pi cameras and bespoke scripts that enable real-time, remote imaging of plants. Images are automatically downloaded and analysed using the PlantCV package. |
| Type Of Material | Improvements to research infrastructure |
| Year Produced | 2021 |
| Provided To Others? | No |
| Impact | The platform is being used in a spin-off project, supported by the ODA mitigation fund (see Further Funding). Here, the phenotyping platform is being used to quantify the responses of the Arabidopsis MAGIC population founders to hypoxia/submergence-recovery stress. Ultimately this will enable screening of the MAGIC lines for this trait to obtain mechanistic understanding. This adds considerable value to the large multi-omic data sets that will be generated in this award. |
| Title | Revisiting the Central Dogma: the distinct roles of genome, methylation, transcription, and translation on protein expression in Arabidopsis thaliana |
| Description | Background: We investigated the flow of information from genome sequence to protein expression implied by the Central Dogma, to determine the impact of intermediate genomic levels in plants. Results: We performed genomic profiling of rosettes in two Arabidopsis accessions, Col-0 and Can-0, and assembled their genomes using long reads and chromatin interaction data. We measured gene and protein expression in biological replicates grown in a controlled environment, also measuring CpG methylation, ribosome-associated transcript levels and tRNA abundance. Each omic level is highly reproducible between biological replicates and between accessions despite their 0.5% sequence divergence; the single best predictor of any level in one accession is the corresponding level in the other. Within each accession, gene codon frequencies accurately model both mRNA and protein expression. The effects of a codon on mRNA and protein expression are highly correlated but are unrelated to genome-wide codon frequencies or to tRNA levels, which instead match genome-wide amino acid frequencies. Ribosome-associated transcripts closely track mRNA levels. Conclusions: In the absence of environmental perturbation, neither methylation, tRNA nor ribosome-associated transcript levels add appreciable information about constitutive protein abundance beyond that in DNA codon frequencies and mRNA expression levels. The impact of constitutive gene body methylation (gbM) is mostly explained by gene codon composition. tRNA abundance tracks overall amino acid demand. However, genetic differences between accessions associate with differential gbM by inflating differential expression variation. Our data show that the Central Dogma holds only if both sequence and abundance information in mRNA are considered. |
| Type Of Material | Database/Collection of data |
| Year Produced | 2025 |
| Provided To Others? | Yes |
| URL | https://rdr.ucl.ac.uk/articles/dataset/_b_Revisiting_the_Central_Dogma_the_distinct_roles_of_genome_... |
| Title | Genomes, Transcriptomes, Methylomes and Proteomes of Arabidopsis thaliana Accessions Col-0 and Can-0 |
| Description | Genomes, Transcriptomes, Methylomes and Proteomes of Arabidopsis thaliana Accessions Col-0 and Can-0 |
| Type Of Material | Database/Collection of data |
| Year Produced | 2024 |
| Provided To Others? | Yes |
| Impact | These datasets form the basis of a preprint and underpin further studies within this project. The genome assemblies have been used by others to identify genes underlying QTLs. |
| URL | https://www.ebi.ac.uk/ena/browser/view/PRJEB77203 |
| Description | Arabidopsis 1001 Genomes |
| Organisation | Salk Institute for Biological Studies |
| Country | United States |
| Sector | Charity/Non Profit |
| PI Contribution | We have collaborated with members of the 1001 Arabidopsis genomes consortium. We have assembled all the genomes sequenced by the initiative using the assembly pipeline we developed for our project. Recently we have completed long read sequencing of 15 Arabidopsis genomes and assembled them. We are in discussions to integrate these assemblies with those produced by other partners in the 1001 Arabidopsis genomes effort [Note: although the 1001 genomes project started before this award, we are here describing recent collaboration since the start of the current award]. |
| Collaborator Contribution | Our partners have sequenced over 1000 Arabidopsis genomes with short reads and several hundred with long reads |
| Impact | Short read data have been publicly released. Long read data will be released once publication has occurred. |
| Start Year | 2012 |
| Description | Collaboration with ELIXIR E-PAN project partners |
| Organisation | ELIXIR |
| Department | ELIXIR UK |
| Country | United Kingdom |
| Sector | Charity/Non Profit |
| PI Contribution | KHP is leading a work package on data standards and interoperability of pan-genomic data into tools like PLAZA and KnetMiner. |
| Collaborator Contribution | The project started in November 2024. The project partners include major European plant bioinformatics teams. We have started reviewing the pan-genomic tools, resources and standards for data QC, visualisation and exchange. |
| Impact | The main output so far has been a draft review document with some identified key challenges and guidelines. |
| Start Year | 2024 |
| Description | Measuring tRNA abundance in plants |
| Organisation | Max Planck Society |
| Department | Max Planck Institute of Biochemistry |
| Country | Germany |
| Sector | Academic/University |
| PI Contribution | We posed a biological problem and sent a PDRA on secondment to the collaborator's lab. |
| Collaborator Contribution | Provision of know-how to enable adaptation of their methods for measuring tRNA abundance to plants. This includes both experimental and bioinformatic workflows. |
| Impact | The data generated will eventually be published in peer-reviewed journals. |
| Start Year | 2022 |
| Description | Understanding the consequences of genetic variation on protein function |
| Organisation | University College London |
| Country | United Kingdom |
| Sector | Academic/University |
| PI Contribution | We shared genome assemblies for the Arabidopsis MAGIC population founders to enable bespoke bioinformatic analysis, and provided biological insight for data interpretation. |
| Collaborator Contribution | They performed protein structural modelling with our data to predict changes to protein function due to natural variation in DNA sequence. |
| Impact | The findings will eventually be published in peer-reviewed papers. |
| Start Year | 2022 |
| Description | Use of Arabidopsis MAGIC Founder genome assemblies to facilitate identification of genes involved in nematode-plant interactions |
| Organisation | University of Cambridge |
| Department | Department of Plant Sciences |
| Country | United Kingdom |
| Sector | Academic/University |
| PI Contribution | We shared our genome assemblies to facilitate identification of genes underpinning quantitative trait loci (QTLs) that influence plant-nematode interactions. |
| Collaborator Contribution | They performed a comprehensive screen of the Arabidopsis MAGIC population to identify QTLs that influence plant-nematode interactions. |
| Impact | The work will eventually be published in peer-reviewed journals. |
| Start Year | 2022 |
| Title | Concise Common Workflow Language (ccwl) |
| Description | The Concise Common Workflow Language (ccwl) is a concise syntax to express CWL workflows. It is implemented as an Embedded Domain Specific Language (EDSL) in the Scheme programming language, a minimalist dialect of the Lisp family of programming languages. ccwl is a compiler to generate CWL workflows from concise descriptions in ccwl. In the future, ccwl will also have a runtime whereby users can interactively execute workflows while developing them. |
| Type Of Technology | Software |
| Year Produced | 2024 |
| Open Source License? | Yes |
| Impact | None so far |
| URL | https://github.com/arunisaac/ccwl |
| Title | KnetMaps - Knowledge Network Discovery Tool |
| Description | A webtool to visualise deep biological knowledge networks such as cross-species, pan-genomics, multi-omics networks with features such as species grouping, graph explorer and semantic path browsing. |
| Type Of Technology | Webtool/Application |
| Year Produced | 2024 |
| Impact | The tool is a visualisation component in the KnetMiner platform and assists scientists with the analysis of cereals, brassica, vegetables and Arabidopsis genomes. |
| URL | https://app.knetminer.com |
| Title | Literature Review Feature for KnetMiner |
| Description | A new feature was added to KnetMiner that uses RAG (Retrieval Augmented Generation) to create a literature review of query-specific gene knowledge graphs. |
| Type Of Technology | Webtool/Application |
| Year Produced | 2024 |
| Impact | Improves productivity of researchers. Available as a freemium feature in KnetMiner. |
| URL | https://app.knetminer.com |
| Title | New KnetMiner API |
| Description | New KnetMiner API for gene-based queries against a Neo4j database with knowledge graphs for cereal, vegetable, brassica and ascomycota genomes. |
| Type Of Technology | Webtool/Application |
| Year Produced | 2024 |
| Impact | This will allow better integration of KnetMiner data with other platforms and enable new collaborations with academic and industry users in AI and data science. |
| URL | https://api.knetminer.com/ |
| Title | ravanan |
| Description | ravanan (pronounced rah-vun-un) is a Common Workflow Language (CWL) implementation that is powered by GNU Guix and provides strong reproducibility guarantees. ravanan provides strong bullet-proof caching (work reuse) so you never run the same step of your workflow twice, nor do you have to keep track of which steps were run and with what parameters: ravanan remembers everything for you. ravanan captures logs from every step of your workflow so you can always trace back in case of job failures. Salient features of ravanan include: (1) bullet-proof caching: never run the same computation again and never fear that the cache is stale (no kidding, we're serious!), which is especially useful when developing a workflow and you want to iterate by repeatedly running slightly modified workflows; (2) each step in your CWL corresponds to exactly one job on the batch system; (3) clear logging: you never have to hunt for log files in obscure directories or binary databases; (4) jobs do not directly write to the shared network filesystem, which keeps performance good and your fellow HPC users happy; (5) jobs never write to /tmp, which keeps your HPC admin happy. |
| Type Of Technology | Software |
| Year Produced | 2024 |
| Impact | None so far |
| URL | https://github.com/arunisaac/ravanan/ |
| Description | Common Workflow Language Conference (CWLCon) 2024 |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Arun Isaac gave a talk on "Concise Common Workflow Language---ccwl" on 15/5/2024, raising awareness of this efficient workflow developed for the project. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://cwl.discourse.group/t/concise-common-workflow-language-ccwl/880 |
| Description | Elite Network of Bavaria mini-symposium |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Postgraduate students |
| Results and Impact | Kathryn Lilley gave a talk entitled "What determines protein abundance in plants?" at the Elite Network of Bavaria mini-symposium, 12th July 2024, Technical University of Munich, Freising, Germany, leading to a three-month placement for a PhD student in her lab. The student is funded by the Bavarian Government's "Proteomes that Feed the World" programme. |
| Year(s) Of Engagement Activity | 2024 |
| Description | Genomics workloads for the future (now) |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Arun Isaac gave a seminar entitled "Ravanan---a high performance Common Workflow Language implementation" at the Barcelona Supercomputing Centre on 7/11/2024, to raise awareness of efficient workflows developed for high performance computing. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://www.bsc.es/research-and-development/research-seminars/sors-genomics-workloads-the-future-now |
| Description | Gordon Research Seminar: "Natural Variation in Ubiquitin-Proteasome System Gene Expression Revealed by eQTL Mapping and eTWAS in the Arabidopsis Magic Population" |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Xiaowei Li gave an invited presentation at the Gordon Research Seminar on Plant Proteolysis 2025, a unique forum for young doctoral and post-doctoral researchers to present their work, discuss new methods, cutting-edge ideas and pre-published data, and build collaborative relationships with their peers. The event also included a career mentoring discussion panel designed to offer insights and guidance from experienced mentors on navigating the journey from academic training to professional roles in science. |
| Year(s) Of Engagement Activity | 2025 |
| URL | https://www.grc.org/plant-proteolysis-grs-conference/2025/ |
| Description | International Plant Proteostasis Conference, Vienna, 2024 |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Freddie Theodoulou gave an invited seminar at the International Plant Proteostasis Conference, 2024, entitled "Modelling and mapping protein abundance from multi-omic data". This raised awareness of the sLoLa project and sparked potential collaborations. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://www.oeaw.ac.at/gmi/news-events/events/international-plant-proteostasis-conference-2024#:~:te... |
| Description | KnetMiner Games for New Scientist Live |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Schools |
| Results and Impact | The KnetMiner team designed and delivered educational video games as part of Rothamsted Research's public outreach activities at the New Scientist Live event in London. The primary goal of these interactive games was to educate children about the application of artificial intelligence (AI) and knowledge graphs in biological research. To enhance engagement, we implemented a points-based reward system and integrated a leaderboard to foster a competitive yet enjoyable learning environment. The games were highly successful, attracting participation from over 500 children. Feedback collected from parents and teachers was consistently positive, highlighting the games' educational effectiveness and their capacity to stimulate interest in biology and AI. Although developing educational tools aimed at children is not our primary focus, this initiative provided valuable insights that will inform future improvements to KnetMiner's usability and user experience. |
| Year(s) Of Engagement Activity | 2024 |
| Description | KnetMiner booth at PAG-2025 in US |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Industry/Business |
| Results and Impact | The KnetMiner team participated actively in the Plant and Animal Genome (PAG) Conference held in San Diego, USA. The event provided an important platform to present our latest advancements in AI-assisted gene discovery. During the conference, we delivered a well-received talk highlighting recent developments in AI-powered discovery of gene-trait associations, demonstrating significant potential to accelerate genomic research. Our presence included a modern, interactive booth designed to effectively showcase the next-generation capabilities of the KnetMiner platform, including dynamic, user-friendly demonstrations which attracted substantial interest from attendees. The event facilitated valuable discussions with current collaborators, reinforcing ongoing partnerships, and provided opportunities to establish promising new connections with prominent scientists and representatives from breeding companies. These interactions have the potential to evolve into significant future collaborations, enhancing the application and reach of KnetMiner in both academic and industrial research contexts. Overall, participation at PAG reinforced KnetMiner's position at the forefront of bioinformatics innovation, underlining our commitment to enabling more efficient and impactful genomic research through AI and knowledge graph technologies. |
| Year(s) Of Engagement Activity | 2025 |
| URL | https://knetminer.com/news/pag-32 |
| Description | MemPanG24; Computational pangenomics course, conference, and biohackathon in Memphis, TN, exploring the cutting edge of pangenomes, biology, methods, software, and artificial intelligence (AI). |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Arun Isaac gave a talk entitled "GWAS on an Arabidopsis MAGIC pangenome" at this event. The event provided training and knowledge exchange for biologists and bioinformaticians interested in studying organisms with high genetic diversity or without a reference genome, as well as those involved in comparative genomics and the assembly of pangenomes for any species. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://pangenome.github.io/MemPanG24/ |
| Description | Poster for the US HUPO conference |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Yong-In Kim co-authored a poster: 'Deep proteome coverage in high throughput diaPASEF for complex Arabidopsis thaliana samples'. This was a collaboration with industry: Bruker, the company involved, used his data to promote applications of its new range of mass spectrometers. |
| Year(s) Of Engagement Activity | 2022 |
| Description | Presentation at Rothamsted Heritage Day |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | Regional |
| Primary Audience | Public/other audiences |
| Results and Impact | Postdoctoral researcher Dr Mark Bailey attended Heritage Week (10/09/23) to discuss the importance of fundamental plant biology research and of model organisms such as Arabidopsis thaliana. |
| Year(s) Of Engagement Activity | 2023 |
| Description | Presentation at the American Society for Plant Biology Annual Meeting, Portland, Oregon, USA |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | PDRA Yong-In Kim gave a talk about the development of robust high-throughput proteomics workflows. As well as helping to build his career network, it generated community interest in the methodology. |
| Year(s) Of Engagement Activity | 2022 |
| Description | Seminar at the conference "At the forefront of plant research 2023", Barcelona, Spain, organised by CRAG |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | On May 10th 2023 Richard Mott gave a talk "Toolkits for animal and crop improvement: Multiparental populations to dissect standing variation, and protocols to share proprietary genetic and phenotypic data". Abstract: We present two approaches to identify alleles relevant for improvement of animals and crops. These toolboxes represent the extremes of how we can leverage information from genetic and phenotypic data. Multiparental populations are descended from a selected set of founders, such that their chromosomes are random mosaics of the founders' chromosomes. They are particularly useful for crop breeding where genetically stable recombinant inbred lines may be bred and phenotyped across different environments, to map loci relevant to agronomic traits. We present data from a multiparental population of 500 lines descended from 16 UK wheat varieties selected to represent germplasm across the past 70 years, to show how these populations can be used to understand the genetic architecture of complex traits, and to recapitulate the impact of the Green Revolution. An alternative to breeding new populations is to utilise existing genetic and phenotypic data, by combining datasets across institutions, including proprietary data held by breeding companies. To do so we must overcome the challenge of sharing data in such a way that individual level information is hidden whilst the computations needed to discover relevant genetic loci are unimpeded. We present a protocol based on random orthogonal transformations which is suitable for all quantitative genetic analyses that employ Gaussian likelihoods, including linear mixed models and many Bayesian methods. We demonstrate its use with proprietary pig data. |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://www.cragenomica.es/events/forefront-plant-research-2023 |
