Large scale dynamic integration of proteomics and genomics to support next-generation rice research

Lead Research Organisation: University of Liverpool
Department Name: Institute of Integrative Biology


Rice can be considered the world's most important crop for human nutrition, currently providing ~20% of worldwide daily dietary energy. Due to a growing worldwide population and the effects of climate change, research into improvements in rice yield, resilience to drought and resistance to pathogens is urgently needed. Such research must be underpinned by public databases storing high-quality information about the rice genome, genetic variants carried by varieties with desirable traits, and information about the function of each gene/protein. This project is a collaboration between research teams based at the University of Liverpool, the Beijing Institute of Genomics and the BGI Education Centre. Our teams have a considerable track record in the development of methods for studying the abundance of gene transcripts (transcriptomics) and proteins (proteomics) on a large scale for rice and other species, as well as computational approaches for interpreting and integrating data from these different techniques.

At present, the public databases storing the rice genome and information known about gene/protein function are disconnected from the experimental data (transcriptomics/proteomics) being collected in laboratories all over the world. These experimental data can be used directly to improve the annotation of the genome, by showing how strongly particular genes or proteins are expressed under particular growth conditions or for a given rice variety (which gives clues as to functional importance). These data also show how genes or proteins in a given variety differ from the "reference" genome contained in the database. Our groups are developing software tools for integrating and analysing these data in new ways, so that when laboratories submit their data to a public repository, the data can be directly integrated and viewed alongside the genome, which is not possible at present. We will also generate and analyse new data sets for several important rice varieties, so that we can study how their gene and protein sequences differ from the reference genome.

Our results will help to improve the sequences and annotation of rice genes and proteins, and will be made easily available to all other rice researchers through the most widely accessed international public databases.

Planned Impact

The following direct impacts will result:

- Improved integration and visualisation of public omics data for rice. These data may be used by research teams in academia, as well as supporting rice breeding programmes.
- There is potential for the discovery of new genes/proteins or splice forms within the data sets collected for 10 rice varieties, which could in due course aid the search for better yield.

Indirect benefits:
- Improved rice databases could lead to the production of rice varieties with more desirable traits, with downstream effects on food security and economic benefits.
- Economic benefits through the re-analysis of existing data sets, generated at high expense, for a new purpose; potentially meaning that less money needs to be invested in curation or collection of new data.
- Improvements in the annotation of other crops through propagation of annotations and sharing of software tools.
Description We have identified a considerable number of genes in the rice genome whose annotation can be improved using proteomics data, including around 100 newly discovered genes that can now be added to the canonical gene models and around 700 further genes whose annotation we can refine by providing evidence for exon splicing. All the data are in the public domain, and the primary manuscript outlining the main findings has been published.
Exploitation Route We hope to improve the annotation of the rice genome, with benefits for all users of those resources.
Sectors Agriculture, Food and Drink

Description We have released a large amount of data into the public domain as stable "Track Hubs", which can be displayed easily as tracks on the Ensembl genome browser or other systems (UCSC) supporting this technology. The tracks show which rice genes have protein-level evidence, and also provide evidence supporting the refinement of ~700 genes and the addition of ~100 new genes not currently annotated. Rice genome databases are used in a wide range of fields, supporting research as well as gene editing and breeding efforts, and thus our data will have indirect impacts in various fields.
First Year Of Impact 2018
Sector Agriculture, Food and Drink
Impact Types Economic