BBSRC-NSF/BIO PanOryza: Globally coordinated genomes, proteomes and pathways for rice

Lead Research Organisation: University of Liverpool
Department Name: Institute of Integrative Biology


The most serious issue facing the global human population is the 10 billion person question: can the world produce enough high-quality, nutritious food to feed a projected 10 billion people by 2050, against a backdrop of climate change that will make fertile land scarcer and weather more unpredictable (droughts, floods, temperature spikes)? Rice will certainly play a central role in feeding much of the globe. At present, rice is the main source of daily calories for about 50% of the human population. The genus as a whole, including the two main branches of domesticated rice (Oryza sativa Japonica and Indica) and a huge variety of wild relatives, can grow under an exceptionally wide range of environments. Research teams around the world have been working closely with breeders to identify desirable traits in local landraces (i.e. locally adapted varieties) and their wild relatives, such as tolerance of periodic drought, flooding or high temperatures, and to perform crosses that transfer the underlying genes into high-yielding, widely grown varieties.

Genomic science is now central to research and development efforts. The genomes of one reference variety each of Japonica and Indica were sequenced around 15 years ago. However, the genome sequence is only the starting point. To be useful for research, the data require a process called genome annotation: finding and defining the genes within the genome, and working out the functional roles of the products they encode. Annotation usually involves several different software packages, which have high error rates, followed by years of manual work to fix errors and improve gene models. Rice gene annotation has unfortunately suffered from a lack of international coordination, leading to several independent efforts that annotated rice genes using different methods, and whose results persist in different databases today. Genomes for varieties representing a wider pool of landraces and wild rice species are only now coming online, so without a major international coordination effort this problem will rapidly get worse, and plant scientists and breeders will find it very difficult to interpret and compare information collected from different rice varieties.

This project aims to solve this issue by bringing together six international partners with a shared goal of making the rice gene set consistent across all varieties. We will build new software and data-sharing protocols that will enable us to define which genes (and the proteins they encode) are present across the genomes of all rice types, known as the pan-genome or pan gene set. The outcomes of our project will "future-proof" rice genomic resources, so that researchers can focus on understanding the biology of rice and searching for desirable traits across the genetic diversity of cultivated Asian rice.

Technical Summary

This project will create the pan gene set for cultivated Asian rice (Oryza sativa) and its wild Oryza relatives, using newly developed software to produce consistent gene models across all varieties and species. At the outset, we will have access to 16 "platinum standard" reference sequence (PSRefSeq) genomes for Asian rice, representing the genetic diversity of O. sativa, as well as 25 wild rice genomes. At present, gene models are annotated independently by different projects, leading to inconsistencies that cause confusion and hinder both fundamental and applied rice research. We will release consistent, aligned gene models for all Oryza genus reference genomes through the internationally recognised platforms Ensembl Plants and Gramene. We will develop capabilities within the Planteome knowledgebase for annotating rice genes with functional information through semi-automated literature extraction, and provide a community platform for gene symbol assignment and manual gene model revision. The PSRefSeqs will be fully aligned at the chromosomal level to define synteny, enabling users to view syntenic relationships between the PSRefSeqs and their gene model sets. Genetic variant data from more than 3,000 re-sequenced rice accessions will be mapped onto the aligned PSRefSeqs, released via the European Variation Archive, and linked to trait data for those varieties in the SNP-Seek platform.

Consistently annotated coding gene products (i.e. proteomes) will be released via the world-leading protein knowledgebase UniProt for all 16 O. sativa PSRefSeqs and for the 25 wild rice genomes. UniProt will add functional annotations, coordinated with Planteome, and will define the pan-proteome, i.e. the full set of proteins found across all varieties of O. sativa, and across the genus as a whole. Large-scale mass spectrometry data sets will be mined to provide protein-level evidence for coding genes, and to find and annotate sites of post-translational modification.
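To illustrate the difference between a pan set (everything found in at least one variety) and the subset shared by every variety, the following is a minimal sketch. It assumes each variety's genes have already been grouped into shared gene-family identifiers; the variety names and family IDs are hypothetical placeholders, not PanOryza data or software.

```python
from collections import Counter

# Hypothetical presence data: variety name -> set of gene-family IDs.
# Names and IDs are illustrative placeholders only.
gene_families = {
    "variety_A": {"fam1", "fam2", "fam3"},
    "variety_B": {"fam1", "fam2", "fam4"},
    "variety_C": {"fam1", "fam3", "fam4", "fam5"},
}

def pan_set(families_by_variety):
    """Union: every family found in at least one variety."""
    return set().union(*families_by_variety.values())

def shared_set(families_by_variety):
    """Intersection: families found in every variety."""
    sets = list(families_by_variety.values())
    return set.intersection(*sets) if sets else set()

def occupancy(families_by_variety):
    """Count how many varieties each family occurs in."""
    return Counter(f for fams in families_by_variety.values() for f in fams)

print(sorted(pan_set(gene_families)))     # ['fam1', 'fam2', 'fam3', 'fam4', 'fam5']
print(sorted(shared_set(gene_families)))  # ['fam1']
```

The hard part in practice is deciding which genes in different genomes count as "the same" family; this sketch takes that grouping as given.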

Planned Impact

The PanOryza proposal encompasses and brings together the most widely used international databases for rice research and development. The planned objectives will improve data sharing between these resources, and greatly improve the quality of the data on rice genes, variants, proteins and post-translational modifications.

Beyond academic beneficiaries, the following groups will directly benefit:

- Seed companies and breeders will benefit from better linkage of trait data (e.g. held in SNP-Seek) to improved gene models across the pan gene set, supporting allele mining.

Breeders of other crops may benefit indirectly if the software and approaches developed in PanOryza are adopted to improve understanding of pan gene sets in other species.

Project staff will benefit through exposure to an international network of bioinformaticians working in a key area of food security.
