BBSRC-NSF/BIO PanOryza: Globally coordinated genomes, proteomes and pathways for rice

Lead Research Organisation: European Bioinformatics Institute
Department Name: Genome Assembly and Annotation

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Technical Summary

This project will create the pan gene set for cultivated Asian rice (Oryza sativa) and its wild Oryza relatives, using newly developed software to produce consistent gene models across all varieties and species. At the outset, we will have access to 16 "platinum standard" reference sequence (PSRefSeq) genomes for Asian rice, which represent the genetic diversity of O. sativa, as well as 25 wild rice genomes. At present, gene models are annotated independently by different projects, leading to inconsistencies that cause confusion and create challenges in fundamental and applied rice research. We will release consistent, aligned gene models for all Oryza genus reference genomes through the internationally recognised platforms Ensembl Plants and Gramene. We will develop capabilities within the Planteome knowledgebase for annotating rice genes with functional information through semi-automated literature extraction, and provide a community platform for gene symbol assignment and manual gene model revisions. The PSRefSeqs will be fully aligned at the chromosomal level to define synteny, enabling users to view syntenic relationships between PSRefSeqs and their gene model sets. Genetic variant data from more than 3,000 re-sequenced rice accessions will be mapped onto the aligned PSRefSeqs, released via the European Variant Archive, and linked to trait data on the varieties in the SNP-Seek platform.

Consistently annotated coding gene products, i.e. proteomes, will be released via the world-leading protein knowledgebase UniProt: one proteome for each of the 16 O. sativa PSRefSeqs and 25 proteomes for wild rice. UniProt will add functional annotations, coordinated with Planteome, and will define the pan-proteome, i.e. the full set of proteins present across all varieties of O. sativa and at the genus level. Large-scale mass spectrometry data sets will be mined to provide protein-level evidence for coding genes, and to find and annotate sites of post-translational modification.
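As an illustration of what defining a pan gene set or pan-proteome involves, the minimal sketch below assumes that genes or proteins have already been grouped into clusters of equivalent entries, and that each genome is described simply by the set of cluster identifiers it contains. The function name, the cluster and variety labels, and the core/shell/private categories are hypothetical and chosen for illustration; they are not the project's software or terminology.

```python
from collections import defaultdict


def summarise_pan_set(clusters_by_genome):
    """Summarise a pan set from per-genome cluster membership.

    clusters_by_genome maps a genome name to the set of cluster ids
    (hypothetical labels) found in that genome.
    """
    n_genomes = len(clusters_by_genome)

    # Record which genomes each cluster occurs in.
    occupancy = defaultdict(set)
    for genome, clusters in clusters_by_genome.items():
        for cluster_id in clusters:
            occupancy[cluster_id].add(genome)

    pan = set(occupancy)  # union of all clusters seen in any genome
    core = {c for c, g in occupancy.items() if len(g) == n_genomes}
    private = {c for c, g in occupancy.items() if len(g) == 1}
    shell = pan - core - private  # present in some, but not all, genomes
    return {"pan": pan, "core": core, "shell": shell, "private": private}


# Toy input: three made-up varieties and five made-up clusters.
toy = {
    "variety_A": {"c1", "c2", "c3"},
    "variety_B": {"c1", "c2", "c4"},
    "variety_C": {"c1", "c3", "c5"},
}
summary = summarise_pan_set(toy)
print({name: sorted(members) for name, members in summary.items()})
```

In this toy input the pan set is the union of all five clusters, while only c1 is shared by every variety. Real pan-gene construction also has to decide which gene models are equivalent across genomes, which this sketch takes as given.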

Planned Impact

The PanOryza proposal encompasses and brings together the most widely used international databases for rice research and development. The planned objectives will improve data sharing between these resources, and greatly improve the quality of the data on rice genes, variants, proteins and post-translational modifications.

Beyond academic beneficiaries, the following groups will directly benefit:

- Seed companies and breeders will benefit through improved linkage of trait data (e.g. held in SNP-Seek) to improved gene models across the pan gene set, for allele mining.


There is potential for indirect benefits for breeders of other crops, via adoption of the software and approaches developed in PanOryza, which would improve capabilities for understanding the pan gene set within other species.

Staff will benefit through exposure to an international network of bioinformaticians working in a key area of food security.

Publications

Harrison P (2023) Ensembl 2024 in Nucleic Acids Research

 
Description We have imported the genome assemblies of 16 rice cultivars into Ensembl Rapid Release, with gene annotation provided by our collaborators at Gramene. The list of cultivars includes four pre-existing genomes plus twelve new platinum standard sequences. We are also in the process of adding these genomes to Ensembl Plants. We have developed the pipelines to import the data from Ensembl Rapid Release and prepared the UniProt data model, back-end and front-end in Proteome pages to integrate these data. We are in the process of developing the pan-gene clusters and identifying additional gene models to improve the existing rice reference gene annotation.
Exploitation Route Beyond academic beneficiaries, seed companies and breeders will benefit through improved linkage of trait data (e.g. held in SNP-Seek) to improved gene models across the pan gene set, for allele mining. There is also potential for indirect benefits for breeders of other crops, via adoption of the software and approaches developed in PanOryza, which would improve capabilities for understanding the pan gene set within other species.
Sectors Agriculture, Food and Drink; Digital/Communication/Information Technologies (including Software)

 
Title MAGIC 16 rice in Ensembl Rapid Release 
Description The assemblies and annotations generated by the project partners were imported into Ensembl Rapid Release, where users can browse them, run sequence searches and discover homologues.
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
Impact Preparatory step towards importing these genomes into Ensembl Plants
URL https://rapid.ensembl.org/