BBSRC-NSF/BIO PanOryza: Globally coordinated genomes, proteomes and pathways for rice
Lead Research Organisation:
European Bioinformatics Institute
Department Name: Genome Assembly and Annotation
Technical Summary
This project will create the pan-gene set for cultivated Asian rice (Oryza sativa) and its wild Oryza relatives, using newly developed software to produce consistent gene models across all varieties and species. At the outset, we will have access to 16 "platinum standard" reference sequence (PSRefSeq) genomes for Asian rice, which represent the genetic diversity of O. sativa, as well as 25 wild rice genomes. At present, gene models are being annotated independently by different projects, leading to inconsistencies that cause confusion and create challenges for fundamental and applied rice research. We will release consistent, aligned gene models for all Oryza genus reference genomes through the internationally recognised platforms Ensembl Plants and Gramene. We will develop capabilities within the Planteome knowledgebase for annotating rice genes with functional information through semi-automated literature extraction, and provide a community platform for gene symbol assignment and manual revision of gene models. The PSRefSeqs will be fully aligned at the chromosomal level to define synteny, enabling users to view syntenic relationships between the PSRefSeqs and their gene model sets. Genetic variant data from more than 3,000 re-sequenced rice accessions will be mapped onto the aligned PSRefSeqs, released via the European Variation Archive, and linked to trait data on the varieties in the SNP-Seek platform.
Consistently annotated coding gene products (i.e. proteomes) will be released via the world-leading protein knowledgebase UniProt, for all 16 O. sativa PSRefSeqs and for the 25 wild rice genomes. UniProt will add functional annotations, coordinated with Planteome, and will define the pan-proteome, i.e. the full set of proteins found across all varieties of O. sativa, as well as at the genus level. Large-scale mass spectrometry data sets will be mined to provide protein-level evidence for coding genes, and to find and annotate sites of post-translational modifications.
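The set definitions behind a pan-proteome can be illustrated with a minimal sketch, assuming each variety's proteome has already been reduced to a set of protein or pan-gene cluster identifiers (the variety names and cluster IDs below are hypothetical). The pan set is the union across varieties, the core set is the intersection, and the accessory set is what remains. This is not the method used by UniProt or the project; it only shows what "present across all varieties" means compared with "present in every variety".

```python
from typing import Dict, Set


def classify_clusters(presence: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Split clusters into pan, core and accessory sets.

    presence maps a (hypothetical) variety name to the set of cluster
    identifiers found in that variety's proteome.
    """
    if not presence:
        return {"pan": set(), "core": set(), "accessory": set()}
    sets = list(presence.values())
    pan = set().union(*sets)           # clusters seen in at least one variety
    core = set.intersection(*sets)     # clusters seen in every variety
    accessory = pan - core             # clusters missing from some varieties
    return {"pan": pan, "core": core, "accessory": accessory}


if __name__ == "__main__":
    # Toy presence/absence data; identifiers are illustrative only.
    example = {
        "variety_A": {"c1", "c2", "c3"},
        "variety_B": {"c1", "c2", "c4"},
        "variety_C": {"c1", "c3", "c4", "c5"},
    }
    result = classify_clusters(example)
    for name in ("pan", "core", "accessory"):
        print(name, sorted(result[name]))
```

With the toy data, only `c1` is core, while `c2` to `c5` are accessory: each is absent from at least one variety, which is the kind of presence-absence variation a pan-gene or pan-proteome set is meant to capture.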
Planned Impact
The PanOryza proposal encompasses and brings together the most widely used international databases for rice research and development. The planned objectives will improve data sharing between these resources, and greatly improve the quality of the data on rice genes, variants, proteins and post-translational modifications.
Beyond academic beneficiaries, the following groups will directly benefit:
- Seed companies and breeders will benefit through improved linkage of trait data (e.g. held in SNP-Seek) to improved gene models across the pan-gene set, supporting allele mining.
There is also potential for indirect benefits to breeders of other crops, through adoption of the software and approaches developed in PanOryza to better understand the pan-gene sets of other species.
Staff will benefit through exposure to an international network of bioinformaticians working in a key area of food security.
Publications
- Contreras-Moreira B (2023). Calling pangenes from plant genome alignments confirms presence-absence variation.
- Contreras-Moreira B (2025). A pan-gene catalogue of Asian cultivated rice. In bioRxiv: the preprint server for biology.
- Contreras-Moreira B (2023). GET_PANGENES: calling pangenes from plant genome alignments confirms presence-absence variation. In Genome Biology.
- Dyer S (2025). Ensembl 2025. In Nucleic Acids Research.
- Harrison PW (2024). Ensembl 2024. In Nucleic Acids Research.
- Omenn G (2024). The 2024 Report on the Human Proteome from the HUPO Human Proteome Project. In Journal of Proteome Research.
| Description | We have imported the genome assemblies of 16 rice cultivars into Ensembl Plants, with gene annotation provided by our collaborators at Gramene. The cultivars comprise four pre-existing genomes plus twelve new platinum standard sequences. We have developed pipelines to import the data from Ensembl Rapid Release, and have prepared the UniProt data model and the back-end and front-end of the Proteomes pages to integrate these data. We have finalised version 1 of the pan-gene clusters and added their identifiers as gene synonyms in Ensembl. |
| Exploitation Route | Beyond academic beneficiaries, seed companies and breeders will benefit through improved linkage of trait data (e.g. held in SNP-Seek) to improved gene models across the pan-gene set, supporting allele mining. There is also potential for indirect benefits to breeders of other crops, through adoption of the software and approaches developed in PanOryza to better understand the pan-gene sets of other species. |
| Sectors | Agriculture, Food and Drink; Digital/Communication/Information Technologies (including Software) |
| Description | The genomes made available via Ensembl Plants are downloaded by several breeding companies for internal use, and are included in community tools such as FAIDARE. |
| First Year Of Impact | 2023 |
| Sector | Agriculture, Food and Drink |
| Title | Ensembl Beta - MAGIC 15 with pan-gene identifiers |
| Description | 13 of the MAGIC 16 rice genomes were added to Ensembl Beta, with pan-gene identifiers included as gene synonyms. |
| Type Of Material | Database/Collection of data |
| Year Produced | 2025 |
| Provided To Others? | Yes |
| Impact | None yet; the identifiers are provided to support the accompanying paper, which is under review. |
| URL | https://beta.ensembl.org |
| Title | MAGIC 15 rice in Ensembl Plants |
| Description | The assemblies and annotations generated by the project partners were imported into Ensembl Plants, where they are available for users to browse; outputs of comparative genomics analyses are provided across all rice cultivars and their wild relatives. |
| Type Of Material | Database/Collection of data |
| Year Produced | 2023 |
| Provided To Others? | Yes |
| Impact | The comparative analyses have provided an important QC step in the generation of pan-gene clusters |
| URL | https://plants.ensembl.org/Oryza_sativa/Info/Cultivars?db=core |
| Title | MAGIC 16 rice in Ensembl Rapid Release |
| Description | The assemblies and annotations generated by the project partners were imported into Ensembl Rapid Release, where users can browse them, run sequence searches and discover homologues. |
| Type Of Material | Database/Collection of data |
| Year Produced | 2023 |
| Provided To Others? | Yes |
| Impact | A preparatory step for importing these genomes into Ensembl Plants |
| URL | https://rapid.ensembl.org/ |
| Description | Barley Genome Net 2025 - Dundee |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Professional Practitioners |
| Results and Impact | The methods developed as part of PanOryza were presented to members of the barley research community |
| Year(s) Of Engagement Activity | 2025 |
| Description | Monogram 2023 |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Postgraduate students |
| Results and Impact | The integration of the rice pan-genome into Ensembl Plants was presented at Monogram 2023 to raise awareness of the new data and functionality among the small-grain cereals research community. |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://research.reading.ac.uk/monogram-2023/ |
| Description | PAG rice 2024 |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | The PanOryza project was presented at the IRIC workshop "Rice Informatics for the Global Community" of the Plant and Animal Genome (PAG) conference, leading to useful interactions with the audience. |
| Year(s) Of Engagement Activity | 2024 |
