BBSRC-NSF/BIO PanOryza: Globally coordinated genomes, proteomes and pathways for rice

Lead Research Organisation: European Bioinformatics Institute

Department Name: Genome Assembly and Annotation


Technical Summary

This project will create the pan gene set for cultivated Asian rice (Oryza sativa) and its wild Oryza relatives, using newly developed software to produce consistent gene models across all varieties and species. At the outset, we will have access to 16 "platinum standard" reference sequence (PSRefSeq) genomes for Asian rice that represent the genetic diversity of O. sativa, as well as 25 wild rice genomes. At present, gene models are being annotated independently by different projects, leading to inconsistencies that cause confusion and create challenges for fundamental and applied rice research. We will release consistent, aligned gene models for all Oryza genus reference genomes through the internationally recognised platforms Ensembl Plants and Gramene. We will develop capabilities within the Planteome knowledgebase to annotate rice genes with functional information through semi-automated literature extraction, and will provide a community platform for gene symbol assignment and manual revision of gene models. The PSRefSeqs will be fully aligned at the chromosomal level to define synteny and to enable users to view syntenic relationships between the PSRefSeqs and their gene model sets. Genetic variant data from more than 3,000 re-sequenced rice accessions will be mapped onto the aligned PSRefSeqs, released via the European Variation Archive, and linked to trait data for the varieties in the SNP-Seek platform.

Consistently annotated coding gene products, i.e. proteomes, will be released via the world-leading protein knowledgebase UniProt for all 16 O. sativa PSRefSeqs and for 25 wild rice genomes. UniProt will add functional annotations, coordinated with Planteome, and will define the pan-proteome, i.e. the full set of proteins found across all varieties of O. sativa, as well as at the genus level. Large-scale mass spectrometry data sets will be mined to provide protein-level evidence for coding genes, and to find and annotate sites of post-translational modifications.

Planned Impact

The PanOryza proposal encompasses and brings together the most widely used international databases for rice research and development. The planned objectives will improve data sharing between these resources, and greatly improve the quality of the data on rice genes, variants, proteins and post-translational modifications.

Beyond academic beneficiaries, the following groups will directly benefit:

- Seed companies and breeders will benefit from improved linkage of trait data (e.g. held in SNP-Seek) to improved gene models across the pan gene set, supporting allele mining.

There is potential for indirect benefits to breeders of other crops, through adoption of the software and approaches developed in PanOryza to better understand the pan gene sets of other species.

Staff will benefit from exposure to an international network of bioinformaticians working in a key area of food security.

Funded Value:

£314,528

Funded Period:

Dec 2020 - Nov 2024

Funder:

BBSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

BB/T015608/1

Principal Investigator:

Sarah Dyer

Paul Flicek

Research Subject:

Tools, technologies & methods (100%)

Research Topic:

Bioinformatics (100%)

Organisations

European Bioinformatics Institute (Lead Research Organisation)

People	ORCID iD
Sarah Dyer (Principal Investigator)
Paul Flicek (Principal Investigator)
Maria J. Martin (Co-Investigator)	http://orcid.org/0000-0001-5454-2815
Bruno Contreras-Moreira (Researcher Co-Investigator)

Publications


Contreras-Moreira B (2023) Calling pangenes from plant genome alignments confirms presence-absence variation.

Contreras-Moreira B (2025) A pan-gene catalogue of Asian cultivated rice. in bioRxiv: the preprint server for biology

Contreras-Moreira B (2023) GET_PANGENES: calling pangenes from plant genome alignments confirms presence-absence variation. in Genome Biology

Dyer S (2025) Ensembl 2025. in Nucleic Acids Research

Harrison PW (2024) Ensembl 2024. in Nucleic Acids Research

Omenn G (2024) The 2024 Report on the Human Proteome from the HUPO Human Proteome Project. in Journal of Proteome Research

Key Findings
Impact Summary
Research Databases and Models
Engagement Activities


Description	We have imported the genome assemblies of 16 rice cultivars into Ensembl Plants, with gene annotation provided by our collaborators at Gramene. The list of cultivars includes four pre-existing genomes plus twelve new platinum standard sequences. We have developed pipelines to import the data from Ensembl Rapid Release, and have prepared the UniProt data model, back end and front end of the Proteome pages to integrate these data. We have finalised version 1 of the pan-gene clusters and added their identifiers as gene synonyms in Ensembl.
Exploitation Route	Beyond academic beneficiaries, seed companies and breeders will benefit from improved linkage of trait data (e.g. held in SNP-Seek) to improved gene models across the pan gene set, supporting allele mining. There is also potential for indirect benefits to breeders of other crops, through adoption of the software and approaches developed in PanOryza to better understand the pan gene sets of other species.
Sectors	Agriculture, Food and Drink; Digital/Communication/Information Technologies (including Software)


Description	The genomes made available via Ensembl Plants are downloaded by several breeding companies for internal use, and are also accessed through community tools that include our data, e.g. FAIDARE.
First Year Of Impact	2023
Sector	Agriculture, Food and Drink


Title	Ensembl Beta - MAGIC 15 with pan-gene identifiers
Description	Thirteen of the MAGIC 16 rice genomes were added to Ensembl Beta, with pan-gene identifiers included as gene synonyms.
Type Of Material	Database/Collection of data
Year Produced	2025
Provided To Others?	Yes
Impact	None yet; the identifiers are provided to support the accompanying paper, which is under review.
URL	https://beta.ensembl.org


Title	MAGIC 15 rice in Ensembl Plants
Description	The assemblies and annotations generated by the project partners were imported into Ensembl Plants where they are available for users to browse, and the outputs of comparative genomics analyses are provided across all rice cultivars plus wild relatives.
Type Of Material	Database/Collection of data
Year Produced	2023
Provided To Others?	Yes
Impact	The comparative analyses have provided an important QC step in the generation of pan-gene clusters
URL	https://plants.ensembl.org/Oryza_sativa/Info/Cultivars?db=core


Title	MAGIC 16 rice in Ensembl Rapid Release
Description	The assemblies and annotations generated by the project partners were imported into Ensembl Rapid Release, where users can browse them, perform sequence searches and discover homologues.
Type Of Material	Database/Collection of data
Year Produced	2023
Provided To Others?	Yes
Impact	A preparatory step towards importing these genomes into Ensembl Plants
URL	https://rapid.ensembl.org/


Description	Barley Genome Net 2025 - Dundee
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Professional Practitioners
Results and Impact	The methods developed as part of PanOryza were presented to members of the barley research community
Year(s) Of Engagement Activity	2025


Description	Monogram 2023
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Postgraduate students
Results and Impact	The integration of the rice pan-genome into Ensembl Plants was presented at Monogram 2023 to raise awareness of the new data and functionality among the small-grains research community.
Year(s) Of Engagement Activity	2023
URL	https://research.reading.ac.uk/monogram-2023/


Description	PAG rice 2024
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	The PanOryza project was presented at the IRIC workshop "Rice Informatics for the Global Community" at the Plant and Animal Genome (PAG) conference. There were useful interactions with the audience.
Year(s) Of Engagement Activity	2024