COpenPlantOmics (COPO): a Collaborative Bioinformatics Plant Science Platform

Lead Research Organisation: European Bioinformatics Institute

Department Name: Ensembl Genomes

Technical Summary

Access to biological data has been hindered by a lack of standards, a lack of awareness of the benefits of, and pathways to, releasing data described by those standards, and a lack of services through which data can be easily analysed, published and retrieved. Recently, the BBSRC has made a large commitment to open access data and publishing to further bioscience research in the UK. However, barriers still prevent scientists from openly depositing their data and metadata, in particular a lack of interoperability between metadata annotation services, data repositories, data analysis platforms and data publishing platforms. As a result, plant scientists might not be aware that the services exist, have the expertise to use them, or see the value in properly describing their data.
This project aims to build COPO, the software infrastructure required to reach the level of interoperability that plant researchers need to describe their data using community-recognised ontologies, move data seamlessly in both directions to and from relevant repositories, and then publish these data for open access. COPO will manage the hardware infrastructure at TGAC to deliver a consistent, robust staging area and database that will support unique accessioned artefacts representing the corpus of data and metadata a user wants to expose. The resulting marked-up datasets processed and published using COPO will allow greater potential for integrative analysis using existing tools such as iPlant and Galaxy.
New Application Programming Interfaces (APIs) will interconnect existing tools and services, and by developing new RESTful user interfaces that wrap up these APIs, COPO will be a single point-of-entry for plant researchers to disseminate their data all the way from generation to publication. By federating the TGAC iRODS data grid system with others, e.g. Texas Advanced Computing Center's iPlant installation, access to worldwide analytical infrastructure and data will be facilitated.

Planned Impact

Impact Summary
Academic, Economic and Commercial Impacts
With the renewed interest and push from all areas of bioscience to promote publicly available research, the COPO project will be a pioneering national and international effort to facilitate sharing of all aspects of plant research with the public. COPO aims to be the vehicle that brings together the tools required to harmonise open plant omics research. This sector has obvious ties with industry. Public domain omics-based bioscience is a relevant and important input into industry's internal research and discovery activities. To make such bioscience data truly reusable and ensure scientific robustness, it must be uniformly annotated. Uniform annotation allows integration through equivalence of terminology, increases efficiency in data production and re-use, and allows correct interpretation by means of the context provided by the metadata. A collaborative platform for frictionless bioinformatics built with and for the academic and industrial community is long overdue. Alongside data processing, industry also works on finding solutions for integration and management of large 'omics data sets, e.g. efforts like the Pistoia Alliance. Together with COPO industry partners (Eagle Genomics) we will develop use-cases for the platform in industry, propose acceptance criteria required for commercial use, supply technical advice and support on meeting those acceptance criteria, evaluate the platform on third-party infrastructure, and maximise knowledge exchange and commercialisation.

COPO and the standards community
Expertise and knowledge gained throughout the lifetime of the project and beyond will be disseminated through a variety of channels. The presence of a direct link with the plant science community (through GARNet, UK Plant Sciences Federation (UKPSF)) is key to the success and adoption of the platform and associated standards. The project will have a continuous dialogue, through face-to-face events as well as online tools and social media, between those working on the platform and the plant bioscience community. The several letters of support show a clear interest in working together, using and adopting a platform that implicitly confers standards compliance. COPO will provide a solution to overcome the challenges in standards fragmentation by (i) fostering development, acceptance and implementation of reporting standards that are immediately suitable for plant research, and (ii) limiting the range and variability of standards. This will have a direct impact on the development and maintenance costs for commercial and academic software developers of standards-compliant products.

Societal impacts
Historically there has been reluctance to adopt some of the standards and open-data principles in the plant bioscience community, especially in the field of food sustainability and security, so openness and transparency in these areas are vital to continue improving public perception. The presentation of the research data will play a key role in opening the dialogue with the general public and will contribute to the development of stronger links with sectors of society (such as school teachers) that are less familiar with the scientific activities in plant research and the beneficial impact this has on their lives. It is widely recognised that the shortage of expertise and skill in biomathematics and informatics across the world is a major risk to the future development of key areas in life sciences. The objectives of this proposal will help to attract talented staff to work with the COPO partners, and offer alternative career paths.

Funded Value:

£85,976

Funded Period:

Oct 14 - Oct 18

Funder:

BBSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

BB/L024071/1

Principal Investigator:

Paul Flicek

Paul Kersey

Research Subject:

Tools, technologies & methods (99%)

Research Topic:

eScience (99%)

Organisations

European Bioinformatics Institute (Lead Research Organisation)

People	ORCID iD
Paul Flicek (Principal Investigator)
Paul Kersey (Principal Investigator)	http://orcid.org/0000-0002-7054-800X

Publications

Arnaud E (2020) The Ontologies Community of Practice: A CGIAR Initiative for Big Data in Agrifood Systems. in Patterns

Johnson D (2021) ISA API: An open platform for interoperable life science experimental metadata. in GigaScience

Kersey PJ (2018) Ensembl Genomes 2018: an integrated omics infrastructure for non-vertebrate species. in Nucleic acids research

Shaw F (2020) COPO: a metadata platform for brokering FAIR data in the life sciences. in F1000Research

Key Findings
Engagement Activities


Description	We have gathered substantial information from the community about the relevant metadata related to their experiments; about data standards in use for diverse experimental data being generated by various research communities; and about how these concepts map onto the metadata collected by the major database archives. We have used this information to find generic common factors and to develop data submission and validation tools to ease the capture and archiving of plant omics data. These tools have been pre-released at https://copo-project.org ; the code and documentation can be found at https://github.com/collaborative-open-plant-omics. At the end of 2019 COPO was feeding archives such as ENA, figshare, DSpace, CKAN and Dataverse, with nearly 50 institutional users and a total volume of 10 TB of brokered data. We have also worked with the community to formalise the standards for plant-related metadata (MIAPPE) and crop ontologies, which are being integrated into the submission tool. We have completed an indexing project to automatically search for EBI plant samples, find their associated data files (across archives such as ENA, EVA and ArrayExpress) and output them in JSON format at ftp://ftp.ensemblgenomes.org/pub/misc_data/plant_index . The code has recently been updated to cope with changes to the ENA API and can be found at https://github.com/EnsemblGenomes/ebi_plant_index . The indexed data is now regularly imported into INRAE's Genetic and Genomic Information System at https://urgi.versailles.inra.fr/faidare , which allows users to search for germplasm and plant phenotype experiments across several plant breeding institutes.
Exploitation Route	The generic tools in development as COPO can be configured to meet the needs of other research communities, allowing a single technological solution to be deployed in any domain and customised for complex experiments that generate multiple data types, are held in different persistent archives and are subject to different formalised standards. The MIAPPE standards implemented in COPO have potential application in any other software or database handling the same data types.
Sectors	Agriculture, Food and Drink, Digital/Communication/Information Technologies (including Software)


Description	ENA Facilities Day 2019
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Other audiences
Results and Impact	Reviewed current data exchanges between Ensembl Plants, the European Nucleotide Archive (ENA) and ArrayExpress, and discussed problems plant community members face when submitting new genomic data to archives such as the ENA. The primary audience was teams involved in submissions of biological sequences. The most important impact was to raise awareness of the challenges of large plant genomes such as wheat and barley, which require different cut-offs.
Year(s) Of Engagement Activity	2019
URL	https://www.ebi.ac.uk/ena/support/facilities-day


Description	EU-China expert seminar on identifying potential joint priorities for research and innovation in food, agriculture and biotechnology
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Policymakers/politicians
Results and Impact	I participated in an EU-China expert seminar on identifying potential joint priorities for research and innovation in food, agriculture and biotechnology, designed to identify future priorities for joint funding schemes based on the direction of current research.
Year(s) Of Engagement Activity	2016


Description	Elixir meeting attendance
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Undergraduate students
Results and Impact	Introduced the EBI plant sample indexing proposal to other Elixir plant nodes (breakout meeting with slides). First connection made with the Italian node and their variation study on common apple cultivars.
Year(s) Of Engagement Activity	2018
URL	https://www.elixir-europe.org/events/elixir-all-hands-2018


Description	Marc Rossello attended "PhenoHarmonis Phenotyping Workshop"
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Meetings with Elixir plant nodes and CGIAR community about MIAPPE usage and scope
Year(s) Of Engagement Activity	2018
URL	https://bit.ly/2TnKXnL


Description	Participation in meeting on Plant genetic resources and SDGs: needs rights and opportunities
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Policymakers/politicians
Results and Impact	The sharing of biological data related to plant genetic resources, and ensuring that the benefits from this sharing are equitably distributed throughout the world, are a matter of important societal concern. A meeting of interested parties was convened to advise the DivSeek organisation, which had been asked to prepare a position paper for the secretariat of the International Treaty on Plant Genetic Resources on behalf of a number of organisations involved in the generation, management and usage of such data. Publications aimed at other audiences are also expected to result from this meeting.
Year(s) Of Engagement Activity	2016
URL	http://www.divseek.org/news/


Description	Presentation at the Conference "The Future of Science: The Digital Revolution: What is changing for humankind"
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Undergraduate students
Results and Impact	A presentation at a conference attended mostly by undergraduate and high-school students, focused on far-reaching changes in scientific practice.
Year(s) Of Engagement Activity	2016
URL	http://www.futureofscience.org/press/first-world-conference-on-the-future-of-science-science-and-soc...