COpenPlantOmics (COPO): a Collaborative Bioinformatics Plant Science Platform

Lead Research Organisation: University of York
Department Name: Biology


We live in a digital age where we increasingly rely on interconnected resources in our daily lives. Biological science, because of the complexity and breadth of research pursued worldwide, is typically fragmented. Even though scientific information is published in peer-reviewed articles, it is often badly described and, until very recently, was often unavailable to the general public because of journal licensing issues and expensive subscription costs.

The field of bioinformatics (the analysis and management of biological data using computational methods) produces many freely available tools for data analysis and exposure that are incredibly useful to researchers. However, these tools often do not interoperate well, meaning that great effort is spent converting or tweaking datasets to fit other tools further along a bioinformatics process, which hinders timely, accurate and reusable research. Couple this with the lack of descriptive information noted earlier, and knowledge that can be vital to one researcher, team or community can become at best irreproducible (preventing others from confirming findings) and at worst unusable.

Life scientists are people focused on investigating biological processes. This requires a lot of time, effort and fastidiousness in experimental observation, data collection and analysis. Typically, life scientists spend more of their time defining and publishing experimental methods and results, while the data behind these results is usually badly defined and largely unpublished. For computer scientists, the story is reversed - the focus is on getting to the data. This platform will bridge the gap between these two groups by providing tools and training to both life and computer scientists in the plant bioscience field, in order to help them get their data into the right formats and described uniformly for open research.

To do this, the management, interoperability and curation of scientific datasets are key. Researchers need clear guidance and help to:

- Manage their data in a concise, relevant way that allows immediate reuse by others: Generating data is only one part of the picture. To back up scientific findings, data needs to be made available to others to allow the same degree of rigour and peer review that is enforced for published material. This is not an easy task because the tools and resources required to describe data well and to make data available are typically designed for the computer scientist.
- Analyse their data easily: Large software development projects like Galaxy provide access to complex analytical tools - we are not aiming to reinvent the wheel in this regard. We aim to engage and collaborate with these existing providers to develop and exploit interfaces to these specialised software projects, so that descriptive tools and analytical tools can communicate efficiently.

This project will address these issues directly, providing tools for storing, annotating and sharing valuable information as well as clear guidance and training. Overall this promises to be a major boost to UK plant sciences research.

This project aims to promote and build links between scientific knowledge and the tools used to generate that knowledge, addressing the lack of descriptive information about underlying data. By doing so, we will provide a platform comprising both existing tools and novel interoperability processes, allowing researchers easy access to methods of describing their work, feeding directly into analytical software, thus promoting clear and robust best practices in science.

Open science is vital to future generations of researchers, especially to realise the goals of transparent knowledge sharing. This project will remove the barriers that restrict researchers from making their findings freely available to everyone in a consolidated, seamless, easy-to-use fashion.

Technical Summary

Access to biological data has been hindered by a lack of standards, a lack of awareness of the benefits of, and pathways to, releasing data described by those standards, and a lack of services through which data can be analysed, published and retrieved easily. Recently, there has been a large commitment by the BBSRC to push for open access data and publishing to further bioscience research in the UK. However, barriers still exist that prevent scientists from openly depositing their data and metadata, chiefly a lack of interoperability between metadata annotation services, data repositories, data analysis platforms and data publishing platforms. As such, plant scientists might not: be aware that the services exist; have the expertise to use them; see the value in properly describing their data.
This project aims to build COPO, the software infrastructure required to reach the level of interoperability that plant researchers need to describe their data using community-recognised ontologies, move data seamlessly in both directions to and from relevant repositories, and then publish these data for open access. COPO will manage the hardware infrastructure at TGAC to deliver a consistent, robust staging area and database that will support unique accessioned artefacts representing the corpus of data and metadata a user wants to expose. The resulting marked-up datasets processed and published using COPO will allow greater potential for integrative analysis using existing tools such as iPlant and Galaxy.
New Application Programming Interfaces (APIs) will interconnect existing tools and services, and by developing new RESTful user interfaces that wrap up these APIs, COPO will be a single point-of-entry for plant researchers to disseminate their data all the way from generation to publication. By federating the TGAC iRODS data grid system with others, e.g. Texas Advanced Computing Center's iPlant installation, access to worldwide analytical infrastructure and data will be facilitated.
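
As an illustration of what describing data with community-recognised ontologies and issuing accessioned artefacts could look like in practice, the following is a minimal Python sketch. It is not COPO's implementation: the class names, the required attributes, the accession format and the identifier check are all assumptions made for this example, and the ontology term identifiers are shown for illustration only and should be checked against the ontologies themselves.

```python
import re
import uuid
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Compact ontology identifiers look like "PREFIX:digits" (hypothetical check;
# a real service would resolve the term against the ontology itself).
TERM_ID_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9_]*:\d+$")


@dataclass
class Annotation:
    attribute: str  # what is being described, e.g. "organism"
    label: str      # human-readable value, e.g. "Arabidopsis thaliana"
    term_id: str    # ontology term identifier backing the label


@dataclass
class DatasetRecord:
    title: str
    data_file: str
    annotations: List[Annotation] = field(default_factory=list)
    accession: Optional[str] = None  # set only once the metadata is valid


def validate(record: DatasetRecord, required: Set[str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    present = {a.attribute for a in record.annotations}
    for attribute in sorted(required - present):
        problems.append(f"missing annotation: {attribute}")
    for a in record.annotations:
        if not TERM_ID_PATTERN.match(a.term_id):
            problems.append(f"not an ontology term id: {a.term_id!r}")
    return problems


def assign_accession(record: DatasetRecord, required: Set[str]) -> str:
    """Give a valid record a unique identifier (format is hypothetical)."""
    problems = validate(record, required)
    if problems:
        raise ValueError("; ".join(problems))
    record.accession = "COPO-EXAMPLE-" + uuid.uuid4().hex[:8].upper()
    return record.accession


if __name__ == "__main__":
    required_attributes = {"organism", "organism part"}
    record = DatasetRecord(title="Leaf RNA-seq", data_file="reads.fastq.gz")
    record.annotations.append(
        Annotation("organism", "Arabidopsis thaliana", "NCBITaxon:3702")
    )
    print(validate(record, required_attributes))  # reports the missing attribute
    record.annotations.append(Annotation("organism part", "leaf", "PO:0025034"))
    print(assign_accession(record, required_attributes))
```

The point of the sketch is the ordering: metadata is checked against a shared vocabulary before an identifier is issued, so that every accessioned artefact carries annotations that downstream repositories and analysis tools can interpret consistently.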

Planned Impact

Academic, Economic and Commercial Impacts
With the renewed interest and push from all areas of bioscience to promote publicly available research, the COPO project will be a pioneering national and international effort to facilitate sharing of all aspects of plant research with the public. COPO aims to be the vehicle to bring together the tools required to harmonise open plant omics research. This sector has obvious ties with industry. Public domain omics-based bioscience is a relevant and important input into industry's internal research and discovery activities. To make such bioscience data truly reusable and ensure scientific robustness, it must be uniformly annotated. Uniform annotation allows integration through equivalence of terminology, increases efficiency in data production and re-use, and allows correct interpretation by means of the context provided by the metadata. A collaborative platform for frictionless bioinformatics built with and for the academic and industrial community is long overdue. Alongside data processing, industry also works on finding solutions for integration and management of large 'omics data sets, e.g. efforts like the Pistoia Alliance. Together with COPO industry partners (Eagle Genomics) we will develop use-cases for the platform in industry, propose acceptance criteria required for commercial use, supply technical advice/support on meeting acceptance criteria, evaluate the platform on 3rd party infrastructure, and maximise knowledge exchange and commercialisation.

COPO and the standards community
Expertise and knowledge gained throughout the lifetime of the project and beyond will be disseminated through a variety of channels. The presence of a direct link with the plant science community (through GARNet, UK Plant Sciences Federation (UKPSF)) is key to the success and adoption of the platform and associated standards. The project will have a continuous dialogue, through face-to-face events as well as online tools and social media, between those working on the platform and the plant bioscience community. The several letters of support show a clear interest in working together, using and adopting a platform that implicitly confers standards compliance. COPO will provide a solution to overcome the challenges in standards fragmentation by (i) fostering development, acceptance and implementation of reporting standards that are immediately suitable for plant research, and (ii) limiting the range and variability of standards. This will have a direct impact on the development and maintenance costs for commercial and academic software developers of standards-compliant products.

Societal impacts
Historically there has been reluctance to adopt some of the standards and open-data principles in the plant bioscience community, especially in the field of food sustainability and security, so openness and transparency in these areas are vital to continue improving public perception. The presentation of the research data will play a key role in opening the dialogue with the general public and will contribute to the development of stronger links with sectors in society (such as school teachers) that are less familiar with the scientific activities in plant research and the beneficial impact this has on their lives. It is widely recognised that the shortage of expertise and skill in biomathematics and informatics across the world is a major risk to the future development of key areas in life sciences. The objectives of this proposal will help to attract talented staff to work with the COPO partners, and offer alternative career paths.


Description: Co-organised plant user workshops to test the COPO platform with data provided by UK plant researchers (via GARNet and UKPSF) working with genomic, phenomic, proteomic and metabolic datasets. These workshops helped to validate the COPO platform, highlight bugs, provide training in the platform and generate additional user requirements and suggestions for the extension and use of the COPO platform. Contributing to Objectives 1, 2 and 4.

Promotion of the COPO platform within the international DivSeek community, which is focused on increasing the access, re-use and annotation of data associated with plant germplasm. Activities included presenting COPO at DivSeek events (such as those held at PAG) and involving COPO in DivSeek working groups, including those on data compatibility. These activities helped to link COPO with relevant international data standards and annotation projects and generated opportunities to train people in the use of the platform. Contributing to Objectives 1, 2 and 4.

Active collaboration with the Research Data Alliance (RDA), specifically the Agriculture Data Group. Activities include attendance at RDA meetings and conferences to promote awareness of COPO, to understand the wider 'environment' in which COPO sits and to help ensure that it interacts with and is aware of appropriate initiatives within the RDA activities. Contributing to Objectives 1, 2 and 4.

Facilitated interaction with the plant researchers and data managers at the international CGIAR Centres. Activities include presentations and attendance at workshops and events such as PhenoharmonIS and PAG. These activities helped to widen the test user group beyond the UK, created opportunities to train people in the use of the platform and provided data sets beyond the initial scope of the COPO project, generating ideas for future development and collaboration. Contributing to Objectives 1, 2 and 4.

Taken together, the activities undertaken by York/Warwick:
- provided users and datasets to test and validate the COPO platform
- increased awareness of the COPO project at the national and international level
- brought international activities involved in data management and annotation to the attention of the COPO development team
- aligned the COPO platform with relevant international efforts in metadata, data standards and data management
- helped to ensure that data submitted, annotated and described via COPO is compatible with and useful to a wide range of users, and accessible and reusable in the future
Exploitation Route: Help make the COPO platform appropriate for plant scientists and increase awareness of the platform
Sectors: Agriculture, Food and Drink