COpenPlantOmics (COPO): a Collaborative Bioinformatics Plant Science Platform

Lead Research Organisation: Earlham Institute
Department Name: Directorate Office

Abstract

We live in a digital age where we increasingly rely on interconnected resources in our daily lives. Biological science, due to the complexity of worldwide research avenues, is typically fragmented. Even though scientific findings are published in peer-reviewed articles, they are often poorly described and, until very recently, were often unavailable to the general public because of journal licensing restrictions and expensive subscription costs.

The field of bioinformatics (the analysis and management of biological data using computational methods) produces many freely available tools for analysing and exposing data that are incredibly useful to researchers. However, these tools often do not interoperate well, so great effort is spent converting or tweaking datasets to fit other tools further along the bioinformatics process, hindering timely, accurate and reusable research. Couple this with the lack of descriptive information noted earlier, and knowledge that can be vital to one researcher, team or community becomes at best irreproducible (preventing others from confirming the findings) and at worst unusable.

Life scientists focus on investigating biological processes. This requires a great deal of time, effort and fastidiousness in experimental observation, data collection and analysis. Typically, life scientists spend more of this time defining and publishing experimental methods and results, while the data behind these results is usually poorly described and largely unpublished. For computer scientists, the story is reversed: the focus is on getting to the data. This platform will bridge the gap between these two groups by providing tools and training to both life and computer scientists in the plant bioscience field, helping them get their data into the right formats and describe it uniformly for open research.

To do this, the management, interoperability and curation of scientific datasets are key. Researchers need clear guidance and help to:

- Manage their data in a concise, relevant way that allows immediate reuse by others: generating data is only one part of the picture. To back up scientific findings, data needs to be made available to others so that it is subject to the same degree of rigour and peer review that is enforced for published material. This is not an easy task, because the tools and resources required to describe data well and to make it available are typically designed for computer scientists.
- Analyse their data easily: large software development projects like Galaxy provide access to complex analytical tools, and we are not aiming to reinvent the wheel in this regard. We aim to engage and collaborate with these existing providers to develop and exploit interfaces to these specialised software projects, so that descriptive tools and analytical tools can communicate efficiently.

This project will address these issues directly, providing tools for storing, annotating and sharing valuable information as well as clear guidance and training. Overall, this promises to be a major boost to UK plant sciences research.

This project aims to promote and build links between scientific knowledge and the tools used to generate that knowledge, addressing the lack of descriptive information about underlying data. By doing so, we will provide a platform comprising both existing tools and novel interoperability processes, allowing researchers easy access to methods of describing their work, feeding directly into analytical software, thus promoting clear and robust best practices in science.

Open science is vital to the future generation of researchers, especially to realise the goals of transparent knowledge sharing. This project will remove the barriers that prevent researchers from making their findings freely available to everyone in a consolidated, seamless and easy-to-use way.

Technical Summary

Accessibility of biological data has been hindered by a lack of standards, a lack of awareness of the benefits of, and pathways to, releasing data described by those standards, and a lack of services through which data can be easily analysed, published and retrieved. Recently, the BBSRC has made a large commitment to open access data and publishing to further bioscience research in the UK. However, barriers still prevent scientists from openly depositing their data and metadata, chiefly a lack of interoperability between metadata annotation services, data repositories, data analysis platforms and data publishing platforms. As a result, plant scientists may not be aware that these services exist, may lack the expertise to use them, or may not see the value in properly describing their data.
This project aims to build COPO, the software infrastructure required to reach the level of interoperability that plant researchers need in order to describe their data using community-recognised ontologies, move data seamlessly in both directions between COPO and relevant repositories, and then publish these data for open access. COPO will manage the hardware infrastructure at TGAC to deliver a consistent, robust staging area and database that will support uniquely accessioned artefacts representing the corpus of data and metadata a user wants to expose. The resulting marked-up datasets, processed and published using COPO, will open up greater potential for integrative analysis using existing tools such as iPlant and Galaxy.
New Application Programming Interfaces (APIs) will interconnect existing tools and services, and by developing new RESTful user interfaces that wrap up these APIs, COPO will be a single point-of-entry for plant researchers to disseminate their data all the way from generation to publication. By federating the TGAC iRODS data grid system with others, e.g. Texas Advanced Computing Center's iPlant installation, access to worldwide analytical infrastructure and data will be facilitated.
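
To illustrate what describing data "using community-recognised ontologies" can mean in practice, the following is a minimal Python sketch, assuming a simple record layout, of a sample metadata record in which human-readable values are paired with ontology term identifiers, plus a check that flags fields still lacking a term. The field names, `AnnotatedValue`, `ANNOTATED_FIELDS` and the sample identifier are hypothetical and are not COPO's actual schema; the NCBITaxon IRI for Arabidopsis thaliana is used only as an example term.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class AnnotatedValue:
    """A human-readable value optionally paired with an ontology term."""
    value: str
    term_iri: Optional[str] = None     # e.g. an OBO Foundry PURL
    term_source: Optional[str] = None  # e.g. "NCBITaxon"


# Hypothetical record layout (an assumption, not COPO's real schema).
sample_record = {
    "sample_id": "SAMPLE-001",  # hypothetical identifier
    "organism": AnnotatedValue(
        value="Arabidopsis thaliana",
        term_iri="http://purl.obolibrary.org/obo/NCBITaxon_3702",
        term_source="NCBITaxon",
    ),
    "organism_part": AnnotatedValue(value="leaf"),  # free text, not yet annotated
}

# Fields that this sketch expects to carry an ontology term.
ANNOTATED_FIELDS = ("organism", "organism_part")


def unannotated_fields(record: dict) -> list:
    """Return the names of expected fields that lack an ontology term IRI."""
    missing = []
    for name in ANNOTATED_FIELDS:
        entry = record.get(name)
        if entry is None or entry.term_iri is None:
            missing.append(name)
    return missing


def to_json(record: dict) -> str:
    """Serialise the record, expanding AnnotatedValue objects into plain dicts."""
    plain = {k: asdict(v) if isinstance(v, AnnotatedValue) else v
             for k, v in record.items()}
    return json.dumps(plain, indent=2)


if __name__ == "__main__":
    print(unannotated_fields(sample_record))  # ['organism_part']
    print(to_json(sample_record))
```

Keeping the free-text value alongside the term IRI means a record stays readable to people while machines can match on the identifier; a check like `unannotated_fields` is one way a submission tool could prompt users before deposition.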

Planned Impact

Academic, Economic and Commercial Impacts
With the renewed interest and push from all areas of bioscience to promote publicly available research, the COPO project will be a pioneering national and international effort to facilitate sharing all aspects of plant research with the public. COPO aims to be the vehicle that brings together the tools required to harmonise open plant omics research. This sector has obvious ties with industry: public domain omics-based bioscience is a relevant and important input into industry's internal research and discovery activities. To make such bioscience data truly reusable and ensure scientific robustness, it must be uniformly annotated. This enables integration through equivalence of terminology, increases efficiency in data production and re-use, and allows correct interpretation through the context provided by the metadata. A collaborative platform for frictionless bioinformatics, built with and for the academic and industrial community, is long overdue. Alongside data processing, industry also works on solutions for integrating and managing large 'omics data sets, e.g. through efforts like the Pistoia Alliance. Together with COPO industry partners (Eagle Genomics), we will develop use cases for the platform in industry, propose the acceptance criteria required for commercial use, supply technical advice and support on meeting those criteria, evaluate the platform on third-party infrastructure, and maximise knowledge exchange and commercialisation.

COPO and the standards community
Expertise and knowledge gained throughout the lifetime of the project and beyond will be disseminated through a variety of channels. A direct link with the plant science community (through GARNet and the UK Plant Sciences Federation (UKPSF)) is key to the success and adoption of the platform and its associated standards. The project will maintain a continuous dialogue, through face-to-face events as well as online tools and social media, between those working on the platform and the plant bioscience community. The letters of support received show a clear interest in working together and in using and adopting a platform that implicitly confers standards compliance. COPO will help overcome the challenges of standards fragmentation by (i) fostering the development, acceptance and implementation of reporting standards that are immediately suitable for plant research, and (ii) limiting the range and variability of standards. This will have a direct impact on the development and maintenance costs for commercial and academic developers of standards-compliant products.

Societal impacts
Historically, there has been reluctance to adopt some standards and open-data principles in the plant bioscience community, especially in the field of food sustainability and security, where openness and transparency are vital to continue improving public perception. The presentation of research data will play a key role in opening a dialogue with the general public and will contribute to stronger links with sectors of society (such as school teachers) that are less familiar with scientific activity in plant research and the beneficial impact it has on their lives. It is widely recognised that the worldwide shortage of expertise and skills in biomathematics and informatics is a major risk to the future development of key areas in the life sciences. The objectives of this proposal will help attract talented staff to work with the COPO partners and offer alternative career paths.
 
Description The major impact this year has been the inclusion of COPO into the Darwin Tree of Life project, where COPO will act as the main sample collection metadata deposition system for the whole project. This is a considerable achievement that will result in a wider user base and impact for downstream reuse of DToL genomes.

COPO has been used by the EI CSP to submit data from the strategic programme into core public data repositories such as the EMBL-EBI ENA and the EI CKAN installation.

We have continued the collaboration with the CGIAR centres, moving from the first CGCore implementation phase into a user testing phase. This resulted in a new COPO user interface and submission system for the three CG data repository types, and a group meeting in Rome in 2019 to discuss potential next steps.

We have developed new features following focus group meetings and user interactions at major conferences, including PAG and the Big Data conference in Hyderabad.

We have finalised the COPO deployment as a Docker Swarm based virtual deployment hosted on the CyVerse UK infrastructure. A containerisation approach like Docker makes COPO transferable between different hardware configurations and allows dynamic scaling of the installation. It also provides resilience, allowing COPO to be re-installed on server hardware in a matter of minutes. New installations can be set up by populating a configuration file and running a few simple terminal commands. This is a potential route to new installations for projects such as the Darwin Tree of Life, as appropriate.
Exploitation Route We are currently working with the UK Darwin Tree of Life project to design and implement the sample collection schemas within COPO so it can act as a data submission tool for the project sample collectors.

COPO is fully open source, and we have received interest from companies as well as research organisations that want to use it to address their data management needs. We are continuing to work with the Designing Future Wheat partners to use COPO as the data brokering platform for DFW data. We successfully undertook a funded collaboration with the CIMMYT Big Data platform to develop COPO as the main data brokering platform for 15 CGIAR centres globally.

Harmonised metadata attribution is becoming a pressing need in the life sciences to ensure that data adheres to the FAIR principles, allowing researchers better access to a wider array of data.
Sectors Agriculture, Food and Drink,Digital/Communication/Information Technologies (including Software),Environment,Pharmaceuticals and Medical Biotechnology

URL https://copo-project.org
 
Description We presented the latest COPO developments at PAG 2020, at the 14th RDA plenary meeting in Helsinki, and at the CGIAR Big Data conference in Hyderabad, where we also ran a COPO users workshop. We have given three webinars to RDA delegates and CGIAR data managers. We have written a journal article on COPO, which is currently available as a preprint (10.1101/782771). We held one workshop session with CGIAR data managers at Bioversity in Rome and have been involved in metadata discussions with members of the new ELIXIR Biodiversity working group, which was initiated in Milan this year. We have been developing sample description modules in COPO for the Darwin Tree of Life project, with the aim of collecting all sample data for that project in preparation for submitting this metadata as BioSamples, to which the sequencing data can be linked and made publicly available.
Sector Agriculture, Food and Drink,Digital/Communication/Information Technologies (including Software),Environment,Pharmaceuticals and Medical Biotechnology
Impact Types Economic

 
Description Interview with Environment Adviser from the UK Parliamentary Office of Science and Technology
Geographic Reach National 
Policy Influence Type Implementation circular/rapid advice/letter to e.g. Ministry of Health
Impact Contacted by UK Parliament to contribute to a POSTnote (short document to advise ministers on a given topic) on genebanks and Digital Sequence Information as a result of my recent election to the DivSeek Board of Directors. I was interviewed to provide information around current international policies on DSI and how future UK involvement might be shaped around open licencing/MTAs of DSI datasets.
URL https://www.parliament.uk/postnotes
 
Description UKRI Data Infrastructure Roadmap
Geographic Reach National 
Policy Influence Type Participation in a national consultation
 
Description FAIRplus
Amount £3,996,150 (GBP)
Funding ID 802750 
Organisation European Commission 
Department Innovative Medicines Initiative (IMI)
Sector Public
Country Belgium
Start 01/2019 
End 01/2022
 
Description ISA-InterMine: accelerating and rewarding data sharing
Amount £1,174,660 (GBP)
Funding ID 208381/A/17/Z 
Organisation Wellcome Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 08/2018 
End 07/2021
 
Title CGCore v2 Improvements 
Description As part of the collaboration between the EI COPO project and the CGIAR Big Data Platform, we worked with CGIAR and Crop Ontology developers to improve the CG Core v2 schema for describing CGIAR digital outputs. 
Type Of Material Improvements to research infrastructure 
Year Produced 2018 
Provided To Others? Yes  
Impact Globally, this work will affect all CGIAR Data Managers and users of the COPO platform to deposit data into CG Centre repositories. 
URL https://github.com/collaborative-open-plant-omics/cgcore_schema
 
Title Continued improvements to the ISA toolkit 
Description Started in 2003 and first released in 2007, the ISA tools have been developed over time by the Oxford team and collaborators, or directly contributed by partners, via the ISA Commons collaborative community. A short description of the developments and achievements of the resource over the last year:
• Awarded Wellcome Trust funds (2018-2021) for a collaborative project with the University of Cambridge's InterMine team to link the two resources and reward researchers for annotating and publishing FAIR data. ISA is also embedded in two ELIXIR Implementation Studies, one on plant-focused data validation and one on metabolomics.
• With the uptake of ISA-Galaxy tools (https://github.com/ISA-tools/isatools-galaxy) and integration with the Galaxy framework, ISA has reached a major milestone by showcasing how prospective data management can be done, demonstrating a full deposition workflow to MetaboLights and creating training material (10.7490/f1000research.1115757.1).
• Jupyter notebooks (https://github.com/ISA-tools/dtp-isa-exercises) have been developed as teaching material to showcase the use of the ISA-API in various contexts in undergraduate and postgraduate courses on data readiness.
Type Of Material Improvements to research infrastructure 
Year Produced 2018 
Provided To Others? Yes  
Impact Community use and impact is tracked via the ISA Commons, which currently has over 40 international groups, projects and organisations that use and contribute to the development of components of the ISA metadata tracking framework. We can therefore say that the ISA user base ranges from hundreds to thousands of researchers in increasingly diverse domains (-omics, cell-based research, biomedical nanotechnology, plant phenotyping, toxicology, biodiversity, metagenomics, stem cell research, systems biology, neuroscience, microbial science and immunology), and extends beyond researchers, curators, other resource developers and service providers to include journals. For example, ISA is used by Oxford University Press's GigaScience and underpins Springer Nature's Scientific Data journal, supporting intelligent data sharing and credit; ISA is used to describe experiments and to provide browse and search functionality for Scientific Data's content (http://scientificdata.isa-explorer.org). The ISA framework is currently embedded in a number of UK, EC, NIH and pharma funded infrastructure and research projects; exemplars from the ELIXIR UK Node and other Nodes include:
o EMBL-EBI MetaboLights' new web-based submission relies on the ISA-JSON format to build its web component and on the ISA-API to validate and convert experiments represented as ISA objects.
o The BBSRC-funded COPO infrastructure relies on the ISA-API, ISA-JSON serialisation and the ISA configurations to support plant molecular profiling experiments; it has also used the ISAconverter to deposit to the ENA database.
o ELIXIR-UK Node partners the University of Birmingham and Imperial College London use ISA Galaxy Tools, the ISA-API and the ISA validator, as part of their work in the UK Phenome Centre, to collect data prospectively and also to organise public deposition to repositories.
o The ELIXIR Plant Community's MIAPPE standards and BrAPI rely on the availability of ISA parsers and validation tools in the context of data validation programmes.
URL http://isa-tools.org
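The ISA model organises metadata as an Investigation containing Studies, each of which contains Assays, and ISA-JSON is one serialisation of that nesting. The sketch below reproduces only the three-level hierarchy with plain Python dataclasses and the standard `json` module; it does not use the ISA-API, its field names are simplified assumptions, and its output is an illustration rather than schema-valid ISA-JSON. All identifiers, titles and file names are hypothetical.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Assay:
    measurement_type: str  # e.g. "metabolite profiling"
    technology_type: str   # e.g. "mass spectrometry"
    data_files: List[str] = field(default_factory=list)


@dataclass
class Study:
    identifier: str
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    identifier: str
    title: str
    studies: List[Study] = field(default_factory=list)


def count_assays(inv: Investigation) -> int:
    """Total number of assays across all studies in the investigation."""
    return sum(len(s.assays) for s in inv.studies)


# Hypothetical example content.
inv = Investigation(identifier="INV-1", title="Hypothetical plant profiling investigation")
study = Study(identifier="STU-1", title="Leaf metabolite study")
study.assays.append(Assay("metabolite profiling", "mass spectrometry", ["run1.mzML"]))
inv.studies.append(study)

# asdict() recurses into nested dataclasses and lists, giving a JSON-ready dict.
print(json.dumps(asdict(inv), indent=2))
print(count_assays(inv))  # 1
```

Nesting assays under studies under an investigation keeps the experimental context attached to every data file, which is what allows tools such as validators and repository converters to walk the whole description from one root object.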
 
Title Continued improvements to the ISA toolkit and the new Datascriptor component 
Description Started in 2003 and first released in 2007, the ISA tools (http://isa-tools.org) have been developed over time by the Oxford team and collaborators, or directly contributed by partners, via the ISA Commons collaborative community (https://www.isacommons.org). Key work over the last year is the development of a new component, the Datascriptor (https://datascriptor.org), as part of the Wellcome Trust award (2018-2021), a collaborative project with the University of Cambridge's InterMine team. Leveraging our experience and links with the communities, we are designing an open-source web-based tool, part of an ecosystem of existing annotation and authoring systems, to help researchers use community standards to describe their (meta)data at the source and capitalise on that effort to accelerate the creation of a data article. In addition, major advances have been made to the ISA-API, also in collaboration with the ELIXIR Plant and Metabolomics communities.
Type Of Material Improvements to research infrastructure 
Year Produced 2019 
Provided To Others? Yes  
Impact Community use and impact is tracked via the ISA Commons, which currently has over 40 international groups, projects and organisations that use and contribute to the development of components of the ISA metadata tracking framework. We can therefore say that the ISA user base ranges from hundreds to thousands of researchers in increasingly diverse domains (-omics, cell-based research, biomedical nanotechnology, plant phenotyping, toxicology, biodiversity, metagenomics, stem cell research, systems biology, neuroscience, microbial science and immunology), and extends beyond researchers, curators, other resource developers and service providers to include journals. For example, ISA is used by Oxford University Press's GigaScience and underpins Springer Nature's Scientific Data journal, supporting intelligent data sharing and credit; ISA is used to describe experiments and to provide browse and search functionality for Scientific Data's content (http://scientificdata.isa-explorer.org). The ISA framework is currently embedded in a number of UK, EC, NIH and pharma funded infrastructure and research projects; exemplars from the ELIXIR UK Node and other Nodes include:
(i) EMBL-EBI MetaboLights' new web-based submission relies on the ISA-JSON format to build its web component and on the ISA-API to validate and convert experiments represented as ISA objects.
(ii) The BBSRC-funded COPO infrastructure relies on the ISA-API, ISA-JSON serialisation and the ISA configurations to support plant molecular profiling experiments; it has also used the ISAconverter to deposit to the ENA database.
(iii) ELIXIR-UK Node partners the University of Birmingham and Imperial College London use ISA Galaxy Tools, the ISA-API and the ISA validator, as part of their work in the UK Phenome Centre, to collect data prospectively and also to organise public deposition to repositories.
(iv) The ELIXIR Plant Community's MIAPPE standards and BrAPI rely on the availability of ISA parsers and validation tools in the context of data validation programmes.
URL https://datascriptor.org
 
Title Improvements to the COPO system 
Description COPO is a computational system that attempts to address the challenges of making data FAIR by enabling scientists to describe their research objects (raw or processed data, publications, samples, images, etc.) using community-sanctioned metadata sets and vocabularies, and then to use public or institutional repositories to share them with the wider scientific community. COPO encourages data generators to adhere to appropriate metadata standards when publishing research objects, using semantic terms to add meaning to them and specify relationships between them. This allows data consumers, be they people or machines, to find, aggregate and analyse data that would otherwise be private or invisible. COPO builds upon existing standards to push the state of the art in scientific data dissemination whilst minimising the burden of data publication and sharing.
Type Of Material Improvements to research infrastructure 
Year Produced 2020 
Provided To Others? Yes  
Impact Improvements to the COPO user interfaces and underlying code which have resulted in more data being submitted to public repositories through the system. The CGIAR CGCore v2 implementation is complete and undergoing testing to document and provide improvements. The Darwin Tree of Life project has chosen to use COPO as its main sample metadata submission route. 
URL https://github.com/collaborative-open-plant-omics/COPO
 
Title MIAPPE specification and tools 
Description Minimum Information About a Plant Phenotyping Experiment is an open, community driven project to harmonize data from plant phenotyping experiments. MIAPPE specification comprises both a conceptual checklist of metadata required to adequately describe a plant phenotyping experiment, and software to validate, store and disseminate MIAPPE-compliant data. 
Type Of Material Improvements to research infrastructure 
Year Produced 2017 
Provided To Others? Yes  
Impact MIAPPE is a logical standard, but there are specific implementations of tools designed to support its use and application, for example in the ISA-tools framework. We are working with the developers of the Breeding API (BrAPI) to ensure the compliance of BrAPI with the MIAPPE standard and to coordinate future developments.
URL http://www.miappe.org/
 
Title COPO Linked Data database 
Description The COPO document-based database runs on the MongoDB engine, providing a flexible and semantically aware storage mechanism for recording and disseminating linked research objects.
Type Of Material Database/Collection of data 
Year Produced 2016 
Provided To Others? No  
Impact The database powers the COPO user interface, and the underlying source code and business logic layers are available to the public via GitHub. Once the COPO project is live in production, we will be releasing staged dumps of this database at regular intervals for anyone to use the information contained within. 
URL https://github.com/collaborative-open-plant-omics/COPO/tree/master/web/src/dal
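As an illustration of how a document-based store can record linked research objects, the sketch below models each document as a Python dict keyed by an identifier, stores relationships as references to other documents' identifiers, and then resolves those references and reports any that point nowhere. It deliberately avoids a MongoDB driver so that it runs standalone; the document layout, the `links` field, the relationship labels and all identifiers are hypothetical and are not taken from COPO's data access layer.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical documents; in a document store each would be one record.
documents: Dict[str, dict] = {
    "sample-1": {"_id": "sample-1", "type": "sample", "name": "Leaf sample"},
    "reads-1": {
        "_id": "reads-1",
        "type": "datafile",
        "name": "reads_R1.fastq.gz",
        # Relationships point at other documents by identifier.
        "links": [{"rel": "derived_from", "target": "sample-1"}],
    },
    "paper-1": {
        "_id": "paper-1",
        "type": "publication",
        "links": [
            {"rel": "describes", "target": "reads-1"},
            {"rel": "describes", "target": "missing-9"},  # deliberately dangling
        ],
    },
}


def resolve_links(doc_id: str, store: Dict[str, dict]) -> List[Tuple[str, Optional[dict]]]:
    """Return (relationship, target document) pairs; unknown targets map to None."""
    doc = store[doc_id]
    return [(link["rel"], store.get(link["target"]))
            for link in doc.get("links", [])]


def dangling_links(store: Dict[str, dict]) -> List[Tuple[str, str]]:
    """List (source id, target id) pairs whose target is not in the store."""
    return [(doc_id, link["target"])
            for doc_id, doc in store.items()
            for link in doc.get("links", [])
            if link["target"] not in store]


print(resolve_links("reads-1", documents)[0][1]["name"])  # Leaf sample
print(dangling_links(documents))  # [('paper-1', 'missing-9')]
```

Storing links as identifiers rather than embedding whole documents keeps each research object independently addressable, at the cost of needing a resolution step and a consistency check such as `dangling_links`.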
 
Title The Earlham Institute CKAN Digital Repository 
Description The CKAN digital repository has been set up as part of WP3 of Earlham Institute's CSP to hold all EI strategic publications alongside any supplementary datasets and information. This gives the public and researchers immediate access to EI's BBSRC funded research through open access routes where available. 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
Impact We have built scripts to find and make available open access versions of all EI published research, either as preprints or as journal articles. We also supply supplementary information as appropriate to aid information dissemination. The EI CKAN runs within Earlham Institute's CyVerse UK National Capability.
URL https://ckan.earlham.ac.uk
 
Description CyVerse US/UK Partnership 
Organisation CyVerse
Country United States 
Sector Private 
PI Contribution The COPO project team brings the expertise on working with disparate metadata standards, and also provides the platform (through COPO web application) for interoperable metadata interactions with CyVerse. This promotes the use of CyVerse not only to researchers at EI, but to a broader community of plant scientists internationally.
Collaborator Contribution CyVerse UK provides and supports the infrastructure that COPO relies on. It is also set to become, in the near future, one of the analysis platforms supported by COPO. The continued collaboration with CyVerse will keep promoting these objectives by providing improved applications and better workflows to support researchers who rely on both systems.
Impact Participated in a hackathon to discuss ideas for improved metadata interoperability between COPO and CyVerse platforms. This led to outcomes including: developing services in COPO for exposing data objects stored in CyVerse; advancing discussions and potential ideas for large file transfer and manipulation via CyVerse.
Start Year 2016
 
Description DivSeek Partnership 
Organisation DivSeek International
Sector Learned Society 
PI Contribution I bring infrastructure expertise to this partnership, influencing and impacting policy to provide computational and training capacity to other DivSeek partners. I promote the range of infrastructure projects that are developed in my group at EI, but also solutions developed at other centres that can contribute to the DivSeek consortium. Partners are exposed to EI projects such as COPO, Grassroots (Wheat Information System, CerealsDB, marker design), CyVerse UK and Galaxy, through working group communications and meetings at international conferences such as PAG and RDA. I lead the Data Standards for Interoperable Tools working group, and we aim to collate community-suggested standards and tools, and advise the partnership and their stakeholders in best practice for delivery of sustainable and interoperable infrastructure.
Collaborator Contribution The DivSeek consortium contributes expertise and knowledge exchange in advances in crop diversity, improving our networking and understanding of challenges and potential solutions to social, structural, and biological problems. With over 66 global partners including EI, this is a powerful and highly respected group of research institutes that are working together to enable a step change in efficiency of interactions, leading to improved crop diversity research and data sharing.
Impact EI is a founding partner of DivSeek, and Dr Davey leads one of the new working groups, "Data Standards for Interoperable Tools" (http://www.divseek.org/standards/)
Start Year 2015
 
Description ELIXIR Biodiversity Working Group 
Organisation ELIXIR
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution Drs Davey and Shaw attended the first ELIXIR Biodiversity working group meeting in Milan 2020. Davey gave a talk on UK efforts to track biodiversity data, for example with the COPO platform.
Collaborator Contribution ELIXIR initiated this working group and invited member ELIXIR nodes to attend.
Impact Main outcome is building the community with a view to submitting an implementation study around biodiversity data.
Start Year 2020
 
Description ELIXIR Interoperability Platform and ISA 
Organisation ELIXIR
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution ISA is part of the ELIXIR Recommended Interoperability Resources (RIRs) to facilitate interoperability and reusability of life science data and support the principles of FAIR data management.
Collaborator Contribution The ELIXIR Recommended Interoperability Resources have been selected by an external panel of reviewers, based on the selection criteria published in the Call for RIR applications, which measure how they facilitate scientific research and how they improve the FAIRness of life science data.
Impact ISA is and will continue to be used by and further developed with ELIXIR communities, especially with Plant and Metabolomics use cases.
Start Year 2018
 
Description ELIXIR Metabolomics Community 
Organisation ELIXIR
Department ELIXIR UK
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution My team has contributed ISA-related work to the ELIXIR Metabolomics use case, activities and reports.
Collaborator Contribution We have gained more visibility for the ISA work and now ISA-Tab is a formal format used by the Galaxy analysis toolkit for metabolomics applications.
Impact The ISA framework is the basis for the metadata standards used by the ELIXIR Metabolomics Community, and the tools are embedded in the EBI MetaboLights database, as well as in other international metabolomics resources.
Start Year 2017
 
Description ELIXIR Metabolomics Community 
Organisation ELIXIR
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution My team has contributed ISA-related work to the ELIXIR Metabolomics use case, activities and reports.
Collaborator Contribution We have gained more visibility for the ISA work and now ISA-Tab is a formal format used by the Galaxy analysis toolkit for metabolomics applications.
Impact The ISA framework is the basis for the metadata standards used by the ELIXIR Metabolomics Community, and the tools are embedded in the EBI MetaboLights database, as well as in other international metabolomics resources.
Start Year 2017
 
Description ELIXIR Plant Use Case 
Organisation ELIXIR
Department ELIXIR UK
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution My team has contributed ISA-related work to the ELIXIR Plant Science use case, work and report.
Collaborator Contribution We have gained more visibility for the ISA work and COPO activities.
Impact ISA is used by BrAPI, and there is an ISA implementation of the MIAPPE specification.
Start Year 2016
 
Description ELIXIR Plant Use Case 
Organisation ELIXIR
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution My team has contributed ISA-related work to the ELIXIR Plant Science use case, work and report.
Collaborator Contribution We have gained more visibility for the ISA work and COPO activities.
Impact ISA is used by BrAPI (the Breeding API), and there is an ISA implementation of the MIAPPE specification.
Start Year 2016
 
Description ELIXIR UK Node 
Organisation Earlham Institute
Country United Kingdom 
Sector Academic/University 
PI Contribution Help create the ELIXIR UK Node
Collaborator Contribution Contribute to the creation of the ELIXIR UK Node
Impact Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized.
Start Year 2012
 
Description ELIXIR UK Node 
Organisation Heriot-Watt University
Country United Kingdom 
Sector Academic/University 
PI Contribution Help create the ELIXIR UK Node
Collaborator Contribution Contribute to the creation of the ELIXIR UK Node
Impact Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized.
Start Year 2012
 
Description ELIXIR UK Node 
Organisation Imperial College London
Country United Kingdom 
Sector Academic/University 
PI Contribution Help create the ELIXIR UK Node
Collaborator Contribution Contribute to the creation of the ELIXIR UK Node
Impact Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized.
Start Year 2012
 
Description ELIXIR UK Node 
Organisation Newcastle University
Country United Kingdom 
Sector Academic/University 
PI Contribution Help create the ELIXIR UK Node
Collaborator Contribution Contribute to the creation of the ELIXIR UK Node
Impact Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized.
Start Year 2012
 
Description ELIXIR UK Node 
Organisation Rothamsted Research
Country United Kingdom 
Sector Academic/University 
PI Contribution Help create the ELIXIR UK Node
Collaborator Contribution Contribute to the creation of the ELIXIR UK Node
Impact Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized.
Start Year 2012
 
Description ELIXIR UK Node 
Organisation University College London
Country United Kingdom 
Sector Academic/University 
PI Contribution Help create the ELIXIR UK Node
Collaborator Contribution Contribute to the creation of the ELIXIR UK Node
Impact Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized.
Start Year 2012
 
Description ELIXIR UK Node 
Organisation University of Birmingham
Country United Kingdom 
Sector Academic/University 
PI Contribution Help create the ELIXIR UK Node
Collaborator Contribution Contribute to the creation of the ELIXIR UK Node
Impact Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized.
Start Year 2012
 
Description ELIXIR UK Node 
Organisation University of Cambridge
Country United Kingdom 
Sector Academic/University 
PI Contribution Help create the ELIXIR UK Node
Collaborator Contribution Contribute to the creation of the ELIXIR UK Node
Impact Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized.
Start Year 2012
 
Description ELIXIR UK Node 
Organisation University of Dundee
Country United Kingdom 
Sector Academic/University 
PI Contribution Help create the ELIXIR UK Node
Collaborator Contribution Contribute to the creation of the ELIXIR UK Node
Impact Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized.
Start Year 2012
 
Description ELIXIR UK Node 
Organisation University of Edinburgh
Department Edinburgh Genomics
Country United Kingdom 
Sector Academic/University 
PI Contribution Help create the ELIXIR UK Node
Collaborator Contribution Contribute to the creation of the ELIXIR UK Node
Impact Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized.
Start Year 2012
 
Description ELIXIR UK Node 
Organisation University of Edinburgh
Country United Kingdom 
Sector Academic/University 
PI Contribution Help create the ELIXIR UK Node
Collaborator Contribution Contribute to the creation of the ELIXIR UK Node
Impact Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized.
Start Year 2012
 
Description ELIXIR UK Node 
Organisation University of Liverpool
Country United Kingdom 
Sector Academic/University 
PI Contribution Help create the ELIXIR UK Node
Collaborator Contribution Contribute to the creation of the ELIXIR UK Node
Impact Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized.
Start Year 2012
 
Description ELIXIR UK Node 
Organisation University of Manchester
Country United Kingdom 
Sector Academic/University 
PI Contribution Help create the ELIXIR UK Node
Collaborator Contribution Contribute to the creation of the ELIXIR UK Node
Impact Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized.
Start Year 2012
 
Description ELIXIR UK Node 
Organisation University of Oxford
Country United Kingdom 
Sector Academic/University 
PI Contribution Help create the ELIXIR UK Node
Collaborator Contribution Contribute to the creation of the ELIXIR UK Node
Impact Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized.
Start Year 2012
 
Description HPE AI workshop 2019 
Organisation Hewlett Packard Enterprise (HPE)
Country United Kingdom 
Sector Private 
PI Contribution We worked with HPE staff to organise and host an AI Workshop at EI. We opened the course up for national delegates to attend and discover more about how AI and Machine Learning techniques can be applied to biological research data.
Collaborator Contribution HPE provided the trainers and staff to teach the materials.
Impact We will continue to work with HPE as the supplier of our institutional HPE equipment. We have also put forward HPE as a potential partner in the upcoming DTP3 bid.
Start Year 2019
 
Description Hanna Cwiek - 2 month visit : MIAPPE and ISA 
Organisation Polish Academy of Sciences
Department Institute of Plant Genetics
Country Poland 
Sector Academic/University 
PI Contribution Members of my team, namely Philippe Rocca-Serra and Alejandra Gonzalez-Beltran, have assisted Hanna in her ISA-related work.
Collaborator Contribution Dr Hanna Cwiek from the Poznan Institute of Genetic Research in Poland (in Pawel Krajewski's team) visited my team to work on ISA and MIAPPE, helping to refine ISA tools relevant to plant science and COPO activities.
Impact Possible paper on the work done
Start Year 2017
 
Description Integration of COPO and CGCore Schemas and Associated Repositories 
Organisation CGIAR
Country France 
Sector Charity/Non Profit 
PI Contribution We have developed a proof-of-concept platform to streamline metadata attribution and dataset deposition into CGIAR repositories using the BBSRC-funded COPO software. Drs Etuk and Shaw, two Research Software Engineers in the Davey group at Earlham Institute and the original core developers of COPO, have implemented various new features in COPO that allow CGIAR Data Managers to harmonise and streamline the submission of CG-relevant metadata and data into the CG digital data repositories. All software and infrastructure is hosted within the CyVerse UK cloud. We have:
- Implemented support for CG Core v.2.0 (http://repo.mel.cgiar.org/handle/20.500.11766/4764) metadata annotation of various data types, including publications, produced at the CGIAR institutes, via the existing COPO wizard system.
- Implemented support for submitting annotated objects to institutional instances of the following repositories: DSpace (https://www.duraspace.org/dspace/), CKAN (https://ckan.org/) and Dataverse (https://dataverse.org/).
- Designed and implemented a mechanism within COPO that controls which users can submit to which repositories.
- Implemented support for annotating variables within datasets (i.e. column headings, experimental condition descriptors, etc.) with terms and URIs from ontologies or controlled vocabularies/trait dictionaries (AGROVOC and GACS).
Collaborator Contribution CGIAR have provided coordination contributions with key members in the CG Centres to gather feedback on developed elements, as well as provided funds to allow a core CGCore metadata schema developer to travel to EI and work with Drs Etuk and Shaw to improve the CGCore schema.
Impact This collaboration has seen rapid development of key functionality in the COPO platform to support CG centre Data Managers. This has required technical skills to develop the software, biocuration expertise provided by CGIAR to improve and refine the CGCore metadata schema, ontology expertise from the Bioversity team in Montpellier, and coordination expertise from Dr Davey (EI) and Medha Devare (CGIAR). Software and Technical Products (Webtool/Application - Collaborative Open Plant Omics (COPO) (2017)): All software code developed is open source and can be found within the COPO Github repository: https://github.com/collaborative-open-plant-omics/COPO
Start Year 2018
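Annotating dataset variables with ontology terms, as described for the COPO/CGCore work, essentially reduces to a lookup from column heading to term URI. A minimal sketch, assuming a small in-memory vocabulary and exact, case-insensitive label matching; the vocabulary entries and example.org URIs are hypothetical placeholders, not real AGROVOC or GACS terms, and this is not COPO's actual implementation:

```python
from typing import Optional

# Hypothetical vocabulary: label -> term URI (placeholders, not real AGROVOC/GACS IDs).
VOCABULARY = {
    "plant height": "http://example.org/vocab/plant_height",
    "grain yield": "http://example.org/vocab/grain_yield",
}


def annotate_columns(headings: list[str], vocabulary: dict[str, str]) -> dict[str, Optional[str]]:
    """Map each column heading to a term URI, or None if no exact label match exists."""
    # Normalise labels once so lookups ignore case and surrounding whitespace.
    index = {label.strip().casefold(): uri for label, uri in vocabulary.items()}
    return {heading: index.get(heading.strip().casefold()) for heading in headings}


print(annotate_columns(["Plant height", "Grain Yield", "plot_id"], VOCABULARY))
```

Unmatched headings come back as None so that a curator can be prompted to choose a term manually; a production tool would more likely query a terminology service than hold the vocabulary in memory.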
 
Description Martin Mueller - 1 week visit to EI: CG Core v.2.0 work 
Organisation CGIAR
Country France 
Sector Charity/Non Profit 
PI Contribution Drs Etuk and Shaw worked closely with Martin to refine and integrate the CG Core v.2.0 schema into COPO
Collaborator Contribution Martin Mueller worked as a contractor to CGIAR to produce the CG Core v.2.0 schema. He provided good insight into the workflow of the new schema and how COPO could better support its implementation, from the metadata attribution user interfaces to the submission workflow.
Impact Full support for the CG Core workflow now implemented and available in COPO.
Start Year 2018
 
Title CGCore Schema 
Description The schema is a template designed to collect all metadata relating to research outputs produced by the CGIAR institutes. Our group helped design the specification. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact This schema is about to be deployed to the CGIAR centers. Once deployed, it will form the basis of data collection for 15 research centers around the globe, employing in excess of 8000 scientists. 
 
Title CGCore Wizard 
Description Based on the CGCore specification, the wizard is an implementation of the template in the COPO platform. It enables researchers to record the metadata relating to CGIAR research objects. 
Type Of Technology Webtool/Application 
Year Produced 2019 
Open Source License? Yes  
Impact Researchers from the CGIAR institutes will be able to record their metadata based on the CGCore schema. CGIAR encompasses 15 institutes worldwide employing over 8000 researchers. 
URL https://copo-project.org/copo
 
Title CKAN Workflow for data deposition 
Description In collaboration with the CGIAR centers, we developed a workflow for depositing heterogeneous data into the CKAN repository along with appropriate metadata extracted from CGCore metadata fragments. CKAN is an open source, mature federated solution for storing, sharing and disseminating data objects and metadata. 
Type Of Technology Webtool/Application 
Year Produced 2019 
Open Source License? Yes  
Impact This will allow COPO to be the main access point for metadata annotation and data deposition for the CGIAR institutes. CGIAR is a major conglomeration of research stations around the globe, responsible for many agricultural advances in the developing world since the end of the Second World War. From this work, we can expect to broker tens of thousands of documents, datasets or other research objects from CGIAR researchers. 
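CKAN datasets are created through its Action API (`package_create`, posted to `/api/3/action/package_create` with an API token in the `Authorization` header). A minimal sketch of the metadata-mapping step only, assuming a hypothetical metadata fragment with `title`, `description`, `organisation`, `extra` and `files` keys; this is not COPO's actual CGCore-to-CKAN mapping, and the HTTP request itself is omitted:

```python
import re

# CKAN dataset names: 2-100 characters, lowercase alphanumerics, "-" and "_".
CKAN_NAME_RE = re.compile(r"^[a-z0-9_-]{2,100}$")


def slugify(title: str) -> str:
    """Derive a URL-safe CKAN dataset name from a free-text title."""
    slug = re.sub(r"[^a-z0-9_-]+", "-", title.lower()).strip("-")
    return slug[:100]


def build_package(fragment: dict) -> dict:
    """Turn a (hypothetical) metadata fragment into a package_create payload."""
    name = slugify(fragment["title"])
    if not CKAN_NAME_RE.match(name):
        raise ValueError(f"cannot derive a valid CKAN name from {fragment['title']!r}")
    return {
        "name": name,
        "title": fragment["title"],
        "notes": fragment.get("description", ""),
        "owner_org": fragment["organisation"],
        # Fields without a native CKAN slot travel as key/value "extras".
        "extras": [{"key": k, "value": v} for k, v in sorted(fragment.get("extra", {}).items())],
        # Each file becomes a resource pointing at where the data is hosted.
        "resources": [{"url": url} for url in fragment.get("files", [])],
    }
```

Keeping the mapping separate from the network call makes it easy to validate payloads before anything is sent to a repository.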
 
Title Collaborative Open Plant Omics (COPO) 
Description COPO streamlines the process of data deposition to public repositories by hiding much of the complexity of metadata capture and data management from the end-user. The ISA infrastructure (www.isa-tools.org) is leveraged to provide the interoperability between metadata formats required for seamless deposition to repositories. COPO facilitates the links to data analysis platforms such as CyVerse UK and Galaxy. Logical groupings of artefacts (e.g. PDFs, raw data, contextual supplementary information) relating to a body of work are stored in COPO collections and represented by common standards, which are publicly searchable. Bundles of multiple data objects themselves can then be deposited directly into public repositories through COPO interfaces. This improvement output represents the beta release of the COPO platform in 2017. 
Type Of Technology Webtool/Application 
Year Produced 2017 
Open Source License? Yes  
Impact COPO has been added to the ELIXIR-UK roadmap for ELIXIR core data services, and is currently being used by EI and JIC researchers to deposit real, large scale sequencing datasets into the European Nucleotide Archive. COPO is also being investigated as a potential data entry tool for the CGIAR Big Data project, and this will be explored in a joint EAGER submission with CIMMYT. COPO has also been selected to act as one of the data ingestion pipelines for data arising from the Designing Future Wheat programme, depositing open data into the Grassroots repository. COPO is also being included in grant submissions to assist vertebrate and wheat communities in effective metadata management. COPO runs within the CyVerse UK National Capability infrastructure. 
URL https://copo-project.org
 
Title Collaborative Open Plant Omics (COPO) software 
Description COPO streamlines the process of data deposition to public repositories by hiding much of the complexity of metadata capture and data management from the end-user. The ISA infrastructure (www.isa-tools.org) is leveraged to provide the interoperability between metadata formats required for seamless deposition to repositories and to facilitate links to data analysis platforms. Logical groupings of artefacts (e.g. PDFs, raw data, contextual supplementary information) relating to a body of work are stored in COPO collections and represented by common standards, which are publicly searchable. Bundles of multiple data objects themselves can then be deposited directly into public repositories through COPO interfaces. 
Type Of Technology Webtool/Application 
Year Produced 2015 
Impact The software is at an early stage, but functional for deposition of data to a small number of repositories. As such, we are not yet ready for end-user testing. However, we are collaborating with CyVerse US (formerly the iPlant Collaborative) to investigate the use of COPO as the brokering system for their Data Commons. 
URL https://github.com/collaborative-open-plant-omics/COPO
 
Title Creation of Deployment architecture 
Description The deployment architecture is based on Docker Swarm hosted on the CyVerse UK virtual infrastructure. This provides a robust and dynamic deployment system allowing load balancing between virtual servers, whilst providing convenience and security. 
Type Of Technology Webtool/Application 
Year Produced 2018 
Open Source License? Yes  
Impact COPO has over 99% uptime, meaning that our users are provided with a reliable and accessible service around the globe and around the clock. 
URL https://copo-project.org/copo
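To put the uptime figure in context: an availability of 99% over a year of 8760 hours (ignoring leap years) allows at most 8760 × 0.01 = 87.6 hours, roughly 3.65 days, of downtime. A small helper for the same arithmetic over any period:

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; leap years ignored


def max_downtime_hours(uptime_fraction: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Largest downtime (in hours) compatible with a given uptime fraction."""
    if not 0.0 <= uptime_fraction <= 1.0:
        raise ValueError("uptime_fraction must be between 0 and 1")
    return period_hours * (1.0 - uptime_fraction)


print(round(max_downtime_hours(0.99), 2))           # budget per year
print(round(max_downtime_hours(0.99, 30 * 24), 2))  # budget per 30-day month
```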
 
Title Creation of MIAPPE wizard 
Description The MIAPPE schema has been implemented in wizard format in COPO to allow the collection of plant phenotyping experimental metadata. 
Type Of Technology Webtool/Application 
Year Produced 2017 
Open Source License? Yes  
Impact Users can now record their metadata in this format. Phenotyping experiments are an important part of crop science, and to be able to collect full metadata in such a manner as this is very important. 
 
Title Creation of institutional repos architecture 
Description This piece of work allows for the deposition of items to institutional repository instances. This is important since researchers are increasingly looking to smaller-scale, institutionally hosted instances of off-the-shelf repository solutions such as Dataverse, CKAN and DSpace. 
Type Of Technology Webtool/Application 
Year Produced 2018 
Open Source License? Yes  
Impact Allows users to enter the details of their institutional repository instances and have many users submit to them. 
 
Title Creation of shared profile architecture 
Description This allows users to share profiles with other users of their choice to facilitate group editing of metadata. 
Type Of Technology Webtool/Application 
Year Produced 2018 
Open Source License? Yes  
Impact This was a much requested feature to allow collaborations between disparate research groups. 
 
Title DSPACE repository deposition workflow 
Description In collaboration with the CGIAR centers, we developed a workflow for depositing heterogeneous data into the DSpace repository along with appropriate metadata extracted from CGCore metadata fragments. DSpace is developed by DuraSpace and is widely used in academia to deposit and disseminate research objects. 
Type Of Technology Webtool/Application 
Year Produced 2019 
Open Source License? Yes  
Impact This will allow COPO to be the main access point for metadata annotation and data deposition for the CGIAR institutes. CGIAR is a major conglomeration of research stations around the globe, responsible for many agricultural advances in the developing world since the end of the Second World War. From this work, we can expect to broker tens of thousands of documents, datasets or other research objects from CGIAR researchers. 
 
Title Datascriptor 
Description From structured dataset to data article. Leveraging our experience and links with the communities, we are now designing an open-source web-based tool - part of an ecosystem of existing annotation and authoring systems - to help researchers to use community standards to describe their (meta)data at the source, and capitalize on their effort to accelerate the creation of a data article. The user will be guided to provide (semi)structured descriptions of the experimental design, and of the post-processed data, to generate, respectively, the Methods and a set of statements to populate the Results section of a manuscript. Datascriptor will work: (i) as a stand-alone tool - for anyone to use - implementing generic metadata models, such as W3C Data Catalog vocabulary; and (ii) as a component of the ISA Tools - for its user communities - implementing the ISA metadata model. To output short sentences from the (semi)structured input, we will evaluate a mixed data-to-text approach using template-based and neural-based (i.e. machine learning) methods. To further enrich the content of the manuscript, Datascriptor will connect to existing authoring systems, including Substance, Texture, Stenci.la and Manuscripts, and export the result in JATS format. Our plans also include an export as a DAR file and in LaTeX format. 
Type Of Technology Webtool/Application 
Year Produced 2019 
Open Source License? Yes  
Impact Work has just started, but to ensure continued impact in the stakeholder community, the Datascriptor User Advisory Board includes a core group of existing collaborators: Thomas Lemberger (EMBO Press), Scott Edmunds (GigaScience), Holly Murray (F1000), Varsha Khodiyar (Springer Nature). 
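The template-based half of the data-to-text approach planned for Datascriptor can be illustrated with Python's standard `string.Template`. A minimal sketch, assuming hypothetical metadata keys (`n_samples`, `organism`, `technology`); it is not Datascriptor's code and leaves out the neural-based component entirely:

```python
from string import Template

# Hypothetical Methods-section template; each placeholder names a metadata field.
METHODS_TEMPLATE = Template(
    "A total of $n_samples $organism samples were collected and assayed by $technology."
)


def describe_study(metadata: dict) -> str:
    """Render one Methods sentence; raises KeyError if a required field is missing."""
    return METHODS_TEMPLATE.substitute(metadata)


print(describe_study({"n_samples": 24, "organism": "Triticum aestivum", "technology": "RNA sequencing"}))
```

Using `substitute` rather than `safe_substitute` makes missing metadata fail loudly instead of leaving a literal `$placeholder` in the manuscript text.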
 
Title Deposition workflow to Dataverse Repository 
Description In collaboration with the CGIAR centers, we developed a workflow for depositing heterogeneous data into the Dataverse repository along with appropriate metadata extracted from CGCore metadata fragments. Dataverse is developed by Harvard University and is widely used in academia to deposit and disseminate research objects. 
Type Of Technology Webtool/Application 
Year Produced 2019 
Open Source License? Yes  
Impact This will allow COPO to be the main access point for metadata annotation and data deposition for the CGIAR institutes. CGIAR is a major conglomeration of research stations around the globe, responsible for many agricultural advances in the developing world since the end of the Second World War. From this work, we can expect to broker tens of thousands of documents, datasets or other research objects from CGIAR researchers. 
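Dataverse's native API accepts a dataset as JSON in which each field of the `citation` metadata block carries `typeName`, `multiple`, `typeClass` and `value`. A minimal sketch of building the title and author fields from a hypothetical fragment with `title` and `authors` keys; a real deposit also needs other required citation fields (for example a dataset contact, description and subject), and this is not the actual COPO mapping:

```python
import json


def primitive_field(type_name: str, value: str) -> dict:
    """A single-valued Dataverse field such as the dataset title."""
    return {"typeName": type_name, "multiple": False, "typeClass": "primitive", "value": value}


def author_field(names: list[str]) -> dict:
    """The compound, repeatable 'author' field; each entry wraps an authorName."""
    return {
        "typeName": "author",
        "multiple": True,
        "typeClass": "compound",
        "value": [{"authorName": primitive_field("authorName", name)} for name in names],
    }


def build_dataset(fragment: dict) -> dict:
    """Assemble a (partial) dataset JSON body from a hypothetical metadata fragment."""
    fields = [primitive_field("title", fragment["title"]), author_field(fragment["authors"])]
    return {"datasetVersion": {"metadataBlocks": {"citation": {"fields": fields}}}}


print(json.dumps(build_dataset({"title": "Wheat yield trial", "authors": ["Doe, Jane"]}), indent=2))
```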
 
Title Dublin Core Integration 
Description A Dublin Core schema and an implementing wizard were created, allowing researchers to record metadata for their outputs in this community-standard format. 
Type Of Technology Webtool/Application 
Year Produced 2018 
Open Source License? Yes  
Impact Some users have reported using this feature. Dublin Core is a well-recognized community standard used by many data producers and repositories. 
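The Dublin Core Metadata Element Set defines 15 elements, all optional and repeatable, so a basic sanity check on a record is simply that every key is one of them. A minimal sketch, assuming records are held as a dict of element name to list of values (a hypothetical representation, not COPO's internal model):

```python
# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DCMES_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})


def unknown_elements(record: dict[str, list[str]]) -> list[str]:
    """Return, sorted, any keys in the record that are not Dublin Core elements."""
    return sorted(key for key in record if key not in DCMES_ELEMENTS)


record = {"title": ["Wheat yield trial"], "creator": ["Doe, Jane"], "author": ["Doe, Jane"]}
print(unknown_elements(record))
```

Here `author` is flagged because Dublin Core expresses authorship through the `creator` element.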
 
Title ISA API 
Description Released under the Common Public Attribution License Version 1.0 (CPAL) license, the Investigation Study Assay (ISA) API aims to provide developers with a set of tools to enable the programmatic construction of ISA objects, validation of objects, and conversion between serialisations of ISA-formatted datasets and other formats/schemas (e.g. data deposition schemas). To facilitate the use of the ISA model (see the ISA-Tab specification - http://www.isa-tools.org/format/specification/) in modern web applications, the model (version 1.0) is represented as a set of JSON schemas, which provide the information the ISA model maintains for each of the objects. JSON is a widely used interchange format that powers much of the web today, and is used by a range of programming languages and platforms. As such, the objective of designing and developing JSON schemas is to support a new serialisation of the ISA-Tab model in JSON format, in addition to existing serialisations in Tabular format and RDF format. The new JSON models can be found here: https://github.com/ISA-tools/isa-api/tree/master/isatools/schemas/isa_model_version_1_0_schemas/core 
Type Of Technology Software 
Year Produced 2015 
Open Source License? Yes  
Impact The ISA API is used in a number of projects arising in collaboration with the Oxford eResearch Centre (OERC), notably the COPO project, and is under continued development. 
URL https://github.com/ISA-tools/isa-api
 
Title ISA Model and Serialization 
Description The original ISA-Tab specification was published as a Release Candidate document in 2008, documenting the initial work that forms the ISA framework, with a further update in 2009. Since then, we have done work on a new serialization in JSON, ISA-JSON, and abstracted out the data model from both the tabular and JSON formats. 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact Serialisations implemented by several ISA components; the documentation also helps other users to implement ISA formats. 
URL http://isa-tools.org/2016/10/release-of-the-isa-specs/
 
Title ISA Python API 
Description The ISA API aims to provide software developers with a set of tools to help them easily and quickly build their own ISA objects, validate them, and convert between serializations of ISA-formatted datasets and other formats/schemas (e.g. SRA schemas). The ISA API is published on PyPI as the isatools package. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact The vision for the ISA API is to provide a programming library that will become the core for all software tooling that supports the ISA framework. It enables the import of various data formats into an implementation of the ISA Abstract Model as Python objects, and export of ISA content from Python objects back to different serialization formats. 
URL http://isa-tools.org/2017/01/isa-api-milestone/
 
Title ISA software suite (built iteratively, component by component) 
Description The open source ISA framework and tools help to manage an increasingly diverse set of life science, environmental and biomedical experiments that employ one or a combination of technologies. Built around the 'Investigation' (the project context), 'Study' (a unit of research) and 'Assay' (analytical measurement) data model and serializations (tabular, JSON and RDF), the ISA framework helps you to provide rich description of the experimental metadata (i.e. sample characteristics, technology and measurement types, sample-to-data relationships) so that the resulting data and discoveries are reproducible and reusable. 
Type Of Technology Software 
Year Produced 2010 
Open Source License? Yes  
Impact A growing number of users, as listed at http://isacommons.org, as well as co-developers who have contributed and continue to contribute collaborative enhancements. 
URL http://isa-tools.org/
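The Investigation, Study and Assay nesting at the heart of the ISA model maps naturally onto nested objects, which in turn serialise to JSON. A minimal sketch using standard-library dataclasses, assuming simplified snake_case fields; the official ISA-JSON schemas use different key names and many more fields, and the `isatools` package provides the real model classes:

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Assay:
    filename: str
    measurement_type: str
    technology_type: str


@dataclass
class Study:
    identifier: str
    title: str
    assays: list[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    identifier: str
    title: str
    studies: list[Study] = field(default_factory=list)


inv = Investigation(
    "I1", "Drought response",
    [Study("S1", "Field trial", [Assay("a_rna.txt", "transcription profiling", "nucleotide sequencing")])],
)
# asdict() recurses through the nested dataclasses, giving a JSON-ready dict.
print(json.dumps(asdict(inv), indent=2))
```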
 
Title ISA-API Python library 
Description Project name: ISA-API; Project home page: http://github.com/ISA-tools/isa-api; Operating system(s): Platform independent; Programming language: Python 3; Other requirements: None; License: CPAL-1.0. ISA-API is a Python library that supports the creation, editing, parsing, and validation of both ISA-Tab and ISA-JSON formats, using a common data model implemented as native Python objects. 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact This provides users with a common interface and interoperable medium between the two ISA formats, as well as conversion to a set of other formats required for depositing data in public databases. 
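ISA-Tab investigation files are tab-delimited, with all-caps section headers (such as `INVESTIGATION` and `STUDY`) followed by rows whose first cell is a label and whose remaining cells are values. A simplified sketch of reading that layout with the standard `csv` module; it is not the ISA-API parser and ignores the study/assay sample tables and all validation:

```python
import csv
import io


def parse_investigation(text: str) -> list[tuple[str, dict[str, list[str]]]]:
    """Split a simplified ISA-Tab investigation file into (section, fields) pairs."""
    sections: list[tuple[str, dict[str, list[str]]]] = []
    # csv handles the tab delimiter and strips double quotes around values.
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        label = row[0].strip()
        if len(row) == 1 and label.isupper():
            # An all-caps label on its own line opens a new section, e.g. "STUDY".
            sections.append((label, {}))
        elif sections:
            # Remaining cells are the values for this row label.
            sections[-1][1][label] = row[1:]
    return sections


sample = "INVESTIGATION\nInvestigation Identifier\tI1\nInvestigation Title\t\"Wheat trial\"\n\nSTUDY\nStudy Identifier\tS1\n"
print(parse_investigation(sample))
```

Sections are kept as an ordered list rather than a dict because an investigation file can contain several `STUDY` blocks.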
 
Description 1st COPO user workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Study participants or study members
Results and Impact The Collaborative Open Plant Omics (COPO) consortium workshop brought together a focus group, comprising a small number of experts for 2 days, with an active interest in collecting and managing plant data. During the workshop, we discussed approaches to the description, collection, annotation, standardisation and management of (large) datasets, including requirements for submission to public repositories, current user needs and stumbling blocks. The workshop enabled us to better understand the needs of end users and to generate an overview of how, and what types of datasets, plant biologists are currently generating. This information has helped to guide the COPO consortium as it develops its community platform for data publication and citation.
Year(s) Of Engagement Activity 2015
URL http://blog.garnetcommunity.org.uk/copo-2015-meeting/
 
Description 2nd COPO User Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The Collaborative Open Plant Omics (COPO) consortium workshop brought together a focus group, comprising a small number of experts for 2 days, with an active interest in collecting and managing plant data. During the workshop, we demonstrated the new COPO portal and metadata collection layers of the software, discussed approaches to the description, collection, annotation, standardisation and management of (large) datasets, including requirements for submission to public repositories, current user needs and stumbling blocks.

The workshop enabled us to better understand the needs of end users and to deliver feedback to the COPO partners about gaps and recommended software features. This information has helped to guide the COPO consortium as it develops its community platform for data publication and citation.
Year(s) Of Engagement Activity 2016
URL http://copo-project.org/agenda_workshop2.html
 
Description Advanced statistical modelling and machine learning for molecular biology 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Workshop to look for synergies between the disciplines of AI and molecular biology.
Year(s) Of Engagement Activity 2019
 
Description BBSRC/NSF/ERA-CAPS workshop on Challenges and Opportunities in Plant Science Data Management at PAG 2015, San Diego, CA, US 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The workshop intended to bring the plant data science community together to discuss data, metadata, tools and policy. Across the plant sciences, data of varying types are being generated, analysed and shared on a daily basis using a wide variety of tools, terminologies and formats. However, without mechanisms to encourage researchers to describe their materials, data, assays, and experimental procedures in an agreed manner, it is almost impossible to discover and compare datasets. The workshop showcased researchers working in the areas of data and experimental description, data identification and discovery, data integration and data citation.

Dr Davey presented COPO to the audience, and it was well received. Most notable was the opportunity to discuss the platform with co-speaker Dr Walls, a member of the University of Arizona iPlant/CyVerse analyst team. She was very interested in COPO as a potential platform to wrap into CyVerse as a brokering mechanism for their Data Commons. This collaboration is ongoing.

Specific information about the presentation can be found here: https://pag.confex.com/pag/xxiv/meetingapp.cgi/Paper/22219
Year(s) Of Engagement Activity 2015
URL https://pag.confex.com/pag/xxiv/meetingapp.cgi/Session/3732
 
Description BBSRC/NSF/ERACAPS Closed Forum Workshop on Plant Phenotyping at Plant and Animal Genomes (PAG) Conference 2015, San Diego, CA, US 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Policymakers/politicians
Results and Impact BBSRC/NSF/ERACAPS Closed Forum Workshop on Plant Phenotyping at Plant and Animal Genomes (PAG) Conference 2015

Topic areas for breakout groups: (1) Sensors - development and technologies, (2) Standards, and (3) Interoperability.

Questions to consider for each topic area:
What are the major strengths, weaknesses and opportunities?
What is the biggest challenge facing this area? What would be needed to meet this challenge in order to revolutionize the area and have a lasting impact?
Are new tools, technologies or resources needed?
Does progress require policy or social change?
What education and training is required to meet these challenges?

A white paper is in preparation to document the outcomes of the round table workshop.
Year(s) Of Engagement Activity 2015
 
Description Biohackathon; ELIXIR, Paris 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The team participated in several tracks, working in particular on ISA for the plant and metabolomics communities, on ISA for use in Galaxy, and on the Bioschemas work. The work carried out continues to embed ISA and FAIRsharing into ELIXIR-driven infrastructure and activities.
Year(s) Of Engagement Activity 2018
URL https://www.elixir-europe.org/events/biohackathon-2018-paris
 
Description Bioinformatics for Plant Biology 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Hosted by the European Bioinformatics Institute (EBI), the workshop aimed to provide an introduction to resources and tools that help manage, discover and analyse genomics data, with a particular focus on cereal genomics. Within this context, an in-depth training session on COPO as a data management tool was delivered.
Year(s) Of Engagement Activity 2018
URL https://www.ebi.ac.uk/training/events/2018/bioinformatics-plant-biology
 
Description Building infrastructure for open science - British Computer Society 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Invited speaker at the Advanced Programming Group annual Christmas lecture
Year(s) Of Engagement Activity 2015
URL http://www.bcs.org/category/18516
 
Description CGIAR Big Data Workshop - Naivasha 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Around 60 data managers, developers, and policy makers from the CGIAR research centers attended this workshop to discuss annual progress and direction. COPO was demonstrated and a group of test users was recruited.
Year(s) Of Engagement Activity 2018
URL https://bigdata.cgiar.org/event/cgiar-platform-for-big-data-in-agriculture-convention-2018-nairobi-k...
 
Description CGIAR Big Data in Agriculture Convention (Hyderabad) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Demonstration of the latest COPO features to CGIAR data managers.
Year(s) Of Engagement Activity 2019
URL https://bigdata.cgiar.org/hyderabad-2019/
 
Description COPO - A Metadata Platform for Brokering FAIR Data in the Life Sciences 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Approximately 100 attendees saw my talk at the Plant and Animal Genome (PAG) conference.
Year(s) Of Engagement Activity 2020
 
Description COPO 3rd User Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The EI COPO team organised and ran the 3rd COPO User Workshop as a dedicated training event, held as a satellite event to the Plant and Animal Genome (PAG) conference in January 2018. We hired conference facilities at the nearby Marriott hotel and ran a successful workshop for 15 international participants, showing recent developments to the platform and gathering feedback about potential improvements.
Year(s) Of Engagement Activity 2018
 
Description COPO live demonstration at PAG 2018 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Live demonstration of the COPO system during a Digital Tools and Resources stream at the PAG conference.
Year(s) Of Engagement Activity 2018
 
Description COPO: A Data Stewardship Platform for Plant Scientists - Computer Demo session (PAG XXV) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact We present Collaborative Open Plant Omics (COPO), a brokering service between plant scientists and public repositories, which enables management, aggregation and publication of research outputs described and integrated using linked data. COPO provides consolidated access to services and disparate information sources via a web interface and Application Programming Interfaces.
Users create profiles which represent a logical span of research, such as a grant funding round or PhD project. 'Research objects' comprising a broad spectrum of potential outputs (such as sequence data, images, manuscripts, source code, posters) can be uploaded into the profile. Annotation of these objects with community-supported standards is facilitated using simple user-interface wizards which aim to reduce the complexity of this task, supported by ISA components (http://isa-tools.org/) for metadata interoperability and automated metadata format conversion. COPO uses the Ontology Lookup Service (http://www.ebi.ac.uk/ols) to provide the crucial contextual metadata required for standardised data description. Currently, deposition of both data and metadata to the European Nucleotide Archive for sequence data, and Figshare for data types such as images, posters, and presentations is supported.

In the future, we will support more public repositories for multi-omic data submission, and users will be able to search for and pull such data into analysis environments such as CyVerse and Galaxy. We will subsequently track the outputs and associated metadata in COPO, thus creating a provenance trail from data to publication.

http://copo-project.org/

https://github.com/collaborative-open-plant-omics/COPO
Year(s) Of Engagement Activity 2017
URL https://pag.confex.com/pag/xxv/meetingapp.cgi/Paper/23791
 
Description COPO: Extending the frontiers of "FAIR" Data in Agriculture 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Demonstration of COPO features to Research Data Alliance delegates.
Year(s) Of Engagement Activity 2020
 
Description CUDDEL closing workshop/hackathon, EBI 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Closing workshop of the CUDDEL grant, following up on issues outstanding from the 2017 Hong Kong workshop, with discussions exploring the feasibility of a follow-up BBSRC Partnering application in the future.
Year(s) Of Engagement Activity 2018
URL https://github.com/ISA-tools/cuddel-mzml2isa-enhance
 
Description Challenges and Opportunities in Plant Science Data Management (Workshop, PAG 2019) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Dr Davey organised the PAG 2019 workshop Challenges and Opportunities in Plant Science Data Management alongside Carolyn Lawrence-Dill from Iowa State University, USA.
Year(s) Of Engagement Activity 2019
URL https://www.intlpag.org/2019/
 
Description Data Brokering for Plant Scientists (DivSeek partner's meeting, PAG 2018, San Diego) 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Delivered a lightning talk to promote the COPO data brokering platform at the annual DivSeek partner's meeting at PAG.
Year(s) Of Engagement Activity 2018
 
Description Data Stewardship in the Life Sciences 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact I spoke at the "Challenges and Opportunities in Plant Science Data Management" workshop on the subject of data management in the life sciences.

Open data and integrative data sharing are fundamental to addressing the challenges of modern data-intensive science. There is a clear need to develop and maintain community-focussed, semantically-aware data stewardship and management platforms, such as COPO, that can cope with the description and sharing of the potentially huge datasets arising from the life sciences. Once these data are made available, we cannot assume that researchers around the globe have the requisite skills and resources to analyse them. Therefore, we need to provide large-scale data analysis environments that are fit for purpose, such as CyVerse and Galaxy, incorporating state-of-the-art interfaces and programmatic layers to meet broad end-user requirements. Finally, this can only happen through community-led efforts to implement solutions for data standardisation, best practice, and FAIR data policy. We are only just starting to take advantage of groundbreaking opportunities to make integrated data a reality, and thus to enable scientists to store, manage and share their data as first-class citizens of the scientific process.
Year(s) Of Engagement Activity 2017
URL http://app.core-apps.com/pag_2017/event/e2bec353017762d275ce250c23e011e6
 
Description Data, Data, Data Everywhere (Pint of Science talk, Norwich) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Dr Davey delivered a talk as part of the Norwich 2017 Pint of Science series about the challenges and solutions for modern data management in the life sciences, including recent data developments, high-performance computing, and software tools.
Year(s) Of Engagement Activity 2017
URL https://pintofscience.co.uk/event/crops-crystals-and-computers-technology-for-food-security
 
Description DivSeek Partner's Meeting (PAG 2019) 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Dr Davey attended the DivSeek Partner's Meeting in the Courtyard Marriott hotel at PAG 2019.
Year(s) Of Engagement Activity 2019
 
Description Divseek Working Group - Data Standards for Interoperable Tools 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact As part of the "DivSeek - Addressing the challenges and opportunities for information and data sharing associated with plant germplasm" session at PAG, I spoke about the DivSeek Data Standards for Interoperable Tools Working Group. This WG will promote best practice in data sharing in the plant sciences, through the use of open and interoperable software powered by the adoption of open standards, i.e. programmatic interoperability standards (APIs), controlled vocabularies, trait dictionaries, metadata standards, and ontologies. We aim to highlight gaps in interoperability that impede workflows important to the communities supported by DivSeek partners, by liaising with research development groups, other DivSeek working groups, and consortia with relevance to DivSeek. We will educate and train data generators about standards and the tools and resources that use them, in order to promote and foster standards-compliance for long-term open data stewardship.
Year(s) Of Engagement Activity 2017
URL https://pag.confex.com/pag/xxv/meetingapp.cgi/Paper/26202
 
Description Down The Tubes! Talk at the Norwich Science Festival 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Dr Davey gave a talk on the internet and data science entitled "Down The Tubes!" at the 2018 Norwich Science Festival.
Year(s) Of Engagement Activity 2018
URL https://norwichsciencefestival.co.uk/events/down-the-tubes/
 
Description ELIXIR UK ALL HANDS 2018 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact The ELIXIR-UK All Hands Meeting provided updates on recent activities from the ELIXIR UK Node and ELIXIR Hub, alongside discussions of future resources, events and roadmapping breakouts. Dr Davey presented the COPO project and CyVerse UK infrastructure as UK-specific resources being developed as national infrastructure for UK researchers. There was much interest from the participants in both projects, and conversations at this event led to the submission of a BBSRC TRDF application with Gos Micklem (Cambridge), Dr Davey and Dr Shaw (EI).
Year(s) Of Engagement Activity 2018
URL http://www.earlham.ac.uk/elixir-all-hands-2018
 
Description ELIXIR-UK ALL-HANDS MEETING 2017 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact The ELIXIR-UK All Hands Meeting provided updates on recent activities from the ELIXIR UK Node and ELIXIR Hub, alongside discussions of future resources, events and roadmapping breakouts. Dr Davey presented the COPO project and CyVerse UK infrastructure as UK-specific resources being developed as national infrastructure for UK researchers. There was much interest from the participants in both projects, and conversations at this event led to the submission of a BBSRC TRDF application with Gos Micklem (Cambridge), Dr Davey and Dr Shaw (EI).
Year(s) Of Engagement Activity 2017
URL https://www.elixir-europe.org/events/elixir-uk-all-hands-meeting-2017
 
Description ELIXIR-UK AllHands meeting, Birmingham 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Showcased the latest work on FAIRsharing and ISA, and discussed how best to connect with other UK resources and with those of other ELIXIR Nodes.
Year(s) Of Engagement Activity 2018
URL https://elixiruknode.org/event/elixir-uk-all-hands-2018/
 
Description ENA Facilities Day - 2017 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Gave a demonstration to promote COPO, as a data brokering and standards promotion platform, at the European Nucleotide Archive (ENA) Facilities Day. The Facilities Day is a meeting of representatives of sequencing facilities to participate in discussions, and to share their views on the impact of ENA as a global sharing platform for sequencing data and how it promotes bioinformatics research. Participants are given the opportunity to exchange ideas directly with ENA project leaders, and influence future developments towards sustaining and improving the ENA service. The ENA is one of the repositories currently supported by the COPO platform.
Year(s) Of Engagement Activity 2017
URL https://www.ebi.ac.uk/ena/support/facilities-day
 
Description Earlham Institute - COPO Workshop 2019 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact COPO tutorial for EI staff.
Year(s) Of Engagement Activity 2020
 
Description ISA presentation to GARnet workshop 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact David Johnson, a member of my team, gave a presentation on "Data Infrastructures to Foster Data Reuse" at a workshop on Integrating Large Data into Plant Science: From Big Data to Discovery, hosted by GARnet (the UK network for Arabidopsis researchers) and Egenis (the Exeter Centre for the Study of the Life Sciences). The workshop was held at Dartington Hall in Devon, South West England, and was well attended by researchers from the worldwide plant and biological science community, as well as by industry representatives from organisations such as Syngenta.
Year(s) Of Engagement Activity 2016
URL http://isa-tools.org/2016/07/plant-science-takes-a-focus-on-isa/
 
Description Integrative Bioinformatics workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This event was hosted by Rothamsted Research, Harpenden, and brought together experts in bioinformatics, computer science, statistics, and computational and systems biology to discuss key science and technology platforms to integrate, align and model heterogeneous data types in order to generate meaningful insights from big data and complex biological systems. The event attracted a wide range of participants (and expertise) from both academia and industry. In particular, key players in the publishing sector (e.g., F1000, Elsevier) were also present to discuss their individual efforts towards making science more open. The workshop provided an excellent platform to showcase COPO and its drive to make research outputs more open and easily accessible.
Year(s) Of Engagement Activity 2018
URL https://ib2018.eventzilla.net/web/event?eventid=2138944931
 
Description Laying the Foundations; Why are Semantics in Agriculture Difficult? - PAG 2020 talk in Plant Phenotypes workshop 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Dr Davey gave an invited talk to approximately 90 attendees at the PAG 2020 workshop "Plant Phenotypes".
Year(s) Of Engagement Activity 2020
 
Description NERC DataTree 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Undergraduate students
Results and Impact Video to introduce the basic concepts of the FAIR principles, FAIR data management and FAIRsharing. The target audience for Data Tree is NERC-funded PhD students and early career researchers; however, Data Tree will be an openly available resource.
Year(s) Of Engagement Activity 2017
URL https://datatree.org.uk/
 
Description Nudge Norfolk Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Industry/Business
Results and Impact Workshop to discuss possible collaborations between COPO, AI and regional industrial partners such as farmers and retailers.
Year(s) Of Engagement Activity 2019
 
Description Ontology COP meeting presentation 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Discussions around integrating COPO with the Big Data Platform's ontology community of practice.
Year(s) Of Engagement Activity 2019
 
Description Organiser of Challenges and Opportunities in Plant Science Data Management PAG workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Co-organiser of the Challenges and Opportunities in Plant Science Data Management PAG workshop, at which 6 international speakers presented on various aspects of data management in the plant sciences. Approximately 50 people attended.
Year(s) Of Engagement Activity 2020
 
Description Poster presentation: ISAcreate and Galaxy; Galaxy conference, Portland 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The ISA-Tab format is now used by Galaxy tools; the discussion helped to ensure that this uptake continues.
Year(s) Of Engagement Activity 2018
URL https://gccbosc2018.sched.com/event/FEWs/g26-isacreate-a-galaxy-tool-for-prospective-data-management...
 
Description Preserving, Restoring and Managing Colombian Biodiversity Through Responsible Innovation - GROW Colombia UK workshop 2019 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Robert Davey gave a talk on the C3 Biodiversidad Consortium: Project Coordination and Website.
Year(s) Of Engagement Activity 2020
 
Description RDA Wheat Data Interoperability Working Group meeting, RDA Plenary, Barcelona 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The Wheat Data Interoperability Working Group aims to provide a common framework for describing, representing, linking and publishing Wheat data with respect to open standards. Such a framework will promote and sustain Wheat data sharing, reusability and interoperability. Specifying the Wheat linked data framework raises many questions: which (minimal) metadata should describe which type of data? Which vocabularies, ontologies and formats? Which good practices? Based mainly on the needs of the Wheat Initiative Information System (WheatIS) in terms of functionalities and data types, the working group will identify relevant use cases in order to produce a "cookbook" on how to produce "wheat data" that are easily shareable, reusable and interoperable. This meeting saw the Working Group mature into a Maintenance Group, marking the move from an inception phase to an implementation phase that promotes the outputs of the WG (the Wheat Data Interoperability guidelines) to users.
Year(s) Of Engagement Activity 2016
URL https://www.rd-alliance.org/group/agricultural-data-ig-igad-wheat-data-interoperability-wg-agriseman...
 
Description Support open science and FAIRness through an integrated collaborative platform for life science: CyVerse UK and hosted services 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact ELIXIR All Hands Meeting.
Year(s) Of Engagement Activity 2019
 
Description Talk delivered at the ELIXIR Biodiversity working group Inaugural meeting - Milan 2020 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Policymakers/politicians
Results and Impact Organised as part of a new ELIXIR working group to address challenges in biodiversity data management and infrastructure.
Year(s) Of Engagement Activity 2020
 
Description UKRI Artificial Intelligence and Machine Learning workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Policymakers/politicians
Results and Impact UKRI fact-finding workshop to ascertain how best to invest in and implement the UK AI strategy.
Year(s) Of Engagement Activity 2019
 
Description UKRI Darwin Tree of Life Project meeting, London 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Dr Davey travelled to London with other EI staff to discuss strategy for an SPF bid to UKRI for the UK Darwin Tree of Life Project.
Year(s) Of Engagement Activity 2018
 
Description Webinar given to Research Data Alliance 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Webinar given to Research Data Alliance members.
Year(s) Of Engagement Activity 2020
URL http://aims.fao.org/capacity-development/webinars/copo-extending-frontiers-data