COpenPlantOmics (COPO): a Collaborative Bioinformatics Plant Science Platform

Lead Research Organisation: University of Oxford
Department Name: Engineering Science

Abstract

We live in a digital age where we increasingly rely on interconnected resources in our daily lives. Biological science, due to the very nature of the complexity of worldwide research avenues, is typically fragmented. Even though scientific information is published in peer-reviewed articles, it is often badly described and, until very recently, often unavailable to the general public because of journal licensing issues and expensive subscription costs.

The field of bioinformatics (the analysis and management of biological data using computational methods) produces many freely available tools for data analysis and exposure that are incredibly useful to researchers. However, these tools often do not interoperate well, meaning that great effort is spent attempting to convert or tweak datasets to fit with other tools that further bioinformatics processes, hindering timely, accurate, reusable research. Couple this with the lack of descriptive information noted earlier, and knowledge that can be vital to one researcher, team or community can become at best unreproducible (preventing others from confirming findings) and at worst unusable.

Life scientists are people focused on investigating biological processes. This requires a lot of time, effort and fastidiousness in experimental observation, data collection and analysis. Typically for life scientists, more time is spent on the former: defining and publishing experimental methods and results. The latter, i.e. the data behind these results, is usually badly defined and largely unpublished. For computer scientists, the story is reversed - the focus is on getting to the data. This platform will bridge the gap between these two groups by providing tools and training to both life and computer scientists in the plant bioscience field, in order to help them get their data into the right formats and described uniformly for open research.

To do this, the management, interoperability and curation of scientific datasets is key. Researchers need clear guidance and help to:

- Manage their data in a concise relevant way that allows immediate reuse by others: Generating data is only one part of the picture. To back up scientific findings, data needs to be made available to others to allow the same degree of rigour and peer review that is enforced for published material. This is not an easy task because the tools and resources required to describe data well and to make data available are typically designed for the computer scientist.
- Analyse their data easily: Large software development projects like Galaxy provide access to complex analytical tools - we are not aiming to reinvent the wheel in this regard. We aim to engage and collaborate with these existing providers to develop and exploit interfaces to these specialised software projects, so as to let descriptive tools and analytical tools communicate efficiently.

This project will address these issues directly, providing tools for storing, annotating and sharing valuable information as well as promoting clear guidance and training. Overall this promises to be a major boost to UK plant sciences research.

This project aims to promote and build links between scientific knowledge and the tools used to generate that knowledge, addressing the lack of descriptive information about underlying data. By doing so, we will provide a platform comprising both existing tools and novel interoperability processes, allowing researchers easy access to methods of describing their work, feeding directly into analytical software, thus promoting clear and robust best practices in science.

Open science is vital to future generations of researchers, especially to realise the goals of transparent knowledge sharing. This project will remove the barriers that restrict researchers in making their findings freely available to everyone in a consolidated, seamless, easy-to-use fashion.

Technical Summary

Accessibility to biological data has been hindered by lack of standards, lack of awareness of the benefits and pathways to releasing data that is described by those standards, and lack of services whereby data can be analysed, published and retrieved easily. Recently, there has been a large commitment by the BBSRC to push for open access data and publishing to further bioscience research in the UK. However, barriers still exist that prevent scientists from openly depositing their data and metadata, chiefly a lack of interoperability between metadata annotation services, data repositories, data analysis platforms and data publishing platforms. As such, plant scientists might not: be aware that the services exist; have the expertise to use them; see the value in properly describing their data.
This project aims to build COPO, the software infrastructure required to reach the level of interoperability that plant researchers need to describe their data using community-recognised ontologies, move data seamlessly in both directions to and from relevant repositories, and then publish these data for open access. COPO will manage the hardware infrastructure at TGAC to deliver a consistent, robust staging area and database that will support unique accessioned artefacts representing the corpus of data and metadata a user wants to expose. The resulting marked-up datasets processed and published using COPO will allow greater potential for integrative analysis using existing tools such as iPlant and Galaxy.
New Application Programming Interfaces (APIs) will interconnect existing tools and services, and by developing new RESTful user interfaces that wrap up these APIs, COPO will be a single point-of-entry for plant researchers to disseminate their data all the way from generation to publication. By federating the TGAC iRODS data grid system with others, e.g. Texas Advanced Computing Center's iPlant installation, access to worldwide analytical infrastructure and data will be facilitated.

Planned Impact

Academic, Economic and Commercial Impacts
With the renewed interest and push from all areas of bioscience to promote publicly available research, the COPO project will be a pioneering national and international effort to facilitate sharing of all aspects of plant research to the public. COPO aims to be the vehicle to bring together the tools required to harmonise open plant omics research. This sector has obvious ties with industry. Public domain omics-based bioscience is relevant and important input into industry internal research and discovery activities. To make such bioscience data truly reusable and ensure scientific robustness, it must be uniformly annotated, allowing not only integration through equivalence of terminology but also by increasing efficiency in data production and re-use, and allowing correct interpretation by means of the context provided by their metadata. A collaborative platform for frictionless bioinformatics built with and for the academic and industrial community is long overdue. Alongside data processing, industry also works on finding solutions for integration and management of large 'omics data sets, e.g. efforts like the Pistoia Alliance. Together with COPO industry partners (Eagle Genomics) we will develop use-cases for the platform in industry, propose acceptance criteria required for commercial use, supply technical advice/support on meeting acceptance criteria, evaluate the platform on 3rd party infrastructure, and maximise knowledge exchange and commercialisation.

COPO and the standards community
Expertise and knowledge gained throughout the lifetime of the project and beyond will be disseminated through a variety of channels. The presence of a direct link with the plant science community (through GARNet, UK Plant Sciences Federation (UKPSF)) is key to the success and adoption of the platform and associated standards. The project will have a continuous dialogue, through face-to-face events as well as online tools and social media, between those working on the platform and the plant bioscience community. The several letters of support show a clear interest in working together, using and adopting a platform that implicitly confers standards compliance. COPO will provide a solution to overcome the challenges in standards fragmentation by (i) fostering development, acceptance and implementation of reporting standards that are immediately suitable for plant research, and (ii) limiting the range and variability of standards. This will have a direct impact on the development and maintenance costs for commercial and academic software developers of standards-compliant products.

Societal impacts
Historically there has been reluctance to adopt some of the standards and open-data principles in the plant bioscience community, especially in the field of food sustainability and security, so openness and transparency in these areas are vital to continue improving public perception. The presentation of the research data will play a key role in opening the dialogue with the general public and will contribute to the development of stronger links with sectors in society (such as school teachers) that are less familiar with the scientific activities in plant research and the beneficial impact this has on their lives. It is widely recognised that the shortage of expertise and skill in biomathematics and informatics across the world is a major risk to the future development of key areas in life sciences. The objectives of this proposal will help to attract talented staff to work with the COPO partners, and offer alternative career paths.

Publications

Amann RI (2019) Toward unrestricted use of public genomic data. in Science (New York, N.Y.)

Bandrowski A (2016) The Ontology for Biomedical Investigations. in PloS one

Chen X (2018) DataMed - an open source discovery index for finding biomedical datasets. in Journal of the American Medical Informatics Association : JAMIA

Emami Khoonsari P (2019) Interoperable and scalable data analysis with microservices: applications in metabolomics. in Bioinformatics (Oxford, England)

Etuk A (2018) COPO User Manual

Gonzalez-Beltran AN (2018) Data discovery with DATS: exemplar adoptions and lessons learned. in Journal of the American Medical Informatics Association : JAMIA

 
Description COPO is a portal for plant scientists to describe, store and retrieve data more easily, using community standards and public repositories that enable the open sharing of results. COPO is now in production, helping users through the data brokering process, as well as gathering feedback regarding improvements and bugs.
Exploitation Route The ISA software suite, partly used by COPO, is open source and reusable for other domains outside plant science. A list of user communities is maintained here: http://www.isacommons.org/
Sectors Agriculture, Food and Drink,Digital/Communication/Information Technologies (including Software),Education

URL http://copo-project.org/
 
Description The COPO infrastructure will have an impact and continue to increase the effectiveness of data sharing and reuse.
Sector Agriculture, Food and Drink,Digital/Communication/Information Technologies (including Software)
Impact Types Policy & public services

 
Description Advised Springer Nature on the data policy
Geographic Reach Multiple continents/international 
Policy Influence Type Membership of a guideline committee
URL http://www.springernature.com/gp/group/data-policy/
 
Description Co-authored a review commissioned by the Wellcome Trust focusing on interoperability standards for digital research outputs
Geographic Reach Multiple continents/international 
Policy Influence Type Membership of a guideline committee
URL https://figshare.com/articles/Review_Interoperability_standards/4055496
 
Description FAIRsharing is one of the elements mentioned in the "Framework for Discipline-specific Research Data Management" report by Science Europe.
Geographic Reach Europe 
Policy Influence Type Influenced training of practitioners or researchers
URL https://www.scienceeurope.org/wp-content/uploads/2018/01/SE_Guidance_Document_RDMPs.pdf
 
Description FAIRsharing is one of the resources recommended by the EU EOSC "Turning FAIR into Reality" report.
Geographic Reach Europe 
Policy Influence Type Influenced training of practitioners or researchers
 
Description FAIRsharing is one of the resources recommended by the UK Jisc "FAIR in Practice" report.
Geographic Reach National 
Policy Influence Type Influenced training of practitioners or researchers
 
Description EC - PHC-32-2014 - MultiMot
Amount € 100,000 (EUR)
Funding ID H2020-EU.3.1, 634107 
Organisation European Commission 
Department Horizon 2020
Sector Public
Country European Union (EU)
Start 08/2015 
End 07/2018
 
Description EC H2020 - INFRADEV-3-2015 - ELIXIR EXCELERATE
Amount € 240,000 (EUR)
Organisation European Commission 
Department Horizon 2020
Sector Public
Country European Union (EU)
Start 09/2015 
End 08/2019
 
Description EINFRA-2015-1 - PhenoMeNal
Amount € 600,000 (EUR)
Funding ID H2020-EU.1.4.1.3, 654241 
Organisation European Commission 
Department Horizon 2020
Sector Public
Country European Union (EU)
Start 09/2015 
End 08/2018
 
Description FAIRplus
Amount £3,996,150 (GBP)
Funding ID 802750 
Organisation European Commission 
Department Innovative Medicines Initiative (IMI)
Sector Public
Country Belgium
Start 01/2019 
End 01/2022
 
Description ISA-InterMine: accelerating and rewarding data sharing
Amount £1,174,660 (GBP)
Funding ID 208381/A/17/Z 
Organisation Wellcome Trust 
Sector Charity/Non Profit
Country United Kingdom
Start 08/2018 
End 07/2021
 
Title Continued improvements to the ISA toolkit 
Description Started in 2003 and first released in 2007, the ISA tools have been developed over time by the Oxford team and collaborators, or contributed directly by partners, via the ISA Commons collaborative community. Developments and achievements of the resource over the last year:
- Awarded Wellcome Trust funds (2018-2021) for a collaborative project with the University of Cambridge's InterMine team to link the two resources and reward researchers for annotating and publishing FAIR data; ISA is also embedded in two ELIXIR Implementation Studies, one on plant-focused data validation and one on metabolomics.
- With the uptake of the ISA-Galaxy tools (https://github.com/ISA-tools/isatools-galaxy) and integration with the Galaxy framework, ISA reached a major milestone by showcasing how prospective data management can be done, demonstrating a full deposition workflow to MetaboLights and creating training material (10.7490/f1000research.1115757.1).
- Jupyter notebooks (https://github.com/ISA-tools/dtp-isa-exercises) have been developed as teaching material to showcase the use of the ISA-API in various contexts in undergraduate and postgraduate courses on data readiness.
Type Of Material Improvements to research infrastructure 
Year Produced 2018 
Provided To Others? Yes  
Impact Community use and impact is tracked via the ISA Commons, which currently has over 40 international groups, projects, and organizations that use and contribute to the development of components of the ISA metadata tracking framework. The ISA user base therefore ranges from hundreds to thousands of researchers from increasingly diverse domains (including -omics, cell-based research, biomedical nanotechnology, plant phenotyping, toxicology, biodiversity, metagenomics, stem cell research, systems biology, neuroscience, microbial science and immunology), and extends beyond researchers, curators, other resource developers and service providers to include journals. For example, ISA is used by Oxford University Press's GigaScience and underpins Springer Nature's data journal Scientific Data, supporting intelligent data sharing and credit; ISA is used to describe the experiment and to provide browse and search functionality for Scientific Data's content (http://scientificdata.isa-explorer.org). The ISA framework is currently embedded in a number of UK, EC, NIH and pharma-funded infrastructure and research projects; here are exemplars from the ELIXIR UK Node and other Nodes:
- EMBL-EBI MetaboLights' new web-based submission relies on the ISA-JSON format to build web components and on the ISA-API to validate and convert experiments represented as ISA objects.
- The BBSRC-funded COPO infrastructure relies on the ISA API, ISA-JSON serialization and the ISA configurations to support plant-based molecular profiling experiments; it also used the ISAconverter to deposit to the ENA database.
- ELIXIR-UK Node partners, the University of Birmingham and Imperial College London, use the ISA Galaxy Tools, ISA-API and ISA validator, as part of their work in the UK Phenome Centre, to collect data prospectively and to organise public deposition to repositories.
- The ELIXIR Plant Community's MIAPPE standards and BrAPI rely on the availability of ISA parsers and validation tools in the context of data validation programs.
URL http://isa-tools.org
 
Title Continued improvements to the ISA toolkit and the new Datascriptor component 
Description Started in 2003 and first released in 2007, the ISA tools (http://isa-tools.org) have been developed over time by the Oxford team and collaborators, or contributed directly by partners, via the ISA Commons collaborative community (https://www.isacommons.org). Key work over the last year is the development of a new component, the Datascriptor (https://datascriptor.org), as part of the Wellcome Trust award (2018-2021), a collaborative project with the University of Cambridge's InterMine team. Leveraging our experience and links with the communities, we are designing an open-source web-based tool, part of an ecosystem of existing annotation and authoring systems, to help researchers use community standards to describe their (meta)data at the source, and capitalize on their effort to accelerate the creation of a data article. In addition, major advances have been made to the ISA API, also working with the ELIXIR Plant and Metabolomics communities.
Type Of Material Improvements to research infrastructure 
Year Produced 2019 
Provided To Others? Yes  
Impact Community use and impact is tracked via the ISA Commons, which currently has over 40 international groups, projects, and organizations that use and contribute to the development of components of the ISA metadata tracking framework. The ISA user base therefore ranges from hundreds to thousands of researchers from increasingly diverse domains (including -omics, cell-based research, biomedical nanotechnology, plant phenotyping, toxicology, biodiversity, metagenomics, stem cell research, systems biology, neuroscience, microbial science and immunology), and extends beyond researchers, curators, other resource developers and service providers to include journals. For example, ISA is used by Oxford University Press's GigaScience and underpins Springer Nature's data journal Scientific Data, supporting intelligent data sharing and credit; ISA is used to describe the experiment and to provide browse and search functionality for Scientific Data's content (http://scientificdata.isa-explorer.org). The ISA framework is currently embedded in a number of UK, EC, NIH and pharma-funded infrastructure and research projects; here are exemplars from the ELIXIR UK Node and other Nodes: (i) EMBL-EBI MetaboLights' new web-based submission relies on the ISA-JSON format to build web components and on the ISA-API to validate and convert experiments represented as ISA objects. (ii) The BBSRC-funded COPO infrastructure relies on the ISA API, ISA-JSON serialization and the ISA configurations to support plant-based molecular profiling experiments; it also used the ISAconverter to deposit to the ENA database. (iii) ELIXIR-UK Node partners, the University of Birmingham and Imperial College London, use the ISA Galaxy Tools, ISA-API and ISA validator, as part of their work in the UK Phenome Centre, to collect data prospectively and to organise public deposition to repositories. (iv) The ELIXIR Plant Community's MIAPPE standards and BrAPI rely on the availability of ISA parsers and validation tools in the context of data validation programs.
URL https://datascriptor.org
 
Title MIAPPE specification and tools 
Description Minimum Information About a Plant Phenotyping Experiment (MIAPPE) is an open, community-driven project to harmonize data from plant phenotyping experiments. The MIAPPE specification comprises both a conceptual checklist of metadata required to adequately describe a plant phenotyping experiment, and software to validate, store and disseminate MIAPPE-compliant data.
Type Of Material Improvements to research infrastructure 
Year Produced 2017 
Provided To Others? Yes  
Impact MIAPPE is a logical standard, but there are specific implementations of tools designed to support its use and application, for example in the ISA-tools framework. We are working with the developers of the Plant Breeding API (BrAPI) to ensure the compliance of BrAPI with the MIAPPE standard, and to coordinate future developments.
URL http://www.miappe.org/
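
The idea of a metadata checklist with accompanying validation can be illustrated with a minimal sketch. It is not the MIAPPE software: the field names in REQUIRED_FIELDS, the function missing_fields and the example record are all hypothetical and chosen only for illustration; the real checklist defines its own fields.

```python
# Hypothetical required fields; the real MIAPPE checklist defines its own.
REQUIRED_FIELDS = ("investigation_title", "study_start_date", "biological_material_id")


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank in a metadata record."""
    missing = []
    for field in required:
        value = record.get(field)
        # Treat a missing key, None, or a whitespace-only value as "not provided".
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Made-up record: one field filled in, one blank, one absent.
example_record = {
    "investigation_title": "Example investigation",
    "study_start_date": "",
}
print(missing_fields(example_record))
```

A record would be considered checklist-compliant in this sketch when the returned list is empty; a real validator would also check value formats and links between entities.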
 
Description ELIXIR Interoperability Platform and FAIRsharing 
Organisation ELIXIR
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution Run by Prof. Sansone's group, FAIRsharing (https://fairsharing.org) is a resource on standards, repositories, and data policies endorsed by a growing number of stakeholder communities, including major publishers, funders, libraries and FAIR-supporting organizations. FAIRsharing is part of the ELIXIR Recommended Interoperability Resources (RIRs) to facilitate interoperability and reusability of life science data and support the principles of FAIR data management.
Collaborator Contribution The ELIXIR Recommended Interoperability Resources have been selected by an external panel of reviewers, based on the selection criteria published in the Call for RIR application, which measure how they facilitate scientific research and how they improve FAIRness of life science data.
Impact FAIRsharing is and will continue to be used by and further linked to other ELIXIR registries and services.
Start Year 2018
 
Description ELIXIR Interoperability Platform and ISA 
Organisation ELIXIR
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution ISA is part of the ELIXIR Recommended Interoperability Resources (RIRs) to facilitate interoperability and reusability of life science data and support the principles of FAIR data management.
Collaborator Contribution The ELIXIR Recommended Interoperability Resources have been selected by an external panel of reviewers, based on the selection criteria published in the Call for RIR application, which measure how they facilitate scientific research and how they improve FAIRness of life science data.
Impact ISA is and will continue to be used by and further developed with ELIXIR communities, especially with Plant and Metabolomics use cases.
Start Year 2018
 
Description ELIXIR Metabolomics Community 
Organisation ELIXIR
Department ELIXIR UK
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution My team has contributed ISA-related work to the ELIXIR Metabolomics use case, activities and reports.
Collaborator Contribution We have gained more visibility for the ISA work and now ISA-Tab is a formal format used by the Galaxy analysis toolkit for metabolomics applications.
Impact The ISA framework serves as the basis for the metadata standards used by the ELIXIR Metabolomics Community, and the tools are embedded in the EBI MetaboLights database, as well as in other international metabolomics resources.
Start Year 2017
 
Description ELIXIR Plant Use Case 
Organisation ELIXIR
Department ELIXIR UK
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution My team has contributed ISA-related work to the ELIXIR Plant Science use case, work and report.
Collaborator Contribution We have gained more visibility for the ISA work and COPO activities.
Impact ISA is used by BrAPI, and there is an ISA implementation of the MIAPPE specification.
Start Year 2016
 
Description ELIXIR UK Node 
Organisation Earlham Institute
Country United Kingdom 
Sector Academic/University 
PI Contribution Help create the ELIXIR UK Node
Collaborator Contribution Contribute to the creation of the ELIXIR UK Node
Impact Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized.
Start Year 2012
 
Description ELIXIR UK Node 
Organisation Heriot-Watt University
Country United Kingdom 
Sector Academic/University 
PI Contribution Help create the ELIXIR UK Node
Collaborator Contribution Contribute to the creation of the ELIXIR UK Node
Impact Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized.
Start Year 2012
 
Description ELIXIR UK Node 
Organisation Imperial College London
Country United Kingdom 
Sector Academic/University 
PI Contribution Help create the ELIXIR UK Node
Collaborator Contribution Contribute to the creation of the ELIXIR UK Node
Impact Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized.
Start Year 2012
 
Description ELIXIR UK Node 
Organisation Newcastle University
Country United Kingdom 
Sector Academic/University 
PI Contribution Help create the ELIXIR UK Node
Collaborator Contribution Contribute to the creation of the ELIXIR UK Node
Impact Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized.
Start Year 2012
 
Description ELIXIR UK Node 
Organisation Rothamsted Research
Country United Kingdom 
Sector Academic/University 
PI Contribution Help create the ELIXIR UK Node
Collaborator Contribution Contribute to the creation of the ELIXIR UK Node
Impact Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized.
Start Year 2012
 
Description ELIXIR UK Node 
Organisation University College London
Country United Kingdom 
Sector Academic/University 
PI Contribution Help create the ELIXIR UK Node
Collaborator Contribution Contribute to the creation of the ELIXIR UK Node
Impact Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized.
Start Year 2012
 
Description ELIXIR UK Node 
Organisation University of Birmingham
Country United Kingdom 
Sector Academic/University 
PI Contribution Help create the ELIXIR UK Node
Collaborator Contribution Contribute to the creation of the ELIXIR UK Node
Impact Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized.
Start Year 2012
 
Description ELIXIR UK Node 
Organisation University of Cambridge
Country United Kingdom 
Sector Academic/University 
PI Contribution Help create the ELIXIR UK Node
Collaborator Contribution Contribute to the creation of the ELIXIR UK Node
Impact Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized.
Start Year 2012
 
Description ELIXIR UK Node 
Organisation University of Dundee
Country United Kingdom 
Sector Academic/University 
PI Contribution Help create the ELIXIR UK Node
Collaborator Contribution Contribute to the creation of the ELIXIR UK Node
Impact Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized.
Start Year 2012
 
Description ELIXIR UK Node 
Organisation University of Edinburgh
Department Edinburgh Genomics
Country United Kingdom 
Sector Academic/University 
PI Contribution Help create the ELIXIR UK Node
Collaborator Contribution Contribute to the creation of the ELIXIR UK Node
Impact Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized.
Start Year 2012
 
Description ELIXIR UK Node 
Organisation University of Liverpool
Country United Kingdom 
Sector Academic/University 
PI Contribution Help create the ELIXIR UK Node
Collaborator Contribution Contribute to the creation of the ELIXIR UK Node
Impact Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized.
Start Year 2012
 
Description ELIXIR UK Node 
Organisation University of Manchester
Country United Kingdom 
Sector Academic/University 
PI Contribution Help create the ELIXIR UK Node
Collaborator Contribution Contribute to the creation of the ELIXIR UK Node
Impact Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized.
Start Year 2012
 
Description ELIXIR UK Node 
Organisation University of Oxford
Country United Kingdom 
Sector Academic/University 
PI Contribution Help create the ELIXIR UK Node
Collaborator Contribution Contribute to the creation of the ELIXIR UK Node
Impact Creation of a virtual entity that represents UK strengths in bioinformatics and provides a route for UK bioinformatics resources to participate in, and benefit from, ELIXIR. The Node is currently being formalized.
Start Year 2012
 
Description Hanna Cwiek - 2-month visit: MIAPPE and ISA 
Organisation Polish Academy of Sciences
Department Institute of Plant Genetics
Country Poland 
Sector Academic/University 
PI Contribution Members of my team, namely Philippe Rocca-Serra and Alejandra Gonzalez-Beltran, have assisted Hanna in her ISA-related work.
Collaborator Contribution Dr Hanna Cwiek from the Poznan Institute of Genetic Research in Poland (in Pawel Krajewski's team) visited my team to work on ISA and MIAPPE, helping to refine ISA tools relevant to plant science and COPO activities.
Impact Possible paper on the work done
Start Year 2017
 
Description ISA Commons 
Organisation ISA Commons
Sector Charity/Non Profit 
PI Contribution We have helped many users, service providers and other developers to implement one or more components of the ISA software suite at their site to fit their data needs.
Collaborator Contribution They have helped us to refine the ISA software suite, filling gaps and tuning it for certain data types.
Impact The ISA Commons is a growing ecosystem of institute-based repositories (e.g. USA NASA GeneLab Data Repository) and global repositories (e.g. EMBL-EBI MetaboLights), as well as data-driven journals (e.g. Springer Nature Scientific Data), that use the ISA formats and/or are powered by one or more components of the ISA software suite. It also includes grass-roots standards groups that build on the ISA data model and formats. The sustainability and maintenance of the ISA data model, formats and tools are guided by the ISA Working Group.
Start Year 2010
 
Description Integration of COPO and CGCore Schemas and Associated Repositories 
Organisation CGIAR
Country France 
Sector Charity/Non Profit 
PI Contribution We have developed a proof-of-concept platform to streamline metadata attribution and dataset deposition into CGIAR repositories using the BBSRC-funded COPO software. Drs Etuk and Shaw, two Research Software Engineers in the Davey group at Earlham Institute and the original core developers, have implemented various new features in COPO to allow CGIAR Data Managers to harmonise and streamline the submission of CG-relevant metadata and data into the CG digital data repositories. All software and infrastructure are hosted within the CyVerse UK cloud. We have:
- Implemented support for CG Core v.2.0 (http://repo.mel.cgiar.org/handle/20.500.11766/4764) metadata annotation of various data types, including publications, produced at the CGIAR institutes via the existing COPO wizard system.
- Implemented support for submission of annotated objects to institutional instances of the following repositories: DSpace (https://www.duraspace.org/dspace/), CKAN (https://ckan.org/) and Dataverse (https://dataverse.org/).
- Designed and implemented a mechanism within COPO that controls which users can submit to which repositories.
- Implemented support for the annotation of variables within data sets (i.e. column headings, experiment condition descriptors etc.) with terms and URIs from ontologies or controlled vocabularies/trait dictionaries (AGROVOC and GACS).
Collaborator Contribution CGIAR have provided coordination contributions with key members in the CG Centres to gather feedback on developed elements, as well as provided funds to allow a core CGCore metadata schema developer to travel to EI and work with Drs Etuk and Shaw to improve the CGCore schema.
Impact This collaboration has seen rapid development of key functionality in the COPO platform to support CG centre Data Managers. This has required technical skills to develop the software, biocuration expertise provided by CGIAR to improve and refine the CGCore metadata schema, ontology expertise from the Bioversity team in Montpellier, and coordination expertise from Dr Davey (EI) and Medha Devare (CGIAR). Software and Technical Products (Webtool/Application - Collaborative Open Plant Omics (COPO) (2017)): All software code developed is open source and can be found within the COPO GitHub repository: https://github.com/collaborative-open-plant-omics/COPO
Start Year 2018
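
For illustration only, the last two features listed above (annotating column headings with ontology terms, and controlling which users can submit to which repositories) can be sketched as follows. This is not COPO code: the lookup table, the placeholder example.org URIs, the user names and the function names are all hypothetical, and a real deployment would presumably query AGROVOC or GACS rather than a hard-coded dictionary.

```python
from typing import Dict, List, Optional, Set

# Hypothetical term table: normalised column heading -> term URI.
# The URIs are placeholders, not real AGROVOC or GACS identifiers.
TERM_LOOKUP: Dict[str, str] = {
    "plant height": "http://example.org/terms/plant-height",
    "grain yield": "http://example.org/terms/grain-yield",
}

# Hypothetical permissions: user -> repositories that user may submit to.
REPOSITORY_PERMISSIONS: Dict[str, Set[str]] = {
    "alice": {"dataverse", "ckan"},
    "bob": {"dspace"},
}


def annotate_columns(headings: List[str]) -> List[Dict[str, Optional[str]]]:
    """Pair each column heading with a term URI, or None if no term matches."""
    annotated = []
    for heading in headings:
        key = heading.strip().lower()
        annotated.append({"heading": heading, "uri": TERM_LOOKUP.get(key)})
    return annotated


def can_submit(user: str, repository: str) -> bool:
    """Return True only if the user is allowed to submit to the repository."""
    return repository in REPOSITORY_PERMISSIONS.get(user, set())


annotations = annotate_columns(["Plant Height", "Plot ID"])
```

Headings without a matching term are kept with a URI of None rather than dropped, so that a data manager could review them manually.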
 
Title Collaborative Open Plant Omics (COPO) 
Description COPO streamlines the process of data deposition to public repositories by hiding much of the complexity of metadata capture and data management from the end-user. The ISA infrastructure (www.isa-tools.org) is leveraged to provide the interoperability between metadata formats required for seamless deposition to repositories. COPO facilitates links to data analysis platforms such as CyVerse UK and Galaxy. Logical groupings of artefacts (e.g. PDFs, raw data, contextual supplementary information) relating to a body of work are stored in COPO collections and represented by common standards, which are publicly searchable. Bundles of multiple data objects can then be deposited directly into public repositories through COPO interfaces. This output represents the beta release of the COPO platform in 2017. 
Type Of Technology Webtool/Application 
Year Produced 2017 
Open Source License? Yes  
Impact COPO has been added to the ELIXIR-UK roadmap for ELIXIR core data services, and is currently being used by EI and JIC researchers to deposit real, large scale sequencing datasets into the European Nucleotide Archive. COPO is also being investigated as a potential data entry tool for the CGIAR Big Data project, and this will be explored in a joint EAGER submission with CIMMYT. COPO has also been selected to act as one of the data ingestion pipelines for data arising from the Designing Future Wheat programme, depositing open data into the Grassroots repository. COPO is also being included in grant submissions to assist vertebrate and wheat communities in effective metadata management. COPO runs within the CyVerse UK National Capability infrastructure. 
URL https://copo-project.org
 
Title Datascriptor 
Description From structured dataset to data article. Leveraging our experience and links with the communities, we are now designing an open-source web-based tool - part of an ecosystem of existing annotation and authoring systems - to help researchers to use community standards to describe their (meta)data at the source, and capitalize on their effort to accelerate the creation of a data article. The user will be guided to provide (semi)structured descriptions of the experimental design, and of the post-processed data, to generate, respectively, the Methods and a set of statements to populate the Results section of a manuscript. Datascriptor will work: (i) as a stand-alone tool - for anyone to use - implementing generic metadata models, such as W3C Data Catalog vocabulary; and (ii) as a component of the ISA Tools - for its user communities - implementing the ISA metadata model. To output short sentences from the (semi)structured input, we will evaluate a mixed data-to-text approach using template-based and neural-based (i.e. machine learning) methods. To further enrich the content of the manuscript, Datascriptor will connect to existing authoring systems, including Substance, Texture, Stenci.la and Manuscripts, and export the result in JATS format. Our plans also include an export as a DAR file and in LaTeX format. 
Type Of Technology Webtool/Application 
Year Produced 2019 
Open Source License? Yes  
Impact Work has just started, but to ensure continued impact in the stakeholder community, the Datascriptor User Advisory Board includes a core group of existing collaborators: Thomas Lemberger (EMBO Press), Scott Edmunds (GigaScience), Holly Murray (F1000), Varsha Khodiyar (Springer Nature). 
 
Title ISA API 
Description Released under the Common Public Attribution License Version 1.0 (CPAL), the Investigation Study Assay (ISA) API aims to provide developers with a set of tools to enable the programmatic construction of ISA objects, validation of objects, and conversion between serialisations of ISA-formatted datasets and other formats/schemas (e.g. data deposition schemas). To facilitate the use of the ISA model (see the ISA-Tab specification - http://www.isa-tools.org/format/specification/) in modern web applications, the model (version 1.0) is represented as a set of JSON schemas, which describe the information the ISA model maintains for each of its objects. JSON is a widely used interchange format that powers much of the web today, and is used by a range of programming languages and platforms. As such, the objective of designing and developing JSON schemas is to support a new serialisation of the ISA-Tab model in JSON format, in addition to existing serialisations in Tabular format and RDF format. The new JSON models can be found here: https://github.com/ISA-tools/isa-api/tree/master/isatools/schemas/isa_model_version_1_0_schemas/core 
Type Of Technology Software 
Year Produced 2015 
Open Source License? Yes  
Impact The ISA API is used in a number of projects arising in collaboration with the Oxford eResearch Centre (OERC), notably the COPO project, and is under continued development. 
URL https://github.com/ISA-tools/isa-api
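
As a rough illustration of what "programmatic construction of ISA objects" and "serialisation in JSON format" mean in practice, the sketch below builds a nested Investigation, Study and Assay structure and round-trips it through JSON using only the Python standard library. The class and field names are hypothetical simplifications chosen for this example; they are not the isatools API and do not follow the published ISA JSON schemas, which define many more properties.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Assay:
    # Hypothetical, simplified fields; the real ISA model defines many more.
    filename: str
    measurement_type: str


@dataclass
class Study:
    identifier: str
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    identifier: str
    title: str
    studies: List[Study] = field(default_factory=list)


def to_json(investigation: Investigation) -> str:
    """Serialise the nested objects to a JSON string."""
    return json.dumps(asdict(investigation), indent=2, sort_keys=True)


def from_json(text: str) -> Investigation:
    """Rebuild the nested objects from JSON produced by to_json."""
    raw = json.loads(text)
    studies = [
        Study(
            identifier=s["identifier"],
            title=s["title"],
            assays=[Assay(**a) for a in s["assays"]],
        )
        for s in raw["studies"]
    ]
    return Investigation(raw["identifier"], raw["title"], studies)


investigation = Investigation(
    identifier="inv-1",
    title="Example plant phenotyping investigation",
    studies=[
        Study(
            identifier="study-1",
            title="Example study",
            assays=[Assay("a_example.txt", "transcription profiling")],
        )
    ],
)

# Serialise to JSON and parse back into objects.
round_tripped = from_json(to_json(investigation))
```

Keeping one in-memory object model and treating each file format as a serialisation of it is the design idea described above; the same objects could in principle be written out in tabular or RDF form by additional functions.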
 
Title ISA Model and Serialization 
Description The original ISA-Tab specification was published as a Release Candidate document in 2008, documenting the initial work that forms the ISA framework, with a further update in 2009. Since then, we have done work on a new serialization in JSON, ISA-JSON, and abstracted out the data model from both the tabular and JSON formats. 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact Serialisations implemented by several ISA components; the documentation also helps other users to implement ISA formats. 
URL http://isa-tools.org/2016/10/release-of-the-isa-specs/
 
Title ISA Python API 
Description The ISA API aims to provide software developers with a set of tools to build ISA objects quickly and easily, validate them, and convert between serializations of ISA-formatted datasets and other formats/schemas (e.g. SRA schemas). The ISA API is published on PyPI as the isatools package. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact The vision for the ISA API is to provide a programming library that will become the core for all software tooling that supports the ISA framework. It enables the import of various data formats into an implementation of the ISA Abstract Model as Python objects, and export of ISA content from Python objects back to different serialization formats. 
URL http://isa-tools.org/2017/01/isa-api-milestone/
 
Title ISA tooling for the metabolomics community 
Description A new set of ISA software tools has been developed out of the EU H2020 PhenoMeNal: Large-Scale Computing for Medical Metabolomics project (http://phenomenal-h2020.eu/home). The ISA team has been contributing to the project since 2015, and has been collaborating on the development of user-facing, cloud-based data management and processing infrastructure in the project. The PhenoMeNal software includes a new set of ISA-related Galaxy workflow tools, as well as native support for the ISA-Tab format in Galaxy. 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact The tools work with the EBI MetaboLights database as well as with ISA-Tab studies uploaded directly into the Galaxy platform, and build on the Python ISA-API. MetaboLights also uses the ISA-API in a Python-based REST service: https://github.com/EBI-Metabolights/MtblsWS-Py 
URL http://isa-tools.org/2018/03/isa-galaxy-developed-for-metabolomics/
 
Title ISA-API Python library 
Description Project name: ISA-API; Project home page: http://github.com/ISA-tools/isa-api; Operating system(s): Platform independent; Programming language: Python 3; Other requirements: None; License: CPAL-1.0. ISA-API is a Python library that supports the creation, editing, parsing and validation of both ISA-Tab and ISA-JSON formats, using a common data model implemented as native Python objects. 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact This provides users with a common interface and interoperable medium between the two ISA formats, as well as conversion to a set of other formats required for depositing data in public databases. 
 
Description 1st COPO user workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Study participants or study members
Results and Impact The Collaborative Open Plant Omics (COPO) consortium workshop brought together, for 2 days, a focus group comprising a small number of experts with an active interest in collecting and managing plant data. During the workshop, we discussed approaches to the description, collection, annotation, standardisation and management of (large) datasets, including requirements for submission to public repositories, current user needs and stumbling blocks. The workshop enabled us to better understand the needs of end users and to generate an overview of how plant biologists are currently generating datasets, and of what types. This information has helped to guide the COPO consortium as it develops its community platform for data publication and citation.
Year(s) Of Engagement Activity 2015
URL http://blog.garnetcommunity.org.uk/copo-2015-meeting/
 
Description 2nd COPO User Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The Collaborative Open Plant Omics (COPO) consortium workshop brought together, for 2 days, a focus group comprising a small number of experts with an active interest in collecting and managing plant data. During the workshop, we demonstrated the new COPO portal and metadata collection layers of the software, and discussed approaches to the description, collection, annotation, standardisation and management of (large) datasets, including requirements for submission to public repositories, current user needs and stumbling blocks.

The workshop enabled us to better understand the needs of end users and to deliver feedback to the COPO partners about gaps and recommended software features. This information has helped to guide the COPO consortium as it develops its community platform for data publication and citation.
Year(s) Of Engagement Activity 2016
URL http://copo-project.org/agenda_workshop2.html
 
Description Biohackathon; ELIXIR, Paris 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The team participated in several tracks, working in particular on ISA for the plant and metabolomics communities and for use in Galaxy, and on the Bioschemas work. The work carried out continues to embed ISA and FAIRsharing into ELIXIR-driven infrastructure and activities.
Year(s) Of Engagement Activity 2018
URL https://www.elixir-europe.org/events/biohackathon-2018-paris
 
Description COPO 3rd User Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The EI COPO team organised and ran the 3rd COPO User Workshop as a dedicated training event, hosted as a satellite event to the Plant and Animal Genome (PAG) conference in January 2018. We hired conference facilities at the nearby Marriott hotel, and ran a successful workshop for 15 international participants to show recent developments to the platform and to gather feedback about potential improvements.
Year(s) Of Engagement Activity 2018
 
Description CUDDEL closing workshop/hackathon, EBI 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Closing workshop of the CUDDEL grant, following up on issues outstanding from the 2017 Hong Kong workshop; discussion to explore the feasibility of making a follow-up BBSRC Partnering application in the future.
Year(s) Of Engagement Activity 2018
URL https://github.com/ISA-tools/cuddel-mzml2isa-enhance
 
Description ELIXIR-UK ALL-HANDS MEETING 2017 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact The ELIXIR-UK All Hands Meeting provided updates on recent activities from the ELIXIR UK Node and ELIXIR Hub, alongside discussions of future resources, events and roadmapping breakouts. Dr Davey presented the COPO project and CyVerse UK infrastructure as UK-specific resources that were being developed as national infrastructure for UK researchers. There was much interest from the participants in both projects, and conversations at this event led to the submission of a BBSRC TRDF with Gos Micklem (Cambridge), Dr Davey and Dr Shaw (EI).
Year(s) Of Engagement Activity 2017
URL https://www.elixir-europe.org/events/elixir-uk-all-hands-meeting-2017
 
Description ELIXIR-UK AllHands meeting, Birmingham 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Showcasing latest work on FAIRsharing and ISA, as well as discussing how to best connect with other UK resources and those from other Nodes.
Year(s) Of Engagement Activity 2018
URL https://elixiruknode.org/event/elixir-uk-all-hands-2018/
 
Description ISA presentation to GARnet workshop 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact David Johnson - in my team - gave a presentation on "Data Infrastructures to Foster Data Reuse" at a workshop on Integrating Large Data into Plant Science: From Big Data to Discovery hosted by GARnet (the UK network for Arabidopsis researchers) and Egenis (the Exeter Centre for the Study of the Life Sciences). The workshop was held at Dartington Hall in Devon, South West England, and was well attended by researchers from the plant and biological science community worldwide as well as representatives from industry from organisations such as Syngenta.
Year(s) Of Engagement Activity 2016
URL http://isa-tools.org/2016/07/plant-science-takes-a-focus-on-isa/
 
Description NERC DataTree 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Undergraduate students
Results and Impact Video to introduce the basic concepts of the FAIR principles, FAIR data management and FAIRsharing. The target audience for Data Tree is NERC-funded PhD students and early career researchers; however, Data Tree will be an openly available resource.
Year(s) Of Engagement Activity 2017
URL https://datatree.org.uk/
 
Description Poster presentation: ISAcreate and Galaxy; Galaxy conference, Portland 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The ISA-Tab format is now used by Galaxy tools; the discussion helped to ensure that uptake continues.
Year(s) Of Engagement Activity 2018
URL https://gccbosc2018.sched.com/event/FEWs/g26-isacreate-a-galaxy-tool-for-prospective-data-management...
 
Description The ELIXIR Plant Use Case - BRAPI meeting 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Ensure the use of ISA formats in the BrAPI API, which is part of the ELIXIR Plant Use Case and will connect plant-related ELIXIR Node repositories. This will benefit the ISA-compliant COPO infrastructure, which is also part of the ELIXIR UK Node.
Year(s) Of Engagement Activity 2017
URL https://www.elixir-europe.org/use-cases/plant-sciences