iPlant UK

Lead Research Organisation: University of Warwick
Department Name: Warwick Systems Biology Centre

Abstract

Biology is increasingly a 'big data' science as new high-throughput technologies support faster, cheaper generation of sequencing, metabolite and image data. This opens the way to potentially exciting breakthroughs as researchers spot previously unseen patterns and make new discoveries of biological importance. However, many individual biologists, and in some areas the community as a whole, struggle to take full advantage of the data generated because of a lack of computing resources, appropriate support and technical skills. It is not only the outputs of data analyses, such as models, curated datasets or raw data, that have value to the wider community, but also the tools generated during research projects to help researchers test and validate their hypotheses. Currently these tools often remain in prototype form, for use only within the group or laboratory that generated them, because there is comparatively little standardisation and no easy means of sharing an accessible, user-friendly version of a tool.

To undertake world-class bioscience, researchers therefore need to be able to store and access datasets, models and analysis tools, ideally from different locations across the globe, given the need for international collaboration. The iPlant Collaborative was funded by the US National Science Foundation (NSF) in 2008 to help solve these issues. The iPlant Data Store is a cloud-based storage space, accessed via iPlant's Discovery Environment (DE), a virtual workbench or lab bench. In the DE, users can share datasets and data analysis tools with as many or as few people as they wish. Analysis tools developed by iPlant staff or built by others can be shared with the wider community, in a similar manner to 'apps' on smartphones.

The iPlant Collaborative is currently distributed across three US locations; we propose to extend it into an international collaboration by building a UK iPlant node at The Genome Analysis Centre (TGAC). TGAC provides National Capability in computational infrastructure and is therefore well placed to provide the foundations for the iPlant UK node. The UK node would host independent instances of the iPlant Data Store and DE, but would also be linked to the US nodes to share resources and expertise. Physical resources alone are not sufficient for a successful infrastructure: it must also be used, maintained and expanded as demand increases. To demonstrate the versatility, power and value of iPlant UK, a dedicated team of programmers based at the Universities of Warwick, Liverpool and Nottingham will adapt tools originally developed for use in a single project so that the wider community can adopt them. Three suites of tools that benefit key areas of UK plant science (sequencing, systems biology and image analysis) will be made available to the global plant research community via the iPlant DE.

In fewer than 10 years, iPlant has built a global base of over 18,500 users. As this base continues to expand, iPlant's future sustainability must be considered. A UK iPlant node will help ensure the continued existence and reliability of iPlant, spread expertise and best practice between the UK and US, give the UK a say in the future direction of this valuable resource, and provide an exemplar for others wishing to establish international iPlant nodes in future.

By establishing iPlant UK and promoting access to a resource that allows users to readily store and analyse their data, this project will support a wide range of research, including genome-wide association studies exploiting natural variation in crops, the prediction of biological networks and pathways, and the high-throughput imaging and image analysis services that take researchers one step closer to bridging the genotype-to-phenotype gap.

Technical Summary

New technologies such as next-generation sequencing (NGS), high-throughput phenotyping and metabolite profiling have made large datasets, several terabytes in size, a common feature of modern plant biology. However, the intelligent reuse and impact of these data are not always fully realised because of limited data storage capacity, limited compute power for analysis, gaps in technical skills (which often have to be self-taught or accessed via a collaborator) and limited tool sharing within the community. The NSF-funded iPlant Collaborative aims to help mitigate these problems. It provides three core services: the Data Store, for cloud-based storage and retrieval of large datasets; the Discovery Environment (DE), which provides user-friendly data analysis software; and Atmosphere, a platform that allows researchers to custom-build virtual workbenches and share them with collaborators anywhere in the world. Data analysis in the DE is carried out via apps, which are built either by iPlant developers or by users. iPlant is structured as a distributed model within the US, spreading effort, expertise and resources between the Texas Advanced Computing Center (TACC), Cold Spring Harbor Laboratory and the University of Arizona. It was designed with extension and replication in mind, and we propose taking advantage of iPlant's federation capabilities to develop a UK iPlant node at The Genome Analysis Centre (TGAC). To encourage uptake and demonstrate the power of iPlant services, three suites of tools in the areas of systems biology, image analysis and sequencing data, which are currently usable by only a small number of experts, will be optimised for HPC and adapted for the iPlant environment, thus widening their applicability and user base. A small number of additional tools from the wider community will also be adapted for the iPlant environment via an extended collaborative support programme.

Planned Impact

The principal beneficiaries of iPlant UK are research scientists in academia and industry, BBSRC and other funding bodies. The three suites of tools, covering systems biology, sequencing data management and image-based phenomics, will be the first applications deployed on iPlant UK; in doing so they will provide proof of concept and establish guidelines and best practice for future users who wish to share their own command-line research tools via iPlant. This proposal will increase the availability of BBSRC-funded tools to the global community and will help build a common international biological science platform that prevents duplication of effort and funding, encouraging the rational, supported reuse of data, applications and resources.

As the planned community tool development to prime and troubleshoot the system is focused on plant science applications, the main initial beneficiaries will be the plant science research community, from students to senior researchers. However, many of the tools are generic and can be used with any compatible dataset from any organism. Ultimately, iPlant UK will be a community resource for all biologists: the long-term beneficiaries will be anyone working with big data.

Funding bodies will also benefit from iPlant UK. Although sharing raw data has become a standard requirement for publication in recent years, sharing tools developed for data analysis and visualisation is not typical. Where they are shared, whether through an institutional repository or a third-party open data web service such as Figshare or Dryad, their use may be limited by differences in operating systems or the expertise of new users. iPlant UK will provide the tools, guidelines and the platform for developers to share their command line-based workflows with the research community in a user-friendly way. More of the output from publicly funded UK research will therefore be accessible to the wider national and international research community.

Although the personnel requested in this project offer limited opportunity for direct outreach, all services from the iPlant Collaborative, including the Atmosphere cloud computing platform and the DNA Subway undergraduate teaching tool, will be promoted by the PIs of the iPlant UK team through invited talks, guest blog posts and articles.
 
Description BEIS/UKRI/RCUK Cloud Workshop, London, 24-10-2017
Geographic Reach National 
Policy Influence Type Contribution to a national consultation/review
 
Description UKRI Data Infrastructure Roadmap
Geographic Reach National 
Policy Influence Type Contribution to a national consultation/review
 
Description UKRI Supercomputing Roadmap
Geographic Reach National 
Policy Influence Type Contribution to a national consultation/review
 
Description 16ALERT
Amount £283,383 (GBP)
Funding ID BB/R000662/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 08/2017 
End 08/2018
 
Description A computational cloud framework for the study of gene families
Amount £181,000 (GBP)
Funding ID BB/N023145/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 04/2017 
End 09/2018
 
Description International Wheat Yield Partnership (IWYP)
Amount $2,000,000 (USD)
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 01/2016 
End 01/2019
 
Title CGCore v2 Improvements 
Description As part of the collaboration between the EI COPO project and the CGIAR Big Data Platform, we worked with CGIAR and Crop Ontology developers to improve the CG Core v2 schema for describing CGIAR digital outputs. 
Type Of Material Improvements to research infrastructure 
Year Produced 2018 
Provided To Others? Yes  
Impact Globally, this work will affect all CGIAR Data Managers and users of the COPO platform who deposit data into CG Centre repositories. 
URL https://github.com/collaborative-open-plant-omics/cgcore_schema
 
Description Computational biology for Genomics 
Organisation IBM
Department IBM UK Labs Ltd
Country United Kingdom 
Sector Private 
PI Contribution We have held scoping meetings and worked with Ritesh Krishna on the project.
Collaborator Contribution Initial sharing of expertise
Impact Paper https://doi.org/10.1101/2021.02.04.429826 Code https://github.com/JoshuaColmer/HallCircadian
Start Year 2017
 
Description DivSeek Partnership 
Organisation DivSeek International
Sector Learned Society 
PI Contribution I bring infrastructure expertise to this partnership, influencing and impacting policy to provide computational and training capacity to other DivSeek partners. I promote the range of infrastructure projects that are developed in my group at EI, but also solutions developed at other centres that can contribute to the DivSeek consortium. Partners are exposed to EI projects such as COPO, Grassroots (Wheat Information System, CerealsDB, marker design), CyVerse UK and Galaxy, through working group communications and meetings at international conferences such as PAG and RDA. I lead the Data Standards for Interoperable Tools working group, and we aim to collate community-suggested standards and tools, and advise the partnership and their stakeholders in best practice for delivery of sustainable and interoperable infrastructure.
Collaborator Contribution The DivSeek consortium contributes expertise and knowledge exchange in advances in crop diversity, improving our networking and understanding of challenges and potential solutions to social, structural, and biological problems. With over 66 global partners including EI, this is a powerful and highly respected group of research institutes that are working together to enable a step change in efficiency of interactions, leading to improved crop diversity research and data sharing.
Impact EI is a founding partner of DivSeek, and Dr Davey leads one of the new working groups, "Data Standards for Interoperable Tools" (http://www.divseek.org/standards/)
Start Year 2015
 
Description Identification of genes underlying clock mutants in Arabidopsis 
Organisation University of Szeged
Country Hungary 
Sector Academic/University 
PI Contribution Sequenced two Arabidopsis mutants and identified candidate genes bioinformatically
Collaborator Contribution Provided us with mutants
Impact None yet
Start Year 2016
 
Description Integration of COPO and CGCore Schemas and Associated Repositories 
Organisation CGIAR
Country France 
Sector Charity/Non Profit 
PI Contribution We have developed a proof-of-concept platform to streamline metadata attribution and dataset deposition into CGIAR repositories using the BBSRC-funded COPO software. Drs Etuk and Shaw, two Research Software Engineers in the Davey group at the Earlham Institute and the original core developers of COPO, have implemented new features in COPO that allow CGIAR Data Managers to harmonise and streamline the submission of CG-relevant metadata and data into the CG digital data repositories. All software and infrastructure is hosted within the CyVerse UK cloud. We have: - Implemented support for CG Core v.2.0 (http://repo.mel.cgiar.org/handle/20.500.11766/4764) metadata annotation of various data types produced at the CGIAR institutes, including publications, via the existing COPO wizard system. - Implemented support for submitting annotated objects to institutional instances of the following repositories: DSpace (https://www.duraspace.org/dspace/), CKAN (https://ckan.org/) and Dataverse (https://dataverse.org/). - Designed and implemented a mechanism within COPO that controls which users can submit to which repositories. - Implemented support for annotating variables within datasets (e.g. column headings, experimental condition descriptors) with terms and URIs from ontologies or controlled vocabularies/trait dictionaries (AGROVOC and GACS).
Collaborator Contribution CGIAR has coordinated with key members of the CG Centres to gather feedback on the elements developed, and has provided funds to allow a core CGCore metadata schema developer to travel to EI and work with Drs Etuk and Shaw to improve the CGCore schema.
Impact This collaboration has seen rapid development of key functionality in the COPO platform to support CG centre Data Managers. This has required technical skills to develop the software, biocuration expertise provided by CGIAR to improve and refine the CGCore metadata schema, ontology expertise from the Bioversity team in Montpellier, and coordination expertise from Dr Davey (EI) and Medha Devare (CGIAR). Software and Technical Products (Webtool/Application - Collaborative Open Plant Omics (COPO) (2017)): All software code developed is open source and can be found within the COPO Github repository: https://github.com/collaborative-open-plant-omics/COPO
Start Year 2018
 
Description Wheat Information System (WheatIS) 
Organisation Cold Spring Harbor Laboratory (CSHL)
Country United States 
Sector Charity/Non Profit 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; and a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore well positioned to build a point of access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community-standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them accordingly with other WheatIS resources (such as CerealsDB) via open-source and freely available infrastructure. By doing so we will promote and facilitate an inclusive and collaborative community of experts providing access to an interconnected network of wheat data on a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, which meets yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising and sharing wheat data in a way that is technically sensible and user-focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation EMBL European Bioinformatics Institute (EMBL - EBI)
Country United Kingdom 
Sector Academic/University 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; and a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore well positioned to build a point of access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community-standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them accordingly with other WheatIS resources (such as CerealsDB) via open-source and freely available infrastructure. By doing so we will promote and facilitate an inclusive and collaborative community of experts providing access to an interconnected network of wheat data on a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, which meets yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising and sharing wheat data in a way that is technically sensible and user-focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation French National Institute of Agricultural Research
Department INRA Versailles
Country France 
Sector Academic/University 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; and a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore well positioned to build a point of access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community-standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them accordingly with other WheatIS resources (such as CerealsDB) via open-source and freely available infrastructure. By doing so we will promote and facilitate an inclusive and collaborative community of experts providing access to an interconnected network of wheat data on a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, which meets yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising and sharing wheat data in a way that is technically sensible and user-focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation Helmholtz Association of German Research Centres
Department Helmholtz Zentrum München
Country Germany 
Sector Academic/University 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; and a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore well positioned to build a point of access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community-standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them accordingly with other WheatIS resources (such as CerealsDB) via open-source and freely available infrastructure. By doing so we will promote and facilitate an inclusive and collaborative community of experts providing access to an interconnected network of wheat data on a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, which meets yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising and sharing wheat data in a way that is technically sensible and user-focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation International Centre for Maize and Wheat Improvement (CIMMYT)
Country Mexico 
Sector Charity/Non Profit 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; and a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore well positioned to build a point of access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community-standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them accordingly with other WheatIS resources (such as CerealsDB) via open-source and freely available infrastructure. By doing so we will promote and facilitate an inclusive and collaborative community of experts providing access to an interconnected network of wheat data on a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, which meets yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising and sharing wheat data in a way that is technically sensible and user-focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation Monogram Network
Sector Academic/University 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; and a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore well positioned to build a point of access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community-standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them accordingly with other WheatIS resources (such as CerealsDB) via open-source and freely available infrastructure. By doing so we will promote and facilitate an inclusive and collaborative community of experts providing access to an interconnected network of wheat data on a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, which meets yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising and sharing wheat data in a way that is technically sensible and user-focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation Rothamsted Research
Country United Kingdom 
Sector Academic/University 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; and a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore well positioned to build a point of access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community-standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them accordingly with other WheatIS resources (such as CerealsDB) via open-source and freely available infrastructure. By doing so we will promote and facilitate an inclusive and collaborative community of experts providing access to an interconnected network of wheat data on a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, which meets yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising and sharing wheat data in a way that is technically sensible and user-focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation U.S. Department of Agriculture (USDA)
Department Agricultural Research Service
Country United States 
Sector Public 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; and a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore well positioned to build a point of access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community-standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them accordingly with other WheatIS resources (such as CerealsDB) via open-source and freely available infrastructure. By doing so we will promote and facilitate an inclusive and collaborative community of experts providing access to an interconnected network of wheat data on a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, which meets yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising and sharing wheat data in a way that is technically sensible and user-focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation University of Bristol
Country United Kingdom 
Sector Academic/University 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer that provides structure to unstructured filesystems; interfaces to local or cloud-based analysis platforms; a search layer that supports multi-faceted metadata and literature querying; and a web server layer that delivers content and provides access to public programmatic interfaces. EI's extensive National Capability in scientific computing hardware for the UK research community places it in a strong position to build a single point of access to previously disparate resources for wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community-standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them with other WheatIS resources (such as CerealsDB) using open-source, freely available infrastructure. In doing so, we are fostering an inclusive, collaborative community of experts and providing access to an interconnected network of wheat data on a scale that was not previously available. EI is also represented on the WheatIS Expert Working Group, which meets yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising, and sharing wheat data in a way that is technically sensible and user focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation University of California, Davis
Department UC Davis College of Biological Sciences
Country United States 
Sector Academic/University 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer that provides structure to unstructured filesystems; interfaces to local or cloud-based analysis platforms; a search layer that supports multi-faceted metadata and literature querying; and a web server layer that delivers content and provides access to public programmatic interfaces. EI's extensive National Capability in scientific computing hardware for the UK research community places it in a strong position to build a single point of access to previously disparate resources for wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community-standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them with other WheatIS resources (such as CerealsDB) using open-source, freely available infrastructure. In doing so, we are fostering an inclusive, collaborative community of experts and providing access to an interconnected network of wheat data on a scale that was not previously available. EI is also represented on the WheatIS Expert Working Group, which meets yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising, and sharing wheat data in a way that is technically sensible and user focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation University of Western Australia
Country Australia 
Sector Academic/University 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer that provides structure to unstructured filesystems; interfaces to local or cloud-based analysis platforms; a search layer that supports multi-faceted metadata and literature querying; and a web server layer that delivers content and provides access to public programmatic interfaces. EI's extensive National Capability in scientific computing hardware for the UK research community places it in a strong position to build a single point of access to previously disparate resources for wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community-standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them with other WheatIS resources (such as CerealsDB) using open-source, freely available infrastructure. In doing so, we are fostering an inclusive, collaborative community of experts and providing access to an interconnected network of wheat data on a scale that was not previously available. EI is also represented on the WheatIS Expert Working Group, which meets yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising, and sharing wheat data in a way that is technically sensible and user focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Title APPLES - Analysis of Plant Promoter-Linked Elements 
Description The APPLES software package is a set of tools for analysing promoter sequences on a genome-wide scale. This version provides two functions: (1) finding orthologs as reciprocal best hits (APPLES_rbh); and (2) finding non-coding conserved regions (APPLES_conservation).
Type Of Technology Webtool/Application 
Year Produced 2016 
Impact Publishing this tool on the CyVerse Discovery Environment has greatly improved its accessibility to the research community.
URL https://de.cyverse.org/de/?type=apps&app-id=d99ca952-dbe2-11e6-9e37-0242ac120003
 
Title BHC - Bayesian Hierarchical Clustering 
Description A clustering algorithm for gene expression data, originally made available in R, that supports the analysis of either time course or multiple static datasets
Type Of Technology Webtool/Application 
Year Produced 2016 
Impact Publishing this tool on the CyVerse Discovery Environment has greatly improved its accessibility to the research community.
URL https://de.cyverse.org/de/?type=apps&app-id=1e03e32e-4e87-11e6-bd1d-0242ac120003
 
Title BWA_Alignment-_produces_sorted_+_indexed_BAM_output 
Description Workflow: BWA-MEM (Burrows-Wheeler Aligner) alignment followed by samtools BAM sorting and indexing
Type Of Technology Software 
Year Produced 2015 
Impact none 
URL http://www.iplantcollaborative.org
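The BWA workflow chains three command-line steps: alignment, sorting and indexing. The sketch below shows one way to express that chain, assuming paired-end reads, a reference already indexed with `bwa index`, and `bwa` and `samtools` on the PATH. The function names and file names are hypothetical. This is not the wrapper script used by the CyVerse app.

```python
import shlex
import subprocess


def bwa_sort_index_commands(reference, reads1, reads2, out_bam, threads=4):
    """Return (align, sort, index) argument lists for BWA-MEM -> sorted, indexed BAM."""
    # Paired-end alignment; the reference must already be indexed with `bwa index`
    align = ["bwa", "mem", "-t", str(threads), reference, reads1, reads2]
    # "-" makes samtools sort read the SAM stream from standard input
    sort = ["samtools", "sort", "-@", str(threads), "-o", out_bam, "-"]
    # Writes a .bai index next to the sorted BAM
    index = ["samtools", "index", out_bam]
    return align, sort, index


def run_bwa_workflow(reference, reads1, reads2, out_bam, threads=4):
    """Run the three steps, piping bwa output straight into samtools sort."""
    align, sort, index = bwa_sort_index_commands(reference, reads1, reads2, out_bam, threads)
    aligner = subprocess.Popen(align, stdout=subprocess.PIPE)
    sorter = subprocess.Popen(sort, stdin=aligner.stdout)
    aligner.stdout.close()  # lets bwa receive SIGPIPE if samtools exits early
    sort_rc = sorter.wait()
    align_rc = aligner.wait()
    if align_rc != 0 or sort_rc != 0:
        raise RuntimeError(f"pipeline failed (bwa={align_rc}, samtools sort={sort_rc})")
    subprocess.run(index, check=True)


if __name__ == "__main__":
    # Print the commands for hypothetical file names rather than running them
    for cmd in bwa_sort_index_commands("ref.fa", "reads_1.fq", "reads_2.fq", "out.bam"):
        print(shlex.join(cmd))
```

Piping the alignment directly into the sorter avoids writing an intermediate SAM file. This matters for large sequencing datasets.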
 
Title Bisque-compliant RootTrace 
Description This is a re-implementation of the software tool RootTrace, recoded to fit into iPlant's Bisque environment. The major difficulty was supporting the user input that RootTrace requires, which does not fit the basic Bisque model. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact RootTrace can now be made available to a wider community via iPlant. 
URL https://github.com/Khalid-ismail/RootTrace_iPlant
 
Title Bowtie-2.2.1--Build-and-Map_for_workflows 
Description Bowtie 2 alignment, utilised by the virus read filter aligner app 
Type Of Technology Software 
Year Produced 2015 
Impact none 
URL http://www.iplantcollaborative.org
 
Title CSI - Causal Structure Inference 
Description A network inference algorithm capable of inferring causal regulatory network models from time course expression data 
Type Of Technology Webtool/Application 
Year Produced 2016 
Impact Publishing this tool on the CyVerse Discovery Environment has greatly improved its accessibility to the research community.
URL https://de.cyverse.org/de/?type=apps&app-id=12659e20-1c39-11e6-8842-0242ac120003
 
Title Collaborative Open Plant Omics (COPO) 
Description COPO streamlines data deposition to public repositories by hiding much of the complexity of metadata capture and data management from the end user. The ISA infrastructure (www.isa-tools.org) is leveraged to provide the interoperability between metadata formats required for seamless deposition to repositories. COPO also provides links to data analysis platforms such as CyVerse UK and Galaxy. Logical groupings of artefacts (e.g. PDFs, raw data, contextual supplementary information) relating to a body of work are stored in COPO collections, represented using common standards, and are publicly searchable. Bundles of multiple data objects can then be deposited directly into public repositories through COPO interfaces. This output represents the beta release of the COPO platform in 2017. 
Type Of Technology Webtool/Application 
Year Produced 2017 
Open Source License? Yes  
Impact COPO has been added to the ELIXIR-UK roadmap for ELIXIR core data services, and is currently being used by EI and JIC researchers to deposit real, large scale sequencing datasets into the European Nucleotide Archive. COPO is also being investigated as a potential data entry tool for the CGIAR Big Data project, and this will be explored in a joint EAGER submission with CIMMYT. COPO has also been selected to act as one of the data ingestion pipelines for data arising from the Designing Future Wheat programme, depositing open data into the Grassroots repository. COPO is also being included in grant submissions to assist vertebrate and wheat communities in effective metadata management. COPO runs within the CyVerse UK National Capability infrastructure. 
URL https://copo-project.org
 
Title CyVerse UK software stack deployment 
Description The CyVerse (formerly iPlant) UK project at EI provides hardware resources in an easy-to-use manner through a web interface called the Discovery Environment (DE), as well as developer and bioinformatician access through APIs and software. A series of commands, called a pipeline, is combined into a script and/or a virtualised operating system image (a Docker container image). The pipeline can run on any hardware available to the implementer, which in this case is the extensive HTCondor cluster set up at EI. Once a pipeline runs correctly through the raw scheduler, the app can be registered with the Agave API (http://www.agaveapi.co). This is done by writing JSON files that specify input sources, together with the user-supplied and default parameters the pipeline needs to run. Once a pipeline is registered through Agave, it is available as a GUI "app" through the DE, and can be made public after testing. 
Type Of Technology Grid Application 
Year Produced 2016 
Impact The EI CyVerse hardware enables the bioinformatics pipelines developed by the project partners (the Universities of Liverpool, Nottingham and Warwick) to be run in this HPC environment. Once deployed in the CyVerse UK environment, these tools can be made available globally through the CyVerse Discovery Environment, reaching more than 18,000 potential users. We have released this infrastructure and are accepting users from the UK research community. 
URL http://cyverseuk.org/about/cyverse-uk-projects/tgac/
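The registration step relies on a JSON file. That file describes the app's inputs, together with its user-supplied and default parameters. The sketch below builds a minimal description of that kind. The field names reflect our understanding of the Agave v2 app format and should be checked against the Agave documentation. All values (app name, system ID, paths, parameter names) are hypothetical placeholders and are not taken from a real CyVerse UK app.

```python
import json


def agave_app_description(name, version, execution_system, deployment_path):
    """Return a minimal Agave-style app description as a dict.

    Field names are assumed from the Agave v2 app format; all values are
    hypothetical placeholders.
    """
    return {
        "name": name,
        "version": version,
        "executionSystem": execution_system,  # system ID assumed to point at the HTCondor cluster
        "executionType": "CONDOR",
        "parallelism": "SERIAL",
        "deploymentPath": deployment_path,    # folder holding the wrapper script and assets
        "templatePath": "wrapper.sh",         # script that launches the pipeline
        "testPath": "test.sh",
        "inputs": [
            {   # an input source: a file the user selects in the DE
                "id": "reads",
                "details": {"label": "Input reads (FASTQ)"},
                "value": {"required": True, "visible": True},
            }
        ],
        "parameters": [
            {   # a user-supplied parameter with a default value
                "id": "threads",
                "details": {"label": "Number of threads"},
                "value": {"type": "number", "default": 4, "required": False, "visible": True},
            }
        ],
        "outputs": [],
    }


if __name__ == "__main__":
    app = agave_app_description("example-pipeline", "0.1.0",
                                "example-condor-system", "apps/example-pipeline")
    print(json.dumps(app, indent=2))
```

Generating the JSON from code rather than editing it by hand keeps descriptions for related apps consistent. The output would still need to be validated when it is registered.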
 
Title DFW cloud HPC resources 
Description Designing Future Wheat researchers are able to request virtual machines within CyVerse UK to undertake bioinformatics analysis. 
Type Of Technology Grid Application 
Year Produced 2019 
Impact We have produced a robust and secure cloud framework within CyVerse UK to allow DFW researchers to access DFW and public data to analyse, as well as upload their own. We have already completed two successful pilot projects with external collaborators, and are now making the services available to all DFW researchers. 
URL http://cyverseuk.org/about/collaborations/designing-future-wheat/
 
Title Filter_Virus_Associated_Reads_From_Host_Reads 
Description Workflow: Bowtie 2 alignment followed by samtools filtering of reads
Type Of Technology Software 
Year Produced 2015 
Impact none 
URL http://www.iplantcollaborative.org
 
Title GP2S - Gaussian Process Two-Sample test of Differential Expression 
Description A differential expression algorithm for time series data with a two-condition (e.g. control/treated) experimental design
Type Of Technology Webtool/Application 
Year Produced 2016 
Impact Publishing this tool on the CyVerse Discovery Environment has greatly improved its accessibility to the research community.
URL https://de.cyverse.org/de/?type=apps&app-id=655a8432-7432-11e6-a6f8-0242ac120003
 
Title GWASSER app 
Description GWASSER is an R-based script for performing simple genome-wide association analyses using statistical modelling. 
Type Of Technology Webtool/Application 
Year Produced 2016 
Impact The GWASSER app is now available for users of the CyVerse UK platform. 
URL http://cyverseuk.org/applications/gwasser/
 
Title Gradient Tool 
Description An algorithm for identifying the time of change from single-condition time course expression data
Type Of Technology Webtool/Application 
Year Produced 2016 
Impact Publishing this tool on the CyVerse Discovery Environment has greatly improved its accessibility to the research community.
URL https://de.cyverse.org/de/?type=apps&app-id=11d9f454-78d4-11e6-9314-0242ac120003
 
Title HMT - Hypergeometric Motif Test 
Description A transcription factor binding site overrepresentation analysis algorithm for known motifs 
Type Of Technology Webtool/Application 
Year Produced 2016 
Impact Publishing this tool on the CyVerse Discovery Environment has greatly improved its accessibility to the research community.
URL https://de.cyverse.org/de/?type=apps&app-id=818d8ce0-5e4c-11e6-ac0d-0242ac120003
 
Title Local CyVerse Discovery Environment 
Description Full-stack deployment of CyVerse Discovery Environment 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact Having a local implementation of the DE infrastructure allows us to: 1. test our software without the delays caused by involving CyVerse US; 2. share our software and data on an independent platform. 
URL https://cyverse.warwick.ac.uk/de/
 
Title MEME-LaB 
Description A transcription factor binding site overrepresentation analysis algorithm with novel motif discovery 
Type Of Technology Webtool/Application 
Year Produced 2016 
Impact Publishing this tool on the CyVerse Discovery Environment has greatly improved its accessibility to the research community.
URL https://de.cyverse.org/de/?type=apps&app-id=b781fc48-8edd-11e6-b4ab-0242ac120003
 
Title Mikado app 
Description Developed at EI, Mikado is a lightweight Python3 pipeline to identify the most useful or "best" set of transcripts from multiple transcript assemblies. 
Type Of Technology Webtool/Application 
Year Produced 2016 
Impact Mikado is now a DE app, and is available to users of the CyVerse environment. 
URL http://cyverseuk.org/applications/mikado-determine-and-select-the-best-rna-seq-prediction/
 
Title Polymarker app 
Description PolyMarker is an automated bioinformatics pipeline for SNP assay development which increases the probability of generating homoeologue-specific assays for polyploid wheat. 
Type Of Technology Webtool/Application 
Year Produced 2016 
Impact The Polymarker app is now available for users through the CyVerse platform. 
URL http://cyverseuk.org/applications/polymarker/
 
Title SAM_to_BAM_format_conversion 
Description Converts SAM format files to BAM format files 
Type Of Technology Software 
Year Produced 2015 
Impact none 
URL http://www.iplantcollaborative.org
 
Title Samtools_Flagstat 
Description Summarises the alignments contained within a BAM file (e.g. counts of mapped, paired and duplicate reads) as a check on alignment quality
Type Of Technology Software 
Year Produced 2015 
Impact none 
URL http://www.iplantcollaborative.org
 
Title Samtools_bamtofastq 
Description Converts BAM files to FASTQ files 
Type Of Technology Software 
Year Produced 2015 
Impact none 
URL http://www.iplantcollaborative.org
 
Title Samtools_bamtofastq__Version_1.2_-_with_options 
Description Updated version of Samtools_bamtofastq, with control over additional input arguments 
Type Of Technology Software 
Year Produced 2015 
Impact none 
URL http://www.iplantcollaborative.org
 
Title Samtools_rmdup_-_remove_PCR_duplicates 
Description Removes PCR duplicates from a BAM file 
Type Of Technology Software 
Year Produced 2015 
Impact none 
URL http://www.iplantcollaborative.org
 
Title Samtools_sort_-_sort_BAM_file__app_for_workflows 
Description BAM file sorter, utilised by the BWA aligner app 
Type Of Technology Software 
Year Produced 2015 
Impact none 
URL http://www.iplantcollaborative.org
 
Title Samtools_view_-_Filter_mapped_or_unmapped_reads 
Description Filtering of mapped or unmapped reads, utilised by the virus read filter aligner app 
Type Of Technology Software 
Year Produced 2015 
Impact none 
URL http://www.iplantcollaborative.org
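Filtering mapped or unmapped reads with samtools relies on bit 0x4 of the SAM FLAG field, which marks an unmapped read. `samtools view -f 4` keeps only reads with that bit set, and `-F 4` discards them. The sketch below builds the corresponding command. The helper name and file names are hypothetical. If reads were first aligned to a host genome, keeping the unmapped reads would retain the non-host candidates. We assume, but the source does not confirm, that this is how the virus read filter app uses this step.

```python
import shlex

UNMAPPED = 0x4  # SAM FLAG bit: read is unmapped


def samtools_view_filter(in_bam, out_bam, keep="unmapped"):
    """Return a samtools view command keeping only unmapped or only mapped reads."""
    if keep not in ("mapped", "unmapped"):
        raise ValueError("keep must be 'mapped' or 'unmapped'")
    # -f keeps reads with the flag bit set; -F discards reads with it set
    flag_option = "-f" if keep == "unmapped" else "-F"
    # -b writes BAM output; -o names the output file
    return ["samtools", "view", "-b", flag_option, str(UNMAPPED), "-o", out_bam, in_bam]


if __name__ == "__main__":
    print(shlex.join(samtools_view_filter("aligned.bam", "unmapped.bam")))
    print(shlex.join(samtools_view_filter("aligned.bam", "mapped.bam", keep="mapped")))
```

The first call produces `samtools view -b -f 4 -o unmapped.bam aligned.bam`. The output from `-F 4` may still include secondary or supplementary alignments, because those records are also marked as mapped.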
 
Title TCAP - Temporal Clustering by Affinity Propagation 
Description A clustering algorithm for time course expression data that identifies complex regulatory groups using a rich information measure
Type Of Technology Webtool/Application 
Year Produced 2016 
Impact Publishing this tool on the CyVerse Discovery Environment has greatly improved its accessibility to the research community.
URL https://de.cyverse.org/de/?type=apps&app-id=d874c350-ad90-11e6-a854-0242ac120003
 
Title The Grassroots Infrastructure 
Description The Grassroots software is an open source "as-a-Service" stack that powers a number of data dissemination and analysis activities at EI, and other sites such as CerealsDB at the University of Bristol. We have continued to develop the functionality within the software stack to share crop-related datasets. 
Type Of Technology Webtool/Application 
Year Produced 2018 
Open Source License? Yes  
Impact Grassroots has previously been used to host the Field Pathogenomics project website and Yellow Rust map, the EI wheat BLAST service, the CerealsDB federation project, and the multi-scale improvements to the Polymarker marker design software. Recently, Grassroots has been put forward as the main data repository and metadata catalogue for the Designing Future Wheat project, and has started to host data from this project, the Open Wild Wheat Consortium, and 5 new wheat genomes from EI. The Grassroots service runs within the CyVerse UK National Capability infrastructure. 
URL https://grassroots.tools/
 
Title Tuxedo_suite_PE_up_to_4_conditions 
Description A complete Tuxedo suite workflow, from RNA-Seq reads to lists of differentially expressed genes. Utilises TopHat, Cufflinks/Cuffdiff and CummeRbund
Type Of Technology Software 
Year Produced 2015 
Impact none 
URL http://www.iplantcollaborative.org
 
Title Wellington Bootstrap 
Description An algorithm for the identification of regions occupied by proteins in DNase-seq data, performing a differential analysis between two samples 
Type Of Technology Webtool/Application 
Year Produced 2016 
Impact Publishing this tool on the CyVerse Discovery Environment has greatly improved its accessibility to the research community.
URL https://de.cyverse.org/de/?type=apps&app-id=cbf83e84-1cf1-11e6-b710-0242ac120003
 
Title Wellington Footprint 
Description An algorithm for the identification of regions occupied by proteins in DNase-seq data 
Type Of Technology Webtool/Application 
Year Produced 2016 
Impact Publishing this tool on the CyVerse Discovery Environment has greatly improved its accessibility to the research community.
URL https://de.cyverse.org/de/?type=apps&app-id=035655fc-2736-11e6-ac3b-0242ac120003
 
Title Wigwams 
Description An algorithm for the extraction of gene groups co-regulated across subsets of multiple time course datasets 
Type Of Technology Webtool/Application 
Year Produced 2016 
Impact Publishing this tool on the CyVerse Discovery Environment has greatly improved its accessibility to the research community.
URL https://de.cyverse.org/de/?type=apps&app-id=d5d04224-1cf8-11e6-81c4-0242ac120003
 
Title hCSI - Hierarchical Causal Structure Inference 
Description An expansion of CSI network inference to handle multiple time course datasets 
Type Of Technology Webtool/Application 
Year Produced 2016 
Impact Publishing this tool on the CyVerse Discovery Environment has greatly improved its accessibility to the research community.
URL https://de.cyverse.org/de/?type=apps&app-id=ae88f3b0-1c3e-11e6-b0d6-0242ac120003
 
Title kallisto app 
Description kallisto is a program for quantifying transcript abundances from RNA-Seq data. 
Type Of Technology Webtool/Application 
Year Produced 2016 
Impact The kallisto app is now available for users of the CyVerse UK platform. 
URL http://cyverseuk.org/applications/kallisto/
 
Title oCSI - Orthologous Causal Structure Inference 
Description An expansion of CSI network inference to handle data from multiple organisms 
Type Of Technology Webtool/Application 
Year Produced 2016 
Impact Publishing this tool on the CyVerse Discovery Environment has greatly improved its accessibility to the research community.
URL https://de.cyverse.org/de/?type=apps&app-id=429173d2-1c46-11e6-aaba-0242ac120003
 
Description Building infrastructure for open science - British Computer Society 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Invited speaker at the Advanced Programming Group annual Christmas lecture
Year(s) Of Engagement Activity 2015
URL http://www.bcs.org/category/18516
 
Description CyVerse UK Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact This meeting is aimed at researchers who are either at the beginning of their studies or have moved into a new subject area. We will provide hands-on sessions describing the use of software tools that can interrogate RNA-seq, imaging, gene expression or GWAS data. Previous CyVerse users will provide real-life examples of how the software has been used successfully. This is the Learner Track.

In addition we will host a concurrent track for more experienced bioinformaticians who wish to learn how to use CyVerse to host their own programs. This is the Intermediate Track. The concurrent tracks will run in separate rooms.

These software tools have been developed as part of the CyVerse UK grant. We will also highlight the opportunities that exist for sharing big data in a meaningful manner. This workshop is organised by GARNet with Professor Katherine Denby at the University of York.
Year(s) Of Engagement Activity 2017
URL http://cyverseuk.org/events/cyverse-uk-workshop/
 
Description CyVerse for Brassica: Performing Associative Transcriptomics by Integrating with Sequence and Phenotype Repositories 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact As part of the "CyVerse - Software, Tools, and Services for Data-Driven Discovery" workshop, Annemarie Eckes in the Davey group at EI spoke about Associative Transcriptomics (AT). AT is a method that links a physical genome, via the transcriptome, to quantitative phenotypic information. For complex polyploid crops such as Brassica napus, AT can be used to facilitate the identification of SNP markers. However, there are certain problems in performing AT for the Brassica Community: 1) this process is often data-intensive, as it commonly relies on large-scale genotypic and phenotypic raw data, and not all research groups have the computational capacity to do such analysis; 2) many groups are still dependent on the expertise of a small number of researchers who are able to generate AT data. With the help of CyVerse UK (http://cyverseuk.org), we are developing a reproducible workflow to make AT analysis available to the UK Brassica Community and beyond. The aim is to integrate phenotyping data stored in the Brassica Information Portal (BIP) (https://bip.earlham.ac.uk/) and sequence data from sequence repositories to establish an AT analysis framework, powered by tools and resources available within CyVerse.
A Brassica researcher would first submit their genotypic and phenotypic raw data to the BIP and respective public repositories (e.g. the SRA/ENA sequence read archives). This ensures that their data will be stored in standardised formats and marked up with required metadata to enable reuse and subsequent comparison. With the data in place, the researcher will then be able to run AT analyses on CyVerse.

We will present the current state of the project to the CyVerse user and Brassica communities in order to receive additional input and feedback.
Year(s) Of Engagement Activity 2017
URL https://pag.confex.com/pag/xxv/meetingapp.cgi/Paper/25485
 
Description Data Brokering for Plant Scientists (DivSeek partners' meeting, PAG 2018, San Diego) 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Delivered a lightning talk to promote the COPO data brokering platform at the annual DivSeek partners' meeting at PAG.
Year(s) Of Engagement Activity 2018
 
Description Data Stewardship in the Life Sciences 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact I spoke at the "Challenges and Opportunities in Plant Science Data Management" workshop on the subject of data management in the life sciences.

Open data and integrative data sharing are fundamental to addressing the challenges of modern data-intensive science. There is a clear need to develop and maintain community-focused, semantically aware data stewardship and management platforms, such as COPO, that can cope with describing and sharing the potentially huge datasets arising from the life sciences. Once these data are available, we cannot assume that researchers around the globe have the requisite skills and resources to analyse them. We therefore need to provide large-scale data analysis environments that are fit for purpose, such as CyVerse and Galaxy, incorporating state-of-the-art interfaces and programmatic layers to meet broad end-user requirements. Finally, this can only happen through community-led efforts to implement solutions for data standardisation, best practice and FAIR data policy. We are only now starting to take advantage of groundbreaking opportunities to make integrated data a reality, enabling scientists to store, manage and share their data as a first-class citizen of the scientific process.
Year(s) Of Engagement Activity 2017
URL http://app.core-apps.com/pag_2017/event/e2bec353017762d275ce250c23e011e6
 
Description Data, Data, Data Everywhere (Pint of Science talk, Norwich) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Dr Davey delivered a talk as part of the Norwich 2017 Pint of Science series about the challenges and solutions for modern data management in the life sciences, including recent data developments, high-performance computing, and software tools.
Year(s) Of Engagement Activity 2017
URL https://pintofscience.co.uk/event/crops-crystals-and-computers-technology-for-food-security
 
Description DivSeek Working Group - Data Standards for Interoperable Tools 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact As part of the "DivSeek - Addressing the challenges and opportunities for information and data sharing associated with plant germplasm" session at PAG, I spoke about the DivSeek Data Standards for Interoperable Tools Working Group. This WG will promote best practice in data sharing in the plant sciences, through the use of open and interoperable software powered by the adoption of open standards, i.e. programmatic interoperability standards (APIs), controlled vocabularies, trait dictionaries, metadata standards, and ontologies. We aim to highlight gaps in interoperability that impede workflows important to the communities supported by DivSeek partners, by liaising with research development groups, other DivSeek working groups, and consortia with relevance to DivSeek. We will educate and train data generators about standards and the tools and resources that use them, in order to promote and foster standards-compliance for long-term open data stewardship.
Year(s) Of Engagement Activity 2017
URL https://pag.confex.com/pag/xxv/meetingapp.cgi/Paper/26202
 
Description ELIXIR-UK ALL-HANDS MEETING 2017 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact The ELIXIR-UK All Hands Meeting provided updates on recent activities from the ELIXIR UK Node and ELIXIR Hub, alongside discussions of future resources, events and roadmapping breakouts. Dr Davey presented the COPO project and CyVerse UK infrastructure as UK-specific resources being developed as national infrastructure for UK researchers. There was much interest from participants in both projects, and conversations at this event led to the submission of a BBSRC TRDF with Gos Micklem (Cambridge), Dr Davey and Dr Shaw (EI).
Year(s) Of Engagement Activity 2017
URL https://www.elixir-europe.org/events/elixir-uk-all-hands-meeting-2017
 
Description RDA Wheat Data Interoperability Working Group meeting, RDA Plenary, Barcelona 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The Wheat Data Interoperability Working Group aims to provide a common framework for describing, representing, linking and publishing wheat data with respect to open standards. Such a framework will promote and sustain wheat data sharing, reusability and operability. Specifying the wheat linked data framework raises many questions: which (minimal) metadata should describe which type of data? Which vocabularies, ontologies and formats should be used? Which good practices should be followed? Based mainly on the needs of the Wheat Initiative Information System (WheatIS) in terms of functionalities and data types, the working group will identify relevant use cases in order to produce a "cookbook" on how to produce "wheat data" that are easily shareable, reusable and interoperable. At this meeting the Working Group matured into a Maintenance Group, showing that we have moved from an inception phase to an implementation phase, promoting the outputs of the WG (the Wheat Data Interoperability guidelines) to users.
Year(s) Of Engagement Activity 2016
URL https://www.rd-alliance.org/group/agricultural-data-ig-igad-wheat-data-interoperability-wg-agriseman...
 
Description UKRI Darwin Tree of Life Project meeting, London 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Dr Davey travelled to London with other EI staff to discuss strategy for an SPF bid to UKRI for the UK Darwin Tree of Life Project.
Year(s) Of Engagement Activity 2018
 
Description iRODS functionality within the Grassroots Infrastructure (iRODS User Group Meeting 2017, Utrecht, The Netherlands) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Dr Tyrrell presented work on the development of the eirods-dav software package for the Grassroots data dissemination platform.
Year(s) Of Engagement Activity 2017
URL https://irods.org/ugm2017/