Enabling UK wheat research with the CyVerse UK cyberinfrastructure

Lead Research Organisation: Earlham Institute
Department Name: Research Faculty

Abstract

Bread wheat has one of the most complex plant genomes and is one of the most commercially important crops in the UK and internationally, with over 750 million tonnes harvested annually, 14 million in the UK alone. This juxtaposition creates a range of challenges for biologists and data analysts: how can a balance be found between needing large amounts of data to answer complex biological questions about wheat genetics and the requirements for analysing this data? Furthermore, the pressing issues of climate change that we face are all too evident. We need to use modern technology to increase productivity and output for our wheat researchers, drive breeding strategies, and benefit the public's nutritional needs.

CyVerse represents such a technology, whereby computational resources, data storage, and analytical tools are made available through web-based graphical interfaces for end users or command line interfaces for power users or system administrators. CyVerse UK is the first implementation of the multi-million dollar CyVerse project outside the US, and both systems are interoperable, i.e. able to share their compute and storage services without the user needing to know where their analyses will be taking place. This federation allows a reduction in shared management cost, and an increase in productivity through shared expertise and software development.

The use of "the cloud" is commonplace in today's internet era. Users are moving away from storing data on their own devices and towards services hosted by third-party providers such as Google, Microsoft, and Amazon. These vendors also supply complete computing environments over the internet, e.g. Amazon Web Services and Microsoft Azure. However, these resources are not designed for the scale that wheat researchers require to make the most of publicly available and personal datasets, and the costs of running such environments are unclear at best and prohibitive at worst. Therefore, through the deployment of the proposed CyVerse Atmosphere cloud computing platform in the UK, we would be able to supply virtual server resources to users "elastically", i.e. users themselves can easily scale computing resources up and down. In this way, we can provide flexible computing power when and wherever required, to wheat researchers, labs, and breeders. These virtual wheat data analysis labs can be shared with a wider research group, even internationally, promoting collaboration and knowledge transfer.

Technical Summary

The emergence of wheat as a reference crop model increases worldwide plant community demand for resource access. Genomes at the level of complexity of wheat require expertise in sample preparation and library construction for sequencing, algorithm design and software engineering for assembly, and biological knowledge for interpretation. Whilst existing computational resources are barely sufficient for small scale analysis, the advent of rapid turnaround times for complete wheat genomes represents a real problem in delivering the requisite datasets and tools to analyse them in a form that is usable to researchers.

High-performance computing and modern web-based infrastructure can provide resources to address these challenges, and the CyVerse project is one such "cyberinfrastructure". The presence of CyVerse UK as a dedicated e-Infrastructure platform for life science is a huge boon for UK crop researchers, allowing them to take advantage of a freely available and well-supported set of services for data storage, sharing, and analysis. Following the recent large grant awards to UK institutions for undertaking increasingly complex data-driven investigations into wheat genomics, these institutions will find it increasingly difficult to keep up with computational requirements. CyVerse UK is able to meet these needs, and this project represents an expansion of existing hardware in order to proactively prepare for the deluge of wheat data that will need to be managed.

We will procure and deploy 40 modern, fit-for-purpose compute nodes that can be introduced into our existing CyVerse UK infrastructure, housed in two data centres at the Earlham Institute. Each node comprises 2 12-core Intel Xeon CPUs, 512GB RAM and a local 1TB solid state disk for fast file input/output operations. These nodes will be used for day-to-day wheat analysis pipelines provided by CyVerse UK, as well as supporting the implementation of the CyVerse Atmosphere cloud computing platform.
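
As a back-of-envelope illustration of what this specification adds up to, the sketch below multiplies the per-node figures stated above across the 40 nodes. It is only arithmetic on the stated numbers: the function name `aggregate_capacity` is hypothetical, and the totals ignore hyper-threading, virtualisation overhead, and any capacity reserved for management.

```python
# Per-node specification as stated in the Technical Summary.
NODES = 40
CPUS_PER_NODE = 2
CORES_PER_CPU = 12
RAM_GB_PER_NODE = 512
SSD_TB_PER_NODE = 1


def aggregate_capacity(nodes=NODES):
    """Return total physical cores, RAM (GB) and local SSD (TB) for `nodes` nodes."""
    return {
        "cores": nodes * CPUS_PER_NODE * CORES_PER_CPU,
        "ram_gb": nodes * RAM_GB_PER_NODE,
        "ssd_tb": nodes * SSD_TB_PER_NODE,
    }


if __name__ == "__main__":
    # Print the nominal totals for the full 40-node expansion.
    print(aggregate_capacity())
```

Under these assumptions the expansion amounts to 960 physical cores, 20,480 GB of RAM, and 40 TB of local solid state storage in total.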

Planned Impact

UK research supports the underpinning breeding and baking sectors, as well as the £6 billion farming industry, critical to the UK rural economy. Wheat is the most important UK crop, with annual production of over 14 million tonnes, and market values for seed and processed products of around £1.4 billion and £14 billion, respectively. More frequent extremes in climate, increased precipitation, flooding and drought, will further affect wheat yields. There is an urgent need to address the problems of producing sufficient nutritious food for 2050, along with the significant associated societal and economic benefits.

This project will establish guidelines and best practice for wheat researchers who wish to share their datasets with the wider community, share their own research tools via the CyVerse UK infrastructure, and initiate user-provisioned cloud computing environments that can form powerful and bespoke "virtual labs" of shared resources. This proposal will allow increased availability of BBSRC-funded tools for the UK wheat community and will integrate with the CyVerse project in the United States to form a common international biological science platform that prevents duplication of effort and funding. In doing so, this proposal encourages rational and supported reuse of data, applications and resources.

The impact delivered from this expansion of CyVerse UK to support wheat research will be seen by research scientists in academia and industry, funded by BBSRC and other bodies, who are involved in the application of bioinformatics analyses to wheat datasets. It will also impact breeders and policymakers, through the release of openly available datasets and analytical tools that power fundamental and applied research in wheat improvement. The main beneficiaries will therefore be the UK wheat research community, from students to senior researchers. However, many of the tools that are already in use can be run with any compatible dataset arising from existing or future wheat research. Ultimately, CyVerse UK will be a community resource for all wheat biologists: the long-term beneficiaries will be anyone working with big data in the wheat domain.

Funding bodies will see huge benefits from extending CyVerse UK, mostly through cost-effective provision of shared computing resources that are locally and remotely accessible to a number of UK research institutions. Although sharing raw data has become a standard requirement for publication in recent years, the wheat community needs guidance and support to carry out this daunting task. Similarly, sharing tools developed for data analysis and visualisation is not typical. Where they are shared, whether through an institutional repository or a third-party open data web service such as Figshare or Dryad, their use may be limited by differences in operating systems or the expertise of new users. CyVerse UK will provide the tools, guidelines and the platform for developers to share their command line-based workflows with the wheat community in a user-friendly way. More of the output from publicly funded UK wheat research will therefore be accessible to the wider national and international research community.

Publications

Leonelli S (2017) Data management and best practice for plant science. in Nature plants

 
Description We have procured and deployed the new CyVerse UK hardware within the Earlham Institute data centres, and have set up the management interfaces (OpenNebula) to orchestrate the virtual machines and storage layers.

The CyVerse UK infrastructure is hosting 8 servers for the Grassroots infrastructure to present backend and frontend servers for data representation, iRODS for the DFW Portal data sharing and metadata management, a testing Germinate 3 server, CKAN for testing a digital repository solution for DFW and EI, and an Elasticsearch instance for indexing and aggregation for internet searches over the DFW Data Portal.

The CyVerse UK cloud was used for the first time to provide virtual machines for a hands-on bioinformatics training course in Kenya; we are building on this experience to further develop our offer of training on the platform. The platform is now fully developed to host additional training on site, and is capable of hosting external training that needs a basic setup.

The CyVerse UK cloud is also hosting Sherlock, a data platform developed in the Korcsmaros group at EI that helps researchers store databases and convert them to a common queryable format. This project also represents a proof of concept for the usage of object storage in the cloud.

Recently we successfully provided the first external user with a customised virtual machine and its own private storage under a Service Level Agreement. The user has provided valuable feedback that allowed us to produce better documentation for future users. This first experience confirmed that the cloud is accessible, functional, and responsive to researchers' needs. In response to this we advertised the availability of the resource to DFW researchers.
Exploitation Route The new infrastructure will power the sharing of prepublication data between DFW researchers, and published data to external stakeholders.
Sectors Agriculture, Food and Drink,Digital/Communication/Information Technologies (including Software),Education

URL http://cyverseuk.org/about/collaborations/designing-future-wheat/
 
Description The CyVerse UK infrastructure is hosting 8 servers for the Grassroots infrastructure to present backend and frontend servers for data representation, iRODS for the DFW Portal data sharing and metadata management, a testing Germinate 3 server, CKAN for testing a digital repository solution for DFW and EI, and an Elasticsearch instance for indexing and aggregation for internet searches over the DFW Data Portal. These solutions are, or will be in the case of the testing environments, available to the wider community: not only DFW researchers, but also (pre-)breeders working in and around wheat internationally.
First Year Of Impact 2019
Sector Agriculture, Food and Drink,Digital/Communication/Information Technologies (including Software),Education
Impact Types Cultural,Societal,Economic

 
Description BEIS/UKRI/RCUK Cloud Workshop, London, 24-10-2017
Geographic Reach National 
Policy Influence Type Participation in a national consultation
 
Description UKRI Data Infrastructure Roadmap
Geographic Reach National 
Policy Influence Type Participation in a national consultation
 
Description UKRI Supercomputing Roadmap
Geographic Reach National 
Policy Influence Type Participation in a national consultation
 
Description 16ALERT
Amount £283,383 (GBP)
Funding ID BB/R000662/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 08/2017 
End 08/2018
 
Title Field trial database schema for DFW 
Description Database schema for recording field trial data for DFW, stored in the Grassroots Infrastructure. 
Type Of Material Computer model/algorithm 
Year Produced 2018 
Provided To Others? Yes  
Impact Standardise the data structure and schema for recording field trial data. 
URL https://github.com/TGAC/grassroots-field-trial-database-schema
 
Description HPE AI workshop 2019 
Organisation Hewlett Packard Enterprise (HPE)
Country United Kingdom 
Sector Private 
PI Contribution We worked with HPE staff to organise and host an AI Workshop at EI. We opened the course up for national delegates to attend and discover more about how AI and Machine Learning techniques can be applied to biological research data.
Collaborator Contribution HPE provided the trainers and staff to teach the materials.
Impact We will continue to work with HPE to supply our institutional HPE equipment. We have also put forward HPE as a potential partner in the upcoming DTP3 bid.
Start Year 2019
 
Description Integration of COPO and CGCore Schemas and Associated Repositories 
Organisation CGIAR
Country Global 
Sector Charity/Non Profit 
PI Contribution We have developed a proof-of-concept platform to streamline metadata attribution and dataset deposition into CGIAR repositories using the BBSRC-funded COPO software. Drs Etuk and Shaw, two Research Software Engineers in the Davey group at Earlham Institute and the original core developers, have implemented various new features in COPO to allow CGIAR Data Managers to harmonise and streamline the submission of CG-relevant metadata and data into the CG digital data repositories. All software and infrastructure is hosted within the CyVerse UK cloud. We have:
- Implemented support for CG Core v.2.0 (http://repo.mel.cgiar.org/handle/20.500.11766/4764) metadata annotation of various data types, including publications, produced at the CGIAR institutes via the existing COPO wizard system.
- Implemented support for submitting annotated objects to institutional instances of the following repositories: dSpace (https://www.duraspace.org/dspace/), CKAN (https://ckan.org/) and Dataverse (https://dataverse.org/).
- Designed and implemented a mechanism within COPO which controls which users can submit to which repositories.
- Implemented support for annotating variables within data sets (i.e. column headings, experiment condition descriptors, etc.) with terms and URIs from ontologies or controlled vocabularies/trait dictionaries (AGROVOC and GACS).
Collaborator Contribution CGIAR have provided coordination contributions with key members in the CG Centres to gather feedback on developed elements, as well as provided funds to allow a core CGCore metadata schema developer to travel to EI and work with Drs Etuk and Shaw to improve the CGCore schema.
Impact This collaboration has seen rapid development of key functionality in the COPO platform to support CG centre Data Managers. This has required technical skills to develop the software, biocuration expertise provided by CGIAR to improve and refine the CGCore metadata schema, ontology expertise from the Bioversity team in Montpellier, and coordination expertise from Dr Davey (EI) and Medha Devare (CGIAR). Software and Technical Products (Webtool/Application - Collaborative Open Plant Omics (COPO) (2017)): All software code developed is open source and can be found within the COPO Github repository: https://github.com/collaborative-open-plant-omics/COPO
Start Year 2018
 
Description Wheat Information System (WheatIS) 
Organisation Cold Spring Harbor Laboratory (CSHL)
Country United States 
Sector Charity/Non Profit 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore perfectly positioned to build a point-of-access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them with other WheatIS resources (such as CerealsDB) accordingly via open source and freely available infrastructure. By doing so we will be promoting and facilitating an inclusive and collaborative community of experts to provide access to an interconnected network of wheat data to a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, meeting yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising, and sharing wheat data in a way that is technically sensible and user focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation EMBL European Bioinformatics Institute (EMBL - EBI)
Country United Kingdom 
Sector Academic/University 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore perfectly positioned to build a point-of-access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them with other WheatIS resources (such as CerealsDB) accordingly via open source and freely available infrastructure. By doing so we will be promoting and facilitating an inclusive and collaborative community of experts to provide access to an interconnected network of wheat data to a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, meeting yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising, and sharing wheat data in a way that is technically sensible and user focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation French National Institute of Agricultural Research
Department INRA Versailles
Country France 
Sector Public 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore perfectly positioned to build a point-of-access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them with other WheatIS resources (such as CerealsDB) accordingly via open source and freely available infrastructure. By doing so we will be promoting and facilitating an inclusive and collaborative community of experts to provide access to an interconnected network of wheat data to a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, meeting yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising, and sharing wheat data in a way that is technically sensible and user focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation Helmholtz Association of German Research Centres
Department Helmholtz Zentrum Munchen
Country Germany 
Sector Public 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore perfectly positioned to build a point-of-access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them with other WheatIS resources (such as CerealsDB) accordingly via open source and freely available infrastructure. By doing so we will be promoting and facilitating an inclusive and collaborative community of experts to provide access to an interconnected network of wheat data to a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, meeting yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising, and sharing wheat data in a way that is technically sensible and user focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation International Centre for Maize and Wheat Improvement (CIMMYT)
Country Mexico 
Sector Charity/Non Profit 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore perfectly positioned to build a point-of-access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them with other WheatIS resources (such as CerealsDB) accordingly via open source and freely available infrastructure. By doing so we will be promoting and facilitating an inclusive and collaborative community of experts to provide access to an interconnected network of wheat data to a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, meeting yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising, and sharing wheat data in a way that is technically sensible and user focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation Rothamsted Research
Country United Kingdom 
Sector Academic/University 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore perfectly positioned to build a point-of-access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them with other WheatIS resources (such as CerealsDB) accordingly via open source and freely available infrastructure. By doing so we will be promoting and facilitating an inclusive and collaborative community of experts to provide access to an interconnected network of wheat data to a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, meeting yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising, and sharing wheat data in a way that is technically sensible and user focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation U.S. Department of Agriculture USDA
Department Agricultural Research Service
Country United States 
Sector Public 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore perfectly positioned to build a point-of-access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them with other WheatIS resources (such as CerealsDB) accordingly via open source and freely available infrastructure. By doing so we will be promoting and facilitating an inclusive and collaborative community of experts to provide access to an interconnected network of wheat data to a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, meeting yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising, and sharing wheat data in a way that is technically sensible and user focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation University of Bristol
Country United Kingdom 
Sector Academic/University 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore perfectly positioned to build a point-of-access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them with other WheatIS resources (such as CerealsDB) accordingly via open source and freely available infrastructure. By doing so we will be promoting and facilitating an inclusive and collaborative community of experts to provide access to an interconnected network of wheat data to a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, meeting yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising, and sharing wheat data in a way that is technically sensible and user focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation University of California, Davis
Department UC Davis College of Biological Sciences
Country United States 
Sector Academic/University 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; and a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore well positioned to build a point of access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community-standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them with other WheatIS resources (such as CerealsDB) accordingly via open source and freely available infrastructure. By doing so we will promote and facilitate an inclusive and collaborative community of experts, providing access to an interconnected network of wheat data at a scale that was not previously available. EI also has representation on the WheatIS Expert Working Group, which meets yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising, and sharing wheat data in a way that is technically sensible and user focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation University of Western Australia
Country Australia 
Sector Academic/University 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; and a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore well positioned to build a point of access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community-standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them with other WheatIS resources (such as CerealsDB) accordingly via open source and freely available infrastructure. By doing so we will promote and facilitate an inclusive and collaborative community of experts, providing access to an interconnected network of wheat data at a scale that was not previously available. EI also has representation on the WheatIS Expert Working Group, which meets yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising, and sharing wheat data in a way that is technically sensible and user focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Title API for SeedStor 
Description API for https://www.seedstor.ac.uk to improve programmatic access. Used in the Grassroots Infrastructure and CerealsDB. 
Type Of Technology New/Improved Technique/Technology 
Year Produced 2018 
Impact Grassroots Infrastructure and CerealsDB can now query SeedStor programmatically instead of browsing the web page. 
URL https://github.com/TGAC/grassroots-seedstor-api
 
Title DFW cloud HPC resources 
Description Designing Future Wheat researchers are able to request virtual machines within CyVerse UK to undertake bioinformatics analysis. 
Type Of Technology Grid Application 
Year Produced 2019 
Impact We have produced a robust and secure cloud framework within CyVerse UK that allows DFW researchers to access and analyse DFW and public data, as well as upload their own. We have already completed two successful pilot projects with external collaborators, and are now making the services available to all DFW researchers. 
URL http://cyverseuk.org/about/collaborations/designing-future-wheat/
 
Title EIRods-DAV 
Description Eirods-dav provides access to iRODS servers using the WebDAV protocol and has a complete REST API for accessing and manipulating metadata from within a web browser. It adds a substantial amount of functionality to the original Davrods module written by Ton Smeele and Chris Smeele, which is a bridge between the WebDAV protocol and the iRODS API. Eirods-dav leverages the Apache server implementation of the WebDAV protocol, mod_dav, for compliance with the WebDAV Class 2 standard. 
Type Of Technology Webtool/Application 
Year Produced 2018 
Open Source License? Yes  
Impact The software is now used to host the Designing Future Wheat data portal. 
URL https://opendata.earlham.ac.uk/wheat
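As an illustration of the WebDAV access described for this entry, the following is a minimal Python sketch of how a client might construct a standard WebDAV PROPFIND request (RFC 4918) to list a collection. The function name build_propfind is hypothetical, the request is only built and never sent, and whether the portal accepts PROPFIND from anonymous clients is an assumption rather than something stated in this record.

```python
import urllib.request


def build_propfind(url, depth="1"):
    """Build (but do not send) a WebDAV PROPFIND request for a collection.

    Depth "1" asks for the collection itself plus its immediate children.
    """
    body = (
        b'<?xml version="1.0" encoding="utf-8"?>'
        b'<D:propfind xmlns:D="DAV:"><D:allprop/></D:propfind>'
    )
    return urllib.request.Request(
        url,
        data=body,
        headers={"Depth": depth, "Content-Type": "application/xml"},
        method="PROPFIND",
    )


# URL taken from this record; support for anonymous PROPFIND is assumed.
request = build_propfind("https://opendata.earlham.ac.uk/wheat")
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it; a WebDAV server normally answers PROPFIND with a 207 Multi-Status XML document describing each resource.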
 
Title Parental Genotype Service 
Description The Parental Genotype Service works with data from various cross-parental breeding lines with associated genotypic markers along with which parent is responsible for their presence in the child line. It can accept various queries across this data, 
Type Of Technology Webtool/Application 
Year Produced 2018 
Open Source License? Yes  
Impact As part of a collaboration with Paul Wilkinson at the University of Bristol and Luzie Wingen at the John Innes Centre, it is used as part of a QTL web service available from the CerealsDB website. 
URL http://www.cerealsdb.uk.net/cerealgenomics/CerealsDB/select_QTL.php
 
Title The Grassroots Infrastructure 
Description The Grassroots software is an open source "as-a-Service" stack that powers a number of data dissemination and analysis activities at EI, and other sites such as CerealsDB at the University of Bristol. We have continued to develop the functionality within the software stack to share crop-related datasets. 
Type Of Technology Webtool/Application 
Year Produced 2018 
Open Source License? Yes  
Impact Grassroots has previously been used to host the Field Pathogenomics project website and Yellow Rust map, the EI wheat BLAST service, the CerealsDB federation project, and the multi-scale improvements to the Polymarker marker design software. Recently, Grassroots has been put forward as the main data repository and metadata catalogue for the Designing Future Wheat project, and has started to host data from this project, the Open Wild Wheat Consortium, and 5 new wheat genomes from EI. The Grassroots service runs within the CyVerse UK National Capability infrastructure. 
URL https://grassroots.tools/
 
Title iRODS filename completion tool for BASH 
Description This is a tool to allow the iRODS client icommands to have auto-complete functionality within Bash as is the case with normal mounted filesystems. 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact Users of the iRODS client icommands save time: instead of typing out full paths, which is error-prone and time-consuming, they can press the Tab key to list all filenames that match the characters already entered. 
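The prefix matching behind this kind of tab completion can be sketched in a few lines of Python. This is an illustration only and not the tool's actual implementation: the function name complete and the example paths are hypothetical, and a real tool would obtain the listing from the iRODS server rather than from a fixed list.

```python
def complete(prefix, paths):
    """Return the sorted paths that start with the characters typed so far."""
    return sorted(p for p in paths if p.startswith(prefix))


# Hypothetical collection listing standing in for a query to an iRODS server.
listing = [
    "/zone/home/alice/reads_1.fastq",
    "/zone/home/alice/reads_2.fastq",
    "/zone/home/alice/results.tsv",
]

# Simulate pressing Tab after typing a partial path.
matches = complete("/zone/home/alice/rea", listing)
print(matches)
```

In Bash, a list of candidates like this is typically handed back to the shell through the COMPREPLY array of a function registered with the complete builtin.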
 
Description ELIXIR Compute Platform F2F Meeting 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact There were discussions about how to proceed with developing a European cloud infrastructure for life science, both in technical terms and in terms of policy.
Year(s) Of Engagement Activity 2018
 
Description BMGF CIMMYT - UK Wheat Research Workshop - August 2017 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A workshop was organised by the Bill and Melinda Gates Foundation between key CIMMYT investigators and Designing Future Wheat investigators to identify areas of common interest and strengthen network links. Possibilities for future collaborative projects were explored.
Year(s) Of Engagement Activity 2017
 
Description Challenges and Opportunities in Plant Science Data Management (Workshop, PAG 2019) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Dr Davey organised the PAG 2019 workshop Challenges and Opportunities in Plant Science Data Management alongside Carolyn Lawrence-Dill from Iowa State University, USA.
Year(s) Of Engagement Activity 2019
URL https://www.intlpag.org/2019/
 
Description DivSeek Partner's Meeting (PAG 2019) 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Dr Davey attended the DivSeek Partner's Meeting in the Courtyard Marriott hotel at PAG 2019.
Year(s) Of Engagement Activity 2019
 
Description Down The Tubes! Talk at the Norwich Science Festival 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Dr Davey gave a talk on the internet and data science entitled "Down The Tubes!" at the 2018 Norwich Science Festival.
Year(s) Of Engagement Activity 2018
URL https://norwichsciencefestival.co.uk/events/down-the-tubes/
 
Description EMPHASIS 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact EMPHASIS is a project on the ESFRI (European Strategy Forum on Research Infrastructures) roadmap to develop a European research infrastructure for plant phenotyping. A community workshop was held at Rothamsted in 2018. An important component of the activities of EMPHASIS is the management and sharing of data following agreed national/international standards.
Year(s) Of Engagement Activity 2018
URL https://emphasis.plant-phenotyping.eu/
 
Description Engagement with industry - KWS UK Ltd. 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Industry/Business
Results and Impact Representatives from KWS were given an overview of the wheat projects and NC3 infrastructures and discussed the possibility of becoming involved.
Year(s) Of Engagement Activity 2018
 
Description Norwich Science Festival 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact A number of members of the public attended the Norwich Science Festival, where the Earlham Institute presented the work being done in Norwich.
Year(s) Of Engagement Activity 2018
URL https://norwichsciencefestival.co.uk/
 
Description Phenotyping Data Standards Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This workshop was convened to discuss the use of MIAPPE, an emerging international standard for describing phenotyping datasets, within the Designing Future Wheat programme. Most of the participants were collaborators in the project, but a major contributor came from INRA.
Year(s) Of Engagement Activity 2018
 
Description Poster: Support open science and FAIRness through an integrated collaborative platform for life science: CyVerse UK and hosted services 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presented how CyVerse, COPO, Galaxy and Grassroots increase the level of FAIRness in the research process.
Year(s) Of Engagement Activity 2018
URL http://www.igst.it/nettab/2018/
 
Description Training held at BeCA: CyVerse ecosystem, basic Sysadmin skills and dockerization 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact About 15 researchers from Eastern Africa attended a one-week training course, as part of a longer training programme in bioinformatics, to learn about virtual machines, Docker, basic Sysadmin skills and how to apply this knowledge using CyVerse. CyVerse UK also supplied the virtual machines used for the training.
Year(s) Of Engagement Activity 2018
 
Description UKRI Darwin Tree of Life Project meeting, London 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Dr Davey travelled to London with other EI staff to discuss strategy for an SPF bid to UKRI for the UK Darwin Tree of Life Project.
Year(s) Of Engagement Activity 2018
 
Description Wheat Initiative group discussion at Plant and Animal Genome conference 2019 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Discussion of the latest research activities from the Wheat Initiative members.
Year(s) Of Engagement Activity 2019
 
Description WheatIS Expert Working Group 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The expert working group for the Wheat Information System is developing standards and tools to enable the global wheat science community to share data effectively.
Year(s) Of Engagement Activity 2014,2015,2016,2017,2018
URL http://wheatis.org/