Enabling UK wheat research with the CyVerse UK cyberinfrastructure

Lead Research Organisation: Earlham Institute
Department Name: Research Faculty

Abstract

Bread wheat has one of the most complex plant genomes and is one of the most commercially important crops in the UK and internationally, with over 750 million tonnes harvested annually, 14 million of them in the UK alone. This juxtaposition creates a range of challenges for biologists and data analysts: how can a balance be found between needing large amounts of data to answer complex biological questions about wheat genetics and the requirements for analysing those data? Furthermore, the pressing issues of climate change that we face are all too evident. We need to use modern technology to increase productivity and output for our wheat researchers, drive breeding strategies, and benefit the public's nutritional needs.

CyVerse represents such a technology, whereby computational resources, data storage, and analytical tools are made available through web-based graphical interfaces for end users or command line interfaces for power users or system administrators. CyVerse UK is the first implementation of the multi-million dollar CyVerse project outside the US, and both systems are interoperable, i.e. able to share their compute and storage services without the user needing to know where their analyses will be taking place. This federation allows a reduction in shared management cost, and an increase in productivity through shared expertise and software development.

The use of "the cloud" is commonplace in today's internet era. Users are moving away from storing data on their own devices and towards services hosted by third party providers such as Google, Microsoft, and Amazon. Furthermore, these vendors also supply complete computing environments over the internet, e.g. Amazon Web Services and Microsoft Azure. However, these resources are not designed for the kinds of scale that are required for wheat researchers to make the most of publicly available and personal datasets, and the costs of running such environments are unclear at best and prohibitive at worst. Therefore, through the deployment of the proposed CyVerse Atmosphere cloud computing platform in the UK, we would be able to supply virtual server resources to users "elastically", i.e. users themselves can easily scale their computing resources up and down. In this way, we can provide flexible computing power when and wherever required, to wheat researchers, labs, and breeders. These virtual wheat data analysis labs can be shared with a wider research group, even internationally, promoting collaboration and knowledge transfer.

Technical Summary

The emergence of wheat as a reference crop model increases worldwide plant community demand for resource access. Genomes at the level of complexity of wheat require expertise in sample preparation and library construction for sequencing, algorithm design and software engineering for assembly, and biological knowledge for interpretation. Whilst existing computational resources are barely sufficient for small scale analysis, the advent of rapid turnaround times for complete wheat genomes represents a real problem in delivering the requisite datasets and tools to analyse them in a form that is usable to researchers.

High-performance computing and modern web-based infrastructure can provide resources to address these challenges, and the CyVerse project is one such "cyberinfrastructure". The presence of CyVerse UK as a dedicated e-Infrastructure platform for life science is a huge boon for UK crop researchers, allowing them to take advantage of a freely available and well-supported set of services for data storage, sharing, and analysis. Following the recent large grant awards to UK institutions for undertaking ever more complex data-driven investigations into wheat genomics, these institutions will find it increasingly difficult to keep up with computational requirements. CyVerse UK is able to meet these needs, and this project represents an expansion of existing hardware in order to proactively prepare for the deluge of wheat data that will need to be managed.

We will procure and deploy 40 modern, fit-for-purpose compute nodes that can be introduced into our existing CyVerse UK infrastructure, housed in two data centres at the Earlham Institute. Each node comprises two 12-core Intel Xeon CPUs, 512GB RAM and a local 1TB solid state disk for fast file input/output operations. These nodes will be used for day-to-day wheat analysis pipelines provided by CyVerse UK, as well as supporting the implementation of the CyVerse Atmosphere cloud computing platform.
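
The aggregate capacity of the expansion follows directly from the per-node figures above. The short Python sketch below is illustrative only: the constant and function names are ours, not part of any CyVerse UK tooling, and the totals are simple multiplication that ignores any overhead reserved for virtualisation or management.

```python
# Per-node figures quoted in the technical summary.
NODES = 40
CPUS_PER_NODE = 2
CORES_PER_CPU = 12
RAM_GB_PER_NODE = 512
SSD_TB_PER_NODE = 1


def aggregate_capacity(nodes=NODES):
    """Return total cores, RAM (GB) and local SSD (TB) across all nodes."""
    return {
        "cores": nodes * CPUS_PER_NODE * CORES_PER_CPU,
        "ram_gb": nodes * RAM_GB_PER_NODE,
        "ssd_tb": nodes * SSD_TB_PER_NODE,
    }


totals = aggregate_capacity()
print(totals)  # {'cores': 960, 'ram_gb': 20480, 'ssd_tb': 40}
```

In other words, the 40 nodes together add 960 CPU cores, 20,480GB of RAM and 40TB of local solid state storage.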

Planned Impact

UK research supports the underpinning breeding and baking sectors, as well as the £6 billion farming industry, critical to the UK rural economy. Wheat is the most important UK crop, with annual production of over 14 million tonnes, and market values for seed and processed products of around £1.4 billion and £14 billion, respectively. More frequent climate extremes, including increased precipitation, flooding and drought, will further affect wheat yields. There is an urgent need to address the problems of producing sufficient nutritious food for 2050; doing so would bring significant societal and economic benefits.

This project will establish guidelines and best practice for wheat researchers who wish to share their datasets with the wider community, share their own research tools via the CyVerse UK infrastructure, and initiate user-provisioned cloud computing environments that can form powerful and bespoke "virtual labs" of shared resources. This proposal will allow increased availability of BBSRC-funded tools for the UK wheat community and will integrate with the CyVerse project in the United States to form a common international biological science platform that prevents duplication of effort and funding. In doing so, this proposal encourages rational and supported reuse of data, applications and resources.

The impact delivered from this expansion of CyVerse UK to support wheat research will be seen by research scientists in academia and industry, funded by BBSRC and other bodies, who are involved in the application of bioinformatics analyses to wheat datasets. It will also impact breeders and policymakers, through the release of openly available datasets and analytical tools that power fundamental and applied research in wheat improvement. The main beneficiaries will therefore be the UK wheat research community, from students to senior researchers. However, many of the tools that are already in use can be run with any compatible dataset arising from existing or future wheat research. Ultimately, CyVerse UK will be a community resource for all wheat biologists: the long-term beneficiaries will be anyone working with big data in the wheat domain.

Funding bodies will see huge benefits from extending CyVerse UK, mostly through cost-effective provision of shared computing resources that are locally and remotely accessible to a number of UK research institutions. Although sharing raw data has become a standard requirement for publication in recent years, the wheat community needs guidance and support to carry out this daunting task. Similarly, sharing tools developed for data analysis and visualisation is not typical. Where they are shared, whether through an institutional repository or a third-party open data web service such as Figshare or Dryad, their use may be limited by differences in operating systems or the expertise of new users. CyVerse UK will provide the tools, guidelines and the platform for developers to share their command line-based workflows with the wheat community in a user-friendly way. More of the output from publicly funded UK wheat research will therefore be accessible to the wider national and international research community.
 
Description We have procured and deployed the new CyVerse UK hardware within the Earlham Institute data centres, and have set up the management interfaces (OpenNebula) to orchestrate the virtual machines and storage layers.

The CyVerse UK e-Infrastructure is now hosting 13 servers for the Grassroots infrastructure to present backend and frontend servers for data representation, iRODS for the DFW Portal data sharing and metadata management, a test Germinate 3 server, CKAN for testing a digital repository solution for DFW and EI, and an Elasticsearch server for indexing and aggregation for internet searches over the DFW Data Portal. CyVerse UK is now hosting a second iRODS server federated with the US CyVerse Data Store, which makes available a data commons repository for DFW researchers to publicize data and associated metadata.

The CyVerse UK cloud was used for the first time to provide virtual machines for a hands-on bioinformatics training course in Kenya; we are building on this experience to further develop our offer of training on the platform. At the current time the platform is fully developed to host additional training on site, and is capable of hosting external training that needs only a basic setup. We have successfully provided virtual machines designed for training, including graphical interfaces, for three courses on site. This has resulted in a policy that all future EI training courses can now use CyVerse UK virtual environments to provide HPC to trainers and to host any training datasets and materials, using the open source Guacamole interface.

The CyVerse UK cloud is also hosting Sherlock, a data platform developed in the Korcsmaros group at EI that helps researchers store databases and convert them to a common queryable format. This project also represents a proof of concept for the usage of object storage in the cloud. This object storage cloud is now part of the backend stack for the DFW Grassroots infrastructure, greatly increasing our storage capacity (an additional 750TB).

Recently we successfully provided the first external user with a customised virtual machine and its own private storage under the CyVerse UK Cloud Service Agreement. The user has provided valuable feedback that allowed us to produce better documentation for future users. This first experience confirmed that the cloud is accessible, functional and responsive to researchers' needs. In response to this we advertised the availability of the resource to DFW researchers.

Currently we are providing a web server hosting service to three non-NBI researchers, for a total of five long-running virtual machines. Two of these services, cerealsDB and knetminer, are important to the wheat community. We have a number of possible VM images to offer to the wheat community, including the option of using the ORCA environment.

Finally, UK and European users can now request CyVerse UK resources from the EOSC marketplace.

We have also submitted an EOSC-Life Digital Life Sciences Open Call proposal to develop S3 (cloud) storage interfaces into the CyVerse UK data service, which will benefit wheat data sharing with other EU EOSC projects.
Exploitation Route The new infrastructure has powered the sharing of prepublication data between DFW researchers, and published data to external stakeholders. Large amounts of data are downloaded from, shared and analysed within, and published through this infrastructure.
Sectors Agriculture, Food and Drink,Digital/Communication/Information Technologies (including Software),Education

URL http://cyverseuk.org/about/collaborations/designing-future-wheat/
 
Description The CyVerse UK infrastructure is hosting 8 servers for the Grassroots infrastructure to present backend and frontend servers for data representation, iRODS for the DFW Portal data sharing and metadata management, a test Germinate 3 server, CKAN for testing a digital repository solution for DFW and EI, and an Elasticsearch server for indexing and aggregation for internet searches over the DFW Data Portal. These solutions are valuable, or will be in the case of the testing environments, to the wider community: not only DFW researchers, but also (pre-)breeders working in and around wheat internationally. Web hosting is proving to be an in-demand service, as users don't have to deal with the accountancy and complications of private cloud vendors, while getting first-hand support and a platform to share their work. CyVerse UK is assuring the availability of helpful resources to get academic web services online and securely hosted. We provide a CyVerse UK data hosting and sharing repository, which will allow the wheat community to easily find and reuse data, and to use CyVerse UK compute that is close to the data, increasing the impact of data generation. The services provided by CyVerse UK and through this wheat grant have formed the backbone of the next wheat ISP proposal.
First Year Of Impact 2019
Sector Agriculture, Food and Drink,Digital/Communication/Information Technologies (including Software),Education
Impact Types Cultural,Societal,Economic

 
Description Interview with Environment Adviser from the UK Parliamentary Office of Science and Technology
Geographic Reach National 
Policy Influence Type Implementation circular/rapid advice/letter to e.g. Ministry of Health
Impact Contacted by UK Parliament to contribute to a POSTnote (short document to advise ministers on a given topic) on genebanks and Digital Sequence Information (DSI) as a result of my recent election to the DivSeek Board of Directors. I was interviewed to provide information around current international policies on DSI and how future UK involvement might be shaped around open licensing/MTAs of DSI datasets.
URL https://www.parliament.uk/postnotes
 
Description 16ALERT
Amount £283,383 (GBP)
Funding ID BB/R000662/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 08/2017 
End 08/2018
 
Title Docker Images for KnetMiner (bare, base, knetminer) 
Description Containerised versions of KnetMiner for cloud deployment published on Docker Hub
Type Of Material Computer model/algorithm 
Year Produced 2019 
Provided To Others? Yes  
Impact Nothing yet 
URL https://hub.docker.com/r/knetminer/knetminer
 
Title Field trial database schema for DFW 
Description Database schema for recording field trial data for DFW, stored in the Grassroots Infrastructure. 
Type Of Material Computer model/algorithm 
Year Produced 2018 
Provided To Others? Yes  
Impact Standardise the data structure and schema for recording field trial data. 
URL https://github.com/TGAC/grassroots-field-trial-database-schema
 
Title Open API Access Points for RDF/Sparql and Neo4J/Cypher 
Description These are public data access points (APIs) for the KnetMiner for Wheat knowledge base on the CyVerse UK infrastructure. SPARQL is completely open and can be used anonymously; the Cypher endpoints require user credentials. https://knetminer.org/data is the entry point for them all. SPARQL (http://knetminer-data.cyverseuk.org/lodestar/sparql), Cypher (http://knetminer-wheat.cyverseuk.org:7474/)
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
Impact None yet 
URL https://knetminer.org/data
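
As an illustration of anonymous access to the SPARQL access point described above, the following Python sketch builds a query request using the standard SPARQL protocol convention of a GET request with a `query` parameter. It is a minimal sketch under stated assumptions: the endpoint URL is the one quoted in the description, but whether it is still online and returns JSON results is an assumption, and the query, function name and constants are hypothetical examples rather than part of KnetMiner.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint quoted in the record above; its current availability is an assumption.
SPARQL_ENDPOINT = "http://knetminer-data.cyverseuk.org/lodestar/sparql"

# Generic example query (not KnetMiner-specific): fetch any five triples.
EXAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_sparql_request(query, endpoint=SPARQL_ENDPOINT):
    """Build an anonymous GET request following the SPARQL protocol."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format (assumed to be supported).
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(EXAMPLE_QUERY)
print(request.full_url)
```

To actually run the query, the request could be passed to `urllib.request.urlopen` and the response decoded with `json.load`. The Cypher endpoints are not covered here because they require user credentials.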
 
Title The DFW CKAN Digital Repository 
Description The CKAN digital repository has been set up as part of WP4 of Designing Future Wheat to hold all DFW publications alongside any supplementary datasets and information. This gives the public and researchers immediate access to DFW funded research through open access routes where available. 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
Impact We have built scripts to find and make available open access versions of all DFW published research, either as preprints or as journal articles. We also supply any supplementary information as appropriate to aid information dissemination. The DFW CKAN runs within Earlham Institute's CyVerse UK National Capability.
URL https://ckan.grassroots.tools
 
Title The Earlham Institute CKAN Digital Repository 
Description The CKAN digital repository has been set up as part of WP3 of Earlham Institute's CSP to hold all EI strategic publications alongside any supplementary datasets and information. This gives the public and researchers immediate access to EI's BBSRC funded research through open access routes where available. 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
Impact We have built scripts to find and make available open access versions of all EI published research, either as preprints or as journal articles. We also supply any supplementary information as appropriate to aid information dissemination. The EI CKAN runs within Earlham Institute's CyVerse UK National Capability.
URL https://ckan.earlham.ac.uk
 
Title The Grassroots DFW Data Portal 
Description Continually updated large dataset repository for the DFW project. Houses a variety of key wheat and associated datasets that are either under the Toronto licence or others as appropriate for the level of open access.
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact To date, we house 24TB of wheat datasets that have been accessed by over 4000 researchers from 64 countries. 
URL https://grassroots.tools/dfw
 
Description ACACIA Bioinformatics Community of Practice (BixCoP) 
Organisation International Livestock Research Institute (ILRI)
Country Kenya 
Sector Charity/Non Profit 
PI Contribution Members of EI delivered training throughout the year for the BixCoP fellowship programme.
Collaborator Contribution The GCRF STARS project was led by JIC and hosted at BeCA-Hub ILRI in Nairobi.
Impact The training programme resulted in a group of Fellows ready to take their skills back into their home countries and communities, with some undertaking Carpentries instructor training so that they can lead their own training courses in those communities.
Start Year 2018
 
Description ACACIA Bioinformatics Community of Practice (BixCoP) 
Organisation John Innes Centre
Country United Kingdom 
Sector Academic/University 
PI Contribution Members of EI delivered training throughout the year for the BixCoP fellowship programme.
Collaborator Contribution The GCRF STARS project was led by JIC and hosted at BeCA-Hub ILRI in Nairobi.
Impact The training programme resulted in a group of Fellows ready to take their skills back into their home countries and communities, with some undertaking Carpentries instructor training so that they can lead their own training courses in those communities.
Start Year 2018
 
Description CyVerse UK MOOC development partnership 
Organisation University of Arizona
Country United States 
Sector Academic/University 
PI Contribution We have been working with the CyVerse US staff at the University of Arizona to develop a series of online courses to help users with using the platforms in the US, UK and Austria
Collaborator Contribution Development of course materials
Impact A set of reusable and open course materials for use in CyVerse training events
Start Year 2021
 
Description CyVerse UK MOOC development partnership 
Organisation University of Graz
Country Austria 
Sector Academic/University 
PI Contribution We have been working with the CyVerse US staff at the University of Arizona to develop a series of online courses to help users with using the platforms in the US, UK and Austria
Collaborator Contribution Development of course materials
Impact A set of reusable and open course materials for use in CyVerse training events
Start Year 2021
 
Description ELIXIR Biodiversity Working Group 
Organisation ELIXIR
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution Drs Davey and Shaw attended the first ELIXIR Biodiversity working group meeting in Milan 2020. Davey gave a talk on UK efforts to track biodiversity data, for example with the COPO platform.
Collaborator Contribution ELIXIR initiated this working group and invited member ELIXIR nodes to attend.
Impact Main outcome is building the community with a view to submitting an implementation study around biodiversity data.
Start Year 2020
 
Description HPE AI workshop 2019 
Organisation Hewlett Packard Enterprise (HPE)
Country United Kingdom 
Sector Private 
PI Contribution We worked with HPE staff to organise and host an AI Workshop at EI. We opened the course up for national delegates to attend and discover more about how AI and Machine Learning techniques can be applied to biological research data.
Collaborator Contribution HPE provided the trainers and staff to teach the materials.
Impact We will continue to work with HPE to supply our institutional HPE equipment. We have also put forward HPE as a potential partner in the upcoming DTP3 bid.
Start Year 2019
 
Description Integration of COPO and CGCore Schemas and Associated Repositories 
Organisation CGIAR
Country France 
Sector Charity/Non Profit 
PI Contribution We have developed a proof-of-concept platform to streamline metadata attribution and dataset deposition into CGIAR repositories using the BBSRC-funded COPO software. Drs Etuk and Shaw, two Research Software Engineers in the Davey group at Earlham Institute and the original core developers, have implemented various new features into COPO to allow CGIAR Data Managers to harmonise and streamline the submission of CG-relevant metadata and data into the CG digital data repositories. All software and infrastructure are hosted within the CyVerse UK cloud. We have:
- Implemented support for CG Core v.2.0 (http://repo.mel.cgiar.org/handle/20.500.11766/4764) metadata annotation of various data types, including publications, produced at the CGIAR institutes via the existing COPO wizard system.
- Implemented support for submitting annotated objects to institutional instances of the following repositories: DSpace (https://www.duraspace.org/dspace/), CKAN (https://ckan.org/) and Dataverse (https://dataverse.org/).
- Designed and implemented a mechanism within COPO which controls which users can submit to which repositories.
- Implemented support for the annotation of variables within data sets (i.e. column headings, experiment condition descriptors, etc.) with terms and URIs from ontologies or controlled vocabularies/trait dictionaries (AGROVOC and GACS).
Collaborator Contribution CGIAR have provided coordination contributions with key members in the CG Centres to gather feedback on developed elements, as well as provided funds to allow a core CGCore metadata schema developer to travel to EI and work with Drs Etuk and Shaw to improve the CGCore schema.
Impact This collaboration has seen rapid development of key functionality in the COPO platform to support CG centre Data Managers. This has required technical skills to develop the software, biocuration expertise provided by CGIAR to improve and refine the CGCore metadata schema, ontology expertise from the Bioversity team in Montpellier, and coordination expertise from Dr Davey (EI) and Medha Devare (CGIAR). Software and Technical Products (Webtool/Application - Collaborative Open Plant Omics (COPO) (2017)): All software code developed is open source and can be found within the COPO GitHub repository: https://github.com/collaborative-open-plant-omics/COPO
Start Year 2018
 
Description Wheat Information System (WheatIS) 
Organisation Cold Spring Harbor Laboratory (CSHL)
Country United States 
Sector Charity/Non Profit 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore perfectly positioned to build a point-of-access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them with other WheatIS resources (such as CerealsDB) accordingly via open source and freely available infrastructure. By doing so we will be promoting and facilitating an inclusive and collaborative community of experts to provide access to an interconnected network of wheat data to a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, meeting yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising, and sharing wheat data in a way that is technically sensible and user focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation EMBL European Bioinformatics Institute (EMBL - EBI)
Country United Kingdom 
Sector Academic/University 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore perfectly positioned to build a point-of-access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them with other WheatIS resources (such as CerealsDB) accordingly via open source and freely available infrastructure. By doing so we will be promoting and facilitating an inclusive and collaborative community of experts to provide access to an interconnected network of wheat data to a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, meeting yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising, and sharing wheat data in a way that is technically sensible and user focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation French National Institute of Agricultural Research
Department INRA Versailles
Country France 
Sector Academic/University 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore perfectly positioned to build a point-of-access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them with other WheatIS resources (such as CerealsDB) accordingly via open source and freely available infrastructure. By doing so we will be promoting and facilitating an inclusive and collaborative community of experts to provide access to an interconnected network of wheat data to a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, meeting yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising, and sharing wheat data in a way that is technically sensible and user focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation Helmholtz Association of German Research Centres
Department Helmholtz Zentrum Munchen
Country Germany 
Sector Academic/University 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore perfectly positioned to build a point-of-access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them with other WheatIS resources (such as CerealsDB) accordingly via open source and freely available infrastructure. By doing so we will be promoting and facilitating an inclusive and collaborative community of experts to provide access to an interconnected network of wheat data to a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, meeting yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising, and sharing wheat data in a way that is technically sensible and user focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation International Centre for Maize and Wheat Improvement (CIMMYT)
Country Mexico 
Sector Charity/Non Profit 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore perfectly positioned to build a point-of-access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them with other WheatIS resources (such as CerealsDB) accordingly via open source and freely available infrastructure. By doing so we will be promoting and facilitating an inclusive and collaborative community of experts to provide access to an interconnected network of wheat data to a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, meeting yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising, and sharing wheat data in a way that is technically sensible and user focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation Monogram Network
Sector Academic/University 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore perfectly positioned to build a point-of-access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them with other WheatIS resources (such as CerealsDB) accordingly via open source and freely available infrastructure. By doing so we will be promoting and facilitating an inclusive and collaborative community of experts to provide access to an interconnected network of wheat data to a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, meeting yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising, and sharing wheat data in a way that is technically sensible and user focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation Rothamsted Research
Country United Kingdom 
Sector Academic/University 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore perfectly positioned to build a point-of-access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them with other WheatIS resources (such as CerealsDB) accordingly via open source and freely available infrastructure. By doing so we will be promoting and facilitating an inclusive and collaborative community of experts to provide access to an interconnected network of wheat data to a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, meeting yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising, and sharing wheat data in a way that is technically sensible and user focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation U.S. Department of Agriculture USDA
Department Agricultural Research Service
Country United States 
Sector Public 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore perfectly positioned to build a point-of-access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them with other WheatIS resources (such as CerealsDB) accordingly via open source and freely available infrastructure. By doing so we will be promoting and facilitating an inclusive and collaborative community of experts to provide access to an interconnected network of wheat data to a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, meeting yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising, and sharing wheat data in a way that is technically sensible and user focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation University of Bristol
Country United Kingdom 
Sector Academic/University 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore perfectly positioned to build a point-of-access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them with other WheatIS resources (such as CerealsDB) accordingly via open source and freely available infrastructure. By doing so we will be promoting and facilitating an inclusive and collaborative community of experts to provide access to an interconnected network of wheat data to a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, meeting yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising, and sharing wheat data in a way that is technically sensible and user focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation University of California, Davis
Department UC Davis College of Biological Sciences
Country United States 
Sector Academic/University 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore perfectly positioned to build a point-of-access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them with other WheatIS resources (such as CerealsDB) accordingly via open source and freely available infrastructure. By doing so we will be promoting and facilitating an inclusive and collaborative community of experts to provide access to an interconnected network of wheat data to a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, meeting yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising, and sharing wheat data in a way that is technically sensible and user focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation University of Western Australia
Country Australia 
Sector Academic/University 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore perfectly positioned to build a point-of-access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them with other WheatIS resources (such as CerealsDB) accordingly via open source and freely available infrastructure. By doing so we will be promoting and facilitating an inclusive and collaborative community of experts to provide access to an interconnected network of wheat data to a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, meeting yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising, and sharing wheat data in a way that is technically sensible and user focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Title API for SeedStor 
Description API for https://www.seedstor.ac.uk to improve programmatic access. Used in the Grassroots Infrastructure and CerealsDB. 
Type Of Technology New/Improved Technique/Technology 
Year Produced 2018 
Impact Grassroots Infrastructure and CerealsDB can now query SeedStor programmatically instead of browsing the web page. 
URL https://github.com/TGAC/grassroots-seedstor-api
 
Title DFW cloud HPC resources 
Description Designing Future Wheat researchers are able to request virtual machines within CyVerse UK to undertake bioinformatics analysis. 
Type Of Technology Grid Application 
Year Produced 2019 
Impact We have produced a robust and secure cloud framework within CyVerse UK to allow DFW researchers to access DFW and public data to analyse, as well as upload their own. We have already completed two successful pilot projects with external collaborators, and are now making the services available to all DFW researchers. 
URL http://cyverseuk.org/about/collaborations/designing-future-wheat/
 
Title Docker Images for KnetMiner (bare, base, knetminer) 
Description This is a containerised version of KnetMiner for cloud deployment, particularly for use in the CyVerse UK environment. 
Type Of Technology Software 
Year Produced 2019 
Impact None yet 
 
Title EIRods-DAV 
Description Eirods-dav provides access to iRODS servers using the WebDAV protocol and has a complete REST API for accessing and manipulating metadata from within a web browser. It adds a substantial amount of functionality to the original Davrods module written by Ton Smeele and Chris Smeele, which is a bridge between the WebDAV protocol and the iRODS API. Eirods-dav leverages the Apache server implementation of the WebDAV protocol, mod_dav, for compliance with the WebDAV Class 2 standard. 
Type Of Technology Webtool/Application 
Year Produced 2018 
Open Source License? Yes  
Impact The software is now used to host the Designing Future Wheat data portal. 
URL https://opendata.earlham.ac.uk/wheat
 
Title EIRods-DAV 
Description Eirods-dav provides access to iRODS servers using the WebDAV protocol and has a complete REST API for accessing and manipulating metadata from within a web browser. It adds a substantial amount of functionality to the original Davrods module written by Ton Smeele and Chris Smeele, which is a bridge between the WebDAV protocol and the iRODS API. Eirods-dav leverages the Apache server implementation of the WebDAV protocol, mod_dav, for compliance with the WebDAV Class 2 standard. It also automatically generates and exports the datasets as Frictionless Data Packages. 
Type Of Technology Webtool/Application 
Year Produced 2021 
Open Source License? Yes  
Impact The software is used to host the Designing Future Wheat data portal. 
URL https://opendata.earlham.ac.uk/wheat
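
As an illustration of the WebDAV access described in this record, the following is a minimal Python sketch of a PROPFIND request, the standard WebDAV method for listing a collection. It assumes the portal URL above accepts anonymous WebDAV requests, which the record does not state; the helper name build_propfind is hypothetical. The request is only constructed here, not sent.

```python
import urllib.request

# Standard WebDAV request body asking for all properties of each entry.
PROPFIND_BODY = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<D:propfind xmlns:D="DAV:"><D:allprop/></D:propfind>'
)


def build_propfind(url: str, depth: str = "1") -> urllib.request.Request:
    """Build (but do not send) a WebDAV PROPFIND request for a collection.

    depth="1" asks for the collection and its immediate children.
    """
    return urllib.request.Request(
        url,
        data=PROPFIND_BODY.encode("utf-8"),
        method="PROPFIND",
        headers={"Depth": depth, "Content-Type": "application/xml"},
    )


if __name__ == "__main__":
    # Assumption: the data portal URL from this record is used as the target.
    req = build_propfind("https://opendata.earlham.ac.uk/wheat/")
    print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would send it; a WebDAV server normally answers with a 207 Multi-Status XML document describing each entry.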
 
Title Grassroots BrAPI web service 
Description This is a web service that uses the Grassroots Field Trial service and adds a Breeding API (BrAPI) layer on top to allow other BrAPI-compliant software to access the field trial data. We currently have complete support for approximately a third of BrAPI classes and calls with partial support for others. 
Type Of Technology Webtool/Application 
Year Produced 2020 
Open Source License? Yes  
Impact This allows other data scientists, software developers and applications to easily access the field trial data stored in our system using a standard nomenclature and REST API. 
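
To show how a BrAPI-compliant client might read field trial studies, here is a minimal Python sketch. It assumes the service exposes the BrAPI v2 studies call; the record does not state which BrAPI version or which calls are supported, and the base URL and helper names below are hypothetical placeholders.

```python
import json
from urllib.parse import urlencode


def studies_url(base_url: str, page: int = 0, page_size: int = 10) -> str:
    """Build a BrAPI v2 'studies' URL using the standard pagination parameters."""
    query = urlencode({"page": page, "pageSize": page_size})
    return f"{base_url.rstrip('/')}/brapi/v2/studies?{query}"


def study_names(response_text: str) -> list:
    """Extract study names from a BrAPI response envelope (result.data)."""
    payload = json.loads(response_text)
    return [study.get("studyName") for study in payload["result"]["data"]]


if __name__ == "__main__":
    # Hypothetical server address; substitute a real BrAPI endpoint.
    print(studies_url("https://brapi.example.org"))

    # Hand-written sample in the BrAPI envelope shape, not real data.
    sample = json.dumps({
        "metadata": {"pagination": {"currentPage": 0, "pageSize": 10}},
        "result": {"data": [{"studyDbId": "1", "studyName": "Example study"}]},
    })
    print(study_names(sample))
```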
 
Title Grassroots Field Trial service 
Description A web-based application for submitting and searching for various aspects of field trial experimental data. 
Type Of Technology Webtool/Application 
Year Produced 2019 
Open Source License? Yes  
Impact A web-based application for submitting and searching for field trial data. 
URL https://grassroots.tools/beta/dynamic/fieldtrial_dynamic.html?type=AllFieldTrials
 
Title Grassroots Field Trial service 
Description Continuous updating of our existing web-based application for submitting and searching for various aspects of field trial experimental data. Updates include adding images, treatment factors, research programmes and vastly expanded faceted search functionality. 
Type Of Technology Webtool/Application 
Year Produced 2021 
Open Source License? Yes  
Impact 60 studies have now been added into the system and the pace of input is increasing as, after working closely with partners in the DFW programme, they are looking to use it as their submission system for all of this year's studies whilst in the field. 
URL https://grassroots.tools/beta/dynamic/fieldtrial_dynamic.html?type=AllFieldTrials
 
Title Grassroots Field Trial service 
Description Continuous updating of our existing web-based application for submitting and searching for various aspects of field trial experimental data. Updates include adding images, treatment factors, research programmes and vastly expanded faceted search functionality. 
Type Of Technology Webtool/Application 
Year Produced 2022 
Open Source License? Yes  
Impact 112 studies have now been added into the system. 
URL https://grassroots.tools/fieldtrial/all
 
Title Grassroots Frictionless Data Tool 
Description This is a command-line tool to extract the resources within a Frictionless Data Package into a variety of formats such as Markdown, HTML, CSV, etc. It will be available for as many different platforms as possible. It uses the schemas for each resource within the Data Package to generate the reports. It has in-built support for tabular-data-resources and will download and parse any web-based schemas from the resource profiles and use these when they are specified. It will output a file for each Data Resource within the Data Package. 
Type Of Technology Webtool/Application 
Year Produced 2021 
Open Source License? Yes  
Impact It allows users to get their data as Frictionless Data Packages and export them into other formats as needed 
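
The kind of export described in this record can be sketched in Python as follows. This is not the Grassroots tool itself: the function names are hypothetical, only inline schemas are handled (the real tool is described as also downloading web-based schemas), and the output is one Markdown report per Data Resource.

```python
def resource_to_markdown(resource: dict) -> str:
    """Render one Data Resource as a Markdown table of its schema fields."""
    lines = [
        f"# {resource.get('name', 'unnamed')}",
        "",
        "| Field | Type |",
        "| --- | --- |",
    ]
    schema = resource.get("schema", {})
    # A schema may also be given as a URL string; this sketch skips that case.
    if isinstance(schema, dict):
        for field in schema.get("fields", []):
            lines.append(f"| {field['name']} | {field.get('type', 'string')} |")
    return "\n".join(lines)


def package_to_markdown(descriptor: dict) -> dict:
    """Map each resource name in a Data Package descriptor to its Markdown report."""
    return {
        res.get("name", f"resource_{i}"): resource_to_markdown(res)
        for i, res in enumerate(descriptor.get("resources", []))
    }


if __name__ == "__main__":
    # Hand-written example descriptor, not a real dataset.
    package = {
        "name": "example-package",
        "resources": [{
            "name": "plots",
            "profile": "tabular-data-resource",
            "schema": {"fields": [
                {"name": "plot_id", "type": "integer"},
                {"name": "accession", "type": "string"},
            ]},
        }],
    }
    for name, report in package_to_markdown(package).items():
        print(report)
```

In practice the descriptor would be loaded from a datapackage.json file with json.load, and each report written to its own file.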
 
Title Grassroots Gene Trees Search service 
Description This is a search service querying and mapping clusters to genes for sequence data 
Type Of Technology Webtool/Application 
Year Produced 2022 
Open Source License? Yes  
Impact This is used as the backend service for a user-friendly Gene Trees search service combined with BLAST searches 
 
Title Grassroots Parental Genotype service. 
Description This software stores information regarding peak markers and parental genotype information for various QTL. It is part of a collaboration between the University of Bristol, the John Innes Centre and the Earlham Institute. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact This software is used by the CerealsDB web service to give users a simple way to browse between QTL, peak marker information and the parental genotype information. 
URL http://www.cerealsdb.uk.net/cerealgenomics/CerealsDB/select_QTL.php
 
Title Grassroots Search service 
Description The Grassroots free-text search engine, based upon Lucene, allows us to give ranked, faceted results for various types of research data such as field trial information, research datasets, sequence data, etc. These data items are all faceted and each facet automatically weights searches for its specific fields. For example, queries that match study names get ranked higher than those that match only in the description field. This is used for general searches as well as specific faceted search applications such as the one we have for Measured Variables to denote phenotypic data. 
Type Of Technology Webtool/Application 
Year Produced 2021 
Open Source License? Yes  
Impact This has allowed users to search across all of our data within the EI Grassroots infrastructure and allowed users to get to both services and data more quickly. 
URL https://grassroots.tools/public/service/search
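
The field-weighted ranking described in this record can be illustrated with a toy Python sketch. This is not Lucene's scoring and not the Grassroots implementation; the weights, field names and functions are hypothetical and only show why a match in a study name outranks a match in a description.

```python
# Hypothetical per-field weights: a name match counts more than a description match.
FIELD_WEIGHTS = {"name": 3.0, "description": 1.0}


def score(doc: dict, query: str) -> float:
    """Sum the weights of every field whose text contains the query term."""
    term = query.lower()
    return sum(
        weight
        for field, weight in FIELD_WEIGHTS.items()
        if term in doc.get(field, "").lower()
    )


def rank(docs: list, query: str) -> list:
    """Return matching documents ordered from highest to lowest score."""
    scored = [(score(doc, query), doc) for doc in docs]
    hits = [pair for pair in scored if pair[0] > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in hits]


if __name__ == "__main__":
    # Hand-written example records, not real studies.
    studies = [
        {"name": "Trial A", "description": "Yield study of wheat lines"},
        {"name": "Wheat nitrogen trial", "description": "Fertiliser response"},
        {"name": "Barley trial", "description": "Unrelated"},
    ]
    for doc in rank(studies, "wheat"):
        print(doc["name"])
```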
 
Title Grassroots core infrastructure 
Description The Grassroots Infrastructure project aims to create an easily-deployable suite of computing middleware tools to help users and developers gain access to scientific data infrastructure that can easily be interconnected. With the data-generative approaches that are increasingly common in modern life science research, it is vital that the data and metadata produced by these efforts can be shared and reused. The Grassroots Infrastructure project wraps up industry-standard software tools with a consistent API that can be federated on a number of levels. This means institutions and groups can deploy a simple lightweight virtual machine, expose local data, connect up any existing data services, and federate their instance of the Grassroots with others out-of-the-box. The Grassroots Infrastructure uses a controlled vocabulary of JSON messages to communicate, so any server or client that can understand JSON can be used to access and connect to the platform. We provide infrastructure to ensure that the scientific data remains the important factor, and not the worry about how to build a system to expose your data. 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact The Grassroots Infrastructure has allowed researchers, data scientists, and breeders to perform a variety of data analyses such as sequence searching using BLAST, map-based interactive searches for field trial data, and QTL parental genotype mapping, as well as custom bespoke software web services utilised by third parties such as the CerealsDB team at the University of Bristol as part of systems that they have developed for users. 
URL https://grassroots.tools
 
Title Grassroots core infrastructure 
Description The Grassroots Infrastructure project aims to create an easily-deployable suite of computing middleware tools to help users and developers gain access to scientific data infrastructure that can easily be interconnected. With the data-generative approaches that are increasingly common in modern life science research, it is vital that the data and metadata produced by these efforts can be shared and reused. The Grassroots Infrastructure project wraps up industry-standard software tools with a consistent API that can be federated on a number of levels. This means institutions and groups can deploy a simple lightweight virtual machine, expose local data, connect up any existing data services, and federate their instance of the Grassroots with others out-of-the-box. The Grassroots Infrastructure uses a controlled vocabulary of JSON messages to communicate, so any server or client that can understand JSON can be used to access and connect to the platform. We provide infrastructure to ensure that the scientific data remains the important factor, and not the worry about how to build a system to expose your data. 
Type Of Technology Webtool/Application 
Year Produced 2020 
Open Source License? Yes  
Impact The Grassroots Infrastructure has allowed researchers, data scientists, and breeders to perform a variety of data analyses such as sequence searching using BLAST, map-based interactive searches for field trial data, and QTL parental genotype mapping, as well as custom bespoke software web services utilised by third parties such as the CerealsDB team at the University of Bristol as part of systems that they have developed for users. 
URL https://grassroots.tools
 
Title Grassroots core server software 
Description The Grassroots Infrastructure project aims to create an easily-deployable suite of computing middleware tools to help users and developers gain access to scientific data infrastructure that can easily be interconnected. With the data-generative approaches that are increasingly common in modern life science research, it is vital that the data and metadata produced by these efforts can be shared and reused. The Grassroots Infrastructure project wraps up industry-standard software tools with a consistent API that can be federated on a number of levels. This means institutions and groups can deploy a simple lightweight virtual machine, expose local data, connect up any existing data services, and federate their instance of the Grassroots with others out-of-the-box. The Grassroots Infrastructure uses a controlled vocabulary of JSON messages to communicate, so any server or client that can understand JSON can be used to access and connect to the platform. We provide infrastructure to ensure that the scientific data remains the important factor, and not the worry about how to build a system to expose your data. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact The Grassroots Infrastructure has allowed researchers, data scientists, and breeders to perform a variety of data analyses such as sequence searching using BLAST, map-based interactive searches for field pathogenomic data, and the field trial service, as well as custom bespoke software web services utilised by third parties such as the CerealsDB team at the University of Bristol as part of systems that they have developed for users. 
URL https://grassroots.tools
 
Title Grassroots free-text search engine 
Description The Grassroots free-text search engine, based upon Lucene, allows us to give ranked, faceted results for various types of field trial data. Each facet automatically weights searches for its specific fields. For example, queries that match study names get ranked higher than those that match only in the description field. This is used for general searches as well as specific faceted search applications such as the one we have for Measured Variables to denote phenotypic data. 
Type Of Technology Webtool/Application 
Year Produced 2020 
Open Source License? Yes  
Impact This has allowed field trial data scientists to search across all of our data and allows them to search for the correct ontological terms to describe the phenotypic traits that have been measured within their trials. This has allowed researchers to be able to upload their data to our systems more quickly by allowing them to determine the correct ontological terms more easily. 
URL https://grassroots.tools/beta/public/SearchTreatment
 
Title Parental Genotype Service 
Description The Parental Genotype Service works with data from various cross-parental breeding lines with associated genotypic markers, along with which parent is responsible for their presence in the child line. It can accept various queries across this data. 
Type Of Technology Webtool/Application 
Year Produced 2018 
Open Source License? Yes  
Impact As part of a collaboration with Paul Wilkinson at the University of Bristol and Luzie Wingen at the John Innes Centre, it is used as part of a QTL web service available from the CerealsDB website. 
URL http://www.cerealsdb.uk.net/cerealgenomics/CerealsDB/select_QTL.php
 
Title The Grassroots Infrastructure 
Description The Grassroots software is an open source "as-a-Service" stack that powers a number of data dissemination and analysis activities at EI, and other sites such as CerealsDB at the University of Bristol. We have continued to develop the functionality within the software stack to share crop-related datasets. 
Type Of Technology Webtool/Application 
Year Produced 2018 
Open Source License? Yes  
Impact Grassroots has previously been used to host the Field Pathogenomics project website and Yellow Rust map, the EI wheat BLAST service, the CerealsDB federation project, and the multi-scale improvements to the Polymarker marker design software. Recently, Grassroots has been put forward as the main data repository and metadata catalogue for the Designing Future Wheat project, and has started to host data from this project, the Open Wild Wheat Consortium, and 5 new wheat genomes from EI. The Grassroots service runs within the CyVerse UK National Capability infrastructure. 
URL https://grassroots.tools/
 
Title Wiki page: Deploying and Developing KnetMiner with Docker 
Description This is documentation for how to develop and deploy KnetMiner using Docker 
Type Of Technology Webtool/Application 
Year Produced 2019 
Impact None 
URL https://github.com/Rothamsted/knetminer/wiki/8.-Docker
 
Title iRODS filename completion tool for BASH 
Description This is a tool to allow the iRODS client icommands to have auto-complete functionality within Bash as is the case with normal mounted filesystems. 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact For people using the iRODS client icommands, it saves time: instead of typing out full paths, which is error-prone and time-consuming, they can simply press the tab key to get all current matching filenames, taking into account any characters that they have already entered.
 
Title iRODS server open to the public 
Description A data store federated with CyVerse for users to store and share datasets with collaborators. It also hosts a data commons for the DFW project.
Type Of Technology Systems, Materials & Instrumental Engineering 
Year Produced 2019 
Impact Users can now store a considerable amount of data in a UK repository, while being able to run jobs using such data as input. They can also easily share data with collaborators, relying on fast data transfer, and make data public either by changing permissions or by submitting it with metadata to public folders.
 
Title web server hosting on the CyVerse UK e-infrastructure 
Description CyVerse UK is hosting 3 web servers for external researchers free of charge, for a total of 5 hosts.
Type Of Technology Systems, Materials & Instrumental Engineering 
Year Produced 2019 
Impact Researchers can host their work free of charge for the community to take advantage of. They don't have to rely on commercial vendors and can take advantage of EI support in case of issues.
 
Description ELIXIR Compute Platform F2F Meeting 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact There were discussions about how to proceed with developing a European cloud infrastructure for life science, both in technical terms and in terms of policies.
Year(s) Of Engagement Activity 2018
 
Description AI for Wheat workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact The AI for Wheat workshop was a meeting of approximately 50 people from academia and industry to examine ways to use AI methods and algorithms on wheat-based data.
Year(s) Of Engagement Activity 2020
 
Description AI for Wheat workshop 2020 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Attended the DFW AI for Wheat workshop at the Alan Turing Institute, where we discussed and planned the submission of proposals for collaborative Data Study Groups around the use of machine learning for wheat data analysis.
Year(s) Of Engagement Activity 2020
 
Description AI for Wheat Workshop
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact The Alan Turing Institute (ATI) hosted a workshop in partnership with the BBSRC-funded Designing Future Wheat research programme to explore the potential for innovative applications of data science and artificial intelligence to address problems relevant to the UK wheat community.

This two-day workshop will bring members of the UK wheat community and data science researchers together to identify suitable and exciting biological problems (with associated datasets) that could be further developed and made ready for a future wheat-focussed Data Study Group (DSG). DSGs are week-long intensive interdisciplinary workshops which explore a small number of topics in greater depth to seed longer-term collaborations which could lead to joint publications or research proposals. For more information, see https://www.turing.ac.uk/collaborate-turing/data-study-groups.

We hope to encourage wide participation from data scientists and the UK wheat community, including stakeholders from industry and government. Problems considered within scope for the workshop include, but are not limited to: wheat genomics and genetics, phenotyping, breeding, GxExM interactions, agronomy, wheat pests and disease interactions, impacts of climate change, sustainability and wheat as part of a healthy diet.

We particularly want to encourage participation from members of the wider data science community who consider that they have relevant skills or experience that could be brought to bear on the areas listed above and who might be interested in being involved in future DSGs as a challenge owner (Early Career Researcher) or as a potential PI.
Year(s) Of Engagement Activity 2020
 
Description Attendance at the ELIXIR-UK All Hands 2020 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Rob Davey attended the ELIXIR-UK All Hands 2020 to take part in discussions about ongoing and possible future collaborations
Year(s) Of Engagement Activity 2020
URL https://www.earlham.ac.uk/elixir-uk-all-hands-2020
 
Description Attendance at the UK-Conference of Bioinformatics and Computational Biology 2020 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Attendance at the UK-Conference of Bioinformatics and Computational Biology 2020
The UK-CBCB conference is designed to bring together biologists, bioinformaticians, computer scientists, software engineers and data scientists across the life sciences to share innovations, applications and best practice in their fields. Rob Davey, Nicola Soranzo and Alice Minotto attended to participate in discussions and to add impact to the ongoing work in bioinformatics.
Year(s) Of Engagement Activity 2020
URL https://www.earlham.ac.uk/uk-conference-bioinformatics-and-computational-biology-2020#About-
 
Description Attended and Presented a talk at the DFW All Hands 2020 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Rob Davey attended the DFW All Hands 2020 (DFW Annual Meeting) and presented a talk on the work being carried out in WP4
Year(s) Of Engagement Activity 2020
 
Description BIKE data management 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact We gave an overview of good practice in data management to postgraduate students working in the research park, as part of the bioinformatics knowledge exchange series of lessons they organise.
Year(s) Of Engagement Activity 2019
 
Description BMGF CIMMYT - UK Wheat Research Workshop - August 2017 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A workshop was organised by the Bill and Melinda Gates Foundation between key CIMMYT investigators and Designing Future Wheat investigators to help build network links and to identify areas of common interest. Possibilities for future collaborative projects were explored.
Year(s) Of Engagement Activity 2017
 
Description Blog about successful award application 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A blog describing our initial work on adding support for the Frictionless Data standard to our hosted data. This resulted in interest and feedback from new research groups about possible future work.
Year(s) Of Engagement Activity 2020
URL https://frictionlessdata.io/blog/2020/08/17/frictionless-wheat/
 
Description Blog after completion of project 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A blog describing our fully-completed work on adding support for the Frictionless Data standard to our hosted data. This resulted in interest and feedback from new research groups about possible future work.
Year(s) Of Engagement Activity 2021
URL https://frictionlessdata.io/blog/2021/03/05/frictionless-data-for-wheat/
 
Description Challenges and Opportunities in Plant Science Data Management (Workshop, PAG 2019) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Dr Davey organised the PAG 2019 workshop Challenges and Opportunities in Plant Science Data Management alongside Carolyn Lawrence-Dill from Iowa State University, USA.
Year(s) Of Engagement Activity 2019
URL https://www.intlpag.org/2019/
 
Description DFW Hackathon 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact A workshop to discuss and implement potential collaborations to create tools to solve bioinformatic needs within the DFW community.
Year(s) Of Engagement Activity 2019
 
Description DivSeek Partner's Meeting (PAG 2019) 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Dr Davey attended the DivSeek Partner's Meeting in the Courtyard Marriott hotel at PAG 2019.
Year(s) Of Engagement Activity 2019
 
Description Down The Tubes! Talk at the Norwich Science Festival 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Dr Davey gave a talk on the internet and data science entitled "Down The Tubes!" at the 2018 Norwich Science Festival.
Year(s) Of Engagement Activity 2018
URL https://norwichsciencefestival.co.uk/events/down-the-tubes/
 
Description ELIXIR Compute Platform F2F Meeting 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This was a meeting to delineate the next steps in achieving the tasks of the ELIXIR compute platform. Particular attention was given to the creation of a European hybrid cloud with a common AAI to enable life science research, and to integration with EOSC. Plans were made to engage with the community to develop pilots.
Year(s) Of Engagement Activity 2019
 
Description ELIXIR All Hands 2020 - poster: the CyVerse UK cyberinfrastructure offer to support open science and FAIRness
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact We presented a poster to highlight what the CyVerse UK infrastructure is supporting in terms of all the services hosted (COPO, Grassroots and Galaxy in particular) to improve the FAIRness of the data landscape. We also showed how the data is shared with collaborators through federation of iRODS instances.
Year(s) Of Engagement Activity 2020
URL https://f1000research.com/posters/9-525
 
Description ELIXIR compute platform meeting 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A meeting to discuss developments since March and the progress made in the different working groups of the platform. We participated in particular in the discussion about the creation of a hybrid cloud to which researchers could submit directly from any ELIXIR institution or country.
Year(s) Of Engagement Activity 2020
 
Description EMPHASIS 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact EMPHASIS is a project to develop a European Science Infrastructure (ESFRI) for plant phenotyping. A community workshop was held at Rothamsted in 2018. An important component of the activities of EMPHASIS is the management and sharing of data following agreed national/international standards.
Year(s) Of Engagement Activity 2018
URL https://emphasis.plant-phenotyping.eu/
 
Description Engagement with industry - KWS UK Ltd. 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Industry/Business
Results and Impact Representatives from KWS were given an overview of the wheat projects and NC3 infrastructures and discussed the possibility of being involved.
Year(s) Of Engagement Activity 2018
 
Description Frictionless Data for Wheat - CSV Conf talk 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Gave a talk on Frictionless Data for Wheat as part of the Grassroots Infrastructure
Year(s) Of Engagement Activity 2021
URL https://csvconf.com/2021/
 
Description Frictionless Data for Wheat blog 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A blog to describe our work on tools for integrating Frictionless Data into the Grassroots Infrastructure as part of our successful grant application from the Frictionless community.
Year(s) Of Engagement Activity 2021
URL https://frictionlessdata.io/blog/2021/03/05/frictionless-data-for-wheat/
 
Description Frictionless Data for Wheat talk 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact As part of the Frictionless Data Community Call series, we gave a talk on the Frictionless Data functionality that has been developed as part of the Grassroots Infrastructure
Year(s) Of Engagement Activity 2021
 
Description Grassroots Field Trial Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact 25 people attended a workshop on submitting data to and using the Grassroots Field Trial system
Year(s) Of Engagement Activity 2022
 
Description Grassroots: An infrastructure for sharing services & data 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A talk at a conference on agricultural data to show the various applications available as part of the Grassroots Infrastructure for disseminating bioinformatics data.
Year(s) Of Engagement Activity 2019
 
Description Grassroots: Field Trials database presentation 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Gave a talk on the Grassroots Field Trial system as part of the annual DFW all-hands meeting
Year(s) Of Engagement Activity 2021
 
Description Laying the Foundations; Why are Semantics in Agriculture Difficult? - PAG 2020 talk in Plant Phenotypes workshop 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Dr Davey gave an invited talk to approx 90 attendees at the PAG 2020 workshop "Plant Phenotypes"
Year(s) Of Engagement Activity 2020
 
Description Norwich Science Festival
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact A number of members of the public attended the Norwich Science Festival, where the Earlham Institute presented the work being done in Norwich.
Year(s) Of Engagement Activity 2018
URL https://norwichsciencefestival.co.uk/
 
Description Organiser of Challenges and Opportunities in Plant Science Data Management PAG workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Co-organiser of Challenges and Opportunities in Plant Science Data Management PAG workshop, which saw 6 international speakers deliver presentations on various aspects of data management in the plant sciences. Approx 50 attendees.
Year(s) Of Engagement Activity 2020
 
Description Phenotyping Data Standards Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This workshop was convened to discuss the use of MIAPPE, an emerging international standard for describing phenotyping datasets, within the Designing Future Wheat programme. Most of the participants were collaborators in the project, but a major contributor came from INRA.
Year(s) Of Engagement Activity 2018
 
Description Poster: Support open science and FAIRness through an integrated collaborative platform for life science: CyVerse UK and hosted services 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presented how CyVerse, COPO, Galaxy and Grassroots increase the level of FAIRness in the research process.
Year(s) Of Engagement Activity 2018
URL http://www.igst.it/nettab/2018/
 
Description Poster: Support open science and FAIRness through an integrated collaborative platform for life science: CyVerse UK and hosted services 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Poster presentation of the projects supported by the CyVerse UK infrastructure and how they contribute to the FAIRness of data.
Year(s) Of Engagement Activity 2019
 
Description Presentation at DockerCon 2019: Value in simplicity - how Docker is helping Academia and non-dev 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A number of conference participants from different backgrounds were given an overview of the reproducibility issue in science and how the new containerization technology is proving useful.
Year(s) Of Engagement Activity 2019
URL https://www.docker.com/dockercon/2019-videos?watch=value-in-simplicity-how-docker-is-helping-academi...
 
Description Presentation at Research Data Alliance's 14th Plenary - Interest Group on Agricultural Data (IGAD) Pre-Meeting 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presented the Grassroots Infrastructure at the Research Data Alliance's 14th Plenary - Interest Group on Agricultural Data (IGAD) Pre-Meeting in Helsinki, Finland.

IGAD is a domain-oriented group working on all issues related to global agriculture data. It represents stakeholders in managing data for agricultural research and innovation, including producing, aggregating and consuming data.
Year(s) Of Engagement Activity 2019
URL http://aims.fao.org/activity/blog/presentations-available-igad-meeting-during-rda-14th-plenary
 
Description Preserving, Restoring and Managing Colombian Biodiversity Through Responsible Innovation - GROW Colombia UK workshop 2019 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Robert Davey gave a talk on the C3 Biodiversidad Consortium Project Coordination and Website
Year(s) Of Engagement Activity 2020
 
Description Support open science and FAIRness through an integrated collaborative platform for life science: CyVerse UK and hosted services 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The Earlham Institute, an ELIXIR UK node, is home to CyVerse UK, a collaborative cyberinfrastructure for life science. CyVerse UK's objectives align closely with the ELIXIR vision, as it aims to ensure researchers have easy access to HTC resources while lowering the entry barrier to bioinformatics, thanks both to the ease of use of the platform and to the training provided. Great focus is placed on data storage, management, and overall how to ensure FAIRness. The CyVerse Data Store and Data Commons come with attached metadata; in the latter case a bare minimum set is required. Data availability and reliable data transfer take advantage of iRODS. The CyVerse cyberinfrastructure also hosts COPO and Grassroots, which are of particular interest to the data ecosystem. COPO is a brokering service between scientists and public repositories, enabling management, aggregation and publication of research outputs. COPO eases the process of metadata attribution by presenting the same intuitive interface for different repositories, and a wizard to guide the user through the steps of adding metadata. The Grassroots Genomics project aims to facilitate consistent approaches to generating, processing and disseminating public wheat datasets so that research efforts can be translated into resources valuable to the community, thanks to effective sharing and reuse of data. On the computational side, CyVerse UK offers a number of registered and versioned applications that users can run either using an API or through the parent CyVerse US web interface. Our last report shows how researchers not only from the UK, but also from Europe, America, Africa and Asia benefited from these applications. The CyVerse UK pool also hosts a Galaxy instance reserved for collaborators at BeCA. The expansion of the infrastructure will allow us to offer on-demand virtual machines to the research community to support them in development, training or collaborative virtual laboratories.
Year(s) Of Engagement Activity 2019
 
Description Talk delivered at the ELIXIR Biodiversity working group Inaugural meeting - Milan 2020 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Policymakers/politicians
Results and Impact Organised as part of a new ELIXIR working group to address challenges in biodiversity data management and infrastructure.
Year(s) Of Engagement Activity 2020
 
Description Training held at BeCA: CyVerse ecosystem, basic Sysadmin skills and dockerization
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact About 15 researchers from Eastern Africa attended a 1-week training course, as part of a longer training in bioinformatics, to learn about virtual machines, Docker, basic Sysadmin skills and how to apply this knowledge using CyVerse. CyVerse UK also supplied the virtual machines for the training to take place.
Year(s) Of Engagement Activity 2018
 
Description UKRI Darwin Tree of Life Project meeting, London 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Dr Davey travelled to London with other EI staff to discuss strategy for an SPF bid to UKRI for the UK Darwin Tree of Life Project.
Year(s) Of Engagement Activity 2018
 
Description Wheat Bioinformatics III workshop, South Africa 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact 25 people from across the crop industrial and academic sectors were trained in Wheat Bioinformatics, the third in a series of workshops organised by Diane Saunders, Burkhard Steuernagel and Rob Davey, funded by the UK High Commission in South Africa. These workshops are valuable for attendees to learn up-to-date computational and analytical techniques to make the most of their own and publicly available wheat data, feeding these skills directly into their breeding programmes.
Year(s) Of Engagement Activity 2020
 
Description Wheat Initiative group discussion at Plant and Animal Genome conference 2019 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Discussion of the latest research activities from the Wheat Initiative members.
Year(s) Of Engagement Activity 2019
 
Description WheatIS Expert Working Group
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The expert working group for the Wheat Information System is developing standards and tools to enable the global wheat science community to share data effectively.
Year(s) Of Engagement Activity 2014,2015,2016,2017,2018
URL http://wheatis.org/
 
Description Why cloud computing is important for life science research 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Article to highlight services and opportunities arising from the use of cloud computing in life science research
Year(s) Of Engagement Activity 2022
URL https://www.earlham.ac.uk/articles/why-cloud-computing-important-data-driven-bioscience-research
 
Description computing platform ELIXIR F2F meeting 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Meeting to discuss updates to the ELIXIR compute platform, make plans for interactions with EOSC, and start using the ELIXIR AAI authentication system.
Year(s) Of Engagement Activity 2019
 
Description iRODS UGM 2021 Talk 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Gave a talk on our iRODS developments as part of our Grassroots Infrastructure
Year(s) Of Engagement Activity 2021