Federating access to wheat data services for efficient genome-specific marker design

Lead Research Organisation: Earlham Institute
Department Name: Research Faculty

Abstract

Wheat is the most widely grown crop worldwide and provides 20% of the calories consumed by the growing human population. It is estimated that the average person will consume the grain of 50 wheat plants per day (https://www.jic.ac.uk/calculations/), and to support this the UK exports 15-20% (~2m tonnes) of its yearly crop to over 20 countries worldwide [1], as well as providing for the UK market. Research into breeding programmes over the last decade has made large improvements in key traits such as yield and the ability to grow in tough conditions for world market viability. It is strongly predicted that rapid climate change, newly emerging wheat diseases, and reliance on a small set of wheat varieties will greatly challenge modern day agriculture and food production.

The availability of information about wheat genomes and the differences between them (variation) is leading a breakthrough in wheat research. Current services that share information about wheat genomes and these differences give researchers the ability to find regions of interest that match their research goals, and to understand and exploit characteristics of these regions for improving the crop. Such information can then be used in breeding programmes to design genetic markers for traits of interest, akin to marking Points of Interest on a map or navigation system. Once these markers have been discovered, robotic platforms can take this information and screen thousands of wheat lines a day to look for matches, yielding potential knowledge about how each plant may perform in breeding experiments under different conditions.

Tools and resources that harness the power of breeding data and analysis packages, both openly available to academics and industry alike, are key to accelerating wheat breeding programmes in the coming years. There are many web-based databases and information services that exist for housing and exposing wheat data. However, the stages leading up to screening the wheat lines involve intensive and laborious manual processes, and the availability of this information and the way it is represented are not consistent, which makes it difficult for researchers and breeders to effectively utilise it for their research. Users must submit information at each step to multiple online or local analysis tools, run multiple queries and analyses, and manually process the results in desktop computer applications to ensure that they can be fed into the next tools in the workflow.

Our project will remove these manual steps by developing software to automate the required interactions with commonly used online wheat data resources. As such, we will build software tools that are able to automatically connect each wheat data service in turn to form a workflow, understanding and processing the data produced by a previous service to provide the input data to the next service. This will free up valuable researcher time and, due to the removal of necessary human intervention and management of potentially complex data files, will result in a more robust and reproducible workflow.

Technical Summary

Research datasets generated from non-conventional model organisms are rarely as well curated and accessible as those generated as part of the Human Genome Project, ENCODE, and Ensembl. Most research data is either housed in public repositories which have the goal of archiving data for long periods of time to allow researchers to download their own copies (e.g. EMBL-EBI European Nucleotide Archive), or institutional repositories and databases that have specific points of access and do not typically offer data integration services. Therefore, it is common practice for researchers to use a multitude of services, often linking them together with manual conversion and formatting of the data in intermediate steps. This "context switching" between services is a bottleneck for research data sharing, subsequent reuse of data in analyses, and scientific reproducibility in general. This is even more of a burden on those researchers that do not have a computational background, and rely on bioinformaticians and/or specialist tools to assist them in their investigations.

In the current "big data" multi-disciplinary research environment, access to information stored in single standalone databases is not sufficient to undertake the integrative aspects of modern computational analysis. Efforts such as Ensembl Plants have made significant inroads into providing a system that allows comparative analysis across multiple plant species, and CerealsDB aims to expose a large amount of informative wheat variation data freely and openly. However, these systems are not able to intercommunicate easily, with users often having to manually undertake multiple analysis steps. The integrative workflow proposed in this project will provide the necessary infrastructure to connect and query multiple resources of genomic information in order to make the process of marker-assisted primer design in wheat faster, more efficient, and more comprehensive than is currently available.

Planned Impact

Academic and Commercial Impacts:
Within academic institutes, researchers are very well accustomed to finding and using the latest resources, datasets and methods that are disseminated through traditional publication routes. As such, open and efficient access to these elements is crucial to promote uptake, realise impact, and continue to foster the global effort in wheat genomics. However, not all research groups have access to computational expertise in order to carry out and streamline what can be complex pipelines of data retrieval, conversion, analysis and exploitation. Open reusable workflows packaged up in easy-to-use well-designed web-based solutions are a vital part in the toolbox of researchers to maximise their time and promote scientific reproducibility.

Within breeding companies, the number of highly trained researchers who keep abreast of the newest wheat genomic resources is limited. Typically, it is these researchers and their teams that need to dedicate a large proportion of their work time to painstakingly navigating the available resources to design just one marker. Providing breeding companies with automated open access workflows to carry out these manual steps will increase efficiency and therefore improve productivity of breeding new varieties. In addition, many beneficial traits in wheat are introduced from other cultivars or wild wheat relatives. The ability to generate markers that tag the gene of interest as precisely as possible and in a reproducible manner will help to reduce introgressed regions in newly bred varieties, and therefore a larger number of improved varieties will be released onto the market. This has a direct and tangible impact on both the drive towards food security and the public perception of wheat breeding efforts.

Societal impacts:
The wheat community has seen reluctance to adopt open data conventions and widespread data sharing. This is understandable, given the direct applications of the translational aspects of wheat genomics research to breeding programmes. However, it is becoming clear that the field of food sustainability and security requires openness and transparency in order to enhance the public perception of crop improvement. The objectives of this proposal will not only make researchers' lives easier through the application of bioinformatics techniques, but will also highlight to the public the increasing need for computational infrastructures to improve the efficiency of crop research and to handle the increasingly large datasets it produces. In this way, this project will contribute to the movement towards a more open and inclusive approach to wheat data sharing.

Drs Davey and Krasileva are committed to rapid dissemination of fundamental research into crop improvement, alongside open source computer software development, to multi-disciplinary beneficiaries. This is evidenced in Dr Davey's membership of the Open Bioinformatics Foundation (OBF) and both investigators' existing collaborations in functional wheat genomics, crop data infrastructure development, large-scale analytical platforms, and data sharing projects. Similarly, TGAC's body of freely available bioinformatics tools and resources reflects the commitment of the institute to furthering life science research through open science and computational excellence.

Publications

Wilkinson PA (2020) CerealsDB-new tools for the analysis of the wheat genome: update 2020. in Database : the journal of biological databases and curation

 
Description After examining the Representational State Transfer (REST) interface for Ensembl available at http://rest.ensemblgenomes.org/, we developed an exemplar service, initially for sequence-based queries, which is available at https://grassroots.tools/dynamic-web/services.html?service=Ensembl%20Plants%20service. As part of the DFW work, this service will be further expanded to increase the integration with Ensembl.
We have created the layer for Polymarker to be used as a Grassroots service. The source is available on GitHub at https://github.com/TGAC/grassroots-service-polymarker.
As well as integrating Polymarker into Grassroots, we have extended the code adding features that are not available on the original Polymarker website such as the ability to fine-tune parameters for Primer3 (https://sourceforge.net/projects/primer3/), the component used to design the PCR primers from a DNA sequence.
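As a rough illustration of what fine-tuning Primer3 parameters can look like, the sketch below writes a Primer3 input record in its Boulder-IO format (one TAG=VALUE pair per line, terminated by a line containing only "="). The function name, the default values, and the choice of tags are assumptions made for this example; they are not taken from the Polymarker or Grassroots source code.

```python
def to_boulder_io(sequence_id, template, overrides=None):
    """Render a Primer3 Boulder-IO record for one template sequence.

    The settings below are illustrative values only, not the defaults used
    by Polymarker or Grassroots (those are not stated in this record).
    """
    settings = {
        "PRIMER_OPT_SIZE": 20,
        "PRIMER_MIN_SIZE": 18,
        "PRIMER_MAX_SIZE": 25,
        "PRIMER_PRODUCT_SIZE_RANGE": "100-300",
    }
    # User-supplied values replace the illustrative ones tag by tag.
    settings.update(overrides or {})

    lines = [f"SEQUENCE_ID={sequence_id}", f"SEQUENCE_TEMPLATE={template}"]
    lines += [f"{tag}={value}" for tag, value in settings.items()]
    lines.append("=")  # a lone "=" ends a Boulder-IO record
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_boulder_io("snp_example", "ACGTACGTACGT", {"PRIMER_OPT_SIZE": 22}))
```

A record like this could be passed to the primer3_core program on standard input; how Polymarker itself passes settings to Primer3 is not described here.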
We have completed a fully-specified JSON format, documented at https://grassroots.tools/docs/api/schema_guide.html, using as many existing ontological standards as possible, such as schema.org (https://schema.org/), EDAM (http://edamontology.org/page), Sequence Ontology (http://www.sequenceontology.org/) and many others.
The infrastructure tools to communicate between different layers and services have been completed. One feature of this is the ability for services to be distributed between remote Grassroots servers and appear to the user as a single unified instance.
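The idea of remote services appearing as a single instance can be sketched as a merge of service listings. This is a minimal illustration only: the function name, the dictionary layout, and the example server URL are hypothetical and do not reflect the actual Grassroots implementation or its JSON schema.

```python
def merge_service_lists(local, remotes):
    """Combine local and remote service descriptions into one listing.

    `local` is a list of dicts with a "name" key. `remotes` maps a server URL
    to that server's list of service dicts. Each merged entry is tagged with
    the server that provides it so a request could later be routed there.
    """
    merged = {s["name"]: {**s, "server": "local"} for s in local}
    for server_url, services in remotes.items():
        for s in services:
            # Keep the first provider seen for a name; local services win.
            merged.setdefault(s["name"], {**s, "server": server_url})
    return sorted(merged.values(), key=lambda s: s["name"])


if __name__ == "__main__":
    local = [{"name": "BlastN"}]
    remotes = {"https://remote.example/grassroots": [{"name": "Polymarker"}]}
    for service in merge_service_lists(local, remotes):
        print(service["name"], "->", service["server"])
```

The user-facing list carries no indication of where a service runs unless the "server" tag is displayed, which is the sense in which the federation looks like one instance.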
We have created and integrated the marker design tool into our dynamic user interface web frontend (https://grassroots.tools/dynamic-web/services.html?service=Polymarker%20service).
It automatically populates the parameters required for the tool from the backend, such as the available sequence files and Primer3 settings. From our BLAST search service interface (https://grassroots.tools/dynamic-web/services.html?service=BlastN%20service), it is now possible to click a SNP and run the Polymarker service to get the results directly.
The web interface provides compatibility across different types of devices and can track its usage through Google Analytics anonymously.
Exploitation Route Our linked service functionality makes user input extremely straightforward as they do not need to navigate to multiple web pages and collate analysis results manually; it is all automatically handled. These linked services can take the output of one service and automatically convert it to be the input of another service. For example, we have a Samtools service (https://grassroots.tools/dynamic-web/services.html?service=SamTools%20service) that automatically parses the output from running a BLAST service, and generates the correct parameters to allow the user to download the full scaffold in which the hit occurred with a single click. The Polymarker service has also been extended and can be automatically initiated for a SNP within a BLAST result using the same linked service features.
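The linked service idea described above, where the output of a BLAST run becomes the input of a scaffold download, can be sketched as follows. This is a simplified illustration under stated assumptions: it reads the standard 12-column BLAST tabular output (-outfmt 6) rather than the Grassroots JSON results, and the request dictionary and its keys are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BlastHit:
    """One row of BLAST tabular output (standard 12-column -outfmt 6)."""
    query_id: str
    scaffold: str
    start: int
    end: int


def parse_blast_tabular(text: str) -> List[BlastHit]:
    """Parse tab-separated BLAST output, skipping blank and comment lines."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        # Columns: 0 = query id, 1 = subject id, 8 = subject start, 9 = subject end.
        s_start, s_end = int(cols[8]), int(cols[9])
        # Reverse-strand hits report start > end, so normalise the order.
        hits.append(BlastHit(cols[0], cols[1], min(s_start, s_end), max(s_start, s_end)))
    return hits


def to_scaffold_request(hit: BlastHit) -> dict:
    """Build hypothetical parameters for a follow-on scaffold download."""
    return {"service": "scaffold-download", "scaffold": hit.scaffold}


if __name__ == "__main__":
    row = "query1\tscaffold_42\t98.5\t200\t3\t0\t1\t200\t1500\t1301\t1e-50\t350"
    for hit in parse_blast_tabular(row):
        print(to_scaffold_request(hit))
```

The point of the design is that the user never handles the intermediate file: the second service's parameters are derived entirely from the first service's result.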
Sectors Agriculture, Food and Drink; Digital/Communication/Information Technologies (including Software)

 
Description Our updated BLAST service has been used successfully in production completing thousands of jobs. This resource is freely available for researchers across the academic and industrial spectrum. As part of the JSON standardisation, we also took the results from BLAST searches and converted them into fully marked-up and well-described data using these existing standard ontologies allowing for easier reuse.
First Year Of Impact 2018
Sector Agriculture, Food and Drink; Digital/Communication/Information Technologies (including Software)
 
Description Interview with Environment Adviser from the UK Parliamentary Office of Science and Technology
Geographic Reach National 
Policy Influence Type Implementation circular/rapid advice/letter to e.g. Ministry of Health
Impact Contacted by UK Parliament to contribute to a POSTnote (short document to advise ministers on a given topic) on genebanks and Digital Sequence Information (DSI) as a result of my recent election to the DivSeek Board of Directors. I was interviewed to provide information around current international policies on DSI and how future UK involvement might be shaped around open licensing/MTAs of DSI datasets.
URL https://www.parliament.uk/postnotes
 
Title Grassroots Genomics grid infrastructure 
Description Integrative research requires extensive multi-level approaches to enrich and expose data and workflows so that informatics infrastructures can process them effectively. The Grassroots Infrastructure is developed at the Earlham Institute (EI) to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public datasets in the plant sciences. Its lightweight reusable software stack comprises: an iRODS data management layer to provide structure to unstructured filesystems, with Elasticsearch indexed metadata and Davrods exposed WebDAV APIs; interfaces to interact with local or cloud-based analysis platforms; an Apache web server layer to deliver content and provide access to public programmatic interfaces; services such as: BLAST search on multiple databases across different sites; a mapping tool showing pathogen samples with temporal and spatial data. It can be run locally or packaged in virtual containers and deployed on a variety of hardware thus representing a decentralised system, allowing information generators to retain control over their resources but allowing interconnected resources to access each other consistently. As such, Grassroots represents EI's contribution to the Wheat Initiative Wheat Information System (WheatIS) project, formalising the infrastructure as the federated UK WheatIS node involving partners from the University of Bristol, the European Bioinformatics Institute, Rothamsted Research, and the John Innes Centre. We are currently working on lightweight mechanisms to expose underlying grid architecture using WebDAV, standardised APIs such as the Breeding API (BrAPI) and schemas such as Frictionless Data and BioSchemas to enable greater interoperability with a variety of existing services, and integration with data analysis platforms such as CyVerse and Galaxy. 
Type Of Material Data handling & control 
Year Produced 2014 
Provided To Others? Yes  
Impact This infrastructure powers the handling and release of the wheat genomics data arising from EI's flagship wheat programmes, as well as aggregating previously published datasets. We currently have a BLAST service running on top of this infrastructure, and we are building federation options into the platform with the iRODS data grid software. The CerealsDB project at the University of Bristol is a widely used and vital resource for the wheat community, and the Bristol group are deploying the Grassroots infrastructure to facilitate integration of the resources held there with the resources at EI. The Field Pathogenomics project (BBSRC IPA funded project BB/M025519/1) is also powered by the Grassroots platform, enabling a fast and informative web-based user interface based on data collected by the project relating to wheat yellow rust epidemiology.
URL https://wheatis.tgac.ac.uk/grassroots/api/
 
Description DivSeek Partnership 
Organisation DivSeek International
Sector Learned Society 
PI Contribution I bring infrastructure expertise to this partnership, influencing and impacting policy to provide computational and training capacity to other DivSeek partners. I promote the range of infrastructure projects that are developed in my group at EI, but also solutions developed at other centres that can contribute to the DivSeek consortium. Partners are exposed to EI projects such as COPO, Grassroots (Wheat Information System, CerealsDB, marker design), CyVerse UK and Galaxy, through working group communications and meetings at international conferences such as PAG and RDA. I lead the Data Standards for Interoperable Tools working group, and we aim to collate community-suggested standards and tools, and advise the partnership and their stakeholders in best practice for delivery of sustainable and interoperable infrastructure.
Collaborator Contribution The DivSeek consortium contributes expertise and knowledge exchange in advances in crop diversity, improving our networking and understanding of challenges and potential solutions to social, structural, and biological problems. With over 66 global partners including EI, this is a powerful and highly respected group of research institutes that are working together to enable a step change in efficiency of interactions, leading to improved crop diversity research and data sharing.
Impact EI is a founding partner of DivSeek, and Dr Davey leads one of the new working groups, "Data Standards for Interoperable Tools" (http://www.divseek.org/standards/)
Start Year 2015
 
Description Wheat Information System (WheatIS) 
Organisation Cold Spring Harbor Laboratory (CSHL)
Country United States 
Sector Charity/Non Profit 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore perfectly positioned to build a point-of-access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them with other WheatIS resources (such as CerealsDB) accordingly via open source and freely available infrastructure. By doing so we will be promoting and facilitating an inclusive and collaborative community of experts to provide access to an interconnected network of wheat data to a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, meeting yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising, and sharing wheat data in a way that is technically sensible and user focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation EMBL European Bioinformatics Institute (EMBL - EBI)
Country United Kingdom 
Sector Academic/University 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore perfectly positioned to build a point-of-access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them with other WheatIS resources (such as CerealsDB) accordingly via open source and freely available infrastructure. By doing so we will be promoting and facilitating an inclusive and collaborative community of experts to provide access to an interconnected network of wheat data to a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, meeting yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising, and sharing wheat data in a way that is technically sensible and user focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation French National Institute of Agricultural Research
Department INRA Versailles
Country France 
Sector Academic/University 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore perfectly positioned to build a point-of-access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them with other WheatIS resources (such as CerealsDB) accordingly via open source and freely available infrastructure. By doing so we will be promoting and facilitating an inclusive and collaborative community of experts to provide access to an interconnected network of wheat data to a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, meeting yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising, and sharing wheat data in a way that is technically sensible and user focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation Helmholtz Association of German Research Centres
Department Helmholtz Zentrum Munchen
Country Germany 
Sector Academic/University 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore perfectly positioned to build a point-of-access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them with other WheatIS resources (such as CerealsDB) accordingly via open source and freely available infrastructure. By doing so we will be promoting and facilitating an inclusive and collaborative community of experts to provide access to an interconnected network of wheat data to a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, meeting yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising, and sharing wheat data in a way that is technically sensible and user focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation International Centre for Maize and Wheat Improvement (CIMMYT)
Country Mexico 
Sector Charity/Non Profit 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore perfectly positioned to build a point-of-access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them with other WheatIS resources (such as CerealsDB) accordingly via open source and freely available infrastructure. By doing so we will be promoting and facilitating an inclusive and collaborative community of experts to provide access to an interconnected network of wheat data to a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, meeting yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising, and sharing wheat data in a way that is technically sensible and user focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation Monogram Network
Sector Academic/University 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore perfectly positioned to build a point-of-access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them with other WheatIS resources (such as CerealsDB) accordingly via open source and freely available infrastructure. By doing so we will be promoting and facilitating an inclusive and collaborative community of experts to provide access to an interconnected network of wheat data to a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, meeting yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising, and sharing wheat data in a way that is technically sensible and user focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation Rothamsted Research
Country United Kingdom 
Sector Academic/University 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore perfectly positioned to build a point-of-access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them with other WheatIS resources (such as CerealsDB) accordingly via open source and freely available infrastructure. By doing so we will be promoting and facilitating an inclusive and collaborative community of experts to provide access to an interconnected network of wheat data to a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, meeting yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising, and sharing wheat data in a way that is technically sensible and user focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation U.S. Department of Agriculture USDA
Department Agricultural Research Service
Country United States 
Sector Public 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore perfectly positioned to build a point-of-access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them with other WheatIS resources (such as CerealsDB) accordingly via open source and freely available infrastructure. By doing so we will be promoting and facilitating an inclusive and collaborative community of experts to provide access to an interconnected network of wheat data to a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, meeting yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising, and sharing wheat data in a way that is technically sensible and user focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation University of Bristol
Country United Kingdom 
Sector Academic/University 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore perfectly positioned to build a point-of-access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them with other WheatIS resources (such as CerealsDB) accordingly via open source and freely available infrastructure. By doing so we will be promoting and facilitating an inclusive and collaborative community of experts to provide access to an interconnected network of wheat data to a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, meeting yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising, and sharing wheat data in a way that is technically sensible and user focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation University of California, Davis
Department UC Davis College of Biological Sciences
Country United States 
Sector Academic/University 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore perfectly positioned to build a point-of-access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them with other WheatIS resources (such as CerealsDB) accordingly via open source and freely available infrastructure. By doing so we will be promoting and facilitating an inclusive and collaborative community of experts to provide access to an interconnected network of wheat data to a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, meeting yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising, and sharing wheat data in a way that is technically sensible and user focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Description Wheat Information System (WheatIS) 
Organisation University of Western Australia
Country Australia 
Sector Academic/University 
PI Contribution The Grassroots infrastructure (https://grassroots.tools) developed at EI is being used to consolidate data and analyses, facilitating consistent approaches to generating, processing and disseminating public wheat datasets. The Grassroots infrastructure comprises: a data management layer to provide structure to unstructured filesystems; interfaces to interact with local or cloud-based analysis platforms; a search layer to provide multi-faceted metadata and literature querying; a web server layer to deliver content and provide access to public programmatic interfaces. EI has an extensive National Capability to provide scientific computing hardware to the UK research community and is therefore perfectly positioned to build a point-of-access to previously disparate resources to serve wheat breeders, biologists and bioinformaticians. Coupling the Grassroots project with BBSRC-funded efforts to bring Galaxy and CyVerse UK to UK researchers provides community standardised methodologies for data integration, interpretation and discovery in wheat. These resources are designed to be queried programmatically, and we are integrating them with other WheatIS resources (such as CerealsDB) accordingly via open source and freely available infrastructure. By doing so we will be promoting and facilitating an inclusive and collaborative community of experts to provide access to an interconnected network of wheat data to a scale that was simply not available previously. EI also has representation on the WheatIS Expert Working Group, meeting yearly at PAG to discuss strategy and policy for the Wheat Initiative.
Collaborator Contribution All WheatIS partners contribute to the global effort in harmonising, standardising, and sharing wheat data in a way that is technically sensible and user focused, thus minimising cost across a multi-faceted and independently funded project.
Impact This collaboration is multi-disciplinary in scope, undertaken by biologists, bioinformaticians, and breeders. Wheat Data Interoperability Guidelines - https://ist.blogs.inra.fr/wdi/
Start Year 2011
 
Title Eirods-dav 
Description Eirods-dav provides access to iRODS servers using the WebDAV protocol and exposes a REST API for accessing and manipulating metadata from within a web browser. It adds a substantial amount of functionality to the original Davrods module written by Ton Smeele and Chris Smeele, which is a bridge between the WebDAV protocol and the iRODS API. Davrods leverages the Apache server implementation of the WebDAV protocol, mod_dav, for compliance with the WebDAV Class 2 standard. 
Type Of Technology Webtool/Application 
Year Produced 2018 
Open Source License? Yes  
Impact Eirods-dav is used to allow web-based access to a selection of files and research data released to the public by the Earlham Institute, such as the Triticum aestivum assemblies. It is used by the Grassroots Infrastructure to allow access to data produced by the Designing Future Wheat project. The Eirods-dav application runs within the CyVerse UK National Capability infrastructure. 
URL https://grassroots.tools/data/
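As an illustration of the WebDAV access described above, the following minimal Python sketch builds a standard depth-1 PROPFIND request and extracts the entry paths from a multistatus reply. It uses only generic WebDAV conventions and is not taken from the Eirods-dav code base. Whether the public data URL accepts anonymous PROPFIND requests is an assumption, and the sample XML is invented for the example, so the sketch builds the request without sending it.

```python
import urllib.request
import xml.etree.ElementTree as ET

DAV_NS = {"d": "DAV:"}  # the standard WebDAV XML namespace


def build_propfind(url: str) -> urllib.request.Request:
    """Build (but do not send) a depth-1 PROPFIND request for a collection."""
    body = (
        '<d:propfind xmlns:d="DAV:">'
        "<d:prop><d:displayname/><d:getcontentlength/></d:prop>"
        "</d:propfind>"
    )
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="PROPFIND",
        headers={"Depth": "1", "Content-Type": "application/xml"},
    )


def list_hrefs(multistatus_xml: str) -> list:
    """Return the href of every entry in a WebDAV multistatus response."""
    root = ET.fromstring(multistatus_xml)
    return [
        response.findtext("d:href", namespaces=DAV_NS)
        for response in root.findall("d:response", DAV_NS)
    ]


# Invented sample reply; a real server would return more properties per entry.
SAMPLE_REPLY = """
<d:multistatus xmlns:d="DAV:">
  <d:response><d:href>/data/</d:href></d:response>
  <d:response><d:href>/data/example.fa</d:href></d:response>
</d:multistatus>
"""

request = build_propfind("https://grassroots.tools/data/")
entries = list_hrefs(SAMPLE_REPLY)
```

Sending the request with `urllib.request.urlopen(request)` and passing the decoded body to `list_hrefs` would list a live collection, provided the server permits it.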
 
Title Grassroots BrAPI web service 
Description This is a web service that uses the Grassroots Field Trial service and adds a Breeding API (BrAPI) layer on top to allow other BrAPI-compliant software to access the field trial data. We currently have complete support for approximately a third of BrAPI classes and calls with partial support for others. 
Type Of Technology Webtool/Application 
Year Produced 2020 
Open Source License? Yes  
Impact This allows other data scientists, software developers and applications to easily access the field trial data stored in our system using a standard nomenclature and REST API. 
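A minimal sketch of how a BrAPI-compliant client might page through studies is given below. It assumes a BrAPI v2 style layout (a `studies` call with `page` and `pageSize` parameters, and results under `result.data`); the record does not state which BrAPI version or which calls the Grassroots service supports. The base URL and the sample response are hypothetical placeholders.

```python
import json
from urllib.parse import urlencode, urljoin

# Hypothetical base URL; the real Grassroots BrAPI endpoint is not given here.
BASE_URL = "https://example.org/brapi/v2/"


def studies_url(base_url: str, page: int = 0, page_size: int = 10) -> str:
    """Build the URL for one page of the BrAPI 'studies' call."""
    query = urlencode({"page": page, "pageSize": page_size})
    return urljoin(base_url, "studies") + "?" + query


def study_names(response_text: str) -> list:
    """Extract study names from a BrAPI-style JSON response body."""
    payload = json.loads(response_text)
    return [study.get("studyName") for study in payload["result"]["data"]]


# Invented sample response following the BrAPI metadata/result envelope.
SAMPLE_RESPONSE = json.dumps({
    "metadata": {
        "pagination": {"currentPage": 0, "pageSize": 10,
                       "totalCount": 2, "totalPages": 1}
    },
    "result": {
        "data": [
            {"studyDbId": "s1", "studyName": "Trial A"},
            {"studyDbId": "s2", "studyName": "Trial B"},
        ]
    },
})

url = studies_url(BASE_URL)
names = study_names(SAMPLE_RESPONSE)
```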
 
Title Grassroots Field Trial service 
Description A web-based application for submitting and searching for various aspects of field trial experimental data. 
Type Of Technology Webtool/Application 
Year Produced 2019 
Open Source License? Yes  
Impact A web-based application for submitting and searching for field trial data. 
URL https://grassroots.tools/beta/dynamic/fieldtrial_dynamic.html?type=AllFieldTrials
 
Title Grassroots Parental Genotype service 
Description This software stores information regarding peak markers and parental genotype information for various QTL. It is part of a collaboration between the University of Bristol, the John Innes Centre and the Earlham Institute. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact This software is used by the CerealsDB web service to give users a simple way to browse between QTL, peak marker information and parental genotype information. 
URL http://www.cerealsdb.uk.net/cerealgenomics/CerealsDB/select_QTL.php
 
Title Grassroots core server software 
Description The Grassroots Infrastructure project aims to create an easily-deployable suite of computing middleware tools to help users and developers gain access to scientific data infrastructure that can easily be interconnected. With the data-generative approaches that are increasingly common in modern life science research, it is vital that the data and metadata produced by these efforts can be shared and reused. The Grassroots Infrastructure project wraps up industry-standard software tools with a consistent API that can be federated on a number of levels. This means institutions and groups can deploy a simple lightweight virtual machine, expose local data, connect up any existing data services, and federate their instance of the Grassroots with others out-of-the-box. The Grassroots Infrastructure uses a controlled vocabulary of JSON messages to communicate, so any server or client that can understand JSON can be used to access and connect to the platform. We provide infrastructure to ensure that the scientific data remains the important factor, and not the worry about how to build a system to expose your data. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact The Grassroots Infrastructure has allowed researchers, data scientists and breeders to perform a variety of data analyses, such as sequence searching using BLAST and map-based interactive searches for field pathogenomics data, and to use the field trial service, as well as bespoke web services utilised by third parties such as the CerealsDB team at the University of Bristol as part of systems that they have developed for users. 
URL https://grassroots.tools
 
Title Grassroots free-text search engine 
Description The Grassroots free-text search engine, based upon Lucene, allows us to give ranked, faceted results for various types of field trial data. Each facet automatically weights searches for its specific fields. For example, matches in study names are ranked higher than matches in the description field. This is used for general searches as well as for specific faceted search applications, such as the one we have for Measured Variables, which denote phenotypic data. 
Type Of Technology Webtool/Application 
Year Produced 2020 
Open Source License? Yes  
Impact This has allowed field trial data scientists to search across all of our data and allows them to search for the correct ontological terms to describe the phenotypic traits that have been measured within their trials. This has allowed researchers to be able to upload their data to our systems more quickly by allowing them to determine the correct ontological terms more easily. 
URL https://grassroots.tools/beta/public/SearchTreatment
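The field weighting described for the search engine can be illustrated with a toy Python ranking function. This is a simplified sketch of the general idea (a match in a study name counts for more than a match in a description), not Lucene's scoring formula and not the Grassroots implementation. The field names, weights and sample documents are assumptions made for the example.

```python
# Illustrative weights: a hit in the name counts three times a description hit.
FIELD_WEIGHTS = {"name": 3.0, "description": 1.0}


def score(doc: dict, query: str) -> float:
    """Sum weighted term counts across the weighted fields of one document."""
    terms = query.lower().split()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = doc.get(field, "").lower().split()
        total += weight * sum(words.count(term) for term in terms)
    return total


def rank(docs: list, query: str) -> list:
    """Return matching documents, highest score first; drop non-matches."""
    scored = [(score(doc, query), doc) for doc in docs]
    ordered = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for value, doc in ordered if value > 0]


# Invented sample studies.
DOCS = [
    {"id": "A", "name": "Nitrogen response trial",
     "description": "Yield under drought"},
    {"id": "B", "name": "Drought trial",
     "description": "Nitrogen levels varied"},
]

ranked_ids = [doc["id"] for doc in rank(DOCS, "drought")]
```

Here study B ranks above study A for the query "drought" because the term appears in its name rather than only in its description.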
 
Title Polymarker Grassroots service 
Description PolyMarker is an automated bioinformatics pipeline for SNP assay development which increases the probability of generating homoeologue-specific assays for polyploid wheat. PolyMarker generates a multiple alignment between the target SNP sequence and the IWGSC chromosome survey sequences for each of the three wheat genomes. It then generates a mask with informative positions which are highlighted with respect to the target genome. This implementation integrates this service with all of the data sets and other services within the Grassroots Infrastructure. 
Type Of Technology Webtool/Application 
Year Produced 2018 
Open Source License? Yes  
Impact The Grassroots Polymarker web service is used to allow web-based access and integration with files and research data released to the public by the Earlham Institute, such as the Triticum aestivum assemblies. It is used by the Grassroots Infrastructure to allow access to data produced by the Designing Future Wheat project. The Grassroots Polymarker web service runs within the CyVerse UK National Capability infrastructure. 
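The idea of a mask of informative positions can be sketched in Python as follows. This is a simplified illustration, not the PolyMarker algorithm or its output format: it assumes the three homoeologous sequences (A, B and D genomes) are already aligned to equal length, and the mask symbols ("S" where the target differs from both other genomes, "s" where it differs from only one, "-" otherwise) are chosen for this example only.

```python
def informative_mask(aligned: dict, target: str) -> str:
    """Mark alignment columns where the target genome differs from the others."""
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")

    others = [seq for genome, seq in aligned.items() if genome != target]
    mask = []
    for i, base in enumerate(aligned[target]):
        n_diff = sum(1 for seq in others if seq[i] != base)
        if n_diff == len(others):
            mask.append("S")  # differs from every other genome: genome-specific
        elif n_diff > 0:
            mask.append("s")  # differs from some genomes: semi-specific
        else:
            mask.append("-")  # identical in all genomes: not informative
    return "".join(mask)


# Invented toy alignment of the three homoeologous copies.
ALIGNMENT = {"A": "ACGTAC", "B": "ACGTTC", "D": "ACCTTC"}

mask_a = informative_mask(ALIGNMENT, "A")
mask_d = informative_mask(ALIGNMENT, "D")
```

Positions marked "S" are the ones at which a primer could, in principle, discriminate the target genome from both homoeologues.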
 
Title The Grassroots Infrastructure 
Description The Grassroots software is an open source "as-a-Service" stack that powers a number of data dissemination and analysis activities at EI, and other sites such as CerealsDB at the University of Bristol. We have continued to develop the functionality within the software stack to share crop-related datasets. 
Type Of Technology Webtool/Application 
Year Produced 2018 
Open Source License? Yes  
Impact Grassroots has previously been used to host the Field Pathogenomics project website and Yellow Rust map, the EI wheat BLAST service, the CerealsDB federation project, and the multi-scale improvements to the Polymarker marker design software. Recently, Grassroots has been put forward as the main data repository and metadata catalogue for the Designing Future Wheat project, and has started to host data from this project, the Open Wild Wheat Consortium, and 5 new wheat genomes from EI. The Grassroots service runs within the CyVerse UK National Capability infrastructure. 
URL https://grassroots.tools/
 
Description AI for Wheat workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact The AI for Wheat workshop was a meeting of approximately 50 people from academia and industry to examine ways to use AI methods and algorithms on wheat-based data.
Year(s) Of Engagement Activity 2020
 
Description Building infrastructure for open science - British Computer Society 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Invited speaker at the Advanced Programming Group annual Christmas lecture
Year(s) Of Engagement Activity 2015
URL http://www.bcs.org/category/18516
 
Description DFW Hackathon 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact A workshop to discuss and implement potential collaborations to create tools to solve bioinformatic needs within the DFW community.
Year(s) Of Engagement Activity 2019
 
Description Data Stewardship in the Life Sciences 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact I spoke at the "Challenges and Opportunities in Plant Science Data Management" workshop on the subject of data management in the life sciences.

Open data and integrative data sharing are fundamental factors in order to address the challenges of modern data-intensive science. There is a clear need to develop and maintain community-focussed, semantically-aware data stewardship and management platforms, such as COPO, that are able to cope with the description and sharing of potentially huge datasets arising from the life sciences. Once made available, it is not sufficient to assume that researchers around the globe have requisite skills and resources to analyse these data. Therefore, we need to provide large-scale data analysis environments that are fit for purpose, incorporating state-of-the-art interfaces and programmatic layers to meet broad end-user requirements, such as CyVerse and Galaxy. Finally, this can only happen when there are community-led efforts into implementing solutions for data standardisation, best practice, and FAIR data policy. We are now only just starting to take advantage of groundbreaking opportunities to make integrated data a reality, and thus enabling scientists to store, manage, and share their data as a first-class citizen of the scientific process.
Year(s) Of Engagement Activity 2017
URL http://app.core-apps.com/pag_2017/event/e2bec353017762d275ce250c23e011e6
 
Description Divseek Working Group - Data Standards for Interoperable Tools 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact As part of the "DivSeek - Addressing the challenges and opportunities for information and data sharing associated with plant germplasm" session at PAG, I spoke about the DivSeek Data Standards for Interoperable Tools Working Group. This WG will promote best practice in data sharing in the plant sciences, through the use of open and interoperable software powered by the adoption of open standards, i.e. programmatic interoperability standards (APIs), controlled vocabularies, trait dictionaries, metadata standards, and ontologies. We aim to highlight gaps in interoperability that impede workflows important to the communities supported by DivSeek partners, by liaising with research development groups, other DivSeek working groups, and consortia with relevance to DivSeek. We will educate and train data generators about standards and the tools and resources that use them, in order to promote and foster standards-compliance for long-term open data stewardship.
Year(s) Of Engagement Activity 2017
URL https://pag.confex.com/pag/xxv/meetingapp.cgi/Paper/26202
 
Description Down The Tubes! Talk at the Norwich Science Festival 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Dr Davey gave a talk on the internet and data science entitled "Down The Tubes!" at the 2018 Norwich Science Festival.
Year(s) Of Engagement Activity 2018
URL https://norwichsciencefestival.co.uk/events/down-the-tubes/
 
Description Engagement with Industry - KWS UK Ltd 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact Mr Bian and Miss Minotto demonstrated the features of the EI CyVerse infrastructure and the Grassroots Infrastructure, including data sharing, to staff from KWS UK Ltd: Ed Byrne, Janina Dordel, Andreas Menze and Vipul Patel.
Year(s) Of Engagement Activity 2018
 
Description Grassroots Infrastructure and the Wheat Information System (Genome 10K & Genome Science 2017) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Mr Bian and Dr Tyrrell presented a poster at the Genome 10K & Genome Science 2017 conference.
Year(s) Of Engagement Activity 2017
URL http://www.earlham.ac.uk/genome-10k-and-genome-science-conference
 
Description Grassroots Infrastructure and the Wheat Information System (RDA Interest Group on Agricultural Data (IGAD), Barcelona) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Dr Davey delivered a talk about the Grassroots software infrastructure for the dissemination of wheat data through federation and integration of storage and compute e-infrastructure.
Year(s) Of Engagement Activity 2017
URL https://www.rd-alliance.org/rda-interest-group-agricultural-data-igad-pre-plenary-meeting-3-4-april-...
 
Description Grassroots: An infrastructure for sharing services & data 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A talk at a conference on agricultural data to show the various applications available as part of the Grassroots Infrastructure for disseminating bioinformatics data.
Year(s) Of Engagement Activity 2019
 
Description RDA Wheat Data Interoperability Working Group meeting, RDA Plenary, Barcelona 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The Wheat Data Interoperability Working Group aims to provide a common framework for describing, representing, linking and publishing wheat data with respect to open standards. Such a framework will promote and sustain wheat data sharing, reusability and operability. Specifying the wheat linked data framework will come with many questions: which (minimal) metadata to describe which type of data? Which vocabularies/ontologies/formats? Which good practices? Based mainly on the needs of the Wheat Initiative Information System (WheatIS) in terms of functionalities and data types, the working group will identify relevant use cases in order to produce a "cookbook" on how to produce "wheat data" that are easily shareable, reusable and interoperable. This meeting saw the maturation of the Working Group into a Maintenance Group, showing that we have moved from an inception phase to an implementation phase, promoting the outputs of the WG (the Wheat Data Interoperability guidelines) to users.
Year(s) Of Engagement Activity 2016
URL https://www.rd-alliance.org/group/agricultural-data-ig-igad-wheat-data-interoperability-wg-agriseman...
 
Description Support open science and FAIRness through an integrated collaborative platform for life science: CyVerse UK and hosted services 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The Earlham Institute, an Elixir UK node, is home to CyVerse UK, a collaborative cyberinfrastructure for life science. CyVerse UK's objectives align closely with the Elixir vision, as it aims to ensure researchers have easy access to HTC resources while lowering the entry barrier to bioinformatics, thanks both to the ease of use of the platform and to the training provided. Great focus is placed on data storage, management, and how to ensure FAIRness overall. The CyVerse Data Store and Data Commons come with attached metadata; in the latter case a bare minimum set is required. Data availability and reliable data transfer take advantage of iRODS. The CyVerse cyberinfrastructure also hosts COPO and Grassroots, which are of particular interest to the data ecosystem. COPO is a brokering service between scientists and public repositories, enabling management, aggregation and publication of research outputs. COPO eases the process of metadata attribution by presenting the same intuitive interface for different repositories, and a wizard to guide the user through the steps of adding metadata. The Grassroots Genomics project aims to facilitate consistent approaches to generating, processing and disseminating public wheat datasets so that research efforts can be translated into resources of value to the community through effective sharing and reuse of data. On the computational side, CyVerse UK offers a number of registered and versioned applications that users can run either through an API or through the parent CyVerse US web interface. Our last report shows how researchers not only from the UK, but also from Europe, America, Africa and Asia have benefited from these applications. The CyVerse UK pool also hosts a Galaxy instance reserved for collaborators at BeCA. The expansion of the infrastructure will allow us to offer on-demand virtual machines to the research community to support them in development, training or collaborative virtual laboratories.
Year(s) Of Engagement Activity 2019
 
Description Wheat Initiative group discussion at Plant and Animal Genome conference 2019 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Discussion of the latest research activities from the Wheat Initiative members.
Year(s) Of Engagement Activity 2019
 
Description iRODS functionality within the Grassroots Infrastructure (iRODS User Group Meeting 2017, Utrecht, The Netherlands) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Dr Tyrrell presented work on the development of the eirods-dav software package for the Grassroots data dissemination platform.
Year(s) Of Engagement Activity 2017
URL https://irods.org/ugm2017/