Bilateral NSF/BIO-BBSRC: A Metagenomics Exchange - enriching analysis by synergistic harmonisation of MG-RAST and the EBI Metagenomics Portal

Lead Research Organisation: EMBL - European Bioinformatics Institute
Department Name: Sequence Database Group

Abstract

Micro-organisms are found in virtually all environments. Typically, they form the base of the food chain (such as plankton in the sea) and play essential roles in their ecosystems. There is often a complex interplay between different micro-organisms, with some organisms requiring that others be present in order for them to exist. When there is an imbalance within a community, this can lead to severe effects, such as disease in the human gut, or the inability for plants to grow efficiently in soil. An understanding of the composition and interplay within the communities allows us to potentially manipulate them. Thus, there is intense research into micro-organism communities in many different fields, such as improving livestock yields, the recovery from bacterial infections using fecal transplants and the efficient production of biofuels. Many of these communities also contain important proteins that could be useful to the biotechnological and pharmaceutical industries, such as enzymes involved in the production of antibiotics.
Metagenomics is the study of these micro-organism communities. It is achieved by isolating the DNA from the organisms within an environmental sample (e.g. water, soil, animal stool), sequencing the DNA, and then computationally analysing the sequences to decode which organisms are present and the functions they might be performing. This computation is complicated: (1) there is a huge amount of data; (2) the sequence data is a jumbled mix of fragments from different organisms; (3) decoding the DNA is hard - typically >90% of organisms within a sample are not well characterised.
This proposal brings together three major resources within the field of metagenomics data archiving and analysis. The European Nucleotide Archive (ENA) is a repository of DNA sequence data. Importantly, ENA also captures metagenomic contextual data, such as where and when the sample was taken, and how the DNA was extracted and sequenced. The EBI metagenomics portal (EMG, UK) and MG-RAST (MGR, US) are two metagenomics sequence analysis platforms. Uniquely, they represent the only free-to-use services whereby researchers can upload sequence data and have it analysed without restriction. Despite the widespread use of metagenomics, the community currently lacks standards to ensure that metagenomics sequence data and the derived functional and taxonomic information are deposited within a database of record. Consequently, navigation between metagenomics datasets is very difficult, even for experienced users. As the two platforms offer slightly different, yet complementary, analysis services, there is often the desire to have a metagenomics dataset analysed by both resources. However, the number of equivalent datasets between the two resources is unknown; unless a user has prior knowledge about equivalent projects, they remain disconnected. Also, sequence data submitted to MGR may not necessarily be deposited in ENA. We propose to set up a computational framework, termed the Metagenomics Exchange (ME), to enable metagenomics datasets and the results of their analysis to be linked. All sequences will become available to the research community via ENA, and analysis results will be automatically exchanged between EMG and MGR. The ME will be implemented to enable other metagenomics analysis providers to join, and so that it can be used by researchers wishing to perform large-scale analyses. We will also investigate ways that our own pipelines can be enhanced through the use of the ME, for example by sharing software and processing tasks. This will lead to computational savings, increasing the capacity for metagenomics analysis. We will also establish a knowledge transfer forum, enabling the exchange of ideas on a range of topics, from hardware solutions to algorithms. Finally, we will undertake a research programme to investigate the optimal combination of pipeline analysis components, and whether a single, unified analysis pipeline could be engineered.

Technical Summary

Metagenomics is a widely used approach to investigate the composition of microbial communities. With the development of modern sequencing platforms, (sequence) data generation is rarely the bottleneck, but rather its analysis. MG-RAST (MGR) and EBI Metagenomics (EMG) are the two world-leading metagenomics analysis platforms. These platforms employ distinct, yet complementary, approaches for the functional characterisation of metagenomic sequences; however, their pipelines closely align in the early stages of analysis, such as quality control. Unlike other data types, there is no mandate for researchers to submit metagenomics data to an analysis platform. Furthermore, resources such as MGR are not linked to an INSDC member, such as the European Nucleotide Archive (ENA). Currently, metagenomics sequence data, associated contextual metadata and derived functional and taxonomic assignments are disjointed within the field. Consequently, it is virtually impossible to navigate these cumbersome datasets. We propose to solve this problem by developing a 'Metagenomics Exchange' (ME), which builds upon ENA technologies to provide a registry of metagenomics datasets. MGR and EMG will use this registry to discover new datasets and publish their derived annotations, using tools and RESTful APIs to push/pull information from the registry. With the ME in place, we will populate it with existing datasets, developing the tools necessary to identify equivalent datasets. MGR and EMG will standardise on common analysis components and utilise the ME to enable crosstalk between pipelines, reducing computational overhead. The two teams will also exchange technical knowledge, such as data storage solutions and pipeline containerisation. The websites will be harmonised to seamlessly present federated analysis results from both platforms, thereby enriching interpretation. We will investigate optimal pipeline solutions that may pave the way for a unified pipeline.

Planned Impact

The use of metagenomics is widespread, with its application in diverse fields, e.g. agriculture, food manufacture, the elucidation of both antibiotic products and antibiotic resistance mechanisms, bioenergy, crop yields and animal/human health. Consequently, metagenomics data continues to grow exponentially, with ever-increasing demands on community analysis services. As yet, the field lacks systematic co-ordination and organisation of sequence data and derived functional and taxonomic information. We propose to solve this through the development of the Metagenomics Exchange (ME), which will primarily address the key area of data-driven bioscience, but will also have significant influence on many of the strategic priorities of the BBSRC and NSF.
The impact of both the EBI metagenomics (EMG) and MG-RAST (MGR) analysis platforms on academic research is already evident. Both provide robust, specialised analyses and access to significant amounts of compute (~55 million CPU hours/year). The ME will catalogue information about different metagenomic datasets and their analyses, enabling users from both academic and industrial sectors to rapidly discover them. Moreover, EMG and MGR will collect and present results from each other's platform, ensuring that a user is presented with all available analyses (saving user time and effort). To reduce duplication and to minimise differences, EMG and MGR will standardise on common parts of their pipelines. This will improve consistency and, as the project matures, allow crosstalk between the analysis pipelines. Crosstalk will also reduce computational overhead, allowing greater throughput for the community. The EMG and MGR websites collectively have hundreds of thousands of individual visitors per year. Steps to harmonise the websites will improve the user experience for both new and existing users.
Our objective of improving data discoverability via the ME is to allow metagenomics results to reach a broader life science community, where individuals may otherwise be unaware of the data. It is also important to note that this project establishes a new collaboration, enabling MGR and EMG to become more aligned. Knowledge transfer between the groups will expand both UK and US skills in high-throughput bioinformatics analysis.
The staff employed on this grant will receive hands-on training from members of the Finn, Cochrane and Meyer teams. All the institutes have excellent training schemes and career development courses, and the staff will be working in world-class laboratories of internationally renowned scientists. They will have opportunities to present their work within the groups, between the groups and at international conferences. Both technical developments and research findings will be presented at conferences and published in peer-reviewed journals. Information about all the resources, especially the new ME, will be disseminated to the community via peer-reviewed journals, conference presentations, a specialist workshop, and online training materials. We will also engage with non-specialist and public audiences via non-scientific literature, social media (blogs and tweets) and by attending meetings aimed at a range of audiences. These activities will maximise dissemination into the academic, industrial and 3rd-party communities.
MGR and EMG will leverage their links to the industrial sector to ensure that this sector's needs are met. Indeed, the biotechnology industry may benefit the most from the implementation of the ME, as it is frequently engaged in identifying catalytic activities across multiple datasets. The ME will enhance the translation of metagenomics research to industrial applications. In the longer term, the knowledge gained from understanding complex communities will have significant impacts for the UK, US and world economies, from more efficient industrial enzymes, through improved soil conditions and crop yields, to healthcare solutions derived by comparing diseased and healthy states.

Publications

Amid C (2020) The European Nucleotide Archive in 2019. Nucleic Acids Research

Harrison PW (2019) The European Nucleotide Archive in 2018. Nucleic Acids Research

Mitchell AL (2020) MGnify: the microbiome analysis resource in 2020. Nucleic Acids Research

Silvester N (2018) The European Nucleotide Archive in 2017. Nucleic Acids Research

Tarkowska A (2018) Eleven quick tips to build a usable REST API for life sciences. PLoS Computational Biology

Toribio AL (2017) European Nucleotide Archive in 2016. Nucleic Acids Research

 
Description We have designed, implemented, and tested the first basic infrastructure for housing the Metagenomics Exchange (ME) data. This has subsequently been released to project partners as a production service, who are starting to populate it with their data. At the core of the ME is a simple registry that captures whether a particular resource holds either a sequence dataset or a set of analysis results for a dataset. Both resource providers and end users interact with the ME Registry through the API. At the time of writing, the registry contains >100 metagenomics sequence sets, a number that is steadily increasing as MG-RAST datasets (currently >50) are brokered into ENA. Over and above this, mappings between ENA and MGnify (formerly known as EBI metagenomics, EMG) and/or MG-RAST analysis datasets will be populated in the near future. MGnify has currently analysed 75% of the MG-RAST brokered datasets, paving the way for exchanging analysis results between the two sites. As MGnify has introduced assembly, the ME Registry has been extended to handle assembled sequence data accessions as well as the original sequencing projects.

No sequence data is held within the registry; rather, it holds the accessioned mappings between two "equivalent" sequence sets found in each of the two broker resources (ENA and MG-RAST). Identified sequence set accessions can then be used to query the ME registry to provide access to the locations of the respective sequence sets and associated analysis results.

The ME Registry consists of two APIs: an administration interface and a public, read-only interface. The administration interface allows resource providers to register and manage their datasets for exchange, while the public read-only interface allows users to find and query the identified runs for mappings to metagenomics datasets. Authorisation for the registry is performed using access tokens. Each resource provider group (MG-RAST, MGnify and ENA) has its own token, which is required for all administration tasks (submit, update, delete), as well as for read access to pre-publication data.
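A minimal sketch of how this might look from a client's perspective is given below (in Python, using the requests library). The registry base URL is the one listed under the project outputs; the endpoint paths, query parameters, payload fields and accession values are illustrative assumptions rather than the documented interface.

```python
# Sketch of interacting with the ME Registry API. Endpoint paths, field names,
# accessions and the token header are illustrative assumptions, not the
# documented interface.
import requests

BASE = "https://www.ebi.ac.uk/ena/registry/metagenome/api"
TOKEN = "<provider-access-token>"  # issued per resource provider (MG-RAST, MGnify, ENA)

# Public, read-only lookup: does any analysis resource hold results for this run?
resp = requests.get(f"{BASE}/datasets", params={"sequence_accession": "ERR2097151"})  # example accession
resp.raise_for_status()
for record in resp.json().get("datasets", []):
    print(record.get("broker"), record.get("analysis_url"))

# Administrative registration (requires the provider token): declare that an
# analysis of an ENA run is available at a given resource.
payload = {
    "sequence_accession": "ERR2097151",    # ENA run accession (example value)
    "analysis_resource": "MGnify",
    "analysis_accession": "MGYA00000001",  # hypothetical analysis identifier
}
resp = requests.post(
    f"{BASE}/datasets",
    json=payload,
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()
```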

Determining sequence datasets that are equivalent in MG-RAST and ENA is proving to be challenging. One of the main issues is that MG-RAST does not store the original raw FASTQ files, but rather quality-controlled FASTA files. Thus, retrospective population of the ME registry will be less straightforward than originally anticipated. A number of strategies are being investigated that may allow us to match sequence datasets based on similarities, rather than exact matches. We have extended the ME registry to include the method(s) used to infer equivalence: hash_of_sequence, kmer_profile, taxonomy_signature, functional_signature, gps_coordinates, biome, other_metadata. A confidence value will be provided with the results, defined as full if the sequence hashes match, high if the biome and GPS coordinates match, medium for a good combination of other fields, and low for uncertain matches. While many of the older datasets will not be mapped, part of the work focuses on enabling the brokering of MG-RAST sequence datasets into ENA, which will enable the direct capture of equivalence between MG-RAST and ENA.
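The confidence rules described above can be summarised in a short sketch; the evidence categories are those recorded in the registry, while the threshold used for a "good combination" of other fields is an assumption made purely for illustration.

```python
# Sketch of the equivalence-confidence rules described above. The evidence
# categories mirror the methods recorded in the ME registry; the threshold for
# a "good combination" of other fields is an assumption for illustration.
EVIDENCE_METHODS = {
    "hash_of_sequence", "kmer_profile", "taxonomy_signature",
    "functional_signature", "gps_coordinates", "biome", "other_metadata",
}

def equivalence_confidence(evidence: set[str]) -> str:
    """Return the confidence level for a candidate ENA/MG-RAST dataset mapping."""
    unknown = evidence - EVIDENCE_METHODS
    if unknown:
        raise ValueError(f"unrecognised evidence method(s): {unknown}")
    if "hash_of_sequence" in evidence:
        return "full"    # identical sequence hashes
    if {"biome", "gps_coordinates"} <= evidence:
        return "high"    # biome and GPS coordinates both match
    if len(evidence) >= 2:
        return "medium"  # a good combination of other supporting fields (assumed >= 2)
    return "low"         # uncertain match

# Example: k-mer profile and taxonomy signature agree, but there is no exact hash match.
print(equivalence_confidence({"kmer_profile", "taxonomy_signature"}))  # -> "medium"
```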

Both MGnify and MG-RAST have adopted the use of the Common Workflow Language (CWL) for describing their analysis pipelines in a standard fashion. To achieve this, the MG-RAST execution framework (AWE) has been extended to be able to execute CWL pipelines. The most widely used MGnify pipeline, the analysis of paired-end sequencing, has been described and updated as the version of the pipeline has been incremented. We have also described the MGnify assembly pipeline in CWL for versions 3 and 4. Describing pipelines in a standard format allows complete provenance of the pipeline (e.g. allowing reproducible science), simpler comparison between the two, as well as rapid rebuilding and combining of components. Since this initial work, the MGnify pipeline has been re-worked into three distinct versions (amplicon, raw read and assembly analysis). These have been rapidly built using common components where appropriate. Current work is focusing on evaluating different execution engines for running the CWL on different compute infrastructures. We are also refining the resource requests (number of CPUs and memory) according to the input datasets. Both the MG-RAST and MGnify teams are promoting the use of CWL as part of the Genomic Standards Consortium.
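As an illustration of how a CWL-described pipeline is driven by an execution engine, the following sketch runs a workflow with cwltool from Python; the workflow and job file names are placeholders, and cwltool stands in for any of the engines under evaluation (e.g. Toil, CWLEXEC).

```python
# Minimal sketch of executing a CWL-described pipeline from Python. The workflow
# and job files are placeholders; cwltool is used here only as one example of a
# CWL execution engine.
import subprocess

def run_cwl(workflow: str, job: str, outdir: str = "results") -> None:
    """Run a CWL workflow with its YAML/JSON job order and collect outputs in outdir."""
    subprocess.run(
        ["cwltool", "--outdir", outdir, workflow, job],
        check=True,  # raise if the pipeline exits non-zero
    )

if __name__ == "__main__":
    # e.g. one of the pipeline descriptions with a per-dataset job file (placeholder names)
    run_cwl("pipeline.cwl", "inputs.yml")
```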

To enrich search and retrieval of data from MGnify, we have developed and released a RESTful API, providing programmatic access to all of the data contained within the resource. The base address of the API gives access to several collections of resources, such as studies, samples, runs, experiment-types, biomes and annotations. Combined with appropriate relationships to other resources, these can be filtered and sorted by selected attributes, allowing complex queries to be constructed (for example: 'retrieve all oceanographic samples from metagenomic studies taken at temperatures less than 10°C'). The provision of such complex queries allows metadata to be combined with annotation for powerful data analysis and visualisation. We have utilised an interactive documentation framework (Swagger UI) to visualise and simplify interaction with the API's resources via an HTML interface, allowing less experienced users to interactively build up API queries. Detailed explanations of the purpose of all resources, along with many examples, are also provided to guide end-users. This, in combination with the MG-RAST API, provides the underlying mechanisms for data exchange.
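The sketch below illustrates the style of query described above, using Python against the public MGnify API base URL; the filter parameter names are assumptions for illustration - the Swagger UI documentation lists the filters that are actually supported.

```python
# Illustrative sketch of querying the MGnify RESTful API for samples. The
# specific filter parameter names used below are assumptions for illustration;
# the interactive (Swagger UI) documentation lists the supported filters.
import requests

API = "https://www.ebi.ac.uk/metagenomics/api/v1"

def fetch_samples(params: dict) -> list:
    """Page through the samples collection, following JSON:API-style pagination links."""
    samples, url = [], f"{API}/samples"
    while url:
        resp = requests.get(url, params=params)
        resp.raise_for_status()
        body = resp.json()
        samples.extend(body.get("data", []))
        url, params = body.get("links", {}).get("next"), None  # params only on the first request
    return samples

# e.g. marine samples from metagenomic studies (filter names are assumptions)
ocean_samples = fetch_samples({
    "experiment_type": "metagenomic",
    "biome_name": "Marine",
})
print(len(ocean_samples), "samples retrieved")
```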

The former MGnify website, first developed in 2010, was not designed with modern API approaches in mind, and adopted the now antiquated design of the server directly contacting the backend database. Exposing new data types and pulling in data via the MG-RAST API was therefore going to be extremely time consuming. Consequently, we have completely rewritten the MGnify site to consume the new MGnify API (thereby reducing duplication of effort). Furthermore, the website was rebuilt in a modern framework; this included the development of a portable JavaScript library to consume the MGnify API (implemented in Backbone JS). This may be released via a public package repository in the future and will be shared with MG-RAST to enable them to consume and display MGnify outputs with minimal effort.

During the course of this project, MGnify has added metagenomic assembly as another component of the pipeline repertoire. We have each shared our experiences with metagenomics assembly, especially in terms of the performance of different algorithms and the quality of assemblies. The MGnify team has showcased their neural network for assembly parameter estimation. We have also exchanged ideas on API design and the benefits (and drawbacks) of using standards/best practices. As part of this work, we have published a technical article on API provision in PLoS Computational Biology. We have also looked for consistencies between our API endpoints and now have a clear understanding of each other's APIs (and infrastructures). For example, MG-RAST has a Cassandra-backed system, whereas MGnify is backed by MongoDB. While we both adopt NoSQL solutions, Cassandra offers greater search functionality than MGnify's current MongoDB system. This limited search is being overcome by releasing software solutions that enable equivalent searches by combining API queries (e.g. the Metagenomics Tool Kit). Furthermore, we have had specific meetings describing the containerisation of our workflows. We have also exchanged ideas surrounding the use of Simka for k-mer profiling of datasets. Because this algorithm removes low-abundance k-mers (which are often the source of small variations introduced by quality control), it has been possible to match imperfect datasets. However, we have not been able to scale the update procedure for this and are currently investigating solutions and alternatives.
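To illustrate the underlying idea of matching "imperfect" copies of a dataset by k-mer profile similarity, the following simplified sketch compares two read sets after discarding low-abundance k-mers; this is not Simka's algorithm, and the k size, abundance cut-off and similarity threshold are all assumptions.

```python
# Simplified, generic illustration of matching "imperfect" copies of a dataset by
# k-mer profile similarity. This is not Simka's algorithm; the k size,
# low-abundance cut-off and similarity threshold are all assumptions.
from collections import Counter
from math import sqrt

def kmer_profile(reads: list[str], k: int = 21, min_count: int = 2) -> Counter:
    """Count k-mers across reads, dropping low-abundance k-mers (often QC artefacts)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return Counter({kmer: c for kmer, c in counts.items() if c >= min_count})

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two k-mer count vectors."""
    shared = set(a) & set(b)
    dot = sum(a[kmer] * b[kmer] for kmer in shared)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def likely_equivalent(reads_a: list[str], reads_b: list[str], threshold: float = 0.95) -> bool:
    """Treat two datasets as candidate equivalents if their profiles are very close.
    (Raw FASTQ and quality-controlled FASTA copies of the same run will differ slightly.)"""
    return cosine_similarity(kmer_profile(reads_a), kmer_profile(reads_b)) >= threshold
```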

We have reviewed the respective steps in our pipelines to identify commonalities and where a common solution may prove beneficial. The initial comparisons of the pipelines have indicated that the highest degree of overlap resides in the initial quality control and trimming sections. We also strongly believe that our independent approaches to functional annotation are complementary. MG-RAST provides the best match to a sequence using sequence similarity searches against a large sequence database, while MGnify provides matches to different protein family databases. As many sequences have no functional annotation, the domain annotations can be more informative; on the other hand, the presence of certain domains does not always provide a description of the overarching function of a sequence, where a full-length match to an annotated sequence would. Currently, MGnify is looking to adopt DIAMOND searches against UniRef90 for the annotation of their assemblies. Since the commencement of this project, MGnify has moved to offering assembly as a service, a capacity that MG-RAST does not yet offer. We are sharing our workflow descriptions for this process, putting CWL into practice to achieve these outcomes.
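As an indication of what such an annotation step could look like, the sketch below wraps a DIAMOND blastp search against a pre-built UniRef90 database; the file paths and search options are illustrative defaults rather than MGnify's production parameters.

```python
# Sketch of a DIAMOND-vs-UniRef90 annotation step. File paths are placeholders
# and the chosen options (tabular output, e-value cut-off) are illustrative
# defaults, not production parameters.
import subprocess

def annotate_proteins(proteins_faa: str, uniref90_db: str, out_tsv: str) -> None:
    """Assign best UniRef90 matches to proteins predicted from an assembly."""
    subprocess.run(
        [
            "diamond", "blastp",
            "--query", proteins_faa,   # proteins predicted from the assembly
            "--db", uniref90_db,       # pre-built DIAMOND database of UniRef90
            "--out", out_tsv,
            "--outfmt", "6",           # BLAST-style tabular output
            "--evalue", "1e-5",
        ],
        check=True,
    )

if __name__ == "__main__":
    annotate_proteins("assembly_proteins.faa", "uniref90.dmnd", "uniref90_hits.tsv")
```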

We have also started a comparison of our different taxonomic assignments to determine the relative merits of the different approaches. A substantial data set (1,096 runs), drawn from amplicon and whole metagenome shotgun datasets from aquatic, human host-associated and soil environments, has been identified in ENA, and sequence data has been exchanged between MGnify and MG-RAST. This will be used to compare the outputs of the two analysis pipelines.
Exploitation Route Although the Metagenomics Exchange (ME) was originally developed with MG-RAST and MGnify, the model is completely agnostic about the analysis source. The only restriction is that the underlying sequence data that the analysis is based upon is found within ENA (submitted directly or to one of the INSDC partners). This means that other metagenomics analysis resources, such as IMG/M and iMicrobe, could also use the ME to expose their analysis results, making them discoverable for other research scientists.

With the current systems, we will make it simple for research scientists to know when a common dataset has been analysed in both resources. As both resources have different analysis strategies, they may highlight different features in the dataset, accelerating the rate of novel discovery. Moreover, when the results are consistent, this provides independent validation of the results.

The new MGnify website now provides a more consistent view of the data, and the associated API provides access to the terabytes of processed data. This API is accompanied by software libraries that both illustrate the use of the API using standard libraries and can be used to access the data. At the time of writing, these libraries have been downloaded over 25,000 times.

The CWL descriptions of our pipelines allow for complete provenance of the analysis, increasing transparency of how the results were derived and how the two pipelines may differ, and allowing scientists to account for differences that arise from informatics variation. Furthermore, these CWL descriptions can be taken and extended or modified (e.g. by the inclusion of new tools or reference databases). Our use of CWL is also driving the execution frameworks being developed by third parties, e.g. Toil and CWLEXEC. As CWL is not confined to biology, this has potentially very broad impact.
Sectors Agriculture, Food and Drink, Digital/Communication/Information Technologies (including Software), Environment, Healthcare, Manufacturing, including Industrial Biotechnology

 
Description Co-lead of the Genomic Standards Consortium project M5 - A meta-infrastructure enabling exchange of large (metagen)omics data sets
Geographic Reach Multiple continents/international 
Policy Influence Type Participation in an advisory committee
Impact This standard is aimed at providing complete provenance for the large quantities of data in ever-growing datasets, which pose significant infrastructure challenges to biologists and bioinformaticians. The old, very loosely integrated approaches relying on the INSDC network for sequence data sharing are still important; however, additional layers of (standards-driven) data infrastructure will emerge over time, driven simply by the cost of data analysis. Already, the review of scientific papers based on shotgun metagenomic data sets is problematic, as the cost of computational re-analysis is significant. Only by sharing derived results in robust ways can the community overcome the computational burden. Put simply, minimising the number of times a particular data set undergoes a specific analysis will maximise the amount of analysis the community as a whole can perform. Technology, standards and community buy-in are required, and the group is working on creating the missing pieces of a more complete data sharing ecosystem. This project is aimed at improving the knowledge and standards associated with the above, particularly through the use of CWL.
URL https://gensc.org/projects/m5/
 
Description Workflow systems turn raw data into scientific knowledge
Geographic Reach Multiple continents/international 
Policy Influence Type Influenced training of practitioners or researchers
Impact These workflow tools can make your computational methods portable, maintainable, reproducible and shareable.
URL https://www.nature.com/articles/d41586-019-02619-z
 
Description (EOSC-Life) - Providing an open collaborative space for digital biology in Europe
Amount € 23,745,996 (EUR)
Funding ID 824087 
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 03/2019 
End 02/2023
 
Title Application of CWL for describing analysis workflow 
Description Different services provided by the MGnify resource, namely assembly and analysis, have been encapsulated in the Common Workflow Language (CWL), which allows complete provenance of the software and/or reference databases used, the associated parameters and, more recently, the associated containers providing access to these tools. 
Type Of Material Improvements to research infrastructure 
Year Produced 2018 
Provided To Others? Yes  
Impact Our use of CWL has driven both the specification and the development of the CWL execution engines required to run our workflows. CWL is a community project, involving cross-disciplinary teams. One execution framework, Toil, is an open source software project and has been improved by the community when we have reported bugs in the software. Similarly, IBM developers have been improving CWLEXEC in response to our work. Both the MGnify and MG-RAST pipelines are now described in CWL, allowing both teams, as well as others, to more readily compare the pipelines and understand the similarities and differences. These CWL descriptions can also be reused by the community, either to build novel workflows, or to adapt the existing workflows by introducing new tools and reference databases. Finally, the adoption of CWL has allowed us to elastically scale our compute, using both academic and commercial clouds to assess the costs/benefits in this changing landscape. 
URL https://www.commonwl.org
 
Title ENA 
Description The European Nucleotide Archive (ENA) captures and presents information relating to experimental workflows that are based around nucleotide sequencing. A typical workflow includes the isolation and preparation of material for sequencing, a run of a sequencing machine in which sequencing data are produced and a subsequent bioinformatic analysis pipeline. ENA records this information in a data model that covers input information (sample, experimental setup, machine configuration), output machine data (sequence traces, reads and quality scores) and interpreted information (assembly, mapping, functional annotation). Data arrive at ENA from a variety of sources. These include submissions of raw data, assembled sequences and annotation from small-scale sequencing efforts, data provision from the major European sequencing centres and routine and comprehensive exchange with our partners in the International Nucleotide Sequence Database Collaboration (INSDC). 
Type Of Material Database/Collection of data 
Provided To Others? Yes  
Impact ENA is the European arm of the INSDC. However, ENA has specifically been extended to allow the deposition of metagenome assemblies, binned assemblies and metagenome-assembled genomes. We have also worked on ensuring that metadata associated with sequence data are appropriately captured, through the development of checklists. 
URL https://www.ebi.ac.uk/ena
 
Title MGnify (formerly called EBI metagenomics) 
Description The MGnify resource is an automated pipeline for the analysis and archiving of metagenomic data that aims to provide insights into the phylogenetic diversity as well as the functional and metabolic potential of a sample. It enables users to freely browse all the public data and associated analysis results that are contained within the resource. More recently (in 2018), we have started to provide metagenomics assembly as a service to the community, an analysis that is often not performed due to the computational overheads. 
Type Of Material Database/Collection of data 
Year Produced 2012 
Provided To Others? Yes  
Impact MGnify provides access to some of the largest metagenomics projects and is one of the largest collections of analysed metagenomic datasets. Uniquely, it enables consistent analysis between projects, enabling scientists to compare results with other datasets in the resource or with their own. 
URL https://www.ebi.ac.uk/metagenomics
 
Title Metagenome Exchange Registry 
Description Database for the capture and presentation of data linking metagenomics analyses, such as from MG-RAST and MGnify to raw data sets in INSDC databases; includes Application Programmatic Interfaces for data input and access. 
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact The Metagenome Exchange Registry has been promoted towards users external to the project, such as JGI and the MAR databases. 
URL https://www.ebi.ac.uk/ena/registry/metagenome/api/
 
Description MG-RAST 
Organisation Argonne National Laboratory
Country United States 
Sector Public 
PI Contribution Discussing ideas and experiences on large-scale bioinformatics analysis of metagenomics data. Knowledge of data submission.
Collaborator Contribution Data submission to ENA of metagenomics datasets. Knowledge of metagenomics analysis.
Impact Plan for pipeline interoperability.
Start Year 2017
 
Title Metagenomics toolkit 
Description The Metagenomics toolkit enables scientists to download all of the sample metadata for a given study or sequence to a single CSV file. 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact Improved access to sample metadata enabling easier integration to workflows. 
URL https://pypi.org/project/mg-toolkit/
 
Description 21 GSC Meeting talk "EBI's use of CWL workflows" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Policymakers/politicians
Results and Impact Talk given at the 21st Genomic Standards Consortium Meeting held at the University of Vienna, Austria.
Year(s) Of Engagement Activity 2019
URL https://gensc.org/meetings/gsc21/
 
Description BiATA 2019 invited talk "Insights into the human gut microbiota from a (meta-)genomic perspective" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact PI Dr Robert Finn was a featured speaker at the 2019 BiATA conference held at the Graduate School of Management St Petersburg University, Russia. The talk covered recent work carried out by the team that resulted in new insights into the human gut microbiota.
Year(s) Of Engagement Activity 2019
URL http://biata2019.spbu.ru
 
Description CABANA training workshop titled "Introduction to Metagenomics" 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Training modules in metagenomics were delivered during the 5 day CABANA workshop held at the Faculty of Natural Sciences - University of Buenos Aires (FCEN-UBA), Argentina. In this course, participants learnt the basics of metagenomics, covering experimental design and workflows, moving through to microbiome analysis via metabarcoding and shotgun metagenomics. The course theme focused on metagenomics oriented to biodiversity. Talks and hands on practical sessions were delivered to cover all aspects of the course work.
Year(s) Of Engagement Activity 2019
URL https://www.ebi.ac.uk/training/events/2019/cabana-workshop-introduction-metagenomics
 
Description EBI Industry talk titled "A new genomic blueprint of the human gut microbiota" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact This talk was presented during the EBI industry programme quarterly meeting held at EMBL-EBI, UK and focused on future developments of MGnify and making human genomes accessible.
Year(s) Of Engagement Activity 2019
 
Description EOSC-Life hackathon titled "Tool profiling in Toil, and testing of cwl-toil-runner" 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Participated in a hackathon for technical experts to improve the CWL implementation of tools, organised as part of EOSC-Life WP1 and held in Germany. The hackathon brought together individuals interested in common data types (e.g. genomics) but who may originate from different communities (e.g. plant genomics and rare diseases).
Year(s) Of Engagement Activity 2019
URL https://www.eosc-life.eu/news/hackathon/
 
Description Laboratory News feature titled "The secret microbiome" 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Research work utilising MGnify to unlock the complexity and diversity of the human gut microbiome was featured in an article printed by Laboratory News.
Year(s) Of Engagement Activity 2019
URL http://www.labnews.co.uk/article/2024791/the_secret_microbiome
 
Description NATURE Milestone 25 titled "Metagenome-assembled genomes provide unprecedented characterization of human-associated microbiota" 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A publication from the group titled "A new genomic blueprint of the human gut microbiota" [Nature https://doi.org/10.1038/s41586-019-0965-1 (2019)] was featured by Nature as part of a Milestone in Human Microbiota Research. https://media.nature.com/original/magazine-assets/d42859-019-00061-9/d42859-019-00061-9.pdf
Year(s) Of Engagement Activity 2019
URL https://www.nature.com/articles/d42859-019-00061-9
 
Description National Microbiome Data Collaborative Workshop: linking MIxS standards, Environment ontology, and GAZ; Burgin 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A workshop from the US National Microbiome Data Collaborative initiative. We presented standards and tools that have been developed under MGP-III that are of value to this initiative. Discussions took place around these and other tools. Alignment with this project will secure global data accessibility and reach for data already routed towards ENA and MGnify.
Year(s) Of Engagement Activity 2019
 
Description Popular Science article titled "Scientists think they've found 1,952 new species living in our poop" 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Research work from the group that utilised MGnify to reveal the complexity and diversity of the human gut microbiome resulted in a 2019 Nature publication titled "A new genomic blueprint of the human gut microbiota" [https://doi.org/10.1038/s41586-019-0965-1]. This work was then featured in the Popular Science article.
Year(s) Of Engagement Activity 2019
URL https://www.popsci.com/gut-microbiome-new-bacteria/