EBI Metagenomics - enabling the reconstruction of microbial populations

Lead Research Organisation: European Bioinformatics Institute
Department Name: Genome Assembly and Annotation

Abstract

Microorganisms inhabit practically all environments on Earth. For example, there are more microbes in the ocean than stars in the known universe, with complex communities living in vastly different niches, from the tropics to the polar waters and from well-lit surface waters to the deep abyss. They harvest and transduce solar energy, and it is estimated that they contribute 50-90% of global primary production, turning light into biomass through photosynthesis, making them vital to the world's food chain. Microbes produce and consume most greenhouse gases (carbon dioxide, nitrous oxide and methane), which is of particular importance in relation to man-made climate change. They are also responsible for over half of all oxygen production on Earth. Within ecosystems, microbes catalyse the key bio-geochemical transformations of nutrients and trace elements that sustain organic productivity. Understanding these processes would bring many potential benefits. For example, working out the mechanisms by which microbes unlock organic phosphate to a soluble form that can be absorbed by plants could reduce the use of fertilizers and increase agricultural yields.
Within each environment, the microbial population contains a vast and dynamic reservoir of genetic variability, much of which is yet to be studied. Current biological databases do not represent the vast majority of environmental organisms, as traditional genome sequencing approaches require isolation and culturing. Metagenomics, the sequencing of the entire collection of DNA found within an environmental sample, circumvents this need. As a result, we have begun to answer some of the key questions about which organisms are found in which environments. There has been a huge uptake of the approach across a broad range of disciplines. Nevertheless, the majority of metagenomics projects produced over the past decade have given only a fragmentary picture of the underlying micro-organisms' genomes, as larger volumes of sequencing are required to improve the level of genomic detail.
In the era of data driven science, and with widespread access to sequencing technology and ever diminishing costs, huge volumes of sequence data present an amazing opportunity to understand the microbial world at a more detailed level. However, the field of metagenomics faces the following issues: 1) given the vast data volumes, specialist expert-built pipelines are required for efficient, high-throughput analysis; 2) bioinformatics analysis of results is costly to produce and requires expert knowledge; 3) to extract maximum knowledge from experiments, there is a need to systematically capture the associated experimental data along with the sequence data; 4) there is a lack of consistency between different analysis approaches, affecting comparability. The EBI Metagenomics (EMG) resource solves these issues by offering a free service for the analysis and archiving of all metagenomic data.
With advances in algorithms and methods, it is now possible to piece together the fragments that make up an individual organism's genome. In this project, we will not only continue the provision of the EMG, but also develop the analysis, archiving, tools and data presentation frameworks required to generate genomes from metagenomes. Due to the unique position of EMG, we will also be able to combine data across different projects that contain similar microbial communities. This important data reuse will enable us to generate the highest quality genomes, allow us to detect different strains of bacteria and ensure that we capitalize on previous investments. Our genomes will enrich the current tree of life, and we will extend the EMG interfaces to accommodate the new data that we will produce. This will empower research and innovation in the environment, bioindustries, agriculture and medicine (human and livestock). We will work closely with biotechnological industries to enable them to harness the huge potential for discovery.

Technical Summary

Metagenomics is a widely used approach to investigate the composition and function of microbial communities. With the development of modern sequencing platforms, data generation is rarely the bottleneck, but rather its analysis. Even when researchers have access to large-scale computing facilities, two metagenomics datasets are rarely analysed in the same way and the workflows used to produce results are virtually impossible to reconstruct. The EBI metagenomics (EMG) resource solves all of the above problems by providing a freely available service for the analysis and archiving (via the European Nucleotide Archive, ENA) of metagenomics data. It also provides a platform for the discovery of analysed metagenomics datasets. As these are uniformly analysed, it enables comparability and meta-analysis across projects and biomes. Unlike any other public analysis service, EMG has an archiving remit. The capture of rich, contextual metadata associated with the sequencing data ensures maximal data longevity and reuse. Over and above this, EMG is also a data generator, in terms of functional and taxonomic annotations, and has already analysed a world leading 100,000 publicly available datasets.

To date, EMG has focused entirely on annotating raw reads. While this provides a comprehensive analysis of all sampled micro-organisms, the disconnected and fragmentary nature of the data has some limitations, e.g. lack of full length peptides. To overcome this, we will expand the service to include assembly of metagenomics data. We will build reproducible workflows (deployable within multiple cloud environments) and develop tools to reveal near complete genome maps for the more abundant organisms found within a sample, or that occur commonly across samples. ENA will be extended to allow more comprehensive capture of this assembly data. We will extend EMG to include a catalogue of metagenome assembled genomes, offering insights into 10,000s of novel microbial genomes.

Planned Impact

Metagenomics is a rapidly expanding field and the depth and breadth of data are constantly increasing. At the same time, experimental approaches for investigating different microbiomes are constantly improving, providing deeper insights into microbes occupying particular environments. The use of metagenomics is widespread in research projects associated with BBSRC strategic priorities - agriculture and food security, industrial biotechnology and bioscience for health - and the field represents the epitome of data driven biology. This proposal will contribute to the continued support and development of the world leading EBI metagenomics (EMG) resource. Moreover, its expansion to offer assembly (and genomic reconstruction) as a public service will make EMG unique in the world of metagenomics analysis provision. The application of assembly workflows will be taken to an unprecedented level of scale, scope and precision, allowing even deeper insights into the microbial world. This will enable the scientific community to make the leap from correlative observations to mechanistic hypothesis generation. Such deep knowledge will be of particular importance for cross cutting themes, such as understanding antimicrobial resistance, discovery of new secondary metabolites (e.g. antimicrobial agents), host-microbe interactions (plant/animal) and microbial ecology.

The scientific community benefits from EMG in many ways. Primarily it provides freely available services for analysis and archiving (via the ENA) of microbiome sequence data, helping democratise the research field by overcoming limitations of compute and informatics expertise. It also provides a platform for discovery of analysed metagenomics data, already amassing over 100,000 datasets (representing nearly a petabyte of processed data). These are uniformly analysed, enabling comparability and meta-analysis across projects and biomes. Archiving of sequence data with rich experimental metadata also encourages data re-use. Beyond this, EMG outputs will have applications in a wide range of academic and industrial fields, including enzyme discovery, environmental science, diagnostics and animal/human health, as assembly begins to provide a more complete picture of microbial communities.

The results of the project will be of exceptional value to the commercial sector, and the benefits will eventually feed through to the public, in the form of new antibiotics for humans and livestock, higher agricultural yields from the understanding of socio-ecological interplay (e.g., food chain microbes) and expanded discovery of novel enzymes capable of operating at extremes, such as psychrophilic enzymes for detergents, or with novel catalytic functionality (e.g., anaerobic digestion pathways in biofuel production). Industrial partnering has demonstrated that EMG data outputs have increased translation rates within this sector, and continued support for the resource will enhance this.

There are also many technical developments within this project that will have far reaching impacts and can be applied to other analytical disciplines. For example, the use of workflows and containerisation of software for Cloud compute infrastructures will enable a new level of reproducibility and sharing.

We will ensure impact to all academic and industrial audiences by the publication of software, workflows, compute containers and peer reviewed articles. To address the skills shortages in the field of metagenomics informatics, we will also deliver training and webinars.

Metagenomics is pivotal to the notion of One Health - the collaborative effort of multiple disciplines working at national and international levels to attain optimal health for people, animals and the environment. This proposal (and EMG) encapsulates this philosophy, serving the major UK and international communities, and will deliver a cost effective resource that will become the world's leading microbiome data service.
 
Title MGnify Web API - e-learning 
Description Recording of a webinar covering analysing and visualising microbiome-derived datasets using the MGnify Web API. 
Type Of Art Film/Video/Animation 
Year Produced 2018 
Impact 279 views on YouTube 
URL https://www.ebi.ac.uk/training/online/course/analysing-and-visualising-microbiome-derived-datasets-u...
 
Description At the end of the funding period, the overall data content in MGnify was as follows: >4,343 publicly available projects, with >439,179 publicly available datasets analysed by the resource, representing a 50% increase in data content when compared to the start of the grant. In terms of sequencing runs, we started from 7,000 at the grant start and have since assembled 29,836 runs drawn from a wide range of biomes and submitted the assembled contigs back into ENA, which have been subsequently analysed within MGnify and uploaded to the website. To simplify user requests for assembly and/or analysis of publicly available sequences or their own privately held data, a 'request analysis' section has been added to the MGnify website. In the final year of the funding period the number of user-requested analyses performed using this facility has increased more than four-fold.
Pipeline updates - An updated approach to data analysis with MGnify was released in January 2020, replacing the previous single pipeline (v4.1) with multiple analysis pipelines (v5.0) that are tailored according to the input data. They build on the annotations offered in v4.1, providing additional approaches for taxonomic assertions based on ribosomal internal transcribed spacer (ITS1/2) regions via the ITSoneDB and UNITE databases, or marker gene-based approaches (mOTUs). They also provide expanded functional annotations in the form of KEGG orthologue (KO) annotations, which are calculated using HMMER and the KOfam library (using a slightly modified form of the profile hidden Markov model (HMM) database of KOs) on predicted protein sequences. A greatly expanded number of analyses have been added for assembled contigs, including new pathway and systems annotations. eggNOG annotations are generated for predicted proteins from the contigs using the eggNOG-mapper tool. Meanwhile, KO results are used to generate KEGG pathway annotations. In addition, InterPro annotations are processed to generate a compendium of Genome Properties results, detailing whether a property is present, partially present, or absent in the dataset. antiSMASH is also run on the predicted protein set, providing annotation of biosynthetic gene clusters. Finally, the proteins are compared individually against the UniRef90 database using DIAMOND in 'blastP' mode to identify the accession, description and taxonomic identifier of the best matching sequence. As well as being available to download from the website and API, annotations on the contigs can be visualised using a newly developed contig viewer (described below).
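As an illustration of the best-match step described above, the following Python sketch keeps the highest-scoring hit per protein from a BLAST-style tabular result. It assumes the standard 12-column tabular layout (query, subject, percent identity, ..., e-value, bitscore); the sequence identifiers and scores are invented for the example, and the sketch does not reproduce the actual MGnify pipeline parameters. Only the accession is recovered here; descriptions and taxonomic identifiers would need additional output fields.

```python
import csv
import io


def best_hits(tabular_text):
    """Map each query ID to (subject ID, percent identity, bitscore) of its top hit.

    Assumes 12 tab-separated columns per line, with percent identity in
    column 3 and bitscore in column 12 (the standard BLAST tabular layout).
    """
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comments
        query, subject = row[0], row[1]
        pident, bitscore = float(row[2]), float(row[11])
        # Keep the hit with the highest bitscore for each query protein.
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, pident, bitscore)
    return best


# Hypothetical example rows (identifiers and scores are made up).
_ROWS = [
    ["prot1", "UniRef90_A", "85.0", "200", "30", "0", "1", "200", "1", "200", "1e-50", "180.0"],
    ["prot1", "UniRef90_B", "95.0", "200", "10", "0", "1", "200", "1", "200", "1e-90", "350.0"],
    ["prot2", "UniRef90_C", "70.0", "150", "45", "0", "1", "150", "5", "154", "1e-30", "120.0"],
]
SAMPLE_TABULAR = "\n".join("\t".join(r) for r in _ROWS)

if __name__ == "__main__":
    for query, hit in best_hits(SAMPLE_TABULAR).items():
        print(query, hit)
```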
Viral Pipeline - We have developed a viral analysis pipeline that identifies contigs of viral, phage or prophage origin within an assembly. These undergo taxonomic assignment (which currently succeeds for only a very low fraction of contigs), with the analysis expandable to include other reference databases, such as VOGdb, RVDB, viral protein family (VPF) models from IMG/VR, and pVOGs. This pipeline was also repurposed for the identification of Coronavirus in metatranscriptomes and has been encapsulated in both Nextflow and Common Workflow Language (CWL).
Long-Reads - The rate of assembly requests from our users continued to grow and we observed more depositions of long-read datasets, which may or may not be accompanied by a corresponding short-read sequence dataset pertaining to the same sample. As such, we are in the final testing phase of three additional assembly pipelines: (i) a long-read assembly pipeline, where Flye is the core assembly algorithm; (ii) a hybrid assembly (long- and short-reads) using metaSPAdes; and (iii) a hybrid assembly using Flye. There is a fundamental difference between the two hybrid pipelines: the metaSPAdes pipeline utilises the long-reads to resolve the short-read assembly graph, whereas the Flye pipeline uses the short-reads to polish the long-read assemblies. The first results from these assembly pipelines have been produced and we have extended the MGnify analysis request interface and ENA submission tool to cater for these assembly types. Examples of assemblies using long-reads have already been submitted to ENA and we have refined the analysis pipeline to accommodate this additional data type, which has a higher error rate compared to short-read based sequencing methods. There is a small number (<5) of long-read assemblies now available in MGnify.
Assembled contigs viewer - To enable users to access information on genomic context, the analysis section for assemblies on the MGnify website now includes a 'Contig Viewer' tab that uses the Integrative Genomics Viewer (IGV) framework to provide access to each contig with the corresponding functional annotations. As a metagenomics assembly may typically contain >1,000 contigs, a range of parameters can be selected to filter the results. These include attributes such as contig length, coverage by raw reads, and name. Additional filter parameters are based on the contig annotations, such as COG category code (produced as part of the eggNOG annotations), KO accession, GO accession, Pfam accession, and/or InterPro accession. Once contigs have been selected, annotations for the predicted proteins can be coloured according to their functional annotations (InterPro, GO, Pfam, EggNOG, COG, and KEGG). This viewer has also been reused in a new section of the MGnify website, designed to present MAGs.
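The filtering behaviour described for the contig viewer can be sketched in Python as below. This is a minimal illustration only: the `Contig` class, the `filter_contigs` function, its parameter names and the example records are all hypothetical and are not taken from the MGnify website code.

```python
from dataclasses import dataclass, field


@dataclass
class Contig:
    """Hypothetical record for one assembled contig."""
    name: str
    length: int        # contig length in base pairs
    coverage: float    # coverage by raw reads
    # Annotation source -> set of accessions, e.g. {"pfam": {"PF00005"}}
    annotations: dict = field(default_factory=dict)


def filter_contigs(contigs, min_length=0, min_coverage=0.0,
                   name_contains="", required=None):
    """Return contigs passing the attribute filters and carrying every
    required annotation accession ({source: accession})."""
    required = required or {}
    selected = []
    for contig in contigs:
        if contig.length < min_length or contig.coverage < min_coverage:
            continue
        if name_contains and name_contains not in contig.name:
            continue
        if all(acc in contig.annotations.get(source, set())
               for source, acc in required.items()):
            selected.append(contig)
    return selected


# Made-up example data.
EXAMPLE_CONTIGS = [
    Contig("contig_1", 12000, 35.2, {"pfam": {"PF00005"}, "ko": {"K00001"}}),
    Contig("contig_2", 800, 4.1, {"pfam": {"PF00005"}}),
    Contig("contig_3", 5400, 12.0, {"ko": {"K00001"}}),
]

if __name__ == "__main__":
    hits = filter_contigs(EXAMPLE_CONTIGS, min_length=1000,
                          required={"pfam": "PF00005"})
    print([c.name for c in hits])
```

Combining attribute thresholds with annotation requirements in one pass mirrors the way the viewer narrows a large assembly down to a handful of contigs of interest.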
MGnify Genomes - We have produced our second release of a new section of the MGnify website called MGnify Genomes, which presents non-redundant sets of metagenome assembled genomes (MAGs) to the users. The first release contained the Unified Human Gut Catalogue (UHGC), which presents 4,644 species based on an underlying set of >280,000 MAGs. In addition to being able to browse the MAG genomes using the aforementioned contig viewer, supplementary information relating to the pangenome of the species, completeness, and contamination is also included. Since this first release, we have generated an additional collection of MAGs derived from the marine environment, cow rumen, and human oral cavity. We have also updated the UHGC, developing a pipeline that enables MGnify Genomes updates to be achieved in a manner that retains the previous catalogue, thus establishing the protocols for when an existing MAG should be replaced by another MAG, and how to update the pangenome information when a new genome is added to a cluster. Furthermore, we are recalculating the pangenome data with Panaroo (a new tool released in January 2020), which facilitates incremental updates (unlike Roary, which was used initially). Due to the growing numbers of and demand for MAGs, we have also introduced an accession system for the species representatives, overcoming the limitations of the ENA layers. Finally, the MGnify website has been extended to include a specific section for accessing MAGs. It allows people to search for MAGs via a faceted search, as well as browsing genomes. The genome collection can also be searched using a query DNA sequence via a BIGSI search system, or using other MAGs via a sourmash search.
Protein database updates - MGnify produces a non-redundant protein database that is generated by amalgamating the open reading frame predictions from the analysis of all of the underlying assembled datasets. Since the start of the grant, this database has grown from 500M sequences to >2.1B. To better understand the redundancy within the database and to provide a more manageable dataset for users, we have used Linclust from the MMseqs suite to cluster the sequences using 90% thresholds for overlap (of the shortest sequence) and sequence identity. Using cluster representatives as a target database (350M), we have furnished the MGnify website with a sequence search functionality to aid the exploration and discovery of this unique set of proteins. A user can further restrict the search via the web interface, limiting searches to sequences originating from a particular subset of biomes, or according to the Prodigal prediction (full length, truncated, or partial). The entire set of protein sequences is available via the MGnify FTP site.
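The two 90% thresholds mentioned above can be made concrete with a small Python check. This is only a sketch of the acceptance criterion as described in the text (overlap measured against the shorter sequence, plus sequence identity); it is not an implementation of Linclust, and the function name, parameters and the definition of identity as identical positions over aligned length are assumptions made for illustration.

```python
def meets_cluster_criteria(aligned_length, identical_positions,
                           length_a, length_b,
                           min_overlap=0.9, min_identity=0.9):
    """Return True if an alignment between two proteins satisfies both
    thresholds: overlap of the shorter sequence and sequence identity.

    overlap  = aligned_length / length of the shorter sequence
    identity = identical_positions / aligned_length  (assumed definition)
    """
    if aligned_length <= 0:
        return False
    shortest = min(length_a, length_b)
    overlap = aligned_length / shortest
    identity = identical_positions / aligned_length
    return overlap >= min_overlap and identity >= min_identity


if __name__ == "__main__":
    # A 100-residue fragment aligned over 95 residues to a 300-residue protein.
    print(meets_cluster_criteria(95, 90, 100, 300))
```

Measuring overlap against the shorter sequence means a truncated or partial prediction can still join the cluster of its full-length counterpart, which suits a database built from fragmentary metagenomic assemblies.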
Training workshops - Training workshops were held at EMBL-EBI in June 2019 and virtually in November 2020 and 2021. Each workshop involved 30 participants, with training materials available online. A further training workshop was conducted as part of the 2020 BiATA conference and delivered to >150 people. We have also delivered the first of two webinars aimed at increasing awareness of MGnify and the use of the API.
Statistical Analysis of MGnify data - In collaboration with our Newcastle colleagues we released an R interface to MGnify called MGnifyR in 2020. In a slight departure from the original intention to embed this functionality in the MGnify website, this standalone client application uses the MGnify API and has multiple advantages. Firstly, it facilitates the ingestion of MGnify data directly into a range of widely used R packages, such as PhyloSeq. Secondly, it provides both greater scalability and greater flexibility of analysis and visualisation than would be possible within the website. In addition to being taught in the 2020 EMBL-EBI training course, we are currently extending the MGnify documentation (housed in ReadTheDocs) to include step-by-step examples of using MGnifyR. To make this visible to the MGnify audience, we have established a new "research" area of MGnify that allows Jupyter notebooks to be connected to the MGnify API via MGnifyR. This was released in 2022, along with an updated visual interface.
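To show the general idea of a standalone client that pulls API records into an analysis-friendly table, here is a minimal Python sketch (MGnifyR itself is an R package and is not reproduced here). It assumes the API returns JSON:API-style documents with `data`, `attributes` and `links.next` members; the base URL, the attribute names and the sample payload are assumptions for illustration and should be checked against the API documentation.

```python
import json
import urllib.request

# Assumed base URL for the MGnify API; verify before use.
API_BASE = "https://www.ebi.ac.uk/metagenomics/api/v1"


def flatten(document):
    """Turn a JSON:API-style document into a list of flat row dictionaries."""
    rows = []
    for record in document.get("data", []):
        row = {"id": record.get("id"), "type": record.get("type")}
        row.update(record.get("attributes", {}))
        rows.append(row)
    return rows


def fetch_all(url, max_pages=3):
    """Follow 'next' links for up to max_pages pages and collect flat rows.

    Requires network access; not called when this module is imported.
    """
    rows = []
    while url and max_pages > 0:
        with urllib.request.urlopen(url) as response:
            document = json.load(response)
        rows.extend(flatten(document))
        url = (document.get("links") or {}).get("next")
        max_pages -= 1
    return rows


# Hypothetical payload used to demonstrate flatten() offline.
SAMPLE_DOCUMENT = {
    "data": [
        {"type": "studies", "id": "STUDY_A",
         "attributes": {"study-name": "Example gut study", "samples-count": 12}},
        {"type": "studies", "id": "STUDY_B",
         "attributes": {"study-name": "Example marine study", "samples-count": 40}},
    ],
    "links": {"next": None},
}

if __name__ == "__main__":
    for row in flatten(SAMPLE_DOCUMENT):
        print(row)
```

The flat rows can be loaded directly into a data frame, which is the same convenience the R client offers for packages such as PhyloSeq.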
ENA - The ENA team continues to provide the supporting data infrastructure for primary data submissions of relevance to MGnify. Bringing globally comprehensive primary metagenomics and metabarcoding data from the scientific community via EMBL-EBI data submission services and through our collaboration with partners in the INSDC, we continue adapting to deal with emerging data types and standards. For example, the ENA team has developed new data structures for metagenome assemblies. Primary and binned metagenomes are now archived as new 'analyses' within ENA. These analyses hold simplified and standardised FASTA files with associated metadata. Assemblies classed as 'MAGs' in the system are now only those that a submitter has asserted to be single-taxon assemblies and a close representation of an individual genome (that could match an already existing isolate or represent a novel "virtual" isolate). This allows high-quality MAGs to continue to sit beside isolate genomes in the database and without contamination to downstream services from higher-volume low-quality environmental genomes. We have also removed the need to generate flatfile products for primary binned metagenomes, thus simplifying assembly processing and increasing speed from 3,000 to 20,000 assemblies per day. By the end of 2018, we had implemented a system to label different classes of assemblies during submission using the metadata attribute 'assembly type'. Assembly type supports values: 'primary metagenome', 'binned metagenome', 'MAG', 'Environmental Single-Cell Amplified Genome (SAG)' and 'clone or isolate'. At the time of writing, there have been 43,982 primary metagenome, 121,805 binned metagenome, 8,377 MAG and 12,718 SAG submissions to the ENA, showing great use of this feature. From January 2020, labelling assemblies with an assembly type on submission has become mandatory to ensure users define their assembly data class. 
With search already available for primary and binned metagenomes, the team is currently working to index these assembly classes in the ENA Discovery API. We have fully adapted the ENA submission tool, Webin-CLI, to support the new assembly classes in full, providing such features as pre-submission validation, and simplified data transfer to EMBL-EBI for assembly classes. To support the new submission services, a team member is dedicated to MGnify data flow support, working on outreach, standards development, documentation, training, and helpdesk. Over the course of the project, the documentation for the ENA submissions has also been improved to provide better support to users of the submission services, including the addition of specific metagenome assembly submission guidelines for each level of assembly as well as an FAQ for more specific scenarios (e.g. co-assembly). The ENA team continues to develop data standards alongside the GSC. To support the submission of environmental assemblies, the ENA has implemented four new checklists (MIMAGs, MISAG, MIUVIG, and the ENA binned metagenome), enabling the capture of standardised metadata. New fields from these checklists have been indexed for search in the discovery API (e.g. completeness and contamination scores for binned metagenomes) to allow more refined searches on metagenomic data.
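The mandatory assembly type labelling described above lends itself to a simple pre-submission check, sketched here in Python. The permitted values are those listed in the text; the metadata key `assembly_type` and the function name are hypothetical, and this is not the validation logic used by Webin-CLI.

```python
# Permitted values as listed in the text above.
ASSEMBLY_TYPES = frozenset({
    "primary metagenome",
    "binned metagenome",
    "MAG",
    "Environmental Single-Cell Amplified Genome (SAG)",
    "clone or isolate",
})


def validate_assembly_type(metadata):
    """Return the assembly type from a metadata dict, or raise ValueError.

    The key name 'assembly_type' is an assumption made for this example.
    """
    value = metadata.get("assembly_type")
    if value is None or not str(value).strip():
        raise ValueError("assembly type is mandatory")
    if value not in ASSEMBLY_TYPES:
        allowed = ", ".join(sorted(ASSEMBLY_TYPES))
        raise ValueError(f"unknown assembly type {value!r}; expected one of: {allowed}")
    return value


if __name__ == "__main__":
    print(validate_assembly_type({"assembly_type": "binned metagenome"}))
```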
Exploitation Route The new layering of MAGs in ENA allows research scientists to discover this emerging and important class of data available from metagenomics analysis. It enables the contextualisation of the MAGs provided by both MGnify and the wider scientific community, thus improving discovery of both assemblies and MAGs and increasing data reuse. The broader utility of MAGs lies in understanding microbiomes from a genomic perspective. This includes identifying pathways and gene clusters that provide key functionality to a particular genome. Similarly, these MAGs are starting to be explored for novel functionalities that can lead to therapeutic applications. For example, a novel cholesterol dehydrogenase has been discovered in a distinct clade that remains experimentally uncultured. Alternatively, these MAGs can be analysed for genome evolution in order to understand the adaptation of the microbe to a particular environment. Finally, these collections of MAGs allow scientists to understand the biodiversity of environmental samples. The large protein dataset has already proven particularly useful for the biotechnological industries, who are mining (and will continue to mine) the resource for novel enzymes. Applications already include bioremediation, the food industry and novel chemical transformations. This dataset has also recently been utilised to generate deep multiple sequence alignments, which are then used to generate the inputs for de novo structure predictions. Specifically, the MGnify database is bundled with the Colab version of AlphaFold, and the protein sequences have been used by ESMFold to generate 100s of millions of structures, especially for those that currently lack them, revealing the fold and providing insights into function. These new functions can be used, for example, to uncover new biological processes and/or to understand microbe adaptations.
The assemblies in MGnify have also been used both for the development and/or application of new machine learning algorithms, e.g. use of natural language processing to increase the functional annotations in (meta-)genomes. Moreover, these assemblies have been screened for contigs that are believed to be of eukaryotic origin, which have then been used as inputs for the discovery of eukaryotic MAGs.
Sectors Agriculture, Food and Drink,Chemicals,Digital/Communication/Information Technologies (including Software),Energy,Environment,Healthcare,Manufacturing, including Industrial Biotechnology,Pharmaceuticals and Medical Biotechnology

 
Description Our new approaches to the assembly and analysis of metagenomics datasets have started to provide insights into the vast majority of microbes (~99%) yet to be cultured. This has given rise to a protein dataset 10x larger than well-established resources such as UniProt. This new dataset allows scientists to better understand protein evolution and is beginning to be exploited by both academic and industrial research scientists to produce novel enzymes/products for the benefit of society. This includes enzymes involved in bioremediation and those that can be used within the food industry. The MGnify protein set was used by DeepMind's AlphaFold2 protein folding program that excelled in the CASP14 challenge. Our novel genomes also provide a deeper understanding of the microbes that are found in the human gut, human oral cavity, cow rumen and marine environments, which may be exploited in the future for medical applications (e.g. diagnostics and/or therapeutics) or biomarkers. Our pipelines have been encapsulated using the Common Workflow Language (CWL). These pipelines have been used as exemplars to a wide range of communities (industrial and academic) on how to develop complex computational analysis pipelines, and how this can be used to capture the provenance of the analysis (software tools, parameters, reference databases, inputs and outputs). We have also contributed to open access software for executing these workflows, namely Toil, which is also used across sectors. The taxonomic identification data generated in MGnify is now flowing into the Global Biodiversity Information Facility (GBIF), which has a completely different audience to MGnify but, importantly, is used to inform governmental policies across the world.
MGnify now represents one of the largest contributors of taxonomic identifications to GBIF, and this data is presenting a new view on environmental biodiversity and provides the ability to connect biodiversity and museum collections. As an indication of impact, 329 citations have been attributed to MGnify within the GBIF framework. Since such citations are never recorded in a typical academic fashion, the GBIF DOI allows full traceability back to MGnify. During the course of this work we have witnessed a substantial interest in the gut microbiome, both from the media and from the general public. We have undertaken significant public engagement initiatives to raise awareness and also demystify some of the aspects of microbiome research.
First Year Of Impact 2020
Sector Agriculture, Food and Drink,Chemicals,Digital/Communication/Information Technologies (including Software),Environment,Manufacturing, including Industrial Biotechnology,Pharmaceuticals and Medical Biotechnology
Impact Types Societal,Economic,Policy & public services

 
Description Board Member of the Genomic Standards Consortium
Geographic Reach Multiple continents/international 
Policy Influence Type Participation in a guidance/advisory committee
URL https://gensc.org
 
Description Co-lead of the Genomic Standards Consortium project M5 - A meta-infrastructure enabling exchange of large (metagen)omics data sets
Geographic Reach Multiple continents/international 
Policy Influence Type Participation in a guidance/advisory committee
Impact This standard is aimed at providing complete provenance. The large quantities of data in ever-growing data sets pose significant infrastructure challenges to biologists and bioinformaticians. The old, very loosely integrated approaches relying on the INSDC network for sequence data sharing are still important; however, additional layers of (standards driven) data infrastructure will emerge over time, driven simply by the cost of data analysis. The review of scientific papers on shotgun metagenomic data sets is already problematic, as the cost of computational re-analysis is significant. Only by sharing derived results in robust ways can the community overcome the computational burden. Put simply, minimizing the number of times a particular data set undergoes a specific analysis will maximize the number of analyses the community as a whole can perform. Technology, standards and community buy-in are required, and the group is working on creating the missing pieces of a more complete data sharing ecosystem. This project is aimed at improving the knowledge and standards associated with the above, particularly through the use of CWL.
URL https://gensc.org/projects/m5/
 
Description Contributed to the Microbiology Society "Unlocking the Microbiome" report.
Geographic Reach Multiple continents/international 
Policy Influence Type Participation in a guidance/advisory committee
URL https://microbiologysociety.org/news/society-news/microbiology-society-unlocking-the-microbiome-repo...
 
Description Member, UKRI Knowledge Transfer Network (KTN) Microbiome Innovation Network
Geographic Reach National 
Policy Influence Type Participation in a guidance/advisory committee
 
Description Scientific Advisory Board Member for the National Microbiome Data Collaborative
Geographic Reach Multiple continents/international 
Policy Influence Type Participation in a guidance/advisory committee
URL https://microbiomedata.org
 
Description (FindingPheno) - Unified computational solutions to disentangle biological interactions in multi-omics data
Amount € 5,793,085 (EUR)
Funding ID 952914 
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 03/2021 
End 02/2025
 
Description (HoloFood) - Holistic solution to improve animal food production through deconstructing the biomolecular interactions between feed, gut microorganisms and animals in relation to performance parameters
Amount € 10,825,325 (EUR)
Funding ID 817729 
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 01/2019 
End 12/2022
 
Title VIRify workflow 26 
Description VIRify is a recently developed pipeline for the detection, annotation, and taxonomic classification of viral contigs in metagenomic and metatranscriptomic assemblies. The pipeline is part of the repertoire of analysis services offered by MGnify. VIRify's taxonomic classification relies on the detection of taxon-specific profile hidden Markov models (HMMs), built upon a set of 22,014 orthologous protein domains and referred to as ViPhOGs. VIRify was implemented in CWL. https://github.com/EBI-Metagenomics/emg-viral-pipeline 
Type Of Material Improvements to research infrastructure 
Year Produced 2020 
Provided To Others? Yes  
Impact The VIRify pipeline was recently used to identify coronaviruses in metavirome and metatranscriptome datasets, which facilitated foresight on potential new betacorona pathogens and enabled investigators to map potential hosts (DOI: 10.1093/bib/bbaa232). 
URL https://workflowhub.eu/workflows/26
 
Title VIRify workflow 27 
Description VIRify is a recently developed pipeline for the detection, annotation, and taxonomic classification of viral contigs in metagenomic and metatranscriptomic assemblies. The pipeline is part of the repertoire of analysis services offered by MGnify. VIRify's taxonomic classification relies on the detection of taxon-specific profile hidden Markov models (HMMs), built upon a set of 22,014 orthologous protein domains and referred to as ViPhOGs. VIRify was implemented in CWL. 
Type Of Material Improvements to research infrastructure 
Year Produced 2020 
Provided To Others? Yes  
Impact The VIRify pipeline was recently used to identify coronaviruses in metavirome and metatranscriptome datasets, which facilitated foresight on potential new betacorona pathogens and enabled investigators to map potential hosts (DOI: 10.1093/bib/bbaa232). 
URL https://workflowhub.eu/workflows/27
 
Title ENA 
Description The European Nucleotide Archive (ENA) captures and presents information relating to experimental workflows that are based around nucleotide sequencing. A typical workflow includes the isolation and preparation of material for sequencing, a run of a sequencing machine in which sequencing data are produced and a subsequent bioinformatic analysis pipeline. ENA records this information in a data model that covers input information (sample, experimental setup, machine configuration), output machine data (sequence traces, reads and quality scores) and interpreted information (assembly, mapping, functional annotation). Data arrive at ENA from a variety of sources. These include submissions of raw data, assembled sequences and annotation from small-scale sequencing efforts, data provision from the major European sequencing centres and routine and comprehensive exchange with our partners in the International Nucleotide Sequence Database Collaboration (INSDC). 
Type Of Material Database/Collection of data 
Provided To Others? Yes  
Impact ENA is the European arm of INSDC. However, ENA has specifically been extended to allow the deposition of metagenome assemblies, binned assemblies and metagenome-assembled genomes. We have also worked on ensuring that metadata associated with sequence data are appropriately captured through the development of checklists. 
URL https://www.ebi.ac.uk/ena
 
Title MGnify (formerly called EBI metagenomics) 
Description The MGnify resource is an automated pipeline for the analysis and archiving of metagenomic data that aims to provide insights into the phylogenetic diversity as well as the functional and metabolic potential of a sample. It enables users to freely browse all the public data and associated analysis results that are contained within the resource. More recently (in 2018) we have started to provide metagenomics assembly as a service to the community, a step that is often not performed due to the computational overheads. 
Type Of Material Database/Collection of data 
Year Produced 2012 
Provided To Others? Yes  
Impact MGnify provides access to some of the largest metagenomics projects and is a large collection of analysed metagenomic datasets. Uniquely, it enables consistent analysis between projects, enabling scientists to compare results to other datasets in the resource or to their own. 
URL https://www.ebi.ac.uk/metagenomics
 
Title MGnify (previously EBI Metagenomics Portal) 
Description MGnify, previously EBI Metagenomics, (https://www.ebi.ac.uk/metagenomics/) is a database of richly described shotgun metagenomics data sets from across sample environments. Drawing on user-submitted data, functional and taxonomic analysis pipelines provide systematic processing and analysis of data. Both input data and analysis outputs are freely available in a variety of presentations and downloadable data formats. The database combines permanent archiving functions (through connectivity with public sequence databases) and state-of-the-art analysis methods. 
Type Of Material Database/Collection of data 
Year Produced 2012 
Provided To Others? Yes  
Impact The database is core to the MGnify programme, such that general programme impacts (see elsewhere in our outcome reporting for the programme) are all relevant to the database. 
URL https://www.ebi.ac.uk/metagenomics/
 
Title Metagenome Assembled Genome Catalogues 
Description Metagenome assembled genome catalogues for a variety of biomes, with species representatives made available via the MGnify website, and hundreds of thousands of genomes available via FTP site. For each species, and where appropriate, pangenome analysis has also been made available. Also, for each catalogue, there is an associated protein catalogue. 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
Impact Increasingly metagenome assembled genomes can be generated from metagenomic assemblies. However, these are currently not consistently collected and compared to each other. We have developed the necessary workflows to aggregate over multiple different sources, compare the genomes and select the most representative species. This allows comparison of new databases, to understand the novelty that additional datasets may contain. 
URL https://www.ebi.ac.uk/metagenomics/browse#genomes
 
Title Metagenomic non-redundant protein database 
Description Database of protein sequences produced from assembly of metagenomic datasets. 
Type Of Material Database/Collection of data 
Year Produced 2017 
Provided To Others? Yes  
Impact This database has supported the discovery of novel enzymes by an SME biotech company (BioCatalysts) as part of an InnovateUK BBSRC grant. 
URL https://www.ebi.ac.uk/metagenomics/sequence-search/search/phmmer
 
Description Exploring the use of public informatics resources in the understanding of Type-2 Diabetes and new therapeutic opportunities 
Organisation Novo Nordisk
Country Denmark 
Sector Private 
PI Contribution We have applied our metagenomics assembly pipeline and protocols for generating metagenome assembled genomes to a wide collection of type 2 diabetes shotgun datasets (and related comorbidities). This work has enabled us to identify a unique collection of genomes found in the human gut in different disease states.
Collaborator Contribution The partners are providing intellectual input to help link the functions attributed to the novel genomes to phenotypic features of the disease state.
Impact A set of metagenome assembled genomes from the human gut, including an identified subset of genomes that are either enriched or depleted in type 2 diabetes.
Start Year 2018
 
Description Potential use of WGS data for functional analysis of the skin 
Organisation Unilever
Department Research and Development Colworth
Country United Kingdom 
Sector Private 
PI Contribution We have applied our assembly and metagenome assembled genome generation protocols to a publicly available dataset. The overarching aim was to help our industrial partner understand different approaches to investigating the skin microbiome, comparing the results of metabarcoding and shotgun metagenomics, especially from a functional perspective, which is only possible from a metagenomics point of view.
Collaborator Contribution Our partners provided the scientific use case and gave feedback on the types of analysis that were required to address the scientific question. The results from MGnify have guided internal policies and scientific practices, while the partner's feedback has helped shape future development in MGnify.
Impact A collection of metagenome assembled genomes (MAGs) for the skin. The collaboration is multi-disciplinary, involving industrial and academic partners, who cover skills that include biostatistics/risk modelling, bioinformatics and software development.
Start Year 2019
 
Description 1st Microbiome PT Summit keynote talk "Can microbiome analysis be FAIR?" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Microbiome research has grown substantially over the past decade and produced massive amounts of data across the range of biomes sampled, creating challenges in terms of data findability, accessibility, interoperability and reuse. ELIXIR, the European Infrastructure for Biological Data, is addressing these topics via domain-specific communities, namely via its Microbiome Community. BioData.pt, as the Portuguese Node of ELIXIR, is assembling its National Community to engage Portuguese researchers on this topic in this European effort. Therefore, the Portuguese Microbiome Community, led by Isabel Gordo, from Instituto Gulbenkian de Ciência, organised its first National Summit to raise awareness and gather scientists addressing microbiome research in Portugal. Dr Rob Finn (EMBL-EBI), Head of the ELIXIR Microbiome Community and EMBL-EBI's Microbiome Informatics team, chaired the opening session and gave the keynote talk.
Year(s) Of Engagement Activity 2021
URL https://www.biodata.pt/node/289
 
Description 2019 H.BIOINFO invited talk titled "Broadening our knowledge of the human microbiome" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact PI Dr Robert Finn was an invited speaker at the 2019 H.BIOINFO conference held at the Foundation for Research and Technology, Hellas (FORTH), Greece, where he presented recent efforts carried out by his team to decipher the complexity and diversity of the human microbiome.
Year(s) Of Engagement Activity 2019
URL https://hscbio.wordpress.com/conferences-when/2019-2/
 
Description 2020 Annual Research Conference of the Pontificia Universidad Católica del Perú talk "Broadening our genomic knowledge of the human microbiome" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact PI Dr Rob Finn described the role of MGnify, including the resource's gut catalogue, in microbiome research. He highlighted how Latin American samples were underrepresented. Finally, he provided advice on the different career paths available for researchers in bioinformatics.
Year(s) Of Engagement Activity 2020
 
Description 2020 POGO International Virtual Conference on the use of Environmental DNA (eDNA) in Marine Environments: Opportunities and Challenges talk "MGnify: An open and scalable platform for the analysis, discovery and dissemination of molecular based biodiversity data" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact PI Dr Rob Finn gave a talk during the 2020 POGO International Virtual Conference on the use of Environmental DNA (eDNA) in Marine Environments: Opportunities and Challenges. His talk focused on MGnify during day 2 of the conference; session on Data and Information. Session description is as follows: Through systems such as the International Nucleotide Sequence Database Collaboration (INSDC) and global standards like FASTA/Q format, the eDNA/omics community have benefitted from world-class data and information resources. However, our handling of what is, from our perspective, "metadata" and participation/interoperability with data systems from other disciplines is still in need of advancement. In the marine realm, we now have new opportunities to augment our digital capacities while aligning them with global digital strategies such as those within the UN Decade of Ocean Science for Sustainable Development. This session will explore some examples of how this is already taking place, and will welcome discussion on how we can collectively mainstream sequence data (as well as the information and knowledge derived from it) in the emerging digital ocean ecosystem.
Year(s) Of Engagement Activity 2020
URL https://pogo-ocean.org/capacity-development/activity-related-workshop/environmental-dna-edna-marine-...
 
Description 2021 Pint of Science - Microbiome: the world within us 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact A general talk about the secret world of your microbiome. Delivered to a diverse audience with a general interest in science. Many discussions about the role of the human gut microbiome in health and disease.
Year(s) Of Engagement Activity 2021
URL https://pintofscience.co.uk/event/microbiome-the-world-within-us
 
Description 21st GSC Meeting talk "EBI's use of CWL workflows" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Policymakers/politicians
Results and Impact Talk given at the 21st Genomic Standards Consortium Meeting held at the University of Vienna, Austria.
Year(s) Of Engagement Activity 2019
URL https://gensc.org/meetings/gsc21/
 
Description 5th Microbiome Movement - Drug Development Europe conference talk "Magnifying the Human Gut Microbiome" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact During the 5th Microbiome Movement - Drug Development Europe conference, PI Dr Rob Finn presented a talk on the unified human gut genome catalogue and phages, with a view to understanding the potential translational impact of human microbiome research.
Year(s) Of Engagement Activity 2021
URL https://microbiome-europe.com/?utm_source=hw-corporate&utm_medium=backlink&utm_campaign=brand-page
 
Description Attended COP26 climate change conference 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Policymakers/politicians
Results and Impact Attended COP26, where biological solutions to address climate change were discussed. Involved in meetings and attended conference stands to engage with attendees and increase awareness of microbiome-based solutions for monitoring and potentially mitigating climate change.
Year(s) Of Engagement Activity 2022
URL https://www.nature.com/articles/d41586-021-03029-w
 
Description BBSRC cross-institute workshop on Microbiomes 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Two-day workshop, encompassing (i) presentations on BBSRC-funded institutes' research/developments and (ii) gap analysis and strategy.
Year(s) Of Engagement Activity 2019
 
Description BiATA 2020 workshop on "Analysing metagenomic assemblies using MGnify" 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact In this two-day remote tutorial provided by PI Dr Rob Finn and his team during the BiATA2020 conference, participants explored common approaches to analysing and annotating contigs produced from a metagenomics assembly. The course was a mixture of introductory lectures, followed by hands-on practicals. Due to time constraints, participants either investigated pre-calculated examples or used a web browser to explore outputs via the MGnify website (www.ebi.ac.uk/metagenomics). By the end of the course, participants understood how to process contigs and characterise them functionally and taxonomically, and were able to generate metagenome assembled genomes from their assemblies.
Year(s) Of Engagement Activity 2020
URL http://biata2020.spbu.ru/workshop/
 
Description CSHL Biology of Genomes 2020 talk titled "Broadening our genomic knowledge of the human microbiome" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The Biology of Genomes 2020 meeting organised by the Cold Spring Harbor Laboratory addressed DNA sequence variation and its role in molecular evolution, population genetics and complex diseases, comparative genomics, large-scale studies of gene and protein expression, and genomic approaches to ecological systems. Both technologies and applications were emphasized. There was a special session on the ethical, legal and social implications (ELSI) of genome research. PI Dr Rob Finn chaired the session on Complex Traits and Microbiome and presented a talk.
Year(s) Of Engagement Activity 2020
URL https://meetings.cshl.edu/meetings.aspx?meet=GENOME&year=20
 
Description Cafe Sci talk "Gut bacteria and human health" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Public/other audiences
Results and Impact This talk was presented as part of the Cafe Sci events, a public engagement initiative in Cambridge where people meet and explore the latest ideas in science and technology. Work on metagenome assembled genomes (MAGs) was presented here.
Year(s) Of Engagement Activity 2019
URL https://publicengagement.wellcomegenomecampus.org/sites/default/files/media/project/caf-sci-cambridg...
 
Description EBI course: Metagenomics Bioinformatics 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Organised and participated in the Metagenomics Bioinformatics training workshop at EBI, which involved lectures and hands-on sessions with world leaders in metagenomic data analysis.
Year(s) Of Engagement Activity 2018
URL https://www.ebi.ac.uk/training/events/2018/metagenomics-bioinformatics-3
 
Description ELIXIR EXCELERATE All Hands Meeting 2019 talk titled "MGnify and Marine Assemblies" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This talk was presented during the ELIXIR - EXCELERATE All Hands meeting 2019 held at Lisbon, Portugal. The ELIXIR All Hands meeting brings together members of the ELIXIR community from across the ELIXIR Nodes, and collaborators from partner organisations, in order to review ELIXIR's achievements and activities so far and discuss plans for the future.
Year(s) Of Engagement Activity 2019
URL https://elixir-europe.org/events/elixir-excelerate-all-hands-meeting-2019
 
Description EMBL Newsletter "Unparalleled inventory of the human gut ecosystem" 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact EMBL newsletter on the Finn team's Nature Biotech publication (10.1038/s41587-020-0603-3). In this paper, Dr Finn and his collaborators describe their work of compiling into a public database over 200,000 genomes from more than 4,600 species of gut bacteria.
Year(s) Of Engagement Activity 2020
URL https://www.embl.org/news/science/inventory-of-the-human-gut-ecosystem/
 
Description EMBL Science Education (ELLS Heidelberg) tweet "Meet my microbiome 2020" 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Schools
Results and Impact Tweet from the official Twitter account of EMBL Science Education (ELLS Heidelberg) inviting school teachers from Europe and beyond to learn about current research on the human microbiome and how to transfer this knowledge to their classrooms! Each module takes one week and is designed to fit the busy schedule of teachers! #MeetingMyMicrobiome2020
Year(s) Of Engagement Activity 2020
URL https://twitter.com/ELLS_Heidelberg/status/1314259917612736515
 
Description EMBL newsletter interview "Microbiomes take the stage at New Scientist Live" 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact PI Dr Robert Finn was interviewed for news.embl.de, the official newsletter of EMBL, in which he shared his insights on the public engagement work he has carried out. In particular, this interview focused on his featured talk at the New Scientist Live 2019 festival, where he presented his team's work on acquiring novel insights into the human gut microbiome.
Year(s) Of Engagement Activity 2019
 
Description EMBL-EBI SME forum 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact Presented to the forum a talk titled "Establishing a metagenomics derived protein database for biotechnological discovery".
Year(s) Of Engagement Activity 2018
 
Description EMBL-EBI online tutorial "Metagenomics bioinformatics - A practical introduction" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This course covered the use of publicly available resources to manage, share, analyse and interpret metagenomics data, including marker gene, whole genome shotgun (WGS) and assembly-based approaches. It makes use of recorded lectures and materials from the "Metagenomics Bioinformatics" training course that took place 17 - 20 July 2018 at EMBL-EBI. The recorded lecture material is aimed at life scientists working in the field of metagenomics who are in the early stages of their data analysis. These recordings are suitable for beginners with an undergraduate knowledge of metagenomics. The exercises included in this course are intended for an audience with experience of using bioinformatics in their research. A working knowledge of the Unix command line and the R statistical package is required.
Year(s) Of Engagement Activity 2020
URL https://www.ebi.ac.uk/training/online/courses/metagenomics-bioinformatics/
 
Description EMBL-EBI press release "Unparalleled inventory of the human gut ecosystem" 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact EMBL-EBI press release on the Finn team's Nature Biotech publication (10.1038/s41587-020-0603-3). In this paper, Dr Finn and his collaborators describe their work of compiling into a public database over 200,000 genomes from more than 4,600 species of gut bacteria.
Year(s) Of Engagement Activity 2020
URL https://www.ebi.ac.uk/about/news/press-releases/inventory-human-gut-ecosystem
 
Description EMBL-EBI training course "Metagenomics bioinformatics (virtual)" 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This course covered the metagenomics data analysis workflow from the point of newly generated sequence data. Participants explored the use of publicly available resources and tools to manage, share, analyse and interpret metagenomics data. The content included issues of data quality control and how to submit to public repositories. While sessions detailed marker-gene and whole-genome shotgun (WGS) approaches, the primary focus was on assembly-based approaches. Discussions also explored considerations when assembling genome data, the analysis that can be carried out by MGnify on such datasets, and what downstream analysis options and tools are available.
Year(s) Of Engagement Activity 2020
URL https://www.ebi.ac.uk/training/events/metagenomics-bioinformatics-virtual/
 
Description EMBL-EBI/ Sanger internal seminar titled "Evaluating the eukaryotic fraction of microbiomes" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact This talk was part of the EMBL-EBI/ Sanger campus seminar series and presented recent work on identifying eukaryotes within the microbiome.
Year(s) Of Engagement Activity 2019
 
Description EMBL-EBI/WSI Seminar Series talk "The human microbiome beyond the gut" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact This seminar was presented as a part of the EMBL-EBI and Wellcome Sanger Institute joint monthly seminar series. PI Dr Rob Finn's talk focused on recent efforts to recover MAGs from the human skin microbiome, which not only harbours a very distinct microbial composition compared to the gut, but also carries additional challenges such as low DNA yield. Approaches to overcome these challenges were presented, along with some of the insights obtained into microbial skin diversity.
Year(s) Of Engagement Activity 2020
URL https://www.ebi.ac.uk/about/events/seminars/2020/ebisanger-seminar-series-rob-finn-and-phil-jones-zo...
 
Description EMBO practical course - Lecture and hands-on workshop titled "Functional and taxonomic analysis with MGnify" 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact A half-day event comprising a lecture followed by a hands-on training workshop was delivered as part of "EMBO practical course - Microbial Metagenomics: A 360º Approach" held at EMBL Heidelberg, Germany.
Year(s) Of Engagement Activity 2019
URL https://www.embl.de/training/events/2019/MET19-01/index.html
 
Description ENA Overview and Data Submission module at Microbiota Data Analysis Workshop, ETH Zurich, Jan. 2020; Holt 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Information and training module "ENA Overview and Data Submission" given at Microbiota Data Analysis Workshop, ETH Zurich, Jan. 2020; Sam Holt
Year(s) Of Engagement Activity 2020
 
Description EOSC-Life Seminar Series talk "Metagenomic data analysis workflows in CWL from scratch to multi-environment production" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact MGnify team member Martin Beracochea gave a talk during the EOSC-Life Seminar Series. He described how the MGnify pipelines already adhere to the FAIR principles, and how these could be deployed in cloud environments.
Year(s) Of Engagement Activity 2021
URL https://www.eosc-life.eu/d3/
 
Description Earlham Institute Metagenomics course - Lecture and training workshop titled "Functional and taxonomic analysis with MGnify: extracting knowledge from microbiome data" 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Delivered a one day training workshop as part of the 4 day Metagenomics course titled "Metagenomics: Data Analysis and Interpretation" organised at the Earlham Institute.
The workshop consisted of three modules: (1) Lecture: Meta-analysis across multiple metagenomic projects and identifying novel organisms; (2) Lecture: Functional and taxonomic analysis with MGnify: extracting knowledge from microbiome data; (3) Hands-on session: Functional and taxonomic analysis with MGnify: extracting knowledge from microbiome data.
Year(s) Of Engagement Activity 2019
URL https://www.earlham.ac.uk/metagenomics-data-analysis-and-interpretation
 
Description European Learning Laboratory for the Life Sciences, ELLS blog "Introducing your microbiome" 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Schools
Results and Impact The European Learning Laboratory for the Life Sciences (ELLS), EMBL's education facility, invited secondary school science teachers to participate in a virtual training course in the autumn of 2020 entitled 'Introducing your microbiome'. The course was divided into four modules, providing an overview of current human microbiome research, introducing bioinformatics as a tool in microbiome research, and exploring microbiome research in health and disease. The final module consisted of group work in small teams, in which participants developed their own educational materials. The modules were taught by EMBL scientists PI Drs Rob Finn and Michael Zimmerman. The course was organised in collaboration with the Public Engagement officer at EMBL's European Bioinformatics Institute (EMBL-EBI) and was held entirely online.
Year(s) Of Engagement Activity 2020
URL http://emblog.embl.de/ells/virtual-llab-microbiome-2020/
 
Description GBIF/molecular data workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Participation in the workshop involved dissemination of MGnify results to another resource, whose user base is different from MGnify's, thereby extending the reach of the MGnify resource. Based on our initial prototype, over 7 million taxonomic observations have been imported to GBIF.
Year(s) Of Engagement Activity 2019
 
Description ICG-17 Keynote talk "Genome-level resolution metagenomics: from viruses to eukaryotes" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Keynote speech by PI Rob Finn at the ICG-17 Conference held at Riga, Latvia.
Year(s) Of Engagement Activity 2022
URL https://www.youtube.com/watch?v=x8WJysdL5zA&ab_channel=ICG-17Riga
 
Description ITVNews featuring the interview for "The Gut Stuff" 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact PI Dr Robert Finn was featured and interviewed by the Mac Twins in one episode of their online series on Gut Microbiome health, titled "the Gut Stuff", where he discussed how big data fits into gut microbiome analysis [https://www.youtube.com/watch?v=s2PExZElXbc&feature=youtu.be]. This was then featured by ITVNews in 2019.
Year(s) Of Engagement Activity 2019
URL https://www.youtube.com/watch?v=s2PExZElXbc&feature=youtu.be
 
Description Marine Microbes Gordon Research Conference 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact The theme of the 2018 Marine Microbes Gordon Research Conference was "Elucidating Microbial Processes Across Spatial and Temporal Scales". It featured cross-disciplinary presentations of unpublished, frontier work by investigators at the forefront of the field and provided opportunities for scientists to exchange ideas in a collegial, collaborative atmosphere through extensive discussion sessions and daily informal gatherings. I presented MGnify, and work on data standards and ELIXIR.
Year(s) Of Engagement Activity 2018
URL https://www.grc.org/marine-microbes-conference/2018/
 
Description Meta AI Research blogpost "ESM Metagenomic Atlas: The first view of the 'dark matter' of the protein universe" 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Meta AI blogpost describing the release of the 600+ million protein ESM Metagenomic Atlas, with predictions for nearly the entire MGnify90 database, a public resource cataloguing metagenomic sequences.
Year(s) Of Engagement Activity 2022
URL https://ai.facebook.com/blog/protein-folding-esmfold-metagenomics/
 
Description Metagenomics for Bioinformatics in-person course at EMBL-EBI 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Presentation of metagenomics data submissions system and delivery of training on its use; in-person training at EMBL-EBI.
Year(s) Of Engagement Activity 2018
 
Description Metagenomics related training session modules delivered - University of Porto - May 2019; Burgin and Holt 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact 2x 1-hour training presentations to University of Porto relating to data access in ENA and metagenomics data submissions; Josie Burgin and Sam Holt.
Year(s) Of Engagement Activity 2019
 
Description Microbiome support workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The Common Ground Workshop set the foundation for the project to build upon in the 4 years to come. The event brought together around 100 participants, including representatives from the European Commission, project partners, advisory board members from industry, academia and policy, and representatives from the 4 accompanying innovation actions SIMBA, MASTER, CIRCLES and HoloFood.
Year(s) Of Engagement Activity 2019
URL https://www.microbiomesupport.eu/finding-common-ground-microbiomesupport-partners-in-action/
 
Description Module delivered in "Metagenomics Bioinformatics 2019" at EMBL-EBI, June 2019; Burgin and Holt 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Metagenomics data submission module delivered in "Metagenomics Bioinformatics 2019" at EMBL-EBI, June 2019; Josie Burgin and Sam Holt
Year(s) Of Engagement Activity 2019
 
Description National Microbiome Data Collaborative Workshop: linking MIxS standards, Environment ontology, and GAZ; Burgin 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A workshop from the US National Microbiome Data Collaborative initiative. We represented standards and tools that have been developed under MGP-III that are of value to this initiative. Discussions took place around these and other tools. Alignment with this project will secure global data accessibility and reach for data already routed towards ENA and MGnify.
Year(s) Of Engagement Activity 2019
 
Description Nature News "Meta just dropped 600+ million protein structure predictions, made using a large language model." 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Nature coverage of Meta AI's new ESM Metagenomic Atlas: "AlphaFold's new rival? Meta AI predicts shape of 600 million proteins".
Year(s) Of Engagement Activity 2022
URL https://www.nature.com/articles/d41586-022-03539-1
 
Description New Scientist Live 2019 Talk "Meet your microbiome" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact This talk was given at the New Scientist Live 2019 festival at ExCeL, London, and covered the topic of gut microbiomes, how we analyse them, and how they can potentially be influenced. This was followed by a 30-minute Q&A session with the audience.
Year(s) Of Engagement Activity 2019
URL https://live.newscientist.com/2019-speaker-programme/meet-your-microbiome#/
 
Description Outreach for 2019 Bioscience Lite - Bioinformatics and Big Data 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Schools
Results and Impact Bioscience LITE is a series of free twilight teacher training events hosted by the Wellcome Genome Campus and the Babraham Institute to explore some of the big topics in bioscience. These sessions provide updates from scientists on contemporary bioscience, access to some of our scientific facilities and introduce you to resources that can be used in the classroom. As part of this event, we participated in public outreach to train high school teachers on microbiome research.
Year(s) Of Engagement Activity 2019
URL https://www.babraham.ac.uk/news/2019/11/bioscience-lite-bioinformatics-and-big-data
 
Description Panelist on the Naked Scientist podcast 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Panel-based discussion covering a wide range of questions such as "Does counting calories really work? Could the universe ever implode? And what makes duct tape so sticky?" The panel also included Giles Yeo, astrophysicist Katie Mack and chemist Kate Biberdorf. This was broadcast on national radio, made available via the BBC website and the Naked Scientist website, and also broadcast in Australia.
Year(s) Of Engagement Activity 2021
URL https://www.thenakedscientists.com/podcasts/naked-scientists-podcast/qa-diets-duct-tape-dark-matter
 
Description Participation in ELIXIR marine metagenomics face-to-face meeting, representing metagenomics standards, Paris, Jan. 2020, Burgin 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Participation by Josie Burgin in an ELIXIR community workshop to discuss the issues around marine metagenomics data standards and flows relating to ENA, covering possible extension of the community into broader metagenomics.
Year(s) Of Engagement Activity 2020
 
Description Participation in the 21st Genomics Standards Consortium meeting, Vienna, May 2019 - Burgin and Amid 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Participation of Josie Burgin and Clara Amid in the 21st Genomics Standards Consortium meeting, including various sessions of relevance to metagenomics in ENA and MGnify, not least the MIxS compliance and interoperability working group.
Year(s) Of Engagement Activity 2019
URL https://www.gensc.org/meetings/gsc21/
 
Description Participation in the CZ Biohub workshop: "Quantitative methods in microbiome population genetics" 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact PI Dr Robert Finn participated in the "CZ Biohub workshop: Quantitative methods in microbiome population genetics" organised at the CZ Biohub in California, USA. This workshop aimed to bring together leaders poised to drive solutions toward the algorithmic, informatic, and modeling challenges in characterizing microbiome genetic diversity and its role in human biology and health.
Year(s) Of Engagement Activity 2020
 
Description Peruvian Bioinformatics Symposium on Repeat Proteins talk titled "How metagenomics is changing the landscape of molecular biology - billions of proteins and millions of genomes" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This talk focused on the work of the MGnify platform and the metagenome assembled genome (MAG) catalogue. It was presented during the Peruvian Bioinformatics Symposium on Repeat Proteins held at the Pontificia Universidad Católica del Perú, Peru.
Year(s) Of Engagement Activity 2019
URL http://simposio.pucp.edu.pe/refract-latam/
 
Description Presentation at ELIXIR Innovation and SME Forum. 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Presented a talk titled "Mining metagenomics for novel protein functions" at the ELIXIR Innovation and SME Forum in Frankfurt.
Year(s) Of Engagement Activity 2018
 
Description Presentation at Hellenic Bioinformatics Society conference in Greece 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Presented a seminar describing MAG research from the Finn team that will pave the way for production pipelines in MGnify. Talk title "Increasing our knowledge of the human gut microbiota".
Year(s) Of Engagement Activity 2018
URL https://hscbio.files.wordpress.com/2018/07/hbio11-flp051.pdf
 
Description Royal society talk 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Presented at a one-day conference held by the Royal Society bringing together experts from both agricultural and human microbiome sectors to discuss the barriers to technology translation and common principles across both sectors. Covered challenges in determining microbiota through to the dynamic forces determining colony structures. The meeting also explored intentional modulation of microbial populations to develop medical therapeutics and improvements in agricultural productivity, covering technical challenges in determining causation and methods to drive the technology forward. I presented a talk titled "The microbiome: human medicine and agriculture in a microbial world."
Year(s) Of Engagement Activity 2018
URL https://royalsociety.org/science-events-and-lectures/2018/10/ToF_Microbiome/
 
Description Seminar to Novo Nordisk titled "Will big data deliver new drug targets?" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact This seminar on the potential impact of big data was presented along with a colleague to collaborators at the Novo Nordisk Research Centre in Oxford, UK.
Year(s) Of Engagement Activity 2019
 
Description Swiss Academy of Sciences talk "Mining the novelty from metagenomic sequencing" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact In this talk to the Swiss Academy of Sciences, PI Dr Rob Finn presented the services and research offered by his resource MGnify and how they can be used by European researchers.
Year(s) Of Engagement Activity 2020
 
Description Swiss Institute of Bioinformatics SIB talk titled "Recovery of thousands of novel species from the human microbiome" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This seminar was presented at the Swiss Institute of Bioinformatics SIB, Switzerland, during a visit to discuss another project.
Year(s) Of Engagement Activity 2019
 
Description Symposium & Workshop: Microbiome 2018 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Participated in a Symposium and associated Workshop on the subject "Microbiome". The symposium included both lectures by invited experts in the field and discussions in smaller groups to promote dynamic exchanges between students and experts of different disciplines and close contacts with the invited speakers. This was followed by a hands-on workshop demonstrating analysis of microbiome data. This workshop also exploited our e-training materials, by utilising a Docker machine image for running the training course. Outcomes are being used to further improve this offering, and to shape e-training more broadly across EMBL-EBI.
Year(s) Of Engagement Activity 2018
URL https://edu.sib.swiss/course/view.php?id=356
 
Description Targeted training to Watson team delivered in Roslin 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact In-person meeting with Mick Watson and team in Roslin, Edinburgh to present and discuss current methods for MAG submissions and potential changes to current submission model.
Year(s) Of Engagement Activity 2019
 
Description Training course titled "Metagenomics Bioinformatics 2019" 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Organised a four-day training course on Metagenomics Bioinformatics at EMBL-EBI. Delivered lectures and hands-on training sessions along with world leaders in Metagenomics data analysis to course participants.
Year(s) Of Engagement Activity 2019
URL https://www.ebi.ac.uk/training/events/2019/metagenomics-bioinformatics-2019
 
Description Training workshop titled "Metagenomics - MGnify" 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A half-day training session on metagenomics (MGnify) was delivered as part of the wider 3-day EMBL-EBI Workshop: "Genomics, transcriptomics and metagenomics tools and resources for biology research" held at the University of Porto, Portugal. This workshop provided training on computational analysis tools and databases that can be used to answer relevant questions in molecular ecology. Participants learnt how to browse, search and access biological data across the fields of genomics, transcriptomics and metagenomics.
Year(s) Of Engagement Activity 2019
URL https://www.ebi.ac.uk/training/events/2019/embl-ebi-workshop-genomics-transcriptomics-and-metagenomi...
 
Description University of Warwick Seminar series talk "Broadening our genomic knowledge of human microbiomes" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Life Sciences seminar by Dr Rob Finn. He described his team's recent publication on the Unified Human Gastrointestinal Genome (UHGG) catalogue, which is an unprecedented collection of nearly 5,000 species found in the gut microbiome, with 70% yet to be cultured. Dr Finn provided further details on the team's recent efforts to recover genomes from the human skin microbiome, which not only harbours a very distinct microbial composition compared to the gut, but also carries additional challenges such as low DNA yield. For both microbiomes, the team is currently investigating the microbiota beyond bacteria. An overview of these results was presented, assessing the challenges faced when researchers try to understand microbial community structures.
Year(s) Of Engagement Activity 2020
URL https://warwick.ac.uk/insite/events/events?calendarItem=8a17841b75f501d70175f549d9980163
 
Description Untapping diversity through Metagenomics: An Introductory Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Participated in a remote workshop covering metagenomics analysis. In conjunction with the CABANA project, one team member was on location in Colombia, and one team member presented remotely from the EBI. The workshop was a combination of presentations and hands-on practical sessions.
Year(s) Of Engagement Activity 2018
URL https://www.ebi.ac.uk/training/events/2018/cabana-workshop-untapping-diversity-through-metagenomics-...
 
Description Webinar "ENA Overview, Submission and Retrieval for Metagenomics" 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Presentation and delivery of training on ENA services relating to metagenomics - covering both data submissions and retrieval; webinar delivery to UniAndes, Colombia.
Year(s) Of Engagement Activity 2018
 
Description Webinar "Submitting Metagenomic Data To ENA - course" 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Presentation of metagenomics data submission system and delivery of training course on its use; webinar.
Year(s) Of Engagement Activity 2018