EBI Metagenomics Portal - Towards a better understanding of community metabolism

Lead Research Organisation: Newcastle University
Department Name: Sch of Engineering

Technical Summary

EBI-MP is a global portal for the metagenomics research community. Offering data submission, archiving and sharing functions, community standards-compliant curation, and functional and taxonomic diversity analyses, the service has attracted a growing user-base of UK, European and global researchers.

We intend to improve the pipeline infrastructure to offer analysis provenance by modularising pipeline components, defining a dependency tree between modules, and versioning each module. Subsequently, we will update the reference databases and analysis software, and allow our users to have their results reanalysed with the updated pipeline. We will improve the range of taxonomic annotations provided by the resource, moving beyond 16S rRNA-based analyses. We will also investigate the application of the UniPept approach to taxonomic classification for metagenomic datasets. We will add pathway information to the functional annotation provided by EBI-MP, using the latest version of InterProScan to provide KEGG, MetaCyc and UniPathway links, and develop a tool to visualise the catalytic potential of a sample, highlighting reactions where there is support for the existence of constituent proteins.
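As an illustration of how a versioned dependency tree between modules can support provenance and targeted reanalysis, the following is a minimal sketch and not the EBI-MP implementation. The module names, version strings and helper functions are all hypothetical; the only library call used is `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical modules (illustration only): name -> (version, dependencies).
MODULES = {
    "quality_control": ("1.0", []),
    "rrna_prediction": ("2.1", ["quality_control"]),
    "taxonomy": ("1.3", ["rrna_prediction"]),
    "orf_prediction": ("1.0", ["quality_control"]),
    "function_annotation": ("5.0", ["orf_prediction"]),
}


def execution_order(modules):
    """Return module names ordered so each runs after its dependencies."""
    graph = {name: set(deps) for name, (_, deps) in modules.items()}
    return list(TopologicalSorter(graph).static_order())


def provenance(modules):
    """Return (name, version) pairs in execution order as a provenance record."""
    return [(name, modules[name][0]) for name in execution_order(modules)]


def stale_modules(modules, updated):
    """Return the updated module plus every module downstream of it."""
    stale = {updated}
    # Walking in dependency order propagates staleness transitively.
    for name in execution_order(modules):
        if stale & set(modules[name][1]):
            stale.add(name)
    return stale


if __name__ == "__main__":
    print(provenance(MODULES))
    print(sorted(stale_modules(MODULES, "orf_prediction")))
```

In this sketch, updating one module marks only that module and its dependants for reanalysis, which is one plausible way a dependency tree could keep pipeline updates tightly controlled.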

We will implement CRAM compressed sequence data formats within the system to increase the speed of data upload to EBI-MP and to facilitate internal processing and storage. We will also design and build data discovery tools that provide a full range of search functions across the sample, contextual and analysis data, and provide these tools as web services and via the website. Finally, we will develop mathematically sound methods to estimate the depth of sequencing required to capture a specific fraction of diversity, and to normalise samples so that they can be compared in statistically meaningful ways. These analyses will be available from the website, along with visualisation tools capable of producing heatmaps and PCA plots for sample comparison.
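To make the ideas of sequencing depth and sample normalisation concrete, the sketch below uses two simple, widely known calculations: Good's coverage estimator (one minus the fraction of reads that belong to taxa seen exactly once) and total-sum scaling to relative abundances. These are stand-ins chosen for illustration, not the methods developed in this project, and the function names and example counts are hypothetical.

```python
def goods_coverage(counts):
    """Estimate sample coverage as 1 - (singleton taxa / total reads)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    singletons = sum(1 for c in counts if c == 1)
    return 1.0 - singletons / total


def relative_abundance(counts):
    """Scale per-taxon read counts so that they sum to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts]


if __name__ == "__main__":
    # Hypothetical per-taxon read counts for two samples of different depth.
    shallow = [50, 30, 15, 1, 1, 1, 1, 1]
    deep = [500, 300, 150, 20, 12, 9, 5, 4]
    print(goods_coverage(shallow), goods_coverage(deep))
    print(relative_abundance(shallow))
```

Estimators of this kind are easy to compute but rest on strong assumptions about the underlying community, so they should be read as rough indicators rather than as reliable predictions of the sampling effort required.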

Planned Impact

The use of metagenomics is widespread, with applications in diverse fields such as agriculture, food manufacture, the elucidation of antibiotic resistance mechanisms, bioenergy production, and animal/human health. The EBI Metagenomics Portal (EBI-MP) covers data submission, archiving and sharing functions, community standards-compliant curation, and rich functional and taxonomic diversity analyses. Launched in 2011, the resource has become a world leader in metagenomics data analysis, attracting a growing user-base across the UK, European and global communities. The impact on academic research is already in effect, with the EBI-MP providing both a robust analysis platform and access to a large compute resource. Both of these features are often lacking within academia. Thus, the EBI-MP is making metagenomics analysis available to more researchers, and relieving a significant bottleneck between obtaining sequence data and obtaining results.

One vital impact of the project will be continued support for archiving and analysis of metagenomic data in the face of ever-increasing data volumes. The proposed work provides a number of mechanisms, including CRAM-based sequence compression and a tightly controlled way of updating analysis algorithms, by which the pipeline can be made more efficient, with higher throughput and the ability to scale. Improved sample analyses, through updated reference databases and extended taxonomic and functional analyses, are also critical, since they will increase the usefulness of EBI-MP to researchers and better meet the community's needs. These benefits will be felt in the short term, and will also persist into the longer term, as updates and improvements are made throughout the course of the project.

In the medium term, these developments should allow the EBI-MP to grow with increasing demand, without significantly increasing the computational overhead. This will be achieved by the incorporation of more efficient algorithms, thereby increasing throughput. Updating the reference databases will facilitate a more in-depth functional and taxonomic analysis, as more diverse organisms are represented in them. The infrastructural changes to the pipeline will also allow other tools to be more easily incorporated into the analysis platform, not only providing scientific exposure to the tool developer, but also enriching the analysis results. Our objective of improving data discoverability, by linking from other databases to the EBI-MP, will allow metagenomics results to reach a broad life science community, who may be unaware of the data. It is important to note that in this project we are also establishing new collaborations that cross scientific disciplines (EMBL-EBI, Newcastle University and OeRC). This should expose our own staff to novel approaches and scientific challenges. From these collaborations, we also aim to produce statistical protocols to provide additional confidence and information about the data. Cross-sample analyses will inevitably provide researchers with a significantly deeper understanding of complex communities.

In the medium to longer term, the knowledge gained from understanding complex communities will have significant impacts for the UK. The impacts could range from economic benefits from more efficient industrial enzymes, to improved soil conditions providing greater crop yields, to healthcare solutions derived from comparing diseased and healthy states. One of the key areas will be the translation of metagenomics to industry. Through our industrial connections at both EMBL-EBI and Newcastle University, we will engage with this sector to establish its requirements. To ensure our users are able to utilise the new features, we will provide online training material, publish in scientific and non-scientific literature, attend meetings and conferences aimed at a range of audiences, and run training workshops, to maximise dissemination into the academic, industrial and third-party communities.

Publications

 
Description We discovered that the methods for estimating sampling effort were so inaccurate as to be almost useless. The exact reasons behind this poor result were not apparent despite a great deal of work by the researcher.
Exploitation Route We would still like to improve our ability to predict sample sizes in metagenomic studies, but it is clearly a much harder problem than previously realised. We hope to find new collaborators who might be able to bring a fresh pair of eyes to the problem.
Sectors Agriculture, Food and Drink; Education; Environment; Pharmaceuticals and Medical Biotechnology

 
Description Frontier Engineering
Amount £5,577,007 (GBP)
Funding ID EP/K039083/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 10/2013 
End 04/2019
 
Title A new R package 
Description A new R package for the analysis of results from the EBI database 
Type Of Material Improvements to research infrastructure 
Year Produced 2017 
Provided To Others? Yes  
Impact We have used this package in training in metagenomics 
 
Title Statistical Tools in Support of the EBI online facility 
Description We have developed simple tools to improve the analysis of metagenomic data submitted to the EBI portal 
Type Of Material Improvements to research infrastructure 
Year Produced 2020 
Provided To Others? Yes  
Impact Too early to say 
 
Title R Script for the analysis of EBI data 
Description This is a free suite of R scripts developed by Prof Darren Wilkinson to facilitate the analysis of ribosomal RNA data in the EBI database 
Type Of Technology Software 
Year Produced 2018 
Impact This software can be downloaded from Darren Wilkinson's GitHub page 
URL https://github.com/darrenjw/ebi-metagenomics-stats
 
Description Scientific 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact A talk to researchers and practitioners interested in microbiome research
Year(s) Of Engagement Activity 2017