EBI Metagenomics Portal - Towards a better understanding of community metabolism
Lead Research Organisation:
Newcastle University
Department Name: Sch of Engineering
Abstract
Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.
Technical Summary
EBI-MP is a global portal for the metagenomics research community. Offering data submission, archiving and sharing functions, community standards-compliant curation, and functional and taxonomic diversity analyses, the service has attracted a growing user-base of UK, European and global researchers.
We intend to improve the pipeline infrastructure to offer analysis provenance by modularising pipeline components, defining a dependency tree between modules, and versioning each module. Subsequently, we will update the reference databases and analysis software, and allow our users to have their results reanalysed with the updated pipeline. We will improve the range of taxonomic annotations provided by the resource, moving beyond 16S rRNA-based analyses. We will also investigate the application of the UniPept approach to taxonomic classification for metagenomic datasets. We will add pathway information to the functional annotation provided by EBI-MP, using the latest version of InterProScan to provide KEGG, MetaCyc and UniPathway links, and develop a tool to visualise the catalytic potential of a sample, highlighting reactions where there is support for the existence of constituent proteins.
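The idea of versioned modules linked by a dependency tree can be illustrated with a minimal Python sketch. It is not the EBI-MP implementation: the module names, version strings and helper functions below are hypothetical, and it assumes Python 3.9 or later for the standard-library graphlib module.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Module:
    """One pipeline step with a version and the steps it depends on."""
    name: str
    version: str
    depends_on: tuple = ()


def execution_order(modules):
    """Return module names so that every module follows its dependencies."""
    graph = {m.name: set(m.depends_on) for m in modules}
    return list(TopologicalSorter(graph).static_order())


def provenance(modules):
    """Return (name, version) pairs in execution order for one analysis run."""
    by_name = {m.name: m for m in modules}
    return [(name, by_name[name].version) for name in execution_order(modules)]


def needs_rerun(modules, changed):
    """Return the changed module plus every module downstream of it."""
    by_name = {m.name: m for m in modules}
    stale = {changed}
    # Walking in execution order guarantees that a module's dependencies
    # have already been classified before the module itself is checked.
    for name in execution_order(modules):
        if set(by_name[name].depends_on) & stale:
            stale.add(name)
    return stale


# Hypothetical pipeline layout, used only to exercise the functions above.
PIPELINE = [
    Module("quality_control", "1.0"),
    Module("rrna_prediction", "2.1", ("quality_control",)),
    Module("taxonomic_classification", "1.3", ("rrna_prediction",)),
    Module("functional_annotation", "5.0", ("quality_control",)),
]

if __name__ == "__main__":
    print(provenance(PIPELINE))
    print(sorted(needs_rerun(PIPELINE, "rrna_prediction")))
```

Storing the (name, version) list alongside each result is one way to record provenance, and the downstream set indicates which results would become candidates for reanalysis after a module or reference database update.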
We will implement CRAM compressed sequence data formats within the system to increase the speed of data upload to EBI-MP and to facilitate internal processing and storage. We will also design and build data discovery tools that provide a full range of search functions across the sample, contextual and analysis data, and provide these tools as web services and via the website. Finally, we will develop mathematically sound methods to estimate the depth of sequencing required to capture a specific fraction of diversity, and to normalise samples so that they can be compared in statistically meaningful ways. These analyses will be made available on the website, along with visualisation tools capable of producing heatmaps and PCA plots for sample comparison.
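To make the two statistical goals concrete, the following Python sketch shows two classical, widely used techniques: Good's coverage estimator (one minus the proportion of reads belonging to taxa seen exactly once) and rarefaction (subsampling every sample, without replacement, to a common depth). These are stand-ins chosen for illustration, not the methods developed in this project, and the taxon names and counts are invented.

```python
import random
from collections import Counter


def goods_coverage(counts):
    """Good's estimate of the fraction of the community already sampled.

    counts: iterable of per-taxon read counts for one sample.
    """
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        raise ValueError("sample contains no reads")
    singletons = sum(1 for c in counts if c == 1)
    return 1.0 - singletons / total


def rarefy(counts, depth, seed=0):
    """Subsample a {taxon: count} mapping to `depth` reads without replacement."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError("requested depth exceeds the number of reads")
    # Expand the counts into one entry per read, then draw `depth` of them.
    pool = [taxon for taxon, c in counts.items() for _ in range(c)]
    rng = random.Random(seed)
    return Counter(rng.sample(pool, depth))


# Invented example data: 10 reads, two taxa observed only once.
sample = {"taxon_a": 5, "taxon_b": 3, "taxon_c": 1, "taxon_d": 1}

if __name__ == "__main__":
    print(goods_coverage(sample.values()))  # 1 - 2/10, approximately 0.8
    print(rarefy(sample, depth=6))
```

Rarefying discards reads, and simple coverage estimators can be unreliable for highly diverse communities, so a sketch like this is a baseline for comparison rather than a recommended analysis.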
Planned Impact
Metagenomics is widely used in diverse fields such as agriculture, food manufacture, the elucidation of antibiotic resistance mechanisms, bioenergy production, and animal/human health. The EBI Metagenomics Portal (EBI-MP) covers data submission, archiving and sharing functions, community standards-compliant curation, and rich functional and taxonomic diversity analyses. Launched in 2011, the resource has become a world leader in metagenomics data analysis, attracting a growing user-base across the UK, European and global communities. The impact on academic research is already being felt, with the EBI-MP providing both a robust analysis platform and access to a large compute resource. Both of these features are often lacking within academia. Thus, the EBI-MP is making metagenomics analysis available to more researchers and relieving a significant bottleneck between obtaining sequence data and obtaining results.
One vital impact of the project will be continued support for archiving and analysis of metagenomic data in the face of ever increasing data volumes. The proposed work provides a number of mechanisms, including CRAM-based sequence compression and a tightly controlled way of updating analysis algorithms, by which the pipeline can be made more efficient, with higher throughput and the ability to scale. Improved sample analyses, through updated reference databases and extended taxonomic and functional analyses, are also critical, since they will increase the usefulness of EBI-MP to researchers and better meet the community's needs. These benefits will be felt in the short term, and will also persist into the longer term, as updates and improvements are made throughout the course of the project.
In the medium term, these developments should allow the EBI-MP to grow with increasing demand, without significantly increasing the computational overhead. This will be achieved by the incorporation of more efficient algorithms, thereby increasing throughput. Updating the reference databases will facilitate more in-depth functional and taxonomic analysis, as more diverse organisms are represented in them. The infrastructural changes to the pipeline will also allow other tools to be more easily incorporated into the analysis platform, not only providing scientific exposure to the tool developer, but also enriching the analysis results. Our objective of improving data discoverability, by linking from other databases to the EBI-MP, will allow metagenomics results to reach a broad life science community, which may otherwise be unaware of the data. It is important to note that in this project we are also establishing new collaborations that cross scientific disciplines (EMBL-EBI, Newcastle University and OeRC). This should expose our own staff to novel approaches and scientific challenges. From these collaborations, we aim to produce statistical protocols that provide additional confidence in, and information about, the data. Cross-sample analyses should provide researchers with a significantly deeper understanding of complex communities.
In the medium to longer term, the knowledge gained from understanding complex communities will have significant impacts for the UK. The impacts could be economic, ranging from more efficient industrial enzymes, to improved soil conditions providing greater crop yields, to healthcare solutions derived from comparing diseased and healthy states. One of the key areas will be the translation of metagenomics to industry. Through our industrial connections at both EMBL-EBI and Newcastle University, we will engage with this sector to establish its requirements. To ensure our users are able to utilise the new features, we will provide online training material, publish in scientific and non-scientific literature, attend meetings and conferences aimed at a range of audiences, and run training workshops, to maximise dissemination into the academic, industrial and third-party communities.
Organisations
Publications
Mitchell AL (2018) EBI Metagenomics in 2017: enriching the analysis of microbial communities, from sequence reads to assemblies. Nucleic Acids Research.
Description | We discovered that the methods for estimating sampling effort were so inaccurate as to be almost useless. The exact reasons behind this poor result were not apparent, despite a great deal of work by the researcher. |
Exploitation Route | We would still like to improve our ability to predict sample sizes in metagenomic studies. But it is clearly a much harder problem than previously realised. We would hope to find new collaborators who might be able to bring a fresh pair of eyes to the problem. |
Sectors | Agriculture, Food and Drink; Education; Environment; Pharmaceuticals and Medical Biotechnology |
Description | Frontier Engineering |
Amount | £5,577,007 (GBP) |
Funding ID | EP/K039083/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 09/2013 |
End | 04/2019 |
Title | A new R package |
Description | A new R package for the analysis of results from the EBI database |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2017 |
Provided To Others? | Yes |
Impact | We have used this package in training in metagenomics |
Title | Statistical Tools in Support of the EBI online facility |
Description | We have developed simple tools to improve the analysis of metagenomic data submitted to the EBI portal |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2020 |
Provided To Others? | Yes |
Impact | Too early to say |
Title | R Script for the analysis of EBI data |
Description | This is a free suite of R scripts developed by Prof Darren Wilkinson to facilitate the analysis of ribosomal RNA data in the EBI database |
Type Of Technology | Software |
Year Produced | 2018 |
Impact | This software can be downloaded from Darren Wilkinson's GitHub page |
URL | https://github.com/darrenjw/ebi-metagenomics-stats |
Description | Scientific |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | A talk to researchers and practitioners interested in microbiome research |
Year(s) Of Engagement Activity | 2017 |