MicrobesNG: A scalable replicable biological sample repository incorporating whole-genome sequence data and analysis of thousands of microbial strains

Lead Research Organisation: University of Birmingham
Department Name: Immunity and Infection

Abstract

One of the greatest advances of modern science is the ability to determine the DNA sequences of living organisms. Access to complete DNA sequences has provided researchers, environmentalists, clinicians, law enforcement and industrialists with a myriad of benefits. The information permits a deeper understanding of the diversity of life in the world. Comparing DNA sequences allows researchers to determine relationships between the three domains of life and to infer evolutionary lineages. Understanding the genetic content of animals and plants permits the selection and development of stronger, more disease-resistant plants and animals, increasing outputs and reducing waste. Understanding the human and animal genomes has had an enormous impact on the ability of clinicians to determine genetic susceptibility to a variety of diseases; one only has to witness the lives saved by identifying those most at risk of breast cancer and offering early treatment. The ability to harness genome information has allowed industry to develop new therapies for disease and produce new methods for a variety of manufacturing processes, e.g. enzymes are being used to bleach paper pulp. To the lay person, perhaps the most familiar benefit of genomic data is the use of DNA to identify perpetrators of crime.

One of the subjects to benefit most from genome sequencing is microbiology. The ability to rapidly obtain genome sequences has allowed researchers to find new bacterial and viral genes which allow pathogens to cause disease. It allows epidemiologists to identify strains in outbreaks of infection and to trace those outbreaks back to the point of origin so further infections can be prevented. Genome sequencing has elucidated mechanisms by which bacteria become resistant to antibiotics, allowing clinicians to make better-informed decisions when prescribing antibiotics. Microbial genomics helps the pharmaceutical industry gain a better understanding of the vulnerabilities of bacteria and viruses, information which is used to identify and develop new drugs. Recently, genomic studies have linked changes in certain bacterial populations growing in and on the human body with non-infectious diseases such as obesity and cancer.

The tremendous success of genome sequencing has provided us with a problem. Rapid technological advances in DNA sequencing mean that researchers can now generate sequence data faster than it can be analysed and interpreted with current methodologies. There are a number of reasons for this, including the development of ad-hoc methods to prepare, analyse and store the data and the lack of standard operating procedures across the discipline. Importantly, the ability to rapidly sequence bacterial genomes in particular means thousands of strains are being sequenced, yet there is no common method for accessing and studying these strains and garnering the benefits of this significant investment of public money.

This proposal seeks to redress some of the problems that have become apparent in the community. We will establish a resource that becomes a paradigm for sequencing bacterial strains, for storing and analysing the genomic data, and for archiving sequenced strains so they can be interrogated later. We will achieve this by (1) establishing best practices for sequencing protocols that deliver maximum output, (2) providing a framework to efficiently manage the storage of collections of bacterial strains, (3) developing software that allows whole-genome sequencing data and meta-data (information on the source and characteristics of the strain) to be correlated with strains in the store and (4) delivering novel analysis tools to permit the comparison of hundreds or thousands of microbial strains simultaneously, thereby easing the analysis bottleneck. We will make these tools and protocols available to the wider community so that they can be adopted by genome sequencing centres in other disciplines.

Technical Summary

MicrobesNG is a novel community resource for the integration of whole-bacterial genome sequence data, genome-scale experimental data, strains and strain information. Strains provided by users will be deposited into a resilient strain repository and made available to the community, associated with their whole-genome information and available phenotypic and strain information. A rapid whole-genome sequencing service is available to BBSRC-funded users at cost, or users may analyse their own data directly using the service. The service is provided in the form of a user-friendly web interface, with several novel elements. STRAINDB is used to discover strains, e.g. by phenotype or source, to place orders, and to access whole-genome information. BEYONDWGS is a next-generation web-based viewer for user data which integrates other whole-genome data sets, annotations, discovered variants, and published experimental data including RNA-Seq, ChIP-seq and TraDIS information. COMPARATIVEASSEMBLER packages the current best practices for QC, de novo assembly and variant calling, adapted for the available sequencing platforms including Illumina, Ion Torrent and Pacific Biosciences, into a user-friendly package for user-supplied data, with an emphasis on revealing uncertainties in these processes to users. SPECIESBAM is a regularly updated database of sequence data aligned to species or genus pan-genomes, to permit rapid interrogation of core and pan-genomes for epidemiological and phylogenetic analysis. FASTDEPOSIT aims to simplify the process of submitting WGS data to public archives at the EBI. The resource emphasises REPRODUCIBLE RESEARCH with GitHub, RESILIENCE using mirrored services, and novel methods of deployment including use of CLOUD-BASED GENOME services. Interfaces will be designed to be REAL-TIME and PROGRESSIVELY UPDATED. Taken together, MicrobesNG will produce novel bioinformatics research for the benefit of microbial researchers in the UK.
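As a rough illustration of the strain discovery described for STRAINDB, the sketch below links each stored strain to its meta-data and sequence data and filters by phenotype or source. It is not the MicrobesNG implementation: the StrainRecord class, its field names, the find_strains function and the example records are all hypothetical and were invented for this illustration.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class StrainRecord:
    """Hypothetical record tying a stored strain to its meta-data and WGS data."""
    strain_id: str                         # invented identifier scheme
    species: str
    source: str                            # e.g. "clinical" or "environmental"
    phenotypes: List[str] = field(default_factory=list)
    assembly_path: Optional[str] = None    # None until the strain is sequenced


def find_strains(
    records: Iterable[StrainRecord],
    phenotype: Optional[str] = None,
    source: Optional[str] = None,
) -> List[StrainRecord]:
    """Return the records matching every filter that was supplied."""
    matches = []
    for record in records:
        if phenotype is not None and phenotype not in record.phenotypes:
            continue
        if source is not None and record.source != source:
            continue
        matches.append(record)
    return matches


# Invented example data, for illustration only.
EXAMPLE_RECORDS = [
    StrainRecord("DEMO-0001", "Escherichia coli", "clinical",
                 ["ciprofloxacin-resistant"], "assemblies/DEMO-0001.fasta"),
    StrainRecord("DEMO-0002", "Escherichia coli", "environmental",
                 ["ciprofloxacin-resistant", "biofilm-forming"]),
    StrainRecord("DEMO-0003", "Salmonella enterica", "clinical",
                 ["biofilm-forming"], "assemblies/DEMO-0003.fasta"),
]

if __name__ == "__main__":
    for hit in find_strains(EXAMPLE_RECORDS,
                            phenotype="ciprofloxacin-resistant",
                            source="clinical"):
        print(hit.strain_id, hit.species, hit.assembly_path)
```

Keeping the assembly path on the same record as the phenotype and source is one simple way to keep sequence data, meta-data and the physical strain correlated; a production service would more plausibly hold these in a relational database.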

Planned Impact

This project will benefit a wide variety of individuals including:
-Birmingham microbiologists
-UK and International microbiologists and sequencing facilities
-Industry
-Healthcare professionals

Whole-genome sequencing (WGS) represents one of the greatest advances in science and has driven enormous benefits to society. Outputs from this technology are of significant benefit to researchers, clinicians, law enforcement and industry. Progress has driven developments in therapies and diagnostics. However, the tremendous success in the creation of sequencing platforms has presented users with significant challenges. No standard operating procedures have developed alongside the technologies; as such, maximal performance from the sequencing platforms has not been delivered. Further, data flow from the sequencing outputs has outstripped our ability to process and interpret data. This proposal seeks to remedy these limitations by forming a sequencing resource with an emphasis on microbial genomics. The resource will (1) establish best practice for sequencing protocols delivering maximum output, (2) provide a framework to manage the storage of bacterial strains, (3) develop software that allows WGS data and strain meta-data to be correlated with strains in the store and (4) deliver novel tools for comparative genomics of hundreds or thousands of microbial strains simultaneously.

The University of Birmingham has the largest grouping of bacteriologists in UK academia with 29 Principal Investigators. Birmingham has a wide range of PIs who access WGS technologies, representing a diverse range of projects such as tracking antibiotic-resistant organisms, understanding drug targets, understanding the molecular basis for metabolic pathways, understanding basic mechanisms of transcription and understanding the virulence of human and animal pathogens including in vivo evolution. This proposal will provide a streamlined platform for data acquisition and subsequent downstream analyses, allowing these researchers to gain maximal benefit from RCUK investments. It will enhance rapid acquisition of new knowledge leading to scientific advances. It will increase our competitiveness. It will deliver significant training benefits for researchers, allowing them to learn best practice in WGS.

We will develop best practice and analysis platforms for the scientific community. This will ensure worldwide academic advancement to address issues of importance globally. An example of our aspirations is Loman's recent Nature Biotechnology article comparing the performance of the current sequencing platforms and data analysis software. Importantly, we have in Birmingham Josh Quick, who formerly worked for Illumina and was involved in developing the MiSeq platform. His understanding of the platform has allowed Quick to develop methods that increase data output from the MiSeq, obtaining read lengths 50% longer than those currently available to the community; we will share this methodology with the community. Therefore, this project will contribute to the development and utilisation of new and innovative methodologies, equipment, techniques, technologies, and cross-disciplinary approaches.

This resource will enhance UK plc by fostering relationships with industry and permitting industry to benefit from increases in WGS performance. Importantly, Henderson and Cole have recently set up a company (Bugs4drugs) which is reliant on WGS to identify genes which permit significant improvements in protein production. This work has resulted in partnerships with Novartis, Pfizer and GSK and arises out of BBSRC investments. Henderson, Lund and Piddock have established relationships with Discuva, a drug discovery company, to use WGS to identify bacterial genes that can be targeted for antibiotic therapies.

Description We have set up a national facility to provide whole genome sequencing of bacteria to researchers within the UK and internationally.
Exploitation Route We provide a service which can be used by microbiologists across the world.
Sectors Agriculture, Food and Drink,Healthcare,Manufacturing, including Industrial Biotechnology,Pharmaceuticals and Medical Biotechnology

URL https://microbesng.uk/
 
Description Work in this project has been an order of magnitude greater than anticipated. We planned to sequence 10,000 genomes over 5 years. Currently we are delivering 17,000 per year to a worldwide client base.
First Year Of Impact 2014
Sector Creative Economy,Healthcare,Pharmaceuticals and Medical Biotechnology
Impact Types Societal,Economic