Exploiting next-generation sequencing data for measurement of biological phenomena

Lead Research Organisation: UNIVERSITY OF EXETER
Department Name: Biosciences

Abstract

Next-generation sequencing (NGS) technologies, such as Illumina's Solexa and Roche's 454 GS-FLX, offer increases in throughput and decreases in per-nucleotide cost of several orders of magnitude. Until now, these technologies have mostly been applied to qualitative studies such as gene discovery, complete-genome sequencing and genome re-sequencing; only recently have they begun to be applied to quantitative transcript profiling. Given their digital nature and wide dynamic range, these technologies could potentially be used to measure a wide range of biological, environmental and toxicological phenomena. For example, the abundance of important microorganisms (pathogens, biocontrol agents, bioremediation agents) is expected to correlate with the abundance of diagnostic sequence tags. Similarly, the environmental load of toxins and other bioactive substances is expected to correlate with changes in gene expression, heralding new fields of quantitative meta-transcriptomics and environmental transcriptomics. However, before we can exploit NGS technologies for such novel quantitative applications, the methods urgently need proper testing and validation. Important questions include:
[1] How consistent are alternative methods (e.g. Illumina, 454 and quantitative PCR) with one another?
[2] How accurate are the measurements? That is, how closely does the measurement correlate with the actual quantity?
[3] Over what dynamic range is optimal accuracy maintained?
[4] How robust are the technologies to increasingly complex mixtures of test material?
[5] How robust are measurements to variations in DNA library preparation protocols?
[6] What are the inherent biases of each method with respect to DNA sequence composition?
[7] How reproducible are measurements made with NGS technologies?
Other challenges include optimising methods for converting raw sequence reads into census counts.
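Converting raw reads into census counts can be pictured as a filter-then-tally step. The sketch below is a minimal illustration, not the project's pipeline: it assumes each read has already been mapped (or not) to a reference sequence and carries per-base Phred quality scores, and the record layout, the function names `census_counts` and `relative_abundance`, and the mean-quality cut-off of 20 are all hypothetical choices.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical input record: (reference the read mapped to, or None if
# unmapped; the read's per-base Phred quality scores).
Read = Tuple[Optional[str], List[int]]


def census_counts(reads: Iterable[Read], min_mean_quality: float = 20.0) -> Counter:
    """Count quality-filtered, mapped reads per reference sequence.

    min_mean_quality is an illustrative cut-off, not a recommended value.
    """
    counts: Counter = Counter()
    for reference, qualities in reads:
        if reference is None or not qualities:
            continue  # skip unmapped reads and records without qualities
        if sum(qualities) / len(qualities) < min_mean_quality:
            continue  # skip reads whose mean quality is below the cut-off
        counts[reference] += 1
    return counts


def relative_abundance(counts: Counter) -> Dict[str, float]:
    """Express each reference's count as a proportion of all counted reads."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {reference: n / total for reference, n in counts.items()}


if __name__ == "__main__":
    # Toy reads from a hypothetical two-strain mixture.
    reads = [
        ("strainA", [30, 32, 35]),
        ("strainB", [10, 12, 8]),   # low mean quality: filtered out
        ("strainA", [25, 28, 30]),
        (None, [38, 38, 38]),       # unmapped: filtered out
        ("strainB", [30, 30, 30]),
    ]
    counts = census_counts(reads)
    print(counts)
    print(relative_abundance(counts))
```

Filtering on mean read quality is only one possible criterion; trimming low-quality ends or requiring a minimum mapping quality are common alternatives, and the choice itself could influence the biases asked about in question [6].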
The student will have access to standard reference biological materials (for example, well-characterised mixtures of two or more bacterial strains) as well as to the Illumina (via Exeter) and 454 (via LGC) NGS platforms. The student will also have access to more established analytical methods such as quantitative PCR. The first step will be to devise a series of metrics of reproducibility and bias. As well as classical statistical approaches such as linear regression, the student will use bioinformatics approaches developed in the Studholme group for tasks such as quality filtering and mapping reads against reference sequences.
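The metrics themselves are for the project to devise; the sketch below shows one conventional starting point, assuming a reference mixture whose true component proportions are known and NGS-derived proportions for the same components. Accuracy (question [2]) is summarised by an ordinary least-squares regression of log10 measured proportion on log10 expected proportion, and reproducibility (question [7]) by the coefficient of variation across replicate runs. The function names and the log-scale choice are assumptions for illustration only.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def log_linear_fit(expected: Sequence[float], measured: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit of log10(measured) = a + b * log10(expected).

    Returns (slope b, intercept a, R^2). Inputs must be positive, and each
    sequence must contain at least two distinct values.
    """
    if len(expected) != len(measured):
        raise ValueError("expected and measured must have the same length")
    x = [math.log10(v) for v in expected]
    y = [math.log10(v) for v in measured]
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)  # square of Pearson's r on the log scale
    return slope, intercept, r_squared


def coefficient_of_variation(replicates: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean of replicate measurements."""
    return stdev(replicates) / mean(replicates)
```

Reporting the slope and intercept alongside R² matters: a method that always reads exactly twice the true proportion gives a slope of 1, an intercept of log10(2) ≈ 0.301 and R² = 1, so it is perfectly proportional yet systematically biased. Working on a log scale also stops a dilution series spanning several orders of magnitude from being dominated by its most abundant points, which is relevant to the dynamic-range question [3].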
