Developing new statistical approaches for finely spaced markers arising in the context of genome-wide association studies

Lead Research Organisation: Queen Mary University of London
Department Name: Wolfson Institute

Abstract

Large volumes of genetic data have recently become available. It is possible to use these data to look for genetic differences between people with a certain disease and people without the disease. The aim is to find the genetic location of the potential cause of the disease. One way of doing this is to consider small segments of DNA called haplotypes and to construct plausible histories of how each haplotype in a sample arose from a common ancestor through evolutionary events. Our work could lead to improved methods for disease mapping, which could in turn inform potential therapies.

We also want to investigate genetic abnormalities that affect human foetal development and the probability that a foetus will survive until birth, and we hope to identify previously unsuspected genes involved in this.

Humans can have different numbers of copies of the same section of DNA. We want to investigate this phenomenon and write software to find instances of this. We also want to write software to assess whether having specific numbers of copies of such sections influences whether or not people are likely to have a particular disease.

Technical Summary

Recent progress in human genome research has led to the availability of large volumes of genomic data and high-density marker maps from initiatives such as the International HapMap Project. By considering single nucleotide polymorphisms (SNPs) in people with a disease (cases) and people without it (controls), we can look for associations between genetic variants and the disease in the hope of identifying susceptibility loci. One may also consider the association of different haplotypes (sets of consecutive marker alleles on the same chromosome). One problem with haplotype analysis is that even a relatively small number of markers can yield a large number of distinct inferred haplotypes, so statistical analyses that treat every haplotype separately may involve many degrees of freedom, with a consequent loss of power. Various methods have been proposed to group haplotypes that are similar to each other and then to explore their relationship with disease status. Examples of such methods include clustering, the use of artificial neural networks and ancestral recombination graphs (ARGs). A recent method (Minichiello and Durbin 2006) uses ARGs to detect association with disease and has been implemented in a Java program called Margarita. We propose several improvements to the algorithm used in this program in order to model recombination events more accurately.
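The loss of power described above can be made concrete with a small sketch. The Python below is illustrative only and is not taken from Margarita or any other method named here: it computes a standard Pearson chi-square statistic for a 2 x k table of case and control counts, where the k categories may be the two alleles of a SNP or k distinct haplotypes. The function names (max_haplotypes, pearson_chi2, p_value_df1) are hypothetical, and any counts passed to them are assumed example data.

```python
from math import erfc, sqrt


def max_haplotypes(n_markers: int) -> int:
    """Upper bound on distinct haplotypes for n biallelic markers."""
    return 2 ** n_markers


def pearson_chi2(cases, controls):
    """Pearson chi-square for a 2 x k table of counts (cases vs controls).

    `cases` and `controls` are equal-length sequences of counts, one entry
    per allele or haplotype. Returns (statistic, degrees_of_freedom).
    Categories with no observations in either group are dropped.
    """
    pairs = [(a, b) for a, b in zip(cases, controls) if a + b > 0]
    n_cases = sum(a for a, _ in pairs)
    n_controls = sum(b for _, b in pairs)
    total = n_cases + n_controls
    # Nothing to test with fewer than two categories or an empty group.
    if len(pairs) < 2 or n_cases == 0 or n_controls == 0:
        return 0.0, 0
    stat = 0.0
    for a, b in pairs:
        column_total = a + b
        expected_cases = n_cases * column_total / total
        expected_controls = n_controls * column_total / total
        stat += (a - expected_cases) ** 2 / expected_cases
        stat += (b - expected_controls) ** 2 / expected_controls
    # One degree of freedom per category beyond the first.
    return stat, len(pairs) - 1


def p_value_df1(stat: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(stat / 2.0))
```

For a single SNP the table has two columns and the test has one degree of freedom. Treating every haplotype as its own category gives k - 1 degrees of freedom, and since max_haplotypes(10) is 1024, k can become large even for short marker windows. This is the motivation for grouping similar haplotypes before testing.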

We have recently identified regions in which groups of neighbouring markers exhibit deviation from Hardy-Weinberg equilibrium (HWE), using genotypes obtained from the Affymetrix 500K marker set made available by the Wellcome Trust Case Control Consortium (WTCCC) (Vine and Curtis, 2008). The results showed very marked deviations from HWE in particular genetic regions, and these could not be explained as genotyping errors of individual markers. We plan to investigate this phenomenon further with a view to identifying previously unsuspected genes that have an important effect on foetal viability.
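As background, deviation from HWE at a single biallelic marker is commonly assessed with a one-degree-of-freedom chi-square test that compares observed genotype counts with those expected from the estimated allele frequency. The sketch below shows that textbook single-marker test only. It is not the multi-marker approach of Vine and Curtis (2008), which is not described here, and the function name hwe_chi2 and any counts passed to it are hypothetical.

```python
from math import erfc, sqrt


def hwe_chi2(n_aa: int, n_ab: int, n_bb: int):
    """Chi-square test for Hardy-Weinberg equilibrium at one biallelic marker.

    Arguments are the observed counts of the AA, AB and BB genotypes.
    Returns (statistic, p_value) using 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 0.0, 1.0
    # Estimated frequency of allele A from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    # A monomorphic marker carries no information about HWE.
    if p == 0.0 or q == 0.0:
        return 0.0, 1.0
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return stat, erfc(sqrt(stat / 2.0))
```

The chi-square approximation is unreliable when expected genotype counts are small, so an exact test is often preferred for rare alleles. A run of neighbouring markers that all show strong deviation, as reported above, is harder to attribute to genotyping error than an isolated deviating marker.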

Copy number variation is thought to play a significant role in the aetiology of common disease. The data associated with copy number variation studies are inherently different from those of SNP studies. We aim to develop more sophisticated methods to investigate disease associations with common copy number variants (CNVs). We also hope to address the more difficult problem of looking for association with rare CNVs, and to develop methods for integrated association studies in which SNP genotypes and CNVs are considered jointly.
