Assessment of worldwide human genomic diversity

Lead Research Organisation: Imperial College London
Department Name: Infectious Disease Epidemiology

Abstract

The completion of the human genome sequence was arguably one of the greatest scientific achievements of all time. However, the availability of the human genome does not tell us much about why certain individuals or populations are more at risk of certain diseases. What is important is the variation between individuals and populations at specific regions of the genome. Genetic variation between human populations has previously been explored by the HapMap project. This new project directly extends that effort, which was significantly limited in its coverage of human genetic diversity because it intentionally focused on a very small number of populations.

Here we will examine, in over 1,000 individuals worldwide, one million sites known to be variable in the human genome. In addition to the variation in sequence structure, we will include structural variation of the genome (essentially genomic regions that are present in a variable number of copies in different individuals), as such variation has recently been shown to be important in disease.

This project will deliver a step change in our knowledge of human genetic diversity and provide an extraordinary public resource for the entire scientific community. This resource will be extremely useful to all researchers in human genetics, as it will be possible in a few clicks of a mouse to obtain the geographic distribution of variation in any medically relevant gene. The database will also help researchers to find new genes involved in disease susceptibility and progression.

All information will be made immediately available to anyone through the Ensembl website. The site will also allow researchers to access information on the function of the genes and their known involvement in disease susceptibility and progression.

Technical Summary

The aim of the project is to genotype over one million single nucleotide polymorphisms (SNPs) on the HGDP-CEPH panel, a resource that offers the best possible coverage of worldwide human genetic diversity available today. At the end of this project, full genotypes at 1 million SNPs will be validated, catalogued and fully annotated for over 1,200 individuals (including HapMap samples). Copy number variants (CNVs) and haplotypes will be determined for over 50 human populations. The data will be put into the public domain as soon as possible, both through the Ensembl website and through standard archive resources such as dbSNP, for all of the discovered genotypes and for the raw data associated with the arrays. The new Ensembl browser will, for the first time, include data on structural genomic variation (CNVs) in addition to SNPs. We will further include population genetics summary statistics. This project will deliver a step change in our knowledge of variation in human genetic diversity and provide an extraordinary public resource for the entire scientific community. It is a direct extension of the HapMap project, which was significantly limited in its coverage of human genetic diversity since it intentionally focused on a very small number of populations. Such an extended resource will be most useful to scientists working in basic and applied human genetics research. The completion of the International HapMap Project and the emergence of affordable high-throughput genotyping technology make this unprecedented goal achievable within three years.
