Detecting signatures of natural selection in the human genome with geographically explicit models

Lead Research Organisation: Imperial College London
Department Name: School of Public Health


Modern sequencing techniques have provided us with very large genetic datasets, on a scale that was hard to imagine only a couple of years ago. As these datasets comprise human populations from the entire globe, it is tempting to look at the geographic distribution of genetic variants and try to explain why some variants are more common in some places than in others. After all, we have long known that sickle cell anaemia is found in regions where malaria was prevalent, as the sickle cell variant can confer resistance to this deadly disease. So, could we find other important genetic variants that have been affected by natural selection by examining their geographic distribution? While this approach sounds promising, it raises the problem of distinguishing patterns that truly reflect past and present selection from patterns that might simply have arisen by chance. In this project, we propose to develop a population genetics framework that will allow us to reconstruct the spread of anatomically modern humans around the globe, taking into account past changes in climate and the shape of continents. By knowing how and when people got to different parts of the world, we will then be able to identify which genetic variants have geographic distributions too extreme to be the result of mere chance, and thus are likely to have been the target of natural selection. Besides looking for regions under selection in the nuclear genome, we will also consider the small amount of genetic material contained in the mitochondria, small organelles that act as the biochemical powerhouses in our cells. Mitochondrial DNA is arguably the most widely used source of information for reconstructing past human history, but such reconstructions rely on the assumption that mitochondrial DNA has not been affected by natural selection.
Our new framework, together with a better geographic coverage of mitochondrial genetic variability that will be achieved in this project, will allow us to test the assumption of neutrality and to find any deviation that should be taken into account in future work on human settlement history.

Technical Summary

We propose to exploit the recently available datasets on worldwide human genomic diversity to test for possible targets of natural selection in the genome. We will first develop a demographic, geographically explicit inference framework for the analysis of genetic data. Using this tool, we will reconstruct the expansion out of Africa by anatomically modern humans, taking into account climatic changes over the last 100k years. We will then run stochastic simulations within this well parameterised demography to characterise genomic regions likely to have been affected by natural selection. The analyses will be run on the 650k SNPs already typed for the HGDP-CEPH panel (~1,000 individuals from 51 populations) and subsequently on larger datasets, which will be sourced from ongoing dense re-sequencing projects. To get further insights into the underlying selective forces, plausible targets of natural selection will be tested for their spatial association with environmental variables such as climate and diseases. We will also expand our approach to investigate natural selection on human mitochondrial DNA (mtDNA). Our group has recently uncovered new strong evidence that worldwide mtDNA diversity has been partly shaped by climate. We will sequence complete mtDNA genomes for 1,400 individuals belonging to 76 populations (the HGDP-CEPH panel and 25 Amerindian and Siberian populations previously genotyped at a large number of neutral autosomal loci). We will then investigate whether the current geographic distribution of mitochondrial haplotypes is compatible with our understanding of past human migrations as inferred from nuclear markers. Our demographic, spatially explicit model will provide a formal framework to test whether the association between some haplotypes and temperature that we detected in our previous work can be explained by stochastic events, or whether selection has to be invoked.
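The outlier test described above can be illustrated with a minimal sketch. It assumes a toy two-deme Wright-Fisher model in place of the geographically explicit demography, uses a simple FST as the summary statistic, and takes an observed value of 0.9 purely for illustration; the function names, parameter values and the choice of statistic are all hypothetical and are not taken from the project.

```python
import random

def simulate_neutral_fst(p0, pop_size, generations, rng):
    """Drift one neutral allele independently in two demes and return their FST."""
    freqs = []
    for _ in range(2):
        p = p0
        for _ in range(generations):
            # Wright-Fisher step: binomial sampling of 2N gene copies.
            copies = sum(rng.random() < p for _ in range(2 * pop_size))
            p = copies / (2 * pop_size)
        freqs.append(p)
    p_bar = sum(freqs) / 2
    if p_bar in (0.0, 1.0):
        return 0.0  # allele lost or fixed in both demes: no differentiation
    var_p = sum((p - p_bar) ** 2 for p in freqs) / 2
    return var_p / (p_bar * (1 - p_bar))

def empirical_p_value(observed, null_values):
    """Share of neutral simulations at least as extreme as the observation."""
    exceed = sum(v >= observed for v in null_values)
    return (exceed + 1) / (len(null_values) + 1)

rng = random.Random(42)
# Neutral expectation under the (assumed, toy) demography.
null_fst = [simulate_neutral_fst(0.5, 50, 20, rng) for _ in range(500)]
# Hypothetical observed FST for one SNP.
p_value = empirical_p_value(0.9, null_fst)
```

A small p_value would flag the SNP as a candidate target of selection. In the setting described here, the null distribution would come from simulations under the fitted spatial demography rather than from this toy model.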

Planned Impact

The research proposed here comprises four different objectives, which are likely to appeal to different parts of the scientific community and wider society. We intend to fill a major gap in the toolbox of population biologists with an eco-geographic inference framework. This should be of interest to human population biologists. So far, however, the framework has met with the most enthusiasm from population biologists outside the human genetics community. Despite very limited publicity so far, we have been approached by numerous groups working on organisms as diverse as plant pathogens and marine mammals. We wish to encourage the use of the framework by making it freely available and producing extensive and user-friendly documentation. We also hope that the approach will be adopted by epidemiologists in the longer term. Our reconstruction of human settlement history should provide a richer, more detailed picture of human evolution over the last 100,000 years. We expect the results to be of interest to our colleagues in human genetics as well as to anthropologists and archaeologists. This is also a topic of interest to the general public. In addition to peer-reviewed publications aimed at the academic community, we wish to engage with a wider audience. To this end, we are planning to produce a series of interactive Flash applets capturing the main results. These will be made available through our websites but will also be used in talks and exhibitions. The new analyses on selection in the human genome should again appeal to scientists and non-scientists alike. This part of the project is really a leap into the unknown, and it is thus difficult to make plans on how to publicise the results. Our methodology, combined with the extraordinary increase in human genomic data, should provide us with unprecedented power, making it likely that we will identify previously unsuspected genes of interest.
The appeal of the results, in particular to the general public, will largely depend on the new genes we identify. Irrespective of the results, we expect that the wider community of geneticists will be interested due to the novelty of the approach and the high statistical power of the analysis. Selection in the mitochondrial genome is a completely different situation from the genome-wide data mining, as we will test a very specific hypothesis. We have previously shown that mitochondrial diversity correlates with minimum temperature and have identified two plausible SNPs that make sense from a functional perspective. The manuscript was reviewed by Nature, Science and PLoS; the reviewers eventually rejected it in all three instances, mainly because they felt that the results had such far-reaching consequences that not the slightest doubt could be allowed to exist. Indeed, probably over 80% of the literature on human settlement history relies on inferences from mtDNA, and a correlation with climate would require revisiting it entirely. While we ran extensive controls, we were unable to perform the final control analysis, as this requires matched samples for mtDNA and neutral genomic markers, which we did not have. Our proposed research will remedy this problem and establish whether the previous results stemmed from a sampling artefact or an unknown complex demographic mechanism, or whether our original results hold. In the latter case, this would arguably constitute one of the most important results in human population genetics and would lead to several paradigm shifts, such as reconsidering the pervasive notion of an 'out of Africa bottleneck'. We have no doubt that such a result would significantly impact large parts of the scientific community and generate considerable media attention.

Related Projects

Project Reference | Relationship | Related To   | Start      | End        | Award Value
BB/H008802/1      |              |              | 01/07/2010 | 01/10/2011 | £555,557
BB/H008802/2      | Transfer     | BB/H008802/1 | 12/10/2011 | 11/01/2014 | £400,377
Description The primary objective of the grant was to develop the most sophisticated spatiotemporal framework to date for simulating the colonisation of the world by our ancestors, and to use it to generate simulated genomes under a range of hypotheses and parameter values. By comparing the fit of these simulated genomes to actual genomic data from a large number of populations from all over the world, we could infer the most likely scenarios for the colonisation of the world by our ancestors.

This framework allowed us to show that climate change, in particular during the Pleistocene, was the main driver of the initial expansion of our ancestors from a cradle situated in sub-Saharan Africa some 60,000-70,000 years ago.

Combined with sequence data generated from ancient remains, we reconstructed the colonisation of Australia and the Americas. We could show in particular that modern Native Americans are the direct descendants of the people who were the first to enter the Americas some 15,000-16,000 years ago.

On a more technical note, we demonstrated that very complex forward computer simulations coupled with Approximate Bayesian Computation (ABC) could be used to make inferences on far more complex demographic scenarios than was previously assumed possible.
Exploitation Route Our findings have been adopted by most of our colleagues and have informed and coloured the majority of recent high-profile publications in the field of evolutionary history of anatomically modern humans.
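The coupling of forward simulations with ABC mentioned above can be sketched with a basic rejection sampler. This is a minimal illustration under stated assumptions, not the project's implementation: the simulator is a toy single-population Wright-Fisher model, the only parameter is a hypothetical population size with a uniform prior, and mean heterozygosity stands in for the summary statistics; all names and values are invented for the example.

```python
import random

def simulate_heterozygosity(pop_size, generations, rng, n_loci=20):
    """Forward-simulate drift at neutral loci; return mean heterozygosity."""
    total = 0.0
    for _ in range(n_loci):
        p = 0.5
        for _ in range(generations):
            # Wright-Fisher step: binomial sampling of 2N gene copies.
            copies = sum(rng.random() < p for _ in range(2 * pop_size))
            p = copies / (2 * pop_size)
        total += 2 * p * (1 - p)
    return total / n_loci

def abc_rejection(observed, prior_draw, simulate, n_sims, tolerance, rng):
    """Keep prior draws whose simulated summary lies within tolerance of the data."""
    accepted = []
    for _ in range(n_sims):
        theta = prior_draw(rng)
        if abs(simulate(theta, rng) - observed) <= tolerance:
            accepted.append(theta)
    return accepted

rng = random.Random(1)
# Pseudo-observed summary generated with a known (assumed) population size of 20.
observed = simulate_heterozygosity(20, 10, rng)
accepted = abc_rejection(
    observed,
    prior_draw=lambda r: r.randint(5, 50),                  # uniform prior on N
    simulate=lambda n, r: simulate_heterozygosity(n, 10, r),
    n_sims=200,
    tolerance=0.03,
    rng=rng,
)
```

The accepted values approximate the posterior distribution of the population size. Tightening the tolerance improves the approximation at the cost of more rejected simulations.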
Sectors Education,Environment,Healthcare,Culture, Heritage, Museums and Collections

Description The models of the colonisation of the world by our ancestors pushed the boundaries of the scenarios that can be handled in statistical/population genetics, and likely reached the desirable limits of their complexity. Some elements were re-used in human genetics studies, in particular to address the colonisation of Australia and the Americas, and also inspired some of the more complex models of the spread of infectious diseases. Our findings that both Native Australians and Native Americans were likely the first inhabitants of their respective continents were well received by Aboriginal communities and had an important political impact, in particular in the USA. Our results showed that climate change in the Pleistocene was the driving force in the spread of humans across the globe, which is interesting from a cultural perspective, in particular in the context of predicted future climate changes of similar, if not higher, amplitude.
First Year Of Impact 2013
Sector Environment,Healthcare,Culture, Heritage, Museums and Collections
Impact Types Cultural,Societal

Title 1,500 human mitochondrial genomes 
Description 1,500 high-quality mitochondrial genomes from individuals in the HGDP-CEPH panel, and 500 Native Americans
Type Of Material Database/Collection of data 
Year Produced 2014 
Provided To Others? Yes  
Impact The data were submitted to GenBank very recently, and only a small fraction has been published so far.