Multiple merger coalescent models in population genetics.

Lead Research Organisation: University of Oxford
Department Name: Statistics

Abstract

The proposed research lies at the rich interface between mathematics and population genetics. The main purpose of theoretical population genetics is to understand the ways in which the forces of mutation, natural selection, random genetic drift and population structure interact to produce and maintain the complex patterns of genetic variation observed within species. The first step is to distill our understanding of how these forces operate into a workable mathematical model whose predictions can then be compared with data. One of the outstanding successes of this approach is Kingman's coalescent, which provides a simple and elegant description of the genealogical trees relating individuals in a sample from a large 'panmictic' population. Variations of Kingman's coalescent allow for the introduction of more realistic assumptions about the population, such as spatial or genetic structure, selection and recombination. In the resulting 'ancestral influence graph', lineages then branch, migrate and coalesce.

However, comparison with data shows that these models are still inadequate. The first key observation is that genetic diversity is orders of magnitude lower than predicted by census population size and the 'standard' genetic drift captured by Kingman's coalescent. The explanation is that, whereas Kingman's coalescent assumes that the total number of offspring produced by a single individual is very small relative to the total population size, in reality offspring distributions can be very skewed. This can be driven by many things, for example large-scale extinction-recolonization events or repeated appearances of highly beneficial mutations that rapidly sweep through the population. As a result, when one examines the genealogical trees relating individuals in a sample from the population, they are best approximated by models in which multiple (by which we mean at least three) ancestral lineages can coalesce in a single event. This contrasts with Kingman's coalescent, in which only pairwise coalescences are allowed.

In recent work of Eldon and Wakeley in the journal Genetics, it is proposed that the reproductive biology of certain marine organisms (including Atlantic cod and Pacific oyster) dictates that we should use such multiple merger coalescents even before we consider demography and natural selection. These organisms are characterized by broadcast spawning, external fertilization, extremely high fecundity and high initial mortality. Similar considerations apply, for example, to some plant populations (which distribute pollen) and some insect populations (where individuals of one gender far outnumber those of the other). Eldon and Wakeley also point out that this mode of reproduction can account for the excess of single variants observed in sequence data for these organisms, a feature more usually attributed to other causes such as natural selection.

There are, then, at least three different mechanisms through which we are led to multiple merger coalescent models. But so far there has been surprisingly little analysis of what the most appropriate models should be. Within the vast collection of so-called lambda- and xi-coalescents, are there natural subclasses most suitable for modelling biological populations? And how can we distinguish between them? Almost certainly it will not suffice to look at just a single genetic locus; rather, we must understand correlations across loci. The starting point of this project is, through careful consideration of the biological mechanisms driving the population, to identify suitable classes of coalescent model. We must then understand the (multiple) ancestral selection and ancestral recombination graphs consistent with a given coalescent. The overarching aim is, through a mixture of analysis and simulation, to find ways to disentangle from genetic data the signals of the various demographic and genetic forces that have shaped the population.
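
The contrast between pairwise and multiple mergers can be illustrated with a small simulation. The sketch below is not taken from the project. It assumes, purely for illustration, the Beta(2 - alpha, alpha) family with 1 < alpha < 2, a commonly studied subclass of lambda-coalescents in which any given group of k out of b lineages merges at rate B(k - alpha, b - k + alpha) / B(2 - alpha, alpha). The function names and parameter values are hypothetical choices, and only the number of lineages is tracked, not the tree itself.

```python
import math
import random


def beta_fn(a, b):
    """Euler beta function B(a, b), computed via log-gamma for stability."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def merger_rates(b, alpha=None):
    """Total rate of a k-merger among b lineages, for each possible k.

    alpha=None gives Kingman's coalescent (pairwise mergers only).
    Otherwise the Beta(2 - alpha, alpha) coalescent is ASSUMED, 1 < alpha < 2.
    """
    if alpha is None:
        return {2: float(math.comb(b, 2))}
    norm = beta_fn(2 - alpha, alpha)
    return {
        k: math.comb(b, k) * beta_fn(k - alpha, b - k + alpha) / norm
        for k in range(2, b + 1)
    }


def simulate(n, alpha=None, rng=None):
    """Run the block-counting process from n lineages down to 1.

    Returns (list of merger sizes, time to the most recent common ancestor).
    """
    rng = rng or random.Random()
    b, t, sizes = n, 0.0, []
    while b > 1:
        rates = merger_rates(b, alpha)
        total = sum(rates.values())
        t += rng.expovariate(total)  # waiting time to the next merger
        k = rng.choices(list(rates), weights=list(rates.values()))[0]
        sizes.append(k)
        b -= k - 1  # k lineages are replaced by their common ancestor
    return sizes, t


if __name__ == "__main__":
    rng = random.Random(2024)
    print("Kingman merger sizes:", simulate(20, None, rng)[0])
    print("Beta(0.5, 1.5) merger sizes:", simulate(20, 1.5, rng)[0])
```

Under the Kingman setting every merger has size 2, so a sample of 20 needs exactly 19 events; under the assumed Beta setting, mergers of three or more lineages occur with positive probability and the genealogy is typically resolved in fewer events.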

 
Description Multiple merger coalescent models were introduced to model the way in which genes in individuals sampled from a population of high-fecundity organisms are related to one another. Such organisms include, for example, many species living in the intertidal zone. In contrast to the now classical Kingman coalescent, which underpins so much of modern statistical and population genetics, the theory of multiple merger coalescents, and especially the part required to perform statistical inference, is still in the early stages of development. Developing such theory lay at the heart of this proposal.

We considered both panmictic populations (that is, ones that have, for example, no spatial structure, so one can think of them as living in a huge melting pot) and spatially structured populations (through a relatively new model called the spatial Lambda-Fleming-Viot process). Not only did we develop new approaches to simulation but, in collaboration with German colleagues Birkner and Blath, the postdoc employed on this project presented (in the non-spatial setting) what one can call the ancestral recombination graph for a sample from a high-fecundity diploid population. In such populations, one must take account of recombination: individuals do not inherit a whole chromosome from each parent, but rather a mosaic of the two chromosomes that each parent carries. The corresponding ancestral recombination graph, which simultaneously traces the ancestry of all the different locations on a given chromosome, is a much more complicated object than in the classical Kingman case and required very careful analysis.

For populations in a spatial continuum, we also considered the effect of recombination. In contrast to the non-spatial case, when stretches of chromosome come from different parents, the ancestors of those stretches can move apart, thereby reducing the chance that they choose a common parent in a given generation. In a two-dimensional continuum, one can check that either those ancestral lineages will come back together very quickly, or they will stay apart for a long time. In various works with French, Austrian and British collaborators, we performed preliminary analyses of how this might be exploited to infer parameters of the model such as migration rates and local population size. This work has since been taken forward by others and shows considerable promise as a tool for inference in spatially distributed populations.

Since the grant ended, joint work with colleagues connected to the 100,000 Genomes Project has resulted in a new way of coding the information contained in the classical coalescent with recombination, which reduces computational needs by many orders of magnitude, and colleagues are now extending this to the multiple merger coalescents.
Exploitation Route It is to be hoped that our findings are the first step towards useful inference tools for analysing data from high-fecundity organisms, for which very basic questions are not yet well understood. An organism of particular interest to us is the Atlantic cod, for which some data is available and for which one would like to be able to disentangle the effects of high fecundity from the effects of selection. This could be important in assessing, for example, the impact of modern fishing techniques.
Sectors Environment

 
Description Population genomics of highly fecund codfish
Amount 149,900,000 ISK
Funding ID 185151-051 
Organisation Icelandic Research Fund 
Sector Public
Country Iceland
Start 09/2018 
End 08/2020