Efficient simulation and inference under approximate models of ancestry

Lead Research Organisation: University of Edinburgh
Department Name: Sch of Biological Sciences

Abstract

While large whole-genome data sets are now being generated routinely for many taxa and populations, analyses of these data remain superficial and largely descriptive. To make sense of the genetic variation present in samples of genomes, we need to relate it mathematically to the evolutionary processes that generated it. This requires mathematical models of genetic ancestry that are tractable, yet realistic, and general enough to capture all fundamental evolutionary forces. At a minimum, a null model of genomes sampled from a population should capture the randomness of meiotic recombination and the fact that most mutations are either neutral or deleterious, and so are likely to be removed from the population as a result of genetic drift and (background) selection. Although the ancestry of a sample of recombining genomes can be described mathematically as a graph, this full backward-in-time description does not scale to large populations and does not currently include background selection. As a result, it is currently impossible to simulate genomic variation efficiently even under the simplest biologically plausible null model. Statistical inference from genomic data is even more limited: state-of-the-art approaches for inferring past selection or demography from genomic data are based on crude (and extremely lossy) summaries of genome-wide variation.

This cross-disciplinary project brings together experts in computer science and mathematical biology and builds on recent breakthroughs to develop efficient approximate algorithms that accurately capture the effect of recombination and background selection on genome-wide ancestry and sequence variation. These algorithms will be implemented both as part of standard simulation software and in tools that calculate the fit of sequence data to models of past demography and selection. Such tools are fundamental for interpreting the vast volumes of genome sequence data that are now being generated across the tree of life. While the algorithms and tools to be developed are general, this project will immediately improve our ability to scan genomic data for signals of past positive selection whilst accounting for the randomness of ancestry.

Description: Inferring species barriers in genomes: bridging gaps between theory and data
Organisation: University of Lille
Country: France
Sector: Academic/University
PI Contribution: As part of this networking grant, the Lohse Lab hosted researchers working on barriers to gene flow from three labs in Lille (Fraisse) and Vienna (Sachdeva & Barton) for a 2-day workshop in September 2024.
Collaborator Contribution: Our partner labs in Lille and Vienna will host postdocs and PhD students for research/training visits.
Impact: The BridgeBarriers project will improve our ability to infer the genetic architecture of reproductive isolation by developing inference schemes based on theoretical advancements, and testing their power against realistic simulations as well as empirical data. We aim to identify species barriers at two spatio-temporal scales, that is, from admixed genomes in hybrid zones and from sequence polymorphisms in peripheral populations. This will allow us to compare the genetic architecture of reproductive isolation at these two scales, and thus better understand the overlap between loci that reduce the fitness of early-generation hybrids and loci whose introgression into the other species' background is impeded in the long term.
Start Year: 2024