Using genomes to dissect the speciation process - a comparative approach

Lead Research Organisation: University of Edinburgh
Department Name: Sch of Biological Sciences

Abstract

Although speciation is one of the most fundamental biological processes, we still know surprisingly little about it. For example, it is not known whether species splits are generally abrupt, as would be expected if speciation mostly occurs when populations become geographically separated, or whether speciation generally involves a protracted period of hybridisation that eventually ceases. It is also unclear what type of selection matters most for making species reproductively isolated: species may become isolated from each other simply by adapting to different environments or, alternatively, as a result of selection on traits that benefit only one sex. For example, in many insects, males increase their number of offspring by producing very many or very large sperm, while females have evolved mechanisms to kill, store and select sperm. This sexual antagonism leads to a reciprocal arms race and rapid evolution of reproductive traits, which in itself may be strong enough to drive speciation.

Because the genome of an individual is made up of contributions from an enormous number of ancestors, even a small sample of genomes contains a lot of information about the population of ancestors and about when and how fast this ancestral population split into distinct species. Since speciation is a slow process, the only chance to understand how species typically arise in nature is by extracting this genomic information about past speciation events. For example, recent comparisons of individual human genomes have shown that our own genomes are partly the result of past hybridisation between modern humans and more archaic forms such as Neandertals.

The main aim of my project is to use genomic data to estimate speciation histories and find out what factors drive speciation in nature. Comparing speciation histories across many different insect species and between different parts of the genome will allow me to answer fundamental questions about how new species are born. This work comes in two parts. Firstly, I will develop new statistical methods to reconstruct past speciation events from genome data. To make such inferences realistic, many biological processes that affect patterns of diversity in the genome must be incorporated into a mathematical model: during reproduction, genetic material is combined from different parents and passed on to successive generations by chance. While the splitting of populations leads to separated gene pools, individuals from different populations may migrate and hybridise, causing genes to "flow" between diverging species. In particular, I will focus on reconstructing the duration and direction of such gene flow after divergence, which gives a measure of how fast speciation has happened. Secondly, I will use these methods to ask how the process of speciation has played out in 40 species of wasps, flies, beetles and butterflies, many of which are common in the UK. I will sequence multiple individual genomes in 20 pairs of closely related species and compare speciation parameters between species pairs with more and less intense sexual antagonism, as indicated by their mating behaviour. This will reveal whether sexual antagonism speeds up speciation. A second comparison will explore the link between speciation and ecological specialisation by testing whether species that specialise on a small number of hosts generally evolve from generalists or vice versa. Finally, I will compare the speed at which sex chromosomes and autosomes become distinct during speciation to test whether genes important in speciation accumulate more rapidly on sex chromosomes.
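To make the modelling ingredients above concrete, here is a toy sketch, not the project's inference method. It simulates the coalescence time of two lineages under a simple isolation-with-migration history. The following assumptions are ours, for illustration only: population sizes are equal before and after the split, migration is symmetric at a scaled rate per lineage, and time is measured in units of 2N generations. The function name `pair_tmrca` is hypothetical.

```python
import random


def pair_tmrca(split_time: float, mig_rate: float, same_pop: bool,
               rng: random.Random) -> float:
    """Simulate the coalescence time of two lineages (toy IM model).

    Time runs backwards from the present in units of 2N generations, so two
    lineages in the same population coalesce at rate 1. Before `split_time`
    each lineage moves to the other daughter population at rate `mig_rate`;
    at `split_time` both lineages enter a single ancestral population.
    """
    t = 0.0
    while True:
        coal = 1.0 if same_pop else 0.0   # only co-located lineages can coalesce
        mig = 2.0 * mig_rate              # either of the two lineages may migrate
        total = coal + mig
        if total == 0.0:
            # no possible event before the split: wait for the ancestral population
            return split_time + rng.expovariate(1.0)
        dt = rng.expovariate(total)
        if t + dt >= split_time:
            # exponential waiting times are memoryless, so restart in the ancestor
            return split_time + rng.expovariate(1.0)
        t += dt
        if rng.random() < coal / total:
            return t                      # coalescence before the split
        same_pop = not same_pop           # a migration event moved one lineage
```

With `mig_rate=0.0`, two lineages sampled from different species can only coalesce after the split, so every simulated time exceeds `split_time`. Adding migration lets some pairs coalesce earlier. That shift is the signature of post-divergence gene flow that methods for reconstructing its duration and direction have to capture.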

This work will build a statistical framework for us to use genome sequences as a window into the past and to understand the role of selection, geography and hybridisation in speciation - an important step towards solving Darwin's mystery of mysteries of how species come about.

Planned Impact

The proposed research project spans a range of disciplines (bioinformatics, statistics, genomics, entomology and evolutionary biology) and seeks to develop new computational tools to analyse genome-wide variation and address fundamental questions about speciation. As such, my scientific research is pure rather than applied. However, there are a number of stakeholders for whom the methods developed in this project are of immediate relevance:

Applied research
Research into crop breeding increasingly uses genomic data for marker-assisted breeding and to identify natural populations and cultivars that contain genetic variation of potential interest for modern breeding programs. Many economically important horticultural crop species in the UK (apples, strawberries) are the result of past hybridisation events between several wild ancestral forms, and their complex population histories need to be taken into account explicitly in such analyses. Likewise, many insect (e.g. the pear sucker) and fungal crop pests have evolved very recently from less harmful or less abundant wild forms, and there is great commercial interest in understanding when and how this divergence took place and what genetic changes were involved. Many of the most devastating crop pests and pathogens are invasive species. Being able to reconstruct the histories of these species will help to identify key adaptive genes that have diverged rapidly or introgressed from other species. Such knowledge is crucial for identifying targets for chemical and biological control.

Conservation Biology
Conservation research is increasingly reliant on genetic data to determine conservation units and strategies and to optimise captive breeding programs (which in the case of the East Asian Sus species include a number of zoos in the UK). However, to date this is mainly restricted to small numbers of genetic markers such as microsatellites. With sequencing costs falling, it is only a matter of time until conservation biologists routinely base conservation strategies on genomic data. A major concern in conservation biology is to detect and minimise hybridisation between endangered wild populations and domestic forms and to understand the causes of population structure and size changes in the wild.

General Public
Finally, I see the main immediate societal impact of my research in increasing the public understanding of science, in particular research into speciation. There is generally great public interest in this topic, as demonstrated, for example, by the media coverage of the genomic analyses of our own history. However, speciation research (including my own past projects) is mainly focused on either very exotic systems that are unknown and inaccessible to the general public (e.g. African Lake Cichlids, Heliconius butterflies) and/or model species that are small and indistinguishable to non-experts (Drosophila fruit flies). As a result, speciation research is almost entirely disconnected from the activities of hobby naturalists (who have a long tradition in the UK) and may easily be perceived as arcane or irrelevant for understanding and conserving native wildlife. Likewise, organisations devoted to conservation in the UK, although very active in educating the public about the natural history of native species and the need to conserve them and their habitats, very rarely put this information into an evolutionary context. Many of the species I will investigate at a genomic level in this project are charismatic, large insects that occur in the UK and will be familiar to many hobby naturalists and gardeners (e.g. butterflies, dung beetles). This provides a unique opportunity to educate the general public about the evolutionary and geographic origin of our own fauna and to raise the public profile of genomics and speciation research (see Pathways to Impact).
 
Description - new and computationally efficient ways to infer past population processes (e.g. changes in population size, gene flow and splits between populations) from genomic data.

- a new method for estimating recombination rates from individual genomes.

- a new quantitative approach for identifying genomic outliers between species and populations.

- a new and general causal relationship between the number of chromosomes and the levels of genetic diversity in wild populations.

- a detailed understanding of the mode and timescale of speciation in European butterflies

- new mathematical predictions for the signal of past natural selection in genome-wide variation and a demonstration of how these can be used to characterise positive selection from sequence variation.
Exploitation Route Genomic regions may contain targets of past selection, and using genomic data to screen for such targets is of great interest in a wide range of fields, including archaeology, conservation biology and the control of disease vectors and agricultural and horticultural pests.
Sectors Agriculture

Food and Drink

Education

Environment

 
Description Efficient simulation and inference under approximate models of ancestry
Amount £358,032 (GBP)
Funding ID EP/X024881/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 06/2023 
End 06/2026
 
Description ModelGenomLand, ERC starting grant
Amount £1,521,119 (GBP)
Funding ID 757648 
Organisation European Research Council (ERC) 
Sector Public
Country Belgium
Start 02/2018 
End 02/2023
 
Title ABLE 
Description We make use of the distribution of blockwise SFS (bSFS) patterns for the inference of arbitrary population histories from multiple genome sequences. These can be whole genomes or fragmented data such as RADSeq. Our method notably allows for the simultaneous inference of demographic history and the genome-wide historical recombination rate. Additionally, we do not require phased genomes, as the bSFS approach does not distinguish the sampled lineage in which a mutation occurred. As with the Site Frequency Spectrum (SFS), we can also ignore outgroups by folding the bSFS. Our Approximate Blockwise Likelihood Estimation (ABLE) approach is implemented in C/C++, takes advantage of parallel computing power, and is tailored for studying the population histories of model as well as non-model species.
Type Of Material Data analysis technique 
Year Produced 2016 
Provided To Others? Yes  
Impact This method is now being used by several research groups to analyse population genomic data 
URL https://github.com/champost/ABLE
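The blockwise SFS idea can be illustrated with a short sketch. It is not ABLE's implementation, which is written in C/C++. It only shows how one block's configuration might be tallied, assuming each block is supplied as the derived-allele count at each of its sites across the n sampled genomes. Because only allele counts are used, no phasing is needed. Folding, which collapses a count k with n - k, removes the need to polarise alleles with an outgroup. The names `block_config` and `bsfs` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple


def block_config(derived_counts: Sequence[int], n: int,
                 folded: bool = False) -> Tuple[int, ...]:
    """Return one block's configuration: the number of sites in each
    frequency class (class k = derived allele carried by k of n genomes)."""
    n_classes = n // 2 if folded else n - 1
    config = [0] * n_classes
    for k in derived_counts:
        if not 0 <= k <= n:
            raise ValueError(f"allele count {k} outside 0..{n}")
        if k == 0 or k == n:
            continue  # monomorphic in the sample: no SFS information
        cls = min(k, n - k) if folded else k  # folding uses the minor-allele count
        config[cls - 1] += 1
    return tuple(config)


def bsfs(blocks: Iterable[Sequence[int]], n: int,
         folded: bool = False) -> Counter:
    """Tally how often each blockwise configuration occurs across blocks."""
    return Counter(block_config(b, n, folded) for b in blocks)
```

For example, with n = 4 genomes, a block whose sites have derived counts `[1, 1, 3]` has the unfolded configuration `(2, 0, 1)` and the folded configuration `(3, 0)`. A likelihood-based method would then compare the tallied table with the configuration probabilities expected under a candidate population history.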
 
Title Computing likelihoods under the coalescent 
Description The inference of demographic history from genome data is hindered by a lack of efficient computational approaches. In particular, it has proved difficult to exploit the information contained in the distribution of genealogies across the genome. We have previously shown that the generating function (GF) of genealogies can be used to analytically compute likelihoods of demographic models from configurations of mutations in short sequence blocks (Lohse et al. 2011). Although the GF has a simple, recursive form, the size of such likelihood calculations explodes quickly with the number of individuals and applications of this framework have so far been mainly limited to small samples (pairs and triplets) for which the GF can be written by hand. Here we investigate several strategies for exploiting the inherent symmetries of the coalescent. In particular, we show that the GF of genealogies can be decomposed into a set of equivalence classes that allows likelihood calculations from nontrivial samples. Using this strategy, we automated blockwise likelihood calculations for a general set of demographic scenarios in Mathematica. These histories may involve population size changes, continuous migration, discrete divergence, and admixture between multiple populations. 
Type Of Material Computer model/algorithm 
Year Produced 2016 
Provided To Others? Yes  
Impact The method is being used for population genomic analysis by a number of research groups
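The generating-function approach can be illustrated with the simplest case: two lineages sampled from one panmictic population. The conventions here are our assumptions for this sketch. Time is measured in units of 2N generations, so the coalescence time T is exponential with rate 1, and mutations fall on each lineage at rate θ/2 per block, where θ = 4Nμ. Under these conventions, differentiating the Laplace transform of the total branch length L = 2T gives the probability of observing k mutations in a block:

```latex
\begin{align}
\psi(\omega) &= \mathbb{E}\left[e^{-\omega L}\right]
  = \mathbb{E}\left[e^{-2\omega T}\right]
  = \frac{1}{1+2\omega},
  \qquad T \sim \mathrm{Exp}(1),\; L = 2T,\\
P(S = k) &= \frac{(-\lambda)^k}{k!}\,
  \frac{\partial^k \psi}{\partial \omega^k}\bigg|_{\omega=\lambda}
  = \frac{(2\lambda)^k}{(1+2\lambda)^{k+1}}
  = \frac{\theta^k}{(1+\theta)^{k+1}},
  \qquad \lambda = \frac{\theta}{2}.
\end{align}
```

This recovers the familiar geometric distribution of pairwise differences. For larger samples and structured histories, the GF has many more terms. That growth in computation is what the equivalence-class decomposition described above is meant to contain.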
 
Title Data from: Demographically explicit scans for barriers to gene flow using gIMble 
Description Identifying regions of the genome that act as barriers to gene flow between recently diverged taxa has remained challenging given the many evolutionary forces that generate variation in genetic diversity and divergence along the genome, and the stochastic nature of this variation. Here we implement a composite likelihood approach for the quantification of barriers to gene flow. This analytic framework captures background selection and selection against locally maladaptive alleles (i.e. genomic barriers) in a model of isolation with migration (IM) as heterogeneity in effective population size (Ne) and effective migration rate (me), respectively. Variation in both effective demographic parameters is estimated in sliding windows via pre-computed likelihood grids. We have implemented genomewide IM blockwise likelihood estimation (gIMble) as a modular tool, which includes modules for pre-processing/filtering of genomic data and performing parametric bootstraps using coalescent simulations. To demonstrate the new approach, we analyse data from a well-studied sister species pair of tropical butterflies with a known history of post-divergence gene flow: Heliconius melpomene and H. cydno. Our analysis uncovers both large effect barrier loci (including well-known wing-pattern genes) and a genome-wide signal of polygenic barrier architecture. 
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
Impact The ability to identify and characterise genomic regions involved in local adaptation and speciation
URL https://datadryad.org/stash/dataset/doi:10.5061/dryad.4j0zpc8jc
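The sliding-window step described above can be sketched as follows. The sketch assumes the likelihood grid has already been computed as a mapping from each grid point, here a hypothetical (Ne, me) pair, to the probability of each blockwise configuration. Blocks within a window are treated as independent, which is what makes the likelihood composite. This is not gIMble's code or API: `composite_loglik`, `best_grid_point` and `scan` are illustrative names, and computing the grid itself (the hard part) is out of scope.

```python
import math
from typing import Dict, List, Sequence, Tuple

Config = Tuple[int, ...]        # one blockwise mutation configuration
GridPoint = Tuple[float, float]  # hypothetical (Ne, me) parameter pair
Grid = Dict[GridPoint, Dict[Config, float]]


def composite_loglik(window_counts: Dict[Config, int],
                     probs: Dict[Config, float]) -> float:
    """Sum of count * log P(config) over configurations seen in a window."""
    total = 0.0
    for cfg, count in window_counts.items():
        p = probs.get(cfg, 0.0)
        if p <= 0.0:
            return -math.inf  # configuration impossible at this grid point
        total += count * math.log(p)
    return total


def best_grid_point(window_counts: Dict[Config, int], grid: Grid) -> GridPoint:
    """Grid point with the highest composite log-likelihood for one window."""
    return max(grid, key=lambda point: composite_loglik(window_counts, grid[point]))


def scan(windows: Sequence[Dict[Config, int]], grid: Grid) -> List[GridPoint]:
    """Per-window maximum composite-likelihood estimates along the genome."""
    return [best_grid_point(w, grid) for w in windows]
```

Because the grid is precomputed, each window only costs a lookup and a weighted sum per grid point. That is what makes a genome-wide scan of local Ne and me estimates cheap once the grid exists.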
 
Title Data from: Do chromosome rearrangements fix by genetic drift or natural selection? Insights from Brenthis butterflies 
Description Large-scale chromosome rearrangements, such as fissions and fusions, are a common feature of eukaryote evolution. They can have considerable influence on the evolution of populations, yet it remains unclear exactly how rearrangements become established and eventually fix. Rearrangements could fix by genetic drift if they are weakly deleterious or neutral, or they may instead be favoured by positive natural selection. Here we compare genome assemblies of three closely related Brenthis butterfly species and characterise a complex history of fission and fusion rearrangements. An inferred demographic history of these species suggests that rearrangements became fixed in populations with large long-term effective size (Ne). However, we also find large runs of homozygosity within individual genomes and show that a model of population structure with smaller local Ne can reconcile these observations. Using a recently developed analytic framework for characterising hard selective sweeps, we find that chromosome fusions are not enriched for evidence of past sweeps compared to other regions of the genome. Nonetheless, one chromosome fusion in the B. daphne genome is associated with a valley of diversity where genealogical branch lengths are distorted, consistent with a selective sweep. Our results suggest that drift is a stronger force in these populations than suggested by overall genetic diversity, but that the fixation of strongly underdominant rearrangements remains unlikely. Additionally, although chromosome fusions do not typically exhibit signatures of selective sweeps, a single example raises the possibility that natural selection may sometimes play a role in their fixation. 
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
URL https://datadryad.org/stash/dataset/doi:10.5061/dryad.cnp5hqcbf
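The runs of homozygosity mentioned above can be located with a deliberately naive rule, sketched here under our own assumptions. A run is any stretch of at least `min_length` bp that contains no heterozygous site. Published ROH callers often tolerate a few heterozygous or missing calls and use other thresholds, so this is not the analysis used for the Brenthis genomes. The function name and parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def runs_of_homozygosity(het_positions: Sequence[int], chrom_length: int,
                         min_length: int) -> List[Tuple[int, int]]:
    """Return (start, end) of heterozygosity-free stretches >= min_length bp.

    Positions are 0-based and `end` is exclusive.
    """
    runs = []
    start = 0  # first position of the current het-free stretch
    for pos in sorted(het_positions):
        if pos - start >= min_length:
            runs.append((start, pos))
        start = pos + 1  # next stretch begins just after this het site
    if chrom_length - start >= min_length:
        runs.append((start, chrom_length))  # stretch running to the chromosome end
    return runs
```

For example, with heterozygous sites at 100, 150, 1000 and 1010 on a 2,000 bp sequence and `min_length=500`, the function reports the runs `(151, 1000)` and `(1011, 2000)`.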
 
Title Data from: Para-allopatry in hybridizing fire-bellied toads (Bombina bombina and B. variegata): inference from transcriptome-wide coalescence analyses 
Description Ancient origins, profound ecological divergence, and extensive hybridization make the fire-bellied toads Bombina bombina and B. variegata (Anura: Bombinatoridae) an intriguing test case of ecological speciation. Previous modeling has proposed that the narrow Bombina hybrid zones represent strong barriers to neutral introgression. We test this prediction by inferring the rate of gene exchange between pure populations on either side of the intensively studied Kraków transect. We developed a method to extract high confidence sets of orthologous genes from de novo transcriptome assemblies, fitted a range of divergence models to these data and assessed their relative support with analytic likelihood calculations. There was clear evidence for postdivergence gene flow, but, as expected, no perceptible signal of recent introgression via the nearby hybrid zone. The analysis of two additional Bombina taxa (B. v. scabra and B. orientalis) validated our parameter estimates against a larger set of prior expectations. Despite substantial cumulative introgression over millions of years, adaptive divergence of the hybridizing taxa is essentially unaffected by their lack of reproductive isolation. Extended distribution ranges also buffer them against small-scale environmental perturbations that have been shown to reverse the speciation process in other, more recent ecotypes. 
Type Of Material Database/Collection of data 
Year Produced 2016 
Provided To Others? Yes  
URL https://datadryad.org/stash/dataset/doi:10.5061/dryad.88r69
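The abstract does not say how orthologous genes were extracted. One widely used heuristic is reciprocal best hits, shown here only as an illustration and not necessarily the authors' method. Gene a in assembly A and gene b in assembly B are paired when each is the other's highest-scoring match. The input is assumed to be precomputed similarity scores, such as alignment bit scores, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

Hits = Dict[str, Dict[str, float]]  # query gene -> {subject gene: score}


def reciprocal_best_hits(a_to_b: Hits, b_to_a: Hits) -> List[Tuple[str, str]]:
    """Return sorted (a, b) pairs that are each other's best-scoring hit."""
    best_ab = {q: max(hits, key=hits.get) for q, hits in a_to_b.items() if hits}
    best_ba = {q: max(hits, key=hits.get) for q, hits in b_to_a.items() if hits}
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)
```

When two hits tie, `max` keeps the first one encountered. A real pipeline would therefore need an explicit tie-breaking or filtering rule to keep only high-confidence orthologue pairs.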
 
Title gIMble 
Description A bioinformatic toolset for demographically explicit genome scans for species barriers.
Type Of Material Computer model/algorithm 
Year Produced 2023 
Provided To Others? Yes  
Impact My research team organised and ran a two-day workshop for the research community as part of an SMBE satellite meeting in May 2019, showcasing this method.
URL https://github.com/LohseLab/gIMble
 
Description "Integration of speciation research" ESEB funded network grant 
Organisation Institute of Science and Technology Austria
Country Austria 
Sector Academic/University 
PI Contribution - helped to organise an international workshop on speciation research - helped moderate a series of monthly online seminars and discussions - helped design a database for speciation data
Collaborator Contribution - helped to organise an international workshop on speciation research - helped moderate a series of monthly online seminars and discussions - helped design a database for speciation data
Impact see publication outputs
Start Year 2021
 
Description "Integration of speciation research" ESEB funded network grant 
Organisation University of Amsterdam
Department Institute for Biodiversity and Ecosystem Dynamics (IBED)
Country Netherlands 
Sector Academic/University 
PI Contribution - helped to organise an international workshop on speciation research - helped moderate a series of monthly online seminars and discussions - helped design a database for speciation data
Collaborator Contribution - helped to organise an international workshop on speciation research - helped moderate a series of monthly online seminars and discussions - helped design a database for speciation data
Impact see publication outputs
Start Year 2021
 
Description "Integration of speciation research" ESEB funded network grant 
Organisation University of Montpellier
Country France 
Sector Academic/University 
PI Contribution - helped to organise an international workshop on speciation research - helped moderate a series of monthly online seminars and discussions - helped design a database for speciation data
Collaborator Contribution - helped to organise an international workshop on speciation research - helped moderate a series of monthly online seminars and discussions - helped design a database for speciation data
Impact see publication outputs
Start Year 2021
 
Description "Integration of speciation research" ESEB funded network grant 
Organisation University of Sheffield
Country United Kingdom 
Sector Academic/University 
PI Contribution - helped to organise an international workshop on speciation research - helped moderate a series of monthly online seminars and discussions - helped design a database for speciation data
Collaborator Contribution - helped to organise an international workshop on speciation research - helped moderate a series of monthly online seminars and discussions - helped design a database for speciation data
Impact see publication outputs
Start Year 2021
 
Description Butterfly speciation genomics 
Organisation University of Exeter
Country United Kingdom 
Sector Academic/University 
PI Contribution Wet lab work and sequencing for 40 butterfly species. Genome assembly and annotation for 40 species
Collaborator Contribution Wet lab work and sequencing for 40 butterfly species
Impact 10.1101/534123
Start Year 2017
 
Description Speciation genomics in skipper butterflies 
Organisation Institute of Evolutionary Biology
Country Spain 
Sector Private 
PI Contribution - Generation of a reference assembly for Spialia orbifer - Hosting a visiting PhD student (March to May 2020)
Collaborator Contribution - Tissue samples for whole genome sequencing - Ecology and range data - PhD student visit
Impact n/a
Start Year 2018
 
Description UK Tree of Life 
Organisation The Wellcome Trust Sanger Institute
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution My lab is providing samples and taxonomic and bioinformatic expertise to the Darwin Tree of Life initiative to generate chromosome-level genome assemblies for all 58 species of butterfly occurring in the UK.
Collaborator Contribution The Wellcome Trust Sanger Institute covers all wet lab and sequencing costs and will also generate genome assemblies
Impact Still ongoing
Start Year 2019
 
Title DRL/gIMble: gimble v1.0.3 
Description A genome-wide IM blockwise likelihood estimation toolkit 
Type Of Technology Software 
Year Produced 2023 
Impact A general statistical tool for identifying genomic regions associated with local adaptation and/or speciation in population genomic data 
URL https://zenodo.org/record/8006869
 
Description School visit (Edinburgh) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact Three interactive sessions on butterfly biology and conservation in a local primary school (P3). Following this activity all three classes started a biology project on butterfly development.
Year(s) Of Engagement Activity 2017