Using genomes to dissect the speciation process - a comparative approach
Lead Research Organisation:
University of Edinburgh
Department Name: Sch of Biological Sciences
Abstract
Although speciation is one of the most fundamental biological processes, we still know surprisingly little about it. For example, it is not known whether species splits are generally abrupt, which would be expected if speciation mostly occurs as a result of populations becoming geographically separated, or whether speciation generally involves a protracted period of hybridisation which eventually ceases. It is also unclear what type of selection matters most for making species reproductively isolated: species may become isolated from each other simply by adapting to different environments or, alternatively, as a result of selection on traits that benefit only one sex. For example, in many insects, males increase their number of offspring by producing very many or very large sperm, while females have evolved mechanisms to kill, store and select sperm. This sexual antagonism leads to a reciprocal arms race and rapid evolution of reproductive traits, which in itself may be strong enough to drive speciation.
Because the genome of an individual is made up of contributions from an enormous number of ancestors, even a small sample of genomes contains a lot of information about the ancestral population and about when and how fast it split into distinct species. Since speciation is too slow to observe directly, the only way to understand how species typically arise in nature is to extract this genomic information about past speciation events. For example, recent comparisons of individual human genomes have shown that our own genomes are partly the result of past hybridisation between modern humans and more archaic forms such as Neandertals.
The main aim of my project is to use genomic data to estimate speciation histories and find out what factors drive speciation in nature. Comparing speciation histories across many different insect species and between different parts of the genome will allow me to answer fundamental questions about how new species are born. This work comes in two parts. First, I will develop new statistical methods to reconstruct past speciation events from genome data. To make such inferences realistic, many biological processes that affect patterns of diversity in the genome must be incorporated into a mathematical model: during reproduction, genetic material is combined from different parents and passed on to successive generations by chance. While the splitting of populations leads to separate gene pools, individuals from different populations may migrate and hybridise, causing genes to "flow" between diverging species. In particular, I will focus on reconstructing the duration and direction of such gene flow after divergence, which gives a measure of how fast speciation has happened. Second, I will use these methods to ask how the process of speciation has played out in 40 species of wasps, flies, beetles and butterflies, many of which are common in the UK. I will sequence multiple individual genomes in 20 pairs of closely related species and compare speciation parameters between species pairs with more and less intense sexual antagonism, as indicated by their mating behaviour. This will test whether sexual antagonism speeds up speciation. A second comparison will explore the link between speciation and ecological specialisation by testing whether species that specialise on a small number of hosts generally evolve from generalists or vice versa. Finally, I will compare the speed at which sex chromosomes and autosomes become distinct during speciation to test whether genes important in speciation accumulate more rapidly on sex chromosomes.
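The gene-flow process described above can be illustrated with a minimal simulation sketch. It assumes a textbook isolation-with-migration (IM) setting, which is not necessarily the exact model used in this project. Under that setting, the two daughter populations and their ancestor share the same effective size. Time is measured backwards in units of 2N generations, so two lineages in the same population coalesce at rate 1. Each lineage switches population at a hypothetical scaled migration rate `m` until the split time `T`. The function names are illustrative only.

```python
import random


def coalescence_time_im(T, m, rng):
    """Time (backwards, in units of 2N generations) until two lineages,
    one sampled from each daughter population, find a common ancestor."""
    t = 0.0
    same_pop = False  # the lineages start in different daughter populations
    while t < T:
        rate_mig = 2.0 * m                    # either lineage may migrate
        rate_coal = 1.0 if same_pop else 0.0  # coalescence needs a shared population
        total = rate_mig + rate_coal
        if total == 0.0:
            break                             # nothing can happen before the split
        dt = rng.expovariate(total)
        if t + dt >= T:
            break                             # the split is reached first
        t += dt
        if rng.random() < rate_coal / total:
            return t                          # coalesced before the split
        same_pop = not same_pop               # a migration event moved one lineage
    # Both lineages are now in the ancestral population. Memorylessness lets
    # us restart the clock at T; they then coalesce at rate 1.
    return max(t, T) + rng.expovariate(1.0)


def mean_coalescence_time(T, m, reps=20000, seed=1):
    """Monte Carlo estimate of the mean between-species coalescence time."""
    rng = random.Random(seed)
    return sum(coalescence_time_im(T, m, rng) for _ in range(reps)) / reps
```

Without gene flow (`m = 0`), the expected between-species coalescence time is `T + 1`. Any post-divergence gene flow lowers it. The mean alone therefore cannot distinguish an old split with gene flow from a more recent split without it. This is one reason to use the full distribution of genealogies along the genome.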
This work will build a statistical framework for using genome sequences as a window into the past and for understanding the role of selection, geography and hybridisation in speciation: an important step towards solving Darwin's "mystery of mysteries" of how species come about.
Planned Impact
The proposed research project spans a range of disciplines (bioinformatics, statistics, genomics, entomology and evolutionary biology) and seeks to develop new computational tools to analyse genome-wide variation and address fundamental questions about speciation. As such, my scientific research is pure rather than applied. However, there are a number of stakeholders for whom the methods developed in this project are of immediate relevance:
Applied research
Research into crop breeding increasingly uses genomic data for marker-assisted breeding and to identify natural populations and cultivars that contain genetic variation of potential interest for modern breeding programs. Many economically important horticultural crop species in the UK (apples, strawberries) are the result of past hybridisation events between several wild ancestral forms, and their complex population histories need to be taken into account explicitly in such analyses. Likewise, many insect (e.g. the pear sucker) and fungal crop pests have evolved very recently from less harmful or less abundant wild forms, and there is great commercial interest in understanding when and how this divergence took place and what genetic changes were involved. Many of the most devastating crop pests and pathogens are invasive species. Being able to reconstruct the histories of these species will help to identify key adaptive genes that have diverged rapidly or introgressed from other species. Such knowledge is crucial for identifying targets for chemical and biological control.
Conservation Biology
Conservation research is increasingly reliant on genetic data to determine conservation units and strategies and to optimise captive breeding programs (which, in the case of the East Asian Sus species, include a number of zoos in the UK). However, to date this is mainly restricted to small numbers of genetic markers such as microsatellites. With sequencing costs falling, it is only a matter of time before conservation biologists routinely base conservation strategies on genomic data. A major concern in conservation biology is to detect and minimise hybridisation between endangered wild populations and domestic forms and to understand the causes of population structure and size changes in the wild.
General Public
Finally, I see the main immediate societal impact of my research in increasing the public understanding of science, in particular of research into speciation. There is generally great public interest in this topic, as demonstrated, for example, by the media coverage of the genomic analyses of our own history. However, speciation research (including my own past projects) mainly focuses on very exotic systems that are unknown and inaccessible to the general public (e.g. African lake cichlids, Heliconius butterflies) and/or on model species that are small and indistinguishable to non-experts (Drosophila fruit flies). As a result, speciation research is almost entirely disconnected from the activities of hobby naturalists (who have a long tradition in the UK) and may easily be perceived as arcane or irrelevant for understanding and conserving native wildlife. Likewise, organisations devoted to conservation in the UK, although very active in educating the public about the natural history of native species and the need to conserve them and their habitats, very rarely put this information into an evolutionary context. Many of the species I will investigate at a genomic level in this project are charismatic, large insects that occur in the UK and will be familiar to many hobby naturalists and gardeners (e.g. butterflies, dung beetles). This provides a unique opportunity to educate the general public about the evolutionary and geographic origin of our own fauna and to raise the public profile of genomics and speciation research (see Pathways to Impact).
Organisations
- University of Edinburgh (Lead Research Organisation)
- University of Sheffield (Collaboration)
- Institute of Science and Technology Austria (Collaboration)
- University of Montpellier (Collaboration)
- Institute of Evolutionary Biology (Collaboration)
- University of Amsterdam (Collaboration)
- University of Exeter (Collaboration)
- The Wellcome Trust Sanger Institute (Collaboration)
People
- Konrad Lohse (Principal Investigator / Fellow)
Publications
Beeravolu C
(2018)
ABLE: blockwise site frequency spectra for inferring complex population histories and recombination
in Genome Biology
Bishop G
(2021)
The genome sequence of the small tortoiseshell butterfly, Aglais urticae (Linnaeus, 1758)
in Wellcome Open Research
Bisschop G
(2021)
Sweeps in time: leveraging the joint distribution of branch lengths
in Genetics
Bisschop G
(2021)
Sweeps in time: leveraging the joint distribution of branch lengths
Bisschop G
(2020)
The impact of global selection on local adaptation and reproductive isolation.
in Philosophical transactions of the Royal Society of London. Series B, Biological sciences
Bisschop G
(2022)
The Laplace transform in population genetics: from theory to efficient algorithms
Bunnefeld L
(2018)
Whole-genome data reveal the complex history of a diverse ecological community.
in Proceedings of the National Academy of Sciences of the United States of America
| Description | - new and computationally efficient ways to infer past population processes (e.g. changes in population size, gene flow and splits between populations) from genomic data - a new method for estimating recombination rates from individual genomes - a new quantitative approach for identifying genomic outliers between species and populations - a new and general causal relationship between the number of chromosomes and the levels of genetic diversity in wild populations - a detailed understanding of the mode and timescale of speciation in European butterflies - new mathematical predictions for the signal of past natural selection in genome-wide variation, and a demonstration of how these can be used to characterise positive selection from sequence variation |
| Exploitation Route | Genomic regions may contain targets of past selection, and using genomic data to screen for such targets is of great interest in a wide range of fields, including archaeology, conservation biology and the control of disease vectors and agricultural and horticultural pests. |
| Sectors | Agriculture, Food and Drink; Education; Environment |
| Description | Efficient simulation and inference under approximate models of ancestry |
| Amount | £358,032 (GBP) |
| Funding ID | EP/X024881/1 |
| Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
| Sector | Public |
| Country | United Kingdom |
| Start | 06/2023 |
| End | 06/2026 |
| Description | ModelGenomLand, ERC starting grant |
| Amount | £1,521,119 (GBP) |
| Funding ID | 757648 |
| Organisation | European Research Council (ERC) |
| Sector | Public |
| Country | Belgium |
| Start | 02/2018 |
| End | 02/2023 |
| Title | ABLE |
| Description | We make use of the distribution of blockwise SFS (bSFS) patterns for the inference of arbitrary population histories from multiple genome sequences. These can be whole genomes or fragmented data such as RADSeq. Our method notably allows for the simultaneous inference of demographic history and the genome-wide historical recombination rate. Additionally, we do not require phased genomes, as the bSFS approach does not distinguish the sampled lineage in which a mutation occurred. As with the site frequency spectrum (SFS), we can also ignore outgroups by folding the bSFS. Our Approximate Blockwise Likelihood Estimation (ABLE) approach is implemented in C/C++, takes advantage of parallel computing and is tailored for studying the population histories of model as well as non-model species. |
| Type Of Material | Data analysis technique |
| Year Produced | 2016 |
| Provided To Others? | Yes |
| Impact | This method is now being used by several research groups to analyse population genomic data |
| URL | https://github.com/champost/ABLE |
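The bSFS summary on which ABLE is based can be computed as follows. This minimal sketch assumes a hypothetical input format that is not ABLE's own. Each block is a list of equal-length strings, one per sampled sequence, with alleles coded '0' (ancestral) and '1' (derived). For each block, the code counts segregating sites in each frequency class, using minor-allele classes when folded. It then tallies how often each per-block pattern occurs across blocks. Because only allele counts per site are used, the result does not depend on which sequence carries a mutation, so phasing is not needed. The function names are illustrative.

```python
from collections import Counter


def block_sfs(block, folded=True):
    """Per-block SFS: tuple of segregating-site counts per frequency class."""
    if len({len(seq) for seq in block}) != 1:
        raise ValueError("all sequences in a block must have the same length")
    n = len(block)
    n_classes = n // 2 if folded else n - 1
    counts = [0] * n_classes
    for site in zip(*block):                   # iterate over columns (sites)
        i = sum(allele == "1" for allele in site)
        if i == 0 or i == n:
            continue                           # monomorphic site
        k = min(i, n - i) if folded else i     # minor-allele count when folded
        counts[k - 1] += 1
    return tuple(counts)


def bsfs_distribution(blocks, folded=True):
    """Distribution of per-block SFS patterns across all blocks."""
    return Counter(block_sfs(block, folded) for block in blocks)
```

A likelihood method then compares these observed pattern frequencies with their probabilities under a demographic model.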
| Title | Computing likelihoods under the coalescent |
| Description | The inference of demographic history from genome data is hindered by a lack of efficient computational approaches. In particular, it has proved difficult to exploit the information contained in the distribution of genealogies across the genome. We have previously shown that the generating function (GF) of genealogies can be used to analytically compute likelihoods of demographic models from configurations of mutations in short sequence blocks (Lohse et al. 2011). Although the GF has a simple, recursive form, the size of such likelihood calculations explodes quickly with the number of individuals and applications of this framework have so far been mainly limited to small samples (pairs and triplets) for which the GF can be written by hand. Here we investigate several strategies for exploiting the inherent symmetries of the coalescent. In particular, we show that the GF of genealogies can be decomposed into a set of equivalence classes that allows likelihood calculations from nontrivial samples. Using this strategy, we automated blockwise likelihood calculations for a general set of demographic scenarios in Mathematica. These histories may involve population size changes, continuous migration, discrete divergence, and admixture between multiple populations. |
| Type Of Material | Computer model/algorithm |
| Year Produced | 2016 |
| Provided To Others? | Yes |
| Impact | The method is being used for population genomic analysis by a number of research groups |
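The simplest case of the generating-function (GF) approach can be worked through by hand: a pair of sequences from a single panmictic population. With time in units of 2N generations, the coalescence time T is exponential with rate 1, so its GF is psi(omega) = E[exp(-omega T)] = 1/(1 + omega). Let theta = 4N mu, where mu is the per-block mutation rate per generation. Mutations on the two branches then arise as a Poisson process with rate theta along T. Hence P(k) = (-theta)^k / k! × d^k psi / d omega^k evaluated at omega = theta, which gives P(k) = theta^k / (1 + theta)^(k + 1). The sketch below uses this closed form to compute a blockwise likelihood and estimate theta on a grid. The function names are hypothetical, and the code illustrates only this two-sample case, not the Mathematica implementation described above.

```python
import math


def log_p_k(k, theta):
    """log P(k mutations in a block) for a pair of lineages from one
    panmictic population: P(k) = theta^k / (1 + theta)^(k + 1)."""
    return k * math.log(theta) - (k + 1) * math.log1p(theta)


def log_likelihood(counts, theta):
    """Log-likelihood of independent blocks with the given mutation counts."""
    return sum(log_p_k(k, theta) for k in counts)


def mle_theta(counts, grid):
    """Grid-search maximum-likelihood estimate of theta."""
    return max(grid, key=lambda th: log_likelihood(counts, th))
```

For this model, the maximum-likelihood estimate of theta is the mean number of mutations per block, which provides a convenient check. Larger samples and structured histories need the equivalence-class decomposition described above, because the GF no longer has such a compact form.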
| Title | Data from: Demographically explicit scans for barriers to gene flow using gIMble |
| Description | Identifying regions of the genome that act as barriers to gene flow between recently diverged taxa has remained challenging given the many evolutionary forces that generate variation in genetic diversity and divergence along the genome, and the stochastic nature of this variation. Here we implement a composite likelihood approach for the quantification of barriers to gene flow. This analytic framework captures background selection and selection against locally maladaptive alleles (i.e. genomic barriers) in a model of isolation with migration (IM) as heterogeneity in effective population size (Ne) and effective migration rate (me), respectively. Variation in both effective demographic parameters is estimated in sliding windows via pre-computed likelihood grids. We have implemented genome-wide IM blockwise likelihood estimation (gIMble) as a modular tool, which includes modules for pre-processing/filtering of genomic data and performing parametric bootstraps using coalescent simulations. To demonstrate the new approach, we analyse data from a well-studied sister species pair of tropical butterflies with a known history of post-divergence gene flow: Heliconius melpomene and H. cydno. Our analysis uncovers both large effect barrier loci (including well-known wing-pattern genes) and a genome-wide signal of polygenic barrier architecture. |
| Type Of Material | Database/Collection of data |
| Year Produced | 2023 |
| Provided To Others? | Yes |
| Impact | The ability to identify and characterise genomic regions involved in local adaptation and speciation |
| URL | https://datadryad.org/stash/dataset/doi:10.5061/dryad.4j0zpc8jc |
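The sliding-window step described above can be sketched as a lookup against a pre-computed grid. This is a hypothetical simplification and does not reflect gIMble's data format or API. Each grid point is a pair of effective parameters (Ne, me), and each block in a window is summarised by a discrete configuration label. The composite log-likelihood of a window is the sum over configurations of count × log-probability. Blocks are treated as independent, which is what makes the likelihood "composite". All names and configuration labels are illustrative.

```python
import math


def to_log_grid(prob_grid):
    """Convert {(Ne, me): {configuration: probability}} to log-probabilities."""
    return {point: {c: math.log(q) for c, q in probs.items()}
            for point, probs in prob_grid.items()}


def best_grid_point(config_counts, log_grid):
    """Grid point maximising the composite log-likelihood of one window."""
    def composite_ll(logp):
        # Blocks are treated as independent: sum count * log P(config).
        return sum(n * logp[c] for c, n in config_counts.items())
    return max(log_grid, key=lambda point: composite_ll(log_grid[point]))


def scan_windows(windows, log_grid):
    """Best-fitting (Ne, me) grid point for each window of a genome scan."""
    return [best_grid_point(w, log_grid) for w in windows]
```

Pre-computing the grid once means that each window costs only a weighted sum per grid point. Windows whose best-fitting me is well below the genome-wide value are then candidate barrier regions.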
| Title | Data from: Do chromosome rearrangements fix by genetic drift or natural selection? Insights from Brenthis butterflies |
| Description | Large-scale chromosome rearrangements, such as fissions and fusions, are a common feature of eukaryote evolution. They can have considerable influence on the evolution of populations, yet it remains unclear exactly how rearrangements become established and eventually fix. Rearrangements could fix by genetic drift if they are weakly deleterious or neutral, or they may instead be favoured by positive natural selection. Here we compare genome assemblies of three closely related Brenthis butterfly species and characterise a complex history of fission and fusion rearrangements. An inferred demographic history of these species suggests that rearrangements became fixed in populations with large long-term effective size (Ne). However, we also find large runs of homozygosity within individual genomes and show that a model of population structure with smaller local Ne can reconcile these observations. Using a recently developed analytic framework for characterising hard selective sweeps, we find that chromosome fusions are not enriched for evidence of past sweeps compared to other regions of the genome. Nonetheless, one chromosome fusion in the B. daphne genome is associated with a valley of diversity where genealogical branch lengths are distorted, consistent with a selective sweep. Our results suggest that drift is a stronger force in these populations than suggested by overall genetic diversity, but that the fixation of strongly underdominant rearrangements remains unlikely. Additionally, although chromosome fusions do not typically exhibit signatures of selective sweeps, a single example raises the possibility that natural selection may sometimes play a role in their fixation. |
| Type Of Material | Database/Collection of data |
| Year Produced | 2023 |
| Provided To Others? | Yes |
| URL | https://datadryad.org/stash/dataset/doi:10.5061/dryad.cnp5hqcbf |
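Runs of homozygosity (ROH) such as those mentioned above can be located with a simple window-based scan. This sketch assumes per-window counts of heterozygous sites for one individual. The thresholds `max_het` and `min_windows` are hypothetical, and the approach is a generic illustration rather than the procedure used in the Brenthis study.

```python
def find_roh(het_counts, max_het=0, min_windows=3):
    """Return half-open (start, end) window index ranges in which every window
    has at most max_het heterozygous sites and the run spans >= min_windows."""
    runs = []
    start = None
    # A sentinel value above the threshold closes any run at the end.
    for i, h in enumerate(list(het_counts) + [max_het + 1]):
        if h <= max_het:
            if start is None:
                start = i           # a new candidate run begins
        else:
            if start is not None and i - start >= min_windows:
                runs.append((start, i))
            start = None
    return runs
```

Requiring several consecutive windows guards against calling isolated low-diversity windows as runs.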
| Title | Data from: Para-allopatry in hybridizing fire-bellied toads (Bombina bombina and B. variegata): inference from transcriptome-wide coalescence analyses |
| Description | Ancient origins, profound ecological divergence, and extensive hybridization make the fire-bellied toads Bombina bombina and B. variegata (Anura: Bombinatoridae) an intriguing test case of ecological speciation. Previous modeling has proposed that the narrow Bombina hybrid zones represent strong barriers to neutral introgression. We test this prediction by inferring the rate of gene exchange between pure populations on either side of the intensively studied Kraków transect. We developed a method to extract high confidence sets of orthologous genes from de novo transcriptome assemblies, fitted a range of divergence models to these data and assessed their relative support with analytic likelihood calculations. There was clear evidence for postdivergence gene flow, but, as expected, no perceptible signal of recent introgression via the nearby hybrid zone. The analysis of two additional Bombina taxa (B. v. scabra and B. orientalis) validated our parameter estimates against a larger set of prior expectations. Despite substantial cumulative introgression over millions of years, adaptive divergence of the hybridizing taxa is essentially unaffected by their lack of reproductive isolation. Extended distribution ranges also buffer them against small-scale environmental perturbations that have been shown to reverse the speciation process in other, more recent ecotypes. |
| Type Of Material | Database/Collection of data |
| Year Produced | 2016 |
| Provided To Others? | Yes |
| URL | https://datadryad.org/stash/dataset/doi:10.5061/dryad.88r69 |
| Title | gIMble |
| Description | A bioinformatic toolset for demographically explicit genome scans for species barriers. |
| Type Of Material | Computer model/algorithm |
| Year Produced | 2023 |
| Provided To Others? | Yes |
| Impact | My research team organised and ran a two-day workshop for the research community as part of an SMBE satellite meeting in May 2019 showcasing this method. |
| URL | https://github.com/LohseLab/gIMble |
| Description | "Integration of speciation research" ESEB funded network grant |
| Organisation | Institute of Science and Technology Austria |
| Country | Austria |
| Sector | Academic/University |
| PI Contribution | - helped to organise an international workshop on speciation research - helped moderate a series of monthly online seminars and discussions - helped design a database for speciation data |
| Collaborator Contribution | - helped to organise an international workshop on speciation research - helped moderate a series of monthly online seminars and discussions - helped design a database for speciation data |
| Impact | see publication outputs |
| Start Year | 2021 |
| Description | "Integration of speciation research" ESEB funded network grant |
| Organisation | University of Amsterdam |
| Department | Institute for Biodiversity and Ecosystem Dynamics (IBED) |
| Country | Netherlands |
| Sector | Academic/University |
| PI Contribution | - helped to organise an international workshop on speciation research - helped moderate a series of monthly online seminars and discussions - helped design a database for speciation data |
| Collaborator Contribution | - helped to organise an international workshop on speciation research - helped moderate a series of monthly online seminars and discussions - helped design a database for speciation data |
| Impact | see publication outputs |
| Start Year | 2021 |
| Description | "Integration of speciation research" ESEB funded network grant |
| Organisation | University of Montpellier |
| Country | France |
| Sector | Academic/University |
| PI Contribution | - helped to organise an international workshop on speciation research - helped moderate a series of monthly online seminars and discussions - helped design a database for speciation data |
| Collaborator Contribution | - helped to organise an international workshop on speciation research - helped moderate a series of monthly online seminars and discussions - helped design a database for speciation data |
| Impact | see publication outputs |
| Start Year | 2021 |
| Description | "Integration of speciation research" ESEB funded network grant |
| Organisation | University of Sheffield |
| Country | United Kingdom |
| Sector | Academic/University |
| PI Contribution | - helped to organise an international workshop on speciation research - helped moderate a series of monthly online seminars and discussions - helped design a database for speciation data |
| Collaborator Contribution | - helped to organise an international workshop on speciation research - helped moderate a series of monthly online seminars and discussions - helped design a database for speciation data |
| Impact | see publication outputs |
| Start Year | 2021 |
| Description | Butterfly speciation genomics |
| Organisation | University of Exeter |
| Country | United Kingdom |
| Sector | Academic/University |
| PI Contribution | Wet lab work and sequencing for 40 butterfly species. Genome assembly and annotation for 40 species |
| Collaborator Contribution | Wet lab work and sequencing for 40 butterfly species |
| Impact | 10.1101/534123 |
| Start Year | 2017 |
| Description | Speciation genomics in skipper butterflies |
| Organisation | Institute of Evolutionary Biology |
| Country | Spain |
| Sector | Private |
| PI Contribution | - Generation of a reference assembly for Spialia orbifer - Hosting a visiting PhD student (March to May 2020) |
| Collaborator Contribution | - Tissue samples for whole genome sequencing - Ecology and range data - PhD student visit |
| Impact | n/a |
| Start Year | 2018 |
| Description | UK Tree of Life |
| Organisation | The Wellcome Trust Sanger Institute |
| Country | United Kingdom |
| Sector | Charity/Non Profit |
| PI Contribution | My lab is providing samples and taxonomic and bioinformatic expertise to the Darwin Tree of Life initiative to generate chromosome-level genome assemblies for all 58 species of butterfly occurring in the UK. |
| Collaborator Contribution | The Wellcome Trust Sanger Institute covers all wet lab and sequencing costs and will also generate genome assemblies |
| Impact | Still ongoing |
| Start Year | 2019 |
| Title | DRL/gIMble: gimble v1.0.3 |
| Description | A genome-wide IM blockwise likelihood estimation toolkit |
| Type Of Technology | Software |
| Year Produced | 2023 |
| Impact | A general statistical tool for identifying genomic regions associated with local adaptation and/or speciation in population genomic data |
| URL | https://zenodo.org/record/8006869 |
| Description | School visit (Edinburgh) |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | Local |
| Primary Audience | Schools |
| Results and Impact | Three interactive sessions on butterfly biology and conservation in a local primary school (P3). Following this activity all three classes started a biology project on butterfly development. |
| Year(s) Of Engagement Activity | 2017 |