Using genomes to dissect the speciation process - a comparative approach
Lead Research Organisation:
University of Edinburgh
Department Name: Sch of Biological Sciences
Abstract
Although speciation is one of the most fundamental biological processes, we still know surprisingly little about it. For example, it is not known whether species splits are generally abrupt, which would be expected if speciation mostly occurs as a result of populations becoming separated in different places, or whether speciation generally involves a protracted period of hybridisation which eventually ceases. It is also unclear what type of selection matters most for making species reproductively isolated: species may become isolated from each other simply by adapting to different environments or, alternatively, as a result of selection on traits that only benefit one sex. For example, in many insects, males increase the number of offspring by producing very many or very large sperm, while females have evolved mechanisms to kill, store and select sperm. This sexual antagonism leads to a reciprocal arms race and rapid evolution of reproductive traits, which in itself may be strong enough to drive speciation.
Because the genome of an individual is made up of contributions from an enormous number of ancestors, even a small sample of genomes contains a lot of information about the population of ancestors and about when and how fast this ancestral population split into distinct species. Since speciation is a slow process, the only chance to understand how species typically arise in nature is by extracting this genomic information about past speciation events. For example, recent comparisons of individual human genomes have shown that our own genomes are a result of past hybridisation between modern humans and more archaic forms such as Neandertals.
The main aim of my project is to use genomic data to estimate speciation histories and find out what factors drive speciation in nature. Comparing speciation histories across many different insect species and between different parts of the genome will allow me to answer fundamental questions about how new species are born. This work comes in two parts. Firstly, I will develop new statistical methods to reconstruct past speciation events from genome data. To make such inferences realistic, many biological processes that affect patterns of diversity in the genome must be incorporated into a mathematical model: during reproduction, genetic material is combined from different parents and passed on to successive generations by chance. While the splitting of populations leads to separated gene pools, individuals from different populations may migrate and hybridise, causing genes to "flow" between diverging species. In particular, I will focus on reconstructing the duration and direction of such gene flow after divergence, which gives a measure of how fast speciation has happened. Secondly, I will use these methods to ask how the process of speciation has played out in 40 species of wasps, flies, beetles and butterflies, many of which are common in the UK. I will sequence multiple individual genomes in 20 pairs of closely related species and compare speciation parameters between species pairs with more and less intensive sexual antagonism, as indicated by their mating behaviour. This will reveal whether sexual antagonism speeds up speciation. A second comparison will explore the link between speciation and ecological specialisation by testing whether species that specialise on a small number of hosts generally evolve from generalists or vice versa. Finally, I will compare the speed at which sex chromosomes and autosomes become distinct during speciation to test whether genes important in speciation accumulate more rapidly on sex chromosomes.
This work will build a statistical framework for us to use genome sequences as a window into the past and to understand the role of selection, geography and hybridisation in speciation - an important step towards solving Darwin's mystery of mysteries of how species come about.
Planned Impact
The proposed research project spans a range of disciplines (bioinformatics, statistics, genomics, entomology and evolutionary biology) and seeks to develop new computational tools to analyse genome-wide variation and address fundamental questions about speciation. As such, my scientific research is pure rather than applied. However, there are a number of stakeholders for whom the methods developed in this project are of immediate relevance:
Applied research
Research into crop breeding increasingly uses genomic data for marker-assisted breeding and to identify natural populations and cultivars that contain genetic variation of potential interest for modern breeding programs. Many economically important horticultural crop species in the UK (apples, strawberries) are the result of past hybridisation events between several wild ancestral forms, and their complex population histories need to be taken into account explicitly in such analyses. Likewise, many insect (e.g. the pear sucker) and fungal crop pests have evolved very recently from less harmful or abundant wild forms, and there is great commercial interest in understanding when and how this divergence took place and what genetic changes were involved. In many cases, some of the most devastating crop pests and pathogens are invasive species. Being able to reconstruct the histories of these species will help to identify key adaptive genes that have diverged rapidly or introgressed from other species. Such knowledge is crucial for identifying targets for chemical and biological control.
Conservation Biology
Conservation research is increasingly reliant on genetic data to determine conservation units and strategies and to optimise captive breeding programs (which in the case of the East Asian Sus species include a number of zoos in the UK). However, to date this is mainly restricted to small numbers of genetic markers such as microsatellites. With sequencing costs falling, it is only a matter of time until conservation biologists routinely base conservation strategies on genomic data. A major concern in conservation biology is to detect and minimise hybridisation between endangered wild populations and domestic forms and to understand the causes of population structure and size changes in the wild.
General Public
Finally, I see the main immediate societal impact of my research in increasing the public understanding of science, in particular research into speciation. There is generally great public interest in this topic, as demonstrated, for example, by the media coverage of the genomic analyses of our own history. However, speciation research (including my own projects in the past) is mainly focused on either very exotic systems that are unknown and inaccessible to the general public (e.g. African Lake Cichlids, Heliconius butterflies) and/or model species that are small and indistinguishable to non-experts (Drosophila fruit flies). As a result, speciation research is almost entirely disconnected from the activities of hobby naturalists (who have a long tradition in the UK) and may easily be perceived as arcane or irrelevant for understanding and conserving native wildlife. Likewise, organisations devoted to conservation in the UK, although very active in educating the public about the natural history of native species and the need to conserve them and their habitats, very rarely put this information into an evolutionary context. Many of the species I will investigate at a genomic level in this project are charismatic, big insects that occur in the UK and will be familiar to many hobby naturalists and gardeners (e.g. butterflies, dung beetles). This provides a unique opportunity to educate the general public about the evolutionary and geographic origin of our own fauna and raise the public profile of genomics and speciation research (see Pathways to Impact).
People |
ORCID iD |
Konrad Lohse (Principal Investigator / Fellow) |
Publications
Hayward A
(2022)
The genome sequence of the silver-studded blue, Plebejus argus (Linnaeus, 1758).
in Wellcome Open Research
Hayward A
(2023)
The genome sequence of the Brown Argus, Aricia agestis (Denis & Schiffermüller, 1775)
in Wellcome Open Research
Hayward A
(2022)
The genome sequence of the grizzled skipper, Pyrgus malvae (Linnaeus, 1758)
in Wellcome Open Research
Kelleher J
(2020)
Coalescent Simulation with msprime.
in Methods in molecular biology (Clifton, N.J.)
Kolora SRR
(2019)
Divergent evolution in the genomes of closely related lacertids, Lacerta viridis and L. bilineata, and implications for speciation.
in GigaScience
Laetsch DR
(2023)
Demographically explicit scans for barriers to gene flow using gIMble.
in PLoS genetics
Lohse K
(2021)
The genome sequence of the red admiral, Vanessa atalanta (Linnaeus, 1758)
in Wellcome Open Research
Lohse K
(2023)
The genome sequence of the Mazarine Blue, Cyaniris semiargus (Rottemburg, 1775)
in Wellcome Open Research
Description | - new and computationally efficient ways to infer past population processes (e.g. changes in population size, gene flow and splits between populations) from genomic data. - a new method for estimating recombination rates from individual genomes. - a new quantitative approach for identifying genomic outliers between species and populations. - a new and general causal relationship between the number of chromosomes and the levels of genetic diversity in wild populations. - a detailed understanding of the mode and timescale of speciation in European butterflies. - new mathematical predictions for the signal of past natural selection in genome-wide variation and a demonstration of how these can be used to characterise positive selection from sequence variation. |
Exploitation Route | Genomic regions may contain targets of past selection, and using genomic data to screen for such targets is of great interest in a wide range of fields, including archaeology, conservation biology and the control of disease vectors and agricultural and horticultural pests. |
Sectors | Agriculture, Food and Drink,Education,Environment |
Description | ModelGenomLand, ERC starting grant |
Amount | £1,521,119 (GBP) |
Funding ID | 757648 |
Organisation | European Research Council (ERC) |
Sector | Public |
Country | Belgium |
Start | 02/2018 |
End | 02/2023 |
Title | ABLE |
Description | We make use of the distribution of blockwise SFS (bSFS) patterns for the inference of arbitrary population histories from multiple genome sequences. The latter can be whole genomes or of a fragmented nature such as RADSeq data. Our method notably allows for the simultaneous inference of demographic history and the genome-wide historical recombination rate. Additionally, we do not require phased genomes as the bSFS approach does not distinguish the sampled lineage in which a mutation occurred. As with the Site Frequency Spectrum (SFS), we can also ignore outgroups by folding the bSFS. Our Approximate Blockwise Likelihood Estimation (ABLE) approach, implemented in C/C++ and taking advantage of parallel computing power, is tailored for studying the population histories of model as well as non-model species.
Type Of Material | Data analysis technique |
Year Produced | 2016 |
Provided To Others? | Yes |
Impact | This method is now being used by several research groups to analyse population genomic data |
URL | https://github.com/champost/ABLE |
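To make the bSFS idea in the ABLE entry concrete, the following is a minimal Python sketch of how a folded blockwise SFS could be tallied for a single population from unphased allele counts. It is an illustration only and not ABLE's C/C++ implementation; the function name `folded_bsfs`, its parameters and the toy data are hypothetical.

```python
from collections import Counter

def folded_bsfs(positions, allele_counts, n_chrom, block_len, seq_len):
    """Tally folded blockwise SFS configurations (illustrative sketch).

    positions     -- 0-based variant positions along one sequence
    allele_counts -- copies of the alternate allele at each variant,
                     out of n_chrom sampled chromosomes (no phasing needed)
    Returns a Counter mapping each per-block mutation-count vector
    (one entry per minor-allele-count class 1..n_chrom//2) to the
    number of blocks showing that vector.
    """
    n_classes = n_chrom // 2
    n_blocks = seq_len // block_len  # a trailing partial block is discarded
    blocks = [[0] * n_classes for _ in range(n_blocks)]
    for pos, ac in zip(positions, allele_counts):
        b = pos // block_len
        if b >= n_blocks:
            continue  # variant falls in the discarded partial block
        minor = min(ac, n_chrom - ac)  # folding: no outgroup required
        if minor == 0:
            continue  # site is not variable within the sample
        blocks[b][minor - 1] += 1
    return Counter(tuple(v) for v in blocks)

# Hypothetical toy data: 4 chromosomes, three blocks of 10 bp.
bsfs = folded_bsfs(
    positions=[1, 3, 12, 25, 29, 31],
    allele_counts=[1, 3, 2, 4, 1, 2],
    n_chrom=4, block_len=10, seq_len=30,
)
print(bsfs)
```

Folding by minor allele count is what removes the need for an outgroup, and working from allele counts rather than haplotypes is why phasing is not required. A joint bSFS across several populations would need one count vector per population, which this sketch omits.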
Title | Computing likelihoods under the coalescent |
Description | The inference of demographic history from genome data is hindered by a lack of efficient computational approaches. In particular, it has proved difficult to exploit the information contained in the distribution of genealogies across the genome. We have previously shown that the generating function (GF) of genealogies can be used to analytically compute likelihoods of demographic models from configurations of mutations in short sequence blocks (Lohse et al. 2011). Although the GF has a simple, recursive form, the size of such likelihood calculations explodes quickly with the number of individuals and applications of this framework have so far been mainly limited to small samples (pairs and triplets) for which the GF can be written by hand. Here we investigate several strategies for exploiting the inherent symmetries of the coalescent. In particular, we show that the GF of genealogies can be decomposed into a set of equivalence classes that allows likelihood calculations from nontrivial samples. Using this strategy, we automated blockwise likelihood calculations for a general set of demographic scenarios in Mathematica. These histories may involve population size changes, continuous migration, discrete divergence, and admixture between multiple populations. |
Type Of Material | Computer model/algorithm |
Year Produced | 2016 |
Provided To Others? | Yes |
Impact | The method is being used for population genomic analysis by a number of research groups
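As a minimal worked case of the generating-function idea described in this entry, consider only the simplest setting: a pair of lineages sampled from one panmictic population of constant size. The symbols T, \omega, \theta and k are notation assumed here for illustration and do not come from the entry. Let T be the coalescence time in units of 2N generations, so that T is exponentially distributed with rate 1, and let \theta = 4N\mu be the scaled mutation rate per block, so that the number of mutations in a block given T is Poisson with mean \theta T. The GF of T and the probability of observing k mutations in a block are then:

```latex
\begin{align}
\psi(\omega) &= \mathbb{E}\left[e^{-\omega T}\right]
              = \int_0^\infty e^{-\omega t}\, e^{-t}\, dt
              = \frac{1}{1+\omega}, \\
P(k) &= \frac{(-\theta)^k}{k!}
        \left.\frac{\partial^k \psi}{\partial \omega^k}\right|_{\omega=\theta}
      = \frac{\theta^k}{(1+\theta)^{k+1}}.
\end{align}
```

Larger samples and histories with migration, divergence or admixture require the recursive form of the GF and the symmetry reductions described in the entry, which this example does not attempt.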
Title | gIMble |
Description | A bioinformatic toolset for demographically explicit genome scans for species barriers. |
Type Of Material | Computer model/algorithm |
Year Produced | 2019 |
Provided To Others? | Yes |
Impact | My research team organized and ran a two-day workshop for the research community as part of an SMBE satellite meeting in May 2019 showcasing this method. |
URL | https://github.com/DRL/gIMble |
Description | Butterfly speciation genomics |
Organisation | University of Exeter |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Wet lab work and sequencing for 40 butterfly species. Genome assembly and annotation for 40 species |
Collaborator Contribution | Wet lab work and sequencing for 40 butterfly species |
Impact | 10.1101/534123 |
Start Year | 2017 |
Description | Speciation genomics in skipper butterflies |
Organisation | Institute of Evolutionary Biology |
Country | Spain |
Sector | Private |
PI Contribution | - Generation of a reference assembly for Spialia orbifer - Hosting a visiting PhD student (March-May 2020) |
Collaborator Contribution | - Tissue samples for whole genome sequencing - Ecology and range data - PhD student visit |
Impact | n/a |
Start Year | 2018 |
Description | UK Tree of Life |
Organisation | The Wellcome Trust Sanger Institute |
Country | United Kingdom |
Sector | Charity/Non Profit |
PI Contribution | My lab is providing samples, taxonomic and bioinformatic expertise to the Darwin Tree of Life initiative to generate chromosomal level genome assemblies for all 58 species of butterfly occurring in the UK. |
Collaborator Contribution | The Wellcome Trust Sanger Institute covers all wet lab and sequencing costs and will also generate genome assemblies |
Impact | Still ongoing |
Start Year | 2019 |
Description | School visit (Edinburgh) |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Schools |
Results and Impact | Three interactive sessions on butterfly biology and conservation in a local primary school (P3). Following this activity all three classes started a biology project on butterfly development. |
Year(s) Of Engagement Activity | 2017 |