Using genomes to dissect the speciation process - a comparative approach

Lead Research Organisation: University of Edinburgh
Department Name: Sch of Biological Sciences

Abstract

Although speciation is one of the most fundamental biological processes, we still know surprisingly little about it. For example, it is not known whether species splits are generally abrupt, which would be expected if speciation mostly occurs as a result of populations becoming separated in different places, or whether speciation generally involves a protracted period of hybridisation which eventually ceases. It is also unclear what type of selection matters most for making species reproductively isolated: species may become isolated from each other simply by adapting to different environments or, alternatively, as a result of selection on traits that only benefit one sex. For example, in many insects, males increase their number of offspring by producing very many or very large sperm, while females have evolved mechanisms to kill, store and select sperm. This sexual antagonism leads to a reciprocal arms race and rapid evolution of reproductive traits, which in itself may be strong enough to drive speciation.

Because the genome of an individual is made up of contributions from an enormous number of ancestors, even a small sample of genomes contains a lot of information about the population of ancestors and about when and how fast this ancestral population split into distinct species. Since speciation is a slow process, the only chance to understand how species typically arise in nature is by extracting this genomic information about past speciation events. For example, recent comparisons of individual human genomes have shown that our own genomes are a result of past hybridisation between modern humans and more archaic forms such as Neandertals.

The main aim of my project is to use genomic data to estimate speciation histories and find out what factors drive speciation in nature. Comparing speciation histories across many different insect species and between different parts of the genome will allow me to answer fundamental questions about how new species are born. This work comes in two parts. Firstly, I will develop new statistical methods to reconstruct past speciation events from genome data. To make such inferences realistic, many biological processes that affect patterns of diversity in the genome must be incorporated into a mathematical model: during reproduction, genetic material is combined from different parents and passed on to successive generations by chance. While the splitting of populations leads to separated gene pools, individuals from different populations may migrate and hybridise, causing genes to "flow" between diverging species. In particular, I will focus on reconstructing the duration and direction of such gene flow after divergence, which gives a measure of how fast speciation has happened. Secondly, I will use these methods to ask how the process of speciation has played out in 40 species of wasps, flies, beetles and butterflies, many of which are common in the UK. I will sequence multiple individual genomes in 20 pairs of closely related species and compare speciation parameters between species pairs with more and less intensive sexual antagonism, as indicated by their mating behaviour. This will reveal whether sexual antagonism speeds up speciation. A second comparison will explore the link between speciation and ecological specialisation by testing whether species that specialise on a small number of hosts generally evolve from generalists or vice versa. Finally, I will compare the speed at which sex chromosomes and autosomes become distinct during speciation to test whether genes important in speciation accumulate more rapidly on sex chromosomes.

This work will build a statistical framework for us to use genome sequences as a window into the past and to understand the role of selection, geography and hybridisation in speciation - an important step towards solving Darwin's mystery of mysteries of how species come about.

Planned Impact

The proposed research project spans a range of disciplines (bioinformatics, statistics, genomics, entomology and evolutionary biology) and seeks to develop new computational tools to analyse genome-wide variation and address fundamental questions about speciation. As such, my scientific research is pure rather than applied. However, there are a number of stakeholders for whom the methods developed in this project are of immediate relevance:

Applied research
Research into crop breeding increasingly uses genomic data for marker-assisted breeding and to identify natural populations and cultivars that contain genetic variation of potential interest for modern breeding programs. Many economically important horticultural crop species in the UK (apples, strawberries) are the result of past hybridisation events between several wild ancestral forms, and their complex population histories need to be taken into account explicitly in such analyses. Likewise, many insect (e.g. the pear sucker) and fungal crop pests have evolved very recently from less harmful or abundant wild forms, and there is great commercial interest in understanding when and how this divergence took place and what genetic changes were involved. In many cases, some of the most devastating crop pests and pathogens are invasive species. Being able to reconstruct the histories of these species will help to identify key adaptive genes that have diverged rapidly or introgressed from other species. Such knowledge is crucial for identifying targets for chemical and biological control.

Conservation Biology
Conservation research is increasingly reliant on genetic data to determine conservation units and strategies and to optimise captive breeding programs (which in the case of the East Asian Sus species include a number of zoos in the UK). However, to date this is mainly restricted to small numbers of genetic markers such as microsatellites. With sequencing costs falling, it is only a matter of time until conservation biologists routinely base conservation strategies on genomic data. A major concern in conservation biology is to detect and minimise hybridisation between endangered wild populations and domestic forms and to understand the causes of population structure and size changes in the wild.

General Public
Finally, I see the main immediate societal impact of my research in increasing the public understanding of science, in particular research into speciation. There is generally great public interest in this topic, as demonstrated, for example, by the media coverage of the genomic analyses of our own history. However, speciation research (including my own projects in the past) is mainly focused on either very exotic systems that are unknown and inaccessible to the general public (e.g. African Lake Cichlids, Heliconius butterflies) and/or model species that are small and indistinguishable to non-experts (Drosophila fruit flies). As a result, speciation research is almost entirely disconnected from the activities of hobby naturalists (who have a long tradition in the UK) and may easily be perceived as arcane or irrelevant for understanding and conserving native wildlife. Likewise, organisations devoted to conservation in the UK, although very active in educating the public about the natural history of native species and the need to conserve them and their habitats, very rarely put this information into an evolutionary context. Many of the species I will investigate at a genomic level in this project are charismatic, big insects that occur in the UK and will be familiar to many hobby naturalists and gardeners (e.g. butterflies, dung beetles). This provides a unique opportunity to educate the general public about the evolutionary and geographic origin of our own fauna and raise the public profile of genomics and speciation research (see Pathways to Impact).
 
Description - new and computationally efficient ways to infer past population processes (e.g. changes in population size, gene flow and splits between populations) from genomic data.

- a new method for estimating recombination rates from individual genomes.

- a new quantitative approach for identifying genomic outliers between species and populations.

- a new and general causal relationship between the number of chromosomes and the levels of genetic diversity in wild populations.

- a detailed understanding of the mode and timescale of speciation in European butterflies

- new mathematical predictions for the signal of past natural selection in genome-wide variation and a demonstration of how these can be used to characterise positive selection from sequence variation.
Exploitation Route Genomic regions may contain targets of past selection, and using genomic data to screen for such targets is of great interest in a wide range of fields, including archaeology, conservation biology and the control of disease vectors and agricultural and horticultural pests.
Sectors Agriculture, Food and Drink,Education,Environment

 
Description ModelGenomLand, ERC starting grant
Amount £1,521,119 (GBP)
Funding ID 757648 
Organisation European Research Council (ERC) 
Sector Public
Country Belgium
Start 02/2018 
End 02/2023
 
Title ABLE 
Description We make use of the distribution of blockwise SFS (bSFS) patterns for the inference of arbitrary population histories from multiple genome sequences. The latter can be whole genomes or of a fragmented nature, such as RADSeq data. Our method notably allows for the simultaneous inference of demographic history and the genome-wide historical recombination rate. Additionally, we do not require phased genomes, as the bSFS approach does not distinguish the sampled lineage in which a mutation occurred. As with the Site Frequency Spectrum (SFS), we can also ignore outgroups by folding the bSFS. Our Approximate Blockwise Likelihood Estimation (ABLE) approach, implemented in C/C++ and taking advantage of parallel computing power, is tailored for studying the population histories of model as well as non-model species.
Type Of Material Data analysis technique 
Year Produced 2016 
Provided To Others? Yes  
Impact This method is now being used by several research groups to analyse population genomic data 
URL https://github.com/champost/ABLE
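The bSFS summary described in the ABLE entry can be illustrated with a minimal Python sketch. This is not the ABLE implementation (which is written in C/C++) and does not use its input format. It assumes a single population sample of n lineages and assumes each block is supplied as a list of derived-allele counts, one per variable site. The function names `block_sfs` and `bsfs` are hypothetical and introduced here for illustration only.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple


def block_sfs(block: Iterable[int], n: int, fold: bool = False) -> Tuple[int, ...]:
    """Summarise one sequence block as a vector of mutation counts per SFS class.

    `block` holds, for each variable site in the block, the number of sampled
    lineages (out of n) that carry the derived allele (assumed input format).
    """
    counts = [0] * (n - 1)
    for k in block:
        if not 0 < k < n:
            raise ValueError(f"allele count {k} is not a polymorphic class for n={n}")
        counts[k - 1] += 1
    if not fold:
        return tuple(counts)
    # Folding: without an outgroup, classes i and n - i cannot be told apart,
    # so they are merged into a single minor-allele class min(i, n - i).
    folded = [0] * (n // 2)
    for i, c in enumerate(counts, start=1):
        folded[min(i, n - i) - 1] += c
    return tuple(folded)


def bsfs(blocks: Iterable[Sequence[int]], n: int, fold: bool = False) -> Counter:
    """Tally how often each blockwise SFS configuration occurs across blocks."""
    return Counter(block_sfs(b, n, fold) for b in blocks)
```

Calling `bsfs(blocks, n)` returns a tally of configurations across blocks, which is the kind of summary a blockwise likelihood would be evaluated against. Because only per-site allele counts enter the tally, no phase information is needed, in line with the description above.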
 
Title Computing likelihoods under the coalescent 
Description The inference of demographic history from genome data is hindered by a lack of efficient computational approaches. In particular, it has proved difficult to exploit the information contained in the distribution of genealogies across the genome. We have previously shown that the generating function (GF) of genealogies can be used to analytically compute likelihoods of demographic models from configurations of mutations in short sequence blocks (Lohse et al. 2011). Although the GF has a simple, recursive form, the size of such likelihood calculations explodes quickly with the number of individuals and applications of this framework have so far been mainly limited to small samples (pairs and triplets) for which the GF can be written by hand. Here we investigate several strategies for exploiting the inherent symmetries of the coalescent. In particular, we show that the GF of genealogies can be decomposed into a set of equivalence classes that allows likelihood calculations from nontrivial samples. Using this strategy, we automated blockwise likelihood calculations for a general set of demographic scenarios in Mathematica. These histories may involve population size changes, continuous migration, discrete divergence, and admixture between multiple populations. 
Type Of Material Computer model/algorithm 
Year Produced 2016 
Provided To Others? Yes  
Impact The method is being used for population genomic analysis by a number of research groups.
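As a rough illustration of the blockwise likelihood idea in the entry above, the following Python sketch covers only the simplest textbook case and is not the automated Mathematica implementation. It assumes a pair of lineages from a single panmictic population of constant size, no recombination within blocks, and a scaled mutation rate `theta` per block chosen so that the expected number of differences in a block equals `theta`. Under these assumptions the coalescence time T is exponential with rate 1, its generating function is E[exp(-wT)] = 1/(1 + w), and the probability of observing k mutations in a block is theta^k / (1 + theta)^(k + 1). The function names are hypothetical.

```python
import math
from typing import Iterable


def pair_block_prob(k: int, theta: float) -> float:
    """Probability of k mutations in a block for a pair of lineages.

    Assumes one constant-size population and no within-block recombination,
    which gives P(k) = theta**k / (1 + theta)**(k + 1).
    """
    if k < 0 or theta <= 0:
        raise ValueError("k must be >= 0 and theta must be > 0")
    return theta**k / (1.0 + theta) ** (k + 1)


def log_likelihood(mutation_counts: Iterable[int], theta: float) -> float:
    """Log-likelihood of theta, treating blocks as independent (an assumption)."""
    return sum(math.log(pair_block_prob(k, theta)) for k in mutation_counts)
```

In this special case the maximum-likelihood estimate of `theta` is simply the mean number of mutations per block. Models with divergence, migration or admixture, and samples larger than a pair, require the generating function approach described in the entry.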
 
Title gIMble 
Description A bioinformatic toolset for demographically explicit genome scans for species barriers.
Type Of Material Computer model/algorithm 
Year Produced 2019 
Provided To Others? Yes  
Impact My research team organized and ran a 2 day workshop for the research community as part of an SMBE satellite meeting in May 2019 showcasing this method. 
URL https://github.com/DRL/gIMble
 
Description Butterfly speciation genomics 
Organisation University of Exeter
Country United Kingdom 
Sector Academic/University 
PI Contribution Wet lab work and sequencing for 40 butterfly species. Genome assembly and annotation for 40 species
Collaborator Contribution Wet lab work and sequencing for 40 butterfly species
Impact 10.1101/534123
Start Year 2017
 
Description Speciation genomics in skipper butterflies 
Organisation Institute of Evolutionary Biology
Country Spain 
Sector Private 
PI Contribution - Generation of a reference assembly for Spialia orbifer - Hosting a visiting PhD student (March-May 2020)
Collaborator Contribution - Tissue samples for whole genome sequencing - Ecology and range data - PhD student visit
Impact n/a
Start Year 2018
 
Description UK Tree of Life 
Organisation The Wellcome Trust Sanger Institute
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution My lab is providing samples, taxonomic and bioinformatic expertise to the Darwin Tree of Life initiative to generate chromosomal level genome assemblies for all 58 species of butterfly occurring in the UK.
Collaborator Contribution The Wellcome Trust Sanger Institute covers all wet lab and sequencing costs and will also generate genome assemblies
Impact Still ongoing
Start Year 2019
 
Description School visit (Edinburgh) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact Three interactive sessions on butterfly biology and conservation in a local primary school (P3). Following this activity all three classes started a biology project on butterfly development.
Year(s) Of Engagement Activity 2017