Evolutionary ecological genomics of the great tit

Lead Research Organisation: University of Sheffield
Department Name: Animal and Plant Sciences

Abstract

Adaptation lies at the heart of Darwin's theory of evolution. For many years, biologists have tried to identify "adaptive genes" that are responsible for phenotypic changes that enable an organism to adapt to its environment. This research is important not only for understanding how Darwinian evolution comes about, but also for practical reasons. For instance, in a time of rapid environmental change, understanding the mechanisms and genetic basis of adaptation is critical to assessing the potential of wild organisms to respond to selective pressures. Studying adaptation is, however, challenging. It requires the use of rigorous mathematical models to analyse genomic DNA sequences, in order to help us understand how the genome of the organism in question evolves over time, and to identify genes showing unusual characteristics compared to the rest of the genome as a result of their involvement in recent adaptations. Thus far, this theory-based approach has mainly been applied to several well-studied organisms such as humans and fruit flies where the necessary resources are available. More recently, the rapid development of high-throughput DNA sequencing technologies has made it economical and efficient to study the genomes of wild organisms, including those studied by ecologists. In this proposal, we will use the theory-based approach to study a wild songbird, the great tit (Parus major). The great tit has been the focus of classical long-term ecological studies throughout Europe, and has been used to study key topics such as when and how the timing of breeding responds to climate change. We will use high-throughput sequencing technologies to obtain the genome sequences of multiple great tit individuals. By making use of, and further developing, state-of-the-art theoretical models, we will obtain a detailed understanding of how genetic variation is distributed across the great tit genome, and what evolutionary forces have influenced this distribution. 
This knowledge will enable us to search for genes that are responsible for recent adaptations, and will help us answer important questions such as whether genes affecting ecologically important traits such as the timing of breeding are involved in recent adaptations. In addition, we will examine to what extent the great tit genome differs from those of other birds such as the zebra finch, in order to understand the longer-term evolutionary history of genes of interest. These results will help us bridge the gaps in our knowledge of adaptation and genome evolution in wild great tits. More importantly, the methodology we employ is generic, and can be used to study other wild organisms. Thus, our research will be of interest to biologists working on a wide range of topics.

Planned Impact

Birds are of enormous ecological, societal and economic importance. Our research concerns the investigation of fundamental questions about genome evolution, as well as the mechanisms and genetic basis of adaptation, in a classic ecological model organism, the great tit (Parus major). As such, this work represents basic science for which the most direct beneficiaries will be biologists working on evolutionary/population genetics, avian genetics, ecological genetics, and genomics. However, a deeper understanding of the nature of adaptation and genome evolution in wild organisms will facilitate progress in management/conservation of biodiversity and improvement of domesticated species. In addition, our project will provide training in statistics and computer programming, which are widely transferable skills that are essential for both the academic and commercial sectors. We believe our research will be of interest to the following groups of beneficiaries:

1. The general public. Birds are widely appreciated organisms, with bird watching being one of the most popular hobbies in the UK. The research being carried out by members of the team, both inside and outside this proposal, will allow us to locate genes in the genome that underlie interesting phenotypic variation, demonstrate how natural selection has acted on these genes, and interpret these findings using knowledge about the ecology and natural history of great tits. These outputs can be turned into accessible stories that give the public a route to understanding basic genetic and evolutionary principles, which will spark interest in science and increase awareness of the importance of conserving biodiversity.

2. Policymakers and conservation managers. An accurate understanding of the mechanisms and the genetic basis of adaptation is critical to predicting how organisms respond to changing environments. Our project is directly relevant to this issue. In addition, the genetic data will allow us to infer how the size of the great tit population has changed over thousands of years into the past. This will place the knowledge about population dynamics gathered from long-term ecological studies on a much longer evolutionary timescale. This approach can also be applied to other wild species.

3. Scientists working on economically important domesticated birds. The avian karyotype is highly conserved even between distantly-related species. Thus, insights into genome evolution gathered from our research will be useful for studies of other birds such as chickens. For instance, our methods for detecting genes under positive selection can be used to search for genes that respond to artificial selection. In fact, some methods developed by the PI have already been used by other researchers for this purpose in chicken, rice, and sorghum. Identifying these genes will help to understand the genetic basis of economically important traits, which will in turn enable the design of better breeding programmes to improve productivity.
 
Description The Great Tit Genome Project
For over 50 years, the great tit (Parus major) has been a model species for evolutionary, ecological and behavioural research. In particular, great tits can learn socially in the wild, and solve complex learning tasks; there is also evidence that cognitive abilities may be important for their survival. Here, to provide further insight into the molecular mechanisms behind these important traits, we sequenced the genomes of 30 individuals from across Europe. Consistent with the importance of learning and cognition, we found that genes related to neuronal functions, learning and cognition were more likely to be under positive Darwinian selection than other genes. In addition, neuronal non-CpG methylation patterns in great tits are very similar to those observed in mammals, suggesting a universal role in neuronal epigenetic regulation, which can affect learning-, memory- and experience-induced plasticity. The high-quality genomic data obtained in this project will play an instrumental role in furthering the integration of ecological, evolutionary, behavioural and genomic approaches in this model species. This study was published in January 2016 in Nature Communications.

The Effects of Background and Interference Selection on Patterns of Genetic Variation in Subdivided Populations
In this theoretical study, we set out to address a fundamental question: how do natural selection against deleterious mutations and population structure jointly shape patterns of genetic variation? This is important because most new mutations that affect fitness are deleterious, and natural populations are often subdivided. Thus, a sound understanding of their joint effects is essential for analysing data collected from natural populations. To this end, we constructed new models, derived analytic equations, and devised a novel algorithm for generating samples under these complex models. We carried out extensive tests to validate our results. These tests showed that our new methods can accurately predict patterns of genetic diversity. Furthermore, we discovered a "scattered" sampling scheme, which could be exploited to effectively remove the effects of purifying selection on diversity patterns. Finally, we observed that there is a limit to how much patterns of diversity can tell us about the strength of selection. Overall, these results, as well as the simulation program we developed, provide useful tools and guidance for analysing data collected from real populations. This study was published in December 2015 in Genetics.

Genome-wide evidence for adaptive evolution in two wild passerine species with different effective population sizes
This is a major paper from this grant, and has been published in Genome Biology and Evolution. In this paper, we generated whole-genome resequencing data of 10 great tits, and also downloaded the genomes of 10 wild zebra finches published recently by Singhal et al. (2015). We tried to address several fundamental questions. (1) How widespread are positive selection and negative selection, respectively, in coding and noncoding regions of their genomes? (2) Do these two species have significantly different effective population sizes, and if so, does the difference in effective population size translate into a difference in the efficacy of natural selection, as predicted by population genetic theory? (3) How does variation in recombination rate across the genome modulate the efficacy of selection, especially after removing the confounding effects of GC-biased gene conversion (gBGC)? Our main findings are: (1) Both positive and negative selection are widespread in the genomes of the two birds. In particular, a larger proportion of the between-species differences we observed in coding and noncoding regions were driven by positive selection. Our paper provides the first genome-wide estimate of the prevalence of adaptive substitutions in noncoding regions in birds. (2) Zebra finches have a significantly larger effective population size than great tits. In both coding and noncoding regions, we find clear support for selection (both positive and negative) being more effective in zebra finches than in great tits. This is in agreement with the predictions of population genetic theory. (3) We found clear evidence that negative selection is more effective in high-recombination regions, in agreement with theoretical predictions. Interestingly, after controlling for the effects of gBGC, the rate of adaptive evolution does not increase with local recombination rate, contrary to what theory predicts. 
This shows the importance of controlling for gBGC in the study of selection using the DFE-alpha approach, a factor that has received relatively little attention to date. This paper completes the majority of the objectives described in Objectives 1 and 2 of the proposal. It also contains methodological innovations that will be of interest to many researchers working on evolutionary/population genetics.
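
For readers unfamiliar with the quantity being estimated, the proportion of adaptive substitutions (often written alpha) can be illustrated with the simple McDonald-Kreitman-style estimator, alpha = 1 - (Ds * Pn) / (Dn * Ps), where Dn and Ds are counts of selected-class and neutral-class substitutions between species, and Pn and Ps are the corresponding counts of polymorphisms within a species. The sketch below implements only this simple estimator. It is not the DFE-alpha method used in the paper, which additionally models the distribution of fitness effects of segregating mutations. The function name and the example counts are hypothetical.

```python
def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """Simple McDonald-Kreitman-style estimate of the proportion of
    adaptive substitutions: alpha = 1 - (Ds * Pn) / (Dn * Ps).

    dn, ds: substitutions (divergence) at selected and neutral sites.
    pn, ps: polymorphisms at selected and neutral sites.
    """
    if dn <= 0 or ps <= 0:
        raise ValueError("dn and ps must be positive for alpha to be defined")
    return 1.0 - (ds * pn) / (dn * ps)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    print(mk_alpha(dn=60, ds=100, pn=30, ps=100))
```

When the ratio of selected to neutral divergence equals the ratio of selected to neutral polymorphism, the estimate is zero, which corresponds to no excess of substitutions over the neutral expectation.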

anavar - a program for estimating selection and mutation parameters from sequence variability
Estimating selection and mutation parameters lies at the heart of population genetics. The key innovation implemented in anavar is its ability to simultaneously estimate selection and mutation parameters, while explicitly taking into account the confounding effects of demographic changes and misidentification of ancestral states. These methodological developments mean that anavar is the first method that can reliably estimate mutation and selection parameters from insertion and deletion sequence variability. It can also be used to quantify the importance of GC-biased gene conversion in shaping genome evolution (as detailed in Objective 2 in the proposal). Thus, anavar not only represents an important step in completing a central Objective of the proposal, but is also likely to be of interest to many researchers working on evolutionary/population genetics. The paper reporting anavar was published in Molecular Biology and Evolution.

varne - a package for estimating demography and detecting between-locus differences in the effective population size and mutation rate
Understanding past demographic changes, and how the mutation rate and effective population size vary across the genome, is a fundamental goal in evolutionary genetics. However, no existing methods can simultaneously estimate demography and detect between-locus differences in the effective population size and mutation rate using genome-scale datasets. Here, we develop an efficient method for filling this gap. This paper was recently published in Molecular Biology and Evolution.
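
As background to why demography, effective population size and mutation rate are entangled in diversity data, the sketch below computes two classical summaries of a site frequency spectrum (SFS): Watterson's estimator and nucleotide diversity. Under the standard neutral model in a constant-size diploid population, both estimate theta = 4 * Ne * mu, so a difference between loci may reflect either parameter, and a discrepancy between the two estimators can indicate demographic change or selection. This is not the varne method, whose interface is not described here. The function names and the toy SFS are hypothetical, and the values returned are totals for the region rather than per-site rates.

```python
from math import comb
from typing import Sequence


def watterson_theta(sfs: Sequence[int], n: int) -> float:
    """Watterson's estimator: S / a_n, with a_n = sum_{i=1}^{n-1} 1/i.

    sfs[i - 1] is the number of sites at which the derived allele is
    carried by exactly i of the n sampled chromosomes (i = 1 .. n-1).
    """
    if len(sfs) != n - 1:
        raise ValueError("an unfolded SFS for n chromosomes has n - 1 entries")
    segregating_sites = sum(sfs)
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites / a_n


def nucleotide_diversity(sfs: Sequence[int], n: int) -> float:
    """Mean number of pairwise differences, computed from the SFS."""
    if len(sfs) != n - 1:
        raise ValueError("an unfolded SFS for n chromosomes has n - 1 entries")
    pairs = comb(n, 2)
    return sum(i * (n - i) * count for i, count in enumerate(sfs, start=1)) / pairs


if __name__ == "__main__":
    # Toy SFS for n = 4 chromosomes, proportional to 1/i as expected
    # under the standard neutral model.
    toy_sfs = [6, 3, 2]
    print(watterson_theta(toy_sfs, 4), nucleotide_diversity(toy_sfs, 4))
```

The toy spectrum follows the neutral expectation exactly, so the two estimators agree. An excess of rare variants, as expected after population growth, would pull Watterson's estimate above nucleotide diversity.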

A study on selection on insertions and deletions in the great tit genome.
Using the anavar model described above, we analysed insertions and deletions (INDELs) in the great tit genome. Our main findings are: (1) most polymorphic INDELs are strongly deleterious, (2) segregating INDELs are typically under very weak selection, and (3) most fixed INDEL differences were driven by positive selection. This is the first systematic analysis of INDEL variants in the great tit genome. It highlights the importance of INDELs in genome evolution.
Exploitation Route The high-quality genomic data obtained from the Great Tit Genome Project and the whole-genome data of the 10 great tits will play an instrumental role in furthering the integration of ecological, evolutionary, behavioural and genomic approaches in this model species.

The theoretical results and the simulation program we developed for the background selection modelling project will provide useful tools and guidance for researchers working on data collected from natural populations.

The results and methodological innovations described in the paper involving the examination of the great tit and zebra finch genomes will be of interest to a wide audience.

The anavar package will provide useful new tools for estimating selection and mutation parameters from sequence variability (SNPs and INDELs), which will be of interest to many researchers in our field.

The varne package contains a set of methods that can be used to study various important questions in evolutionary genetics (e.g., estimating demography, comparing sex chromosomes and autosomes for detecting sex-biased processes). It is likely that the package will attract the attention of many researchers in the field.
Sectors Education, Environment

 
Title anavar - a program for estimating selection and mutation parameters from sequence variability 
Description Estimating selection and mutation parameters lies at the heart of population genetics. The key innovation implemented in anavar is its ability to simultaneously estimate selection and mutation parameters, while explicitly taking into account the confounding effects of demographic changes and misidentification of ancestral states. These methodological developments mean that anavar is the first method that can reliably estimate mutation and selection parameters from insertion and deletion sequence variability. It can also be used to quantify the importance of GC-biased gene conversion in shaping genome evolution (as detailed in Objective 2 in the proposal). Thus, anavar not only represents an important step in completing a central Objective of the proposal, but is also likely to be of interest to many researchers working on evolutionary/population genetics. 
Type Of Material Computer model/algorithm 
Provided To Others? No  
Impact This program is still being developed, and we plan to submit the paper in the next few weeks. Given the importance of estimating selection and mutation parameters, the methodological innovations included, and the user-friendliness of the package, we expect the package will be attractive to many researchers in our field. 
 
Title msbgs: a coalescent simulator for generating samples under background selection and demographic changes 
Description msbgs is a simulation program for generating variability under background selection models using the coalescent framework first described in Zeng and Charlesworth (2011). It can accommodate biologically important factors including recombination (crossover), variation in selection coefficient against deleterious mutations across sites, and changes in population size, population structure and migration (Zeng, 2013; Zeng and Corcoran, 2015). 
Type Of Material Computer model/algorithm 
Year Produced 2015 
Provided To Others? Yes  
Impact Modelling the effects of background selection on patterns of genetic diversity is an essential task. Previously, there was no available software that could simultaneously generate random samples in the presence of background selection and many biologically important factors, such as changes in population size and population structure. By providing such a program, msbgs is likely to be of interest to many researchers working on DNA sequence polymorphism data. 
URL http://zeng-lab.group.shef.ac.uk/wordpress/?page_id=28
 
Title varne - a package for estimating demography and detecting between-locus differences in the effective population size and mutation rate 
Description Understanding past demographic changes, and how the mutation rate and effective population size vary across the genome, is a fundamental goal in evolutionary genetics. However, no existing methods can simultaneously estimate demography and detect between-locus differences in the effective population size and mutation rate using genome-scale datasets. Here, we develop an efficient method for filling this gap. 
Type Of Material Computer model/algorithm 
Year Produced 2019 
Provided To Others? Yes  
Impact This paper was published in February 2019. It is too soon to know whether it may lead to any impact.