Molecular evolution and variation in genomic regions with low recombination

Lead Research Organisation: University of Edinburgh
Department Name: Inst of Evolutionary Biology

Abstract

One of the classical problems of biology is the evolutionary role of sexual reproduction. This involves the bringing together of the genomes of two parents, and reshuffling them by the process known as genetic recombination, so that an individual offspring receives a mixture of contributions from each parent. This allows the evolutionary fates of genetic variants at different places in the genome to behave more or less independently of each other. One consequence of this is that natural selection can act at one site in the genome without interfering with what happens at other sites. Many specific models of evolutionary processes that can cause interference between different sites when sex is absent have been proposed: they all predict that selection is less effective in the absence of sex, leading to a large loss in fitness. It is, however, hard to test these predictions in nature, since asexual species or populations are very rare, and in most cases have arisen only recently from sexual ancestors. Nevertheless, it is important to test these predictions, both for the intellectual interest in understanding why sexual reproduction is so common in nature, and because there are plans to develop asexual strains of plants for the purpose of breeding crops. A way around this difficulty is to compare different regions of the same genome. Genetic recombination, the process that creates the reshuffling of parental contributions during sexual reproduction, is rare or absent in some parts of the genome, especially the part known as the heterochromatin. Such parts of the genome are expected to behave like asexual species, in terms of their evolutionary patterns. Until very recently, however, the heterochromatin has been impossible to study at the level of DNA sequences, since it contains large amounts of DNA that are repeated over and over again.
Recent breakthroughs in research on the fruitfly Drosophila, the best-studied model animal species, have led to the characterisation of several hundred genes in the heterochromatin. This means we are now in a position to study evolution and variation of genes in the heterochromatin almost as easily as genes in the rest of the genome, and can therefore see whether or not they show the patterns expected from their lack of recombination. Sophisticated statistical methods are available for this purpose, but require large datasets to be used effectively. We plan to exploit new technologies for sequencing DNA, which allow large quantities of information to be generated rapidly and cheaply. We will use these to generate data on variability in a large number of genes in the heterochromatin and in other parts of the genome, within populations of two closely related species of Drosophila. By combining the results of these studies with computer-based analyses of the published genome sequences of other species of Drosophila, we will be able to determine whether or not the patterns that we see in regions of the genome with low levels of recombination agree with our theoretical models. If they do, we will have much more convincing evidence than currently exists that sexual reproduction has an evolutionary advantage, and that an absence of sex leads to a severe decline in fitness. The data that we generate can also be used for answering a wide range of other questions, such as the extent to which there is any recombination in the heterochromatin, the extent to which natural selection acts on DNA sequences that are not involved in determining protein sequences, and the intensity of selection acting on the protein sequences themselves. We will make these data publicly available through databases, scientific publications and conferences. Given the great public interest in questions of this kind, we will also communicate our results to the media.

Technical Summary

The overall goal of the project is to elucidate the nature of the differences in evolutionary forces affecting the low-recombination heterochromatin and the high-recombination euchromatin of Drosophila, with the aim of illuminating the evolutionary significance of sex and recombination. We will use the large number of genes recently identified from the Drosophila melanogaster heterochromatin to study patterns of evolution when recombination is rare. Public databases will be used to test whether patterns of nucleotide substitutions between related species of Drosophila at neutral sites differ between the euchromatin and heterochromatin, in a way that is consistent with their differences in base composition. Similar comparisons will determine whether protein sequence evolution is accelerated for genes in the low-recombining heterochromatin, as expected if selection is less effective there. We will use next-generation sequencing technology to generate a large dataset on genetic variation at heterochromatic and euchromatic genes in D. melanogaster and D. simulans. The polymorphism and divergence studies will be combined to test whether or not the differences between euchromatin and heterochromatin in patterns of substitution at both coding and non-coding sites reflect mutational biases, or the effects of a reduced intensity of selection and/or biased gene conversion in heterochromatin. We will also test whether the polymorphism data show evidence for gene conversion events in heterochromatin, as might be expected from studies of other non-recombining regions. This will enable us to parse out any contribution of biased gene conversion to patterns of base substitution. We will also generate gene expression datasets in D. melanogaster and D. simulans, to ensure that corrections for gene expression as a covariate can be used when interpreting the patterns that we detect, and to test whether levels of gene expression are related to recombination rates, as found previously.
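As an illustration of how polymorphism and divergence counts can be combined to compare the effectiveness of selection between genomic regions, the sketch below computes a McDonald-Kreitman style estimate of the proportion of adaptive amino-acid substitutions. This is an assumption made for illustration only: the summary above does not name the specific statistical methods used in the project, the function name `mk_alpha` is hypothetical, and the counts are invented, not project data.

```python
def mk_alpha(dn, ds, pn, ps):
    """McDonald-Kreitman style estimate: alpha = 1 - (Ds * Pn) / (Dn * Ps).

    dn, ds: nonsynonymous and synonymous substitutions between species
    pn, ps: nonsynonymous and synonymous polymorphisms within a species
    """
    if dn == 0 or ps == 0:
        raise ValueError("alpha is undefined when Dn or Ps is zero")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts, invented purely to show the calculation.
regions = {
    "euchromatin": {"dn": 200, "ds": 400, "pn": 50, "ps": 200},
    "heterochromatin": {"dn": 200, "ds": 400, "pn": 100, "ps": 200},
}

for name, counts in regions.items():
    print(name, round(mk_alpha(**counts), 3))
```

With these invented counts the estimate is 0.5 for the first region and 0.0 for the second. A lower value in a low-recombination region would be in line with less effective selection there, although a real analysis would also need to allow for slightly deleterious polymorphisms and for covariates such as gene expression.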

Planned Impact

The chief beneficiaries of this project are the scientific community engaged in research on molecular and genome evolution, a rapidly growing field. Both researchers on the project have extensive links with members of this community in both Europe and North America, many of whom were trained by BC. During the course of the project we will generate very large DNA polymorphism and gene expression datasets using relatively novel, next-generation sequencing technology. In particular, we will generate polymorphism data for a set of orthologous genes in two Drosophila species, in regions of the genome for which sequence variation has never previously been collected. We will disseminate these data by submitting them to the public single nucleotide polymorphism databases (The Berkeley Drosophila Genome Project and dbSNP at the National Center for Biotechnology Information), and will also make the entire dataset available as gff3 (generic feature format) files, a standard file format used for storing genomic features. The gene expression data will be submitted to the ArrayExpress database at the European Bioinformatics Institute, a public archive for transcriptomics data. The community will therefore be able to use these data; with the growing need for large data sets of this kind, it is important that individual laboratories contribute such data to a pool that can be widely used. We will also report the results of our own analyses to the community through publication of research papers, presentations at scientific meetings and through invitations to present and discuss our results at other universities in the UK and beyond. The rapidly increasing availability of whole-genome sequences and large-scale data on genetic variation in populations has resulted in a demand for expertise in population genetics approaches that can be used to analyse these types of data.
The BBSRC Summer School in Molecular Evolution trains researchers at all stages of their careers who wish to use these evolutionary approaches; we have both participated in teaching on this course, and expect to do so again during the course of the project. We also often advise other researchers both within our own Institute and around the world about useful evolutionary analyses, as well as the principles on which they are based. In addition, we are both involved in teaching the next generation of scientists, and always try to convey to students the importance of evolutionary biology in the wider scientific arena. We are enthusiastic about communicating with the general public on the importance and relevance of scientific research, and BC regularly gives public lectures to general audiences on evolutionary biology. We are also very keen to publicise our research in the non-academic media whenever there are findings that are of general interest to the public. The University of Edinburgh has a very active press office, within which the Institute of Evolutionary Biology Press Gang operates to highlight any work of general interest to the public that comes out of the Institute, and also helps researchers to communicate their work effectively to the non-specialist. Finally, we believe there are wider impacts of our work. There is a need to apply population genetics methods to the understanding of human disease, a rapidly moving area of research due to the generation of large-scale surveys of genetic variants associated with diseases and complex traits. Much of the analysis of these data is based on population genetics methods developed in relation to Drosophila. The expertise and intellectual environment provided by population genetics laboratories such as this one play a critical role in the training of workers in this area, who are currently in short supply.
There are, therefore, important long-term benefits for the UK's medical science and biotechnology communities from this type of research, of critical importance for the UK economy.

Publications

 
Description The major achievements of the grant are as follows.

1. Comparisons of the genome sequences of Drosophila melanogaster and D. yakuba using newly-identified genes in the heterochromatin show that the rate of protein sequence evolution is elevated in genome regions that lack crossing over, and selection in favour of codon usage is ineffective.
2. Combining these between-species comparisons with analyses of within-population variation in D. melanogaster, we have shown that reduced rates of genetic recombination are associated with a reduced effectiveness of selection in favour of beneficial mutations, and against harmful mutations. In addition, the X chromosome shows evidence of both a higher rate of adaptive evolution of protein sequences and of more effective selection on codon usage.
3. We have successfully generated a set of whole-genome DNA sequences for 22 lines of D. simulans derived from 2 natural populations, and subjected them to the same type of analyses as for D. melanogaster. This provides a valuable resource.
Exploitation Route The D. simulans sequences can be used further by researchers in the evolutionary genetics community; this is already happening.
Sectors Agriculture, Food and Drink, Education

 
Description Communication of results of first paper from project to media. Lecture to A-level students at Loretto School on 'The evolutionary significance of sexual reproduction'
Sector Education
Impact Types Cultural, Societal

 
Title Sequences uploaded to a public database 
Description Short read Illumina sequences from multiple individuals of Drosophila simulans were uploaded to the European Nucleotide Archive. Study accession number: PRJEB7673
Type Of Material Database/Collection of data 
Year Produced 2014 
Provided To Others? Yes  
Impact We anticipate that this will be a useful research tool for Drosophila geneticists and evolutionary biologists; we have already provided the data to two other research groups. A paper by one of these, describing the results of analyzing one aspect of the data, was published online early in December 2016, in Genome Biology and Evolution.
URL https://www.ebi.ac.uk/ena/submit/sra
 
Description Communication of results of first paper from project to media, including interviews on radio and by journalists. 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact Several media organisations and people contacted us for further information.

The topic (the evolutionary significance of sexual reproduction) clearly aroused interest among members of the public.
Year(s) Of Engagement Activity 2012
 
Description Lecture to A-level students at Loretto School on 'The evolutionary significance of sexual reproduction' 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact Several questions were asked by the audience members.

Nothing special.
Year(s) Of Engagement Activity 2013
 
Description Visit to plant scientists at the James Hutton Institute, Dundee 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact I was able to assist scientists working on barley improvement to interpret their data on genetic variation correctly.

The corrected analyses appeared in a scientific publication.
Year(s) Of Engagement Activity 2013