Estimating selection on amino-acid sequence polymorphisms in Drosophila

Lead Research Organisation: University of Edinburgh
Department Name: Inst of Evolutionary Biology

Abstract

Mutations are the result of very rare errors in the replication of DNA, which lead to a change in the genetic information specifying the properties of organisms. A major question in biology is the extent to which mutations affect the fitness of an organism (i.e. its ability to survive and reproduce successfully). In particular, knowledge of the fitness effects of mutations is of great importance if we want to know whether species that have been reduced to very small numbers of individuals risk extinction because of the accumulation of harmful mutations. This accumulation happens especially rapidly in small populations, through chance fluctuations in the frequencies of mutations (genetic drift), which can overwhelm the ability of natural selection to eliminate harmful mutations. The rate at which this happens depends both on the numbers of mutations present in the population and on their effects on fitness. Many of the effects of mutations are mediated through their effects on the structure of protein molecules, determined by the DNA sequences of the corresponding genes. We can therefore assess at least some of the effects of mutations on the fitnesses of individuals in a population by looking at the fitness effects of mutations that change protein sequences, in a set of genes sampled at random from the whole genome. We can then scale up to the genome as a whole, in order to estimate the likely impact of mutations on the fitness of the population as a whole, and the consequent risks of reducing population size.

Most mutations probably have such small fitness effects that they cannot be measured directly, so we have to use indirect methods, based on theoretical models of the behaviour of genes in populations. The state of a population or species with respect to a given site on a DNA molecule can be exactly specified in terms of the relative frequencies of the four alternative nucleotide 'letters' that can exist at the site: A, T, G or C. These frequencies can be measured by determining the DNA sequences of individuals sampled from a population. By comparing DNA sequence variation within a species for mutations affecting protein sequences with variation caused by mutations with no such effects, and by comparing DNA sequences between members of related species, it is possible to analyse the results in relation to what is expected from mathematical and computer models of the effects of mutation, selection and drift. This allows us to estimate the fitness effects of harmful mutations that change protein sequences.

I propose to use two closely related species of fruitfly, Drosophila miranda and D. pseudoobscura, for this purpose. They live in the mountain forests of western North America, in habitats that are undisturbed by human activity, unlike many species of Drosophila used in evolutionary studies. The genome of D. pseudoobscura has recently been sequenced, reflecting its status as a classic organism for genetic and evolutionary studies. I will collect data on DNA sequence variability within these species for a set of about 60 genes, and compare these sequences with those of a more distant relative, D. affinis. Species which have not been subject to recent disturbances of population size, as seems to be the case for D. miranda, are especially useful for this purpose. The use of data on variability in two species, one of which (D. miranda) is much rarer than the other (D. pseudoobscura), is known from theoretical studies to be especially useful for estimating the fitness effects of harmful mutations. It is also possible to use these data to answer the very interesting question of what fraction of the differences in protein sequences among related species have been caused by natural selection accumulating mutations which improve fitness, as well as several other questions of importance to biologists studying evolution.
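
The two quantities described in the abstract can be illustrated with a short Python sketch. It is not the project's own analysis, which the abstract does not specify. The first function computes the relative frequencies of A, T, G and C at one aligned site from sampled sequences. The second applies a simple McDonald-Kreitman-style estimator of the fraction of protein-sequence differences between species attributable to positive selection, assuming that silent changes are neutral and that protein-changing polymorphisms are not strongly selected. The function names and all counts used below are hypothetical.

```python
from collections import Counter

VALID_BASES = "ATGC"


def nucleotide_frequencies(column):
    """Relative frequencies of A, T, G and C at one aligned site.

    `column` holds one base per sampled individual; characters other
    than A, T, G or C (gaps, ambiguity codes) are ignored.
    """
    counts = Counter(b.upper() for b in column if b.upper() in set(VALID_BASES))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no valid nucleotides at this site")
    return {base: counts[base] / total for base in VALID_BASES}


def mk_alpha(pn, ps, dn, ds):
    """Simple McDonald-Kreitman-style estimate of the adaptive fraction.

    pn, ps: protein-changing and silent polymorphisms within a species.
    dn, ds: protein-changing and silent fixed differences between species.
    Returns alpha = 1 - (ds * pn) / (dn * ps).
    """
    if dn == 0 or ps == 0:
        raise ValueError("dn and ps must be non-zero")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical illustration only; these are not data from the project.
site_freqs = nucleotide_frequencies("AAAT")
alpha = mk_alpha(pn=10, ps=40, dn=30, ds=60)
```

With these made-up counts the estimate is 0.5, which would be read as half of the protein-sequence differences between the species having been fixed by positive selection. Harmful mutations that are still segregating in the population bias this simple estimator downwards, which is one reason model-based methods of the kind described above are needed.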
 
Description

The research has provided new DNA sequence variability data on two species of the fruitfly Drosophila that are of especial scientific interest, and has validated methods for estimating the extent and intensity of selection on DNA and protein sequence variants.