Quantifying functional constraints in the mammalian genome

Lead Research Organisation: University of Edinburgh
Department Name: Inst of Evolutionary Biology

Abstract

Very recently, the complete DNA sequences of several mammalian genomes have been made available to the scientific community. These include the genome sequences of human, chimpanzee, macaque, dog, mouse and rat. Genomes contain genes that code for proteins, which are the building blocks of all living things. However, it has long been known that more than 98.5% of the mammalian genome consists of 'noncoding' DNA that does not code for proteins. Noncoding DNA is nonetheless important because it contains sequences that control the 'expression' of genes; that is, when and in what cells and tissues genes produce proteins. Gene expression control sequences are therefore of great interest to biologists, yet very little is known about how much of the genome consists of these sequences and where they are located. In our proposed project, we shall attempt to find out where the important gene expression control sequences are located in mammalian genomes. We will do this by comparing the genome sequences of several mammals. We will search for those parts of the genome that have remained similar across species, and therefore have retained common functions, over the many tens of millions of years of mammalian evolution. As part of these comparisons, we shall measure and compare the amount of similarity in gene expression control sequences in the genomes of apes and rodents. Our previous work has suggested that gene expression control sequences are less strongly conserved in apes than in rodents, which implies that natural selection has been less effective in apes. We also propose to obtain DNA sequences from genes and gene expression control regions from individuals of a population of wild mice from India. We are proposing to study an Indian population because these mice are highly genetically variable. We expect to find differences between individual mice in their DNA sequences. The numbers of DNA sequence differences will allow us to estimate how much the sequence differences affect the reproductive success of individual mice.

Technical Summary

The great organismal complexity of mammals is believed to be determined to a large extent by the complexity of gene regulation. The elements that control the timing and specificity of gene expression are for the most part located in noncoding DNA, but are typically less well conserved than coding sequences, and the understanding of their nature is incomplete. For example, the fraction of the genome that is involved in gene expression control is largely unknown, as are the mode and strength of natural selection that operates on sequence variation. The comparative genomics approach can allow the identification of regulatory regions in noncoding DNA on the basis of evolutionary conservation. In our proposed project, we will develop an approach to estimate the fraction of selectively constrained coding and noncoding sites in the genome that have been conserved deep into the mammalian phylogeny. This will be based on comparisons between outgroup species that allow the inference of potentially conserved elements, followed by comparisons of more closely related species to estimate the fraction of selectively constrained sites in these elements. In particular, we shall compare genome-wide levels of selective constraints between murids and hominids, for which our previous work has indicated a substantially lower effectiveness of purifying selection in hominids. In parallel, we shall obtain a large polymorphism data set from wild mice that will enable us to estimate the distribution of effects of deleterious mutations in both coding and noncoding regions, using statistical methods that we have recently developed for this purpose. These data will also allow us to disentangle mutation rate variation in the genome from purifying selection as causes of conservation in putative gene regulatory regions.
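
As an illustration of what "the fraction of selectively constrained sites" means in practice, the sketch below uses a common textbook definition of constraint: one minus the ratio of observed substitutions in a candidate element to the number expected if the element evolved at the rate of a putatively neutral reference sequence. This is a simplified stand-in and not necessarily the approach developed in the project. The function name, argument names and counts are hypothetical and chosen only for illustration.

```python
def selective_constraint(subs_element, sites_element, subs_neutral, sites_neutral):
    """Estimate constraint as C = 1 - observed / expected substitutions.

    The expected count assumes the element evolves at the per-site rate
    seen in a putatively neutral reference (an assumption of this sketch).
    """
    if sites_element <= 0 or sites_neutral <= 0 or subs_neutral <= 0:
        raise ValueError("site counts and neutral substitutions must be positive")
    neutral_rate = subs_neutral / sites_neutral   # substitutions per neutral site
    expected = neutral_rate * sites_element       # expected count under neutrality
    return 1.0 - subs_element / expected


# Hypothetical counts, not data from the project.
c = selective_constraint(subs_element=30, sites_element=1000,
                         subs_neutral=150, sites_neutral=1000)
print(f"Estimated constraint: {c:.2f}")
```

A value near 1 indicates that most mutations in the element have been removed by purifying selection, and a value near 0 indicates evolution at roughly the neutral rate. The estimate depends on the neutral reference having the same mutation rate as the element, which is why separating mutation rate variation from selection matters.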

 
Description:

1. We developed a new method to infer the distribution of effects of new mutations and the fraction of adaptive substitutions based on within-species nucleotide polymorphism data and between-species divergence data, and applied this to data sets of coding and noncoding DNA in humans, Drosophila and wild mice.

2. We have set up a web server that runs software we have written to allow evolutionary biologists to analyse nucleotide polymorphism and divergence data in order to make inferences on the nature of selection operating in the genome.

3. We showed that in wild mice, about 50% of amino acid substitutions have been driven to fixation between species by positive selection.

4. We showed that ultra-conserved noncoding elements and their flanking regions are subject to substantially higher selective constraints in murid rodents than in hominids, presumably due to differences in effective population size.

5. We have shown that most mutations occurring in noncoding DNA flanking protein-coding genes are either weakly deleterious or selectively neutral, and that only a small proportion of differences between species in these regions have been fixed by positive natural selection.

6. We have collected a data set of polymorphisms in conserved noncoding elements in wild mice. The low level of polymorphism and the distribution of allele frequencies indicate that most new mutations are strongly deleterious.
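
Findings 1 and 3 above concern the fraction of adaptive substitutions, often written as alpha. The sketch below shows the simplest McDonald-Kreitman-style estimator of alpha from polymorphism and divergence counts. It is a deliberately simplified stand-in: the method described in finding 1 also models the distribution of effects of new mutations, which this sketch ignores. The function name and all counts are hypothetical.

```python
def mk_alpha(dn, ds, pn, ps):
    """Simple McDonald-Kreitman-style estimate of adaptive substitutions.

    dn, ds: divergence counts at selected and putatively neutral sites.
    pn, ps: polymorphism counts at selected and putatively neutral sites.
    alpha = 1 - (ds * pn) / (dn * ps)
    """
    if dn <= 0 or ps <= 0:
        raise ValueError("dn and ps must be positive")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts, not data from the project.
alpha = mk_alpha(dn=100, ds=200, pn=25, ps=100)
print(f"Estimated fraction of adaptive substitutions: {alpha:.2f}")
```

Slightly deleterious polymorphisms inflate pn and bias this simple estimator downwards, which is one motivation for methods that jointly estimate the distribution of fitness effects.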
Exploitation Route: The project is fundamental in character, and the end users are likely to be other academics. Our findings, however, have relevance for understanding the nature of variation affecting quantitative traits in humans and for animal and plant improvement.
Sectors: Agriculture, Food and Drink; Other

 
Description: Our research was fundamental in character, and we are unaware of applications by commercial organizations. However, our methods to infer the distribution of fitness effects of new mutations and the fraction of adaptive substitutions have been widely taken up by the scientific community.
Sector: Other