Molecular convergence at the sequence level: a genome-wide approach in a novel mammalian model

Lead Research Organisation: Queen Mary University of London
Department Name: Sch of Biological and Chemical Sciences

Abstract

Convergent evolution is the independent origin of the same feature in different groups of living things. Classic examples include the vertebrate wing, which has independently evolved a number of times, for example in bats, birds and pterosaurs, and the similar image-forming eyes of vertebrates and some invertebrates such as squid. The fact that similar structures have evolved several times suggests that they evolved to perform similar functions, so convergent evolution is powerful evidence that natural selection has shaped these features - there can be little doubt that bat and bird wings both evolved to allow powered flight, for example. Yet though examples of convergence are extremely common in the tree of life, we understand very little about the extent to which convergent evolution happens at the genetic level, in sequences of DNA and the proteins that they code for. We have recently identified several examples of apparent convergence in a suite of genes involved in hearing in different groups of echolocating mammal. Echolocation involves the production of sonar pulses and processing of the returning echoes for hunting and orientation, and poses particular challenges for high frequency hearing. It is seen at its most sophisticated in some lineages of bats and whales. For many years, echolocating bats were separated from fruit bats; however, advances in our ability to resolve species' relationships provided irrefutable evidence that some echolocating bats were in fact more related to the fruit bats than they were to each other. This finding has led to a revision of bat evolutionary relationships; so raising the issue that echolocation has either been lost by the fruit bats, or has evolved more than once by convergence. We have studied 'hearing genes' in bats and whales, and found that evolutionary trees based on four of these genes all unite echolocating bats into a single but technically incorrect group. 
Even more surprising, one of these genes leads to a well supported group of these bats with echolocating dolphins. These results raise the intriguing possibility that convergence in anatomical traits might sometimes be underpinned by convergence at the sequence level. Any finding of convergence of this kind is surprising, as the number of possible sequences for any gene is astronomically large. Therefore, such cases are unlikely to arise by chance. The identification of convergent molecular evolution in a number of different genes associated with a particular trait is, to our knowledge, unprecedented. Our evidence suggests that molecular convergence may be far more common than currently suspected. This may be partly because few scientists have been looking for this kind of convergence, and there has been no systematic attempt to investigate how common it might be. Methods to detect convergence in DNA or protein sequences are also relatively new. To confirm our finding, we want to search for convergent sequences across entire genomes, looking for genes that show convergence between the groups of echolocating bats, between bats that share similar echolocation calls, and also between bats and whales. If convergence is common in this system, it will advance our knowledge of echolocation, for example by identifying a number of genes probably involved in this system. More importantly, if confirmed in other systems, it will change the way scientists think about how genes and proteins evolve, suggesting that the pathways that evolution can take may be more constrained than previously thought, so that there may be relatively few good ways for evolution to fashion a protein to do a particular job.

Technical Summary

Cases of adaptive functional and structural convergence, where different lineages evolve similar traits independently by natural selection, have proven central to our understanding of evolution. Yet convincing reports of molecular convergence at the sequence level are exceptionally rare. This paucity of cases might either be due to a genuine lack of sequence convergence, or could simply reflect under-reporting because of our inability to detect and test for this phenomenon. The independent evolution of echolocation in some bat lineages, and in toothed whales, represents a spectacular example of multiple convergence in mammals that has led to numerous shared auditory features among unrelated taxa. We have unpublished data from four different 'hearing genes' that show sequence convergence among unrelated groups of echolocators. All four gene trees unite paraphyletic echolocating bats into one clade, and one unites echolocating bats and dolphins. Using echolocation as a model system, we will develop a novel pipeline to test for convergence at the sequence level at multiple taxonomic levels. Our approach will utilise deep sequencing technology and phylogenetics to assess individual site-wise support (nucleotides and amino acids) for competing true versus convergent phylogenetic hypotheses. Our pilot data suggest that the pathways that evolution can take may be more constrained than previously thought, and we anticipate our results from genome-wide scans for convergence will change the way scientists think about how genes and proteins evolve. Our project will have benefits for those working in genetics, bioinformatics and comparative genomics, and have potential applications for the detection of disease-causing mutations.
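
The site-wise comparison of competing hypotheses described above can be illustrated with a minimal sketch. It is not the project's pipeline. It assumes per-site log-likelihoods have already been computed for one alignment under two fixed topologies (the accepted species tree and a tree that groups the echolocating taxa together) and exported as plain lists of numbers. The function names and the example values are hypothetical.

```python
from statistics import mean


def sitewise_support(lnl_species_tree, lnl_convergent_tree):
    """Return per-site log-likelihood differences (species tree minus convergent tree).

    A negative value means that site fits the convergent topology better.
    """
    if len(lnl_species_tree) != len(lnl_convergent_tree):
        raise ValueError("both trees must be scored on the same alignment sites")
    return [h0 - ha for h0, ha in zip(lnl_species_tree, lnl_convergent_tree)]


def summarise(deltas):
    """Summarise site-wise support for one gene."""
    return {
        "mean_delta": mean(deltas),
        "sites_favouring_convergent": sum(1 for d in deltas if d < 0),
        "n_sites": len(deltas),
    }


# Made-up per-site log-likelihoods for a four-site alignment (illustration only).
species = [-2.0, -3.5, -1.0, -4.0]
convergent = [-2.5, -3.0, -1.0, -3.0]

deltas = sitewise_support(species, convergent)
summary = summarise(deltas)
print(deltas)
print(summary)
```

A gene whose mean difference is negative would be a candidate for closer inspection. Whether that difference exceeds what chance alone would produce has to be judged against a null distribution, which this sketch does not attempt.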

Planned Impact

Who will benefit? Our proposal tackles a pivotal question in evolution: to what extent has the evolution of convergent phenotypic traits arisen by common genetic routes? Our outputs (incl. a pipeline developed to diagnose sequence convergence) will be of interest and benefit to multiple user-groups, including phylogeneticists, evolutionary geneticists, genome biologists, bioinformaticians, as well as researchers in echolocation, hearing and bats/cetaceans. Indeed, an ability to resolve true species relationships is vital in forming a comparative framework against which much biological data must be understood. Our 2X genome data for 4 new bat spp. will ensure that as well as addressing our specific aims, our invested time and money will have enormous impacts (publicly archived genetic data are used by 1000s of researchers daily). Medical geneticists will also derive indirect benefits. Comparative genetic data frequently inform our understanding of human disease, and vice versa. Indeed, genes implicated in diseases/developmental abnormalities in humans often show mutations in other animal models with similar conditions, and sometimes have undergone evolutionary changes in other lineages (e.g. opsin genes with blindness-causing mutations in humans have been turned off in some nocturnal taxa). There is also now interest in diagnosing whether cancers and other diseases arise via parallel/convergent or different somatic mutations across individuals. Bat genome data are arguably especially interesting given the evolutionary innovations seen in bats, and their potential implications for understanding processes in other taxa (e.g. wing development/limb deformities; echolocation/deafness; hibernation/fat metabolism).

How will they benefit? Our findings will have immediate and major impacts in the fields of evolution, phylogenetics and zoology. As well as addressing a major gap in current knowledge, our methodological pipeline for testing for sequence convergence will also advance current capacity to process genomic data, so contributing to the basic toolkit of comparative biology. In the longer-term, the publicly accessible genetic and genomic data generated will provide immeasurable benefit for the users listed. Indeed, the impact of these data will extend well beyond our personal research interests. Updated annotations will ensure our generated data provide a valuable evolving resource for scientists worldwide, with the potential to enhance knowledge of gene function and evolution. Our PDRA will be trained in handling and assembling data from Next Generation Sequencing, as well as downstream analyses. These competitive skills are already lacking in the UK workforce, and are likely to become critical as new genome data are generated at ever increasing rates. Our PDRA's skills will benefit our national research base, and will be transferable to a wide range of sectors.

What will be done to ensure they benefit? To ensure other researchers derive maximum benefit from our findings and data, we will make all parts of our results freely available. Briefly, raw sequence data will be uploaded to GenBank, following protocol. All source code for our pipeline for diagnosing convergent sequence evolution, and our tree files, will be hosted on our server for free download. Cotton already hosts source code on his web page. Our processed data will be written up and submitted to high impact journals. Where possible, we will pay for Open Access publication. We will also present our findings at national and international meetings. To communicate results that are of wider or public interest, we will work with the press offices (QMUL, BBSRC and journals publishing our work). We have track records in writing press releases, with our work covered by national media (newspapers/radio e.g. BBC), journal editorials or reviews (e.g. Nature, Current Biology, TREE) and internal publications (e.g. NERC) (see Rossiter's web site).

Publications

 
Description Our analysis of sequence convergence showed that signatures of convergence are widespread across the genome and that bats, and also bats and cetaceans, show adaptive sequence convergence in multiple sensory genes. The resulting paper in Nature has been cited over 200 times, and the method has been widely used. Meta-analyses of our method versus other methods have shown that our approach performs favorably in terms of balancing false positives and false negatives.

Our phylogenomic analyses indicate that, contrary to recent findings, bats are not closely related to odd-toed ungulates but instead have a more ancient origin, as sister-group to a large clade of carnivores, odd-toed ungulates and cetartiodactyls.
Exploitation Route We have performed the first truly genome-wide scan of convergence. We have built software to view alignments of genome-scale sequence data, and also to screen these data for signatures of convergence. The method has been widely used by others. In addition, the genomic datasets generated have been incorporated in numerous studies of genomes, including endogenous viruses, functional genes, and phylogenomics.
Sectors Education,Other

 
Description Genome data generated by this project are fully available on GenBank and have been used by other research groups, including in published papers. Additionally, the methods we developed have attracted wide interest and are being used by a number of research groups that are actively studying evolutionary convergence.
First Year Of Impact 2013
Sector Education
 
Description European Research Council Starting Grant
Amount € 1,499,914 (EUR)
Funding ID GA 310482-EVOGENO 
Organisation European Research Council (ERC) 
Sector Public
Country Belgium
Start 02/2013 
End 01/2018
 
Description Marie Curie (IIF)
Amount £179,000 (GBP)
Funding ID PIEF-GA-2010-276243 
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 02/2012 
End 03/2014
 
Description Newton Fellowship
Amount £66,000 (GBP)
Funding ID NF130915 
Organisation The Royal Society 
Sector Charity/Non Profit
Country United Kingdom
Start 01/2014 
End 01/2016
 
Description Standard research grant (I am a Co-I)
Amount £512,030 (GBP)
Funding ID BB/L012162/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 08/2014 
End 08/2017
 
Description Strategic Environmental Science Capital Funding
Amount £284,000 (GBP)
Organisation Natural Environment Research Council 
Sector Public
Country United Kingdom
Start 01/2014 
End 03/2016
 
Title bioinformatics pipeline for detecting sequence convergence 
Description This is a bioinformatics method (written in Java) for genome-wide detection and characterization of adaptive sequence convergence. 
Type Of Material Biological samples 
Year Produced 2013 
Provided To Others? Yes  
Impact This method was reported in our Nature paper arising from the BBSRC grant, and is now being used by a number of groups internationally. We are now preparing software for release. 
 
Title Four novel bat genomes 
Description We generated and assembled new short-read paired-end sequence data (Illumina HiSeq) from the genomes of four bat species: 1. Eidolon helvum (family Pteropodidae) 2. Megaderma lyra (Megadermatidae) 3. Rhinolophus ferrumequinum (Rhinolophidae) 4. Pteronotus parnellii (Mormoopidae) 
Type Of Material Database/Collection of data 
Provided To Others? No  
Impact Both short-read data and genome assemblies were deposited in public databases (GenBank and SRA). The genomes have already been used by other research groups (e.g. Ahn et al Scientific Reports 2016). 
 
Title Phylogenomic dataset of 2,320 CDSs 
Description As part of this award, we generated new genome data for four bat species, which we combined with data from another two bats and 16 laurasiatherian mammals to assemble a phylogenomic dataset of 2,320 orthologous coding DNA sequence (CDS) alignments across 22 mammals. 
Type Of Material Database/Collection of data 
Year Produced 2014 
Provided To Others? Yes  
Impact This large phylogenetic dataset has been requested repeatedly by other research groups. To our knowledge it has been used as an experimental dataset in studies addressing: 1) The impact of missing data on species tree estimation (Xi et al MBE 2015) 2) The robustness of coalescent methods to the effects of long branches and incomplete lineage sorting (Liu et al MBE 2015) 
 
Description Lend me your ears: exploring peculiar similarities in bat and whale hearing genes 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? Yes
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Convergence in bats and whales for a non-specialist audience

no actual impacts realised to date
Year(s) Of Engagement Activity 2012
 
Description Paper title: Signatures of genome-wide convergent molecular evolution: initial results. 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? Yes
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Abstract of poster presented: Recent studies have demonstrated that convergent sequence evolution can be detected in vertebrates using statistically robust phylogenetic methods that model parallel substitutions in genetic data. We are scaling this approach to a genomic level using 5 novel bat genomes and orthologous genes in over 30 other published genomes. Focusing on genes that have undergone convergence during the independent evolution of echolocation, we describe the computational algorithms and informatics framework we have developed, and our initial results. These early findings give promising indications that signatures of convergent molecular evolution are more prevalent in vertebrate genomes than previously recognised.

no actual impacts realised to date
Year(s) Of Engagement Activity 2012
 
Description Seminar: Interpreting tree space in the context of very large empirical datasets 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Undergraduate students
Results and Impact Invited seminar presented to Department of Mathematics lunchtime seminar series, University of Portsmouth.
Year(s) Of Engagement Activity 2013,2014
 
Description Seminar: Highly Parallel Phylogenetics of genomic coding sequence (CDS) data using high-throughput computing resources 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Study participants or study members
Results and Impact Invited seminar presented to MSc (Evolutionary Biology) students of University College in Dublin.
Year(s) Of Engagement Activity 2013
 
Description Talk: Developing a flexible platform for high-throughput phylogenomics: case study, conclusions and lessons for the future 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Study participants or study members
Results and Impact Talk presented at the Tropical biodiversity in the 21st century symposium / Genomic Observatories 3 Workshop, Natural History Museum, London, UK.
Year(s) Of Engagement Activity 2013
 
Description Title of paper delivered: The molecular basis of convergence: a genome-wide approach in a novel mammalian model 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? Yes
Geographic Reach International
Primary Audience Participants in your research or patient groups
Results and Impact Invited seminar to Molecular Phylogeny Group at the University of Montpellier

no actual impacts realised to date
Year(s) Of Engagement Activity 2012