Rapid construction of reference chromosome-level mammalian genome assemblies and insights into the mechanisms of gross genomic rearrangement

Lead Research Organisation: Royal Veterinary College
Department Name: Comparative Biomedical Sciences CBS

Abstract

We live in an era in which the genomes of new species are being sequenced all the time. The most modern ways to sequence DNA have many advantages over older approaches (the most prominent being vastly reduced cost), but each time the genome of a new species is sequenced, assigning large blocks of sequence to an overall genomic "map" can be difficult and/or very expensive. It is a little like finding your location on Google Maps but not being able to "zoom out" to see where that position lies in relation to the whole country. In essence, the aim of this project is to solve this problem at one fifth of the current cost. Drawing on our experience with birds, we have developed a high-throughput approach and the tools for assigning sequences to their proper positions on chromosomes. This involves our own adaptations of a technique called "FISH" (fluorescence in situ hybridisation) that takes data from sequenced genomes and directly visualizes blocks of DNA sequence in their rightful place in the genome. In this study we will focus on 25 newly sequenced mammal species. More importantly, however, we will provide the means through which this can be achieved for any of the ~5,000 living mammalian species. Mammals matter to our lives: many are models for human disease and development, and they are critical to agriculture (both meat and milk). Others are threatened or endangered and, with impending global warming, molecular tools for studying their ecology and conservation are essential. Our combined efforts have also produced computer-based browser methods to compare the overall structure of one genome with another, directly visualizing the similarities and differences between the genomes of several animals at a time; we can share these widely with the scientific community and the general public via the world wide web. The differences between mammalian genomes arose through changes that happened during evolution.
One of the main aims of this project is to find out how this occurred and what the implications of these changes are. We have a number of ideas; for example, we think there may be different "signatures" that explain why blocks of genes tend to stay together during evolution. Armed with this information, we fully intend to take it out into the world. The devices that we will develop can be adapted to screen individual animals for genomic rearrangements that may cause, e.g., breeding problems. Moreover, the resources we will develop will provide a source of public information and student learning through a dedicated, outward-facing website. We have received overwhelming support from numerous laboratories around the world that are interested in using these resources to ask biological questions of their own. For these reasons, we feel that this project will help us understand evolution in mammals and contribute to establishing the UK as a central international hub of mammalian genomics.

Technical Summary

Unless a whole-genome sequence is assembled to the level of one "(super)scaffold" per chromosome, the resulting assembly can be studied for gene structure and function but cannot be used effectively to address biological questions pertaining to critical aspects of evolutionary and applied biology. Multiple letters of support for this application attest to this. Contemporary genome sequencing projects, however, usually fall short of this "chromosome-level" assembly unless supported by extensive funding (~$100,000/genome). With the genomes of ever more animals being sequenced but resources limited, this problem will only grow unless lower-cost solutions are found. We have recently developed, in birds, a means of taking sub-chromosome-sized scaffold-based assemblies (e.g. enhanced by Dovetail or bioinformatically by RACA) and "upgrading" them to chromosome level at a fraction (~20%) of the cost. The approach involves a novel method of selecting BAC clones that will hybridise to any mammalian metaphase, followed by multiplex adaptations of FISH. Mammals are the most studied phylogenetic class, yet only ~25 of ~5,000 species have sequenced genomes assembled to chromosome level. Indeed, most recent de novo sequencing projects typically produce assemblies of several super-scaffolds per chromosome. Our approach will upgrade 25 further genomes and provide both proof of principle and the practical means through which many hundreds more can be mapped and compared. It will also allow easy comparative visualization of multiple genome assemblies and testing of fundamental hypotheses about the importance of overall chromosome structure in the formation of lineage-specific and ancestral phenotypes, and about the conservation of blocks of homologous synteny whose functional and sequence features define phenotypic traits of medical, veterinary or agricultural relevance.
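To make the "upgrading" step concrete, the sketch below places scaffolds on chromosomes using FISH-mapped BAC clones. It is a minimal illustration under stated assumptions, not the project's pipeline: it assumes each BAC has been aligned to a scaffold (`scaffold_pos`) and that its FISH signal was scored as a chromosome plus a relative position along it (`fish_pos`, 0 at pter to 1 at qter). The `BacHit` record and `place_scaffolds` function are hypothetical names. Scaffolds are ordered by the median FISH position of their BACs, oriented by whether FISH positions rise or fall along the scaffold, and flagged as possible chimeras when their BACs hybridise to different chromosomes.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass
class BacHit:
    """Hypothetical record linking one BAC clone's sequence alignment to its FISH signal."""
    bac: str
    scaffold: str       # scaffold the BAC sequence aligns to
    scaffold_pos: int   # bp position of the alignment on that scaffold
    chromosome: str     # chromosome carrying the FISH signal
    fish_pos: float     # relative signal position (assumed scoring): 0.0 = pter, 1.0 = qter


def place_scaffolds(hits):
    """Return ({chromosome: [(scaffold, orientation), ...]}, [conflicting scaffolds])."""
    by_scaffold = defaultdict(list)
    for hit in hits:
        by_scaffold[hit.scaffold].append(hit)

    placed = defaultdict(list)  # chromosome -> [(median FISH position, scaffold, orientation)]
    conflicts = []              # scaffolds whose BACs map to different chromosomes
    for scaffold, anchors in by_scaffold.items():
        chromosomes = {a.chromosome for a in anchors}
        if len(chromosomes) > 1:
            conflicts.append(scaffold)
            continue
        anchors.sort(key=lambda a: a.scaffold_pos)
        first, last = anchors[0].fish_pos, anchors[-1].fish_pos
        # One anchor (or identical FISH positions) fixes position but not orientation.
        orientation = "+" if last > first else "-" if last < first else "?"
        placed[chromosomes.pop()].append(
            (median(a.fish_pos for a in anchors), scaffold, orientation))

    layout = {chrom: [(s, o) for _, s, o in sorted(items)]
              for chrom, items in placed.items()}
    return layout, conflicts


# Toy example with invented values.
hits = [
    BacHit("bA", "scf1", 10_000, "chr1", 0.10),
    BacHit("bB", "scf1", 900_000, "chr1", 0.25),
    BacHit("bC", "scf2", 50_000, "chr1", 0.70),
    BacHit("bD", "scf2", 600_000, "chr1", 0.55),
    BacHit("bE", "scf3", 20_000, "chr2", 0.40),
    BacHit("bF", "scf4", 5_000, "chr1", 0.30),
    BacHit("bG", "scf4", 80_000, "chr3", 0.30),
]
layout, conflicts = place_scaffolds(hits)
```

Here `scf1` is placed before `scf2` on chr1 in forward orientation, `scf2` is inverted, the single-BAC `scf3` is placed but left unoriented, and `scf4` is reported as a conflict because its two BACs hybridise to different chromosomes.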

Planned Impact

At the core of this application is a commitment to high-impact activity, specifically benefitting industry (UK plc), academia, the third sector and the general public (academic beneficiaries are dealt with in another section). The primary industrial supporter (and beneficiary) of this research is Cytocell Ltd, which specializes in the development of multiple-hybridization FISH probes. Building on a long-standing collaboration initiated by a Knowledge Transfer Partnership for the development of non-human probes, the company is very interested in our approach, as it will lead to new product development and maximize the potential of the company's human BAC collection. After extensive market research, we have collectively identified "chromosome evolution Multiprobe devices" and a range of individual-animal translocation screening devices as target products. Cytocell's generous in-kind contribution is outlined in the application and, as clearly stated, represents a genuine partnership with real cash-equivalent contributions, designed to maximize our collective skills in bringing cross-species hybridization probes to market and thus, ultimately, to the scientific community. Partnering with a company in this way means that the highest-quality product can reach the widest market worldwide.

Digital Scientific UK have identified considerable benefit in collaborating on this project and are contributing their new animal karyotyping software suites, which they have generously agreed to provide free of charge. Their new "Batch Capture" protocols, which integrate microscope hardware with their in-house algorithms for multiple FISH capture, are normally charged to customers at market price, but the company has kindly donated unlimited use of the software to this project. Both companies also see this project as a means of working more closely with one another, adding to their R&D portfolios and thereby increasing their share value and the value of UK plc. Finally, Dovetail sees value in this project for its own efforts to generate contiguous chromosome assemblies, viewing our approach as entirely complementary to its own.

A gap in perception exists between academia and industry regarding the role of gross chromosomal evolution. While academia accepts that chromosome structure plays an important role in gene regulation, industrial applications still focus mostly on protein changes and ignore many other features of the genome. Our project aims to start changing this perception by providing popular resources and outreach activities for non-scientists. These resources and events should influence the general public, including future policy makers (see Pathways to Impact for details), and we therefore expect to have an impact on future policies in animal sciences.

The third sector (museums) will benefit from our project through the inclusion of mammalian chromosome evolution histories in interactive tools aimed at student education and popular science exhibitions in museums. One such tool, which we recently built with ESEB, is called 'Evolution Factory' and teaches schoolchildren the principles of chromosome and genome evolution. A more advanced version of the tool is an interactive screen that we are developing with a group from the University of California, Davis, to be displayed at the Exploratorium in San Francisco. Once the tool is developed and tested, we will also approach the Science Museum in London to investigate their interest in using this and other interactive games we develop for their exhibitions.
 
Description During the first 17 months of the project we built alignments of over 20 mammalian species against the human and cattle genomes and identified conserved sequence elements. These elements were used in two ways:

1. We evaluated preliminary results of fluorescence in situ hybridisation of human and cattle BAC clones obtained from our grant partners, and selected BAC clones for them to hybridise to the chromosomes of multiple mammalian species. Based on the results of these hybridisations, we will draw conclusions about the genomic features that make a probe universal, i.e. suitable for all mammals.

2. These conserved sequences were used in our analysis of chromosome evolution in ruminant genomes to understand whether chromosome breakage during evolution is related to changes in gene regulation used by natural selection to produce adaptive phenotypes and, eventually, new species. This work also used some of the genome alignments we built. In addition, we looked at patterns of regulatory sequence (enhancer) changes near evolutionary breakpoints in ruminants and other species. Our findings demonstrated that near evolutionary breakpoints, gene regulation differs significantly between species owing to changes in enhancer and conserved element profiles caused by insertions of transposable elements. Our paper on this subject was published in Genome Research in 2019 (Farre et al., 2019a).
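The breakpoint analysis asks whether regulatory features or transposable-element insertions fall close to evolutionary breakpoints more often than expected by chance. The sketch below shows one simple way to express that comparison for a single chromosome, as an observed/expected ratio under the assumption that features would otherwise be uniformly distributed. It is illustrative only: the function names, the fixed `window` and the uniform null are assumptions, not the statistical test used in Farre et al. (2019a).

```python
from bisect import bisect_left


def count_near(positions, breakpoints, window):
    """Number of features lying within `window` bp of the nearest breakpoint (breakpoints non-empty)."""
    bps = sorted(breakpoints)
    count = 0
    for p in positions:
        i = bisect_left(bps, p)
        # The nearest breakpoint is either just before or just after p.
        nearest = min(abs(p - bps[j]) for j in (i - 1, i) if 0 <= j < len(bps))
        if nearest <= window:
            count += 1
    return count


def covered_length(breakpoints, window, chrom_length):
    """Total bp lying within `window` of any breakpoint, merging overlapping windows."""
    total, cur_start, cur_end = 0, None, None
    for b in sorted(breakpoints):
        start, end = max(0, b - window), min(chrom_length, b + window)
        if cur_end is not None and start <= cur_end:
            cur_end = max(cur_end, end)      # overlaps the current window: extend it
        else:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def breakpoint_enrichment(positions, breakpoints, window, chrom_length):
    """Observed / expected features near breakpoints, assuming uniform placement (hypothetical null)."""
    observed = count_near(positions, breakpoints, window)
    expected = len(positions) * covered_length(breakpoints, window, chrom_length) / chrom_length
    return observed / expected


# Toy data: one breakpoint at 500 kb on a 1 Mb chromosome, five features.
ratio = breakpoint_enrichment([10_000, 480_000, 510_000, 530_000, 900_000],
                              [500_000], 50_000, 1_000_000)
```

In this toy case three of five features lie within 50 kb of the breakpoint, whereas a uniform placement would put only 0.5 there on average, giving a ratio of 6. A real analysis would also need a significance test (e.g. permutation of feature positions) and corrections for gene density and mappability.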

We upgraded three fragmented mammalian assemblies to chromosome level. The first was the genome of the gemsbok, a species adapted to survive the very hot climates of Africa; its genome will help us and others reveal adaptations to hot climates (Farre et al., 2019b). The second is the genome of the dromedary camel (Ruvinsky et al., 2019), an economically important species. Its chromosome-level assembly could be used to improve camel breeds, search for milk-production QTLs and understand camel adaptations. The gemsbok and camel genomes were assembled using our Reference-Assisted Chromosome Assembly (RACA) algorithm combined with PCR verification of the chromosome assemblies. For the last genome we upgraded to chromosome level in 2018, the giraffe, we also used our preliminary panel of 140 cattle "conserved" BAC clones in addition to RACA and PCR-based verification of the reconstructed chromosome structures (Farre et al., under review). This genome will be used to understand the biology of giraffes, their adaptations and unique features.

In collaboration with the Broad Institute and Prof. Harris Lewin, we are currently assembling the chevrotain genome. This species is a primitive ruminant, and its genome could contain clues to the formation of the unique ruminant features that made ruminants the most popular livestock. The fragmented Illumina DISCOVAR assembly produced by the Broad Institute has been upgraded using Dovetail Chicago. In addition, we placed over 100 BAC clones on chevrotain chromosomes towards a chromosome-level, reference-quality assembly for this species. We are also working on a Hi-C assembly for the chevrotain, which should bring the genome close to chromosome level. Coupled with our BACs and PCR verification, we should be able to achieve a chromosome-level, reference-quality assembly for this species in 2019. In addition, we plan to upgrade at least three more genomes to chromosome level in 2019: Indian muntjac, Chinese muntjac and aardvark. These species are important for studying the patterns of chromosome evolution in mammals.

In 2019 we established a new collaboration with Prof. Juha Kantanen (Finland) to upgrade the reindeer genome to chromosome level using our conserved BAC panel. To make this panel more efficient, we plan to extend it from 150 to ~300 clones in 2019 and test them in the laboratory of our grant partner, Prof. Darren Griffin.
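Extending the conserved BAC panel raises the practical question of how candidate clones are prioritised. One plausible criterion, sketched below purely as an assumption, is to rank BACs by the fraction of their length covered by conserved elements and then pick greedily while keeping a minimum spacing between clones on the same chromosome, so that the panel spans the genome. The thresholds (`min_fraction`, `min_spacing`) and function names are hypothetical; identifying which genomic features actually make a probe universal is what the multi-species hybridisation experiments are designed to establish.

```python
def conserved_fraction(bac_start, bac_end, elements):
    """Fraction of the BAC interval [start, end) covered by conserved elements (assumed non-overlapping)."""
    covered = 0
    for start, end in elements:
        overlap = min(end, bac_end) - max(start, bac_start)
        if overlap > 0:
            covered += overlap
    return covered / (bac_end - bac_start)


def select_bacs(bacs, elements, min_fraction=0.05, min_spacing=5_000_000):
    """Hypothetical greedy panel selection.

    bacs: list of (name, chromosome, start, end) in reference coordinates.
    elements: {chromosome: [(start, end), ...]} conserved elements.
    """
    scored = []
    for name, chrom, start, end in bacs:
        frac = conserved_fraction(start, end, elements.get(chrom, []))
        if frac >= min_fraction:
            scored.append((frac, name, chrom, start))
    scored.sort(key=lambda x: -x[0])   # most conserved first; ties keep input order

    chosen = []
    for _, name, chrom, start in scored:
        # Skip clones too close to one already chosen on the same chromosome.
        if all(c != chrom or abs(s - start) >= min_spacing for _, c, s in chosen):
            chosen.append((name, chrom, start))
    return [name for name, _, _ in chosen]


# Toy example with invented coordinates.
bacs = [("B1", "chr1", 0, 200_000), ("B2", "chr1", 1_000_000, 1_200_000),
        ("B3", "chr1", 10_000_000, 10_200_000), ("B4", "chr2", 0, 200_000)]
elements = {"chr1": [(50_000, 70_000), (1_050_000, 1_090_000), (10_000_000, 10_030_000)]}
panel = select_bacs(bacs, elements)
```

B2 (20% conserved) is picked first, B3 (15%) is far enough away to be kept, B1 is dropped for lying within 5 Mb of B2, and B4 has no conserved sequence.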

All our upgraded genomes are either available from NCBI and the Evolution Highway chromosome browser or will be once the relevant publications are accepted.
Exploitation Route Our grant partners are using our recommendations as a guide for selecting BAC clones to be tested in multi-species hybridisation experiments.

The chromosome level assemblies produced by us in the course of this grant could be utilised to study evolution, adaptations and specific traits in a number of mammalian species.

Our ruminant Evolution Highway website and the UCSC genome hub are excellent resources for students studying chromosome evolution.
Sectors Agriculture, Food and Drink, Education

 
Description During the first year of this grant we produced and published a chromosome-level assembly of the dromedary camel (Ruvinskiy et al., 2018). This is an economically important species in many countries; the chromosome-level assembly we generated will therefore be a tool for detecting genes related to economically important traits in camels and for mapping disease genes to produce better camel breeds.
First Year Of Impact 2018
Sector Agriculture, Food and Drink
Impact Types Economic

 
Description Resequencing of Russian cattle and sheep breeds adapted to cold climates
Amount 24,000,000 RUB
Organisation Russian Science Foundation 
Sector Public
Country Russian Federation
Start 04/2019 
End 03/2023
 
Title Chromosome-level assembly of the gemsbok genome
Description A near-chromosome-level assembly of the gemsbok genome was constructed using a combination of the Reference-Assisted Chromosome Assembly (RACA) tool and PCR verification of the reconstructed chromosomes.
Type Of Material Biological samples 
Year Produced 2018 
Provided To Others? Yes  
Impact A chromosome-level assembly of the gemsbok, a ruminant highly adapted to desert conditions, has been published in GigaScience.
URL http://eh-demo.ncsa.uiuc.edu/ruminants
 
Title Chromosome-level assembly of the dromedary camel
Description A near-chromosome-level assembly of the dromedary camel has been produced using a combination of Reference-Assisted Chromosome Assembly (RACA), PCR verification of the reconstructed chromosomes, and comparison with FISH and physical maps of camel and alpaca.
Type Of Material Biological samples 
Year Produced 2019 
Provided To Others? Yes  
Impact A paper describing the dromedary camel genome was published in Frontiers in Genetics. 
 
Title Genome alignments of mammalian genomes against the cattle genome on the UCSC genome browser 
Description Alignments of 15 mammalian genomes against the cattle genome, visualised on the UCSC Genome Browser. The alignments were generated with the lastz aligner and processed with the Kent utilities.
Type Of Material Biological samples 
Year Produced 2019 
Provided To Others? Yes  
Impact A paper was published in Genome Research demonstrating the effects of evolutionary rearrangements on the expression of nearby genes.
URL http://sftp.rvc.ac.uk/rvcpaper/ruminantsHUB/hub.txt
 
Title Genome alignments of mammalian genomes and reconstructed ancestors on Evolution Highway 
Description 1. Visualisation of homologous synteny between the cattle genome and 12 additional mammalian species on our Evolution Highway comparative chromosome browser, achieved by parsing lastz genome alignments with the Kent utilities and the maf2synteny tool to build comparative synteny blocks. 2. Visualisation of homologous synteny between reconstructed cetartiodactyl, ruminant and pecoran ancestors (reconstructed with DESCHRAMBLER) and the extant mammalian genomes and other reconstructed ancestors.
Type Of Material Biological samples 
Year Produced 2019 
Provided To Others? Yes  
Impact A paper was published in Genome Research demonstrating that chromosome rearrangements in ruminants have functional effects on the expression of nearby genes.
URL http://eh-demo.ncsa.uiuc.edu/ruminants/
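The synteny blocks shown on Evolution Highway were built from pairwise alignments with the maf2synteny tool. The sketch below is a much-simplified stand-in, not maf2synteny's algorithm: it takes alignment anchors (positions paired between the reference, e.g. cattle, and a target genome) and merges consecutive anchors into one block while they stay on the same target chromosome, keep the same strand, advance collinearly and are separated by no more than `max_gap` bp. The `Anchor`/`Block` types and parameter names are assumptions for illustration; maf2synteny itself uses a more elaborate procedure that tolerates small rearrangements and alignment noise.

```python
from typing import NamedTuple


class Anchor(NamedTuple):
    """One aligned position pair between the reference and a target genome (hypothetical input)."""
    ref_chrom: str
    ref_pos: int
    tgt_chrom: str
    tgt_pos: int
    strand: str   # '+' if the target runs in the same direction as the reference, '-' if inverted


class Block(NamedTuple):
    ref_chrom: str
    ref_start: int
    ref_end: int
    tgt_chrom: str
    tgt_start: int
    tgt_end: int
    strand: str


def synteny_blocks(anchors, max_gap=1_000_000, min_ref_len=0):
    """Greedily chain collinear anchors into synteny blocks (simplified illustration)."""
    runs = []  # each run is a list of mutually collinear anchors
    for a in sorted(anchors, key=lambda x: (x.ref_chrom, x.ref_pos)):
        if runs:
            prev = runs[-1][-1]
            step = 1 if a.strand == "+" else -1
            collinear = (prev.ref_chrom == a.ref_chrom
                         and prev.tgt_chrom == a.tgt_chrom
                         and prev.strand == a.strand
                         and a.ref_pos - prev.ref_pos <= max_gap
                         # target coordinates must move forward on '+', backward on '-'
                         and 0 < (a.tgt_pos - prev.tgt_pos) * step <= max_gap)
            if collinear:
                runs[-1].append(a)
                continue
        runs.append([a])

    blocks = []
    for run in runs:
        ref_start, ref_end = run[0].ref_pos, run[-1].ref_pos
        if ref_end - ref_start < min_ref_len:
            continue   # drop blocks below the chosen resolution
        tgt = [x.tgt_pos for x in run]
        blocks.append(Block(run[0].ref_chrom, ref_start, ref_end,
                            run[0].tgt_chrom, min(tgt), max(tgt), run[0].strand))
    return blocks


# Toy anchors: a forward block on target chromosome X, then an inverted block on Y.
anchors = [Anchor("chr1", 100, "X", 5000, "+"), Anchor("chr1", 200, "X", 5100, "+"),
           Anchor("chr1", 300, "X", 5200, "+"), Anchor("chr1", 400, "Y", 9000, "-"),
           Anchor("chr1", 500, "Y", 8900, "-")]
blocks = synteny_blocks(anchors, max_gap=1000)
```

Raising `min_ref_len` plays the role of the block-size resolution used when comparing genomes at different scales.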
 
Title Mammalian genomes Evolution Highway Comparative Chromosome Browser 
Description We built a database containing visualizations of homologous synteny for mammalian genomes assembled with Illumina scaffolding, Dovetail Chicago and Dovetail Hi-C methods. It contains over 20 genomes aligned to the cattle, goat and human genomes. The data are used to assemble these genomes to chromosome level and to verify the assemblies.
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? No  
Impact A subset of this database containing the ruminant genomes was used in our recent publication on chromosome evolution in ruminants (Farre et al., Genome Research 2019).
 
Description Camel chromosome level genome assemblies 
Organisation University of Veterinary Medicine Vienna
Department Department of Pathobiology
Country Austria 
Sector Academic/University 
PI Contribution Dr. Larkin served as a consultant at an IAEA meeting in Vienna dedicated to the construction of radiation hybrid chromosomal maps for camel species. As a result of this meeting, it was decided that Dr. Larkin's group would be responsible for constructing reference-assisted assemblies of the dromedary and Bactrian camels. Dr. Larkin appointed an RVC master's student, who is now carrying out this work. In addition, Dr. Larkin is a partner on the ongoing FWF-RSF application to study chromosome evolution and selection in camel breeds in Central Asia. In 2018 the reference-assisted assembly of the dromedary camel was finished and published.
Collaborator Contribution Dr. Pamela Burger from the University of Veterinary Medicine in Vienna has provided us with the genome assemblies and raw read data to perform reference-assisted chromosome assemblies of the dromedary camel. She also provided the DNA samples required to verify the RACA camel assemblies. Dr. Polina Perelman from the Institute of Molecular Biology, Novosibirsk, Russia has provided us with BAC maps of the alpaca genome to facilitate RACA chromosome-level assemblies of camel genomes.
Impact Denis Larkin's group performed the initial reference-assisted chromosome assembly with RACA for the dromedary camel genome (Fitak et al. 2016). The tool assembled 1,797 scaffolds (minimum size 10 kb) into 154 predicted chromosome fragments (PCFs), one of which was homologous to a complete cattle chromosome (chromosome 25). The longest PCF was 112 Mb long and contained 97 scaffolds; the shortest was 117 kb long and contained two scaffolds. The N50 of the initial RACA assembly was 31.2 Mb, 21 times higher than the N50 of 1.48 Mb of the original assembly. The total length of the assembled PCFs was 1,886 Mb, or 94% of the original dromedary scaffold-based assembly. RACA split 44 (2%) scaffolds as potentially "chimeric". All split scaffolds are currently being verified by PCR prior to running the second (final) round of RACA, in which all scaffolds with confirmed structure will be kept intact. The paper describing this work, with our master's student as first author, has been published in Frontiers in Genetics in a special collection dedicated to camel genomics (PMID: 30804979).
Start Year 2016
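N50, quoted above for the original and RACA camel assemblies, is the length L such that sequences of length L or longer together contain at least half of all assembled bases. A minimal Python sketch of the calculation follows; the function names are ours and the toy lengths are illustrative, not the camel data. The same arithmetic gives the reported fold improvement: 31.2 Mb / 1.48 Mb ≈ 21.

```python
def n50(lengths):
    """Length L such that sequences >= L together contain at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total rather than dividing total by two.
        if 2 * running >= total:
            return length
    raise ValueError("empty assembly")


def fold_improvement(new_n50, old_n50):
    """Ratio of the upgraded N50 to the original N50."""
    return new_n50 / old_n50


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))   # 8: 10 + 9 + 8 = 27, half of 54
print(round(fold_improvement(31.2, 1.48)))  # 21
```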
 
Description CytoCell 
Organisation Cytocell Ltd
Country United Kingdom 
Sector Private 
PI Contribution We established a formal collaboration with CytoCell, which is providing an in-kind contribution to support this project.
Collaborator Contribution CytoCell is providing us with BAC clones to be hybridised on mammalian chromosomes to identify conserved probes to be used for genome mapping
Impact A set of 200 BAC probes has been transferred to our partner's laboratory (Darren Griffin, University of Kent).
Start Year 2017
 
Description Mammalian ancestral genome reconstructions 
Organisation University of California, Davis
Department Department of Evolution and Ecology
Country United States 
Sector Academic/University 
PI Contribution In collaboration with Prof. Harris Lewin's group at UCD, we designed a novel algorithm to reconstruct the structures of ancestral animal chromosomes. Dr. Larkin is a co-corresponding author on the paper published in PNAS in 2017. He and members of his group contributed directly to the design of the algorithm and to its application to 19 mammalian genomes to reconstruct ancestral genomes in the lineage leading to human. Dr. Larkin's group has also applied this algorithm to ruminant genomes, with the manuscript published in Genome Research in 2019 (D. Larkin is a corresponding author), and is working with the UCD group and the company Dovetail to improve the quality of Dovetail assemblies for mammalian genomes upgraded by Prof. Lewin to Dovetail scaffolds. The latter is done using a combination of FISH, our reference-assisted assembly algorithm (RACA) and Hi-C.
Collaborator Contribution Prof. Lewin is paying for the upgrades of mammalian genomes to Dovetail super-scaffolds (~10K USD per genome) and coordinates the collaborative project. Prof. Lewin also contributed to the interpretation of data produced during our ruminant genome analysis. Prof. Graphodatsky was involved in fluorescence in situ hybridisation of cattle BACs on several ruminant genomes.
Impact A new assembly algorithm has been designed and applied to reconstruct the chromosomal structures of several ancestral genomes in the lineage leading to human. The paper describing this approach and its results has been published in PNAS (PMID: 28630326). Visualizations of the reconstructed assemblies are available from our Evolution Highway comparative chromosome browser. Reconstructed ruminant genomes were published in Genome Research (PMID: 30760546). This work was multidisciplinary, involving bioinformatic analysis of sequenced mammalian genomes and fluorescence in situ hybridisation to verify the reconstructions and to infer ancestral genomes that were not available at sequence level.
Start Year 2015
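Ancestral chromosome reconstruction tools such as DESCHRAMBLER reason about adjacencies between synteny blocks shared by descendant genomes. The sketch below illustrates only that core idea, under a simple, hypothetical majority rule: each genome is a list of chromosomes written as signed synteny-block IDs, and an adjacency is kept as putatively ancestral if at least `min_support` genomes share it. It is not DESCHRAMBLER's algorithm, which takes the species phylogeny into account rather than using a simple vote and then assembles adjacencies into ancestral chromosome fragments; the function names and threshold are assumptions.

```python
from collections import Counter


def adjacencies(chromosomes):
    """Canonical signed adjacencies between consecutive synteny blocks.

    Each chromosome is a list of signed block IDs: +7 is block 7 forward,
    -7 is block 7 inverted. Reading a chromosome backwards flips every sign,
    so (a, b) and (-b, -a) describe the same adjacency; max() picks one
    representative of the pair.
    """
    adj = set()
    for chrom in chromosomes:
        for a, b in zip(chrom, chrom[1:]):
            adj.add(max((a, b), (-b, -a)))
    return adj


def supported_adjacencies(genomes, min_support=2):
    """Adjacencies found in at least `min_support` genomes (hypothetical majority rule)."""
    counts = Counter()
    for chromosomes in genomes.values():
        counts.update(adjacencies(chromosomes))
    return {adj for adj, n in counts.items() if n >= min_support}


# Toy genomes (invented block orders, not real cattle/goat/pig data).
genomes = {
    "species_A": [[1, 2, 3], [4, 5]],
    "species_B": [[1, 2, 3, 4, 5]],
    "species_C": [[-3, -2, -1], [4, -5]],
}
ancestral = supported_adjacencies(genomes)
```

Species C carries blocks 1 to 3 inverted, yet its adjacencies still match those of A and B after canonicalisation; the 3-4 fusion seen only in B and the 4/-5 inversion seen only in C fall below the support threshold.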
 
Description School of Young Scientists
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact More than 100 participants from former Soviet Union countries attended the School for Young Scientists held in Zvenigorod, Russia in 2018. Dr Larkin gave an invited lecture on the current status of animal genome studies, which prompted many questions from the audience and follow-up discussions. The organising committee has requested that a review paper based on the lecture be written and published.
Year(s) Of Engagement Activity 2018