Large scale molecular haplotyping using next generation sequencing

Lead Research Organisation: University of Liverpool
Department Name: Sch of Biological Sciences

Abstract

The cost of sequencing whole genomes is falling rapidly and the genomes of thousands of animals and humans are being sequenced. Most mammalian genomes contain two copies of each chromosome. The two copies are very similar but nevertheless contain tens of thousands of small differences, mostly single changes in the sequence of chemical bases that compose the chromosome. Present sequencing methods cannot discriminate between the sequences of the two chromosomes, and the 'complete' sequences that are published are in fact a single composite of the members of each chromosome pair. This disguises the reality that each gene has two slightly different forms, one on each chromosome; the two forms are known as alleles. Knowing the sequences of the individual chromosomes could make many other studies much easier. Each chromosome contains groups of genes that are inherited as a block, so knowing the allele of one gene makes it possible to predict the alleles of the others inherited with it, and hence to predict the alleles of all genes in the genome with much less effort. Genes are also mainly controlled by adjacent regions on the same chromosome, leading to interactions between different regions; knowing which alleles are adjacent to each other makes it much easier to identify these interactions. We will develop two methods for identifying the sequences of large regions of each chromosome. Firstly, we will take single sperm or oocytes, which contain only a single copy of each chromosome. Secondly, we will dilute DNA prepared from many cells or a single cell until there is only a small chance that fragments of DNA from the same region of both chromosomes of a pair are present. We will then amplify the DNA using standard methods developed for larger quantities of DNA and obtain the sequence of the DNA using the next generation genome sequencers at the Liverpool Center for Genomic Research.
We will initially test the two methods by sequencing an elite bull that has large numbers of offspring that have already been genotyped by Prof Jerry Taylor, who is a named collaborator on this project. This will enable us to validate our haplotypes against those identified by well established methods based on known parentage and large amounts of genotype data. East African dairy cattle are the subject of a major genetic study by John Gibson of the University of New England in Australia, who is also a named collaborator on this project. That study is developing tests to identify the best dairy cattle in the region so that they can be selected for further breeding. This involves genotyping 2000 of these animals with over 500,000 markers to identify the alleles carried by each animal at each position. However, as noted above, this method cannot identify the haplotypes along a chromosome. We will use the most appropriate of the single molecule sequencing methods developed in the first part of the project to discover the common haplotypes in this population by sequencing a fifth of the genome of 100 animals from this population. We will then combine the genotype data from the 2000 animals obtained in the other study with the common haplotypes obtained from the 100 animals in this study to reconstruct the haplotypes of all 2000 animals in the population. These data will make it possible to undertake a future Genome Wide Association Study to identify genes that are associated with specific traits such as milk production or disease resistance in this population. Finally, we will develop software to automate most of our analysis so that other workers can easily use the methods that we have developed.
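The idea of combining unphased genotypes with a panel of known common haplotypes can be illustrated with a small sketch. This is not the project's algorithm: the panel, the marker coding (0 = reference allele, 1 = alternate allele) and the function name `compatible_haplotype_pairs` are hypothetical, chosen only to show how a genotype constrains which pairs of haplotypes an animal could carry.

```python
from itertools import combinations_with_replacement


def compatible_haplotype_pairs(genotype, panel):
    """Return every pair of panel haplotypes whose alleles sum to the genotype.

    genotype: tuple of alternate-allele counts (0, 1 or 2) per marker.
    panel: dict mapping a haplotype name to a tuple of 0/1 alleles.
    """
    pairs = []
    for name_a, name_b in combinations_with_replacement(sorted(panel), 2):
        hap_a, hap_b = panel[name_a], panel[name_b]
        if len(hap_a) != len(genotype) or len(hap_b) != len(genotype):
            raise ValueError("haplotypes and genotype must cover the same markers")
        # A pair explains the genotype if the allele counts match at every marker.
        if all(a + b == g for a, b, g in zip(hap_a, hap_b, genotype)):
            pairs.append((name_a, name_b))
    return pairs


# Hypothetical panel of four common haplotypes over four markers.
panel = {
    "H1": (0, 0, 1, 1),
    "H2": (1, 0, 0, 1),
    "H3": (1, 1, 0, 0),
    "H4": (0, 1, 1, 0),
}

# One genotype is explained by a single pair of haplotypes...
print(compatible_haplotype_pairs((1, 0, 1, 2), panel))
# ...while an animal heterozygous at every marker remains ambiguous.
print(compatible_haplotype_pairs((1, 1, 1, 1), panel))
```

In this toy panel the second genotype is consistent with two different haplotype pairs, which is the ambiguity that genotype data alone cannot resolve. A real imputation method would also have to handle genotyping errors, missing markers and haplotypes absent from the panel.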

Technical Summary

The cost of sequencing has fallen dramatically and large numbers of genomes are being sequenced. However, current methods only generate a composite sequence of the two genomes present in a diploid animal. We will work up two methods for obtaining the sequences of long stretches of each chromosome and ideally entire chromosomes. The first method will involve sequencing the DNA from a single sperm that has been amplified using phi29 DNA polymerase. The second method will involve diluting DNA until only about one fifth of a haploid genome is present and then amplifying that DNA with phi29 DNA polymerase. Both these amplification methods have been used before but they have not been used to sequence whole genomes. We have a particular interest in cattle genetics and have already sequenced the genomes of three cattle breeds. We will sequence a single elite Holstein bull with both the above methods as well as sequencing bulk DNA. This animal has extensive pedigree and genotype data that can be used to confirm the molecular haplotypes. The sequence of the bulk DNA will be used to develop algorithms to remove artifacts introduced by the amplification based methods. We will apply the best method to obtain the common haplotypes from 100 animals from a population of East African dairy cattle that are the subject of a separate large scale genotyping study led by Prof John Gibson. We will develop methods for combining the common haplotypes discovered in this study with the genotypes of 2000 animals obtained by Prof Gibson to reconstruct the haplotypes of all 2000 animals. This will make it possible to undertake a whole genome association study of the same population in the future. We will publish open source software to automate the analyses described above.
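The reasoning behind diluting to about one fifth of a haploid genome can be made concrete with a short calculation. The sketch below is an illustration under stated assumptions, not the project's own model: it assumes that fragments covering a given locus are sampled independently from each parental chromosome, that the number of copies follows a Poisson distribution, and that both parental chromosomes contribute equally. The function name `dilution_probabilities` is hypothetical.

```python
import math


def dilution_probabilities(haploid_equivalents):
    """Estimate locus-level outcomes for a diluted DNA aliquot.

    haploid_equivalents: mean number of haploid genome copies in the aliquot.
    Assumes Poisson sampling with each parental haplotype contributing half.
    """
    per_haplotype = haploid_equivalents / 2.0
    # Probability that at least one fragment from one given haplotype covers the locus.
    p_one = 1.0 - math.exp(-per_haplotype)
    # Probability that both haplotypes are present at the locus (unphaseable mixture).
    p_both = p_one ** 2
    # Probability that the locus is covered by at least one fragment of either haplotype.
    p_covered = 1.0 - math.exp(-haploid_equivalents)
    return {
        "covered": p_covered,
        "both_haplotypes": p_both,
        "mixed_given_covered": p_both / p_covered,
    }


result = dilution_probabilities(0.2)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions, an aliquot holding 0.2 haploid genome equivalents covers roughly 18% of loci, and about 5% of the covered loci contain both haplotypes. Stronger dilution lowers the mixed fraction further but means more aliquots are needed to cover the genome.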

Planned Impact

Our most immediate impact will be through the East African Dairy Development (EADD) project, which is funded by the Gates Foundation with US$43m to double milk production for 179,000 smallholder dairy farmers in East Africa. Our collaborator Prof John Gibson is leading a complementary genetics project, which has been separately funded by the Gates Foundation and will develop a rational strategy for selecting and breeding cattle that have an optimal combination of the disease and drought tolerance of African cattle with the productivity of European cattle. A GWAS will be undertaken to identify genes and markers associated with beneficial and deleterious phenotypes. The results of the GWAS will be fed directly back into the EADD project to develop a new breeding strategy for dairy cattle in East Africa. Given the large numbers of farmers involved in this project, even small gains in milk yields or small reductions in disease burden can make a substantial contribution to milk production in the region. Other farmers in the UK and throughout the world are likely to benefit from our work as well. Prof Gibson at the University of New England in Australia is leading the development of a database of cattle haplotypes that will be available to breeders and farmers world-wide. A significant project output will be a new version of the RLRPHLI software package that will include molecular haplotypes in its haplotype imputation algorithm. This will enable the database developers to add a much broader range of haplotypes to their database than would have been possible with genotype data alone. Our work will be equally applicable to any diploid organism, and the method could also be important for human genetics studies. We will communicate our discoveries by publication in the peer reviewed literature and by presenting our data at the Plant and Animal Genomics conference in January 2012.
Collaboration

Since our collaborator Prof Gibson is leading the genetics component of the EADD, he will be responsible for ensuring that the results of this project are translated into a greater understanding of the genetics of the cattle and for communicating with other EADD partners to translate those insights into improved breeding programmes. Dr Noyes has worked with Prof Gibson for the last seven years on a Wellcome Trust funded project on the functional genomics of trypanosomiasis in African cattle. Dr Noyes and Prof Hall have strong links with the International Livestock Research Institute (ILRI) through our colleague Prof Steve Kemp (see letter of support). In the last year Dr Noyes has spent 50% of his time at ILRI working on a genomic discovery project funded by the Google foundation. ILRI is a CGIAR institute and has a remit to improve livestock production. We will be working closely with Prof Kemp to identify other livestock groups that would benefit from our work. The collaboration with Paul Dear is of strategic importance as it will link the Dear lab and the CGR. Dr Dear has been a prolific innovator in the field of genomics and his link with the Liverpool CGR group is likely to lead to other fruitful collaborations.

Economic impact

The Hall lab has current collaborations with many different UK and non-UK based companies such as Shell Global, Unilever, AstraZeneca, Life Technologies, Roche Applied Sciences and Affymetrix. Affymetrix have funded much of the bovine sequencing that has been done in the Hall lab, and the SNP discovery project has translated directly into the design of the Affymetrix bovine SNP array. The University of Liverpool Center for Genomic Research (CGR) is partially funded by the North West Development Agency, who clearly recognize the center's contribution to the economy of the region.
The CGR has planned workshops over the next three years to train scientists from industry in the application of next generation sequencing technology and the analysis of next generation sequencing data.
 
Description We have developed a technique to easily and cheaply find linkage between SNPs on single haplotypes using dilution followed by amplification of DNA. This has been applied to cattle and we have used it to map a breeding population.
Exploitation Route This can be used by people doing plant or animal breeding
Sectors Agriculture, Food and Drink

 
Description BBSRC ENWW panel membership
Geographic Reach National 
Policy Influence Type Membership of a guideline committee
Impact The panel advises the BBSRC executive and council on how best to deliver BBSRC strategy. Impact is difficult to quantify.
 
Description Feature article on genomics in the Eastern Daily Press
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact The article covered the research activity at the Earlham Institute and at the Norwich Research Park and how it would impact the general public.
Year(s) Of Engagement Activity 2019
 
Description School visits 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Schools
Results and Impact Numerous school visits to the CGR
Year(s) Of Engagement Activity 2015,2016