Tools for haplotype-based inference from multiple array and sequence-based data in potato and other polyploid plant genomes

Lead Research Organisation: Imperial College London

Department Name: School of Public Health

Abstract

This project aims to discover the genetic basis of different potato characteristics by modelling shared segments of DNA in a population. Potatoes are the third most important food crop in the world. There are more than 4,000 cultivated varieties of potato worldwide, as well as more than 200 wild species and subspecies. Each variety has different characteristics, such as size, starch content, ability to grow in different climates, resistance to pathogens such as blight, and yield. Understanding the genetic basis of this variability is important for breeding a hardier and more productive potato crop. However, studying potato genetics is complicated by several factors. Potatoes are (in general) tetraploid, which means that they carry 4 copies of every chromosome. Humans, by comparison, carry two copies of each chromosome, one inherited from the mother and the other from the father. There is also a lot of variation between the 4 chromosome copies, and potatoes that are inbred to reduce this variation become non-viable. Thus, to understand potato genetics properly, it is essential to model all 4 chromosome copies. When two strains of potato are crossed, the offspring inherit two copies of each chromosome from each parent. In general, each inherited copy is identical to the corresponding copy in the parent; occasionally, however, a cross-over occurs, so that the copy inherited by the child is a mosaic of different chromosome copies in the parent. This mosaic-like structure of chromosomes can be used to map the location of genes associated with different potato traits. The idea is to identify the different 'versions' of each segment of DNA in a large population, and to find segments whose versions are correlated with a particular outcome, e.g. increased resistance to blight.
This information can then be used in selective breeding strategies in which two parents that both carry this chunk of beneficial DNA are crossed, thus avoiding detrimental inbreeding while still ensuring that the offspring carry the protective DNA segment. To use this strategy to find 'beneficial' DNA segments, the first step is to differentiate between the different versions of each chunk of DNA. Genetic markers have been designed to fill this role. Rapid advances in sequencing technology mean that many more genetic markers are being discovered, so we can identify different versions of ever finer 'chunks' of DNA. However, when we measure these markers, we can tell how many copies of each 'version' each individual in the population carries, but not which of the 4 chromosome copies they belong to. Our measurements of these markers are also imperfect, so errors are introduced. In this proposal we will develop software that predicts which of the 4 chromosome copies carries each version of each chunk of DNA, thereby reconstructing the mosaic structure of each chromosome. Using this reconstruction, and the fact that cross-overs occur only rarely, we will also be able to correct many of the errors in reading these markers. The reconstruction will also be used to improve the detection of associations between genes and potato traits, and thus the ability to breed a hardier and more productive potato crop. We will use our software to produce a 'map' of these segments of DNA variation. This map will also be extremely useful for identifying segments of DNA that have been subject to natural selection in wild potato species as well as during cultivation.
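The ambiguity described above, where marker measurements reveal how many copies of each version an individual carries but not which chromosome copy carries it, can be illustrated with a toy example. The sketch below is a hypothetical illustration, not the project's software. It assumes biallelic SNPs coded 0/1 and error-free allele counts (dosages), and the function name `phasings` is invented for this example. Given the number of copies of the alternative allele at each SNP in one tetraploid individual, it enumerates by brute force every unordered set of four haplotypes consistent with those counts.

```python
from itertools import combinations_with_replacement, product


def phasings(dosages, ploidy=4):
    """Return every unordered set of `ploidy` haplotypes matching the dosages.

    `dosages[j]` is the (assumed error-free) number of chromosome copies
    carrying the alternative allele (coded 1) at biallelic SNP j.
    This is an illustrative brute-force sketch, not a phasing algorithm
    intended for real data.
    """
    n_snps = len(dosages)
    # All 2**n_snps possible haplotypes over these SNPs, e.g. (0, 1).
    haplotypes = list(product((0, 1), repeat=n_snps))
    solutions = []
    # combinations_with_replacement yields each multiset of haplotypes once,
    # so the order of the chromosome copies is ignored.
    for combo in combinations_with_replacement(haplotypes, ploidy):
        # Count alternative alleles at each SNP across the chosen copies.
        counts = [sum(h[j] for h in combo) for j in range(n_snps)]
        if counts == list(dosages):
            solutions.append(combo)
    return solutions
```

For example, `phasings((2, 2))` returns three distinct configurations ({00, 00, 11, 11}, {00, 01, 10, 11} and {01, 01, 10, 10}), so an individual carrying two copies of the alternative allele at each of two SNPs cannot be phased from dosages alone. The number of candidate configurations grows exponentially with the number of SNPs, so this enumeration only serves to show why a population-level model, together with the rarity of cross-overs, is needed to choose between them.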

Technical Summary

Genome-wide genotyping of recombinant populations has been a successful way to map quantitative trait loci in a range of diploid organisms, including humans, mice, Arabidopsis and other model organisms. The availability of genome sequence for plant species such as rice, maize, cucumber, potato and tomato, together with the ability to rapidly identify novel single nucleotide polymorphisms via high-throughput next-generation sequencing, means that this strategy is also extremely promising for mapping quantitative trait loci in crops. Haplotype-based inference has been shown to be an extremely valuable tool in diploid genetics. Modelling the haplotype structure of a population has been demonstrated to improve genotyping accuracy, particularly when genotypes are called from low-coverage sequence data, but also when genotyping microarrays are used. Haplotype-based association models have been shown to be more powerful than genotype-based association models. Haplotype-based modelling has been used to impute unmeasured genotypes using a haplotype map of a reference population. Haplotypes have also been used to identify recent positive selection. Modelling the haplotype structure of polyploid species is more challenging than for diploid species, and as a result the haplotype methods described above have seen little application to polyploids. However, haplotype-based modelling of polyploids will be essential to fully exploit genotyping and sequencing technologies for mapping quantitative trait loci. We address this gap in this proposal. We will develop algorithms and software both for inferring polyploid haplotypes from multiple array- and sequence-based technologies, and for using these haplotypes to improve the accuracy of polymorphism calling and genotyping and, ultimately, the detection of genotype-phenotype associations in polyploid crops. As a proof of principle, we will apply these methods in potato genomics.

Planned Impact

The underlying motivation of this proposal is to enable private- and public-sector crop scientists to breed crops with desirable phenotypes (e.g. increased yield, or increased resistance to cold), in order to increase the efficiency and reliability of food production and thus boost the economic competitiveness of the United Kingdom. An important goal is to create multiple viable crop strains, in order to avoid dependency on any single strain and to reduce the economic impact of plant pathogens. We will focus particularly on applying our methodology to the tetraploid potato (Solanum tuberosum), although it will also be available to researchers working on other crops. Thus the immediate beneficiaries of this research include all individuals and organisations working with polyploid crops: governmental agencies involved in crop production, companies in the crop sciences sector, and research organisations in plant sciences. Downstream beneficiaries include the crop growers themselves, the UK government, and the general population. To help deliver this outcome, we will develop haplotype-based inference methods for polyploid crops. Modelling haplotypes is critical to understanding the genetics and evolution of a species, and to extracting the most information from low-coverage next-generation sequence and genotype data. Haplotype-based inference will help enable crop scientists to apply the powerful paradigm of trait mapping, via dense high-throughput genotyping of large recombinant populations, to polyploid species. This paradigm has already been successfully applied to humans and other model organisms, but has so far had limited application to polyploid plant species, partly because of the difficulties of working with polyploid genomes, and partly because of the lack of sequence data and of a dense set of SNP markers. However, genome sequence data is now available for tetraploid potato as well as hexaploid wheat.
The application of next-generation sequencing technology will enable rapid detection of polymorphisms in these species. To deliver this benefit to crop science, with subsequent downstream benefits to crop producers and consumers, we will work with our collaborators at TCAG and SSCRI to ensure that our software tools are accessible and visible to crop scientists. We will publish our validation studies in high-impact plant science journals, and present our findings at an international conference. Together with SSCRI and TCAG, we will engage with crop science companies and organisations. We will also consider commercialising our software, if doing so would enable it to be more widely used. We will also demonstrate proof of principle by applying our methods in the first instance to tetraploid potato (Solanum tuberosum). Potato is the world's third most important food crop, and of vital global economic importance. Potato genetics has largely focussed on resistance to major potato pathogens. However, the genetics of pathogen resistance, as well as of many other important agronomic traits, is still poorly understood. Modelling the tetraploid structure of the potato genome in order to infer potato haplotypes will be critical to unravelling the genetics of these traits.

Funded Value:

£120,649

Funded Period:

Nov 2010 - Dec 2012

Funder:

BBSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

BB/H024808/1

Principal Investigator:

Lachlan Coin

Research Subject:

Omic sciences & technologies (40%)

Plant & crop science (20%)

Tools, technologies & methods (20%)

Research Topic:

Bioinformatics (20%)

Genomics (40%)

Plant organisms (20%)

People
Lachlan Coin (Principal Investigator)

Publications

Bellos E (2012) cnvHiTSeq: integrative models for high-resolution copy number variation detection and genotyping using population sequencing data. in Genome Biology

Coin LJ (2012) An exome sequencing pipeline for identifying and genotyping common CNVs associated with disease with application to psoriasis. in Bioinformatics (Oxford, England)

Shao H (2013) A population model for genotyping indels from next-generation sequence data. in Nucleic Acids Research

Key Findings

Description: We developed approaches to call genetic variation in polyploid crops, i.e. crops with more than two copies of each chromosome, such as potato. We also developed approaches to determine which of the multiple chromosome copies carries each variant. This has applications in mapping the genetic basis of phenotypic variation in these crops, and is now being used to improve sweet potato.
Exploitation Route: These algorithms can be used to map the genetic basis of phenotypic variation in crops, and to improve traits in these crops via genomic selection.
Sectors: Agriculture, Food and Drink

Impact Summary

Description: The tools developed in this project are now in use in both the potato and sweet potato communities, to phase tetraploid and hexaploid samples respectively. We have also developed an easy-to-use Galaxy tool that gives access to these algorithms (http://www.genomicsresearch.org/galaxy). This tool is now in use by the Genomic Tools for Sweet Potato consortium (GT4SP), which is funded by the Bill and Melinda Gates Foundation.
First Year Of Impact: 2014
Sector: Agriculture, Food and Drink
Impact Types: Economic