Mining the allohexaploid wheat genome for useful sequence polymorphisms

Lead Research Organisation: University of Bristol
Department Name: Biological Sciences


Bread wheat is of fundamental importance to UK, European and world agriculture, with an estimated 2007 world harvest of ~550 m tonnes. In the UK, ~1.8 m hectares are planted with wheat, yielding ~7.2 tonnes per hectare, with a farm-gate value of £2.6 billion. The UK has ideal growth conditions for wheat and has a world-class crop improvement programme. Despite its importance, wheat production worldwide has not kept pace with increased demand, and productivity is threatened by disease, increased fertiliser costs, competition for high-quality agricultural land, resource limitations, and adverse environmental conditions that dramatically reduce optimal yields. It has been estimated that in Europe productivity has to be doubled to keep pace with demand and to maintain stable prices. Therefore, by narrowing the gap between maximal yields and actual yields, and increasing maximal potential yields, sustainable and adequate production of one of the world's most important crops could be secured. The large increases in wheat yield have been primarily due to genetic improvements brought about by selective breeding of elite lines. The power of breeding can be increased by enabling the incorporation of wider genetic diversity and accelerating the identification of best-performing genotypes. This can be achieved using DNA sequence markers to identify genetic diversity underlying key traits. We aim to use next generation sequencing and a novel computational and comparative genomics strategy to identify sequence differences in the genomes of 5 key varieties that can be used to define different versions of a single gene in different varieties. Finding this type of marker in wheat has been problematic in the past because wheat is a hexaploid, with potentially 3 copies of each gene, and most of the sequence differences in wheat lines are between these three copies of a gene within a variety, rather than between the same gene in different varieties.
With this information and a set of markers, breeding companies and academic scientists will be able to identify and select specific regions of the genomes of different varieties, and use this information to isolate genes and to select lines carrying that region of DNA from crosses. This capability will fundamentally alter wheat research by enabling the use of more diverse lines in breeding, including wild species that have a wealth of under-exploited traits, including stress tolerance. Finally, this genotyping study will facilitate a far greater level of academic research in a key UK crop. The sequencing and informatics strategies we aim to develop will also establish ways to sequence the complete genome of wheat. Currently, the large size of the genome, its hexaploid composition and its predominantly repetitive content are a major barrier to progress. However, the high throughput and low cost of next generation sequencing provide a solution to the scale of the wheat genome. Our proposed work will enable sequencing to focus on gene-rich regions and increase the potential for assembling gene-rich genome sequences. Furthermore, using a novel bioinformatics strategy that uses the complete genome sequence of a closely-related species as a 'template' for identifying both gene structures such as introns and an approximate order of genes, our work will define new ways of assembling gene sequences and the order of genes in wheat chromosomes. This will lower the barriers for future work aimed at larger-scale genome sequencing and analysis. Finally, this project is closely linked to the UK breeding community through WGIN, to academic laboratories studying wheat in the UK through the Monogram Network, and to the international wheat genomics community through the International Wheat Genome Sequencing Consortium. This will ensure the rapid transfer of information to key stakeholders.

Technical Summary

The 16 Gb hexaploid genome of bread wheat is among the largest and most complex genomes, and its importance as a primary food crop demands projects that will generate useful genome sequence. The genomes of grasses vary greatly in size due to the expansion of repeats, primarily retroelements, while the order of genes is remarkably conserved in large chromosomal segments. This leads to large tracts of repeat, which are heavily methylated, interspersed with small groups of genes which have much lower levels of methylation. These features form the basis of a novel strategy to sequence the gene-rich regions of the wheat genome with three complementary approaches: methyl filtration, high Cot normalisation, and gene enrichment by hybridisation, using next-generation sequencing. Sequence from several key breeding lines will identify sequence polymorphisms in 5', 3' and intron regions of genes, where sequence diversity is much higher than in coding regions. Genome-scale alignments of wheat genome sequence with wheat transcriptome sequence assemblies, produced by 454-FLX sequencing and aligned with gene models established in the genome sequence of Brachypodium distachyon, form the template for sequence analysis. This alignment strategy is straightforward and uses existing computational resources and skills available in the partner labs. Sequence polymorphisms will be identified using existing software developed by Bristol and classified bioinformatically into intra-varietal and inter-varietal sequence polymorphisms. The polymorphisms will be validated and used to screen a core set of UK and, in collaboration, a wider variety of Australian germplasm. The project is linked to the International Wheat Genome Sequencing Consortium. It forms part of the UK Wheat Genetic Improvement Network that links breeders, scientists and funding agencies. The project will be coordinated within the Monogram Network of wheat scientists, ensuring links to all UK wheat researchers.
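
The distinction between intra-varietal and inter-varietal polymorphisms can be illustrated with a minimal sketch. It is not the Bristol software mentioned above, whose internals are not described in this summary. It assumes a hypothetical input in which, for each aligned position, the bases seen in each variety's reads have already been collected, with reads from all three homoeologous gene copies pooled. The function name `classify_site` and the input layout are illustrative assumptions only.

```python
from typing import Dict, FrozenSet

# Hypothetical input layout: variety name -> set of bases observed at one
# aligned position, with reads from all homoeologous copies pooled together.
SiteCalls = Dict[str, FrozenSet[str]]


def classify_site(calls: SiteCalls) -> str:
    """Label one aligned position (illustrative rule, not the project's software).

    'monomorphic'    : every variety shows the same single base
    'intra-varietal' : every variety shows the same mixture of bases, so the
                       difference lies between gene copies within a variety
    'inter-varietal' : the observed base sets differ between varieties
    """
    if not calls:
        raise ValueError("no varieties supplied")
    distinct = set(calls.values())
    if len(distinct) > 1:
        return "inter-varietal"
    (bases,) = distinct  # exactly one shared base set remains
    return "intra-varietal" if len(bases) > 1 else "monomorphic"


if __name__ == "__main__":
    # Made-up example calls; positions and bases are not real data.
    sites = {
        "pos_101": {"Avalon": frozenset("A"), "Cadenza": frozenset("A")},
        "pos_205": {"Avalon": frozenset("AG"), "Cadenza": frozenset("AG")},
        "pos_318": {"Avalon": frozenset("AG"), "Cadenza": frozenset("A")},
    }
    for name, calls in sites.items():
        print(name, classify_site(calls))
```

In this simplified rule, a site where every variety carries the same mixture of bases is treated as a difference between gene copies, while any disagreement between varieties marks a candidate varietal marker. A real pipeline would also need to account for read depth, sequencing error and missing coverage, none of which is modelled here.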
Description Developed 5x coverage of the Chinese Spring wheat genome sequence, which is now being used on an hourly basis by the global wheat community

We also generated 20-fold coverage of genome sequence for 4 UK wheat varieties: Avalon, Cadenza, Rialto and Savannah

From the above we found several thousand wheat SNPs, which were reported in several publications
Exploitation Route The sequences and SNPs are now in daily use by UK wheat breeders and are being exploited by the genotyping company LGC; hence the information generated by the project forms the basis for a significant amount of economic activity within the UK. All data can be found on our website, where it is free from IP (as requested by the funding agency).
Sectors Agriculture, Food and Drink, Environment

Description The sequence information and the wheat markers are being used by the wheat community including the wheat breeders
First Year Of Impact 2010
Sector Agriculture, Food and Drink
Impact Types Economic