Assessing Illumina and Velvet for sequencing a wheat chromosome arm

Lead Research Organisation: European Bioinformatics Institute
Department Name: Ensembl Group

Abstract

World agriculture faces two major and unprecedented challenges in the near future. The first is to expand production of food and feed to meet increased demand while ensuring environmental sustainability. The second is to understand and mitigate the effects of global climate change on food and feed production. Wheat is one of the world's primary sources of food and feed and is the most important crop by scale and value in Europe. However, the wheat yield gains seen in the past 20 years have not been maintained, mainly because germplasm improvement has not kept pace with the major challenges of sustainable food production and adaptation to climate change. The new science of genomics can make major contributions to increasing both the scope and speed of crop improvement. But the size and complexity of the wheat genome are major obstacles to establishing biology, breeding and crop improvement strategies based on detailed knowledge of the complete genome sequence. Major advances in genome sequencing methods have led to commercially available instruments that dramatically increase sequence output and reduce costs, and sequence analysis methods have been created to assemble and analyse the sequence generated. However, the strategies and methods required to apply this new technology to the wheat genome remain to be determined. In this proposal we aim to develop methods for sequencing the expressed regions of genes, large insert clones and purified chromosome arms using the Illumina sequencing platform, and to adapt the Velvet sequence assembly software to wheat genomics. When scaled up, the outputs of this project will facilitate the development of cost-effective and efficient strategies for sequencing the complete wheat genome. The complete genome sequence of wheat will place each gene in its correct location on the genomes and will determine the correct structures of genes.
This knowledge will provide a new foundation for biology projects that aim to understand the functions of wheat genes in processes such as disease resistance, environmental interactions, mineral nutrition, and grain yield and nutrition. The methods we aim to develop will also be suitable for re-sequencing wheat lines, for example commercial varieties used in breeding. By comparing these sequences to the reference genome and to each other, scientists can identify sequence variation associated with traits. This information can be used to identify genes, and lines within breeding populations, that carry desirable genotypes. As such, the technologies we aim to develop in this project will be the first step in advancing our capability to breed the next generations of our key crop plant.

Technical Summary

Recent major advances in sequencing technology provide a timely opportunity to devise new sequencing strategies for large and complex genomes that have until now been impracticable and unjustifiably expensive to sequence. We aim to assess the Illumina sequencing platform for generating transcriptome, BAC and genomic sequence of wheat, which has an extremely large (16 Gbp) hexaploid genome. The Illumina GA2 platform provides excellent, cost-effective sequence throughput that is continually being improved in terms of sequencing chemistry and base-calling. Although the GS-FLX Titanium system generates substantially longer reads, its cost per base is sufficiently higher than that of the Illumina platform that we judge continued improvements in the Illumina platform will be adequate to generate sequence at sufficient depth for useful assembly of long contigs covering the low-copy regions of wheat chromosomes. Once developed, these very cost-effective applications will be more readily taken up by the research community and applied to the rapid generation of wheat genome sequences. We also aim to develop specific computational and mathematical approaches to assemble and analyse wheat transcriptome and genome sequence. The outcomes of the proposed work will provide strategies for cost-effective sequencing of the complete wheat genome and for the compilation and analysis of the sequence into useful assemblies. Furthermore, strategies for targeting re-sequencing to gene-rich regions of the wheat genome will facilitate genotyping and association genetics studies in multiple commercial breeding lines.
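Whether Illumina reads can reach "sufficient depth" can be estimated with simple coverage arithmetic. The sketch below assumes the Lander-Waterman model, in which reads start uniformly at random: mean depth is C = N * L / G for N reads of length L over a region of G bases, and the expected fraction of bases covered by at least one read is 1 - e^(-C). The read length and target depth used in the example are illustrative assumptions, not values from this project, and `reads_needed` and `expected_fraction_covered` are hypothetical helper names.

```python
import math


def reads_needed(region_size_bp: int, read_length_bp: int, target_depth: float) -> int:
    """Reads required for a mean depth C = N * L / G (Lander-Waterman), rounded up."""
    if read_length_bp <= 0:
        raise ValueError("read length must be positive")
    return math.ceil(target_depth * region_size_bp / read_length_bp)


def expected_fraction_covered(depth: float) -> float:
    """Expected fraction of bases covered by at least one read: 1 - e^(-C).

    Assumes reads start uniformly at random; real libraries show coverage bias,
    so treat this as optimistic. Covering a base says nothing about whether it
    can be assembled unambiguously in a repeat-rich genome.
    """
    return 1.0 - math.exp(-depth)


if __name__ == "__main__":
    # Illustrative inputs only: 16 Gbp genome (from the summary above),
    # hypothetical 100 bp reads and a hypothetical 30x target depth.
    n = reads_needed(16_000_000_000, 100, 30)
    print(n, expected_fraction_covered(30))
```

With these assumed inputs, `reads_needed(16_000_000_000, 100, 30)` gives 4,800,000,000 reads. Because the model ignores repeats, high depth alone does not guarantee long contigs in a genome such as wheat.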

Publications

International Arabidopsis Informatics Consortium (2010) An international bioinformatics infrastructure to underpin the Arabidopsis community. In The Plant Cell.

 
Description In this project we developed a new genome assembly tool, Curtain, and applied it (along with other assembly tools) to a wheat chromosome arm that had been sequenced using Illumina technology. The goal of the project was to establish the feasibility of sequencing the entire wheat genome through chromosome arm isolation and relatively cheap next-generation sequencing technologies. The chromosome arm was isolated and sequence was generated, but long-range assembly did not prove feasible using the existing technologies. The findings of the project have informed the selection of genome sequencing and assembly strategies in other BBSRC-funded cereal sequencing projects. The Curtain assembler was subsequently entered into the Assemblathon "competition", which was designed to explore the relative merits of genome assembly tools and algorithms and to determine quantitative metrics for the subsequent assessment of assemblies.
Exploitation Route Genome sequencing and de novo assembly are increasingly undertaken by interested industries, especially in the plant sciences. More significantly, the knowledge generated by genome sequencing, assembly and annotation is ultimately of potential use in developing improved crops for food and fuel, in terms of yield, biotic and abiotic stress resistance, and speed of deployment. The reads generated have been deposited in the European Nucleotide Archive, making the sequence publicly available. The knowledge generated in this project has contributed to the development of improved strategies for cereal genome sequencing and assembly, and these have already contributed to the genome-wide molecular data made available through Ensembl Plants, EMBL-EBI's service for plant genomic data.
Sectors Agriculture, Food and Drink, Digital/Communication/Information Technologies (including Software)

URL http://code.google.com/p/curtain/
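The Description above refers to quantitative metrics for assessing assemblies. One widely used contiguity statistic is N50: the contig length L such that contigs of length L or longer together contain at least half of the assembled bases. A common variant, NG50, measures against an estimated genome size instead of the assembly size, so an assembly that omits much of the genome is not rewarded for reporting only its longest contigs. The function below is a minimal sketch of both; it is not the Assemblathon's own evaluation code, and `nx50` is a hypothetical name.

```python
from typing import Iterable, Optional


def nx50(contig_lengths: Iterable[int], total: Optional[int] = None) -> int:
    """N50 when `total` is None (half of the assembly size), NG50 when `total`
    is an estimated genome size. Returns 0 if half of `total` is never reached.
    """
    lengths = sorted((n for n in contig_lengths if n > 0), reverse=True)
    if total is None:
        total = sum(lengths)  # N50: measure against the assembly itself
    half = total / 2
    running = 0
    for length in lengths:  # walk from the longest contig down
        running += length
        if running >= half:
            return length
    return 0
```

For contig lengths `[100, 80, 50, 30, 20, 10]` (290 bp in total), N50 is 80; against a hypothetical estimated genome size of 500 bp, NG50 falls to 30.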
 
Title Curtain 
Description A genome assembly tool for scaffolding and gap filling.
Type Of Material Improvements to research infrastructure 
Year Produced 2010 
Provided To Others? Yes  
Impact Contribution to the Assemblathon project, which defined standard approaches and metrics for genome assembly (and for the subsequent QC of the results).
URL https://code.google.com/p/curtain/
 
Title Curtain assembler 
Description Software to scaffold and gap-fill genomes initially assembled from short reads using an assembler such as Velvet 
Type Of Technology Software 
Year Produced 2010 
Open Source License? Yes  
Impact Curtain was among the programs assessed in the Assemblathon process, which helped define standard procedures for genome assembly and for the assessment of the results.
URL https://code.google.com/p/curtain/
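Curtain is described above as scaffolding and gap-filling contigs produced by a short-read assembler such as Velvet, which is a de Bruijn graph assembler. The toy sketch below is neither Velvet's nor Curtain's code; it uses the textbook formulation (nodes are (k-1)-mers, edges are k-mers) to show how contigs arise from k-mer overlaps and why a repeated sequence splits them, which is the fragmentation that scaffolding and gap filling then try to bridge. `build_debruijn` and `unitigs` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_debruijn(reads: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Toy de Bruijn graph: every k-mer adds an edge from its (k-1)-mer prefix
    to its (k-1)-mer suffix. K-mer counts and reverse complements are ignored."""
    edges: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[kmer[:-1]].add(kmer[1:])
    return edges


def unitigs(edges: Dict[str, Set[str]]) -> List[str]:
    """Spell out maximal non-branching paths as contig sequences."""
    indegree: Dict[str, int] = defaultdict(int)
    for targets in edges.values():
        for node in targets:
            indegree[node] += 1
    nodes = set(edges) | set(indegree)

    def interior(node: str) -> bool:
        # Exactly one way in and one way out: the node can be merged into a path.
        return indegree[node] == 1 and len(edges.get(node, ())) == 1

    contigs = []
    for start in nodes:
        if interior(start):
            continue  # reached while walking from a branch or end point
        for nxt in edges.get(start, ()):
            path = start + nxt[-1]
            while interior(nxt):
                nxt = next(iter(edges[nxt]))
                path += nxt[-1]
            contigs.append(path)
    # Isolated cycles (every node interior) are not reported by this toy.
    return sorted(contigs)
```

With k = 4, the reads `ATGGCGT` and `GGCGTGC` merge into the single contig `ATGGCGTGC`, whereas `TTGCATCC` and `AAGCATGG`, which share the repeat `GCAT`, break into five shorter contigs that overlap by k - 1 bases at the branch points. Real assemblers also handle k-mer multiplicity, reverse complements and sequencing errors, all of which this toy omits.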