Bioinformatics tools for plant genetic resources
Lead Research Organisation: John Innes Centre
Department Name: Crop Genetics
Abstract
Modern agriculture needs crop varieties that perform better for the consumer (e.g. flavour, shape, texture) and the producer (e.g. high yield, resistance to pests), and that have a reduced environmental impact (e.g. lower fertiliser or pesticide input). These improvements are all possible using conventional breeding backed by modern biotechnology, without the need for genetically modified (GM) plants. The improved properties can be found in 'genebanks', which are collections of thousands of plant samples taken from the wild or from old crop varieties, together with the many cultivars resulting from decades of selective breeding around the world. The problem with harnessing this potentially useful biodiversity in future breeding programmes is working out which samples to use. The solution is to 'genetically fingerprint' every sample and to take accurate measurements of all the useful properties mentioned above. In principle, these experiments can tell us which plants are likely to carry useful genes. However, the resulting information remains difficult to use because of its sheer quantity (hundreds of measurements in thousands of samples means millions of data points), and because improvements in the computer databases needed to store, analyse and display the results have lagged behind our ability to do the lab experiments. This project proposes to bridge that gap by developing a powerful, versatile and accessible computer database and associated computational tools, which can be applied to data collected from crop plant genebanks to identify promising plant samples for further experimental analysis. All of these computational resources will be freely available to the world's genetic resources community.
Technical Summary
Efficient utilisation of plant genetic resources (genebanks) requires versatile, powerful databases for storing, accessing and combining the wide variety of data that are becoming available in rapidly increasing amounts. We have developed a functioning database, GERMINATE, which can accommodate a wide variety of data types, from descriptive (morphology, geography) to molecular (DNA sequence, marker scores, map position etc.). We now seek funds to complete its development into an integrated data and analytical resource for the world's plant genetic resources community. Currently, the GERMINATE database stores passport and multi-crop descriptor data, together with data for every popular molecular marker type except single nucleotide polymorphisms (SNPs). We propose to extend this capability to include SNP data, in a format that is acceptable to the world's plant genetic resources and genomics communities. We will also deploy an ontology module, which will provide standard nomenclatures for phenotypes, developmental stages, and mutants or diseases of crop plants, allowing rational searching for these previously inaccessible characters. Additionally, we will increase the functionality of GERMINATE by greatly expanding the number of linked, web-accessible bioinformatic tools, including the existing suites STRUCTURE (for deducing and visualising the population structure of germplasm), TASSEL (for tree drawing and linkage disequilibrium estimation) and DIVA-GIS (for visualising geographical data associated with accessions). A new set of tools will also be designed and developed, including GERMANE (for managing workflows of multiple, chained analytical routines), CORE (for management of genetic resources, including identification of core collections in response to user requirements), and NETWORK (for analysing non-treelike evolution via introgression, using a marker model-based approach). Lastly, the GERMINATE web interfaces will be improved to allow easier and more powerful uploading, retrieval and analysis of the data.
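The summary describes the CORE tool only as identifying core collections in response to user requirements, without stating a method. As an illustration of what such a selection can look like, here is a minimal sketch assuming a greedy allele-coverage strategy: repeatedly add the accession that contributes the most alleles not yet represented. The function name `greedy_core`, the data layout and the example accessions are all hypothetical and are not taken from GERMINATE or CORE.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layout: accession ID -> set of (marker, allele) pairs observed.
Alleles = Set[Tuple[str, str]]


def greedy_core(accessions: Dict[str, Alleles], size: int) -> List[str]:
    """Pick up to `size` accessions, each time adding the one that
    contributes the most alleles not yet represented in the core."""
    covered: Alleles = set()
    core: List[str] = []
    remaining = dict(accessions)
    while remaining and len(core) < size:
        # Iterating in sorted order breaks ties by accession ID,
        # so the result is deterministic.
        best = max(sorted(remaining), key=lambda acc: len(remaining[acc] - covered))
        if not remaining[best] - covered:
            break  # no remaining accession adds a new allele
        core.append(best)
        covered |= remaining.pop(best)
    return core


# Made-up marker scores for four accessions (illustration only).
EXAMPLE = {
    "A1": {("m1", "a"), ("m2", "a")},
    "A2": {("m1", "a"), ("m2", "b")},
    "A3": {("m1", "b"), ("m2", "b"), ("m3", "a")},
    "A4": {("m1", "a")},
}

if __name__ == "__main__":
    print(greedy_core(EXAMPLE, size=2))  # prints ['A3', 'A1']
```

In this toy data set, two accessions already capture every observed allele, so asking for a larger core returns the same two. A real tool would presumably also weigh phenotypic, geographic or user-defined criteria, which this sketch ignores.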
Publications
Wingen LU (2014) Establishing the A. E. Watkins landrace cultivar collection as a resource for systematic gene discovery in bread wheat. In: TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik