Bioinformatics tools for plant genetic resources

Lead Research Organisation: University of Dundee

Department Name: College of Life Sciences

Abstract

Modern agriculture needs crop varieties with improved performance for the consumer (e.g. flavour, shape, texture) and the producer (e.g. high yield, resistance to pests), and with reduced environmental impact (e.g. lower fertiliser or pesticide input). These developments are all possible using conventional breeding backed by modern biotechnology, without the need for genetically modified (GM) plants. The improved properties are found in 'genebanks', which are collections of thousands of plant samples taken from the wild or from old crop varieties, together with the many cultivars resulting from decades of selective breeding around the world. The problem with harnessing this potentially useful biodiversity in future breeding programmes is working out which samples to use. The solution is to 'genetically fingerprint' every sample and take accurate measurements of all the useful properties mentioned above. In principle, these experiments can tell us which plants are likely to carry useful genes. However, this huge quantity of information remains difficult to use (hundreds of measurements on thousands of samples means millions of data points), because improvements in the computer databases that store, analyse and display the results have lagged behind our ability to do the lab experiments. This project proposes to bridge that gap by developing a powerful, versatile and accessible computer database and associated computational tools, which can be applied to data collected from crop plant genebanks to identify promising plant samples for further experimental analysis. All of these computational resources will be freely available to the world's genetic resources community.

Technical Summary

Efficient utilisation of plant genetic resources (genebanks) requires versatile, powerful databases for storing, accessing and combining the wide variety of data that are becoming available in rapidly increasing amounts. We have developed a functioning database, GERMINATE, which can accommodate a wide variety of data types, from descriptive (morphology, geography) to molecular (DNA sequence, marker scores, map position etc.). We now seek funds to complete its development into an integrated data and analytical resource for the world's plant genetic resources community. Currently, the GERMINATE database stores passport and multi-crop descriptor data, together with data for every popular molecular marker type except SNP. We propose to extend this capability to include SNP data, in a format that is acceptable to the world's plant genetic resources and genomics communities. We will also deploy an ontology module, which will provide standard nomenclatures for phenotypes, developmental stages and mutant or disease ontologies for crop plants, allowing rational searching for these previously inaccessible characters. Additionally, we will increase the functionality of GERMINATE by greatly expanding the number of linked, web-accessible bioinformatic tools, including the existing suites STRUCTURE (for deducing and visualising the population structure of germplasm), TASSEL (for tree drawing and linkage disequilibrium estimation) and DIVA-GIS (for visualising geographical data associated with accessions). A new set of tools will also be designed and developed, including GERMANE (for managing workflows of multiple, chained analytical routines), CORE (for management of genetic resources, including identification of core collections in response to user requirements), and NETWORK (for analysing non-treelike evolution via introgression, using a marker model-based approach). Lastly, the GERMINATE web interfaces will be improved to allow easier and more powerful uploading, retrieval and analysis of the data.
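
To make the idea of identifying a core collection from marker scores more concrete, the following is a minimal sketch in Python. It is not the CORE tool or any part of GERMINATE: the greedy allele-coverage strategy, the function names and the accession and marker identifiers are all assumptions chosen for illustration. The sketch repeatedly picks the accession that adds the most marker alleles not yet represented in the core.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical data layout: accession name -> {marker name: allele call}.
Genotypes = Dict[str, Dict[str, str]]


def alleles_of(genotype: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Return the set of (marker, allele) pairs carried by one accession."""
    return {(marker, allele) for marker, allele in genotype.items()}


def greedy_core(genotypes: Genotypes, size: int) -> List[str]:
    """Pick up to `size` accessions, each adding the most unseen alleles.

    Stops early once no remaining accession contributes a new allele.
    Ties are broken alphabetically by accession name.
    """
    covered: Set[Tuple[str, str]] = set()
    core: List[str] = []
    remaining = dict(genotypes)
    while remaining and len(core) < size:
        # max() keeps the first maximal item, so sorting gives stable ties.
        best = max(
            sorted(remaining),
            key=lambda name: len(alleles_of(remaining[name]) - covered),
        )
        gain = alleles_of(remaining[best]) - covered
        if not gain:
            break  # every remaining accession is already represented
        core.append(best)
        covered |= gain
        del remaining[best]
    return core


# Illustrative (made-up) accessions scored at three markers.
EXAMPLE: Genotypes = {
    "ACC001": {"m1": "A", "m2": "G", "m3": "T"},
    "ACC002": {"m1": "A", "m2": "G", "m3": "C"},
    "ACC003": {"m1": "C", "m2": "T", "m3": "T"},
    "ACC004": {"m1": "A", "m2": "G", "m3": "T"},
}

if __name__ == "__main__":
    print(greedy_core(EXAMPLE, 2))
```

A real core-collection tool would typically also weigh phenotypic, geographic and passport data and respond to user-defined criteria, as the summary above describes; this sketch considers marker alleles only.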

Funded Value:

£186,639

Funded Period:

Feb 07 - Feb 09

Funder:

BBSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

BB/E003184/1

Principal Investigator:

Andrew Flavell

Research Topic:

Unclassified

Publications

Ho T (2014) Genome-Tagged Amplification (GTA): a PCR-based method to prepare sample-tagged amplicons from hundreds of individuals for next generation sequencing in Molecular Breeding

Milne I (2010) Flapjack - graphical genotype visualization in Bioinformatics

Key Findings

Description	The main output of our project is the set of Germinate 2 databases that constitute the Germinate project (http://bioinf.scri.ac.uk/germinate). The PostgreSQL structure of the first Germinate database has been replaced by MySQL, allowing open source development by a larger user community. The database schema and visual interface scripts are publicly available (bioinf.scri.ac.uk/germinate/). Databases have been developed for pea, barley, potato and wheat to date (follow 'Projects' link at the above
Exploitation Route	Databases for genetic resources (the genomics of biodiversity) are highly useful to any agency or company with an interest in the genetic basis of biodiversity. Our database structure has been adopted by other plant genetic resources institutes worldwide and by crop breeding companies.
Sectors	Agriculture, Food and Drink; Digital/Communication/Information Technologies (including Software); Education; Healthcare
URL	http://bioinf.hutton.ac.uk/public/?page_id=159

Impact Summary

Description	Germinate databases are in use at many sites across the world; the database model has been very successful.
First Year Of Impact	2009
Sector	Agriculture, Food and Drink; Digital/Communication/Information Technologies (including Software); Education; Environment
Impact Types	Economic
