Computational modelling of the relationships between Miscanthus genotype, environment and phenotype

Lead Research Organisation: Aberystwyth University
Department Name: IBERS

Abstract

IBERS has generated, and continues to generate, large quantities of genotype, phenotype and environment data, along with associated metadata, from the energy crop Miscanthus. The student appointed to this project will build on these datasets to create new scientific knowledge about the predictive relationships between data types. This knowledge will be encoded in computational models. To develop the models, the student will learn and apply state-of-the-art data-mining and machine-learning methods to identify and characterise the genes and genomic regions underlying important quantitative traits (quantitative trait loci, QTL). These models will enable a better understanding of Miscanthus biology and, through the definition of ideotypes, guide Miscanthus breeding.

BACKGROUND

The domestication of a new crop as a bioenergy feedstock requires the application of a wide range of techniques, including genetics, biochemistry, physiology, chemical engineering, bioinformatics and systems biology modelling. Over the last five years a Miscanthus research platform, including a replicated trait trial of 248 diverse genotypes, has been established at IBERS. Four years of intensive phenotyping of this population, and of the progeny from crosses, have generated more than 100,000 datasets. These have been incorporated into a custom-coded, integrated Miscanthus informatics platform. Example descriptors include: growth environment, morphology, architecture, flowering time, senescence, yield, spring emergence, ploidy, genetic group, and lignin, cellulose, hemicellulose and cell wall phenolic concentrations. Consolidating this information and using it in predictive models is essential if we are to maximise the value of the data and so increase the pace of crop improvement.

APPROACH

The student will use a combination of data-mining and machine-learning to analyse the datasets, identify important associations, and form predictive rules that relate Miscanthus phenotype, genotype and environment. The association models will be built with recently developed data-mining and machine-learning techniques that are well suited to the structure of the data. This will be done in close interaction with scientists at IBERS and Ceres, who will provide feedback on the patterns and association rules. The process will be an interactive one. The 'interesting' patterns identified by data-mining (e.g. QTL) will provide insight into Miscanthus biology, and will be fed into the machine-learning systems as new descriptive attributes to improve the modelling.

The analysis of genotype/environment/phenotype data is technically challenging because there are a large number of descriptors, and many of these are best represented relationally. This means that standard statistical methods are probably sub-optimal. Typical aspects of Miscanthus data that lend themselves to relational description are: temporal relations; spatial relations across fields and plants; and pedigree relationships between genotypes. Much of statistical genetics, and of fields such as spatial statistics, is devoted to developing ad hoc solutions to relational problems that can now be solved in principled ways. Data-mining and machine-learning methods have been used to develop predictive phenotype models from human genotype and environment data, but to the best of our knowledge there have been no applications in plant genetics. To evaluate the performance of these approaches and compare it with that of standard statistical genetic methods, we will use standard re-sampling approaches, including cross-validation and bootstrapping.

TRAINING

The proposed project is well suited to a PhD research project as it combines fallback elements (statistical genetics) with open-ended elements (relational learning). The proposal has strong training elements: the student will gain experience in genetics, bio-energy, data-mining and machine-learning, all skills likely to be in high demand in the future job market.
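As an illustration of the re-sampling evaluation mentioned under APPROACH, the following is a minimal sketch of comparing two predictive models by k-fold cross-validation and then putting a percentile bootstrap interval on the difference in their held-out errors. Everything in it is an assumption made for illustration: the data are synthetic (one hypothetical environment covariate and one trait), the two models are a mean-only baseline and a single-predictor least-squares line rather than the project's actual methods, and all function names are invented here.

```python
import random
import statistics

def k_fold_indices(n, k, rng):
    """Shuffle 0..n-1 and split into k near-equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    m = statistics.fmean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Ordinary least squares with a single predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def cv_errors(fit, xs, ys, folds):
    """Squared error of every point when it is held out, in index order."""
    errors = {}
    for test in folds:
        held = set(test)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            errors[i] = (model(xs[i]) - ys[i]) ** 2
    return [errors[i] for i in range(len(xs))]

def bootstrap_ci(values, rng, n_boot=1000, alpha=0.05):
    """Percentile bootstrap interval for the mean of `values`."""
    n = len(values)
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    return means[int(n_boot * alpha / 2)], means[int(n_boot * (1 - alpha / 2)) - 1]

rng = random.Random(0)
# Synthetic stand-in data (hypothetical, not Miscanthus measurements).
xs = [rng.uniform(0, 10) for _ in range(120)]
ys = [2.0 * x + rng.gauss(0, 1) for x in xs]

folds = k_fold_indices(len(xs), 5, rng)
err_base = cv_errors(fit_mean, xs, ys, folds)
err_line = cv_errors(fit_line, xs, ys, folds)

# Per-point difference in held-out squared error (baseline minus line).
diff = [b - l for b, l in zip(err_base, err_line)]
ci_low, ci_high = bootstrap_ci(diff, rng)
print(f"CV MSE baseline={statistics.fmean(err_base):.2f}, "
      f"line={statistics.fmean(err_line):.2f}, "
      f"95% CI of difference=({ci_low:.2f}, {ci_high:.2f})")
```

Both models are scored on the same folds, so the per-point differences are paired; an interval that excludes zero suggests one model predicts held-out data better than the other. Real trial data with pedigree or spatial structure would probably need folds that respect that structure (for example, holding out whole families or plots), which this sketch does not attempt.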
