A Multi-Processor Linux Farm for Bioinformatics and Functional Genomics

Lead Research Organisation: University of Manchester
Department Name: Life Sciences

Abstract

There are currently unprecedented amounts of biological data that need to be analysed to advance our understanding of the biological sciences. This is due to the increase in large-scale, high-throughput research projects in genomics and post-genomics, of which the Human Genome Project is the best known. The result is an increase in the volume of data that needs to be stored, analysed and curated. Consequently, greater computing capacity is required for work at the forefront of international research. The types of questions we now wish to address involve wide-scale comparisons, such as comparing genomes from different organisms, and require considerable computing infrastructure. Naturally, the limits of what can be addressed depend on computing support. As a group of researchers, we propose to develop and apply new computational methods to study a whole range of biological problems. We will study how DNA is organised in the cell, and how it gives rise to cellular functions. We aim to understand how the agents of action in a cell (proteins) interact to give rise to complex biological systems, and how these interactions and other functions depend on the three-dimensional shapes of each of the interacting molecular components. We will compare cellular components and systems from different organisms to understand how they evolve. Only through comparison of many sets of data, often from several organisms, can a new understanding of general trends and characteristics of biological organisms be obtained.

Technical Summary

Owing to the increase in both the amount and the complexity of biological data, many bioinformatics studies are now impossible without appropriate computing power. We request funding for a bespoke computing cluster to provide the main infrastructure for the Manchester bioinformatics groups. Using this equipment, we will conduct the following research: Proteomics: develop and integrate database search tools with local and remote databases using GRID middleware, to deliver proteomic tools for protein identification and subsequent data storage and analysis. This will be applied to genome annotation, using EST-derived predicted proteomes; Functional genomics: integrate a diverse range of 'omic data from fungi, which will allow subsequent comparative functional genomics, delivered to the research community via the web and GRID, supporting investigations into the causes of pathogenicity and other molecular differences between species. We will investigate features of non-coding DNA, such as cis-acting regulatory sequences and transposable elements, to predict gene function; Structural bioinformatics: analyse protein-family-specific features of leucine-rich repeat proteins and proteases to produce accurate comparative models that will be used for functional analysis and prediction; Protein-protein interactions: use previous analysis of the properties of protein interfaces to predict protein-protein interactions and networks of interactions. We will use our knowledge of specificity determinants to understand 'rewiring' during the evolution of networks; Systems biology: exploit both genome-wide and sub-system models of metabolic and regulatory networks to simulate and predict system behaviour and gene interactions; Molecular evolution: develop new statistical methodology for phylogenetics and comparative genomics. This methodology will be applied to tree estimation, the study of HIV and the identification of functional changes in coding and non-coding DNA.