Upgrade of Grid resources for Structural Bioinformatics Research in the Bloomsbury Centre for Bioinformatics

Lead Research Organisation: University College London
Department Name: Computer Science

Abstract

With the massive increase in the amount of biological data, there is an increasing need for very powerful computing facilities to allow this data to be analysed. One of the key strengths of the Bloomsbury Centre for Bioinformatics is its world-leading expertise in structural bioinformatics, i.e. the use of computing to analyse and predict the structures of complex biological molecules. The projects range from simulations of how proteins fold to novel image processing that allows the structures of proteins to be determined by electron microscopy. Other projects entail mapping protein structures onto the sequences of all the genes in a genome and clustering all known protein families in order to better annotate their functions. In this project we are seeking to set up three new clusters of computers in the three main departments which form the Centre. Each department focuses on different projects and so has specific hardware requirements for its cluster. However, for the very largest projects (e.g. folding all of the proteins in a genome) we are able to combine the power of all three clusters using special software (called Jyde) developed in an earlier BBSRC project.

Technical Summary

This proposal is a request for funding for a major hardware upgrade of the Grid resources available in the Bloomsbury Centre for Bioinformatics. The present facilities are provided by Linux clusters in the UCL Dept. of Computer Science, the UCL Dept. of Biochemistry and the Birkbeck Dept. of Crystallography. Many of the nodes in the present clusters are outside current maintenance contracts; the UCL CS cluster, for example, was installed in 2001 and is now in a state of poor repair, with nodes failing on a regular basis. A specific issue that has arisen is the recent rapid expansion of the sequence databases. A large fraction of the computing time used on the existing Grid resources relates to a continuous stream of PSI-BLAST jobs for database updates and genome annotation pipelines. As the databases have grown, many of the existing nodes no longer have enough physical memory to run PSI-BLAST jobs, and this greatly reduces the throughput we can achieve. Additionally, we would like to standardise on a 64-bit operating system, and many of the existing nodes do not support the x64/EM64T instruction set. To ensure maximum utilisation of the clusters, we will deploy the Jyde system, which was developed in the BBSRC e-Protein Grid Pilot Project. This is a robust and extensively tested resource management system which has proven effective and easy to install and maintain.

Description: This grant funded equipment rather than research work itself. More specifically, we wished to upgrade the Grid facilities in the Bloomsbury Centre for Bioinformatics, which is a joint research centre between UCL and Birkbeck College.

With the massive increase in the amount of biological data, there is an increasing need for very powerful computing facilities to allow this data to be analysed. One of the key strengths of the Bloomsbury Centre for Bioinformatics is its world-leading expertise in structural bioinformatics, i.e. the use of computing to analyse and predict the structures of complex biological molecules. The projects range from simulations of how proteins fold to novel image processing that allows the structures of proteins to be determined by electron microscopy. Other projects entail mapping protein structures onto the sequences of all the genes in a genome and clustering all known protein families in order to better annotate their functions.

In this project we set up three new clusters of computers in the three main departments which form the Centre. Each department focused on different projects and so had specific hardware requirements for its cluster. However, for the very largest projects (e.g. folding all of the proteins in a genome) we are able to combine the power of all three clusters using special software (called Jyde) developed in an earlier BBSRC project.
Exploitation Route: Setting up these three Grid systems gave staff valuable training and experience in developing more extensive computer systems at UCL and Birkbeck, e.g. the Legion Supercomputer at UCL or the UCL Computer Science Cluster. In addition, technology deployed under this grant has been used as the basis for the servers that make our computational methods available to the general research community.
Sectors: Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology