Using Stream Computing on Mainstream PC Graphics Hardware for Fast de novo Protein Structure Prediction

Lead Research Organisation: University College London
Department Name: Computer Science

Abstract

Most genes are designed to code for specific proteins which have useful functions in the body. Proteins are essentially strings of simpler molecules, called amino acids, and these strings can self-assemble into a complex 3-D structure as soon as the protein is formed by the protein-making machinery (ribosomes) in the cell. It is this unique structure which determines the precise chemical function of the protein (i.e. what it does in the cell and how it does it). By firing X-rays at crystallised proteins, scientists can determine their structure, but this process can take many months or even years. With hundreds of thousands of proteins for which the native structure is unknown, it is not surprising that scientists want to find a clever shortcut to working out the structure of proteins. We, like many other scientists, have been trying to 'crack the code' of protein structure, i.e. working out the rules which govern how the protein finds its unique structure and then trying to program a computer with these rules to allow scientists to quickly 'predict' what the structure of their protein of interest might be. At UCL we have been pioneering a number of approaches to predicting the structure of a protein from amino acid sequence. One of the most successful assembles new protein structures from small pieces of other proteins - a little bit like building a model from Lego(TM) parts. Although we have demonstrated a number of successful attempts at predicting protein structure, the technology is not readily available to bench scientists because a lot of computer power is needed to carry out the calculations. One interesting new development that may allow any scientist to run these protein folding simulations on his own desktop PC is the use of graphics chips to run the simulations many times faster than the PC can on its own.
Normally graphics chips allow users to run 3-D games or visualise 3-D environments at very high speed, but recently it has become apparent that these chips are capable of doing a lot more than just drawing 3-D objects. Our calculations indicate that the latest 3-D graphics boards available in the high street for less than 300 pounds are able to do as much work as 30 normal PCs. If this experiment is successful, any scientist with a cheap PC and graphics card will be able to run our software without needing access to an expensive supercomputer cluster.

Technical Summary

FRAGFOLD is a highly successful method for predicting protein structure from amino acid sequence without the need for a template fold. Despite the success of FRAGFOLD and other similar programs, they are still relatively compute-intensive applications. Although a single simulation on a small protein can take as little as 5 minutes on the fastest available PCs, at least 5000 simulations need to be run to ensure that the most highly populated conformational clusters have been identified. One prediction can therefore take several weeks even on the fastest (and most expensive) computer systems. One obvious solution is to exploit Grid computing, but this is still not viable for the majority of users - particularly non-expert bench scientists who just want to explore possible structural models for a protein of interest. Although development in microprocessors has been rapid, progress has slowed somewhat. One area where there is still exponential growth is in the power of widely available graphics processing hardware, i.e. graphics chips developed by the likes of ATI and NVIDIA. By the end of 2006 it will be possible to buy a board for 400 pounds (using NVIDIA's new G80 chip) which will have 128 parallel processors ('shaders') capable of applying simple kernel functions to streams of single-precision floating point numbers held on 768 MB of onboard GDDR4 memory. Here we propose to reimplement FRAGFOLD on GPU hardware and to evaluate the performance of the code both on single machines and a small Grid. Our estimates based on code profiling suggest that cheap NVIDIA G80 or ATI R600 hardware will run FRAGFOLD up to 20 times faster than a top-end Pentium 4 processor. This will allow us to greatly increase the size of problem we can tackle using FRAGFOLD and to allow users to run the code on cheap PC hardware.
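The timing figures quoted above can be combined into a rough wall-clock estimate. The short Python sketch below is only an illustration of that arithmetic: the function name and its parameters are hypothetical, the 5-minute figure is the best case quoted for a small protein, and the 20-fold speed-up is the projected upper bound rather than a measured result.

```python
def wall_clock_days(n_simulations, minutes_per_simulation, speedup=1.0):
    """Days needed to run all simulations one after another on one device.

    Illustrative helper only; it assumes every simulation takes the same
    time and that the speed-up applies uniformly to each run.
    """
    total_minutes = n_simulations * minutes_per_simulation / speedup
    return total_minutes / (60 * 24)


# Figures taken from the summary: 5000 runs at 5 minutes each.
cpu_days = wall_clock_days(5000, 5)
# Assumed best-case GPU speed-up of 20x, as projected in the summary.
gpu_days = wall_clock_days(5000, 5, speedup=20)

print(f"CPU: {cpu_days:.1f} days, GPU: {gpu_days:.2f} days")
```

Under these assumptions a single prediction drops from roughly two and a half weeks of serial CPU time to under a day, which is the scale of improvement the proposal is aiming for.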

Publications

 
Description Most genes are designed to code for specific proteins which have useful functions in the body. Proteins are essentially strings of simpler molecules, called amino acids, and these strings can self-assemble into a complex 3-D structure as soon as the protein is formed by the protein-making machinery (ribosomes) in the cell. It is this unique structure which determines the precise chemical function of the protein (i.e. what it does in the cell and how it does it). By firing X-rays at crystallised proteins, scientists can determine their structure, but this process can take many months or even years. With hundreds of thousands of proteins for which the native structure is unknown, it is not surprising that scientists want to find a clever shortcut to working out the structure of proteins. We, like many other scientists, have been trying to "crack the code" of protein structure, i.e. working out the rules which govern how the protein finds its unique structure and then trying to program a computer with these rules to allow scientists to quickly "predict" what the structure of their protein of interest might be.

At UCL we have been pioneering a number of approaches to predicting the structure of a protein from amino acid sequence. One of the most successful assembles new protein structures from small pieces of other proteins - a little bit like building a model from Lego(TM) parts. Although we have demonstrated a number of successful attempts at predicting protein structure, the technology is not readily available to bench scientists due to the fact that a lot of computer power is needed to carry out the calculations.

This project exploited an interesting new development that allows any scientist to run these protein folding simulations on his own desktop PC, namely the use of graphics chips to run the simulations many times faster than the PC can on its own. Normally graphics chips allow users to run 3-D games or visualise 3-D environments at very high speed, but recently it has become apparent that these chips are capable of doing a lot more than just drawing 3-D objects. The latest 3-D graphics boards available in the high street for less than 300 pounds are able to do as much work as 30 normal PCs.

Using software developed in this project, any scientist with a cheap PC and graphics card can run folding simulations without needing access to an expensive supercomputer cluster.
Exploitation Route Researchers in academia or industry can download and use our software from our web-based download site. Our work may also serve as an example of how this new technology can help speed up similar scientific calculations. In addition, the fact that so much computation can be done with a single desktop PC may have positive implications for the environment, as high throughput computing applications can be run without needing power-hungry and expensive supercomputing clusters.
Sectors Pharmaceuticals and Medical Biotechnology

URL http://bioinfadmin.cs.ucl.ac.uk/downloads/gpufragfold/gpufragfold_tech_report_draft.pdf
 
Description Production of skilled research staff with appropriate transferable skills is probably the single most important deliverable item of impact from this project. The RA has been able to take advantage of UCL's training schemes and career development courses spanning a broad range of topics and themes, including statistics, mathematical packages, writing and presentation skills, personal and professional development and career management. In this case, Sean Ward, the RA, received a UCL Entrepreneur Fellowship to construct a business plan for a new UCL spinout called Synthace (http://www.synthace.com/). Synthace is the UK's first synthetic biology company, and recently the company was awarded a £500k TSB Synthetic Biology Grant and has also completed a £1.3m seed funding round.
First Year Of Impact 2012
Sector Agriculture, Food and Drink, Healthcare, Manufacturing, including Industrial Biotechnology, Pharmaceuticals and Medical Biotechnology
Impact Types Economic

 
Title GPUFRAGFOLD 
Description (GPU)FRAGFOLD is a software tool for folding proteins based on the assembly of supersecondary structural fragments taken from highly resolved protein structures using a simulated annealing algorithm. 
Type Of Technology Software 
Year Produced 2009 
Open Source License? Yes  
Impact None to date. 
URL http://bioinfadmin.cs.ucl.ac.uk/downloads/gpufragfold
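
The description above names two ingredients: fragment assembly and simulated annealing. The Python sketch below is a toy illustration of how those two ideas fit together, not the GPUFRAGFOLD implementation. Everything in it is an assumption made for the example: the conformation is reduced to one angle-like number per residue, the fragment library `FRAGMENTS` is invented, and `toy_energy` is a placeholder that bears no relation to the real FRAGFOLD energy function.

```python
import math
import random

# Hypothetical fragment library: short runs of angle-like values.
FRAGMENTS = [
    [-60.0, -60.0, -60.0],
    [-120.0, 130.0, -120.0],
    [-60.0, -45.0, -60.0],
]


def toy_energy(conformation):
    """Placeholder energy: penalise large jumps between neighbouring values."""
    return sum((a - b) ** 2 for a, b in zip(conformation, conformation[1:]))


def anneal(length, fragments, steps=2000, t_start=10.0, t_end=0.01, seed=0):
    """Fragment-replacement moves accepted by the Metropolis criterion."""
    rng = random.Random(seed)
    current = [rng.uniform(-180.0, 180.0) for _ in range(length)]
    current_e = toy_energy(current)
    initial_e = current_e
    best, best_e = list(current), current_e
    # Geometric cooling schedule from t_start down to t_end.
    cooling = (t_end / t_start) ** (1.0 / steps)
    temperature = t_start
    for _ in range(steps):
        # Move: overwrite a random window with a randomly chosen fragment.
        fragment = rng.choice(fragments)
        start = rng.randrange(length - len(fragment) + 1)
        candidate = list(current)
        candidate[start:start + len(fragment)] = fragment
        candidate_e = toy_energy(candidate)
        delta = candidate_e - current_e
        # Always accept improvements; accept worse moves with Boltzmann odds.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_e = candidate, candidate_e
            if current_e < best_e:
                best, best_e = list(current), current_e
        temperature *= cooling
    return initial_e, best_e, best


initial_e, best_e, best = anneal(20, FRAGMENTS)
print(f"initial energy {initial_e:.1f}, best energy {best_e:.1f}")
```

Each run of such a loop is independent of the others, which is one reason many thousands of runs with different random seeds lend themselves to parallel hardware.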
 
Company Name Synthace 
Description Synthace produces high value, bio-based chemical and biological products through the application of synthetic biology. They are able to rapidly engineer and optimise novel biological production systems. This is achieved through the integration of computational modelling and big data analysis with wet lab experimental design and innovative molecular biology tools. Their bioengineering platform is broadly applicable across multiple industry sectors, including the production of specialty chemicals from renewable feedstocks using efficient and sustainable processes. 
Year Established 2011 
Impact Awarded £500,000 TSB Synthetic Biology Grant in 2013.
Website http://www.synthace.com