FPGA supercomputing technology for high-throughput identification and quantitation in proteomics

Lead Research Organisation: University of Liverpool
Department Name: Veterinary Preclinical Science

Abstract

Proteomics is the study of the entire protein complement of a cell in a particular state. It is the proteins that 'act out' the information in the genome, and we cannot really understand cellular function without a detailed knowledge of the activity, dynamics and interplay between the 'actors'. However, the science and technology of proteomics do not lend themselves to the same highly multiplexed approaches that can be applied to nucleic acids, and strategies for protein identification and quantification are still highly serial, require complex and sometimes arcane data processing, and are slow. We have almost completed a proof-of-concept BBSRC e-science project that aimed to implement two common methods in proteomics, mass spectrum preprocessing and peptide mass fingerprint database searching, in hardware using reconfigurable computer chips known as field programmable gate arrays (FPGAs). A key feature of this computational platform is that the bioinformatics algorithms, which are normally implemented in software, were translated into optimized digital hardware processors that could process data significantly faster by running multiple analyses in parallel. The successful outcome of this project was a complete implementation that has achieved a phenomenal 2000-fold speed increase. We now wish to build on our previous success, capitalize upon the capabilities we have developed thus far, and deliver similar speed gains to the most commonly used method of proteome analysis, based on tandem mass spectrometry. At the same time, we will address an emergent and pressing need for faster and enhanced quantification to deliver new quantitative approaches and capabilities to proteomics researchers. Such tools are critical if proteomics is to deliver what we expect of it as a science.
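
For readers unfamiliar with peptide mass fingerprint database searching, the idea can be sketched in a few lines of software. This is a minimal illustration only, not the project's FPGA implementation: the function names, the mass tolerance, the two-protein database and all mass values are hypothetical, and the score is a simple count of matched peaks rather than any scoring scheme used by the project.

```python
from typing import Dict, List, Tuple


def count_matches(observed: List[float], theoretical: List[float], tol: float = 0.2) -> int:
    """Count observed peak masses that lie within `tol` of any theoretical peptide mass."""
    return sum(1 for m in observed if any(abs(m - t) <= tol for t in theoretical))


def best_protein(
    observed: List[float], database: Dict[str, List[float]], tol: float = 0.2
) -> Tuple[str, Dict[str, int]]:
    """Score every database entry and return the top-scoring name plus all scores."""
    # Each entry is scored independently of the others, which is why this kind
    # of search can be run in parallel.
    scores = {name: count_matches(observed, masses, tol) for name, masses in database.items()}
    return max(scores, key=scores.get), scores


# Hypothetical peptide masses for two made-up proteins (illustrative values only).
database = {
    "protein_A": [500.3, 842.5, 1045.6, 1300.7],
    "protein_B": [612.4, 842.5, 999.9],
}
observed_peaks = [842.4, 1045.7, 1300.6]

best, scores = best_protein(observed_peaks, database)
print(best, scores)
```

Because each protein entry is scored without reference to any other, the loop over the database is the part that lends itself to running multiple analyses in parallel.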

Technical Summary

This project aims to develop a high-performance FPGA-based bioinformatics solution for high-throughput LC-MS/MS-based protein identification and quantification. This proposal builds on the results of a successful BBSRC project which has resulted in the development of the first complete reconfigurable computing solution for protein identification. The prototype system has achieved a staggering 2000-fold increase in computational speed compared with a standard software solution. The FPGA hardware, which incorporates a raw mass spectra processor and a parallel search engine, delivers a match in less than a quarter of a second when searching the entire MSDB protein database. Developing a similar bioinformatics platform to address the computational challenges in tandem mass spectrometry and quantitative proteomics will involve designing an MS/MS protein identification engine and a separate quantification engine. The hardware platform will consist of a reconfigurable computing motherboard which can hold three additional FPGA modules. The on-board FPGA will be used to perform quantification. An additional FPGA module, with 1Gb SDRAM memory to hold the protein database, will be used to run the search engine. A key feature of the computational platform is the ability to perform computations in hardware which exploit algorithm and instruction parallelism. This leads to significant increases in performance, while retaining much of the flexibility of a software solution. The main challenges relate to redesigning, partitioning and mapping the protein identification algorithms onto the reconfigurable hardware. The proposed solution will dramatically enhance the efficiency of the proteomics-related algorithms. Matching the 2000-fold speed increase achieved with the peptide mass fingerprinting solution would mean that a quantitative analysis that currently takes one hour could be completed in less than two seconds.
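
The closing estimate can be checked with simple arithmetic. The sketch below assumes the speed-up applies uniformly to the whole analysis, which is a simplification; the function name is hypothetical.

```python
def accelerated_seconds(baseline_seconds: float, speedup: float) -> float:
    """Run time after applying a uniform speed-up factor to a baseline run time."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_seconds / speedup


one_hour = 60 * 60  # 3600 seconds
estimate = accelerated_seconds(one_hour, 2000)
print(f"{estimate:.1f} s")  # 3600 / 2000 = 1.8 s, i.e. less than two seconds
```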

Publications

 
Description We tried to use very hardware-specific computational solutions to enhance the speed of analysis of very complex data sets that derive from the analysis of proteins in the cell. However, the challenge of programming these hardware solutions and the rate of development of proteomics data streams suggest that this is not a viable solution for rapidly deployed and flexible proteome data analysis.
Exploitation Route Not likely to find widespread application.
Sectors Electronics

 
Description We demonstrated that an FPGA solution can enhance the speed of analysis of proteomics data. However, the challenge of programming an FPGA and the rate of development of proteomics data streams suggest that this is not a viable solution for rapidly deployed and flexible proteome data analysis.
First Year Of Impact 2010
Sector Agriculture, Food and Drink, Electronics