Novel high-throughput high coverage strategies for quantitative mass spectrometry on complex biological samples

Lead Research Organisation: European Bioinformatics Institute
Department Name: Sequence Database Group

Abstract

Aim

The aim of the project is to design and test novel high-throughput, high-coverage strategies for quantitative mass spectrometry on complex biological samples, thereby bringing proteomics technology closer to the data speed and quality available in genomics through gene expression arrays. Techniques involving new labelling compounds with extreme multiplexing capabilities, as well as highly efficient data collection, will be designed and evaluated both in silico and in a real experimental setting on in vitro models. The ultimate goal is to develop novel high-throughput proteomics analysis techniques that allow deep investigation of complex proteomes, in order to study their expression differences and to investigate the functional role of proteins in normal physiology. The plasma proteome is used as a model for a very complex protein mixture.

Introduction

In 2006, a state-of-the-art proteomics facility at the Philips Research Lab in Eindhoven began investigating complex mammalian proteomes, using an ABI 4800 MALDI TOF/TOF and an ABI 4000 Q Trap. Extensive sample preparation and pre-fractionation techniques are used to reduce the complexity of the protein content before analysis by MS and MS/MS techniques. Chemical labelling approaches allow several samples to be multiplexed into a single experimental procedure, thereby eliminating many systematic effects. A Philips research scientist has been placed part-time with the proteomics team at the European Bioinformatics Institute to prepare for data analysis. This institute is the European hotspot for biological data; its work relevant to this project covers manual and automatic annotation of the human genome and proteome, repositories of proteomics data, protein interactions and text mining.

Several methods have been proposed to improve the throughput and/or coverage of proteomics experiments.
Examples include pure software approaches, such as building inclusion and exclusion lists of precursor masses based on previous experiments in one's own lab as well as on the 'collective experience' of experiments performed and made available worldwide. There are also existing methods, such as isotopic or isobaric labelling techniques, that allow some sample multiplexing. Furthermore, complexity can be reduced by selecting one or at most a few peptides per protein. However, to generate a breakthrough in throughput while maintaining or even increasing protein coverage, several innovative chemistries currently in development, which combine extreme multiplexing techniques with complexity reduction, will be investigated. This will be done in three steps: in silico design and testing, testing on simple protein mixtures, and finally application to plasma samples as a model for a complex proteome.

Workplan

Year 1: Literature survey and internship at the Philips lab in Eindhoven. In silico model of 'MudPIT' experiments on plasma, through modelling of digestion, separation steps, post-translational modifications and, where known, protein abundances. Generation of expected survey spectra, including isotope distributions. In silico design and testing of labelling and complexity reduction methods, followed by testing on simple protein mixtures.

Year 2: Testing one or more methods on plasma samples and evaluating the results in terms of gain in throughput and dynamic range.

Year 3: Application of the analysis results to high-throughput quantitative proteomics on 100 or more samples, using either a MALDI approach or an electrospray instrument (higher sensitivity, but requires a priori knowledge of fragmentation). Combination and statistical analysis of the experimental results.

Year 4: Exploring the data, e.g. for correlations between different proteins due to co-regulation or interaction. Placing the results in the context of the current biological understanding of plasma proteomics. Comparison with data from proteomics repositories, protein interaction databases and text mining results.
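
As an illustration of the 'modelling of digestion' step in the Year 1 workplan, the following is a minimal sketch of an in silico tryptic digest. It is not the project's own method: the function name tryptic_digest, the missed_cleavages parameter and the example sequence are hypothetical, and the cleavage rule (cut after K or R unless the next residue is P) is a common simplification of trypsin specificity.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Return peptides from a simplified in silico tryptic digest.

    Assumption: trypsin cleaves C-terminal to K or R, except when the
    following residue is P. Real enzyme behaviour has further exceptions.
    """
    # Step 1: cut the sequence into fully cleaved fragments.
    fragments = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Trailing fragment that does not end in K or R.
        fragments.append(sequence[start:])

    # Step 2: join neighbouring fragments to model missed cleavages.
    peptides = []
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real plasma protein.
    for peptide in tryptic_digest("AGKRPLLRCE", missed_cleavages=1):
        print(peptide)
```

A fuller model along the lines of the workplan would add separation steps, post-translational modifications and protein abundances on top of such a peptide list.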
