High-throughput Differential Expression Proteomics

Lead Research Organisation: Imperial College London
Department Name: Institute of Biomedical Engineering


In 2001, a major milestone was reached with the publication of the draft sequence of the human genome. It has now become apparent that the human genome contains far fewer protein-coding genes than there are proteins in the human proteome. Whilst the genome is relatively stable, each tissue exhibits radically different protein expression, which also changes dynamically over its life cycle and in response to environmental stimuli. Proteomics is therefore playing a major role in elucidating the functions of many novel genes and their products, and in understanding their involvement in biologically relevant phenotypes, both in normal cellular processes and in disease.

Differential proteomics has become a vital tool in the development of earlier and more accurate screening and diagnostic tests for the detection and treatment of disease. Protein biomarkers are discovered by identifying proteins whose expression changes specifically during the early progression of a disease. These biomarkers can then be targeted in the development of non-invasive diagnostics, or used in drug discovery as indicators of the efficacy of new medications. The high-throughput discovery of protein biomarkers, and the screening of all human proteins to ascertain their functions and interactions, are the two major biology-driven challenges in proteomics today.

These large-scale challenges are too great for the resources of a single laboratory, so open international collaborations are essential; they are being championed by the Human Proteome Organisation (HUPO - http://www.hupo.org/). HUPO is an international consortium that promotes the development and awareness of proteomics research and facilitates scientific collaborations between HUPO members and its initiatives. One such initiative is the Brain Proteome Project (BPP - http://www.hbpp.org/).
The aims of the BPP are:
- To analyse the brain proteome of human and mouse models in healthy, neurodiseased and aged states, with emphasis on Alzheimer's and Parkinson's diseases.
- To advance knowledge of neurodiseases and aging in order to develop new diagnostic approaches and medications.
- To make neuroproteomic research and its results available to the scientific community and society.

The brain is the most complex tissue of higher organisms, so elucidating its protein complement represents one of the most significant challenges for current proteome analysis technologies. The UK is playing a major role in HUPO, notably through the HUPO Proteomics Standards Initiative (PSI - http://psidev.sourceforge.net/) hosted by the European Bioinformatics Institute, Hinxton, Cambridge. However, the UK is under-represented in the BPP and in proteome informatics research as a whole. The two greatest technical barriers to large-scale proteomic analyses are:
- The need for considerable expert manual interaction in differential expression proteomics. With conventional techniques, errors propagate down the pipeline, so considerable expert manual validation is also required, which introduces significant subjectivity.
- Marked variation in proteomic workflow protocols between laboratories, which leads to heterogeneous results and makes integration and cross-validation of results difficult.
To lift these barriers, the proposed fellowship aims to underpin proteomics research with an automated proteome informatics pipeline that:
- Integrates the statistical power of multiple replicated experiments in order to extract all available information, so that the accuracy of differential analysis and expression quantification increases to a level where full automation is possible and subjectivity is removed.
- Builds up a statistical formation model of differential expression proteomics from a history of proteomics experiments, in order to compare and contrast the sensitivity of subtly different proteomic sample preparation, separation and identification protocols for use in subsequent experimental design.
Description There is currently a total disconnect between mass spectrometry (MS) expression quantification and downstream goals such as identification, differential analysis and pathway modelling. Raw MS data are substantially complex, but this complexity is viewed as confounding rather than as a wealth of information to be harnessed. The established approach is reductionist: it converts the raw data into a symbolic representation of peaks at the earliest stage, thus propagating errors and failing to present statistical evidence. In this Fellowship I have, for the first time, brought detailed knowledge-based Bayesian methodology right to the raw MS data acquisition stage of the bioinformatics pipeline. The resulting seaMass framework is the first method to describe the formation of raw mass spectra with a holistic formation model that combines biological knowledge and physical modelling. Because the framework learns the range of possible isotope distributions, it is 15 times more accurate than the ubiquitous averagine model, enabling coincident peptides to be quantified for the first time. With a novel use of the appropriate Poisson noise model, I demonstrated accurate separation of mixtures by their morphological diversity. This is also the cornerstone for solving a single Bayesian model that borrows strength across peak shape/skew, periodic chemical baseline and the isotope distribution range at every charge state. The result is a step-change in performance: peptide quantification despite periodic baseline contamination, and detection of biologically relevant signals barely discernible from noise.
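For readers unfamiliar with the averagine baseline mentioned above, the following minimal sketch shows how a conventional averagine isotope envelope can be computed. It assumes the average residue composition and mass of Senko et al. (1995), scales that composition to the peptide mass, and convolves approximate natural isotope abundances atom by atom. It illustrates only the conventional fixed-envelope model that seaMass is compared against, not the seaMass method itself. The function name `averagine_envelope`, the rounding of atom counts and the omission of the usual hydrogen adjustment to match the exact mass are simplifying assumptions.

```python
import numpy as np

# Averagine: average elemental composition of one amino-acid residue
# (Senko et al., 1995) and the corresponding average residue mass (Da).
AVERAGINE = {"C": 4.9384, "H": 7.7583, "N": 1.3577, "O": 1.4773, "S": 0.0417}
AVERAGINE_MASS = 111.1254

# Approximate natural isotope abundances, indexed by nominal mass offset
# (0, +1, +2, ...). Values are illustrative, not a curated reference table.
ISOTOPES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425, 0.0, 0.0001],
}


def averagine_envelope(mass, n_peaks=8):
    """Relative abundances of the first n_peaks isotope peaks predicted by a
    simplified averagine model for a peptide of the given neutral mass (Da).
    Hypothetical helper for illustration only."""
    n_residues = mass / AVERAGINE_MASS
    dist = np.array([1.0])
    for element, per_residue in AVERAGINE.items():
        # Simplification: round the scaled composition to whole atoms.
        n_atoms = int(round(per_residue * n_residues))
        atom = np.array(ISOTOPES[element])
        for _ in range(n_atoms):
            # Adding one atom convolves in its isotope pattern. Truncating
            # afterwards is exact for the retained low-order peaks, because
            # higher-order terms only contribute to higher offsets.
            dist = np.convolve(dist, atom)[:n_peaks]
    # Pad with zeros if the molecule is too small to populate n_peaks.
    dist = np.pad(dist, (0, max(0, n_peaks - len(dist))))
    # Renormalise over the retained peaks.
    return dist / dist.sum()


if __name__ == "__main__":
    print(np.round(averagine_envelope(1000.0), 3))
    print(np.round(averagine_envelope(5000.0), 3))
```

Comparing the two printed envelopes shows the monoisotopic peak losing dominance as mass grows. Because this model yields a single fixed envelope per mass, it cannot represent peptide-to-peptide variation in elemental composition, which is the kind of variation that learning a range of isotope distributions is intended to capture.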

Furthermore, seaMass shows great potential for broader applications: the capability to directly integrate prior knowledge and thus borrow strength across all facets of MS; direct applicability to metabolomics and other MS modalities; and the ability to handle the additional complexities of translational and clinical application. This potential was presented in a subsequently successful application for an MRC Methodology Programme New Investigator Research Grant (NIRG), MR/L011093/1 (2014-2017).

The Fellowship has also enabled the forging of significant international collaborations. Through a visiting position in Prof. Mike Dunn's facility at University College Dublin and a collaboration with Prof. Frederique Lisacek (Swiss Institute of Bioinformatics), we wrote a wide-ranging review of informatics for proteomics and a book chapter. Moreover, six months as a visiting researcher at the Texas Medical Centre allowed me to work closely with cutting-edge clinical biochemistry practitioners. In particular, the Fellowship kick-started a long-term synergy with MD Anderson biostatistician Prof. Jeffrey Morris, whose signal-based Functional Mixed Modelling (FMM) approach directly complements the seaMass framework. This collaboration went on to bear fruit through BBSRC award BB/K004158/1 (2013-2014).
Exploitation Route seaMass significantly improves a fundamental step in the interpretation of mass spectrometry data, a technique used pervasively in industry as well as academia. Through collaborations in proteomics, metabolomics and translational medicine at the Centre of Advanced Discovery and Experimental Therapeutics (CADET), University of Manchester, we are applying seaMass to advanced proteomic and metabolomic workflows. This will provide an exemplar application and comprehensive validation for subsequent dissemination to the wider omics community.
Sectors Agriculture, Food and Drink, Environment, Healthcare, Pharmaceuticals and Medical Biotechnology

URL http://www.seamass.net/
Description This Fellowship provided the basic research underpinning my subsequent promotion to Lecturer and my current BBSRC and MRC research programme (BB/K004158/1, BB/K016733/1, BB/L018616/1, BB/L018462/1, MR/L011093/1). Work towards economic and societal impact is ongoing.
Description Investing in Success
Amount £3,500 (GBP)
Organisation University of Manchester 
Sector Academic/University
Country United Kingdom
Start 05/2012 
End 06/2012
Description University of Liverpool EPSRC Impact Accelerator
Amount £21,844 (GBP)
Organisation University of Liverpool 
Sector Academic/University
Country United Kingdom
Start 04/2016 
End 06/2016
Title The Peptide Simplex 
Description A new type of feature detection in mass spectra which is able to detect and quantify overlapping features as well as those barely discernible above the noise floor. 
Type Of Material Computer model/algorithm 
Year Produced 2010 
Provided To Others? Yes  
Impact Provided the proof-of-concept basis for grant awards BB/K004158/1, BB/L018616/1 and MR/L011093/1. 
URL http://www.cadetbioinformatics.org/research/ms/peptide-simplex/
Description Prof Jeffrey Morris 
Organisation University of Texas
Department M. D. Anderson Cancer Center
Country United States 
Sector Academic/University 
PI Contribution Translation of Prof Morris' Wavelet Functional Mixed Model methodology to the proteomics LC-MS (Liquid Chromatography - Mass Spectrometry) field.
Collaborator Contribution Access to Prof Morris' expertise and unpublished methodology in order to create our novel differential analysis workflow for raw LC-MS data.
Impact Two publications [Liao et al., IEEE ISBI 2014; Dowsey et al., Proteomics 2010, 4226-57] plus a successful submission to the September 2014 BBSRC Bilateral NSF/BIO-BBSRC responsive mode call [BB/M024954/1].
Start Year 2009
Title seaMass 
Description The seaMass software is our open source dissemination route for the LC-MS (Liquid Chromatography - Mass Spectrometry) analysis algorithms developed by our group, including signal restoration and visualisation. 
Type Of Technology Software 
Year Produced 2014 
Open Source License? Yes  
Impact The software has only recently been released, but there is strong interest in incorporating it into the ProteoSuite consortium's BBSRC BBR-funded user-centric proteomics software (http://www.proteosuite.org/?q=aboutus).
URL http://www.biospi.org/research/ms/seamass/