High-throughput Differential Expression Proteomics

Lead Research Organisation: Imperial College London

Department Name: Institute of Biomedical Engineering

Abstract

In 2001, a major milestone was reached with the publication of the draft sequence of the human genome. It has now become apparent that there are far fewer protein-coding genes in the human genome than proteins in the human proteome. Whilst the genome is relatively stable, each tissue exhibits radically different protein expression that also changes dynamically over its life cycle and in response to environmental stimuli. Proteomics is therefore playing a major role in elucidating the functional role of many novel genes and their products, as well as their involvement in biologically relevant phenotypes in both normal cellular processes and disease. Differential proteomics has become a vital tool in the development of earlier and more accurate screening and diagnostic tests for the detection and treatment of disease. Protein biomarkers are discovered by identifying proteins whose expression changes specifically during the early progression of a disease. These biomarkers can then be targeted in the development of non-invasive diagnosis, or used as indicators of the efficacy of new medications in drug discovery. The high-throughput discovery of protein biomarkers and the screening of all human proteins to ascertain their functions and interactions are the two major biology-driven challenges in proteomics today. These large-scale challenges are too great for the resources of a single laboratory, so open international collaborations are essential and are being championed by the Human Proteome Organisation (HUPO - http://www.hupo.org/). HUPO is an international consortium that promotes the development and awareness of proteomics research and facilitates scientific collaborations between HUPO members and its initiatives. One such initiative is the Brain Proteome Project (BPP / http://www.hbpp.org/).
The aims of the BPP are:
- To analyse the brain proteome of human and mouse models in healthy, neurodiseased and aged states, with emphasis on Alzheimer's and Parkinson's diseases.
- To advance knowledge of neurodiseases and aging in order to develop new diagnostic approaches and medications.
- To make neuroproteomic research and its results available to the scientific community and to society.
The brain is the most complex tissue of higher organisms, so elucidating its protein complement pushes current proteome analysis technologies to their limits. The UK is playing a major role in HUPO, notably through the HUPO Proteomics Standards Initiative (PSI - http://psidev.sourceforge.net/) hosted by the European Bioinformatics Institute, Hinxton, Cambridge. However, the UK is under-represented in the BPP and in proteome informatics research as a whole. The two greatest technical barriers to large-scale proteomic analyses are:
- The need for considerable expert manual interaction in differential expression proteomics. With conventional techniques, errors propagate down the pipeline, so considerable expert manual validation is also required, which adds significant subjectivity.
- Marked variation in proteomic workflow protocols between laboratories, which leads to heterogeneous results and makes integrating and cross-validating results difficult.
To lift these barriers, the proposed fellowship aims to underpin proteomics research with an automated proteome informatics pipeline that:
- Integrates the statistical power of multiple replicated experiments to extract all available information, so that the accuracy of differential analysis and expression quantification rises to a level where full automation is possible and subjectivity is removed.
- Builds up a statistical formation model of differential expression proteomics from a history of proteomics experiments, to compare and contrast the sensitivity of subtly different proteomic sample preparation, separation and identification protocols for use in subsequent experiment design.
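The first aim relies on a basic statistical fact: pooling replicate measurements shrinks the standard error of a between-condition difference, roughly as 1/sqrt(n). As a minimal sketch of that idea, assuming each protein spot has log-transformed intensities from replicate gels or runs in two conditions, the example below computes Welch's t-statistic and its Welch-Satterthwaite degrees of freedom. This is a standard textbook test chosen for illustration, not the fellowship's pipeline, and `welch_t` is a hypothetical helper name.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t-statistic and Welch-Satterthwaite degrees of freedom for
    one protein's log-intensities measured on replicates of two conditions.
    (Hypothetical helper for illustration only.)"""
    na, nb = len(group_a), len(group_b)
    if na < 2 or nb < 2:
        raise ValueError("need at least two replicates per condition")
    # Sample variances (n - 1 denominator) of each condition.
    va, vb = variance(group_a), variance(group_b)
    # Squared standard error of the difference in means; it falls as replicates are added.
    se2 = va / na + vb / nb
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


if __name__ == "__main__":
    control = [10.0, 10.2, 9.8]   # hypothetical log2 spot volumes
    disease = [11.0, 11.2, 10.8]
    print(welch_t(control, disease))
```

With more replicates at the same spread, `se2` shrinks and the same effect size yields a larger |t|, which is the sense in which replication adds statistical power.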

Funded Value:

£249,919

Funded Period:

Jan 08 - Dec 10

Funder:

EPSRC

Project Status:

Closed

Project Category:

Fellowship

Project Reference:

EP/E03988X/1

Principal Investigator:

Andrew Dowsey

Research Subject:

Info. & commun. Technol. (25%)

Mathematical sciences (25%)

Medical & health interface (10%)

Omic sciences & technologies (20%)

Tools, technologies & methods (20%)

Research Topic:

Artificial Intelligence (25%)

Bioinformatics (20%)

Biomedical neuroscience (10%)

Genomics (20%)

Statistics & Appl. Probability (25%)

Organisations

People	ORCID iD
Andrew Dowsey (Principal Investigator)	http://orcid.org/0000-0002-7404-9128

Publications


Liao H (2014) A new paradigm for clinical biomarker discovery and screening with Mass Spectrometry through biomedical image analysis principles

Dowsey AW (2008) Automated image alignment for 2D gel electrophoresis in a high-throughput proteomics pipeline. in Bioinformatics (Oxford, England)

Chen SS (2011) Cardiovascular magnetic resonance tagging of the right ventricular free wall for the assessment of long axis myocardial function in congenital heart disease. in Journal of cardiovascular magnetic resonance : official journal of the Society for Cardiovascular Magnetic Resonance

Hoogland C (2010) Guidelines for reporting the use of gel image informatics in proteomics. in Nature biotechnology

Dowsey AW (2010) Image analysis tools and emerging algorithms for expression proteomics. in Proteomics

Dowsey AW (2010) Informatics and statistics for analyzing 2-d gel electrophoresis images. in Methods in molecular biology (Clifton, N.J.)

Zhang Y (2015) Streaming visualisation of quantitative mass spectrometry data based on a novel raw signal decomposition method. in Proteomics

Dowsey A (2008) The Future of Large-Scale Collaborative Proteomics. in Proceedings of the IEEE

Key Findings
Impact Summary
Further Funding
Research Databases and Models
Collaboration
Software and Technical Products


Description	There is currently a total disconnect between mass spectrometry (MS) expression quantification and downstream goals such as identification, differential analysis and pathway modelling. Raw MS data are highly complex, but this complexity is viewed as confounding rather than as a wealth of information to be harnessed. The established approach is reductionist: it converts the raw data into a symbolic representation of peaks at the earliest stage, thus propagating errors and failing to present statistical evidence. In this Fellowship I have brought detailed knowledge-based Bayesian methodology to the raw MS data acquisition stage of the bioinformatics pipeline for the first time. The resulting seaMass framework is the first method to harness a holistic formation model, combining biological knowledge with physical modelling, to describe the formation of raw mass spectra. Because the framework learns the range of possible isotope distributions, it is 15 times more accurate than the ubiquitous averagine model, enabling coincident peptides to be quantified for the first time. With novel use of the appropriate Poisson noise model, I demonstrated accurate separation of mixtures by their morphological diversity. This is also the cornerstone for solving a single Bayesian model that borrows strength across peak shape/skew, periodic chemical baseline and the isotope distribution range at every charge state, leading to a step-change in performance: peptide quantification despite periodic baseline contamination, and detection of biologically relevant signals barely discernible from noise. Furthermore, seaMass shows great potential for broader applications: the capability to directly integrate prior knowledge and thus borrow strength across all facets of MS; direct applicability to metabolomics and other MS modalities; and the ability to handle the additional complexities of translational and clinical application.
This promise was presented in a subsequently successful application for an MRC Methodology Programme New Investigator Research Grant (NIRG), MR/L011093/1 (2014-2017). The Fellowship has also enabled the forging of significant international collaborations. Through a visiting position in Prof. Mike Dunn's facility at University College Dublin and a collaboration with Prof. Frederique Lisacek (Swiss Institute of Bioinformatics), we composed a wide-ranging review of informatics for proteomics and a book chapter. Moreover, six months as a visiting researcher at the Texas Medical Center provided a close working environment with cutting-edge clinical biochemistry practitioners. In particular, the Fellowship kick-started a long-term synergy with MD Anderson biostatistician Prof. Jeffrey Morris, whose signal-based Functional Mixed Modelling (FMM) approach directly complements the seaMass framework. This collaboration went on to bear fruit through BBSRC award BB/K004158/1 (2013-2014).
Exploitation Route	seaMass significantly improves a fundamental step in the interpretation of mass spectrometry data, which is used pervasively in industry as well as academia. With collaborations in proteomics, metabolomics and translational medicine at the Centre of Advanced Discovery and Experimental Therapeutics (CADET), University of Manchester, we are applying seaMass to advanced proteomic and metabolomic workflows. This will provide an exemplar and comprehensive validation for subsequent dissemination to the omics community at large.
Sectors	Agriculture, Food and Drink, Environment, Healthcare, Pharmaceuticals and Medical Biotechnology
URL	http://www.seamass.net/
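The key findings above benchmark seaMass against the averagine model, which predicts a peptide's isotope envelope from its mass alone by assuming an "average" amino acid composition. For orientation, a minimal sketch of that averagine baseline (not of seaMass) follows. It assumes the Senko et al. (1995) averagine residue composition and standard natural isotope abundances, rounds atom counts instead of adjusting hydrogen to match the mass exactly, and ignores mass defects by indexing peaks by extra-neutron count. `averagine_envelope` is a hypothetical helper name.

```python
import numpy as np

# Averagine composition per average residue of 111.1254 Da (Senko et al., 1995).
AVERAGINE = {"C": 4.9384, "H": 7.7583, "N": 1.3577, "O": 1.4773, "S": 0.0417}
AVERAGINE_MASS = 111.1254

# Natural isotope abundances, indexed by number of extra neutrons (+0, +1, +2, ...).
ISOTOPES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425, 0.0, 0.0001],
}


def _truncated_power(dist, n, k):
    """Convolve `dist` with itself n times (binary exponentiation), keeping the
    first k terms. Truncation is exact for those terms because shifts are >= 0."""
    result = np.zeros(k)
    result[0] = 1.0
    base = np.asarray(dist, dtype=float)[:k]
    while n > 0:
        if n & 1:
            result = np.convolve(result, base)[:k]
        base = np.convolve(base, base)[:k]
        n >>= 1
    return result


def averagine_envelope(mass, k=6):
    """Relative abundances of the first k isotope peaks for a peptide of the
    given mass (Da) under the averagine assumption (hypothetical helper)."""
    n_residues = mass / AVERAGINE_MASS
    env = np.zeros(k)
    env[0] = 1.0
    for element, per_residue in AVERAGINE.items():
        count = int(round(per_residue * n_residues))
        env = np.convolve(env, _truncated_power(ISOTOPES[element], count, k))[:k]
    # Renormalise over the k peaks kept.
    return env / env.sum()


if __name__ == "__main__":
    print(np.round(averagine_envelope(1500.0), 3))
```

Every peptide of a given mass gets the same envelope here; the Key Findings report that learning the range of envelopes, rather than fixing one, is what gives seaMass its accuracy gain.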


Description	This Fellowship provided the basic research underpinning my subsequent promotion to Lecturer and my current BBSRC and MRC research programme (BB/K004158/1, BB/K016733/1, BB/L018616/1, BB/L018462/1, MR/L011093/1). Work towards economic and societal impact is ongoing.


Description	Investing in Success
Amount	£3,500 (GBP)
Organisation	University of Manchester
Sector	Academic/University
Country	United Kingdom
Start	05/2012
End	06/2012


Description	University of Liverpool EPSRC Impact Accelerator
Amount	£21,844 (GBP)
Organisation	University of Liverpool
Sector	Academic/University
Country	United Kingdom
Start	04/2016
End	06/2016


Title	The Peptide Simplex
Description	A new type of feature detection in mass spectra which is able to detect and quantify overlapping features as well as those barely discernible above the noise floor.
Type Of Material	Computer model/algorithm
Year Produced	2010
Provided To Others?	Yes
Impact	Provided the proof-of-concept basis for grant awards BB/K004158/1, BB/L018616/1 and MR/L011093/1.
URL	http://www.cadetbioinformatics.org/research/ms/peptide-simplex/
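The Peptide Simplex is described here only at a high level. As a generic illustration of why an explicit Poisson noise model helps quantify overlapping features (this is not the Peptide Simplex algorithm itself), a minimal sketch: treat each candidate peptide's isotope envelope as one column of a design matrix A, model the observed ion counts as y ~ Poisson(A x), and estimate the non-negative abundances x by maximum likelihood with the classic EM (Richardson-Lucy) multiplicative update. The templates, offsets and the name `poisson_nnls` are hypothetical.

```python
import numpy as np


def poisson_nnls(A, y, n_iter=500, eps=1e-12):
    """Maximum-likelihood non-negative amplitudes x for y ~ Poisson(A @ x),
    using the EM (Richardson-Lucy) multiplicative update.
    Assumes every column of A has a positive sum (hypothetical helper)."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    x = np.ones(A.shape[1])          # strictly positive start keeps x >= 0
    col_sums = A.sum(axis=0)
    for _ in range(n_iter):
        mu = A @ x                   # expected counts under current estimate
        x *= (A.T @ (y / np.maximum(mu, eps))) / col_sums
    return x


if __name__ == "__main__":
    # Two hypothetical isotope envelopes overlapping by two m/z bins.
    template = np.array([0.5, 0.3, 0.15, 0.05])
    A = np.zeros((6, 2))
    A[0:4, 0] = template
    A[2:6, 1] = template
    rng = np.random.default_rng(0)
    y = rng.poisson(A @ np.array([100.0, 40.0]))   # simulated noisy counts
    print(poisson_nnls(A, y))
```

Because both envelopes share bins, a peak-picking approach would merge them; fitting both templates jointly under the Poisson likelihood apportions the shared counts between the two peptides.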


Description	Prof Jeffrey Morris
Organisation	University of Texas
Department	M. D. Anderson Cancer Center
Country	United States
Sector	Academic/University
PI Contribution	Translation of Prof Morris' Wavelet Functional Mixed Model methodology to the proteomics LC-MS (Liquid Chromatography - Mass Spectrometry) field.
Collaborator Contribution	Access to Prof Morris' expertise and unpublished methodology in order to create our novel differential analysis workflow for raw LC-MS data.
Impact	Two publications [Liao et al., IEEE ISBI 2014; Dowsey et al., Proteomics, 2010, 4226-57] plus a successful submission to the September 2014 BBSRC Bilateral NSF/BIO-BBSRC responsive mode call [BB/M024954/1].
Start Year	2009
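Prof Morris' methodology is a wavelet-based functional mixed model: each spectrum is projected into wavelet space, a model is fitted to every wavelet coefficient, and the estimates are transformed back to the original domain. A heavily simplified sketch of that transform, model, invert pattern follows. It assumes an orthonormal Haar wavelet, spectra resampled to a power-of-two length, and a plain group-mean difference with optional hard thresholding standing in for the mixed model and Bayesian shrinkage used in the actual methodology. All function names are hypothetical.

```python
import numpy as np


def haar_dwt(x):
    """Full orthonormal Haar decomposition of a length-2^J signal. Returns
    [approximation, coarsest detail, ..., finest details]."""
    x = np.asarray(x, dtype=float)
    if x.size & (x.size - 1):
        raise ValueError("length must be a power of two")
    details, approx = [], x
    while approx.size > 1:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2))   # detail at this scale
        approx = (even + odd) / np.sqrt(2)          # smoothed signal
    return np.concatenate([approx] + details[::-1])


def haar_idwt(c):
    """Inverse of haar_dwt."""
    c = np.asarray(c, dtype=float)
    approx, pos = c[:1], 1
    while pos < c.size:
        detail = c[pos:pos + approx.size]
        pos += approx.size
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + detail) / np.sqrt(2)
        out[1::2] = (approx - detail) / np.sqrt(2)
        approx = out
    return approx


def wavelet_group_difference(group_a, group_b, threshold=0.0):
    """Difference between two groups of spectra, estimated per wavelet
    coefficient and mapped back to the m/z domain. With threshold=0 this
    equals the plain mean difference; larger thresholds zero out small
    detail coefficients as a crude stand-in for shrinkage."""
    wa = np.array([haar_dwt(s) for s in group_a])
    wb = np.array([haar_dwt(s) for s in group_b])
    d = wa.mean(axis=0) - wb.mean(axis=0)
    small = np.abs(d) < threshold
    small[0] = False                 # never threshold the overall approximation
    d[small] = 0.0
    return haar_idwt(d)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    spectra_a = rng.normal(size=(3, 16))   # hypothetical resampled spectra
    spectra_b = rng.normal(size=(4, 16))
    print(wavelet_group_difference(spectra_a, spectra_b, threshold=0.5))
```

Working coefficient by coefficient is what lets the real method share information across nearby m/z positions and scales while still fitting each coefficient separately.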


Title	seaMass
Description	The seaMass software is our open source dissemination route for the LC-MS (Liquid Chromatography - Mass Spectrometry) analysis algorithms developed by our group, including signal restoration and visualisation.
Type Of Technology	Software
Year Produced	2014
Open Source License?	Yes
Impact	The software has only recently been released, but there is strong interest in incorporating it into the ProteoSuite consortium's BBSRC BBR-funded user-centric proteomics software (http://www.proteosuite.org/?q=aboutus).
URL	http://www.biospi.org/research/ms/seamass/
