Statistical Methodology for the Design and Analysis of Protein Mass Spectrometry Studies

Lead Research Organisation: University of Birmingham
Department Name: Cancer Sciences

Abstract

Current key goals in cancer research are to identify molecular information that can be used to (a) diagnose and stage disease, (b) monitor response to treatment, (c) predict which patients will respond to therapy and (d) predict patient outcome. This could enable earlier detection and improved staging of cancer. In addition, it would allow treatments to be targeted at individuals who have some chance of benefiting, and continued in those showing a worthwhile level of improvement. The proteins circulating in the blood are a possible source of such information. Using a very small amount of blood, current technology can generate a "proteomic signature" for a patient, which gives information on the concentrations of thousands of different proteins present in the sample. The aim is then to compare samples from different types of patients, for example cancer versus non-cancer or responder versus non-responder, to determine whether any of the proteins differ in concentration. If such proteins are identified, they could be used to aid the diagnosis of cancer or to inform appropriate treatment for future patients.

Statistical analysis of data is an important part of any research process, enabling conclusions about scientific hypotheses to be drawn from the data together with a measure of uncertainty. In the field of proteomics, valid and efficient statistical design and analysis are crucial to ensure that this research provides robust and convincing conclusions. This is a new and rapidly developing scientific field that could have a direct impact on patient care, and there is a great need for research into the statistical methodology associated with such studies. This is a hugely challenging area for statisticians, and the methods that are already available need to be assessed for their specific application to proteomic data. The findings of this research will be directly relevant only to specialists in this field, but ultimately they will increase confidence that differences detected in proteomic signatures between groups of patients are true biological differences, enabling the technology to benefit patient care in the future.

Technical Summary

Key current goals in cancer research are to identify molecular information that can be used to (a) diagnose and stage disease, (b) monitor response to treatment, (c) predict which patients will respond to therapy and (d) predict patient outcome. This could enable earlier detection and improved staging of cancer. In addition, it would allow treatments to be targeted at individuals who are likely to benefit, and continued in those showing some worthwhile level of improvement. The proteins present in tissue or circulating in serum or urine are a possible source of such information. Surface-enhanced laser desorption and ionisation time-of-flight (SELDI-TOF) mass spectrometry is used to produce a proteomic "signature" for each patient from their biological sample. It is hypothesised that features within these signatures may be used to distinguish between different classes of patients. The identification of such proteins may also permit unique insights into the biology of cancers and be a source of novel therapeutic targets. Technologies such as SELDI that are applicable to serum or plasma are particularly appropriate because they are minimally invasive, permit time-dependent studies and pose fewer acquisition problems than tissue-based approaches.

The potential of SELDI technology has been demonstrated, but there is some scepticism surrounding the validity of the results. It is crucial that the generation and analysis of proteomic data are sound enough for us to be confident that any differences detected between clinical groups in their proteomic spectra are true biological differences and not merely artefacts of the technology. The aim of this research is to investigate the methodological aspects of proteomic analysis within a statistical framework, to give insight into the validity of results emanating from this rapidly expanding area of scientific research.

The research will investigate methodology for the statistical design of protein mass spectrometry experiments, addressing issues such as the different sources of experimental variation and the number of replicates. Methods used to process proteomic spectra prior to data analysis, specifically baseline subtraction, normalisation, peak alignment and peak picking, will all be reviewed and developed. Statistical methodology for class comparison and class prediction, as applied to proteomic data, will be reviewed and extended to incorporate other clinical and genomic data alongside the proteomic data. Methodology will be assessed and developed using data from various lung, liver, colorectal and bladder studies.
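The four preprocessing steps named above can be illustrated with a deliberately simple sketch. It assumes each spectrum is a list of intensities on a shared m/z grid. The moving-minimum baseline, total ion current (TIC) normalisation, MAD-based signal-to-noise peak picking and greedy tolerance-based alignment are common textbook choices swapped in for illustration, not the methods developed in this project; every function name and default (`window`, `snr`, `tol`) is hypothetical. Here peaks are picked first and then aligned across spectra, which is only one of several possible orderings.

```python
from statistics import median


def subtract_baseline(intensity, window=5):
    """Subtract a moving-minimum baseline (hypothetical simple choice) and clip at zero."""
    n, half = len(intensity), window // 2
    corrected = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        corrected.append(max(0.0, intensity[i] - min(intensity[lo:hi])))
    return corrected


def tic_normalise(spectra):
    """Rescale each spectrum so its total ion current (TIC) equals the mean TIC."""
    tics = [sum(s) for s in spectra]
    target = sum(tics) / len(tics)
    return [[v * target / t for v in s] if t > 0 else list(s)
            for s, t in zip(spectra, tics)]


def pick_peaks(mz, intensity, snr=3.0):
    """Return m/z of local maxima exceeding median + snr * (MAD-based noise SD)."""
    med = median(intensity)
    mad = median([abs(v - med) for v in intensity]) or 1e-12  # avoid a zero noise estimate
    threshold = med + snr * 1.4826 * mad  # 1.4826 scales the MAD to a Gaussian SD
    return [mz[i] for i in range(1, len(intensity) - 1)
            if intensity[i] > intensity[i - 1]
            and intensity[i] >= intensity[i + 1]
            and intensity[i] > threshold]


def align_peaks(peak_lists, tol=0.005):
    """Greedily cluster pooled peak m/z values within a relative tolerance; return cluster centres."""
    clusters = []
    for m in sorted(x for peaks in peak_lists for x in peaks):
        # Join the current cluster if close to its running mean, else start a new one.
        if clusters and abs(m - sum(clusters[-1]) / len(clusters[-1])) <= tol * m:
            clusters[-1].append(m)
        else:
            clusters.append([m])
    return [sum(c) / len(c) for c in clusters]


def feature_matrix(mz, spectra, centres, tol=0.005):
    """Rows are spectra, columns are aligned peaks; entry is the max intensity within tol of the centre."""
    rows = []
    for s in spectra:
        row = []
        for c in centres:
            window = [v for x, v in zip(mz, s) if abs(x - c) <= tol * c]
            row.append(max(window) if window else 0.0)
        rows.append(row)
    return rows


def preprocess(mz, raw_spectra, window=5, snr=3.0, tol=0.005):
    """Baseline subtraction, then TIC normalisation, then peak picking, then peak alignment."""
    corrected = [subtract_baseline(s, window) for s in raw_spectra]
    normalised = tic_normalise(corrected)
    peaks = [pick_peaks(mz, s, snr) for s in normalised]
    centres = align_peaks(peaks, tol)
    return centres, feature_matrix(mz, normalised, centres, tol)
```

For example, two spectra on a grid from m/z 2000 in steps of 10, each with a constant baseline of 5 and a single peak at m/z 2200 (height 100) or m/z 2210 (height 50), yield one aligned feature centred at m/z 2205 with a normalised intensity of 75 in both. The two-fold difference vanishes because each peak makes up the whole TIC, which illustrates why the choice of normalisation needs careful assessment.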

Publications

 
Title clippda: A package for clinical proteomic profiling data analysis 
Description This is an R package for the analysis of data from clinical proteomic profiling studies. The focus is on studies of human subjects, which are often observational and case-control in design, and which have technical replicates. A method for sample size determination for planning these studies is proposed. It incorporates routines for adjusting for the expected heterogeneities and imbalances in the data and for the within-sample replicate correlations.
Type Of Material Improvements to research infrastructure 
Year Produced 2009 
Provided To Others? Yes  
Impact The package is still under development but has been peer-reviewed and accepted by Bioconductor and made available at: http://www.bioconductor.org/packages/devel/bioc/html/clippda.html 
URL http://www.bioconductor.org/packages/devel/bioc/html/clippda.html
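To show why within-sample replicate correlation and group imbalance matter for sample size, here is a minimal sketch of the textbook normal-approximation formula for comparing two group means, with a design effect for averaging correlated technical replicates. It is not the clippda algorithm, whose adjustments for heterogeneity and imbalance are more elaborate. It assumes log-intensities are roughly normal, `sigma` is the SD of a single replicate measurement, `rho` is the intra-class correlation between replicates of the same sample, and `ratio` is the number of controls per case; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def replicate_design_effect(m, rho):
    """Var(subject mean of m replicates) / sigma^2 under equal pairwise replicate correlation rho."""
    if m < 1 or not 0.0 <= rho <= 1.0:
        raise ValueError("require m >= 1 and 0 <= rho <= 1")
    return (1 + (m - 1) * rho) / m


def sample_size_two_groups(delta, sigma, m=1, rho=0.0, alpha=0.05, power=0.8, ratio=1.0):
    """(cases, controls) needed for a two-sided two-sample comparison of subject-level means.

    delta : smallest difference in mean log-intensity worth detecting
    sigma : SD of a single replicate measurement (between- plus within-sample)
    m     : technical replicates per sample, averaged before analysis
    rho   : intra-class correlation between replicates of the same sample
    ratio : controls per case (allows an unbalanced design)
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    # Variance of one subject's replicate-averaged value.
    var_subject = sigma ** 2 * replicate_design_effect(m, rho)
    n_cases = math.ceil((1 + 1 / ratio) * var_subject * (z_alpha + z_beta) ** 2 / delta ** 2)
    n_controls = math.ceil(ratio * n_cases)
    return n_cases, n_controls
```

The design effect makes the trade-off explicit: when `rho` is close to 1, extra technical replicates buy almost nothing because they repeat the same sample-level information, whereas with `rho` near 0 each replicate reduces the variance almost as much as an additional subject would.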
 
Title clippda: a package for clinical proteomic profiling data analysis 
Description A software package for R (see previous section for details) 
IP Reference  
Protection Protection not required
Year Protection Granted
Licensed No
Impact The freely available software package provides a tool for researchers designing proteomic studies