MICA: Delivering a production platform and atlas for next-generation biomarker discovery, validation and assay development in clinical proteomics
Lead Research Organisation:
University of Bristol
Department Name: Clinical Veterinary Science
Abstract
The genomic revolution has advanced medical science to an important tipping point. We can now attempt to understand the complex interactions between the molecular building blocks of life that control human function and how they are perturbed and break down under disease. These perturbations and dysfunctions can lead to tell-tale biomolecular signals in our cells and tissues, often linked to changes in the underlying genetic code and other physiological characteristics. Since each individual case can be different, a single, common treatment option may not be effective or safe for all patients. This has led to the concept of stratified medicine, where different treatments are associated with the different molecular signatures of individual patients. Critical to the success of such an approach is a diagnostic programme that can reliably characterise these molecules, ideally the proteins, for early disease detection and subsequent stratification based on drug safety or efficacy. The push to systematically discover these so-called 'biomarkers' has been enhanced through the establishment of a number of large-scale facilities worldwide, including the MRC-funded Stoller Biomarker Discovery Centre (SBDC) in Manchester. The SBDC is a £25M facility that combines the latest instrumentation and techniques for high-throughput profiling of proteins, validation of candidate biomarker sets on thousands of samples, through to the development of clinical tests ('assays') for the routine measurement of individual biomarkers in the clinic.
To do this, the SBDC uses mass spectrometry (MS), a pervasive technique for gaining a snapshot of a sample that measures each constituent compound's mass and quantity, e.g. for profiling proteins ('proteomics'). The SBDC and other recently launched centres employ a new strategy for MS called Data-Independent Acquisition (DIA). Unlike previous approaches, DIA produces a comprehensive digital record of the sample, potentially enabling the identification and quantification of all detectable proteins. Nevertheless, because of biological variation, it is necessary to analyse multiple samples to get a reliable understanding of patient populations. The DIA-based SWATH-MS approach from SCIEX Ltd. has generated considerable clinical interest as it enables reliable and reproducible monitoring of potential biomarkers over thousands of samples. SWATH-MS, like all clinical MS approaches, must digest proteins to smaller peptides for analysis. However, this leads to challenges for both SWATH-MS analysis and the development of clinical assays with MS when selecting the reproducible peptide(s) on which to base the test.
We have recently developed a new statistical (Bayesian) modelling approach to assess peptide reproducibility, and a fundamentally novel workflow for biomarker discovery that, for the first time, performs statistical modelling on the unprocessed data, delivering a significant performance increase. The purpose of this project is to exploit the sensitivity of our workflow to deliver a robust, production-quality biomarker discovery and validation software platform for routine use by the SBDC and beyond. Moreover, since the SBDC will analyse up to 12,000 samples per annum and is underpinned by rigorous standard operating procedures controlling sample collection, preparation and analysis, it also provides a unique opportunity to collate and understand the biological and experimental variation in protein levels across vast patient populations, in health and disease. We will build an 'atlas' of this variation, stratified across genetic, physiological and other clinical data. To achieve this, we will combine 'big data' computing approaches and web infrastructure. The atlas will enable biomarker verification and peptide characterisation for assay development much earlier in the pipeline than is currently possible, and realise further step-change improvements in the sensitivity and specificity of our discovery platform.
Technical Summary
We have recently demonstrated a fundamentally novel workflow for mass spectrometry (MS) discovery proteomics which, for the first time, performs statistical modelling on the unprocessed full MS data for a significant increase in differential expression detection sensitivity. However, in conventional MS with Data-Dependent Acquisition, only the most intense signals are targeted for fragmentation and are therefore available for identification. Consequently, potential biomarkers may not be identified. Recently, Data-Independent Acquisition (DIA) approaches have emerged where the whole mass range is systematically fragmented to produce a comprehensive digital record of the proteome. The SWATH approach from SCIEX has generated considerable clinical research interest, enabling multiplexed targeted analysis on large sets of hypothetical biomarkers, plus potential reanalysis of existing datasets with new candidate biomarkers as they arise. For this reason, the new MRC-funded Stoller Biomarker Discovery Centre (SBDC) at Manchester is centred upon a fleet of SWATH instruments. The purpose of this project is to exploit the quantitative sensitivity of our statistical workflow to deliver a robust, production-quality DIA biomarker discovery software platform for routine use by the SBDC and beyond. From this, we will develop the computational methodology necessary for representing, parameterising and organising the sum of knowledge of protein quantitation across a history of past studies, to generate and openly disseminate a stratified probabilistic atlas of protein and peptide variation across health and disease. The atlas will enable candidate biomarker verification and peptide characterisation for clinical selected reaction monitoring assay development much earlier in the development pipeline than is currently possible, and realise further step-change improvements in the sensitivity and specificity of our discovery and validation platform through borrowing strength from the atlas.
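The notion of 'borrowing strength from the atlas' can be illustrated with a minimal sketch. It assumes a textbook normal-normal conjugate model with a known per-sample measurement variance, which is a simplification chosen for illustration and not the project's actual statistical model; the function name and all parameter names are hypothetical.

```python
def shrink_towards_atlas(study_mean, study_var, n_samples, atlas_mean, atlas_var):
    """Combine a new study's estimate of a protein level with an atlas prior.

    Assumption: the protein level is normally distributed, the atlas supplies
    a normal prior (atlas_mean, atlas_var), and study_var is the known
    per-sample measurement variance in the new study.
    Returns the posterior mean and posterior variance.
    """
    if n_samples < 1 or study_var <= 0 or atlas_var <= 0:
        raise ValueError("need n_samples >= 1 and positive variances")

    prior_precision = 1.0 / atlas_var        # how informative the atlas is
    data_precision = n_samples / study_var   # how informative the new study is

    posterior_var = 1.0 / (prior_precision + data_precision)
    posterior_mean = posterior_var * (
        prior_precision * atlas_mean + data_precision * study_mean
    )
    return posterior_mean, posterior_var


# A small study is pulled strongly towards the atlas value; a larger study
# relies more on its own data.
small_study = shrink_towards_atlas(2.0, 1.0, 1, atlas_mean=0.0, atlas_var=1.0)
larger_study = shrink_towards_atlas(2.0, 1.0, 3, atlas_mean=0.0, atlas_var=1.0)
```

In this toy setting the posterior variance is always smaller than the variance of either source alone, which is the sense in which a small study gains sensitivity from historical data.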
Planned Impact
Both the FDA and EMA now actively encourage the co-development of biomarkers and companion diagnostics alongside therapeutics, and several prescription drugs now require biomarker-based tests to be performed prior to prescription. This focus on stratified and personalised medicine is now vital for streamlining the hugely expensive drug development process, while making therapies more targeted and safer, and ensuring the right drug is given to the right patient as early in the disease course as possible. Impacts will potentially be seen economically and societally in reduced costs and increased efficacy of current and new biomarkers, with more sensitive early diagnosis. There is therefore considerable potential for indirect benefits across the spectrum of public health, treatment and quality of life for patients and the general public in the UK and abroad. It is also possible that tertiary disease processes and safety factors could be identified which otherwise would go unnoticed, avoiding misallocation of resources or delivering further novel breakthroughs.
The atlas also has a significant potential impact on molecular pathology. To date, most assays are based around antibodies, despite recent high-profile concerns with repeatability. Given this, targeted mass spectrometry (MS) offers significant advantages and has a clear role either to validate biomarkers in large clinical cohorts prior to costly antibody-based assay development, or for the clinical assay itself. Nevertheless, MS assay development is difficult; selected peptides need to be specific, offer sufficiently sensitive detection on the MS, and be recoverable from the biological sample in a highly reliable and reproducible manner. We have shown previously that recovery of peptides from the same protein can vary widely, and that the 'obvious' peptides are not always the best for this purpose. There is a lack of critical data on peptide recovery and reproducibility, which are required for optimal peptide selection and rapid and robust assay development. The use of a quantitative atlas generated on large numbers of samples has the capacity to provide this key information on a proteome-wide scale and dramatically improve the quality of these assays.
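As a rough illustration of what ranking peptides by reproducibility could look like, the sketch below uses the coefficient of variation (CV) across replicate measurements. This is a simple descriptive statistic swapped in for illustration, not the Bayesian approach described in this record; the function name, peptide labels and intensity values are all hypothetical.

```python
from statistics import mean, stdev


def rank_peptides_by_cv(replicate_intensities):
    """Rank peptides from most to least reproducible.

    replicate_intensities maps a peptide label to a list of intensities
    measured across replicate runs of the same sample. Peptides with fewer
    than two replicates, or a non-positive mean, are skipped because a CV
    cannot be computed for them.
    Returns a list of (peptide, cv) tuples sorted by ascending CV.
    """
    ranked = []
    for peptide, values in replicate_intensities.items():
        if len(values) < 2:
            continue
        average = mean(values)
        if average <= 0:
            continue
        ranked.append((peptide, stdev(values) / average))
    return sorted(ranked, key=lambda item: item[1])


# Hypothetical replicate intensities for two peptides of the same protein.
example = {
    "PEPTIDE_A": [100.0, 102.0, 98.0],   # tightly clustered replicates
    "PEPTIDE_B": [100.0, 150.0, 50.0],   # widely scattered replicates
}
ranking = rank_peptides_by_cv(example)
```

A low CV only indicates consistent measurement under the conditions sampled; an assay developer would still need to check specificity and detection sensitivity separately.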
The proposed research has significant prospective impact for the mass spectrometry industry and associated proteomics vendors, who have a strong presence in the UK. The biomarker discovery platform will increase the amount of usable data extracted from MS and therefore correspondingly increase users' return on investment. This will make commercial mass spectrometry instrumentation, which requires considerable capital and running costs, more attractive. In particular, we hope this extra research capacity will attract a wider uptake of mass spectrometry in translational research in industry and academia, as well as a wider audience of users and uses.
Publications
Bhamber RS (2021) mzMLb: A Future-Proof Raw Mass Spectrometry Data Format Based on Standards-Compliant mzML and Optimized for Speed and Storage Requirements. in Journal of proteome research
Brignoli T (2022) Diagnostic MALDI-TOF MS can differentiate between high and low toxic Staphylococcus aureus bacteraemia isolates as a predictor of patient outcome. in Microbiology (Reading, England)
Deutsch EW (2018) Expanding the Use of Spectral Libraries in Proteomics. in Journal of proteome research
Dowsey A (2017) The need for statistical contributions to bioinformatics at scale, with illustration to mass spectrometry. in Statistical Modelling
Kassab S (2019) Cognitive dysfunction in diabetic rats is prevented by pyridoxamine treatment. A multidisciplinary investigation. in Molecular Metabolism
Mcharg S (2022) Mast cell infiltration of the choroid and protease release are early events in age-related macular degeneration associated with genetic risk at both chromosomes 1q32 and 10q26. in Proceedings of the National Academy of Sciences of the United States of America
Philbert SA (2021) Widespread severe cerebral elevations of haptoglobin and haemopexin in sporadic Alzheimer's disease: Evidence for a pervasive microvasculopathy. in Biochemical and biophysical research communications
Phillips AM (2023) Uncertainty-Aware Protein-Level Quantification and Differential Expression Analysis of Proteomics Data with seaMass. in Methods in molecular biology (Clifton, N.J.)
Description | Artificial Intelligence for bacterial subtype and resistance identification from clinical MALDI-ToF to accelerate optimal prescribing and inform on phage susceptibility, Wellcome Trust Translational Partnership Award |
Amount | £39,959 (GBP) |
Organisation | Wellcome Trust |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 07/2023 |
End | 04/2024 |
Description | Identification of hazardous chemical and biological contamination on surfaces using spectral signatures |
Amount | £44,891 (GBP) |
Organisation | Defence Science & Technology Laboratory (DSTL) |
Sector | Public |
Country | United Kingdom |
Start | 09/2021 |
End | 02/2022 |
Description | Novel semi-supervised Bayesian learning to rapidly screen new oligonucleotide drugs for impurities |
Amount | £104,203 (GBP) |
Organisation | AstraZeneca |
Sector | Private |
Country | United Kingdom |
Start | 08/2021 |
End | 09/2025 |
Title | BayesProt v1.0 |
Description | BayesTraq: a Bayesian mixed-effects model for protein quantification in iTraq clinical proteomics |
Type Of Technology | Software |
Year Produced | 2015 |
Open Source License? | Yes |
Impact | Significantly improves the sensitivity and robustness of differential analysis in iTraq proteomics |
URL | http://www.biospi.org/research/ms/bayestraq/ |
Title | mzMLb |
Description | A |
Type Of Technology | Software |
Year Produced | 2018 |
Impact | Proteomics Standards Initiative standards compatible binary mass spectrometry data format for efficient read/write speed and storage space requirements |
URL | https://github.com/biospi/mzmlb |
Title | seaMass |
Description | The seaMass software is our open source dissemination route for the LC-MS (Liquid Chromatography - Mass Spectrometry) analysis algorithms developed by our group, including signal restoration and visualisation. |
Type Of Technology | Software |
Year Produced | 2014 |
Open Source License? | Yes |
Impact | The software has only recently been released, but there is strong interest in its incorporation into the ProteoSuite consortium's BBSRC BBR funded user-centric proteomics software (http://www.proteosuite.org/?q=aboutus). |
URL | http://www.biospi.org/research/ms/seamass/ |
Title | seaMass sigma/delta (aka BayesProt v2.0) |
Description | The seaMass suite of tools for quantification and differential expression analysis in mass spectrometry proteomics. The current status of the individual tools is: seaMass-S: A Bayesian protein group-level quantification technique universally supporting label-free, SILAC, iTraq/TMT and DIA data. Currently we support proteomics input from Waters Progenesis (label-free), SCIEX ProteinPilot (iTraQ), Thermo ProteomeDiscoverer (SILAC/TMT) and OpenSWATH (SWATH). Other packages supported on request (MaxQuant coming soon). The model provides automatic quality control by downweighting problematic samples and peptides/features, can scale to massive study sizes, and propagates quantification uncertainty downstream to the differential expression analysis stage. seaMass-Δ: Bayesian normalisation and differential expression analysis on the output of seaMass-S. By harnessing the generic MCMCglmm package for Bayesian mixed-effect modelling, the tool allows the user to perform many kinds of univariate analysis on the data, from simple Welch's t-tests and two-way ANOVA to timecourse and multi-level models. Studies analysed with earlier versions of seaMass-S and seaMass-Δ are published in [Freeman et al, Diabetes, 2016], [Xu et al, Nature Comms Biology, 2019] and [Kassab et al, Molecular Metabolism, 2019]. |
Type Of Technology | Software |
Year Produced | 2020 |
Impact | Currently used by partners in the grant |
URL | https://github.com/biospi/seamass |
Description | TEDx style talk on Biomarker Discovery for SCIEX, National Gallery, London |
Form Of Engagement Activity | A broadcast e.g. TV/radio/film/podcast (other than news/press) |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | Recorded a TEDx style talk for SCIEX under their 'Talk Precision Medicine' initiative. This exclusive series of talks offers unique insights from thought leaders and researchers in the field of precision medicine, posing and discussing important questions about current challenges and future direction. |
Year(s) Of Engagement Activity | 2022 |
URL | https://sciex.com/landing-pages/talkprecisionmedicine |