ProteoFormer - a software toolkit for top-down proteomics

Lead Research Organisation: University of Liverpool
Department Name: Electrical Engineering and Electronics

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Planned Impact

Our developments will have impacts through the following routes:

- The development of seaMass-TD and Proteoformer will make it more straightforward for top-down analysis to be performed on a much wider range of instruments, producing high-quality and reliable results. This will open up this important technology for studying proteins in their native state in the cell, for basic and applied research across numerous domains.

- Our software has the potential to increase sales of mass spectrometers capable of performing top-down analysis. In particular, we are working locally with Waters to develop software compatible with their data, since current software has typically been designed for mass spectrometers produced by Thermo.

- We will explore routes for commercialisation of Proteoformer and seaMass-TD, as discussed in the Pathways to Impact document.

- We will work with international consortia aimed at data sharing and standardisation in proteomics - the Proteomics Standards Initiative (PSI), ProteomeXchange and EBI's PRIDE database - to ensure that current standards can appropriately handle top-down data, and that researchers can submit data to the leading public repositories for community re-analysis.

Publications

 
Description We have developed a new top-down proteomics deconvolution strategy based on seaMass, called seaMass-TD. Like Waters MaxEnt and Thermo ReSpect, seaMass-TD is a true mathematical deconvolution of the raw data: it models the prior belief as a set of constraints (mass relationships between charge states, peak FWHM/shape) and uses an iterative method to solve the inverse problem of finding the most probable deconvolution that fits the model. However, seaMass-TD is unique in three respects: (a) by learning the range of protein isotope distributions generated from UniProt, relaxed to allow small deviations caused by unknown proteoforms, we enable deconvolution of overlapping proteoforms whilst also probabilistically outputting a range of monoisotopic peak candidates for each; (b) a sparse regression approach is used, based on the assumption that there are far fewer proteins in the dataset than datapoints. Improbable proteins are thus eliminated after only a few iterations, so seaMass-TD is orders of magnitude faster than MaxEnt, allowing it to process at high mass resolution like Xtract/MS-Deconv but, for the first time, on data that are not isotopically resolved; (c) through implementation of group sparse regression, we allow complete flexibility in the charge state distribution of each proteoform, inferring both the isotope and charge state distribution for each.
Exploitation Route To develop the technique further, enabling deconvolution of high-mass proteins and complex LC-MS data, the method has been used as pilot work for a BBSRC responsive mode application.
Sectors Agriculture, Food and Drink; Environment; Healthcare; Pharmaceuticals and Medical Biotechnology

URL http://www.biospi.org/research/ms/seamass-td/
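
The "mass relationships between charge states" mentioned in the description above can be illustrated with a small worked example. This is only a sketch of the standard electrospray relationship m/z = (M + z * m_proton) / z for positive-mode protonated ions; it is not taken from the seaMass-TD code, and the function names and the example mass of 17000.0 Da are hypothetical.

```python
PROTON_MASS = 1.007276466  # Da; mass of one proton (charge carrier assumed here)


def charge_state_mz(neutral_mass, charges):
    """Return the m/z of each charge state z for a protein of given neutral mass."""
    return [(neutral_mass + z * PROTON_MASS) / z for z in charges]


def neutral_mass_from_adjacent(mz_low_charge, mz_high_charge):
    """Infer charge and neutral mass from two adjacent peaks of one ladder.

    mz_low_charge is the peak at charge z (larger m/z);
    mz_high_charge is the peak at charge z + 1 (smaller m/z).
    """
    z = round((mz_high_charge - PROTON_MASS) / (mz_low_charge - mz_high_charge))
    mass = z * (mz_low_charge - PROTON_MASS)
    return z, mass


# Hypothetical protein of neutral mass 17000.0 Da observed at charges 10 and 11.
ladder = charge_state_mz(17000.0, [10, 11])
charge, mass = neutral_mass_from_adjacent(ladder[0], ladder[1])
```

Because every peak in a ladder must map back to the same neutral mass, a deconvolution model can use this relationship as a constraint tying the charge states of one proteoform together.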
 
Description Technology developed in this grant is being further developed for characterising impurities in oligonucleotide drugs, funded by AstraZeneca.
First Year Of Impact 2022
Sector Pharmaceuticals and Medical Biotechnology
 
Description Novel semi-supervised Bayesian learning to rapidly screen new oligonucleotide drugs for impurities
Amount £104,203 (GBP)
Organisation AstraZeneca 
Sector Private
Country United Kingdom
Start 09/2021 
End 09/2025
 
Title seaMass-TD 
Description seaMass-TD is the first method for deconvoluting top-down proteomics spectra that infers high-resolution output from isotopically unresolved input. It extends the Peptide Simplex deisotoping model to whole proteins, and extends the seaMass sparse inference model to group sparsity in order to link together the full charge state ladders of these whole-protein isotope distributions.
Type Of Technology Software 
Year Produced 2015 
Open Source License? Yes  
Impact Currently an alpha-quality version that demonstrates the method's power; impact ongoing.
URL http://www.biospi.org/research/ms/seamass-td/
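
To give a feel for how group sparsity can link the charge states of one proteoform, the sketch below shows the generic group soft-thresholding step used in many group-lasso solvers. It is an assumption-laden illustration, not the seaMass-TD algorithm: the source states only that group sparse regression is used, and the function name, the grouping of coefficients by proteoform, and the penalty value `lam` are all hypothetical.

```python
import math


def group_soft_threshold(coeffs, groups, lam):
    """Proximal step for the group-lasso penalty lam * sum_g ||x_g||_2.

    coeffs: list of coefficients (e.g. one per charge state of each candidate).
    groups: list of index lists; each group is shrunk or removed as a whole.
    """
    out = list(coeffs)
    for idx in groups:
        norm = math.sqrt(sum(coeffs[i] ** 2 for i in idx))
        # A group whose overall norm is at most lam is zeroed entirely.
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in idx:
            out[i] = coeffs[i] * scale
    return out


# Two hypothetical proteoform candidates, each with two charge-state coefficients.
shrunk = group_soft_threshold([3.0, 4.0, 0.1, 0.2], [[0, 1], [2, 3]], lam=1.0)
```

In this toy case the weakly supported second group is eliminated as a unit, while the first group is kept and its coefficients remain free to differ from one another, which mirrors the idea of discarding improbable proteins while leaving each surviving charge state distribution flexible.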