BBSRC-NSF/BIO PTMeXchange: Globally harmonized re-analysis and sharing of data on post-translational modifications

Lead Research Organisation: University of Liverpool

Department Name: Institute of Integrative Biology

Technical Summary

The types and sites of post-translational modifications (PTMs) on proteins are rich and diverse, providing cells with a rapid mechanism for adapting function under different conditions. PTMs are widely studied across all areas of fundamental and applied life sciences research. Proteomics approaches using mass spectrometry (MS) provide the sole high-throughput means to detect and localize protein PTMs. Despite their biological importance, PTM-relevant data are collated in the public domain via disparate resources, with a lack of data provenance. An efficient way to improve the situation is to make PTM information derived from proteomics approaches available through UniProtKB (http://www.uniprot.org/), the world-leading protein knowledgebase. Hundreds of relevant PTM proteomics datasets are now in the public domain, because the proteomics community is widely embracing open data policies (e.g. through the resources PRIDE and PeptideAtlas, part of the ProteomeXchange consortium).
We will develop open, reproducible pipelines and deploy them in the cloud to consistently re-analyse hundreds of PTM-relevant public datasets from human and the main model organisms. Complementary analysis approaches will be used: primarily standard protein database searches, but also spectral library-based and open modification searches. Special attention will be devoted to ensuring that PTM localization is accurate, and community guidelines will be developed with that goal in mind. These data will be widely disseminated to UniProtKB and other knowledgebases (e.g. neXtProt) and made available at PRIDE, PeptideAtlas, and a new resource, PTMeXchange. These new PTM data will be integrated across studies to increase statistical power, at an unprecedented scale and accuracy. Finally, we will perform several demonstration studies to understand PTM motifs, function and evolution.
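
To make the idea of integrating PTM evidence across studies concrete, the following is a minimal sketch and not the project's actual method. It assumes each dataset reports a localisation probability per site and, as a strong simplification, treats datasets as independent. The function name `combine_site_evidence` and the input tuple layout are hypothetical.

```python
from collections import defaultdict
from math import prod


def combine_site_evidence(observations):
    """Combine per-dataset PTM site probabilities across studies.

    observations: iterable of (protein, position, dataset_id, probability).
    Returns {(protein, position): (n_datasets, combined_probability)}.

    Assumption (illustrative only): datasets are independent, so the chance
    that a site is wrong in every dataset is the product of (1 - p).
    """
    best_per_dataset = defaultdict(dict)
    for protein, position, dataset_id, probability in observations:
        site = (protein, position)
        # Keep only the best-supported observation of a site per dataset.
        previous = best_per_dataset[site].get(dataset_id, 0.0)
        best_per_dataset[site][dataset_id] = max(previous, probability)

    combined = {}
    for site, per_dataset in best_per_dataset.items():
        p_all_wrong = prod(1.0 - p for p in per_dataset.values())
        combined[site] = (len(per_dataset), 1.0 - p_all_wrong)
    return combined


if __name__ == "__main__":
    example = [
        ("P1", 10, "datasetA", 0.8),
        ("P1", 10, "datasetB", 0.5),
        ("P1", 10, "datasetA", 0.6),  # weaker repeat within datasetA, ignored
        ("P2", 5, "datasetA", 0.9),
    ]
    for site, (n, p) in sorted(combine_site_evidence(example).items()):
        print(site, n, round(p, 3))
```

Counting the number of supporting datasets alongside the combined probability keeps the two kinds of evidence separate, so a site seen once with high confidence can be distinguished from one seen repeatedly with moderate confidence.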

Planned Impact

There is the potential for the following impacts:

- The biggest potential impact is on Pharma, where many drug design efforts target cell signalling and PTMs. The results will feed into an improved understanding of these processes and could generate new targets. There is also potential for indirect benefits in the biotech industry (improved understanding of PTMs in fungi) and Agrifood (PTMs in plants), e.g. derived through inference of site conservation from model organisms.

- Software vendors and pharmaceutical research and development teams will benefit, since we envisage they may wish to adopt our software for local pipelines (e.g. deployed in their own cloud environments). All software developed during the project will be open source.

- Research councils and charities funding research will benefit through the potential for increased impact of the mass spectrometry (MS)-based proteomics projects they fund, thanks to the re-analysis of public proteomics datasets and the integration of novel PTM proteomics data in UniProtKB.

- More broadly, as proteomics is a key technology in the Life Sciences, there is the potential for considerable indirect benefits across a wide range of areas in basic biology, biomedical and clinical science, as more value will be derived from datasets.

- Life scientists worldwide will be able to benefit from the training activities planned (both face-to-face and via on-line resources).

Staff employed will benefit from:

- Further training in a key enabling technology for the BBSRC (proteomics), and exposure to a multi-disciplinary team, conferences, workshops and new national and international collaborations.

- Acquiring the skills needed to work with bioinformatics software in a cloud environment, which is increasingly important given the growing size of datasets and the need for suitable IT infrastructure. The team will also use cutting-edge machine learning methods in WP4, skills that are in high demand in academic research and industry.

Funded Value:

£310,483

Funded Period:

Sep 19 - Jun 23

Funder:

BBSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

BB/S017054/1

Principal Investigator:

Andrew Jones

Research Subject:

Omic sciences & technologies (60%)

Tools, technologies & methods (40%)

Research Topic:

Bioinformatics (40%)

Proteomics (60%)

Organisations

University of Liverpool (Lead Research Organisation)

People	ORCID iD
Andrew Jones (Principal Investigator)

Publications

Camacho O (2022) Assessing multiple evidence streams to decide on confidence for identification of post-translational modifications, within and across data sets

Camacho OJM (2024) Phosphorylation in the Plasmodium falciparum Proteome: A Meta-Analysis of Publicly Available Data Sets. in Journal of proteome research

Camacho OM (2023) Assessing Multiple Evidence Streams to Decide on Confidence for Identification of Post-Translational Modifications, within and Across Data Sets. in Journal of proteome research

Combe C (2023) mzIdentML 1.3.0 - Essential progress on the support of crosslinking and other identifications based on multiple spectra

Combe C (2024) mzIdentML 1.3.0 - Essential progress on the support of crosslinking and other identifications based on multiple spectra. in PROTEOMICS

Combe CW (2024) mzIdentML 1.3.0 - Essential progress on the support of crosslinking and other identifications based on multiple spectra

Daly LA (2023) Custom Workflow for the Confident Identification of Sulfotyrosine-Containing Peptides and Their Discrimination from Phosphopeptides. in Journal of proteome research

Deutsch EW (2023) Proteomics Standards Initiative at Twenty Years: Current Activities and Future Work. in Journal of proteome research

Deutsch EW (2023) The ProteomeXchange consortium at 10 years: 2023 update. in Nucleic acids research

Jones AR (2020) Proteome Bioinformatics Methods for Studying Histidine Phosphorylation. in Methods in molecular biology (Clifton, N.J.)

Key Findings

Description	We have developed statistical methods for validating the reporting of phosphorylation sites (and other types of post-translational modification, PTM) on proteins, with accurate control of the false discovery rate. We are now applying the method and a software pipeline to analyse large data sets for key species (rice, P. falciparum, mouse, human and others; summarised at https://www.proteomexchange.org/ptmexchange/index.html), to create public databases for researchers studying cell signalling.
Exploitation Route	The data resources will be very valuable for fundamental research, as well as for use in drug discovery. We are forming a consortium of interested groups to apply the methods and create a critical mass to give sustainability to the work.
Sectors	Agriculture, Food and Drink; Digital/Communication/Information Technologies (including Software); Pharmaceuticals and Medical Biotechnology
URL	https://www.proteomexchange.org/ptmexchange/index.html

Impact Summary

Description	It is a little early to say what the wider impacts will be, but we have made available high-quality datasets ("PTM builds") for multiple types of PTM and multiple species, collected at https://www.proteomexchange.org/ptmexchange/index.html. Much drug discovery work is focussed on PTM sites, so we expect the data to have impact in the pharmaceutical industry, among many other areas.
First Year Of Impact	2025
Sector	Pharmaceuticals and Medical Biotechnology
Impact Types	Economic

Software and Technical Products

Title	mzidFLR - Pipeline for global false localisation analysis in PTM site determination
Description	The software is described in two publications (https://pubs.acs.org/doi/full/10.1021/acs.jproteome.1c00827 and https://pubs.acs.org/doi/full/10.1021/acs.jproteome.2c00823). Identifying sites of post-translational modification on proteins is common in the biological and biomedical sciences. Our new software allows accurate estimation of global statistics for PTM identification, with potential for use in many areas of proteomics research.
Type Of Technology	Software
Year Produced	2024
Open Source License?	Yes
Impact	It is too early to judge the impact, but we hope that other groups will take up the software in their own pipelines.
URL	https://github.com/PGB-LIV/mzidFLR
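
As an illustration of what a global false localisation rate (FLR) estimate can look like, here is a minimal sketch under stated assumptions. It does not reproduce mzidFLR's interface or implementation. It assumes searches also allowed the modification on a decoy residue that cannot biologically carry it, so that decoy hits indicate mislocalisation. The function name `estimate_global_flr` and the parameter `target_to_decoy_ratio` (a correction for the relative abundance of target and decoy residues) are hypothetical.

```python
def estimate_global_flr(sites, target_to_decoy_ratio=1.0):
    """Estimate global FLR and q-values for a ranked list of PTM sites.

    sites: iterable of (score, is_decoy); a higher score means more confident.
    Returns a list of (score, is_decoy, flr, q_value), sorted by descending score.

    Assumption (illustrative only): each decoy-residue hit stands in for
    `target_to_decoy_ratio` false localisations among the target sites.
    """
    ranked = sorted(sites, key=lambda s: s[0], reverse=True)

    flr_values = []
    decoys = 0
    for rank, (_score, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        targets = rank - decoys
        if targets == 0:
            flr_values.append(1.0)
        else:
            flr_values.append(min(1.0, decoys * target_to_decoy_ratio / targets))

    # q-value: the lowest FLR reachable at this score threshold or a looser one.
    q_values = flr_values[:]
    for i in range(len(q_values) - 2, -1, -1):
        q_values[i] = min(q_values[i], q_values[i + 1])

    return [
        (score, is_decoy, flr, q)
        for (score, is_decoy), flr, q in zip(ranked, flr_values, q_values)
    ]


if __name__ == "__main__":
    example = [(0.99, False), (0.95, False), (0.90, True),
               (0.85, False), (0.60, True), (0.50, False)]
    for row in estimate_global_flr(example):
        print(row)
```

Reporting q-values as well as the raw running estimate makes thresholds monotone: accepting every site with a q-value at or below a cut-off gives a set whose estimated global FLR does not exceed that cut-off.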