GRAPPA - Global compRehensive Atlas of Peptide and Protein Abundance
Lead Research Organisation:
European Bioinformatics Institute
Department Name: OMICs
Abstract
Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.
Technical Summary
The world-leading PRIDE database now contains >14,000 proteomics datasets. All of them contain raw mass spectrometry (MS) data and some contain standardised lists of protein identifications, but currently none contain quantitative data expressed in a standard format. As such, there is vast untapped potential for quantitative data re-use by the majority of research groups, who do not have the capability to re-process datasets themselves.
In this project, we will develop robust, open, cloud-based data analysis pipelines that will be used to process hundreds of publicly available datasets, using standardised data processing and normalisation protocols. All datasets will be made available within a new portal, PRIDE Quant, to support computational users, and will be passed to the Expression Atlas database to provide a biologist-friendly view of the data. Data processing will largely focus on human samples, for which the highest data volumes exist. This includes both "baseline" datasets, e.g. to provide cell line or tissue/organ-level estimates of protein abundance, and "differential" expression datasets for various diseases including cancer, dementia, diabetes and major infectious diseases.
We will develop several exemplar applications of the data, including displays showing correlations between gene and protein expression for matched samples, generation of co-expression networks from proteomics data, and generating vast maps of peptide-level abundance to support new research in proteome bioinformatics.
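As an illustration of the gene/protein correlation displays mentioned above, the following is a minimal sketch of a per-gene Spearman correlation across matched samples. It is not the project's pipeline: the function names, the nested-dictionary input layout and the minimum of three shared samples are all assumptions made for this example.

```python
from statistics import mean


def rank(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; returns NaN when either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def gene_protein_correlation(rna, protein, min_samples=3):
    """Correlate RNA and protein abundance per gene over shared samples.

    rna, protein: hypothetical layout {gene: {sample_id: abundance}}.
    Genes with fewer than min_samples shared samples are skipped.
    """
    result = {}
    for gene in rna.keys() & protein.keys():
        shared = sorted(rna[gene].keys() & protein[gene].keys())
        if len(shared) < min_samples:
            continue
        result[gene] = spearman(
            [rna[gene][s] for s in shared],
            [protein[gene][s] for s in shared],
        )
    return result


# Toy values, invented purely for illustration.
rna = {
    "GENE_A": {"s1": 1.0, "s2": 2.0, "s3": 3.0},
    "GENE_B": {"s1": 5.0, "s2": 3.0, "s3": 1.0},
}
protein = {
    "GENE_A": {"s1": 10.0, "s2": 40.0, "s3": 90.0},
    "GENE_B": {"s1": 2.0, "s2": 4.0, "s3": 8.0},
}
correlations = gene_protein_correlation(rna, protein)
```

A rank-based measure is used here because transcript and protein abundances are typically reported on different scales; other choices (for example Pearson correlation on log-transformed values) would be equally defensible.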
Planned Impact
Human proteomics data have considerable potential to support biomarker discovery efforts by pharmaceutical companies, to test the distribution of particular proteins across tissues or cell types and, more broadly, to support pharmaceutical industry development pipelines. Many pharmaceutical companies do not have in-house proteomics analysis capabilities; they will be able to mine any datasets they wish straightforwardly, without requiring local or specialist bioinformatics support.
Research councils and charities funding research will benefit through the potential for increased impact of the mass spectrometry (MS)-based proteomics projects they fund, thanks to the re-analysis of public proteomics datasets and the integration of quantitative proteomics data in Expression Atlas.
More broadly, as proteomics is a key technology in the Life Sciences, there is the potential for considerable indirect benefits across a wide range of areas in basic biology, biomedical and clinical science, as more value will be derived from datasets.
Life scientists worldwide will be able to benefit from the training activities planned (both face-to-face and via on-line resources).
Staff employed will benefit by:
- Receiving further training in a key enabling technology for the BBSRC (proteomics), and gaining exposure to a multi-disciplinary team, to conferences and workshops, and to new national and international collaborations (for example through the Proteomics Standards Initiative).
- Acquiring the skills needed to work with bioinformatics software in a cloud environment, which is becoming increasingly important given the growing size of datasets and the need for suitable IT infrastructure.
Publications
Bouyssié D
(2024)
WOMBAT-P: Benchmarking Label-Free Proteomics Data Analysis Workflows.
in Journal of proteome research
Camacho OJM
(2024)
Phosphorylation in the Plasmodium falciparum Proteome: A Meta-Analysis of Publicly Available Data Sets.
in Journal of proteome research
Claeys T
(2023)
lesSDRF is more: maximizing the value of proteomics data through streamlined metadata annotation.
in Nature communications
Deutsch EW
(2023)
Proteomics Standards Initiative at Twenty Years: Current Activities and Future Work.
in Journal of proteome research
Deutsch EW
(2023)
The ProteomeXchange consortium at 10 years: 2023 update.
in Nucleic acids research
George N
(2024)
Expression Atlas update: insights from sequencing data at both bulk and single cell level.
in Nucleic acids research
Moreno P
(2022)
Expression Atlas update: gene and protein expression in multiple species.
in Nucleic acids research
Perez-Riverol Y
(2022)
The PRIDE database resources in 2022: a hub for mass spectrometry-based proteomics evidences.
in Nucleic acids research
| Description | We have reanalysed a number of proteomics datasets from baseline tissues (human, mouse, rat, pig), from colorectal cancer and other tumour samples, and from some cell lines. The protein expression profiles are integrated in the Expression Atlas resource. The corresponding publications are listed among those attributed to this award. Researchers can now access gene and protein expression information in the same interface. |
| Exploitation Route | This information can be used for different purposes, including studies related to drug target safety and efficacy (for mouse, rat and pig). Additionally, protein expression values can be used to predict protein complexes, for instance. It is important to highlight that protein expression provides data closer to the phenotype than gene expression. The correlation between gene and protein expression varies considerably depending on the specific genes/proteins and the biological conditions. |
| Sectors | Digital/Communication/Information Technologies (including Software) Healthcare Pharmaceuticals and Medical Biotechnology |
| Description | BBSRC-NSF/BIO. Globally harmonized re-analysis of Data Independent Acquisition (DIA) proteomics datasets enables the creation of new resources |
| Amount | £493,010 (GBP) |
| Funding ID | BB/X001911/1 |
| Organisation | Biotechnology and Biological Sciences Research Council (BBSRC) |
| Sector | Public |
| Country | United Kingdom |
| Start | 03/2023 |
| End | 04/2026 |
| Description | The Open Data Exchange Ecosystem in Proteomics: Evolving its Utility |
| Amount | £131,897 (GBP) |
| Funding ID | EP/Y035984/1 |
| Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
| Sector | Public |
| Country | United Kingdom |
| Start | 03/2024 |
| End | 02/2026 |
| Title | Availability of DDA (Data Dependent Acquisition) proteomics datasets in Expression Atlas |
| Description | We have reanalysed a number of proteomics datasets from baseline tissues (human, mouse, rat, pig), from colorectal cancer and other tumour samples, and from some cell lines. The results are integrated in the Expression Atlas resource. The corresponding publications are listed among those attributed to this award. |
| Type Of Material | Database/Collection of data |
| Year Produced | 2022 |
| Provided To Others? | Yes |
| Impact | Researchers now have access to protein and gene expression data in Expression Atlas, in the same interface. |
| URL | https://www.ebi.ac.uk/gxa/home |
| Title | PRIDE database |
| Description | The PRIDE database is the world-leading data repository for mass spectrometry (MS) proteomics data (https://www.ebi.ac.uk/pride/). Since its creation in 2004, functionality and capabilities have been, and continue to be, added to PRIDE as a result of different BBSRC grants. PRIDE has a very large international impact and also leads the activities of the international ProteomeXchange Consortium. Additionally, public proteomics data included in PRIDE are increasingly being reused and integrated in added-value bioinformatics resources: Expression Atlas (quantitative proteomics datasets), Ensembl (proteogenomics information) and UniProt (post-translational modification data). |
| Type Of Material | Database/Collection of data |
| Provided To Others? | Yes |
| Impact | PRIDE has become the world-leading proteomics data repository and, as such, has an enormous international impact. It enables data reproducibility and data re-use by third parties. |
| URL | https://www.ebi.ac.uk/pride/ |
| Description | Delicious DNA |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | Regional |
| Primary Audience | Schools |
| Results and Impact | Workshop with Year 3 students discussing DNA and science. |
| Year(s) Of Engagement Activity | 2021 |
| Description | EuBIC-MS Winter School 2024 |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | This winter school provided workshops and training for researchers in computational mass spectrometry tools and workflows. It also provided lectures and practical workshops covering the identification, quantification, result interpretation and integration of MS data. It aims to provide researchers with the tools they require to increase their usage of proteomics data. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://eubic-ms.org/events/2024-winter-school/ |
| Description | Open data Practises in Proteomics |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Part of the Human Proteome Organisation webinar series, this webinar explores the benefits of making data available in the public domain and how this can be achieved. It enables researchers to discover how these practices can unlock new opportunities for research and innovation in the field of proteomics. |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://www.youtube.com/watch?v=-XeuJ4MlqK0 |
| Description | Proteomics Bioinformatics |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Provision of hands-on training in the basics of mass spectrometry (MS) and proteomics bioinformatics. Training was provided on how to use search engines and post-processing software, quantitative approaches, MS data repositories, the use of public databases for protein analysis, annotation of subsequent protein lists, and incorporation of information from molecular interaction and pathway databases. The course is aimed at research scientists with a minimum of a degree in a scientific discipline, including industrial, laboratory and clinical staff, as well as specialists in related fields. It aims to give researchers the knowledge and tools to use proteomics and proteomics bioinformatics more effectively in their own research. |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://www.ebi.ac.uk/training/events/proteomics-bioinformatics-0/ |
