In silico mass spectrometry for biologists: Tools and resources for next-generation proteomics
Lead Research Organisation:
University of Manchester
Department Name: School of Medical Sciences
Abstract
Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.
Technical Summary
To date, mass spectrometry (MS)-based proteomics has been largely driven by Data-Dependent Acquisition (DDA) approaches, where complex mixtures of peptide analytes are separated via liquid chromatography and elute into the instrument. This approach is limited by instrument throughput and the stochastic sampling of the analyte, leading to under-sampling and poor detection of low-abundance proteins. To address these limitations, Data-Independent Acquisition (DIA) approaches are gaining popularity, led by SWATH-MS and MSe/HD-MSe. These methods sample the analyte more uniformly and capture richer, deeper data, but the resulting data sets are harder to interrogate and require sophisticated software solutions. Indeed, the lack of standard tools and the extra expertise required are hindering wider adoption of DIA proteomics approaches. Here, we will develop open analysis pipelines for different DIA techniques using non-commercial software (e.g. OpenSWATH, DIA Umpire, Skyline, etc.), and deploy them in the EBI "Embassy Cloud" infrastructure. We will enable easy access to robust and portable pipelines that can also be deployed in other cloud environments, for wider community benefit. In addition, we will extend the functionality of the world-leading proteomics resource (PRIDE Archive at EMBL-EBI) and related tooling, and extend the data standard mzTab to create a common output format for the analysis. A further compelling aspect is the link to PRIDE Archive, which will support construction of robust spectral libraries (from different instruments and species) that can be used by us and our users to conduct novel DIA analyses. This will make good use of the growing DDA and DIA public datasets in PRIDE Archive to extract new knowledge. Novel results will be communicated to the original submitter and the rest of PRIDE Archive users, and fed into three EMBL-EBI resources: Ensembl, UniProt and the Expression Atlas.
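The contrast between DDA and DIA sampling described above can be illustrated with a toy sketch. This is not the project's pipeline or any real instrument method: the precursor representation, the function names (`dda_top_n`, `dia_windows`, `dia_covered`) and the example values are all hypothetical, and real DDA also involves features such as dynamic exclusion that are omitted here. It only shows why intensity-ranked selection tends to miss low-abundance precursors, whereas fixed isolation windows cover the whole m/z range.

```python
from typing import List, Tuple

# Hypothetical toy representation of a precursor ion: (m/z, intensity).
Precursor = Tuple[float, float]
Window = Tuple[float, float]


def dda_top_n(precursors: List[Precursor], n: int) -> List[Precursor]:
    """Simplified DDA cycle: fragment only the n most intense precursors."""
    return sorted(precursors, key=lambda p: p[1], reverse=True)[:n]


def dia_windows(mz_min: float, mz_max: float, width: float) -> List[Window]:
    """Simplified DIA scheme: tile the m/z range with fixed-width windows."""
    windows: List[Window] = []
    lo = mz_min
    while lo < mz_max:
        windows.append((lo, min(lo + width, mz_max)))
        lo += width
    return windows


def dia_covered(precursors: List[Precursor], windows: List[Window]) -> List[Precursor]:
    """Return the precursors whose m/z falls inside any window [lo, hi)."""
    return [p for p in precursors if any(lo <= p[0] < hi for lo, hi in windows)]


if __name__ == "__main__":
    # Made-up values: three abundant and two low-abundance precursors.
    scan = [(450.2, 1e6), (512.7, 5e3), (630.1, 8e5), (701.4, 2e3), (820.9, 3e5)]
    selected = dda_top_n(scan, n=3)
    missed = [p for p in scan if p not in selected]
    covered = dia_covered(scan, dia_windows(400.0, 900.0, 25.0))
    print("DDA selected:", selected)
    print("DDA missed:", missed)
    print("DIA covered:", covered)
```

In this toy scan, the two low-intensity precursors are never selected by the top-3 rule, while every precursor falls into one of the fixed windows. The cost, as the summary notes, is that each DIA window contains fragments from several co-isolated precursors, which is what makes the downstream analysis harder.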
Planned Impact
There is the potential for the following impacts:
- Mass spectrometry vendors (at least SCIEX and Waters) will benefit through the free availability of robust, reliable, reproducible and improved pipelines for the analysis of DIA proteomics datasets. Once these pipelines are robust, vendors will face less urgency to keep developing their own commercial software solutions, freeing resources that could be focused on other efforts.
- Software vendors or pharmaceutical research and development teams will benefit, since we envisage they may wish to take up our software for local pipelines (e.g. deployed in their own cloud environments). It is important to highlight that all the software developed during the proposal will be open source or at least free-to-use (if the original software used to build the analysis pipelines is not open source). Commercial software will not be part of the developed pipelines.
- Research councils and charities funding research will benefit through the potential for increased impact of the mass spectrometry (MS)-based proteomics projects they fund, thanks to the re-analysis of public DIA proteomics datasets and the integration of novel proteomics data in Ensembl, UniProt and the Expression Atlas.
- Leveraging research partnerships and funding with industry via knowledge exchange and innovation funding has been successfully demonstrated at UoM. We have been successful with MRC CiC, P2D, Wellcome Trust ISSF, HEIF, and EPSRC IAA funding streams, which are all aimed at promoting and driving impact. Manchester projects with an MS foundation have always been successful in the life and biomedical sciences, in themselves generating high-impact papers and multiple millions of GBP in industry and key stakeholder support.
- There is potential for our infrastructure to assist in clinical biomarker discovery, since DIA-based methods (such as SWATH-MS and MSe/HD-MSe) are growing rapidly in use in this space, as exemplified by the Stoller Biomarker Discovery Centre Manchester (where some of the applicants are involved).
- More broadly, as proteomics is a key technology in the Life Sciences, there is the potential for considerable indirect benefits across a wide range of areas in basic biology, biomedical or clinical science, as more value will be derived from datasets, including post-translational modifications (PTMs), which are key regulators of cell signalling and thus often studied in the clinical context.
Staff employed will benefit through:
- Further training in one key enabling technology for the BBSRC (proteomics) and exposure to conferences, workshops and new national and international collaborations.
- Acquiring the skills needed to work with bioinformatics software in a cloud environment, something that is becoming increasingly important with the growing size of datasets and the need for suitable IT infrastructure.
Publications
Jones RC
(2022)
Urocortin-1 Is Chondroprotective in Response to Acute Cartilage Injury via Modulation of Piezo1.
in International journal of molecular sciences
Walzer M
(2022)
Implementing the reuse of public DIA proteomics datasets: from the PRIDE database to Expression Atlas.
in Scientific data
Description | We have developed a software pipeline and integrated it with cloud-based computing at the EBI in Hinxton so that proteomics mass spectrometry data can be processed by non-specialists. This relates to a specialist mass spectrometry technique called SWATH-MS, and we hoped that it would be made easily accessible to a wider audience of bioscientists. At present it does not have a "front-end", but we are satisfied that it constitutes proof-of-principle progress. We believe, however, that it will be difficult to sustain as a long-term resource and have switched our focus to quality control instead, so that users can use our software to quality control their DIA data against a set of metrics and criteria. This will be invaluable to clinical labs that need to QC their data and decide whether a given data set is worth considering further or is not fit for purpose. Three manuscripts associated with the QC software are in preparation, and the software has been released on GitHub (https://github.com/PaulBrack/Yamato). In addition, we have developed an associated database of SWATH-MS data, which in turn has led to a new tool, currently being written up, that can predict the best peptides to use for quantitative approaches such as this. |
Exploitation Route | The outputs will facilitate the interrogation of proteomics data sets stored in the PRIDE repository at EBI by groups other than the original depositors |
Sectors | Agriculture, Food and Drink; Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology |
URL | https://github.com/PaulBrack/Yamato |
Title | Reanalysis of PRIDE proteomic datasets via SWATH-MS pipeline |
Description | We reanalysed publicly available proteomic datasets using our robust DIA SWATH-MS pipeline, described in the paper, with all software made available on GitHub (https://github.com/PRIDE-reanalysis/DIA-reanalysis). The reanalysed results were reported in the EBI Expression Atlas database (e.g. https://www.ebi.ac.uk/gxa/home) |
Type Of Material | Database/Collection of data |
Year Produced | 2022 |
Provided To Others? | Yes |
Impact | All reanalyses have been made publicly available for other groups to compare against |
URL | https://www.ebi.ac.uk/gxa/home |
Description | Collaboration with David Tabb in South Africa |
Organisation | University of Stellenbosch |
Department | Faculty of Medicine and Health Sciences |
Country | South Africa |
Sector | Academic/University |
PI Contribution | Links were established between a PhD student in South Africa and the postdoc on the BBSRC grant, both of whom have contributed to the development of QC software |
Collaborator Contribution | Our PDRA met David Tabb and members of his group at HUPO PSI meetings, and formed a collaboration of mutual interest to develop quality control software for the type of mass spectrometry data we are using on this project. This has resulted in an improved tool, which is faster and richer |
Impact | papers are still being drafted |
Start Year | 2019 |
Title | Mass spectrometry software pipeline |
Description | The prototype pipeline is now publicly available. A key aim of the project was to move it from Manchester to EBI, where it will be made publicly available and where the necessary compute support is deployed. We have already conducted testing, and it replicates results generated on the version in Manchester, and we deployed and evaluate |
Type Of Technology | Software |
Year Produced | 2022 |
Impact | one publication, and the pipeline made available via GitHub |
URL | https://github.com/PRIDE-reanalysis/DIA-reanalysis |
Description | Proteomics training workshop at EBI |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | We engaged in a Proteomics training workshop at the EBI, training participants in the use of SWATH mass spectrometry informatics to query a SWATH-MS dataset |
Year(s) Of Engagement Activity | 2019 |
URL | https://www.ebi.ac.uk/training/events/2019/proteomics-bioinformatics-3 |