In silico mass spectrometry for biologists: Tools and resources for next-generation proteomics

Lead Research Organisation: University of Manchester
Department Name: School of Medical Sciences

Technical Summary

To date, mass spectrometry (MS)-based proteomics has been largely driven by Data-Dependent Acquisition (DDA) approaches, where complex mixtures of peptide analytes are separated by liquid chromatography and eluted into the instrument. This approach is limited by instrument throughput and the stochastic sampling of the analyte, leading to under-sampling and poor detection of low-abundance proteins. To address these limitations, Data-Independent Acquisition (DIA) approaches are gaining popularity, led by SWATH-MS and MSe/HD-MSe. These methods sample the analyte more uniformly and capture richer, deeper data, but generate data sets that are more challenging to interrogate and require sophisticated software solutions. Indeed, the lack of standard tools and the extra expertise required are hindering the wider adoption of DIA proteomics approaches. Here, we will develop open analysis pipelines for different DIA techniques using non-commercial software (e.g. OpenSWATH, DIA Umpire, Skyline, etc.), and deploy them in the EBI "Embassy Cloud" infrastructure. We will enable easy access to robust and portable pipelines that can also be deployed in other cloud environments, for wider community benefit. In addition, we will extend the functionality of the world-leading proteomics resource (PRIDE Archive at EMBL-EBI) and related tooling, and extend the mzTab data standard to create a common output format for the analyses. A further compelling aspect is the link to PRIDE Archive, which will support the construction of robust spectral libraries (from different instruments and species) that we and our users can use to conduct novel DIA analyses. This will make good use of the growing DDA and DIA public datasets in PRIDE Archive to extract new knowledge. Novel results will be communicated to the original submitter and the rest of PRIDE Archive users, and fed into three EMBL-EBI resources: Ensembl, UniProt and the Expression Atlas.

Planned Impact

There is the potential for the following impacts:

- Mass spectrometry vendors (at least SCIEX and Waters) will benefit through the free availability of robust, reliable, reproducible and improved pipelines for the analysis of DIA proteomics datasets. Once these pipelines are robust, vendors will face less urgency to keep developing their own commercial software solutions, freeing resources that could be focused on other efforts.

- Software vendors and pharmaceutical research and development teams may benefit, since we envisage they may wish to take up our software for local pipelines (e.g. deployed in their own cloud environments). It is important to highlight that all the software developed under this proposal will be open source or at least free to use (if the original software used to build the analysis pipelines is not open source). Commercial software will not be part of the developed pipelines.

- Research councils and charities funding research will benefit through the potential for increased impact of the mass spectrometry (MS)-based proteomics projects they fund, thanks to the re-analysis of public DIA proteomics datasets and the integration of novel proteomics data into Ensembl, UniProt and the Expression Atlas.

- UoM has a demonstrable record of leveraging research partnerships and industry funding through knowledge exchange and innovation funding. We have been successful with the MRC CiC, P2D, Wellcome Trust ISSF, HEIF, and EPSRC IAA funding streams, all of which are aimed at promoting and driving impact. Manchester projects with an MS foundation have always been successful in the life and biomedical sciences, generating high-impact papers and multiple millions of GBP in industry and key stakeholder support.

- There is potential for our infrastructure to assist in clinical biomarker discovery, since DIA-based methods (such as SWATH-MS and MSe/HD-MSe) are growing rapidly in use in this space, as exemplified by the Stoller Biomarker Discovery Centre in Manchester (where some of the applicants are involved).

- More broadly, as proteomics is a key technology in the Life Sciences, there is the potential for considerable indirect benefits across a wide range of areas in basic biology and biomedical or clinical science, as more value will be derived from datasets. This includes data on post-translational modifications (PTMs), which are key regulators of cell signalling and are thus often studied in a clinical context.

Staff employed will benefit through:

- Further training in one key enabling technology for the BBSRC (proteomics) and exposure to conferences, workshops and new national and international collaborations.

- Acquiring the skills needed to work with bioinformatics software in a cloud environment, something that is becoming increasingly important with the growing size of datasets and the need for suitable IT infrastructure.
 
Description We have developed a software pipeline and integrated it with cloud-based computing at the EBI in Hinxton so that proteomics mass spectrometry data can be processed by non-specialists. This relates to a specialist mass spectrometry technique called SWATH-MS, which we hoped would be made easily accessible to a wider audience of bioscientists. At present it does not have a "front-end", but we are satisfied that it constitutes proof-of-principle progress. We believe, however, that it will be difficult to sustain as a long-term resource and have switched our focus to quality control (QC) instead, so that users can use our software to check their DIA data against a set of metrics and criteria. This will be invaluable to clinical labs that need to QC their data and decide whether a given data set is worth considering further or is not fit for purpose. Three manuscripts associated with the QC software are in preparation, and the software has been released via GitHub (https://github.com/PaulBrack/Yamato). In addition, we have developed an associated database of SWATH-MS data, which in turn has led to a new tool, currently being written up, that can predict the best peptides to use for quantitative approaches such as this.
Exploitation Route These outputs will facilitate the interrogation of proteomics data sets stored in the PRIDE repository at EBI by groups other than the original depositors.
Sectors Agriculture, Food and Drink; Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology

URL https://github.com/PaulBrack/Yamato
 
Description Collaboration with David Tabb in South Africa
Organisation University of Stellenbosch
Department Faculty of Medicine and Health Sciences
Country South Africa 
Sector Academic/University 
PI Contribution Links were established between a PhD student in South Africa and the postdoc on the BBSRC grant, both of whom have contributed to the development of the QC software.
Collaborator Contribution Our PDRA met David Tabb and members of his group at HUPO PSI meetings, and formed a collaboration of mutual interest to develop quality control software for the type of mass spectrometry data we are using on this project. This has resulted in an improved tool, which is faster and richer.
Impact Papers are still being drafted.
Start Year 2019
 
Title Mass spectrometry software pipeline 
Description The prototype pipeline is now publicly available. A key aim of the project was to move it from Manchester to EBI, where it will be made publicly available and where the necessary compute support is deployed. We have already conducted testing, and it replicates results generated on the version in Manchester, and we deployed and evaluate
Type Of Technology Software 
Year Produced 2022 
Impact One publication, and the pipeline has been made available via GitHub
URL https://github.com/PRIDE-reanalysis/DIA-reanalysis
 
Description Proteomics training workshop at EBI 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact We took part in a proteomics training workshop at the EBI, training participants in the use of SWATH mass spectrometry informatics to query a SWATH-MS dataset.
Year(s) Of Engagement Activity 2019
URL https://www.ebi.ac.uk/training/events/2019/proteomics-bioinformatics-3