Understanding protein multi- and trans-localisation at the full proteome level

Lead Research Organisation: University of Cambridge
Department Name: Biochemistry

Abstract

In biology, localisation is function. Cells display a complex sub-cellular structure, in which each niche is characterised by specific biochemical conditions and fulfils dedicated functions. A protein must be localised to its intended sub-cellular niche to meet its interaction partners and be functionally active. Hence, the ability to systematically measure the locations of proteins, ideally across the full proteome (a field termed spatial proteomics), is of major interest in cell biology.

An accurate view of the sub-cellular landscape must also account for the fact that proteins can reside in more than one sub-cellular location and can traffic between such niches. The former phenomenon is termed multi-localisation. The latter, whether initiated by normal biological triggers, pathological cellular states, or external stimuli such as changes in cell nutrients or the effect of a drug, is called trans-localisation. Finally, the mis-localisation of proteins has been associated with cellular dysfunction and with diseases such as cancer.

The most information-rich datasets for proteome-wide spatial proteomics are generated using high-accuracy mass spectrometry, a technique that allows the identification and quantification of the protein content of complex biological samples. These high-quality, information-rich datasets have been mined using a variety of robust supervised machine learning methods, which have been shown to yield valuable protein-organelle predictions. In particular, the applicants recently published hyperLOPIT, a technological advance that achieves exquisite spatial resolution. Using this groundbreaking technology on mouse embryonic stem cells, they identified the localisation of 7000 proteins with unprecedented spatial resolution, uncovering the organisation of organelles, sub-organellar compartments, protein complexes, functional networks, and the steady-state dynamics of proteins, including unexpected sub-cellular locations.

In this proposal, we aim to complement contemporary spatial proteomics data with state-of-the-art statistical routines to reliably identify multi- and trans-localisation events at the full proteome level. These new tools, which will extend our existing open-source spatial proteomics software suite, will enable the proteomics and cell biology community to mine spatial proteomics data to new depths, identifying subtle yet biologically important patterns, such as proteins with mixed localisation and proteins that change localisation upon perturbation, in a robust and statistically sound way. We will also develop dedicated visualisation platforms to highlight the outputs of our analysis pipelines and enable interactive exploration of the multidimensional spatial data. We will apply these tools ourselves to a wide range of spatial proteomics datasets from a variety of biological systems of interest. To ensure broad exposure of our work, the datasets we analyse and the spatial patterns we infer will also be disseminated through community databases, in particular the SpatialMap.org online resource.

Technical Summary

Localisation of proteins inside cells is of paramount importance to study their function, refine our understanding of sub-cellular processes and organisation, and understand the effect of perturbations at the sub-cellular level. Various dedicated experimental designs based on biochemical separation and quantitative mass spectrometry have been described and refined over the years, in particular the recently published hyperLOPIT technique. Using this groundbreaking technology, we identified the localisation of over 7000 proteins with unprecedented spatial resolution, uncovering the organisation of organelles, sub-organellar compartments, protein complexes, functional networks, and the steady-state dynamics of proteins, including unexpected sub-cellular locations. Building on such data, we will develop the next generation of statistical learning tools for spatial proteomics to reliably identify proteins residing simultaneously in multiple sub-cellular niches (multi-localisation) and proteins changing localisation upon perturbation (trans-localisation). To do this, we will rely on mixture modelling of the quantitative protein profiles: (1) deconvolution of these mixed profiles will enable us to reconstitute the individual localisations of the proteins, and (2) comparison of (possibly mixed) profiles across multiple conditions will enable us to identify changes of localisation. We will also support such analyses at the peptide level, to identify multi- and trans-localisation events of protein isoforms. Finally, we will complement our core statistical learning infrastructure with dedicated, interactive visualisation applications to enable direct and easy exploration of the complex spatial localisation patterns identified.
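To make points (1) and (2) more concrete, the following is a minimal sketch under simplifying assumptions. It is not the statistical method developed in this project. It assumes that a mixed protein profile is a linear, non-negative combination of known reference organelle profiles (for example, averages of marker proteins, which is itself an assumption). It then recovers the mixing proportions by non-negative least squares, solved here by projected gradient descent. A simple distance between the proportions estimated in two conditions serves as a crude trans-localisation score. The function names deconvolve_profile and relocalisation_score are hypothetical, and the profiles are made-up toy numbers.

```python
import numpy as np


def deconvolve_profile(mixed, references, n_iter=5000):
    """Hypothetical helper: estimate non-negative mixing proportions.

    Models the observed profile as a weighted sum of reference organelle
    profiles and solves the non-negative least-squares problem by
    projected gradient descent.

    mixed: observed profile, length n_fractions.
    references: array of shape (n_organelles, n_fractions).
    Returns proportions >= 0, rescaled to sum to 1 when any are non-zero.
    """
    A = np.asarray(references, dtype=float).T  # (n_fractions, n_organelles)
    b = np.asarray(mixed, dtype=float)
    # Step size 1/L, where L is the squared largest singular value of A
    # (the Lipschitz constant of the least-squares gradient).
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    w = np.full(A.shape[1], 1.0 / A.shape[1])  # start from equal proportions
    for _ in range(n_iter):
        grad = A.T @ (A @ w - b)
        w = np.maximum(w - step * grad, 0.0)  # project onto w >= 0
    total = w.sum()
    return w / total if total > 0 else w


def relocalisation_score(p_control, p_treated):
    """Hypothetical trans-localisation score: total variation distance
    between proportions estimated in two conditions
    (0 = no change, 1 = complete change of location)."""
    diff = np.asarray(p_control, dtype=float) - np.asarray(p_treated, dtype=float)
    return 0.5 * float(np.abs(diff).sum())


# Toy reference profiles over six fractions (made-up numbers, each sums to 1).
er = np.array([0.40, 0.30, 0.15, 0.10, 0.05, 0.00])
mito = np.array([0.00, 0.10, 0.40, 0.30, 0.15, 0.05])
cytosol = np.array([0.00, 0.00, 0.05, 0.15, 0.30, 0.50])
refs = np.vstack([er, mito, cytosol])

# A protein split between ER and cytosol that shifts towards the cytosol.
control = deconvolve_profile(0.7 * er + 0.3 * cytosol, refs)
treated = deconvolve_profile(0.2 * er + 0.8 * cytosol, refs)
print(np.round(control, 3), np.round(treated, 3))
print(round(relocalisation_score(control, treated), 3))
```

In this noise-free toy case the proportions are recovered as roughly 70/0/30 and 20/0/80, and the score reflects a 50% shift. Real spatial proteomics profiles are noisy and reference profiles are uncertain, which is why a probabilistic mixture model is preferable to this deterministic fit.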

Planned Impact

Research Aim

The primary and novel aim of this project is to develop computational methods and software tools to reliably identify protein dynamics at the sub-cellular level. This research will enable us to tackle more fine-grained and biologically relevant spatial patterns, in particular changes in cellular state upon perturbation of the cell.

Who will benefit from this research?

The main beneficiary of this project is the proteomics community. There is a long-overdue need for appropriate methodologies to analyse sub-cellular protein dynamics, as evidenced by letters of support not only from highly productive collaborators of the applicants but also from several of the top organelle proteomics laboratories in the world. Moreover, the cell biology community, both academic and in the pharmaceutical sector, will also benefit, as this proposal bridges modern 'omics technologies and more classical cell biological methodologies. Computational biologists will also benefit from the freely available, open-source, open-development statistical analysis methods, normalisation strategies and machine learning methods that will form part of this novel pipeline. Our work targets experimentalists who will use our tools to analyse their data, as well as computational scientists who want to re-use or adapt our methods and software infrastructure for new topics.

How will they benefit from this research?

The software suite that will be a direct output of the proposal will benefit the proteomics and cell biology communities by delivering a novel framework for analysing changes in protein sub-cellular dynamics, bringing spatial proteomics data analysis to a new level. The statistical methods will be made available for the statistical programming environment R and the Bioconductor project and will interoperate with existing complementary software. Our novel methods are expected to be applicable in other 'omics areas of research, owing to the inherently cross-disciplinary nature of the computer science, statistics and machine learning that underpin many areas of computational biology.

The project will contribute to knowledge and scientific advancement through the dissemination of data and improved analyses of complex multivariate data, facilitating the interpretation and understanding of relevant biological processes. (i) Fully characterised organelle proteomics datasets will be deposited in publicly accessible databases, (ii) analysis methodologies will be documented and distributed with software releases to facilitate application of our methods to new datasets and use cases, (iii) data will also be made available through the existing R Bioconductor data packages, and (iv) data will also be available through the online SpatialMap.org resource, where users can interactively view and share data. The research staff will benefit from the multi-disciplinary research environment and extend their national and international research networks through ongoing collaborations.

What will be done to ensure that they have the opportunity to benefit from this research?

The algorithms and tools developed in this proposal will be implemented in the R statistical programming environment and deposited in the Bioconductor suite of bioinformatics software. The algorithms will be implemented as independent modules, compatible with and contributed to the current pRoloc analysis framework, to form a freely available open-source toolbox for the analysis of spatial proteomics data. Both computational and biological outputs will be written up as manuscripts and submitted to high-impact journals with a large general readership. KSL, LG, WH and OK are regularly invited to give talks at top proteomics and computational conferences worldwide, and will endeavour to publicise the work described here at such events.
 
Description We have created a series of tools to map proteins that are multi-localised in cells, as well as proteins that change location upon perturbation.

These tools are provided in the form of open-source software packages, which have been submitted to the Bioconductor suite of R packages.
Exploitation Route A suite of R packages for use by the spatial proteomics community
Sectors Agriculture, Food and Drink; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology

 
Description The software developed as part of this award has helped attract funding from, and foster interactions with, pharmaceutical companies.
First Year Of Impact 2017
Sector Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology
 
Description H2020 Infrastructure
Amount € 10,000,000 (EUR)
Funding ID EPIC-XS - DLV-823839 
Organisation European Commission H2020 
Sector Public
Country Belgium
Start 01/2019 
End 12/2023
 
Title BANDLE 
Description Bayesian approach to determine re-localisation events in spatial proteomics experiments - open source software 
Type Of Material Improvements to research infrastructure 
Year Produced 2021 
Provided To Others? Yes  
Impact Groups worldwide are using this method and publishing data analysed with BANDLE.
 
Title LOPITDC 
Description A novel method to characterise the subcellular proteome using differential centrifugation 
Type Of Material Technology assay or reagent 
Year Produced 2019 
Provided To Others? Yes  
Impact Publication of a manuscript in Nature Communications; adoption of the method by industry.
 
Description Collaboration with AstraZeneca
Organisation AstraZeneca
Country United Kingdom 
Sector Private 
PI Contribution AstraZeneca are funding two PhD students in my laboratory to use spatial proteomics tools developed in my laboratory.
Collaborator Contribution Expertise in spatial proteomics and computational pipelines we have also developed
Impact This project is multidisciplinary, involving cell biology, protein biochemistry and statistics.
Start Year 2021
 
Description Collaboration with GSK 
Organisation GlaxoSmithKline (GSK)
Country Global 
Sector Private 
PI Contribution Funded a postdoctoral researcher position to examine the effect of various DNA-damaging therapeutic agents on the RNA-binding capacity of proteins.
Collaborator Contribution The partners will supply therapeutic agents and expertise
Impact The project has just started, so there are no outputs as yet.
Start Year 2020
 
Description Collaboration with Pfizer 
Organisation Pfizer Inc
Country United States 
Sector Private 
PI Contribution We have been awarded a project grant by Pfizer to make use of the OOPS method (doi: 10.1038/s41587-018-0001-2) developed as part of the Transcription trafficking translation award. We will use these tools to determine the interaction of RNA therapeutic tools with the host cell machinery. We will also make use of the LOPIT technology developed as part of several projects funded by the Wellcome Trust and BBSRC.
Collaborator Contribution A postdoctoral researcher based in my laboratory is applying our expertise in subcellular proteomics and characterisation of the RNA-binding proteome.
Impact The project is multi-disciplinary involving cell biology and computational biology
Start Year 2022
 
Description Mapping the sub-cellular proteome 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Talk at the Leibniz Institute for Ageing in November 2017. The talk was followed by a workshop on the R programming language, which was used to develop the spatial proteomics software tools presented in the talk.
Year(s) Of Engagement Activity 2017
URL http://doi.org/10.5281/zenodo.1042726
 
Description Open source and open development proteomics software, EuBIC 2018 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A presentation and participation in the EuBIC 2018 developer's meeting in Ghent, Belgium, in January 2018.
Year(s) Of Engagement Activity 2018
URL http://bit.ly/20180109eubic
 
Description Software sustainability, sharing and management workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact The goal of the workshops was to identify current problems in software management and sharing and to explore possible solutions. The researcher-led nature of this event gave researchers, software engineers and support staff a valuable opportunity to discuss the issues around creating and maintaining software collaboratively and to exchange good practice with peers. I was invited as a group moderator to promote and facilitate group discussions.
Year(s) Of Engagement Activity 2016
URL https://unlockingresearch.blog.lib.cam.ac.uk/?p=1286