Understanding protein multi- and trans-localisation at the full proteome level

Lead Research Organisation: University of Cambridge

Department Name: Biochemistry

Abstract

In biology, localisation is function. Cells display a complex sub-cellular structure in which each niche is characterised by specific biochemical conditions and fulfils dedicated functions. A protein must be localised to its intended sub-cellular niche to meet its interaction partners and be functionally active. Hence, being able to systematically measure the locations of proteins, and in particular of the full proteome, a field termed spatial proteomics, is of major interest in cell biology.

An accurate view of the spatial sub-cellular landscape must also account for the fact that proteins are known to display more than one sub-cellular location and to traffic between such niches. The former phenomenon is termed multi-localisation; the latter, whether initiated by normal biological triggers, pathological cellular states, or external stimuli such as changes in cell nutrients or the effect of a drug, is called trans-localisation. Finally, the mis-localisation of proteins has been associated with cellular dysfunction and diseases such as cancer.

The most information-rich datasets for proteome-wide spatial proteomics are generated using high-accuracy mass-spectrometry, a technique that identifies and quantifies the protein content of complex biological samples. These high-quality datasets have been mined using a variety of robust supervised statistical machine learning methods, which have been shown to yield valuable protein-organelle predictions. In particular, the applicants recently published hyperLOPIT, a technological advance that delivers exquisite spatial resolution. Using this groundbreaking technology on mouse embryonic stem cells, they identified the localisation of 7000 proteins with unprecedented spatial resolution, uncovering the organisation of organelles, sub-organellar compartments, protein complexes, functional networks, and the steady-state dynamics of proteins including unexpected sub-cellular locations.

In this proposal, we aim to complement contemporary spatial proteomics data with state-of-the-art statistical routines to reliably identify multi- and trans-localisation events at the full proteome level. These new tools, which will extend our existing open-source suite of spatial proteomics software, will enable the proteomics and cell biology community to mine spatial proteomics data to new depths, identifying subtle yet biologically important patterns, such as proteins with mixed localisation and proteins that change localisation upon perturbation, in a robust and statistically sound way. We will also develop dedicated visualisation platforms to highlight the outputs of our analysis pipelines and enable interactive exploration of the multidimensional spatial data. We will apply these tools ourselves to a wide range of spatial proteomics datasets from various biological systems of interest. To guarantee broad exposure of our work, the datasets we analyse and the spatial patterns we infer will be disseminated through community databases, in particular the SpatialMap.org online resource.

Technical Summary

Knowing where proteins localise inside cells is of paramount importance for studying their function, refining our comprehension of sub-cellular processes and organisation, and understanding the effect of perturbations at the sub-cellular level. Various dedicated experimental designs based on biochemical separation and quantitative mass-spectrometry have been described and refined over the years, in particular the recently published hyperLOPIT technique. Using this groundbreaking technology, we identified the localisation of over 7000 proteins with unprecedented spatial resolution, uncovering the organisation of organelles, sub-organellar compartments, protein complexes, functional networks, and the steady-state dynamics of proteins including unexpected sub-cellular locations. Using such data, we will develop the next generation of statistical learning tools for spatial proteomics to reliably identify proteins residing simultaneously in multiple sub-cellular niches (multi-localisation) and proteins changing localisation upon perturbation (trans-localisation). To do this, we will rely on mixture modelling of the quantitative protein profiles. (1) Deconvolution of these mixed profiles will enable us to reconstitute the individual localisations of the proteins and (2) comparison of (possibly mixed) profiles among multiple conditions will enable us to identify changes of localisation. We will support such analysis at the peptide level to identify multi- and trans-localisation events for isoforms. Finally, we will support our core statistical learning infrastructure with dedicated, interactive visualisation applications to enable direct and easy exploration of the complex spatial localisation patterns identified.
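As a rough illustration of the two steps named above, the following Python sketch treats a protein's quantitative profile as a weighted mix of two organelle reference profiles and recovers the mixing weight by least squares, then compares that weight between two conditions. This is a deliberately simplified stand-in and not the project's method, which relies on statistical mixture modelling in R. The function names, the organelle labels, and all numeric profiles are hypothetical and chosen only for illustration.

```python
def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(u, v))


def mixing_weight(profile, ref_a, ref_b):
    """Least-squares weight w, clipped to [0, 1], such that
    profile is approximately w * ref_a + (1 - w) * ref_b."""
    diff_ab = [a - b for a, b in zip(ref_a, ref_b)]
    denom = dot(diff_ab, diff_ab)
    if denom == 0:
        raise ValueError("reference profiles are identical")
    diff_pb = [p - b for p, b in zip(profile, ref_b)]
    w = dot(diff_pb, diff_ab) / denom
    return min(1.0, max(0.0, w))


def weight_shift(profile_control, profile_treated, ref_a, ref_b):
    """Change in the ref_a weight between two conditions
    (a crude proxy for a change of localisation)."""
    return (mixing_weight(profile_treated, ref_a, ref_b)
            - mixing_weight(profile_control, ref_a, ref_b))


# Hypothetical normalised abundance profiles across four fractions.
mito = [0.7, 0.2, 0.1, 0.0]
cyto = [0.0, 0.1, 0.2, 0.7]

# A synthetic protein that is 60% mitochondrial and 40% cytosolic.
mixed = [0.6 * m + 0.4 * c for m, c in zip(mito, cyto)]

w = mixing_weight(mixed, mito, cyto)             # step (1): deconvolution
shift = weight_shift(mixed, cyto, mito, cyto)    # step (2): comparison
print(round(w, 3), round(shift, 3))
```

A real analysis would involve many organelles, measurement noise, and uncertainty estimates, which is where a probabilistic mixture model becomes necessary; the closed-form two-component case here only shows the underlying idea.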

Planned Impact

Research Aim

The prime and novel aim of this project is to develop computational methods and software tools to reliably identify protein dynamics at the sub-cellular level. This research proposal will enable us to tackle finer-grained and biologically relevant spatial patterns, in particular changes in cellular state upon perturbation of the cell.

Who will benefit from this research?

The main beneficiary of this project is the proteomics community. There is a long-overdue requirement to develop appropriate methodologies to analyse sub-cellular protein dynamics, as exemplified by letters of support not only from very productive collaborators of the applicants, but also from several of the top organelle proteomics laboratories in the world. Moreover, the cell biology community, both academic and within the pharmaceutical sector, will also benefit, as this proposal underpins the interface of modern 'omics technologies and more classical cell biological methodologies. Computational biologists will also benefit from the freely available, open-source, open-development statistical analysis methods, normalisation strategies and machine learning methods that will form part of this novel pipeline. Our work is targeted at experimentalists who will use our tools to analyse their data, as well as computational scientists who want to re-use or adapt our methods and software infrastructure to new topics.

How will they benefit from this research?

The software suite that will be a direct output of the proposal will have multiple benefits for the proteomics and cell biology communities by delivering a novel framework that gives users the means to analyse changes in protein sub-cellular dynamics, bringing spatial proteomics data analysis to a whole new level. The statistical methods will be made available for the statistical programming environment R and the Bioconductor project and will interoperate with existing complementary software. We expect our novel methods to be applicable in other 'omics areas of research, owing to the inherently cross-disciplinary nature of the computer science, statistics and machine learning that underpin many areas of computational biology.

The project will contribute knowledge and scientific advancement in the form of the dissemination of data and improvement of the analyses of complex multivariate data to facilitate interpretation and understanding of relevant biological processes. (i) Fully characterised organelle proteomics datasets will be deposited in publicly accessible databases, (ii) analysis methodologies will be documented and distributed with software releases to facilitate application of our methods to new datasets and use cases, (iii) data will also be made available through the existing R Bioconductor data packages, and (iv) data will also be available through the online SpatialMap.org resource, where users can interactively view and share data. The research staff will benefit from the multi-disciplinary research environment and extend their national and international research network through on-going collaborations.

What will be done to ensure that they have the opportunity to benefit from this research?

The algorithms and tools developed in this proposal will be implemented in the R statistical programming environment and will be deposited in the Bioconductor suite of bioinformatics software. The algorithms will be implemented as independent modules that will be contributed to, and compatible with, the current pRoloc analysis framework, to form a freely available open-source toolbox for the analysis of spatial proteomics data. It is envisaged that both computational and biological outputs will be written up as manuscripts and submitted to high-impact journals with a large general readership. KSL, LG, WH and OK are invited to give numerous talks at top proteomics and computational conferences worldwide, and they will endeavour to publicise the work described here at such events.

Funded Value:

£162,444

Funded Period:

Sep 16 - Mar 19

Funder:

BBSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

BB/N023129/1

Principal Investigator:

Laurent Gatto

Kathryn Lilley

Research Subject:

Omic sciences & technologies (70%)

Tools, technologies & methods (28%)

Research Topic:

Bioinformatics (28%)

Functional genomics (28%)

Proteomics (42%)

Organisations

People	ORCID iD
Laurent Gatto (Principal Investigator)
Kathryn Lilley (Principal Investigator)

Publications

Baers L (2019) Proteome Mapping of a Cyanobacterium Reveals Distinct Compartment Organization and Cell-Dispersed Metabolism.

Baers LL (2019) Proteome Mapping of a Cyanobacterium Reveals Distinct Compartment Organization and Cell-Dispersed Metabolism. in Plant physiology

Barylyuk K (2020) A Comprehensive Subcellular Atlas of the Toxoplasma Proteome via hyperLOPIT Provides Spatial Context for Protein Functions. in Cell host & microbe

Braccia C (2022) CFTR Rescue by Lumacaftor (VX-809) Induces an Extensive Reorganization of Mitochondria in the Cystic Fibrosis Bronchial Epithelium. in Cells

Breckels L (2024) Advances in spatial proteomics: Mapping proteome architecture from protein complexes to subcellular localizations in Cell Chemical Biology

Breckels LM (2016) A Bioconductor workflow for processing and analysing spatial proteomics data. in F1000Research

Christopher JA (2022) Subcellular Transcriptomics and Proteomics: A Comparative Methods Review. in Molecular & cellular proteomics : MCP

Cristea IM (2019) Editorial overview: Untangling proteome organization in space and time. in Current opinion in chemical biology

Crook O (2021) Inferring differential subcellular localisation in comparative spatial proteomics using BANDLE

Crook O (2018) A Bayesian Mixture Modelling Approach For Spatial Proteomics

Key Findings
Impact Summary
Further Funding
Research Tools and Methods
Collaboration
Engagement Activities


Description	We have created a series of tools to map proteins that are multi-localised in cells and proteins that change location upon perturbation. These tools are provided in the form of open-source software packages, which have been submitted to the Bioconductor suite of R packages.
Exploitation Route	A suite of R packages for use by the spatial proteomics community
Sectors	Agriculture, Food and Drink; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology


Description	The software developed as part of this award has been used in attracting funding from and interacting with pharmaceutical companies
First Year Of Impact	2017
Sector	Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology


Description	H2020 Infrastructure
Amount	€ 10,000,000 (EUR)
Funding ID	EPIC-XS - DLV-823839
Organisation	European Commission H2020
Sector	Public
Country	Belgium
Start	01/2019
End	12/2023


Title	BANDLE
Description	Bayesian approach to determine re-localisation events in spatial proteomics experiments - open source software
Type Of Material	Improvements to research infrastructure
Year Produced	2021
Provided To Others?	Yes
Impact	Groups worldwide are using this method and publishing data created using BANDLE


Title	LOPITDC
Description	A novel method to characterise the subcellular proteome using differential centrifugation
Type Of Material	Technology assay or reagent
Year Produced	2019
Provided To Others?	Yes
Impact	Publication of manuscript in Nature Communications; adoption of method by industry


Description	Collaboration with AstraZeneca
Organisation	AstraZeneca
Country	United Kingdom
Sector	Private
PI Contribution	AstraZeneca is funding two PhD students in my lab to use spatial proteomics tools developed in my laboratory
Collaborator Contribution	Expertise in spatial proteomics and computational pipelines we have also developed
Impact	This project is multi-disciplinary, involving cell biology, protein biochemistry and statistics
Start Year	2021


Description	Collaboration with GSK
Organisation	GlaxoSmithKline (GSK)
Country	Global
Sector	Private
PI Contribution	Funded postdoctoral researcher position to look at the effect of various DNA damage therapeutic agents on the RNA-binding capacity of proteins
Collaborator Contribution	The partners will supply therapeutic agents and expertise
Impact	The project has just started and hence there are no outputs as yet
Start Year	2020


Description	Collaboration with Pfizer
Organisation	Pfizer Inc
Country	United States
Sector	Private
PI Contribution	We have been awarded a project grant by Pfizer to make use of the OOPS (doi: 10.1038/s41587-018-0001-2) method developed as part of Transcriptin trafficking translation award. We will use these tools to determine the interaction of RNA therapeutic tools with host cell machinery. We will also make use of the LOPIT technology developed as part of several projects funded by the Wellcome Trust and BBSRC
Collaborator Contribution	A postdoctoral researcher based in my laboratory is applying our expertise in subcellular proteomics and characterisation of the RNA-binding proteome
Impact	The project is multi-disciplinary involving cell biology and computational biology
Start Year	2022


Description	Mapping the sub-cellular proteome
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Talk at the Leibniz Institute for ageing in November 2017. The talk was followed by a workshop on the R programming language, which was used to develop the spatial proteomics software tools presented in the talk.
Year(s) Of Engagement Activity	2017
URL	http://doi.org/10.5281/zenodo.1042726


Description	Open source and open development proteomics software, EuBIC 2018
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	A presentation and participation in the EuBIC 2018 developer's meeting in Ghent, Belgium, in January 2018.
Year(s) Of Engagement Activity	2018
URL	http://bit.ly/20180109eubic


Description	Software sustainability, sharing and management workshop
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Professional Practitioners
Results and Impact	The goal of the workshop was to flesh out the current problems in software management and sharing and to try to identify possible solutions. The researcher-led nature of this event gave researchers, software engineers and support staff a great opportunity to discuss the issues around creating and maintaining software collaboratively and to exchange good practice among peers. I was invited as a group moderator to facilitate group discussions.
Year(s) Of Engagement Activity	2016
URL	https://unlockingresearch.blog.lib.cam.ac.uk/?p=1286