Unravelling SWI/SNF ARID1A/B paralogs function at sequence resolution

Lead Research Organisation: Institute of Cancer Research
Department Name: Division of Cancer Biology

Abstract

Paralog proteins emerge through gene duplication events. They are very similar in sequence and structure and have related functions, but small, subtle differences in their sequences can lead to specific modulation of their roles. More than 60% of human proteins have paralogs, and paralogs are prevalent in chromatin protein complexes. Systematic approaches for sequence-to-function mapping are required to disentangle the specific biological roles and behaviours of paralog pairs, and the lack of scalable methods for this has limited our understanding of the molecular basis of diverging paralog functions. The overarching aim of this project is to understand the differential sequence-function relationships of the paralog protein pair ARID1A/1B. These proteins are subunits of DNA-binding multiprotein assemblies called SWI/SNF complexes, which are important for many essential functions, including cell proliferation, cell cycle control, response to DNA damage and organism development. Most proteins work by associating with other proteins, so characterising how proteins interact is important for fully understanding how they perform their roles. In this project we will identify the proteins that ARID1A and 1B interact with, map the domains or motifs that mediate the interactions, investigate the effect of mutations in their sequence on cell growth and, finally, integrate all the data to construct models that help explain the mechanisms of ARID1A and 1B function.
To identify ARID1A- and 1B-associated proteins, we will purify ARID1A/B with antibodies that specifically recognise them, under conditions that maintain the native interactions, and use a technique called mass spectrometry to identify the proteins that co-purify with them. We will also use mass spectrometry to identify post-translational modifications, small chemical "flags" that can regulate different aspects of protein function, such as protein activity, interactions or localisation. To identify binding domains we will use peptides, or short protein fragments, covering the entire length of ARID1A and 1B and arrayed on a paper membrane. A cell extract is added, and proteins that can interact with the peptides remain bound to the membrane. Each peptide spot will then be analysed by mass spectrometry to identify the bound proteins. This strategy is best suited to detecting binding that depends on short motifs. To map binding domains that depend on the 3D structure of ARID1A/B, we will fix the interactions inside the cell using a "molecular glue" that links proteins that are very close to each other. We will then use mass spectrometry to identify the regions of the proteins that were linked together. To identify amino acids in ARID1A/B that are important for cell growth when the other paralog is absent, we will mutate each amino acid in turn to alanine and monitor cell proliferation. Finally, we will consolidate all the data to produce a graph that represents an integrated view of the knowledge acquired. This will be useful for generating hypotheses on how ARID1A and ARID1B differentially perform their specific roles.
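
As an illustration of how peptides covering the entire length of a protein might be laid out for such an array, the following is a minimal sketch only. The function name `tile_peptides`, the peptide length and the step size are hypothetical choices made for this example; the project description does not state the values actually used, and the toy sequence is not ARID1A or ARID1B.

```python
def tile_peptides(sequence, length=15, step=5):
    """Split a protein sequence into overlapping peptides.

    Returns a list of (start_position, peptide) pairs, with 1-based
    start positions. `length` and `step` are illustrative defaults,
    not values taken from the project description.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    if len(sequence) <= length:
        # The whole sequence fits on a single spot.
        return [(1, sequence)]

    last_start = len(sequence) - length
    starts = list(range(0, last_start + 1, step))
    # Add a final peptide so the C-terminal residues are always covered.
    if starts[-1] != last_start:
        starts.append(last_start)
    return [(s + 1, sequence[s:s + length]) for s in starts]


# Toy 20-residue sequence (hypothetical, for demonstration only).
toy = "ACDEFGHIKLMNPQRSTVWY"
for start, peptide in tile_peptides(toy, length=8, step=5):
    print(start, peptide)
```

A smaller step gives more overlap between neighbouring spots, so a short motif is more likely to sit intact within at least one peptide, at the cost of a larger array.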

Technical Summary

Paralog proteins emerge from gene duplication events. They exhibit very high sequence and structural similarity and have related functions, as reflected in their synthetic lethal genetic associations. However, small, subtle sequence determinants result in distinct regulatory and functional attributes. Over 60% of human proteins have paralogs, and paralogs are prevalent in chromatin complexes, but despite the significant implications of this redundancy, the molecular basis of diverging paralog function is underexplored, which limits our mechanistic understanding of these gene families. ARID1A and 1B are defining paralog subunits of the SWI/SNF chromatin remodelling complex, which is important for essential biological processes including cell proliferation, cell cycle control, DNA damage response and organism development, yet knowledge of how sequence diversity impacts their function is limited. The aim of this project is to understand the differential sequence determinants of ARID1 function. To do this, we will use affinity purification coupled to mass spectrometry to identify ARID1A/B-interacting proteins and post-translational modifications (PTMs) in the presence and absence of the other paralog. We will then map residues and motifs that mediate interactions at high sequence resolution using two complementary approaches, PRISMA (Protein Interaction Screen on Peptide Matrices) and crosslinking mass spectrometry. These studies will be complemented with alanine mutational scanning of ARID1A/1B to identify residues that are critical when cells are devoid of the alternative paralog. Finally, we will construct a protein knowledge graph that represents a model for understanding the mechanisms of divergent ARID1A and 1B function. Our work will yield a high-molecular-resolution functional footprint of ARID1A/B interactions with contact site information, provide a basis to explore gene regulatory and PTM-modulated ARID1A/B functions, and serve as a paradigm for the study of paralog proteins.
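
To make the idea of alanine mutational scanning concrete, the sketch below enumerates single-residue alanine substitutions for a sequence. It is an illustration under stated assumptions, not the project's actual library design: the function name `alanine_scan`, the choice to skip positions that are already alanine, and the toy sequence are all hypothetical.

```python
def alanine_scan(sequence, skip_existing_alanine=True):
    """Enumerate single alanine substitutions across a sequence.

    Returns (label, mutated_sequence) pairs, where the label uses the
    conventional form <wild-type residue><1-based position>A, e.g. K2A.
    Skipping positions that are already alanine is an assumption made
    for this sketch.
    """
    variants = []
    for i, residue in enumerate(sequence):
        if residue == "A" and skip_existing_alanine:
            continue  # substitution would leave the sequence unchanged
        mutated = sequence[:i] + "A" + sequence[i + 1:]
        variants.append((f"{residue}{i + 1}A", mutated))
    return variants


# Toy 4-residue sequence (hypothetical, for demonstration only).
for label, mutated in alanine_scan("MKAL"):
    print(label, mutated)
```

Each variant differs from the wild type at exactly one position, so a growth defect observed for a variant in cells lacking the other paralog can be attributed to that residue.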

Publications

 
Title Multipep SPOT membrane synthesis 
Description Automated synthesis of peptides on cellulose membranes 
Type Of Material Improvements to research infrastructure 
Year Produced 2024 
Provided To Others? Yes  
Impact Ability to synthesise peptide arrays for PRISMA studies in-house 
 
Title PhoXplex 
Description A method that combines phospho-enrichable cross-linking with isobaric labeling for quantitative proteome-wide mapping of protein interfaces 
Type Of Material Technology assay or reagent 
Year Produced 2024 
Provided To Others? Yes  
Impact NA 
URL https://pubs.acs.org/doi/10.1021/acs.jproteome.4c00567
 
Description PRISMA with BB/SG 
Organisation Institute of Cancer Research UK
Country United Kingdom 
Sector Academic/University 
PI Contribution We contributed expertise on the PRISMA method and helped with the data analysis
Collaborator Contribution NA
Impact PhD thesis, defended December 2024
Start Year 2023
 
Description ICR CPD Brainstorming event Jan 2025 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Other audiences
Results and Impact The event included several talks on the ongoing work of the CPD. We presented our PRISMA research and discussed how it could be useful to the CPD. This sparked questions and discussion and led to further collaborations.
Year(s) Of Engagement Activity 2025
 
Description ICR Cancer Biology Division Open Day 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact The aim of the Open Day was to disseminate the research activities of the Cancer Biology Division. The technology described in this grant and some preliminary results were presented as an oral talk and a poster, which sparked questions and discussion, and initiated collaborations with other research groups.
Year(s) Of Engagement Activity 2024
 
Description ICR-MRC Doctoral Training Program Proteomics Module May 2024 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact This was a teaching module for ICR PhD students on Protein Interaction Proteomics, which consisted of interactive teaching on experimental proteomics methods for investigating protein interactions and hands-on learning of different bioinformatics tools for data analysis.
Year(s) Of Engagement Activity 2024