Unravelling SWI/SNF ARID1A/B paralogs function at sequence resolution
Lead Research Organisation:
Institute of Cancer Research
Department Name: Division of Cancer Biology
Abstract
Paralog proteins emerged through gene duplication events. They are very similar in sequence and structure and have related functions, but small and subtle differences in their sequences can lead to specific modulation of their roles. More than 60% of human proteins have paralogs, and paralogs are prevalent in chromatin protein complexes. Systematic approaches for sequence-to-function mapping are required to disentangle the specific biological roles and behaviours of paralog pairs, and the lack of scalable methods for this has limited our understanding of the molecular basis of diverging paralog functions. The overarching aim of this project is to understand the differential sequence-function relationships of the paralog protein pair ARID1A/1B. These proteins are subunits of DNA-binding multiprotein assemblies called SWI/SNF complexes that are important for many essential functions, including cell proliferation, cell cycle control, response to DNA damage and organism development. Most proteins work by associating with other proteins, so characterising how proteins interact is important for fully understanding how they perform their roles. In this project we will identify the proteins that ARID1A and 1B interact with, map the domains or motifs that mediate the interactions, investigate the effect of mutations in their sequence on cell growth and, finally, integrate all the data to construct models that can help explain the mechanisms of ARID1A and 1B function.
To identify ARID1A- and 1B-associated proteins we will purify ARID1A/B with antibodies that specifically recognise them, under conditions that maintain their native interactions, and use a technique called mass spectrometry to identify the proteins that co-purify with them. We will also use mass spectrometry to identify post-translational modifications, small chemical "flags" that can regulate different aspects of protein function, such as protein activity, interactions or localisation, amongst others. To identify binding domains we will use peptides, or short protein fragments, covering the entire length of ARID1A and 1B, arrayed on a paper membrane. A cell extract is added, and proteins that can interact with the peptides remain bound to the membrane. Each peptide spot will then be analysed by mass spectrometry to identify the bound proteins. This strategy is well suited to detecting binding that depends on short motifs. To map binding domains that depend on the 3D structure of ARID1A/B we will fix the interactions inside the cell using a "molecular glue" that links proteins that are very close to each other. We will then use mass spectrometry to identify the regions of the proteins that were linked together. To identify amino acids in ARID1A/B that are important for cell growth when the other paralog is absent, we will mutate each amino acid sequentially to alanine and monitor cell proliferation. Finally, we will consolidate all the data to produce a graph that represents an integrated view of the knowledge acquired. This will be useful for generating hypotheses on how ARID1A/B differentially perform their specific roles.
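As an illustration of what "peptides covering the entire length" of a protein can mean in practice, the following minimal Python sketch tiles a sequence into overlapping fixed-length peptides. The function name `tile_peptides`, the peptide length, the step size and the toy sequence are all assumptions made for illustration; they are not the design parameters of this project.

```python
def tile_peptides(sequence, length=15, step=5):
    """Split a protein sequence into overlapping peptides.

    Returns a list of (start_position, peptide) tuples with 1-based
    start positions. `length` and `step` are illustrative defaults,
    not values taken from the project.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    if len(sequence) <= length:
        # Sequence is shorter than one peptide: return it whole.
        return [(1, sequence)]
    last_start = len(sequence) - length
    starts = list(range(0, last_start + 1, step))
    # Add a final window so the C-terminal residues are always covered.
    if starts[-1] != last_start:
        starts.append(last_start)
    return [(s + 1, sequence[s:s + length]) for s in starts]


# Toy sequence (hypothetical, not ARID1A or ARID1B).
toy = "MKTAYIAKQRQISFVKSHFSRQ"
for start, peptide in tile_peptides(toy, length=10, step=5):
    print(start, peptide)
```

Overlapping windows mean that a short binding motif falling on the boundary of one peptide is still fully contained in a neighbouring one, provided the overlap is at least as long as the motif.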
Technical Summary
Paralog proteins emerge from gene duplication events. They exhibit very high sequence and structure similarity and have related functions, as reflected by their synthetic lethal genetic associations. However, small, subtle sequence determinants result in distinct regulatory and functional attributes. Over 60% of human proteins have paralogs, and paralogs are prevalent in chromatin complexes. Despite the significant implications of this redundancy, the molecular basis of diverging paralog function is underexplored, which limits our mechanistic understanding of these gene families. ARID1A and 1B are defining paralog subunits of the SWI/SNF chromatin remodelling complex, which is important for essential biological processes including cell proliferation, cell cycle control, DNA damage response and organism development, yet knowledge of how sequence diversity impacts their function is limited. The aim of this project is to understand the differential sequence determinants of ARID1 function. To do this, we will use affinity purification coupled to mass spectrometry to identify ARID1A/B interacting proteins and post-translational modifications in the presence or absence of the other paralog. We will then map residues and motifs that mediate interactions at high sequence resolution using two complementary approaches, PRISMA (Protein Interaction Screen on Peptide Matrices) and crosslinking mass spectrometry. These studies will be complemented with alanine mutational scanning of ARID1A/1B to identify residues that are critical when cells are devoid of the alternative paralog. Finally, we will construct a protein knowledge graph that represents a model for understanding the mechanisms of divergent ARID1A and 1B function. Our work will yield a high-molecular-resolution functional footprint of ARID1A/B interactions with contact site information, provide a basis to explore gene regulatory and PTM-modulated ARID1A/B functions, and serve as a paradigm for the study of paralog proteins.
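To make the alanine mutational scanning step concrete, the sketch below enumerates single-residue alanine substitutions for a sequence, labelling each variant in the usual "original residue, position, A" form. It is a hypothetical illustration only: the function name `alanine_scan` and the choice to skip positions that are already alanine are assumptions, and the sketch does not describe how the variant library in this project is designed or built.

```python
def alanine_scan(sequence):
    """Enumerate single alanine substitutions of a protein sequence.

    Returns a list of (label, mutant_sequence) tuples, e.g. ("K2A", ...).
    Positions that are already alanine are skipped (an assumption made
    for this sketch).
    """
    variants = []
    for position, residue in enumerate(sequence, start=1):
        if residue == "A":
            continue  # nothing to substitute at this position
        # Replace exactly one residue, leaving the rest unchanged.
        mutant = sequence[:position - 1] + "A" + sequence[position:]
        variants.append((f"{residue}{position}A", mutant))
    return variants


# Toy three-residue example (hypothetical sequence).
for label, mutant in alanine_scan("MKA"):
    print(label, mutant)
# prints:
# M1A AKA
# K2A MAA
```

Each variant differs from the wild-type sequence at a single position, so a change in cell proliferation can be attributed to that residue.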
Title | Multipep SPOT membrane synthesis |
Description | Automated synthesis of peptides on cellulose membranes |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2024 |
Provided To Others? | Yes |
Impact | Ability to synthesise peptide arrays for PRISMA studies in-house |
Title | PhoXplex |
Description | A method that combines phospho-enrichable cross-linking with isobaric labeling for quantitative proteome-wide mapping of protein interfaces |
Type Of Material | Technology assay or reagent |
Year Produced | 2024 |
Provided To Others? | Yes |
Impact | NA |
URL | https://pubs.acs.org/doi/10.1021/acs.jproteome.4c00567 |
Description | PRISMA with BB/SG |
Organisation | Institute of Cancer Research UK |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | We contributed expertise on the PRISMA method and helped with the data analysis |
Collaborator Contribution | NA |
Impact | PhD thesis, defended December 2024 |
Start Year | 2023 |
Description | ICR CPD Brainstorming event Jan 2025 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Other audiences |
Results and Impact | The event included several talks on the ongoing work of the CPD. We presented our PRISMA research and discussed how it could be useful to the CPD. This sparked questions and discussion and led to further collaborations. |
Year(s) Of Engagement Activity | 2025 |
Description | ICR Cancer Biology Division Open Day |
Form Of Engagement Activity | Participation in an open day or visit at my research institution |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Postgraduate students |
Results and Impact | The aim of the Open Day was to disseminate the research activities of the Cancer Biology Division. The technology described in this grant and some preliminary results were presented as an oral talk and a poster, which sparked questions and discussion, and initiated collaborations with other research groups. |
Year(s) Of Engagement Activity | 2024 |
Description | ICR-MRC Doctoral Training Program Proteomics Module May 2024 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Postgraduate students |
Results and Impact | This was a teaching module for ICR PhD students on Protein Interaction Proteomics, which consisted of interactive teaching on experimental proteomics methods for investigating protein interactions and hands-on learning of different bioinformatics tools for data analysis. |
Year(s) Of Engagement Activity | 2024 |