Unravelling SWI/SNF ARID1A/B paralogs function at sequence resolution

Lead Research Organisation: Institute of Cancer Research
Department Name: Division of Cancer Biology

Abstract

Paralog proteins emerge through gene duplication events. They are very similar in sequence and structure and have related functions, but small and subtle differences in their sequences can lead to specific modulation of their roles. More than 60% of human proteins have paralogs, and paralogs are prevalent in chromatin protein complexes. Systematic approaches for sequence-to-function mapping are required to disentangle the specific biological roles and behaviours of paralog pairs, and the lack of scalable methods for this has limited our understanding of the molecular basis of diverging paralog functions. The overarching aim of this project is to understand the differential sequence-function relationships of the paralog protein pair ARID1A/1B. These proteins are subunits of DNA-binding multiprotein assemblies called SWI/SNF complexes, which are important for many essential functions, including cell proliferation, cell cycle control, response to DNA damage and organism development. Most proteins work by associating with other proteins, so characterising how proteins interact is important for fully understanding how they perform their roles. In this project we will identify the proteins that ARID1A and 1B interact with, map the domains or motifs that mediate the interactions, investigate the effect of mutations in their sequence on cell growth and, finally, integrate all the data to construct models that can help explain the mechanisms of ARID1A and 1B function.
To identify ARID1A- and 1B-associated proteins, we will purify ARID1A/B using antibodies that specifically recognise them, under conditions that maintain their native interactions, and use a technique called mass spectrometry to identify the proteins that co-purify with them. We will also use mass spectrometry to identify post-translational modifications, small chemical "flags" that can regulate different aspects of protein function, such as protein activity, interactions or localisation, amongst others. To identify binding domains we will use peptides, or short protein fragments, covering the entire length of ARID1A and 1B, arrayed on a paper membrane. A cell extract is added, and proteins that can interact with the peptides remain bound to the membrane. Each peptide spot will then be analysed by mass spectrometry to identify the bound proteins. This strategy is optimal for detecting binding that depends on short motifs. To map binding domains that depend on the 3D structure of ARID1A/B, we will fix the interactions inside the cell using a "molecular glue" that links proteins that are very close to each other. We will then use mass spectrometry to identify the regions of the proteins that were linked together. To identify amino acids in ARID1A/B that are important for cell growth when the other paralog is absent, we will mutate each amino acid in turn to alanine and monitor cell proliferation. Finally, we will consolidate all the data to produce a graph that represents an integrated view of the knowledge acquired. This will be useful for generating hypotheses on how ARID1A/B differentially perform their specific roles.
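
As an illustration only, the two sequence-level designs described above (overlapping peptides covering a protein, and one-residue-at-a-time alanine substitution) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the project's actual pipeline: the function names `tile_peptides` and `alanine_scan`, the window length and step, and the toy sequence are all hypothetical, and the toy sequence is not a fragment of ARID1A or ARID1B.

```python
def tile_peptides(sequence, length=8, step=4):
    """Return (start, peptide) pairs for overlapping windows covering the sequence.

    `length` and `step` are illustrative defaults, not values from the project.
    """
    if len(sequence) <= length:
        return [(0, sequence)]
    last_start = len(sequence) - length
    starts = list(range(0, last_start + 1, step))
    # Add a final window if the regular steps would leave the C-terminus uncovered.
    if starts[-1] != last_start:
        starts.append(last_start)
    return [(s, sequence[s:s + length]) for s in starts]


def alanine_scan(sequence):
    """Return (label, mutant) pairs, mutating each non-alanine residue to alanine.

    Labels use 1-based positions, e.g. "K2A" for lysine 2 changed to alanine.
    """
    variants = []
    for i, residue in enumerate(sequence):
        if residue == "A":
            continue  # already alanine, nothing to substitute
        label = f"{residue}{i + 1}A"
        variants.append((label, sequence[:i] + "A" + sequence[i + 1:]))
    return variants


# Hypothetical toy sequence for demonstration; NOT real ARID1A/B sequence.
TOY_SEQUENCE = "MKTAYIAKQRQISFVK"

peptides = tile_peptides(TOY_SEQUENCE, length=8, step=4)
variants = alanine_scan(TOY_SEQUENCE)
```

Overlapping windows are used here so that a short motif falling on a window boundary still appears intact in at least one peptide; the appropriate overlap in practice would depend on the expected motif length.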

Technical Summary

Paralog proteins emerge from gene duplication events. They exhibit very high sequence and structure similarity and have related functions, which are reflected in their synthetic lethal genetic associations. However, small, subtle sequence determinants result in distinct regulatory and functional attributes. Over 60% of human proteins have paralogs, and paralogs are prevalent in chromatin complexes, but despite the significant implications of this redundancy, the molecular basis of diverging paralog function is underexplored, limiting our mechanistic understanding of these gene families. ARID1A and 1B are defining paralog subunits of the SWI/SNF chromatin remodelling complex, which is important for essential biological processes including cell proliferation, cell cycle control, DNA damage response and organism development, yet knowledge of how sequence diversity impacts their function is limited. The aim of this project is to understand the differential sequence determinants of ARID1 function. To do this, we will use affinity purification coupled to mass spectrometry to identify ARID1A/B-interacting proteins and post-translational modifications in the presence or absence of the other paralog. We will then map residues and motifs that mediate interactions at high sequence resolution using two complementary approaches, PRISMA (Protein Interaction Screen on peptide Matrices) and crosslinking mass spectrometry. These studies will be complemented with alanine mutational scanning of ARID1A/1B to identify residues that are critical when cells are devoid of the alternative paralog. Finally, we will construct a protein knowledge graph that represents a model for understanding the mechanisms of divergent ARID1A and 1B function. Our work will yield a high-molecular-resolution functional footprint of ARID1A/B interactions with contact site information, provide a basis to explore gene regulatory and PTM-modulated ARID1A/B functions, and serve as a paradigm for the study of paralog proteins.
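
To make the idea of a protein knowledge graph concrete, one possible representation is a set of typed, evidence-tagged edges that can be queried for paralog-specific relationships. The sketch below is an assumption about how such a graph might be organised, not a description of the project's implementation: the class `KnowledgeGraph`, the relation name `interacts_with`, the evidence labels, and the partner names `PARTNER_X`, `PARTNER_Y` and `PARTNER_Z` are hypothetical placeholders and do not represent real findings.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Minimal typed-edge graph: (subject, relation) -> {(object, evidence)}."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, subject, relation, obj, evidence):
        """Record one relationship together with the experiment supporting it."""
        self._edges[(subject, relation)].add((obj, evidence))

    def objects(self, subject, relation):
        """Return all objects linked to `subject` by `relation`."""
        return {obj for obj, _ in self._edges.get((subject, relation), set())}

    def specific_to(self, first, second, relation):
        """Return objects linked to `first` but not to `second`."""
        return self.objects(first, relation) - self.objects(second, relation)

    def shared(self, first, second, relation):
        """Return objects linked to both `first` and `second`."""
        return self.objects(first, relation) & self.objects(second, relation)


# Placeholder data for illustration only; these are NOT experimental results.
kg = KnowledgeGraph()
kg.add("ARID1A", "interacts_with", "PARTNER_X", evidence="AP-MS")
kg.add("ARID1A", "interacts_with", "PARTNER_Y", evidence="PRISMA")
kg.add("ARID1B", "interacts_with", "PARTNER_Y", evidence="XL-MS")
kg.add("ARID1B", "interacts_with", "PARTNER_Z", evidence="AP-MS")

only_1a = kg.specific_to("ARID1A", "ARID1B", "interacts_with")
both = kg.shared("ARID1A", "ARID1B", "interacts_with")
```

Keeping the evidence label on each edge means that the same interaction observed by several methods remains traceable to each experiment, which is one way an integrated view could retain contact-site and method information.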
