Understanding epigenetic mechanisms in tissue-specific gene expression

Lead Research Organisation: King's College London
Department Name: Immunology Infection and Inflam Diseases

Abstract

Between 20,000 and 25,000 protein-coding genes have been identified in the human and mouse genomes, giving rise to ~200,000 transcripts in tissue- and developmental stage-specific combinations. These transcripts can be generated via co-transcriptional pre-mRNA processing mechanisms that include alternative splicing (AS) and alternative polyadenylation (APA). Estimates based on transcriptome analyses indicate that ~90% of human transcripts undergo AS and that APA occurs in at least 70% of mammalian pre-mRNAs. AS involves the differential inclusion of exons, and sometimes introns, to form the mature mRNA. APA refers to the polyadenylation of transcripts that originate from the same gene but differ in their 3' ends. Both AS and APA depend on specific sequences recognised by the cellular machinery. APA events can occur either in 3' untranslated regions (UTRs) or at intragenic locations; here we consider intragenic APA. The incidence of intragenic polyadenylation (IPA) varies across tissues and cell types, providing a way to diversify both the transcriptome and the proteome. Genome-wide, we have identified over 4000 host genes that harbour an intragenic CpG island (iCGI) in the mammalian genome, including novel iCGI/host gene pairs. The transcriptional activity of these iCGIs is tissue- and developmental stage-specific and, for the first time, we demonstrated that the premature termination of host gene transcripts upstream of iCGIs is closely correlated with the level of iCGI transcription in a DNA methylation-independent manner. These studies suggest that iCGI transcription, rather than histone modification (e.g. H3K36me3) or DNA methylation, interferes with host gene transcription and pre-mRNA processing genome-wide and contributes to the spatiotemporal diversification and regulation of the transcriptome, thereby impacting the proteome. There is evidence in the literature that failure of correct AS and APA results in pathological conditions, including cancer.
Imprinted genes are particularly useful models for dissecting the epigenetic regulation of gene expression. There are ~130 genes in mouse and human that are subject to genomic imprinting. Monoallelic expression of these genes is coordinated by allele-specific DNA methylation of imprinting control regions, so that expression of an imprinted gene is determined by the parental origin of the inherited allele. The active and silent alleles of imprinted genes share the same DNA sequence and are present within the same cellular environment, so allelic differences in gene expression are the consequence of epigenetic differences between the alleles. The imprinted Mcts2/H13 locus is an ideal model in which to study iCGI activity and chromatin context on active and silent alleles, and this study will dissect the mechanism of iCGI-dependent intronic APA choice. The Mcts2/H13 locus will be tagged with fluorescent markers in neural stem cells to generate a reporter system. Once this system has been characterised, it will be subjected to a CRISPR knockout (KO) screen to identify regulators of iCGI-dependent intronic APA. Potential candidates will be validated and knocked out in an in vitro neurogenesis model to assess whether they are essential during differentiation.
Alongside this experimental approach, a computational strand of the project will take advantage of extensive single-cell RNA-seq data from specific brain regions and of a well-studied neurogenesis model. It will focus on transcript levels and, more importantly, provide isoform-specific resolution, to further understand AS and APA in the brain in terms of potential transcript diversity and the identification of specific loci.
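As an illustration of the kind of isoform-level summary such a computational strand could produce, the minimal sketch below computes, per cell type and gene, the fraction of reads assigned to IPA-terminated isoforms. It is not the project's pipeline: the function name `ipa_usage`, the record layout, the isoform labels and the toy counts are all assumptions made for this example.

```python
from collections import defaultdict


def ipa_usage(records):
    """Return the IPA usage fraction per (cell_type, gene).

    `records` is an iterable of (cell_type, gene, isoform_kind, count),
    where isoform_kind is "ipa" or "full_length". This layout is a
    hypothetical one chosen for illustration only.
    """
    totals = defaultdict(lambda: {"ipa": 0.0, "full_length": 0.0})
    for cell_type, gene, kind, count in records:
        if kind not in ("ipa", "full_length"):
            raise ValueError(f"unknown isoform kind: {kind!r}")
        totals[(cell_type, gene)][kind] += count

    usage = {}
    for key, counts in totals.items():
        total = counts["ipa"] + counts["full_length"]
        if total > 0:  # skip gene/cell-type pairs with no reads
            usage[key] = counts["ipa"] / total
    return usage


# Toy counts, invented purely to show the calculation (not real data).
example = [
    ("neural_stem_cell", "H13", "ipa", 30),
    ("neural_stem_cell", "H13", "full_length", 70),
    ("neuron", "H13", "ipa", 10),
    ("neuron", "H13", "full_length", 90),
]
result = ipa_usage(example)
```

Comparing these fractions across cell types or differentiation stages is one simple way to flag loci where intragenic polyadenylation changes during neurogenesis; a real analysis would also need to handle read assignment ambiguity and sparse single-cell counts.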
