Regulation of transcript stability by splicing in non-coding gene regions

Lead Research Organisation: University of Sheffield
Department Name: School of Biosciences

Abstract

We believe we have discovered a novel mechanism by which gene expression can be regulated. Regulation of gene expression makes one cell different from another and controls many of the ways in which cells respond to their environment. Differential regulation of gene expression, rather than differences in gene function, is also thought to underlie many of the differences between individuals.

For a cell to express a protein from a gene, the gene must first be transcribed into RNA, which is then translated into a protein. Not all the sequence within a gene makes it into the final RNA: the "protein-coding regions" are interspersed with regions known as "introns" that must be removed by a process called splicing to reconstitute the sequence that encodes the protein.

Regulation of gene expression is usually studied by looking at the rate of transcription, but the stability of the transcribed RNA is also important. Sequences at the end of the transcript that are not translated are called 3' untranslated regions (3' UTRs). These UTRs contain binding sites for a range of regulatory factors (e.g. the small regulatory RNAs known as microRNAs, or RNA binding proteins such as CELF1) that can alter transcript stability. Another important mechanism controlling RNA stability is called Nonsense Mediated Decay (NMD). This ensures that transcripts containing a gene-disabling mutation are destroyed. NMD also results in the destruction of transcripts that are incorrectly spliced. Because a splice junction after the end of the protein-coding sequence resembles such an error, splicing in the 3' UTR is thought to trigger NMD. Until now, RNA sequences that would have resulted from splicing in the 3' UTR have been regarded as artefacts and largely ignored. However, we have found substantial expression of thousands of such written-off RNAs, and in many cases the removed sequence contains predicted binding sites for microRNAs. This suggests two hypotheses:
1) This splicing does trigger NMD, causing a regulated destabilisation of the transcript.
2) This splicing does not trigger NMD and instead removal of UTR sequence regulates the sensitivity of the transcript to RNA regulatory factors such as miRNAs.
We have preliminary evidence for examples of both these hypotheses. If we are correct then this will represent a novel but fundamental mechanism of gene regulation.

Here we propose to reanalyse tens of thousands of existing samples from different tissues, individuals and lab cell lines to examine the prevalence of this mechanism and how it relates to differences in gene expression levels between individuals and cell types. We will measure the difference in transcript stability between RNAs with splicing in the 3' UTR and those without, both in selected candidates and genome-wide. We will do this in cells with a normal NMD pathway and in cells in which we have disabled the NMD pathway, to determine whether NMD is responsible for any differences. We will also ascertain whether this splicing regulates sensitivity to RNA binding factors by manipulating the sequences of UTRs of interest and the levels of the RNA binding factors. Finally, we will examine the role of this mechanism in active regulation in response to pro-proliferative stimuli, such as the hormone estradiol, by measuring differences in splicing in response to this stimulus.

Technical Summary

We believe that we have discovered a new mechanism for gene regulation: splicing in the 3' UTR. In current models of Nonsense Mediated Decay (NMD), premature stop codons are detected by the presence of exon junction complexes 3' of the stop codon. However, we have detected the expression of thousands of transcripts containing exon junctions within their 3' UTRs, some highly expressed and/or making up a large fraction of the expression from a locus. Many overlap predicted microRNA binding sites. Either these transcripts are escaping NMD, or they are expressed at such a high level that they can be detected despite it. It should be noted that the leading model of NMD is not universally accepted. We find preliminary evidence that while the majority of these transcripts are less stable in the cytoplasm than the nucleus, several prominent examples, which overlap microRNA response elements (MREs), are more stable in the cytoplasm. We suggest that in both these cases splicing in the 3' UTR might act as a regulatory mechanism, either by destabilising the transcript via NMD, or by stabilising it through removing binding sites for miRNAs or other RNA binding factors. If we are right, this represents a novel but fundamental mechanism for the regulation of gene expression that has been hiding in plain sight, contained in data we have been ignoring.
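
The model described above, in which an exon junction downstream of the stop codon marks a transcript for NMD, can be illustrated with a minimal sketch. This is not the project's pipeline: the Transcript class, its field names and the 50-nucleotide threshold are assumptions made for illustration (the threshold follows the commonly cited "50-nucleotide rule", which the text above does not specify).

```python
from dataclasses import dataclass
from typing import List

# Assumption: junctions at least this far downstream of the stop codon are
# treated as NMD-triggering (the commonly cited "50-nucleotide rule").
NMD_DISTANCE_THRESHOLD = 50


@dataclass
class Transcript:
    """Hypothetical minimal transcript model, in spliced transcript coordinates."""
    name: str
    stop_codon_end: int    # 0-based position of the first base after the stop codon
    junctions: List[int]   # positions of exon-exon junctions in the spliced RNA


def utr_junctions(t: Transcript) -> List[int]:
    """Return the junctions that lie in the 3' UTR (at or after the stop codon end)."""
    return [j for j in t.junctions if j >= t.stop_codon_end]


def predicted_nmd_target(t: Transcript,
                         threshold: int = NMD_DISTANCE_THRESHOLD) -> bool:
    """True if any 3' UTR junction lies at least `threshold` nt past the stop codon."""
    return any(j - t.stop_codon_end >= threshold for j in utr_junctions(t))


if __name__ == "__main__":
    examples = [
        Transcript("spliced_utr", stop_codon_end=300, junctions=[100, 200, 420]),
        Transcript("unspliced_utr", stop_codon_end=300, junctions=[100, 200]),
    ]
    for t in examples:
        print(t.name, utr_junctions(t), predicted_nmd_target(t))
```

Under this model a transcript flagged by predicted_nmd_target would be expected to be degraded, so finding such transcripts at high abundance is what motivates the two hypotheses.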

We propose to study this phenomenon by assembling RNA-seq data from two large repositories of primary human data (GTEx and TCGA) and a repository of over 1000 cell lines (CCLE), and using these data to assess the effect on gene regulation between cell types and individuals, and to characterise overlaps with regulatory elements. We will use bromouridine (BrU) labelling to study the effects on transcript stability and dependence on NMD, and reporter constructs/knock-downs to study the effects of regulatory elements. We will also use these reporters to study how this regulation responds to stimuli and how it differs between cell types.
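
As an illustration of how transcript stability might be quantified from labelling time courses, the sketch below estimates a half-life by fitting first-order decay to the abundance of labelled RNA measured at several chase times. The first-order decay model, the function name and the example numbers are assumptions for illustration only, not the analysis method of the project.

```python
import math
from typing import Sequence


def half_life(times_h: Sequence[float], abundances: Sequence[float]) -> float:
    """Estimate RNA half-life (hours) assuming first-order decay.

    Fits log(abundance) = log(A0) - k * t by ordinary least squares and
    returns ln(2) / k. Abundances must be strictly positive.
    """
    if len(times_h) != len(abundances) or len(times_h) < 2:
        raise ValueError("need at least two matched time points")
    logs = [math.log(a) for a in abundances]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    k = -sxy / sxx  # decay rate constant, per hour
    if k <= 0:
        raise ValueError("no decay detected; half-life is undefined")
    return math.log(2) / k


if __name__ == "__main__":
    # Made-up example data: labelled RNA halving every 2 hours.
    times = [0.0, 1.0, 2.0, 4.0]
    labelled = [100.0 * 0.5 ** (t / 2.0) for t in times]
    print(round(half_life(times, labelled), 3))
```

Comparing half-lives estimated this way for isoforms with and without 3' UTR splicing, in cells with intact and with disabled NMD, is one way the stability differences described above could be expressed.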

Planned Impact

Our work will explore aspects of the basic biology of gene regulation through the use of large human data resources. Mis-regulation of such fundamental processes is at the heart of many diseases, particularly cancer; other UTR-altering mechanisms have been shown to be involved in cancer. In the future, this research will benefit clinicians, patients and pharmaceutical companies by guiding the creation of better diagnostic tests and better targeted treatments. In addition, this project will benefit the data analysis community and the UK economy.

Health benefit.
Patients and clinicians: This work will develop a better understanding of gene regulation, which is of particular importance for cancer, and may identify events that can act as novel biomarkers likely to help guide treatment choices. Specifically, responses to various treatments for patients with cancer depend upon the integrity of numerous molecular pathways. This work will allow a better understanding of this, and may identify specific molecules that are indicative of disease that is likely to be refractory to radiotherapy, hormonal or general chemotherapeutic treatments, and so would be better served by alternative strategies or specific targeted therapies. This work will also identify novel expressed transcripts, for which urinary, blood and tissue-based assays could be developed as cancer biomarkers.
Societal Impacts
Policy makers: The use of large amounts of anonymized human genomics data is currently subject to much discussion in terms of privacy and security. Policy makers will be able to use this work as evidence of what can be achieved and how the benefits of collecting large amounts of genomic data can be balanced with the risks.
Public: Big data is currently a topic of interest to the public. In particular the use of genomic data could become an area of concern for family members and the public. This project is a concrete example of the use of large amounts of genomic data and engaging the public will allow the research to consider public concerns and raise understanding around the benefits of big data research.

Economic Benefits.
Pharmaceutical Industry: This research may uncover novel genetic constructs that present attractive biomarkers or drug targets for the pharmaceutical industry, commercialization of which would present an economic benefit.
Data Analysis Industry/Community: There is currently a trend for the better use of data in many aspects of life and this sector is vital to the economic future of the UK. However, these skills are in short supply. The community will benefit from the presence of additional individuals with these skills, both as potential employees and through the synergistic effects of a larger community of individuals applying similar approaches to different problems and sharing experience and skills.
 
Description In a cell, the genes, written in the DNA, contain instructions to make proteins, but before those instructions can be used, they must be copied to a different, more mobile molecule, called RNA. The RNA is exported from the nucleus, where the DNA is, to the rest of the cell (the cytoplasm), where it is "translated" into protein. However, there are two complications with this. The first is that the message for the RNA is not written in one continuous block on the DNA, and the different parts of it must be "spliced" together to make the finished message. The second is that, besides the regions that contain instructions for the protein, the RNA molecule contains regions that carry information on how the RNA should be regulated. It has long been thought that if there is "splicing" in this non-translated region, the cell would regard the RNA as defective and degrade it. However, we have found thousands of RNAs where it seems this splicing is common, particularly in stem cells and cancer cells. Indeed, for many genes, the majority of RNAs have this splicing. The sequences around the splicing seem to be conserved between different species, suggesting that they are important. We have shown that far from marking these RNAs for degradation, in some cases this splicing protects the RNA from degradation by this defective-RNA clean-up process.
Exploitation Route It appears that splicing in 3' UTRs is important in stem cell regulation, and thus might present important considerations for the differentiation of embryonic and induced pluripotent stem cells for commercial activities. We are currently following up on this lead. In addition, the increase in biologic pharmaceuticals, particularly RNA-based therapeutics, requires carefully designed expression vectors, including 3' UTRs. Consideration of the splicing landscape in 3' UTRs, particularly if this leads to stabilisation of the mRNA, will be important, and we are taking this forward in our synthetic UTR design projects, which we are undertaking in collaboration with AstraZeneca.
Sectors Manufacturing, including Industrial Biotechnology, Pharmaceuticals and Medical Biotechnology

 
Description Pop-up university lecture 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact A twenty-minute talk to members of the general public, with extended opportunities for questions and answers. Talk title: "It's all about us: accessing our genetic information!", given by Dr Cristina Alexandru-Crivac. The talk explored processes leading to genetic variation between people, cell types and disease, introduced RNA to the general public, and discussed the uses to which people's DNA and RNA sequence can be put.
Year(s) Of Engagement Activity 2019
URL https://www.sheffield.ac.uk/news/nr/pop-up-sheffield-university-festival-1.864991