Elucidating mechanisms and roles of alternative polyadenylation
Lead Research Organisation:
University of Dundee
Department Name: College of Life Sciences
Abstract
Our genes are made of DNA, but when they are switched on, copies are made in a related molecule called RNA, and this RNA goes on to code for the protein products of our genes. As the gene is copied into RNA, the RNA is cut and a string of adenine molecules (A for short) is added at the end. This so-called 'poly A tail' protects the RNA from being degraded, helps to transport the RNA around the cell and stimulates the formation of protein from the RNA. The site at which the poly A tail is added is not always the same, even for the same gene. For example, half of all human genes have RNAs with more than one site for adding a poly A tail. Controlling the site at which the poly A tail is added is very important because it ultimately affects how genes function. However, this is a process we know surprisingly little about. It's not just human RNAs that have different poly A tails; those of other animals and plants do too. We have been studying how plants control the time at which they flower, a process where genes are very precisely controlled. In the course of this work, we have discovered that three factors called FCA, FY and, most recently, FPA, function to control poly A site selection of some RNAs. Such basic aspects of gene expression are very similar in plants and animals, and it turns out that there are human proteins highly related to FY and FPA. It is possible, therefore, that these proteins control poly A site selection in humans too, but very little is known about them. As we have found that FCA and FPA don't need each other to control poly A site choice, we think they must be doing this in different ways. This gives us a chance to understand how poly A site choice can be controlled. In this proposal we plan to build on what we know about FCA and FPA in plants, but this knowledge should be of much more general interest.
We want to know two things: (1) How do FCA and FPA control the site at which a poly A tail is added? (2) What genes do FCA and FPA regulate by controlling alternative poly A site choice? We will work out how FCA and FPA control poly A sites by identifying the features of the RNA required. This should be quite straightforward. We will make test genes containing different parts of the target gene and see how they affect poly A site selection when placed back in plants. In order to find the other genes whose normal poly A tail depends on FCA and FPA, we will look at where RNAs are polyadenylated in normal plants and in mutant plants that lack FCA or FPA. It is now possible for us to look at nearly all the RNAs in a cell thanks to Next Generation Sequencing, a technology that is revolutionizing modern biology by giving us huge amounts of sequence data, very quickly and at a fraction of the previous cost. This technology has been developed to look at RNA by sequencing a short part of every RNA, sufficient to identify it, called a 'tag'. To find the tag, scientists use the poly A tail and sequence what is next to it. This is a happy coincidence for us, because it means that in addition to tagging a particular RNA, this method also tells us where a poly A tail has been added to the RNA. To analyse the large amounts of data and make comparisons, we will need to develop specialized computational tools. Because we already know genes where FCA and FPA control poly A site selection, we should be able to find changes in these 'tags' if our tools are working well. Once we are sure they are, we can look for other shifts in 'tags' to identify other genes controlled by FCA and FPA. As lots of other scientists are also using this sequencing technology, but for completely different reasons, we can use our analysis tools to look at changes in polyadenylation in their data too.
In this way we will be able to identify cell-types and situations where alternative polyadenylation is an important part of gene regulation.
Technical Summary
Alternative polyadenylation (pA) is a commonplace but surprisingly poorly characterised aspect of eukaryotic gene regulation. We have discovered that the Arabidopsis RNA-binding proteins FCA and FPA (regulators of flowering and RNA silencing) function genetically independently to control the site of RNA 3' end formation. Our work is unique because no other trans-acting regulators of RNA 3' end formation that are not components of the splicing or polyadenylation machinery have been identified. Our discovery therefore provides a genetically tractable system to dissect alternative pA. We will first define the level at which FCA and FPA control pA site selection through cis-element analysis, which we will couple with PolII ChIP and transcription run-on assays. A combination of RNA immunoprecipitation (which we have successfully developed) and RNA specificity-swap assays will be used to determine how directly these proteins regulate 3' end formation. Next, a genome-wide identification of pA sites regulated by FCA and FPA will be made by utilising recent developments in next generation sequencing. Fortunately for us, Digital Transcriptomics (DT) involves sequencing short 'tags' of RNA adjacent to pA tails, thereby providing positional information on pA sites. We will develop a bioinformatics pipeline to analyse DT data to quantify changes in pA site selection, working first with different FCA and FPA genetic backgrounds that we know have contrasting patterns of pA. With these tools in place, we will be able to analyse other DT data releases and associate alternative pA with diverse backgrounds, cell types or treatments. Our work will have widespread impacts in understanding gene regulation because it will define mechanisms by which alternative pA can be controlled, clarify the connections between 3' end formation and RNA silencing, and establish generic bioinformatics tools to identify alternative pA in next generation sequencing data.
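The idea of quantifying changes in pA site selection from tag positions can be illustrated with a minimal sketch. It is not the project's pipeline: the function names, the coordinates and the tag counts are all hypothetical, and it assumes tags have already been assigned to one gene and reduced to the genomic position of the pA site they report.

```python
from collections import Counter


def site_usage(tag_positions):
    """Fraction of a gene's tags that fall at each pA site position."""
    counts = Counter(tag_positions)
    total = sum(counts.values())
    return {pos: n / total for pos, n in counts.items()}


def usage_shift(wild_type_tags, mutant_tags):
    """Per-site change in usage fraction (mutant minus wild type)."""
    wt = site_usage(wild_type_tags)
    mut = site_usage(mutant_tags)
    sites = sorted(set(wt) | set(mut))
    return {pos: mut.get(pos, 0.0) - wt.get(pos, 0.0) for pos in sites}


# Made-up tag positions for one gene with a proximal (1200) and distal (2500) site.
wild_type = [1200] * 80 + [2500] * 20   # wild type mostly uses the proximal site
mutant = [1200] * 30 + [2500] * 70      # hypothetical mutant mostly uses the distal site

shift = usage_shift(wild_type, mutant)
for position, change in shift.items():
    print(position, round(change, 2))
```

A real analysis would also need replicate samples and a statistical test before calling a shift significant; this sketch only shows the quantity being compared.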
Organisations
Publications
Hornyik C
(2010)
Alternative polyadenylation of antisense RNAs and flowering time control.
in Biochemical Society transactions
Sherstnev A
(2012)
Direct sequencing of Arabidopsis thaliana RNA reveals patterns of cleavage and polyadenylation
in Nature Structural & Molecular Biology
Lyons R
(2013)
The RNA-binding protein FPA regulates flg22-triggered defense responses and transcription factor activity by alternative polyadenylation.
in Scientific reports
Duc C
(2013)
Transcription termination and chimeric RNA formation controlled by Arabidopsis thaliana FPA.
in PLoS genetics
Rataj K
(2014)
Message ends: RNA 3' processing and flowering time control.
in Journal of experimental botany
Gierlinski M
(2015)
Statistical models for RNA-seq data derived from a two-condition 48-replicate experiment.
in Bioinformatics (Oxford, England)
Zhang Y
(2016)
Crystal Structure of the SPOC Domain of the Arabidopsis Flowering Regulator FPA.
in PloS one
Mourão K
(2018)
Detection and Mitigation of Spurious Antisense Reads with RoSA
Froussios K
(2019)
How well do RNA-Seq differential gene expression tools perform in a complex eukaryote? A case study in Arabidopsis thaliana.
in Bioinformatics (Oxford, England)
Froussios K
(2019)
Relative Abundance of Transcripts (RATs): Identifying differential isoform abundance from RNA-seq.
in F1000Research
Mourão K
(2019)
Detection and mitigation of spurious antisense expression with RoSA
in F1000Research
Parker MT
(2020)
Nanopore direct RNA sequencing maps the complexity of Arabidopsis mRNA processing and m6A modification.
in eLife
Description | 1. We used 3rd generation direct RNA sequencing to map the position of mRNA cleavage and polyadenylation in the model plant Arabidopsis thaliana and in a number of mutant backgrounds defective in flower development. 2. We improved the understanding of RNA 3' end formation in plants and how the genome was organised. 3. We made a publicly accessible database of these data called polyAdb |
Exploitation Route | Our data are crucial to defining where genes end and how the genome of the model plant Arabidopsis thaliana should be annotated. Since Arabidopsis is a pathfinder model, this is important for the proper annotation of crop plant genomes too. |
Sectors | Agriculture, Food and Drink; Healthcare; Pharmaceuticals and Medical Biotechnology |
URL | https://www.compbio.dundee.ac.uk/polyADB/ |
Description | PhD Studentship |
Amount | £90,000 (GBP) |
Organisation | University of Dundee |
Sector | Academic/University |
Country | United Kingdom |
Start | 09/2010 |
End | 03/2014 |
Description | The Non-coding Arabidopsis Genome |
Amount | £792,345 (GBP) |
Funding ID | BB/J00247X/1 |
Organisation | Biotechnology and Biological Sciences Research Council (BBSRC) |
Sector | Public |
Country | United Kingdom |
Start | 07/2013 |
End | 08/2016 |
Title | Our mapped 3' ends added to Araport |
Description | We have previously mapped the 3' ends of Arabidopsis RNAs and we have made our own database to share these data. However, these data also have great value in determining how the genome should be annotated. We collaborated with the Araport team to share our data. Our work contributed to the re-annotation of the Arabidopsis genome known as Araport 11, released at the end of 2015. The new Arabidopsis genome browser uses our data as a selectable track, so the plant science community can visualise the 3' ends of genes that they are interested in. |
Type Of Material | Database/Collection of data |
Year Produced | 2015 |
Provided To Others? | Yes |
Impact | It is notable that these data contributed to the reannotation of the Arabidopsis genome: Araport 11 |
URL | https://apps.araport.org/jbrowse/?data=arabidopsis&loc=Chr5%3A3165471..3184470&tracks=TAIR10_genome%... |
Title | polyAdb |
Description | A database of single molecule direct RNA sequencing data that maps the site of polyadenylation on RNA. |
Type Of Material | Database/Collection of data |
Year Produced | 2013 |
Provided To Others? | Yes |
Impact | We commissioned an animation to explain the database, one of the many ways we have tried to make the data accessible to different communities. The sequencing company SeqLL provides a link to the database from its commercial website. |
URL | https://www.compbio.dundee.ac.uk/polyADB/ |
Title | Relative Abundance of Transcripts (RATs) |
Description | Who it is for: Anyone working in transcriptomics, analysing gene expression and transcript abundances. What it does: It provides a method to detect changes in the abundance ratios of transcript isoforms of a gene. This is called Differential Transcript Usage (DTU). RATs is workflow-agnostic. Quantification quality details are left to the quantification tools; RATs uses only the transcript abundances, which you can obtain using any tool you like. This makes it suitable for use with alignment-free quantification tools like Kallisto or Salmon. RATs is able to take advantage of the bootstrapped quantifications provided by the alignment-free tools. These bootstrapped data are used by RATs to assess how much the technical variability of the heuristic quantifications affects differential transcript usage and thus provide a measure of confidence in the DTU calls. What it needs: This is an R source package, and will run on any platform with a reasonably up-to-date R environment. A few third-party R packages are also required (see below). As input, RATs requires transcript abundance estimates with or without bootstrapping. The format either way is tables with the samples as columns and the transcripts as rows. An extra column holds the transcript IDs. Some functionality to create these from Salmon or Kallisto quantification files is provided by RATs. RATs also requires a look-up table matching the transcript identifiers to the respective gene identifiers. This can be obtained through various means, one of them being extracting this information from a GTF file using functionality provided by RATs. |
Type Of Technology | Software |
Year Produced | 2018 |
Open Source License? | Yes |
Impact | This allows the analysis of alternative transcripts from RNA-seq data and has been used by a number of groups in their research. |
URL | https://github.com/bartongroup/RATS |
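The concept of Differential Transcript Usage described in the RATs entry above can be illustrated with a short sketch. This is not the RATs API (RATs is an R package) and it performs no statistical test or bootstrap assessment; it only shows the quantity of interest, the change in each isoform's share of its gene's total abundance. The input layout follows the description (transcripts as rows, samples as columns, plus a transcript-to-gene look-up table), while all identifiers and numbers are hypothetical.

```python
from collections import defaultdict


def isoform_proportions(abundances, tx_to_gene):
    """Sum each transcript over samples, then divide by its gene's total."""
    tx_total = {tx: sum(values) for tx, values in abundances.items()}
    gene_total = defaultdict(float)
    for tx, total in tx_total.items():
        gene_total[tx_to_gene[tx]] += total
    proportions = {}
    for tx, total in tx_total.items():
        denominator = gene_total[tx_to_gene[tx]]
        proportions[tx] = total / denominator if denominator > 0 else 0.0
    return proportions


def proportion_changes(condition_a, condition_b, tx_to_gene):
    """Change in isoform proportion (condition B minus condition A)."""
    prop_a = isoform_proportions(condition_a, tx_to_gene)
    prop_b = isoform_proportions(condition_b, tx_to_gene)
    return {tx: prop_b[tx] - prop_a[tx] for tx in tx_to_gene}


# Hypothetical look-up table and abundance tables (two samples per condition).
tx_to_gene = {"G1.1": "G1", "G1.2": "G1", "G2.1": "G2"}
condition_a = {"G1.1": [80, 80], "G1.2": [20, 20], "G2.1": [50, 50]}
condition_b = {"G1.1": [40, 40], "G1.2": [60, 60], "G2.1": [100, 100]}

changes = proportion_changes(condition_a, condition_b, tx_to_gene)
for transcript, change in changes.items():
    print(transcript, round(change, 2))
```

In this made-up data the single-isoform gene G2 doubles in abundance but shows no change in isoform proportions, which is the distinction between differential expression and differential transcript usage.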
Title | RoSA: a tool for the Removal of Spurious Antisense |
Description | In stranded RNA-Seq experiments we have the opportunity to detect and measure antisense transcription, important since antisense transcripts impact gene transcription in several different ways. Stranded RNA-Seq determines the strand from which an RNA fragment originates, and so can be used to identify where antisense transcription may be implicated in gene regulation. However, spurious antisense reads are often present in experiments, and can manifest at levels greater than 1% of sense transcript levels. This is enough to disrupt analyses by causing false antisense counts to dominate the set of genes with high antisense transcription levels. The RoSA (Removal of Spurious Antisense) tool detects the presence of high levels of spurious antisense transcripts, by: analysing ERCC spike-in data to find the ratio of antisense:sense transcripts in the spike-ins; or using antisense and sense counts around splice sites to provide a set of gene-specific estimates; or both. Once RoSA has an estimate of the spurious antisense, expressed as a ratio of antisense:sense counts, RoSA will calculate a correction to the antisense counts based on the ratio. Where a gene-specific estimate is available for a gene, it will be used in preference to the global estimate obtained from either spike-ins or spliced reads. |
Type Of Technology | Software |
Year Produced | 2017 |
Open Source License? | Yes |
Impact | It has enabled a number of groups to identify spurious antisense in their RNA-seq data. |
URL | https://doi.org/10.5281/zenodo.2661378 |
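The ratio-based correction described in the RoSA entry above can be sketched as follows. This is not RoSA's code or interface. It assumes that any antisense reads on spike-ins are spurious, and it assumes the simplest possible correction: subtracting the expected spurious count (ratio times sense count) and flooring at zero. The function names and all counts are hypothetical.

```python
def spurious_ratio(spike_in_counts):
    """Global antisense:sense ratio pooled over (sense, antisense) spike-in pairs."""
    sense = sum(s for s, _ in spike_in_counts)
    antisense = sum(a for _, a in spike_in_counts)
    return antisense / sense


def corrected_antisense(gene_counts, global_ratio, gene_ratios=None):
    """Subtract expected spurious antisense from each gene's antisense count.

    A gene-specific ratio, where available, is used in preference to the
    global estimate, as the description states. The subtract-and-floor
    form of the correction is an assumption made for illustration.
    """
    gene_ratios = gene_ratios or {}
    corrected = {}
    for gene, (sense, antisense) in gene_counts.items():
        ratio = gene_ratios.get(gene, global_ratio)
        corrected[gene] = max(0.0, antisense - ratio * sense)
    return corrected


# Made-up counts: (sense, antisense) per spike-in and per gene.
spike_ins = [(1000, 20), (3000, 60)]
genes = {"geneA": (5000, 100), "geneB": (200, 50), "geneC": (1000, 10)}
gene_specific = {"geneC": 0.005}  # hypothetical splice-site-based estimate

ratio = spurious_ratio(spike_ins)
result = corrected_antisense(genes, ratio, gene_specific)
for gene, value in result.items():
    print(gene, round(value, 2))
```

Here geneA's apparent antisense signal is fully explained by the spurious ratio, while geneB retains most of its antisense count, which is the kind of separation the tool description aims at.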
Description | Dec 2016 Oxford: Seminar at WTCHG/SGC: Identification of novel functional sites in protein domains from the analysis of human variation |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Professional Practitioners |
Results and Impact | This was an invited seminar at the Oxford SGC and WTCHG institutes in the Department of Medicine. The seminar described work that covered most of our funded research activities. |
Year(s) Of Engagement Activity | 2016 |
URL | https://talks.ox.ac.uk/talks/id/7b03765b-6d8a-45c0-bbb1-e570a70377ff/ |
Description | Feb 2017: Seattle: What can human variation tell us about protein structure? |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | This was an invited talk at the genevariation3d workshop at the Institute for Systems Biology, which brought together scientists from the genomics/personalised medicine field and the field of protein structure analysis. I presented our work at this interface, which is built on Jalview and the Dundee Resource and inspired by our research in plant biology. |
Year(s) Of Engagement Activity | 2017 |
URL | http://genevariation3d.org/ |
Description | Invited Research Seminar Bielefeld University, Germany |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Participants in your research or patient groups |
Results and Impact | As a result of our work on this grant, I was invited to speak at Bielefeld University, Germany. No actual impacts realised to date. |
Year(s) Of Engagement Activity | 2011 |
Description | Invited Research Seminar at International Plant RNA Workshop, Riken, Yokohama, Japan |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Participants in your research or patient groups |
Results and Impact | Based on our work on this grant, I was invited to speak at the International Plant RNA Meeting, where I was both an invited speaker and an invited Chair of a Session. No actual impacts realised to date. |
Year(s) Of Engagement Activity | 2011 |
Description | Invited Research Seminar, Edinburgh University |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Participants in your research or patient groups |
Results and Impact | As a result of our work on this grant, I was invited to give a talk to the Biology Department at Edinburgh University. No actual impacts realised to date. |
Year(s) Of Engagement Activity | 2012 |
Description | Invited Research Seminar, Miami University, USA |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Participants in your research or patient groups |
Results and Impact | As a result of our work in this area, I was invited to speak at Miami University, USA. No actual impacts realised to date. |
Year(s) Of Engagement Activity | 2011 |
Description | Invited Research Seminar, Temasek Life Sciences, Singapore |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Participants in your research or patient groups |
Results and Impact | As a result of our work on this grant, I was invited to give a research talk in Singapore. No actual impacts realised to date. |
Year(s) Of Engagement Activity | 2011 |
Description | Mar 2017: Seminar at Newcastle University: Identification of novel functional sites in protein domains from the analysis of human variation |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Professional Practitioners |
Results and Impact | This was an invited seminar to the Centre for Health and Bioinformatics at Newcastle University. I presented work at the interface between genomics/transcriptomics and protein structure which relied heavily on the software tools we develop and other resources. |
Year(s) Of Engagement Activity | 2017 |
URL | http://www.ncl.ac.uk/chabi/events/pastevents/item/eventgeoffbarton.html |
Description | Sept 2015: Invited Seminar at TGAC, Norwich |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Professional Practitioners |
Results and Impact | I presented a broad range of our research at an invited seminar at The Genome Analysis Centre (TGAC), now the Earlham Institute. |
Year(s) Of Engagement Activity | 2015 |