Elucidating mechanisms and roles of alternative polyadenylation

Lead Research Organisation: University of Dundee
Department Name: College of Life Sciences

Abstract

Our genes are made of DNA, but when they are switched on, copies are made in a related molecule called RNA, and this RNA goes on to code for the protein products of our genes. As the gene is copied into RNA, the RNA is cut and a string of adenine molecules (A for short) is added at the end. This so-called 'poly A tail' protects the RNA from being degraded, helps to transport the RNA around the cell and stimulates the production of protein from the RNA. The site at which the poly A tail is added is not always the same, even for the same gene: half of all human genes, for example, have RNAs with more than one site for adding a poly A tail. Controlling the site at which the poly A tail is added is very important because it ultimately affects how genes function. However, this is a process we know surprisingly little about. It is not just human RNAs that have different poly A tails; other animals and plants do too. We have been studying how plants control the time at which they flower, a process in which genes are very precisely controlled. In the course of this work, we have discovered that three factors called FCA, FY and, most recently, FPA, control poly A site selection of some RNAs. Such basic aspects of gene expression are very similar in plants and animals, and it turns out that there are human proteins closely related to FY and FPA. It is therefore possible that these proteins control poly A site selection in humans too, but very little is known about them. Because we have found that FCA and FPA do not need each other to control poly A site choice, we think they must do this in different ways. This gives us a chance to understand how poly A site choice can be controlled. In this proposal we plan to build on what we know about FCA and FPA in plants, but this knowledge should be of much more general interest. 
We want to know two things: (1) how do FCA and FPA control the site at which a poly A tail is added? (2) Which genes do FCA and FPA regulate by controlling alternative poly A site choice? We will work out how FCA and FPA control poly A sites by identifying the features of the RNA that are required. This should be quite straightforward: we will make test genes containing different parts of the target gene and see how they affect poly A site selection when placed back into plants. To find the other genes whose normal poly A tail depends on FCA and FPA, we will look at where RNAs are polyadenylated in normal plants and in mutant plants that lack FCA or FPA. It is now possible for us to look at nearly all the RNAs in a cell thanks to Next Generation Sequencing, a technology that is revolutionising modern biology by generating huge amounts of sequence data very quickly and at a fraction of the previous cost. This technology has been developed to look at RNA by sequencing a short part of every RNA, called a 'tag', that is sufficient to identify it. To find the tag, scientists use the poly A tail and sequence what is next to it. This is a happy coincidence for us, because it means that in addition to identifying a particular RNA, this method also tells us where a poly A tail has been added to it. To analyse the large amounts of data and make comparisons, we will need to develop specialised computational tools. Because we already know genes where FCA and FPA control poly A site selection, we should be able to detect changes in these 'tags' if our tools are working well. Once we are sure they are, we can look for other shifts in 'tags' to identify other genes controlled by FCA and FPA. Because many other scientists are also using this sequencing technology, but for completely different reasons, we can use our analysis tools to look at changes in polyadenylation in their data too. 
In this way we will be able to identify cell-types and situations where alternative polyadenylation is an important part of gene regulation.
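To illustrate how a tag sitting next to the poly A tail also reports where the tail was added, the sketch below takes tag alignments and groups their 3' ends into candidate poly A sites. It is a minimal Python sketch under stated assumptions, not the project's pipeline: the `Tag` record layout, the convention that the RNA 3' end is the rightmost aligned base on the '+' strand and the leftmost on the '-' strand, and the 20 nt clustering window are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Tag(NamedTuple):
    """A tag aligned to the genome (hypothetical record layout)."""
    chrom: str
    strand: str  # strand of the RNA: '+' or '-'
    start: int   # leftmost aligned base (1-based)
    end: int     # rightmost aligned base (1-based)


def pa_position(tag: Tag) -> int:
    """Return the base next to the poly A tail, i.e. the RNA 3' end.

    Assumed convention: rightmost aligned base on '+', leftmost on '-'."""
    return tag.end if tag.strand == "+" else tag.start


def _summarise(chrom, strand, cluster):
    # Use the most frequent position as the representative site;
    # cluster is sorted, so ties go to the smallest coordinate.
    position, _ = Counter(cluster).most_common(1)[0]
    return (chrom, strand, position, len(cluster))


def cluster_pa_sites(tags, window=20):
    """Group 3' ends on the same chromosome and strand into sites, starting a
    new cluster whenever the gap to the previous position exceeds `window` nt.

    Returns a list of (chrom, strand, representative_position, tag_count)."""
    positions = defaultdict(list)
    for tag in tags:
        positions[(tag.chrom, tag.strand)].append(pa_position(tag))

    sites = []
    for (chrom, strand), pos_list in sorted(positions.items()):
        pos_list.sort()
        cluster = [pos_list[0]]
        for pos in pos_list[1:]:
            if pos - cluster[-1] <= window:
                cluster.append(pos)
            else:
                sites.append(_summarise(chrom, strand, cluster))
                cluster = [pos]
        sites.append(_summarise(chrom, strand, cluster))
    return sites


# Hypothetical example: three tags ending near Chr1:130, one at Chr1:450
tags = [
    Tag("Chr1", "+", 100, 130),
    Tag("Chr1", "+", 105, 131),
    Tag("Chr1", "+", 110, 130),
    Tag("Chr1", "+", 400, 450),
    Tag("Chr1", "-", 500, 540),
]
print(cluster_pa_sites(tags))
# [('Chr1', '+', 130, 3), ('Chr1', '+', 450, 1), ('Chr1', '-', 500, 1)]
```

The per-site tag counts could then be compared between normal and mutant plants. Real data would also need filtering of tags that arise from A-rich stretches inside the RNA (internal priming), which this sketch ignores.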

Technical Summary

Alternative polyadenylation (pA) is a commonplace but surprisingly poorly characterised aspect of eukaryotic gene regulation. We have discovered that the Arabidopsis RNA binding proteins FCA and FPA (regulators of flowering and RNA silencing) function genetically independently to control the site of RNA 3' end formation. Our work is unique because no other trans-acting regulators of RNA 3' end formation that are not components of the splicing or polyadenylation machinery have been identified. Our discovery therefore provides a genetically tractable system in which to dissect alternative pA. We will first define the level at which FCA and FPA control pA site selection through cis-element analysis, which we will couple with Pol II ChIP and transcription run-on assays. A combination of RNA immunoprecipitation (which we have successfully developed) and RNA specificity-swap assays will be used to determine how directly these proteins regulate 3' end formation. Next, we will identify pA sites regulated by FCA and FPA genome-wide by exploiting recent developments in next generation sequencing: fortunately for us, Digital Transcriptomics (DT) involves sequencing short 'tags' of RNA adjacent to pA tails, and therefore also provides positional information on pA sites. We will develop a bioinformatics pipeline to analyse DT data and quantify changes in pA site selection, working first with FCA and FPA genetic backgrounds that we know have contrasting patterns of pA. With these tools in place, we will be able to analyse other DT data releases and associate alternative pA with diverse backgrounds, cell types or treatments. Our work will have a widespread impact on understanding gene regulation because it will define mechanisms by which alternative pA can be controlled, clarify the connections between 3' end formation and RNA silencing, and establish generic bioinformatics tools to identify alternative pA in next generation sequencing data.
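One simple way to quantify a change in pA site selection between two genetic backgrounds is to count DT tags at a gene's proximal and distal pA sites in each background and test whether the proportions differ. The Python sketch that follows uses a two-sided Fisher's exact test, which is our illustrative choice rather than the method specified in this proposal; the function names and counts are hypothetical, and a real analysis would also need biological replicates and multiple-testing correction across genes.

```python
from math import comb


def proximal_fraction(proximal: int, distal: int) -> float:
    """Fraction of a gene's tags that map to its proximal pA site."""
    total = proximal + distal
    if total == 0:
        raise ValueError("no tags at either pA site")
    return proximal / total


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, row1)

    def prob(x: int) -> float:
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are counted
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def site_usage_shift(wild_type, mutant):
    """wild_type and mutant are (proximal_tags, distal_tags) for one gene.

    Returns (change in proximal fraction, mutant minus wild type; p-value)."""
    shift = proximal_fraction(*mutant) - proximal_fraction(*wild_type)
    p = fisher_exact_two_sided(wild_type[0], wild_type[1], mutant[0], mutant[1])
    return shift, p


# Hypothetical counts: the mutant shifts usage from the proximal to the distal site
shift, p = site_usage_shift(wild_type=(80, 20), mutant=(20, 80))
print(f"shift in proximal usage = {shift:.2f}, p = {p:.2e}")
```

Reporting both the size of the shift and a p-value keeps large but poorly supported changes (few tags) distinguishable from small but well supported ones.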

Publications

 
Description 1. We used 3rd generation direct RNA sequencing to map the position of mRNA cleavage and polyadenylation in the model plant Arabidopsis thaliana and in a number of mutant backgrounds defective in flower development.

2. We improved the understanding of RNA 3' end formation in plants and of how the genome is organised.

3. We made a publicly accessible database of these data, called polyAdb.
Exploitation Route Our data are crucial to defining where genes end and how the genome of the model plant Arabidopsis thaliana should be annotated. Since Arabidopsis is a pathfinder model, this is also important for the proper annotation of crop plant genomes.
Sectors Agriculture, Food and Drink, Healthcare, Pharmaceuticals and Medical Biotechnology

URL https://www.compbio.dundee.ac.uk/polyADB/
 
Description PhD Studentship
Amount £90,000 (GBP)
Organisation University of Dundee 
Sector Academic/University
Country United Kingdom
Start 09/2010 
End 03/2014
 
Description The Non-coding Arabidopsis Genome
Amount £792,345 (GBP)
Funding ID BB/J00247X/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 07/2013 
End 08/2016
 
Title Our mapped 3' ends added to Araport 
Description We have previously mapped the 3' ends of Arabidopsis RNAs and made our own database to share these data. However, these data also have great value in determining how the genome should be annotated, so we collaborated with the Araport team to share them. Our work contributed to the re-annotation of the Arabidopsis genome known as Araport 11, released at the end of 2015. The new Arabidopsis genome browser uses our data as a selectable track, so the plant science community can visualise the 3' ends of the genes they are interested in. 
Type Of Material Database/Collection of data 
Year Produced 2015 
Provided To Others? Yes  
Impact It is notable that these data contributed to the reannotation of the Arabidopsis genome, Araport 11. 
URL https://apps.araport.org/jbrowse/?data=arabidopsis&loc=Chr5%3A3165471..3184470&tracks=TAIR10_genome%...
 
Title polyAdb 
Description A database of single molecule direct RNA sequencing data that maps the site of polyadenylation on RNA. 
Type Of Material Database/Collection of data 
Year Produced 2013 
Provided To Others? Yes  
Impact We commissioned an animation to explain the database, one of the many ways in which we have tried to make the data accessible to different communities. The sequencing company SeqLL provides a link to the database from its commercial website. 
URL https://www.compbio.dundee.ac.uk/polyADB/
 
Title Relative Abundance of Transcripts (RATs) 
Description Who it is for: anyone working in transcriptomics, analysing gene expression and transcript abundances.
What it does: RATs provides a method to detect changes in the abundance ratios of the transcript isoforms of a gene, known as Differential Transcript Usage (DTU). RATs is workflow-agnostic: quantification quality is left to the quantification tools, and RATs uses only the transcript abundances, which can be obtained with any tool. This makes it suitable for use with alignment-free quantification tools such as Kallisto or Salmon. RATs can take advantage of the bootstrapped quantifications provided by these alignment-free tools, using them to assess how much the technical variability of the heuristic quantifications affects differential transcript usage and thus to provide a measure of confidence in the DTU calls.
What it needs: RATs is an R source package and will run on any platform with a reasonably up-to-date R environment. A few third-party R packages are also required. As input, RATs requires transcript abundance estimates, with or without bootstrapping, formatted as tables with samples as columns, transcripts as rows and an extra column holding the transcript IDs. RATs provides some functionality to create these tables from Salmon or Kallisto quantification files. RATs also requires a look-up table matching transcript identifiers to their gene identifiers; one way to obtain this is to extract it from a GTF file using functionality provided by RATs. 
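The idea behind DTU can be illustrated in a few lines: convert each transcript's abundance into its share of the gene total, compare shares between two conditions, and use paired bootstrap quantifications to ask how often a call is reproduced. This is a hypothetical Python sketch of the concept only; it is not RATs' R interface or its statistical procedure, and the 0.2 threshold, the function names and the example values are all assumptions.

```python
from collections import defaultdict


def isoform_proportions(abundances, tx2gene):
    """Convert {transcript_id: abundance} into each transcript's share of its
    gene's total abundance, using a {transcript_id: gene_id} look-up table."""
    gene_totals = defaultdict(float)
    for tx, value in abundances.items():
        gene_totals[tx2gene[tx]] += value
    proportions = {}
    for tx, value in abundances.items():
        total = gene_totals[tx2gene[tx]]
        proportions[tx] = value / total if total > 0 else 0.0
    return proportions


def dtu_calls(cond_a, cond_b, tx2gene, min_delta=0.2):
    """Flag transcripts whose isoform proportion changes by at least min_delta.
    Both conditions must contain the same transcript IDs."""
    prop_a = isoform_proportions(cond_a, tx2gene)
    prop_b = isoform_proportions(cond_b, tx2gene)
    return {tx: abs(prop_b[tx] - prop_a[tx]) >= min_delta for tx in prop_a}


def bootstrap_reproducibility(boots_a, boots_b, tx2gene, min_delta=0.2):
    """Fraction of paired bootstrap quantifications in which each DTU call is made."""
    pairs = list(zip(boots_a, boots_b))
    if not pairs:
        raise ValueError("no bootstrap iterations supplied")
    hits = defaultdict(int)
    for a, b in pairs:
        for tx, called in dtu_calls(a, b, tx2gene, min_delta).items():
            hits[tx] += int(called)
    return {tx: count / len(pairs) for tx, count in hits.items()}


# Hypothetical gene G1 with two isoforms whose ratio flips between conditions
tx2gene = {"G1.t1": "G1", "G1.t2": "G1", "G2.t1": "G2"}
cond_a = {"G1.t1": 30.0, "G1.t2": 70.0, "G2.t1": 50.0}
cond_b = {"G1.t1": 70.0, "G1.t2": 30.0, "G2.t1": 100.0}
print(dtu_calls(cond_a, cond_b, tx2gene))
# {'G1.t1': True, 'G1.t2': True, 'G2.t1': False}
```

Note that G2 doubles in overall expression but is not called, because its single isoform keeps the same share; DTU is about ratios within a gene, not gene-level expression.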
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact This allows the analysis of alternative transcripts from RNA-seq data and has been used by a number of groups in their research. 
URL https://github.com/bartongroup/RATS
 
Title RoSA: a tool for the Removal of Spurious Antisense 
Description In stranded RNA-Seq experiments we have the opportunity to detect and measure antisense transcription, which is important because antisense transcripts affect gene transcription in several different ways. Stranded RNA-Seq determines the strand from which an RNA fragment originates, and so can be used to identify where antisense transcription may be implicated in gene regulation. However, spurious antisense reads are often present in experiments and can reach levels greater than 1% of sense transcript levels. This is enough to disrupt analyses, because false antisense counts come to dominate the set of genes with high antisense transcription levels. The RoSA (Removal of Spurious Antisense) tool detects high levels of spurious antisense transcripts by analysing ERCC spike-in data to find the antisense:sense ratio in the spike-ins, by using antisense and sense counts around splice sites to provide a set of gene-specific estimates, or by both. Once RoSA has an estimate of the spurious antisense, expressed as a ratio of antisense:sense counts, it calculates a correction to the antisense counts based on that ratio. Where a gene-specific estimate is available for a gene, it is used in preference to the global estimate obtained from either spike-ins or spliced reads. 
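The correction step described above can be sketched as follows, assuming that the spurious antisense in each gene is estimated as the antisense:sense ratio multiplied by the gene's sense count and subtracted from its antisense count (clamped at zero). This formula, the function names and the counts are illustrative assumptions rather than RoSA's documented implementation; the sketch does follow the stated rule that a gene-specific ratio is preferred over the global spike-in estimate.

```python
def spike_in_ratio(spike_sense, spike_antisense):
    """Global antisense:sense ratio from ERCC spike-ins.

    Spike-ins are synthetic RNAs of known orientation, so antisense reads
    assigned to them are treated as spurious. Inputs are {spike_in_id: count}."""
    total_sense = sum(spike_sense.values())
    if total_sense == 0:
        raise ValueError("no sense reads on spike-ins")
    total_antisense = sum(spike_antisense.get(s, 0) for s in spike_sense)
    return total_antisense / total_sense


def correct_antisense(gene_counts, global_ratio, gene_ratios=None):
    """Subtract the estimated spurious antisense from each gene's antisense count.

    gene_counts: {gene_id: (sense_count, antisense_count)}
    gene_ratios: optional gene-specific antisense:sense estimates (for example
    from reads around splice sites); these take precedence over global_ratio.
    The subtraction used here is an assumed correction formula."""
    gene_ratios = gene_ratios or {}
    corrected = {}
    for gene, (sense, antisense) in gene_counts.items():
        ratio = gene_ratios.get(gene, global_ratio)
        spurious = ratio * sense
        corrected[gene] = max(0.0, antisense - spurious)
    return corrected


# Hypothetical counts
ratio = spike_in_ratio({"ERCC-1": 1000, "ERCC-2": 3000}, {"ERCC-1": 10, "ERCC-2": 30})
genes = {"G1": (5000, 80), "G2": (1000, 5), "G3": (2000, 100)}
print(correct_antisense(genes, ratio, gene_ratios={"G3": 0.02}))
```

With a global ratio of 1%, G2's small antisense signal is fully explained by its sense expression and is removed, while G1 and G3 retain antisense counts above the expected spurious level.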
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact It has enabled a number of groups to identify spurious antisense in their RNA-seq data. 
URL https://doi.org/10.5281/zenodo.2661378
 
Description Dec 2016 Oxford: Seminar at WTCHG/SGC: Identification of novel functional sites in protein domains from the analysis of human variation 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact This was an invited seminar at the Oxford SGC and WTCHG institutes in the Department of Medicine. The seminar described work that covered most of our funded research activities.
Year(s) Of Engagement Activity 2016
URL https://talks.ox.ac.uk/talks/id/7b03765b-6d8a-45c0-bbb1-e570a70377ff/
 
Description Feb 2017: Seattle: What can human variation tell us about protein structure? 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This was an invited talk at the genevariation3d workshop at the Institute for Systems Biology, which brought together scientists from the genomics/personalised medicine field and the field of protein structure analysis. I presented our work at this interface, which builds on Jalview and the Dundee Resource and was inspired by our research in plant biology.
Year(s) Of Engagement Activity 2017
URL http://genevariation3d.org/
 
Description Invited Research Seminar Bielefeld University, Germany 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Participants in your research or patient groups
Results and Impact As a result of our work on this grant I was invited to speak at Bielefeld University, Germany

no actual impacts realised to date
Year(s) Of Engagement Activity 2011
 
Description Invited Research Seminar at International Plant RNA Workshop, Riken, Yokohama, Japan 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Participants in your research or patient groups
Results and Impact Based on our work on this grant, I was invited to speak at the International Plant RNA Meeting, where I was also invited to chair a session.

no actual impacts realised to date
Year(s) Of Engagement Activity 2011
 
Description Invited Research Seminar, Edinburgh University 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Participants in your research or patient groups
Results and Impact As a result of our work on this grant I was invited to give a talk to the Biology Department at Edinburgh University

no actual impacts realised to date
Year(s) Of Engagement Activity 2012
 
Description Invited Research Seminar, Miami University, USA 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Participants in your research or patient groups
Results and Impact As a result of our work in this area, I was invited to speak at Miami University, USA

no actual impacts realised to date
Year(s) Of Engagement Activity 2011
 
Description Invited Research Seminar, Temasek Life Sciences, Singapore 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Participants in your research or patient groups
Results and Impact As a result of our work on this grant I was invited to give a research talk in Singapore

no actual impacts realised to date
Year(s) Of Engagement Activity 2011
 
Description Mar 2017: Seminar at Newcastle University: Identification of novel functional sites in protein domains from the analysis of human variation 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact This was an invited seminar to the Centre for Health and Bioinformatics at Newcastle University. I presented work at the interface between genomics/transcriptomics and protein structure which relied heavily on the software tools we develop and other resources.
Year(s) Of Engagement Activity 2017
URL http://www.ncl.ac.uk/chabi/events/pastevents/item/eventgeoffbarton.html
 
Description Sept 2015: Invited Seminar at TGAC, Norwich 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact I presented a broad range of our research at an invited seminar at The Genome Analysis Centre (TGAC), now the Earlham Institute.
Year(s) Of Engagement Activity 2015