Diversifying Transcription Termination Function

Lead Research Organisation: University of Dundee
Department Name: School of Life Sciences

Abstract

Our genes are encoded in specific regions of DNA in our chromosomes. When genes are switched on, they are copied into a related molecule called RNA. Genes start and stop at specific stretches of a particular strand of the DNA double helix. It turns out that what is copied is not always the same: the sequence being copied can stop (or terminate) at different places. This process is controlled in the cell as a way to tune how much gene expression occurs and what will be coded. For example, it was very recently shown that the rhythms of gene expression that run our body clock are controlled by regulated termination. Despite its importance, termination is the least understood aspect of the copying process.
Unexpectedly, the study of when plants flower has provided insight into ways in which termination can be controlled. GGS's lab recently discovered that the flowering regulator FPA interacts with a protein called Pcfs2. Pcfs2 is related to a protein called Pcf11 that is known to be essential for promoting termination in many organisms, including yeast, flies, worms and humans. What is special about this finding is that GGS's lab discovered flowering plants have evolved two related Pcf11 proteins (Pcfs2, Pcfs4), while yeast, animals and primitive plants appear to have only one. Intriguingly, GGS's lab discovered that these two plant proteins must carry out specialized tasks: one (Pcfs4) is essential to life and does not interact with FPA, whereas the other (Pcfs2) is not essential but does interact with FPA.
The aim of the research proposed here is to work out how and why flowering plants have evolved two related proteins involved in termination and to discover how they work differently. This should provide basic insight into how gene expression is controlled in plants and provide evidence of different ways in which termination can be controlled that should be of wide general interest.
The first objective of this study is to determine whether Pcfs2 and Pcfs4 target all genes or different sub-sets of genes for termination. This can be done using a method called ChIP-seq. We can then tell how these targets relate to a function in termination by sequencing all the RNA in mutants that lack properly functioning Pcfs2 and Pcfs4. In this way we can see at which genes the copying process does not stop properly and how this affects the expression of neighbouring genes. We will integrate different RNA sequencing data to answer this question. One thing we will do that no one has ever done in plants before is sequence the RNA as it is being copied from DNA, so we can immediately see what is happening to termination. Our second objective is to see if the mechanism by which Pcfs4 and Pcfs2 affect termination involves interaction with different proteins because these could be rare examples of regulators of this process.
The GGS and GJB groups form a hugely experienced team in this area - not only in understanding how termination can be regulated, but also in developing breakthroughs in proteomics and RNA-sequencing analysis that are essential to this study and generally useful to other scientists. This work will provide state-of-the-art training for early career scientists working as a team on plants, genetics, proteomics and computational analysis of large datasets.
This work will greatly advance our understanding of novel features of regulated termination and link back to the biology of flowering plants by revealing what genes are controlled by regulated termination. In this way, we will provide underpinning knowledge about how gene expression is controlled in plants essential to our future food and energy security.

Technical Summary

Termination is the least understood aspect of the transcription cycle, but is a newly emerging level at which eukaryotic gene expression is regulated. An unexpected by-product of the genetic analysis of flowering time control has been the identification of a series of factors that enable flowering, and control cleavage, polyadenylation and transcription termination. The analysis of such factors has yielded insight into the regulation of RNA 3' end formation that is of wide general interest.
We recently discovered that the flowering regulator FPA directly interacts with Pcfs2, a protein related to the conserved termination factor Pcf11. In contrast to other eukaryotes that encode a single, essential Pcf11 protein, flowering plants have evolved two proteins related to Pcf11 that are diversified in function: Pcfs2 interacts with FPA and is non-essential, while Pcfs4 does not interact with FPA and is essential. In this proposal we aim to understand how and why the function of a crucial termination factor has diversified in flowering plants.
We will identify the immediate targets of Pcfs2 and Pcfs4 using ChIP-Seq and validate the functional impact of these associations through integration of specialised RNA-sequencing datasets (including nascent transcriptomes) designed to interrogate defective termination. We will combine this approach with a cross-link based proteomics procedure and in vitro binding assays to determine which regulatory proteins Pcfs2 and Pcfs4 associate with.
This research will advance our fundamental understanding of the mechanisms controlling transcription termination and how it can be regulated. The integration of multiple, specialized RNA-Seq reactions and sequencing of the nascent transcriptome will influence the design and interpretation of future transcriptome studies. In the longer term, this study will link back to biology, revealing targets and processes controlled by gene-specific termination.

Planned Impact

1. Cultural Life. Our work defines a new area of science aiming to uncover why flowering plants have evolved two related termination factors diversified in function. This curiosity-led discovery of new knowledge is a feature that the UK public expect of their scientists as GGS experienced when he spoke about non-coding antisense RNAs at a BBSRC organized public engagement event at the Edinburgh International Science Festival.

2. Agricultural Industry. Our work benefits the development of world agriculture in several distinct ways. First, GGS's lab is training a new generation of plant scientists familiar with working with genetics, making crosses and phenotyping plants. Second, we are training biologists used to working in multi-disciplinary teams, combining the iterative interaction of bench scientists with computational biologists. Third, through analysis of huge sequencing datasets we are drawing scientists from mathematical and physics backgrounds into plant biology, who bring with them quite different skill-sets and insight that can be highly beneficial to understanding plant biology and hence crop science. Fourth, we are developing cross-link based in vivo interaction proteomics. In light of the relative challenges in extracting proteins from living plants compared to human cell lines, this development may be particularly beneficial to plant biologists. Finally, GGS's position is jointly supported by The James Hutton Institute. Dedicated crop science colleagues are immediately exposed to the work of his lab in Arabidopsis and this has facilitated translation to biofuel research, barley genetics, potato disease and the timing of raspberry fruit development.

3. World Economy. Dundee takes the training of PhD and Post-Doctoral scientists particularly seriously and has a specific department called "OPD" that delivers "non-bench" training in, for example, public speaking and public engagement. Dundee provides a highly international working environment with staff from over 60 different nationalities working in the College of Life Sciences. Dundee also houses the 3rd largest biotech cluster in the UK with an entrepreneurial culture of spin-out companies from Life Sciences. Together, these aspects of research life in Dundee provide rounded, highly skilled and educated employees to the international work-force. For example, among recent alumni from GGS's lab who came to Dundee from overseas, C. Hornyik has gone on to hold a P.I. position in crop science in the UK and L. Terzi a managerial position in a Swiss pharmaceutical company.

4. Society Through Public Engagement. This proposal relates to fundamental understanding of plant gene expression. The GM controversy highlights the importance of public understanding and support for the research we do. GGS became responsible for Dundee Plant Sciences impact activities in 2010 and since then the Division has successfully developed valuable links with Dundee's Botanic Garden, a hands-on DNA extraction activity to communicate information about plants having genes and a sustainable "Genetics Garden". This has involved all members of GGS's lab and most members of the Division of Plant Sciences. Further links with the Botanic Garden are planned through annual Family Fun Days, Fascination for Plants Days and the annual replanting of variants of our Genetics Garden. In this proposal we specifically describe a computer generated animation of the processes of cleavage, polyadenylation and termination of RNA. Using this, we will also communicate very specifically our research interest in gene expression to the general public. The work of our groups in public engagement is not unique, but part of the culture of Dundee University College of Life Sciences, reflected by the fact that Dundee won the inaugural BBSRC "Excellence with Impact" competition and are participants in the current competition.
 
Description We developed a method to purify protein complexes from plants and devised a new data analysis approach to interpret the findings. Using this approach, we identified FPA co-purifying with many different proteins shown in other species to control where RNAs end. This work provides the clearest view of the proteins that control where RNAs end in plants.
2. Solving the structure of the FPA SPOC domain.
In order to understand how FPA would make interactions with other proteins, we collaborated with Liang Tong's lab at Columbia University New York, USA to solve the structure of the FPA SPOC domain. This structure was published in the open access science journal PLOS One.
3. Understanding how RNA Sequencing analysis tools work with our data.
At the heart of our analysis approach was the use of RNA-Seq. We had previously developed and studied a large dataset of yeast RNA-seq to determine the properties of these data and what available tools were best suited for analysis. However, yeast is not Arabidopsis and, in particular, differs greatly in the degree of complexity with which RNA can be processed. Consequently, we extended this analysis to ask how the complexity of processing events in Arabidopsis would influence the data and the analysis tools. We discovered that, despite these RNA differences, many of the conclusions we had made in analysing yeast RNA-Seq data could be applied to Arabidopsis. This work was published in the scientific journal, Bioinformatics.
4. Detection of spurious antisense RNA.
With the large RNA-Seq datasets that we had accumulated in order to analyse RNA processing, we detected RNA signal coded not only from the DNA strand where genes were, but also on the opposite strand. This type of signal can correspond to molecules known as antisense RNA which can play important regulatory roles in gene expression. However, such signal can be produced as an artefact of the copying process used in making libraries for RNA-Seq. In large scale studies, it is therefore essential to know if one is looking at biology or spurious experimental artefact. This is not so straightforward in plants, because they have a class of enzymes that make an antisense copy of processed mRNA. We were able to distinguish between these possibilities because we included synthetic RNAs spiked into all of our experimental samples. By measuring the antisense signal detected at the synthetic spike-ins, we could show variability in the extent of technical artefact in our data and also in publicly available data, for example data collected by the ENCODE consortium of human genomics researchers, who also included spike-ins in their samples. We developed a software tool called RoSA to correct for this, but the major finding here is that RNA-Seq experiments, which are widely used, should include synthetic spike-ins as a quality control. This work was published in the open access journal F1000.
5. RATs: a software tool to quantify changes in different RNAs.
In an attempt to quantify the impact of FPA on RNA processing we developed a software tool called RATs (Relative Abundance of Transcripts) to address some of the difficulties in working with short read RNA-Seq data. This work was published in the open access journal F1000.
6. Changes in RNA 3' end formation in Pcfs2 and Pcfs4 mutants.
We obtained viable mutants disrupting Pcfs2 and Pcfs4 and used these to study the impact on gene expression by using RNA-Seq analysis. We could show that, consistent with a role in controlling where RNA copies end, much longer RNAs were detected in the mutant backgrounds. We could also show that consistent with the idea that they play different roles, many of the genes affected in these two mutants were different.
7. Mapping the complexity of RNA using nanopore direct RNA sequencing with a special focus on where RNAs end.
Although we used the conventional RNA-Seq approach to study RNA processing, we also pioneered the use of another technology called nanopore direct RNA sequencing (DRS) that was released towards the end of our funding period. The major distinction between these technologies is that in RNA-Seq, RNA is fragmented into small pieces for sequencing and then computationally reconstructed. The authentic reconstruction of these fragments is a difficult and unsolved problem. The advantage of the nanopore approach is that full-length RNA molecules can be sequenced. In this way we can therefore not only map where RNAs end, but also reveal the full context of the RNA that is associated with any particular end. We demonstrated that nanopore DRS maps 3' ends in close agreement with sites that we mapped using another technology (Helicos) that only revealed RNA ends. A major limitation of using RNA-Seq approaches to map 3' ends is that polypurine stretches (A or G) in internal regions can be incorrectly detected as RNA ends through a technical error called internal priming. We demonstrated that internal priming was rare or absent in nanopore DRS. We confirmed that nanopore DRS was quantitative and that we could analyse shifts in where different RNAs from the same gene end in different conditions. Building on our work with antisense RNA signal, we could also show that spurious antisense signal was rare or absent in nanopore data. Our detailed examination of the utility of nanopore direct RNA sequencing was published in the open access journal, eLIFE.
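The internal priming problem described above can be illustrated with a minimal sketch. It flags a putative 3' end when the genomic sequence immediately downstream is purine-rich, which is the situation in which such an end might be a technical artefact rather than a genuine RNA end. This is not the analysis used in the published work: the function names, the 10-base window and the 0.8 threshold are all hypothetical choices made for illustration, and the toy sequence is invented.

```python
def purine_fraction(seq: str) -> float:
    """Fraction of bases in seq that are A or G (case-insensitive)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return sum(base in "AG" for base in seq) / len(seq)


def flag_internal_priming(genome: str, end_pos: int,
                          window: int = 10, threshold: float = 0.8) -> bool:
    """Return True if the window just downstream of a putative 3' end is purine-rich.

    end_pos is the 0-based index of the last base of the RNA on the forward
    strand. window and threshold are hypothetical defaults, not published values.
    """
    downstream = genome[end_pos + 1:end_pos + 1 + window]
    # A truncated window (end of the sequence) cannot be judged, so do not flag it.
    if len(downstream) < window:
        return False
    return purine_fraction(downstream) >= threshold


# Invented toy sequence: a purine-rich internal stretch flanked by pyrimidines.
toy_genome = "CTCTTCTCCTAAGAAAGAAAACTCTCTTCCTCTTCT"

suspect_end = flag_internal_priming(toy_genome, 9)   # purine run lies downstream
clean_end = flag_internal_priming(toy_genome, 21)    # pyrimidine-rich downstream
```

A filter of this kind is only needed for approaches prone to internal priming; the finding reported above is that such artefacts were rare or absent in nanopore DRS data.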
8. New analysis tools to study RNA ends using nanopore direct RNA sequencing.
Because nanopore DRS is such a new technology, software tools and analysis approaches to work with the types of data it generates are limited. Consequently, one of the things we have done is develop our own tools for working with the data. A parallel problem with RNA ends is that they are not well annotated in many genomes, including that of Arabidopsis. To get around this problem, we developed annotation-independent mapping of RNA ends, for example to detect ends in certain features of genes, such as stretches of sequence called introns.
9. Examining the impact of FPA on RNA using nanopore DRS.
Having established the utility of nanopore DRS for studying where RNAs end, we used it to study how FPA controls processing. We had originally proposed to do this with a combination of RNA-Seq approaches. We used nanopore DRS and deep RNA-Seq of lines over-expressing FPA or in which FPA was disrupted. In this way we could identify the impact of FPA on different RNA processing events. This work will be submitted for publication in the near future.
10. Mapping of the sites of m6A using nanopore direct RNA sequencing.
Our work on nanopore DRS involved a collaboration with other members of the group working on a BBSRC funded study of RNA modifications. The most abundant internal modification of mRNA is m6A. We were the first to develop an approach to map m6A using nanopore DRS. This enabled us to map the position of m6A much more precisely than before. This work was published in the open access journal eLIFE.
11. The major impact of m6A is on where mRNAs end.
When genes are switched on, the DNA is copied into a related molecule called RNA. The RNA copy can be stopped at different sites in the gene, so that different parts of the gene are copied - this is an important level at which genes can be controlled and is a process found widely in nature, including in humans. We found that in the absence of m6A, the copies of thousands of genes stopped earlier than normal. These findings suggest that one of the major functions of m6A is to control where gene copies end. The process of ending gene copies is controlled by a multi-protein complex that is made up of related proteins in very different species. However, one component of this complex in plants has a domain which reads m6A, whereas the equivalent protein in humans, for example, does not. Our findings provide an explanation for the evolution of this domain in this protein specifically in plants.
12. Identification of the proteins that read m6A.
We applied our method for purifying proteins in a collaboration with a group in France to identify the proteins that co-purify with proteins that read m6A. This led to the funding of a collaborative award to pursue the function of these proteins during plant stress responses funded by the French Research agency, ANR-Heat-EpiRNA: ANR-17-CE20-0007-04.
13. The approach that we have used to map m6A has already been adopted and adapted by others.
The approach that we developed to map m6A has already been adopted and adapted to map m6A in adenovirus, demonstrating the potential for this approach to be more widely useful. We anticipate that this approach could be used to study m6A in different species and in different conditions, e.g. during stress.
14. We have applied the experience we developed in analysing RNA to transform the annotation of orphan crops.
The investment we made in detailed characterisation of the properties of nanopore DRS in revealing the complexity of RNA processing demonstrated its utility in genome annotation. We used this as background data to win funding from GCRF to transform the annotation of orphan crops. We have collaborated with the African Orphan Crops Consortium in this new programme. Our first study, targeting water yam, has already been released https://phytozome-next.jgi.doe.gov/info/Dalata_v2_1.
15. Insight into a group of disease-causing parasites.
Aside from plants, the only other organisms to have evolved a reader domain in the equivalent protein that controls where RNAs end are the apicomplexa, a group of parasitic protozoa that cause diseases in animals and humans such as toxoplasmosis and malaria. Consequently, the insight and approaches used by us could be translated to alternative drug targets to treat important diseases.
16. A press release relating to the publication of our work in eLIFE is available here:
https://www.lifesci.dundee.ac.uk/news/2020/jan/16/new-approach-revealing-complexity-rnas-genomes-really-encode
17. A post on our eLIFE paper appeared on the Arabidopsis GARNET Community website here:
https://blog.garnetcommunity.org.uk/page/2/
18. A podcast interview of people who worked on the nanopore direct RNA sequencing study is available here
http://blog.garnetcommunity.org.uk/matthew-parker-kasia-knop-and-anya-sherwood-talk-to-the-garnet-community-podcast/
19. A press release of our work with nanopore direct RNA sequencing in orphan crops is available here
https://www.lifesci.dundee.ac.uk/news/2020/jan/21/uncovering-genome-sequence-water-yam-orphan-crop
20. Our analysis of changes in patterns of RNA processing in different backgrounds with different levels of FPA activity revealed that premature transcription termination was widespread in Arabidopsis NLR immune response genes. This work has been submitted to eLife for publication and the corresponding preprint is available at bioRxiv: www.biorxiv.org/content/10.1101/2020.12.15.422694v1.
Exploitation Route 1. The approach that we have used to map m6A has already been adopted and adapted by others.
The approach that we developed to map m6A has already been adopted and adapted to map m6A in adenovirus, demonstrating the potential for this approach to be more widely useful. We anticipate that this approach could be used to study m6A in different species and in different conditions, e.g. during stress.
2. We have applied the experience we developed in analysing RNA to transform the annotation of orphan crops.
The investment we made in detailed characterisation of the properties of nanopore DRS in revealing the complexity of RNA processing demonstrated its utility in genome annotation. We used this as background data to win funding from GCRF to transform the annotation of orphan crops. We have collaborated with the African Orphan Crops Consortium in this new programme. Our first study, targeting water yam, has already been released https://phytozome-next.jgi.doe.gov/info/Dalata_v2_1.
3. Insight into a group of disease-causing parasites.
Aside from plants, the only other organisms to have evolved a reader domain in the equivalent protein that controls where RNAs end are the apicomplexa, a group of parasitic protozoa that cause diseases in animals and humans such as toxoplasmosis and malaria. Consequently, the insight and approaches used by us could be translated to alternative drug targets to treat important diseases.
4. Our discovery of widespread premature transcription termination in NLRs has important implications for those seeking to understand how this crucial gene family is regulated and evolves.
Sectors Agriculture, Food and Drink, Environment, Pharmaceuticals and Medical Biotechnology

URL https://www.lifesci.dundee.ac.uk/news/2020/jan/16/new-approach-revealing-complexity-rnas-genomes-really-encode
 
Description British Council Newton Bhabha Fund
Amount £5,000 (GBP)
Organisation British Council 
Sector Charity/Non Profit
Country United Kingdom
Start 05/2018 
End 08/2018
 
Description Marie Sklodowska-Curie Actions
Amount € 183,455 (EUR)
Organisation European Union 
Sector Public
Country European Union (EU)
Start 03/2017 
End 02/2019
 
Description Projet de Recherche Collaborative (PRC), appel à projets générique 2017 (collaborative research project, generic call 2017)
Amount € 508,572 (EUR)
Funding ID ANR-Heat-EpiRNA: ANR-17-CE20-0007-04 
Organisation National Agency for Research 
Sector Public
Country France
Start 02/2018 
End 01/2022
 
Description Royal Society Newton Advanced Fellowship
Amount £30,000 (GBP)
Organisation The Royal Society 
Sector Charity/Non Profit
Country United Kingdom
Start 12/2017 
End 11/2018
 
Title RATs - Relative Abundance of Transcripts 
Description Who it is for: Anyone working in transcriptomics, analysing gene expression and transcript abundances. What it does: It provides a method to detect changes in the abundance ratios of transcript isoforms of a gene. This is called Differential Transcript Usage (DTU). RATs is workflow-agnostic. Quantification quality details are left to the quantification tools; RATs uses only the transcript abundances. This makes it suitable for use with alignment-free quantification tools like Kallisto or Salmon. It is also compatible with DTE output from Sleuth. RATs is able to take advantage of the bootstrapped quantifications provided by the alignment-free tools. These bootstrapped data are used by RATs to assess how much the technical variability of the heuristic quantifications affects differential transcript usage and thus provide a measure of confidence in the DTU calls. What it needs: This is an R source package, and will run on any platform with a reasonably up-to-date R environment. As input, RATs requires transcript abundance estimates with or without bootstrapping. For convenience, these can also be extracted directly from the output of Sleuth. RATs also requires a look-up table matching transcript identifiers to respective gene identifiers. This can be obtained through various means, one of them being extracting this information from a GTF file. RATs makes use of the data.table and matrixStats packages, as well as ggplot2 and shiny for visualisations. All these are available from CRAN. 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact RATs has generated a lot of interest in the community of people interested in differential transcript identification. A paper describing RATs and its evaluation is in final draft. 
URL https://github.com/bartongroup/RATS
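The idea behind Differential Transcript Usage can be shown with a short sketch: convert each transcript's abundance into its share of the parent gene's total, then compare those shares between two conditions. This is a simplified illustration only. It does not reproduce RATs's statistical testing or its use of bootstrapped quantifications, it is written in Python rather than R, and the function names, the 0.2 minimum shift and the toy abundances are hypothetical.

```python
from collections import defaultdict


def isoform_proportions(abundances, tx2gene):
    """Return each transcript's share of its gene's total abundance.

    abundances: {transcript_id: abundance}
    tx2gene:    {transcript_id: gene_id} look-up table
    """
    gene_totals = defaultdict(float)
    for tx, value in abundances.items():
        gene_totals[tx2gene[tx]] += value
    proportions = {}
    for tx, value in abundances.items():
        total = gene_totals[tx2gene[tx]]
        proportions[tx] = value / total if total > 0 else 0.0
    return proportions


def proportion_shifts(cond_a, cond_b, tx2gene, min_shift=0.2):
    """Report transcripts whose share of the gene changes by at least min_shift.

    min_shift is a hypothetical effect-size cut-off chosen for illustration.
    Both conditions are assumed to contain the same transcript identifiers.
    """
    prop_a = isoform_proportions(cond_a, tx2gene)
    prop_b = isoform_proportions(cond_b, tx2gene)
    shifts = {tx: prop_b[tx] - prop_a[tx] for tx in prop_a}
    return {tx: delta for tx, delta in shifts.items() if abs(delta) >= min_shift}


# Invented toy data: gene g1 switches its dominant isoform, while gene g2
# doubles in overall expression without changing its isoform ratios.
tx2gene = {"g1.1": "g1", "g1.2": "g1", "g2.1": "g2", "g2.2": "g2"}
control = {"g1.1": 80.0, "g1.2": 20.0, "g2.1": 50.0, "g2.2": 50.0}
treated = {"g1.1": 30.0, "g1.2": 70.0, "g2.1": 100.0, "g2.2": 100.0}

shifts = proportion_shifts(control, treated, tx2gene)
```

In this toy case only the two g1 isoforms are reported, which shows why DTU is distinct from a change in overall gene expression: g2 doubles in abundance but its isoform proportions are unchanged.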
 
Title Relative Abundance of Transcripts (RATS) 
Description Who it is for: Anyone working in transcriptomics, analysing gene expression and transcript abundances. What it does: It provides a method to detect changes in the abundance ratios of transcript isoforms of a gene. This is called Differential Transcript Usage (DTU). RATs is workflow-agnostic. Quantification quality details are left to the quantification tools; RATs uses only the transcript abundances, which you can obtain using any tool you like. This makes it suitable for use with alignment-free quantification tools like Kallisto or Salmon. RATs is able to take advantage of the bootstrapped quantifications provided by the alignment-free tools. These bootstrapped data are used by RATs to assess how much the technical variability of the heuristic quantifications affects differential transcript usage and thus provide a measure of confidence in the DTU calls. What it needs: This is an R source package, and will run on any platform with a reasonably up-to-date R environment. A few third-party R packages are also required. As input, RATs requires transcript abundance estimates with or without bootstrapping. The format either way is tables with the samples as columns and the transcripts as rows. An extra column holds the transcript IDs. Some functionality to create these from Salmon or Kallisto quantification files is provided by RATs. RATs also requires a look-up table matching the transcript identifiers to the respective gene identifiers. This can be obtained through various means, one of them being extracting this information from a GTF file using functionality provided by RATs. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact RATS allows transcript abundance to be estimated from short read (Illumina) data. It is unique in providing a confidence estimate for such analyses. 
 
Title Relative Abundance of Transcripts (rats) 
Description Who it is for: Anyone working in transcriptomics, analysing gene expression and transcript abundances. What it does: It provides a method to detect changes in the abundance ratios of transcript isoforms of a gene. This is called Differential Transcript Usage (DTU). RATs is workflow-agnostic. Quantification quality details are left to the quantification tools; RATs uses only the transcript abundances, which you can obtain using any tool you like. This makes it suitable for use with alignment-free quantification tools like Kallisto or Salmon. RATs is able to take advantage of the bootstrapped quantifications provided by the alignment-free tools. These bootstrapped data are used by RATs to assess how much the technical variability of the heuristic quantifications affects differential transcript usage and thus provide a measure of confidence in the DTU calls. What it needs: This is an R source package, and will run on any platform with a reasonably up-to-date R environment. A few third-party R packages are also required. As input, RATs requires transcript abundance estimates with or without bootstrapping. The format either way is tables with the samples as columns and the transcripts as rows. An extra column holds the transcript IDs. Some functionality to create these from Salmon or Kallisto quantification files is provided by RATs. RATs also requires a look-up table matching the transcript identifiers to the respective gene identifiers. This can be obtained through various means, one of them being extracting this information from a GTF file using functionality provided by RATs. 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact This allows the analysis of alternative transcripts from RNA-seq data and has been used by a number of groups in their research. 
URL https://github.com/bartongroup/RATS
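The look-up table matching transcript identifiers to gene identifiers can be built from the attribute column of a GTF file. The sketch below shows one way to do this in Python. It is not the functionality provided by RATs itself: the function name is hypothetical, the example records are invented, and it assumes the common GTF convention of tab-separated fields with `key "value";` attribute pairs in the ninth column.

```python
import re

# Matches attribute pairs of the form: key "value"
_ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def tx2gene_from_gtf_lines(lines):
    """Build a {transcript_id: gene_id} look-up table from GTF-formatted lines."""
    mapping = {}
    for line in lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attributes = dict(_ATTRIBUTE.findall(fields[8]))
        transcript_id = attributes.get("transcript_id")
        gene_id = attributes.get("gene_id")
        # Gene-level records carry no (or an empty) transcript_id; skip them.
        if transcript_id and gene_id:
            mapping[transcript_id] = gene_id
    return mapping


# Invented example records using hypothetical identifiers.
example_gtf = [
    "# toy annotation",
    'chr1\tsrc\tgene\t1\t900\t.\t+\t.\tgene_id "geneA";',
    'chr1\tsrc\ttranscript\t1\t900\t.\t+\t.\tgene_id "geneA"; transcript_id "geneA.1";',
    'chr1\tsrc\texon\t1\t400\t.\t+\t.\tgene_id "geneA"; transcript_id "geneA.1";',
    'chr1\tsrc\ttranscript\t1\t600\t.\t+\t.\tgene_id "geneA"; transcript_id "geneA.2";',
    'chr2\tsrc\ttranscript\t5\t300\t.\t-\t.\tgene_id "geneB"; transcript_id "geneB.1";',
]

lookup = tx2gene_from_gtf_lines(example_gtf)
```

To read a real annotation, the same function can be passed an open file handle, since it only needs an iterable of lines.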
 
Title RoSA 
Description RoSA: a tool for the Removal of Spurious Antisense. In stranded RNA-Seq experiments we have the opportunity to detect and measure antisense transcription, important since antisense transcripts impact gene transcription in several different ways. Stranded RNA-Seq determines the strand from which an RNA fragment originates, and so can be used to identify where antisense transcription may be implicated in gene regulation. However, spurious antisense reads are often present in experiments, and can manifest at levels greater than 1% of sense transcript levels. This is enough to disrupt analyses by causing false antisense counts to dominate the set of genes with high antisense transcription levels. The RoSA (Removal of Spurious Antisense) tool detects the presence of high levels of spurious antisense transcripts, by: analysing ERCC spike-in data to find the ratio of antisense:sense transcripts in the spike-ins; or using antisense and sense counts around splice sites to provide a set of gene-specific estimates; or both. Once RoSA has an estimate of the spurious antisense, expressed as a ratio of antisense:sense counts, RoSA will calculate a correction to the antisense counts based on the ratio. Where a gene-specific estimate is available for a gene, it will be used in preference to the global estimate obtained from either spike-ins or spliced reads. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact This tool has improved the ability to interpret RNA-seq data for antisense analysis. 
 
Title RoSA: a tool for the Removal of Spurious Antisense 
Description In stranded RNA-Seq experiments we have the opportunity to detect and measure antisense transcription, important since antisense transcripts impact gene transcription in several different ways. Stranded RNA-Seq determines the strand from which an RNA fragment originates, and so can be used to identify where antisense transcription may be implicated in gene regulation. However, spurious antisense reads are often present in experiments, and can manifest at levels greater than 1% of sense transcript levels. This is enough to disrupt analyses by causing false antisense counts to dominate the set of genes with high antisense transcription levels. The RoSA (Removal of Spurious Antisense) tool detects the presence of high levels of spurious antisense transcripts, by: analysing ERCC spike-in data to find the ratio of antisense:sense transcripts in the spike-ins; or using antisense and sense counts around splice sites to provide a set of gene-specific estimates; or both. Once RoSA has an estimate of the spurious antisense, expressed as a ratio of antisense:sense counts, RoSA will calculate a correction to the antisense counts based on the ratio. Where a gene-specific estimate is available for a gene, it will be used in preference to the global estimate obtained from either spike-ins or spliced reads. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact It has enabled a number of groups to identify spurious antisense in their RNA-seq data. 
URL https://doi.org/10.5281/zenodo.2661378
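
The correction described above can be illustrated with a minimal sketch. This is not RoSA's actual code: the function names, the data layout, and in particular the subtraction rule (removing ratio x sense counts from the antisense counts and flooring at zero) are assumptions made for illustration. The description only states that a correction is calculated from the antisense:sense ratio and that a gene-specific estimate is preferred over the global one.

```python
from typing import Dict, Optional, Tuple

# name -> (sense count, antisense count); hypothetical layout for this sketch
Counts = Dict[str, Tuple[int, int]]


def global_ratio(spike_ins: Counts) -> float:
    """Estimate the spurious antisense:sense ratio from spike-in counts.

    Spike-ins have no genuine antisense transcription, so any antisense
    reads mapped to them are treated as spurious.
    """
    sense = sum(s for s, _ in spike_ins.values())
    antisense = sum(a for _, a in spike_ins.values())
    if sense == 0:
        raise ValueError("spike-ins contain no sense counts")
    return antisense / sense


def correct_antisense(
    genes: Counts,
    fallback_ratio: float,
    gene_ratios: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Return corrected antisense counts per gene.

    A gene-specific ratio is used when available; otherwise the global
    ratio is used. The subtraction rule is an assumption of this sketch.
    """
    gene_ratios = gene_ratios or {}
    corrected = {}
    for gene, (sense, antisense) in genes.items():
        ratio = gene_ratios.get(gene, fallback_ratio)
        # Remove the expected spurious share; never report negative counts.
        corrected[gene] = max(0.0, antisense - ratio * sense)
    return corrected


if __name__ == "__main__":
    spike_ins = {"ERCC-1": (1000, 10), "ERCC-2": (3000, 30)}
    genes = {"geneA": (2000, 50), "geneB": (500, 2)}
    ratio = global_ratio(spike_ins)
    print(correct_antisense(genes, ratio, gene_ratios={"geneB": 0.02}))
```

In this toy example the spike-ins give a global ratio of 1%, so a gene with 2000 sense and 50 antisense counts keeps about 30 antisense counts, while a gene whose antisense counts fall below its expected spurious share is set to zero.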
 
Title profDGE48 - Code base for profiling highly replicated differential gene expression RNA-seq 
Description profDGE48 is the code that has been used to understand the relationship between replication and power in RNA-seq analysis. The code also allows the comparison of different methods of calling differential gene expression (DGE) by RNA-seq. This code was central to the high-impact work on RNA-seq published in the journal RNA (Schurch et al). 
Type Of Technology Software 
Year Produced 2015 
Open Source License? Yes  
Impact The software enabled work on RNA-seq to be completed and underpins our understanding of the technique for experimental design decisions made in this grant and all subsequent Simpson/Barton collaborative grants. The principles set out by the software have been widely adopted by the academic community. 
URL https://github.com/bartongroup/profDGE48
 
Title profDGE48 (2016 Update) - Code base for profiling highly replicated differential gene expression RNA-seq 
Description profDGE48 is the code that has been used to understand the relationship between replication and power in RNA-seq analysis. The code also allows the comparison of different methods of calling differential gene expression (DGE) by RNA-seq. This code was central to the high-impact work on RNA-seq published in the journal RNA (Schurch et al). This version includes bug fixes and updates to the work from 2015. 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact The software enabled work on RNA-seq to be completed and underpins our understanding of the technique for experimental design decisions made in this grant and all subsequent Simpson/Barton collaborative grants. The principles set out by the software have been widely adopted by the academic community. 
URL https://github.com/bartongroup/profDGE48
 
Description Dec 2016 Oxford: Seminar at WTCHG/SGC: Identification of novel functional sites in protein domains from the analysis of human variation 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact This was an invited seminar at the Oxford SGC and WTCHG institutes in the Department of Medicine. The seminar described work that covered most of our funded research activities.
Year(s) Of Engagement Activity 2016
URL https://talks.ox.ac.uk/talks/id/7b03765b-6d8a-45c0-bbb1-e570a70377ff/
 
Description Fascination of Plants Day - "Plant Power" Dundee Botanic Garden 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact We host scientific engagement activities run by scientists in the Division of Plant Sciences. In recent years we have invited staff from The James Hutton Institute to be involved. We have a day of activities, but in addition, we have long-lasting displays that include a "Genetics Garden" where we have planted barley mutants along a chromosome, wild ancestors and modern-day cultivated varieties of barley, and Mendel's original pea mutants. The garden and associated information boards are visible for most of the year. The garden is funded in part by my BBSRC grants. The Garden itself has been profiled on BBC Scotland. The Botanic Garden receives 80,000 visitors per year, and the open day activities attract between 650 and 1500 people per event.
Year(s) Of Engagement Activity 2012,2013,2014,2015,2016,2017
 
Description Feb 2017: Seattle: What can human variation tell us about protein structure? 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This was an invited talk at the genevariation3d workshop at the Institute for Systems Biology, which brought together scientists from the genomics/personalised medicine field and the field of protein structure analysis. I presented our work at this interface, which is built on Jalview and the Dundee Resource and inspired by our research in plant biology.
Year(s) Of Engagement Activity 2017
URL http://genevariation3d.org/
 
Description Gatsby Charitable Trust Plant Science Master Class 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact I organised a Masterclass for local schoolchildren in S5 and S6 on "New technologies", with talks on RNA-Seq analysis and proteomics and a tour of the proteomics facility. Members of my lab, Geoff Barton's lab and the proteomics facility were involved. The Dundee series of Plant Science Master Classes has been used as a case study by the Gatsby Charitable Trust for the success of the programme as a whole.
Year(s) Of Engagement Activity 2017
 
Description Genetics Garden 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact As part of the Pathways to Impact of this proposal we said we would work together with Dundee University Botanic Garden to develop a "Genetics Garden" for public engagement. With funding from the Dundee University College of Life Sciences BBSRC Excellence with Impact award we have developed this garden. This was opened to the public on 27th June 2013.

I developed a Genetics Garden as a hub for Plant Sciences Public Engagement Activity (which I lead). Since its creation in 2013 we have directly en

no actual impacts realised to date
Year(s) Of Engagement Activity 2013,2014,2015
URL http://www.bbc.co.uk/programmes/b0674xf1
 
Description Mar 2017: Seminar at Newcastle University: Identification of novel functional sites in protein domains from the analysis of human variation 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact This was an invited seminar to the Centre for Health and Bioinformatics at Newcastle University. I presented work at the interface between genomics/transcriptomics and protein structure which relied heavily on the software tools we develop and other resources.
Year(s) Of Engagement Activity 2017
URL http://www.ncl.ac.uk/chabi/events/pastevents/item/eventgeoffbarton.html
 
Description March 2016: SLS Symposium Poster - with Kimon Froussios: Statistical Models for RNA-seq expression measurements and evaluation of Differential Gene Expression analysis tools on 16 replicates of a model plant 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact This was a poster that described the research now included in our paper on the same topic.
Year(s) Of Engagement Activity 2016
 
Description Nov 2016: Talk by Kimon Froussios at RNA Discussion Meeting, James Hutton Institute 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact This was a talk on techniques developed in our group for the analysis of transcript abundance in Arabidopsis and other species. A short talk was given presenting a new computational tool to an audience composed of researchers most likely to use such tools. Discussions ensued with regards to its capabilities and comparison to existing similar tools.
Year(s) Of Engagement Activity 2016
 
Description Oct 2016 Poster: GRE Symposium: with Kimon Froussios: Transcript isoform switching in RNA-seq data 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact Poster describing techniques developed to analyse isoform switching in Arabidopsis and other complex eukaryotes.
Year(s) Of Engagement Activity 2016
 
Description Presentation to Central South University, Changsha, China 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact I gave a talk about our recent research to a mixed audience of scientists and students at CSU, Changsha.
Year(s) Of Engagement Activity 2017
 
Description Seminar at Garvan Institute, Sydney, Australia 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Presented recent research work to staff at the Garvan Institute and others in the region.
Year(s) Of Engagement Activity 2017
 
Description Seminar at Newcastle University, UK 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Presented recent research from my group to Research Group Leaders, Postdocs, Ph.D. Students, Undergraduates.
Year(s) Of Engagement Activity 2017
 
Description Seminar at Sydney University, Australia 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Presentation about my group's research to the University of Sydney.
Year(s) Of Engagement Activity 2017
 
Description Seminar at Wellcome Trust Centre for Human Genetics, University of Oxford 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Seminar describing wide range of research from my group.
Year(s) Of Engagement Activity 2016
URL https://talks.ox.ac.uk/talks/id/7b03765b-6d8a-45c0-bbb1-e570a70377ff/
 
Description Seminar to Free University of Amsterdam, Netherlands 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Presented recent research from my group to Research Group Leaders, Postdocs, Ph.D. Students, Undergraduates.
Year(s) Of Engagement Activity 2017
 
Description Sept 2015: Invited Seminar at TGAC, Norwich 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact I presented a broad range of our research at an invited seminar at The Genome Analysis Centre (TGAC), now the Earlham Institute.
Year(s) Of Engagement Activity 2015