Construction and refinement of Reference Transcript Dataset annotations for fast and accurate transcript quantification in barley and potato

Lead Research Organisation: University of Dundee
Department Name: School of Life Sciences


In the model plant Arabidopsis, we have taken a new and successful approach to analysing gene expression from high-throughput sequencing of RNA (RNA-seq). The high incidence of alternative splicing (AS) in plants (found in >60% of intron-containing genes) requires methods that can distinguish and quantify AS variants, or isoforms. We have adopted an approach used by scientists analysing gene expression and AS in human cancers. It uses programs such as Salmon, which do not require mapping of reads to a genome but instead use a Reference Transcript Dataset (RTD) to quantify transcripts. We have generated an RTD for Arabidopsis (Zhang et al., 2015, 2016), which involved developing in-house pipelines for removing redundancy in the RTD. Using this system with RNA-seq data from a time-course of plants transferred from normal to cold temperatures has already demonstrated transcript-specific expression, including AS responses to cold, and identified new genes involved in the cold response. The PhD project will develop computational pipelines for the construction of RTDs in crop plants (potato and barley), which will be tested with RNA-seq datasets. This will require the development of algorithms for the pipelines and for various downstream analyses. The ability to generate transcript-specific and allele-specific expression data will greatly enhance our ability to analyse gene expression and identify key genes in plant and crop processes such as abiotic and biotic stress responses.



Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name
BB/M010996/1 | | | 01/10/2015 | 30/09/2023 |
1785562 | Studentship | BB/M010996/1 | 01/09/2016 | 31/08/2020 | Juan Carlos Juan Entizne
Description Among the most significant achievements from my award is the development of software tools for the analysis of transcriptomic data. These programs allow fast and accurate analysis of high-throughput sequencing data. They will be useful to any researcher who wishes to analyse the transcriptome of a poorly annotated organism.

So far, I have met the objective of generating an improved transcriptome annotation for the Double-Monoploid potato cultivar (called StRTD, version Nov2018). Although still a prototype, it already shows a considerable increase in the number of annotated transcripts and genes compared with the only annotation currently available for potato. Once I have further improved my programs, I will be able to develop high-quality transcriptome annotations for other agricultural crops that are currently poorly annotated, such as barley, lettuce and tomato.
Exploitation Route Because the programs are modular and independent of one another, the results obtained so far in this award can easily be taken forward by future Ph.D. students and/or postdoctoral researchers. For example, new protocols to integrate data from other novel sequencing technologies (e.g. PacBio) can be developed and applied in parallel with, or subsequent to, these programs, generating even more accurate and/or complete transcriptome annotations.

Furthermore, since I am designing these programs to be user-friendly, they can be used by any researcher who wishes to generate a high-quality, comprehensive transcriptome annotation for almost any organism they are interested in.
Sectors Agriculture, Food and Drink, Manufacturing, including Industrial Biotechnology, Pharmaceuticals and Medical Biotechnology

Description Construction and refinement of Reference Transcript Dataset annotations for fast and accurate transcript quantification in barley and potato
Amount £58,000 (GBP)
Funding ID 1785562 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 09/2016 
End 08/2020
Title Program for accurate translations of transcripts by fixing starting codon 
Description For a group of transcripts belonging to the same gene, this program finds the start codon that translates into the longest ORF. It then uses that start codon to try to translate all the related transcripts (transcripts coming from the same gene).
Type Of Technology Software 
Year Produced 2018 
Impact Most translation programs translate the longest ORF of each transcript. However, this is not completely biologically accurate. For example, such an approach ignores the presence of premature termination codons (PTCs): transcripts that contain a PTC are biologically expected to have their translation terminated early rather than be translated into their longest possible version.
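The idea described above can be illustrated with a minimal Python sketch. It is not the program itself: the function names, the forward-strand-only coordinates, the 0-based half-open exon intervals and the toy sequence are all assumptions made for illustration. The sketch picks, across all transcripts of a gene, the ATG (by genomic position) that gives the longest ORF, then translates every transcript from that same position up to its first in-frame stop codon.

```python
STOPS = {"TAA", "TAG", "TGA"}

def spliced(genome, exons):
    """Return the spliced sequence and the genomic coordinate of each base.
    Exons are 0-based, half-open (start, end) intervals on the forward strand."""
    seq, coords = [], []
    for start, end in exons:
        seq.append(genome[start:end])
        coords.extend(range(start, end))
    return "".join(seq), coords

def orf_length(seq, start):
    """Count codons from `start` up to (not including) the first in-frame stop."""
    n = 0
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            break
        n += 1
    return n

def best_gene_start(genome, transcripts):
    """Genomic position of the ATG giving the longest ORF in any transcript."""
    best_len, best_pos = 0, None
    for exons in transcripts.values():
        seq, coords = spliced(genome, exons)
        for i in range(len(seq) - 2):
            if seq[i:i + 3] == "ATG":
                n = orf_length(seq, i)
                if n > best_len:
                    best_len, best_pos = n, coords[i]
    return best_pos

def orfs_from_shared_start(genome, transcripts):
    """Translate every transcript of the gene from the shared start codon.
    Transcripts that lack an intact ATG at that position get None."""
    start = best_gene_start(genome, transcripts)
    result = {}
    for name, exons in transcripts.items():
        seq, coords = spliced(genome, exons)
        if start in coords and seq[coords.index(start):][:3] == "ATG":
            i = coords.index(start)
            result[name] = seq[i:i + 3 * orf_length(seq, i)]
        else:
            result[name] = None
    return result

# Toy gene (hypothetical sequence): t2 skips 4 bases, which shifts the frame
# and introduces a premature stop; t3 lacks the shared start codon.
genome = "GGATGGCTCACCTGACATGCGTTTAACCTAGGG"
transcripts = {
    "t1": [(0, 33)],
    "t2": [(0, 8), (12, 33)],
    "t3": [(12, 33)],
}
orfs = orfs_from_shared_start(genome, transcripts)
```

In this toy case, a per-transcript "longest ORF" rule would pick the downstream ATG in t2 (a 4-codon ORF), whereas the shared start codon yields the 2-codon ORF that ends at the premature stop.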
Title Quality assessment of transcript models assembled from RNA-seq data for the identification of chimeric, fragmentary and redundant transcripts 
Description This program takes multiple transcriptome annotations as input and cross-references their transcript models. It then applies a series of criteria to classify conflicting transcript models as chimeric, fragmentary or redundant. Finally, the program merges the accepted (non-conflicting) models into a single transcriptome annotation.
Type Of Technology Software 
Year Produced 2018 
Impact This program makes it possible to integrate transcriptome information from multiple sources, whether previously curated annotations or newly assembled ones. The annotation generated by this program contains only high-confidence models, which allows accurate differential gene and transcript expression analysis with other bioinformatic tools that depend on annotated transcriptomes.
Description Presentation of my computational tool for the creation of RTDs at the James Hutton Institute and at the School of Life Sciences (University of Dundee)
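The classification step described above could look roughly like the following Python sketch. The actual criteria used by the program are not stated here, so the three rules below are assumptions chosen for illustration: a model is "chimeric" if it spans more than one known gene locus, "redundant" if its intron chain matches an already accepted model, and a "fragment" if its intron chain is a contiguous part of a longer model's chain. All names and coordinates are hypothetical.

```python
def intron_chain(exons):
    """Tuple of (intron_start, intron_end) pairs implied by sorted exons."""
    exons = sorted(exons)
    return tuple((a[1], b[0]) for a, b in zip(exons, exons[1:]))

def is_subchain(small, big):
    """True if `small` is a contiguous, strictly shorter part of `big`."""
    n = len(small)
    return 0 < n < len(big) and any(
        big[i:i + n] == small for i in range(len(big) - n + 1)
    )

def overlaps(a, b):
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]

def classify(models, genes):
    """Label each model as chimeric, redundant, fragment or accepted.
    models: {name: [(exon_start, exon_end), ...]}
    genes:  {gene_id: (start, end)}, assumed not to overlap each other.
    Single-exon models have no introns and are only checked for chimerism."""
    labels = {}
    for name, exons in models.items():
        span = (min(s for s, _ in exons), max(e for _, e in exons))
        if sum(overlaps(span, g) for g in genes.values()) > 1:
            labels[name] = "chimeric"
    # Compare intron chains among the remaining (non-chimeric) models only.
    kept = {n: intron_chain(e) for n, e in models.items() if n not in labels}
    seen = set()
    for name, chain in kept.items():
        if chain and chain in seen:
            labels[name] = "redundant"
        elif any(is_subchain(chain, other) for other in kept.values()):
            labels[name] = "fragment"
        else:
            labels[name] = "accepted"
            seen.add(chain)
    return labels

# Hypothetical loci and transcript models pooled from several annotations.
genes = {"G1": (0, 1000), "G2": (2000, 3000)}
models = {
    "a": [(100, 200), (300, 400), (500, 600)],
    "b": [(150, 200), (300, 400), (500, 650)],  # same introns as "a"
    "c": [(320, 400), (500, 600)],              # partial copy of "a"
    "d": [(500, 600), (2100, 2200)],            # spans G1 and G2
    "e": [(2100, 2200), (2300, 2400)],
}
labels = classify(models, genes)
merged = sorted(name for name, label in labels.items() if label == "accepted")
```

Here only models "a" and "e" would be carried into the merged annotation; the order in which models are visited decides which of two redundant models is kept, a choice a real pipeline would need to make explicitly.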
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact My project is leading to the development of a computational tool for the creation of high-quality transcriptome annotations. I presented my tool at both the James Hutton Institute and the University of Dundee. I also presented an improved transcriptome annotation for the Double-Monoploid potato cultivar, generated with my tool. Both audiences expressed interest in the tool and asked when it will be available for use and to which organisms it can be applied.
Year(s) Of Engagement Activity 2018