An automated pipeline for construction of Reference Transcript Datasets (RTD) to enable rapid and accurate gene expression analysis in plant species
Lead Research Organisation:
The James Hutton Institute
Department Name: Information & Computational Sciences
Abstract
A gene is the basic physical and functional unit of the genome. Genes are turned on and off at different times during development and in response to external and internal signals. Protein-coding genes are copied (transcribed) into precursor messenger RNAs (pre-mRNAs), which are then processed in different ways into mRNAs that can be translated into proteins. A goal of biological research is to understand how genes work by measuring changes in gene expression. This is achieved by estimating the abundances of all of the transcripts produced at any particular time or condition.
The current technology for measuring gene and transcript expression is RNA sequencing (RNA-seq), which sequences millions of transcripts so that RNA levels can be measured on a genome-wide scale. The two main platforms are Illumina, which generates short reads (currently 75 to 250 bp), and PacBio/Nanopore single-molecule sequencing, which produces full-length transcript reads. To measure gene expression, Illumina short reads are often mapped to the genome and assembled into transcripts, which is an inaccurate process. PacBio/Nanopore reads have high sequencing error rates and do not generate sufficient depth of coverage of genes. These technologies, in terms of both chemistry and computational analysis, continue to advance at a rapid pace, but a combination of the platforms is currently the best approach to generating RNA-seq data. In addition, the fastest and most accurate programs for computational quantification of transcript and gene expression require a comprehensive catalogue of transcripts, which we call a Reference Transcript Dataset (RTD).
Over the last four years, we developed an RTD for Arabidopsis (AtRTD2) based on extensive Illumina short read sequences. Through a series of iterations, we developed the computational methods to identify and retain high confidence transcripts while removing false transcripts. AtRTD2 greatly increased the accuracy of the quantification allowing, for example, identification of novel transcription and splicing factors in response to cold. The challenge now is to translate this knowledge and experience to other plant and crop (and animal) species. Currently, transcript sequence catalogues for most plant species are incomplete, missing large numbers of transcripts, and for those with RNA-seq data, out-of-date analysis procedures have produced large numbers of false transcripts.
From developing AtRTD2, we have a prototype pipeline for constructing an RTD. The key features are multiple quality control filters which remove mis-assembled transcripts, redundant transcripts, chimaeric transcripts and transcript fragments. These multiple, iterative steps are currently individually coded and while the pipeline can be used, it will take up to 12 months to generate an RTD and requires the full-time expertise of a bioinformatician.
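As an illustration of the kind of filter described above, the following is a minimal sketch of removing redundant transcripts and transcript fragments. It is not the AtRTD2 or RTDBox implementation: the `Transcript` class, the `filter_transcripts` function and the rule used (transcripts with an identical intron chain are redundant; a transcript whose intron chain is a contiguous part of a longer chain is a fragment) are assumptions made for the example, and all transcripts are assumed to come from one locus on one strand.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end)


@dataclass(frozen=True)
class Transcript:
    """Hypothetical transcript model: an ID plus sorted, non-overlapping exons."""
    tid: str
    exons: Tuple[Interval, ...]

    @property
    def introns(self) -> Tuple[Interval, ...]:
        # The gaps between consecutive exons form the intron chain.
        return tuple((a[1] + 1, b[0] - 1) for a, b in zip(self.exons, self.exons[1:]))

    @property
    def length(self) -> int:
        return sum(end - start + 1 for start, end in self.exons)


def is_fragment(short: Tuple[Interval, ...], full: Tuple[Interval, ...]) -> bool:
    """True if `short` is a contiguous, strictly shorter part of `full`."""
    n, m = len(short), len(full)
    return 0 < n < m and any(full[i:i + n] == short for i in range(m - n + 1))


def filter_transcripts(transcripts: List[Transcript]) -> List[Transcript]:
    # Step 1 (redundancy): for each distinct intron chain keep the longest transcript.
    best: Dict[Tuple[Interval, ...], Transcript] = {}
    mono_exonic: List[Transcript] = []
    for t in transcripts:
        if not t.introns:
            mono_exonic.append(t)  # mono-exonic models are left untouched in this sketch
            continue
        current = best.get(t.introns)
        if current is None or t.length > current.length:
            best[t.introns] = t
    kept = list(best.values())
    # Step 2 (fragments): drop transcripts whose intron chain sits inside a longer one.
    multi = [t for t in kept if not any(is_fragment(t.introns, o.introns) for o in kept)]
    return multi + mono_exonic


# Made-up example models for a single locus.
models = [
    Transcript("T1", ((1, 100), (201, 300), (401, 500))),
    Transcript("T2", ((50, 100), (201, 300), (401, 450))),  # same introns as T1, shorter
    Transcript("T3", ((220, 300), (401, 480))),             # fragment of T1
    Transcript("T4", ((1, 100), (251, 300), (401, 500))),   # alternative 3' splice site
    Transcript("T5", ((600, 700),)),                        # mono-exonic
]
retained = [t.tid for t in filter_transcripts(models)]
```

Comparing intron chains rather than full exon coordinates means that transcripts differing only in their ends are treated as the same isoform; a production filter would also need to handle chimaeric and mis-assembled transcripts, which this sketch does not attempt.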
We will develop a fully automated pipeline (RTDBox) which can be used by scientists with basic bioinformatics skills or bioinformaticians with little experience in transcriptomics. The pipeline will also be designed to allow the incremental improvement of the RTD with the automatic incorporation of any new RNA-seq data (Illumina, PacBio, Nanopore). Within the pipeline, we will develop a transcript evaluation suite (TES) which will provide evaluation metrics to help biologists identify and remove mis-constructed transcripts from assembly programs as well as understand the quality and completeness of the RTD generated. All our experience and expertise will be brought together to make user-friendly software that allows plant scientists to measure gene expression more accurately, thereby improving the exploration of biological processes across the globe.
Technical Summary
For the majority of plant and crop species, transcript information is incomplete and poorly annotated. AtRTD2 shows the feasibility of building a comprehensive RTD and both Illumina and PacBio/Nanopore are required for complete and comprehensive RTD construction. We have the necessary knowledge and expertise to produce an automated, easy-to-use pipeline for building RTDs and allowing incorporation of new RNA-seq datasets as they arise.
The automated pipeline and software will be designed for use by scientists with basic bioinformatics skills or bioinformaticians with little experience in transcriptomics. RTDBox will be available in several formats, on different platforms, to provide flexible access: 1) A local Galaxy server will allow users to upload sequence data, run the pipeline and download the RTD directly; 2) The pipeline will be set up on publicly available platforms, such as Cyverse (https://www.cyverse.org/) and GigaGalaxy (http://gigagalaxy.net/); 3) The wrapped pipeline will also be available in the Galaxy Toolshed for download and installation by groups with local Galaxy infrastructure who prefer to keep their data private; 4) The pipeline will also be wrapped in Docker containers so that it can be downloaded and run under Unix. It will have a modular construction covering the major functions: uploading RNA-seq data, quality control and trimming (if needed), read mapping and transcript assembly using different assembly programs. Separate automated pipelines for Illumina short-read and single-molecule sequencing will be included along with stringent quality controls such as splice junction assessment (archived through SJ and SJ phase databases). Merging of different assemblies (new and existing) and further quality control to remove redundancy, fragments etc. are performed in the Transcript Evaluation Suite (TES). TES provides evaluation metrics to help biologists understand the quality and completeness of the RTD generated.
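The splice junction assessment mentioned above could, for example, take the following form. This is a hedged sketch only: the function name `junction_ok`, the set of accepted splice-site dinucleotides and the `min_reads` threshold of 3 are illustrative assumptions rather than the criteria used in RTDBox, and only forward-strand introns are handled.

```python
# Donor/acceptor dinucleotide pairs accepted in this sketch
# (the common GT-AG pair plus the rarer GC-AG and AT-AC pairs).
CANONICAL_SITES = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}


def junction_ok(genome_seq: str, intron_start: int, intron_end: int,
                supporting_reads: int, min_reads: int = 3) -> bool:
    """Accept a splice junction if its intron has canonical splice sites and
    enough supporting reads.

    intron_start and intron_end are 1-based, inclusive, forward-strand
    coordinates of the first and last base of the intron.
    min_reads is a hypothetical threshold chosen for illustration.
    """
    donor = genome_seq[intron_start - 1:intron_start + 1].upper()   # first 2 intron bases
    acceptor = genome_seq[intron_end - 2:intron_end].upper()        # last 2 intron bases
    return (donor, acceptor) in CANONICAL_SITES and supporting_reads >= min_reads


# Toy sequence: exon (1-5), intron (6-15, GT...AG), exon (16-20).
toy_genome = "AAAAA" + "GT" + "CCCCCC" + "AG" + "TTTTT"
well_supported = junction_ok(toy_genome, 6, 15, supporting_reads=5)
poorly_supported = junction_ok(toy_genome, 6, 15, supporting_reads=1)
non_canonical = junction_ok(toy_genome, 7, 15, supporting_reads=5)
```

A transcript would then be retained only if every one of its junctions passes such a check; reverse-strand introns would need the reverse complement of the dinucleotides.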
Planned Impact
The main output of this work will be development and provision of the automated computational pipeline, RTDBox, to construct high quality RTDs for the plant research community and beyond. The major impact will be the uptake of the RTDBox by different plant communities to generate RTDs for different plant species, cultivars or ecotypes. We envisage two significant primary impacts of the pipeline and software:
1. the ability of plant researchers to carry out high quality RNA-seq analysis of gene expression more quickly and accurately to improve understanding of gene regulation and identification of novel genes in biological processes.
2. the means to evaluate the quality of existing and future transcript assemblies. Current literature and databases contain thousands of mis-annotated transcript isoforms with insufficient quality control; the pipeline will permit rapid re-analysis and clean-up of such data as part of processing of a new RTD for analysis of RNA-seq.
The main challenge is to raise awareness of the importance and opportunities of having high quality, comprehensive RTDs. To ensure speedy uptake and exploitation of RTDs, we have three Impact Objectives:
1. Inform the plant community of the value of the use of the RTD well ahead of a primary release of RTDBox allowing groups to design and plan RNA-seq experiments and even apply for funding to make an RTD.
2. Inform the plant community of the value of working at the transcript level for differential expression data analyses including AS and improving accuracy of downstream analyses (e.g. gene and splicing networks).
3. Release the RTDBox to the plant community as soon as possible through a range of platforms for ease of access and monitor uptake.
To achieve these objectives, we have four Impact Activities:
1) Publicise the need and importance of RTDs and encourage the use of the RTDBox in plant communities. The PI/Co-Is will emphasise the benefits of RTDs and the importance of a comprehensive and accurate transcript annotation on downstream analysis at national and international meetings, invited seminars, plant science community newsletters, social media and publications. In particular, we will contact plant science research group leaders in the UK with details of the project and, in a highly interactive way, we will visit the 10-12 main University and Institute plant science departments/groupings in the UK to make presentations on the value and advantages of RTD construction in the 6-9 month period of the grant.
2) Ensuring that potential beneficiaries have the opportunity to engage fully with the research. By the end of the first year, RTDBox will be released on GitHub, a publicly available Galaxy server and other platforms (e.g. Docker). We will provide a user-friendly graphical user interface and detailed user manuals on how to use RTDBox, and use online methods to monitor access and obtain feedback for improvement. We will commit to maintaining the RTD Galaxy server for at least two years after the project and will try to obtain funding for longer.
3) Release RTDs for tomato, potato and lettuce for improved RNA-seq analysis. We will contact the research groups responsible for genome annotation and resources in tomato, lettuce and potato in preparation for the release of the species RTDs. These RTDs will be made available on other genome browsers and genome resource websites (e.g. IGB, Ensembl and Gramene). We can monitor the downloads for these databases and associated citations for long term success.
4) Public engagement and PDRA career development. We regularly have opportunities for public engagement at the University of Dundee and James Hutton Institute and the PI/Co-I and PDRA will take part. We will provide the PDRA with formal mentoring and appraisal with a focus on supporting career development. JHI has a formal programme of appraisal for PDRAs designed to identify training needs and opportunities to develop a career path.
Publications

Coulter M
(2022)
BaRTv2: a highly resolved barley reference transcriptome for accurate transcript-specific RNA-seq quantification
in The Plant Journal


Daszkowska-Golec A
(2023)
Editorial: Applications of long-read sequencing in plant genomics and transcriptomics
in Frontiers in Plant Science

Ding P
(2021)
Chromatin accessibility landscapes activated by cell-surface and intracellular immune receptors.
in Journal of experimental botany

Flores, P.
(2019)
BaRTv1.0: an improved barley reference transcript dataset to determine accurate changes in the barley transcriptome using RNA-seq
in BMC Genomics


Guo W
(2022)
The value of genotype-specific reference for transcriptome analyses in barley.
in Life science alliance

Harvey S
(2020)
Downy Mildew effector HaRxL21 interacts with the transcriptional repressor TOPLESS to promote pathogen susceptibility.
in PLoS pathogens

Jabre I
(2020)
Differential nucleosome occupancy modulates alternative splicing in Arabidopsis thaliana
in New Phytologist
Description | Over the last 6 years, we developed an RTD for Arabidopsis (AtRTD2) based on extensive Illumina short-read sequences. Through a series of iterations, we developed computational methods to identify and retain high confidence transcripts while removing false transcripts. AtRTD2 greatly increased the accuracy of the quantification allowing, for example, identification of novel transcription and splicing factors in response to cold. It has now been translated to other plant and crop (and animal) species, such as barley, potato, rice and oil palm. Currently, transcript sequence catalogues for most plant species are incomplete, missing large numbers of transcripts, and for those with RNA-seq data, out-of-date analysis programs have produced large numbers of false transcripts. In the past year, we have 1) improved and formalized the short-read assembly method and pipeline; 2) developed a novel computational method to define transcripts accurately from PacBio Iso-seq data; 3) developed an R package that allows PacBio Iso-seq data analysis using the above method; 4) developed a software solution that authenticates users by email so that they can access their analyses; 5) developed a web interface that allows users to carry out the analysis and control the analysis process. |
Exploitation Route | We will develop a fully automated pipeline (RTDBox) that can be used by scientists with basic bioinformatics skills or bioinformaticians with little experience in transcriptomics. The pipeline will also be designed to allow the incremental improvement of the RTD with the automatic incorporation of any new RNA-seq data (Illumina, PacBio, Nanopore). Within the pipeline, we will develop a transcript evaluation suite that will provide evaluation metrics to help biologists identify and remove mis-constructed transcripts from assembly programs as well as understand the quality and completeness of the RTD generated. All our experience and expertise will be brought together to make user-friendly software that allows plant scientists to measure gene expression more accurately, thereby improving the exploration of biological processes across the globe. The short-read pipeline has now been used to construct transcript references for a number of projects, including potato, barley, lettuce, and raspberry. It has also been employed in a barley pan-transcriptome project to construct transcript references for 20 different barley cultivars. The RTDBox can be used to generate transcript annotations for fast and accurate quantification using RNA-seq data, and the 3D RNA-seq pipeline developed in my group can be used for differential gene expression and alternative splicing analysis. |
Sectors | Agriculture, Food and Drink,Healthcare,Pharmaceuticals and Medical Biotechnology |
URL | https://rtdbox.hutton.ac.uk/#/ |
Description | Australia Partnering Award: International pooling for advanced cereal science - IPAC |
Amount | £47,766 (GBP) |
Funding ID | BB/V018299/1 |
Organisation | Biotechnology and Biological Sciences Research Council (BBSRC) |
Sector | Public |
Country | United Kingdom |
Start | 03/2021 |
End | 03/2024 |
Description | Create new opportunities to exploit barley resources and accelerate breeding |
Amount | £30,612 (GBP) |
Funding ID | BB/V018906/1 |
Organisation | Biotechnology and Biological Sciences Research Council (BBSRC) |
Sector | Public |
Country | United Kingdom |
Start | 03/2021 |
End | 03/2025 |
Description | SEFARI Workshop: 3D RNA-seq App - A flexible and powerful tool for differential expression and alternative splicing analysis of RNA-seq data for biologists |
Amount | £9,807 (GBP) |
Organisation | Scottish Environment, Food and Agriculture Research Institutes Gateway |
Sector | Charity/Non Profit |
Start | 11/2020 |
End | 11/2020 |
Description | The Generation Gap - Mechanisms of maternal control on grain |
Amount | £88,838 (GBP) |
Funding ID | BB/W002590/1 |
Organisation | Biotechnology and Biological Sciences Research Council (BBSRC) |
Sector | Public |
Country | United Kingdom |
Start | 08/2021 |
End | 09/2024 |
Title | 3D RNA-seq App - A flexible and powerful tool for differential expression and alternative splicing analysis of RNA-seq data for biologists |
Description | RNA-sequencing (RNA-seq) analysis of gene expression and alternative splicing should be routine and robust but is often a bottleneck for biologists because of different and complex analysis programs and reliance on specialized bioinformatics skills. We have developed the '3D RNA-seq' App, an R shiny App and web-based pipeline for the comprehensive analysis of RNA-seq data from any organism. It represents an easy-to-use, flexible and powerful tool for analysis of both gene and transcript-level gene expression to identify differential gene/transcript expression, differential alternative splicing and differential transcript usage (3D) as well as isoform switching from RNA-seq data. 3D RNA-seq integrates state-of-the-art differential expression analysis tools and adopts best practice for RNA-seq analysis. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2019 |
Provided To Others? | Yes |
Impact | The program is designed to be run by biologists with minimal bioinformatics experience (or by bioinformaticians) allowing lab scientists to analyse their RNA-seq data. It achieves this by operating through a user-friendly graphical interface that automates the data flow through the programs in the pipeline. The comprehensive analysis performed by 3D RNA-seq is extremely rapid and accurate, can handle complex experimental designs, allows user setting of statistical parameters, visualizes the results through graphics and tables, and generates publication-quality figures such as heat-maps, expression profiles and GO enrichment plots. The manuscript has been cited 14 times in just over one year and >4,400 users globally have used 3D RNA-seq for their RNA-seq analysis, a quarter of whom are returning, regular users who have used the tool on multiple occasions. |
URL | http://3drnaseq.hutton.ac.uk/ |
Title | RTDBox |
Description | RTDBox is a computational pipeline that allows scientists to construct a high-quality transcript reference, which enables fast and accurate quantification of gene expression using RNA-seq data. We have established cutting-edge methods for filtering misassembled transcripts from Illumina short-read assemblies and PacBio Iso-seq long reads. We also provide a web interface that allows this analysis to be carried out quickly and easily without coding. It is in a testing phase and will be made available to the public once it has been thoroughly tested. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2021 |
Provided To Others? | No |
Impact | The transcriptome reference plays a key role in gene expression quantification, as an incomplete or misassembled transcriptome often leads to erroneous gene expression quantification. Our new pipeline will allow the rapid construction of a high-quality transcriptome reference, incorporating a range of stringent filters to remove mis-assembled transcripts. For the PacBio long-read pipeline, we also developed a method that defines transcript start and end sites accurately, which not only improves the accuracy of gene expression quantification but also allows the study of transcriptional regulation, such as polyadenylation and alternative splicing. |
URL | https://rtdbox.hutton.ac.uk/#/ |
Description | RTDBox will be validated on three crop species: lettuce, tomato and potato |
Organisation | University of Cambridge |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | RTDBox is being developed to automate the construction of comprehensive and high-quality transcriptomes for plant species using high-throughput sequencing data. We have budgeted for Illumina short-read sequencing and PacBio sequencing for three exemplary crop species: lettuce (in collaboration with Prof Katherine Denby at University of York), tomato (in collaboration with Prof David Baulcombe at University of Cambridge) and potato (in collaboration with Dr Ingo Hein at University of Dundee). I have contacted all the above collaborators and notified them of the project schedule so that they are ready to make RNA available for sequencing. |
Collaborator Contribution | Discussions and plans were made with all collaborators on how to proceed with the generation and preparation of the samples. |
Impact | no outputs yet |
Start Year | 2019 |
Description | RTDBox will be validated on three crop species: lettuce, tomato and potato |
Organisation | University of Dundee |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | RTDBox is being developed to automate the construction of comprehensive and high-quality transcriptomes for plant species using high-throughput sequencing data. We have budgeted for Illumina short-read sequencing and PacBio sequencing for three exemplary crop species: lettuce (in collaboration with Prof Katherine Denby at University of York), tomato (in collaboration with Prof David Baulcombe at University of Cambridge) and potato (in collaboration with Dr Ingo Hein at University of Dundee). I have contacted all the above collaborators and notified them of the project schedule so that they are ready to make RNA available for sequencing. |
Collaborator Contribution | Discussions and plans were made with all collaborators on how to proceed with the generation and preparation of the samples. |
Impact | no outputs yet |
Start Year | 2019 |
Description | RTDBox will be validated on three crop species: lettuce, tomato and potato |
Organisation | University of York |
Department | Department of Biology |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | RTDBox is being developed to automate the construction of comprehensive and high-quality transcriptomes for plant species using high-throughput sequencing data. We have budgeted for Illumina short-read sequencing and PacBio sequencing for three exemplary crop species: lettuce (in collaboration with Prof Katherine Denby at University of York), tomato (in collaboration with Prof David Baulcombe at University of Cambridge) and potato (in collaboration with Dr Ingo Hein at University of Dundee). I have contacted all the above collaborators and notified them of the project schedule so that they are ready to make RNA available for sequencing. |
Collaborator Contribution | Discussions and plans were made with all collaborators on how to proceed with the generation and preparation of the samples. |
Impact | no outputs yet |
Start Year | 2019 |
Description | lettuce RTD |
Organisation | University of York |
Department | Department of Biology |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | My team will use the PacBio Iso-seq and Illumina sequencing data generated from samples harvested in Prof Denby's lab to generate a high-quality lettuce transcriptome using the pipeline established in this project. |
Collaborator Contribution | Prof Katherine Denby generated the RNAs from diverse tissues and experimental conditions for Illumina and PacBio sequencing. |
Impact | A high-quality lettuce transcriptome that allows accurate and fast gene expression quantification using RNA-seq |
Start Year | 2021 |
Title | 3D RNA-seq: a powerful and flexible tool for rapid and accurate differential expression and alternative splicing analysis of RNA-seq data for biologists |
Description | The '3D RNA-seq' App is an R shiny App and web-based pipeline for the comprehensive analysis of RNA-seq data from any organism. It represents an easy-to-use, flexible and powerful tool for analysis of both gene and transcript-level gene expression to identify differential gene/transcript expression, differential alternative splicing and differential transcript usage (3D) as well as isoform switching from RNA-seq data. 3D RNA-seq integrates state-of-the-art differential expression analysis tools and adopts best practice for RNA-seq analysis. The program is designed to be run by biologists with minimal bioinformatics experience (or by bioinformaticians) allowing lab scientists to analyse their RNA-seq data. It achieves this by operating through a user-friendly graphical interface that automates the data flow through the programs in the pipeline. The comprehensive analysis performed by 3D RNA-seq is extremely rapid and accurate, can handle complex experimental designs, allows user setting of statistical parameters, visualizes the results through graphics and tables, and generates publication-quality figures such as heat-maps, expression profiles and GO enrichment plots. |
Type Of Technology | Webtool/Application |
Year Produced | 2019 |
Open Source License? | Yes |
Impact | The publication has been cited 14 times and >4,400 users globally have used the tool for their RNA-seq analysis, a quarter of whom are returning, regular users who have used it on multiple occasions. |
URL | http://3drnaseq.hutton.ac.uk |
Title | RTDBox |
Description | RTDBox is a computational pipeline that allows scientists to construct a high-quality transcript reference, which enables fast and accurate quantification of gene expression using RNA-seq data. We have established cutting-edge methods for filtering misassembled transcripts from Illumina short-read assemblies and PacBio Iso-seq long reads. We also provide a web interface that allows this analysis to be carried out quickly and easily without coding. |
Type Of Technology | Webtool/Application |
Year Produced | 2021 |
Open Source License? | Yes |
Impact | The transcriptome reference plays a key role in gene expression quantification, as an incomplete or misassembled transcriptome often leads to erroneous gene expression quantification. Our new pipeline will allow the rapid construction of a high-quality transcriptome reference, incorporating a range of stringent filters to remove mis-assembled transcripts. For the PacBio long-read pipeline, we also developed a method that defines transcript start and end sites accurately, which not only improves the accuracy of gene expression quantification but also allows the study of transcriptional regulation, such as polyadenylation and alternative splicing. |
URL | https://rtdbox.hutton.ac.uk/#/ |