An automated pipeline for construction of Reference Transcript Datasets (RTD) to enable rapid and accurate gene expression analysis in plant species
Lead Research Organisation:
James Hutton Institute
Department Name: Information & Computational Sciences
Abstract
A gene is the basic physical and functional unit of the genome. Genes are turned on and off at different times in development and in response to external and internal signals. Protein-coding genes are copied (transcribed) into precursor messenger RNAs (pre-mRNAs), which are then processed in different ways into mRNAs that can be translated into proteins. A goal of biological research is to understand how genes work by measuring changes in gene expression. This is achieved by estimating the abundances of all of the transcripts produced at any particular time or condition.
The current technology for measuring gene and transcript expression is RNA sequencing (RNA-seq), which, by sequencing millions of transcripts, allows RNA levels to be measured on a genome-wide scale. The two main platforms are Illumina, which generates short reads (currently 75 to 250 bp), and PacBio/Nanopore single molecule sequencing, which produces full-length transcript reads. To measure gene expression, Illumina short reads are often mapped to the genome and assembled into transcripts, which is an inaccurate process. PacBio/Nanopore have high sequencing error rates and do not generate sufficient depth of coverage of genes. These technologies, both in terms of chemistry and computational analyses, continue to advance at a rapid pace, but a combination of the platforms is currently the best approach to generate RNA-seq data. In addition, the fastest and most accurate programs for computational quantification of transcript and gene expression require a comprehensive catalogue of transcripts, which we call a Reference Transcript Dataset (RTD).
Over the last four years, we developed an RTD for Arabidopsis (AtRTD2) based on extensive Illumina short read sequences. Through a series of iterations, we developed the computational methods to identify and retain high confidence transcripts while removing false transcripts. AtRTD2 greatly increased the accuracy of the quantification allowing, for example, identification of novel transcription and splicing factors in response to cold. The challenge now is to translate this knowledge and experience to other plant and crop (and animal) species. Currently, transcript sequence catalogues for most plant species are incomplete, missing large numbers of transcripts, and for those with RNA-seq data, out-of-date analysis procedures have produced large numbers of false transcripts.
From developing AtRTD2, we have a prototype pipeline for constructing an RTD. The key features are multiple quality control filters which remove mis-assembled transcripts, redundant transcripts, chimaeric transcripts and transcript fragments. These multiple, iterative steps are currently individually coded and while the pipeline can be used, it will take up to 12 months to generate an RTD and requires the full-time expertise of a bioinformatician.
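The redundancy and fragment filters mentioned above can be illustrated with a minimal sketch. It is not the AtRTD2 or RTDBox implementation: it assumes each transcript is reduced to its ordered list of introns (an "intron chain"), ignores chromosome, strand and transcript ends, and uses hypothetical names (`is_contiguous_subchain`, `filter_transcripts`). A transcript is dropped when its intron chain is identical to, or a consecutive part of, the chain of a transcript already kept.

```python
from typing import Dict, List, Tuple

Intron = Tuple[int, int]  # (start, end) coordinates of one intron


def is_contiguous_subchain(short: List[Intron], long: List[Intron]) -> bool:
    """Return True if `short` occurs as a consecutive run inside `long`."""
    n, m = len(short), len(long)
    if n == 0 or n > m:
        # Mono-exonic transcripts (no introns) are never treated as fragments here.
        return False
    return any(long[i:i + n] == short for i in range(m - n + 1))


def filter_transcripts(chains: Dict[str, List[Intron]]) -> List[str]:
    """Keep one transcript per distinct intron chain and drop likely fragments.

    Simplifying assumption: a transcript whose intron chain is fully contained,
    in order and without gaps, in a longer kept chain is a fragment.
    """
    kept: List[str] = []
    # Visit longest chains first so fragments are compared against fuller models.
    for tid in sorted(chains, key=lambda t: (-len(chains[t]), t)):
        if any(is_contiguous_subchain(chains[tid], chains[k]) for k in kept):
            continue  # redundant copy or fragment of a kept transcript
        kept.append(tid)
    return sorted(kept)


example = {
    "t1": [(100, 200), (300, 400), (500, 600)],
    "t2": [(300, 400), (500, 600)],              # fragment of t1
    "t3": [(100, 200), (300, 400), (500, 600)],  # redundant with t1
    "t4": [(100, 200), (500, 600)],              # exon skipping: a distinct isoform
    "t5": [],                                    # mono-exonic
}
print(filter_transcripts(example))
```

In this toy input, `t4` survives because skipping the middle intron changes the chain, so it is not a consecutive part of `t1`. A real filter would also need to consider transcript start and end positions before calling something a fragment.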
We will develop a fully automated pipeline (RTDBox) which can be used by scientists with basic bioinformatics skills or bioinformaticians with little experience in transcriptomics. The pipeline will also be designed to allow the incremental improvement of the RTD with the automatic incorporation of any new RNA-seq data (Illumina, PacBio, Nanopore). Within the pipeline, we will develop a transcript evaluation suite (TES) which will provide evaluation metrics to help biologists identify and remove mis-constructed transcripts from assembly programs as well as understand the quality and completeness of the RTD generated. All our experience and expertise will be brought together to make user-friendly software that allows plant scientists to measure gene expression more accurately, thereby improving the exploration of biological processes across the globe.
Technical Summary
For the majority of plant and crop species, transcript information is incomplete and poorly annotated. AtRTD2 shows the feasibility of building a comprehensive RTD and both Illumina and PacBio/Nanopore are required for complete and comprehensive RTD construction. We have the necessary knowledge and expertise to produce an automated, easy-to-use pipeline for building RTDs and allowing incorporation of new RNA-seq datasets as they arise.
The automated pipeline and software will be designed for use by scientists with basic bioinformatics skills or bioinformaticians with little experience in transcriptomics. RTDBox will be available in several formats, on different platforms, to provide flexible access: 1) A local Galaxy server will allow users to upload sequence data, run the pipeline and download the RTD directly; 2) The pipeline will be set up on publicly available platforms, such as Cyverse (https://www.cyverse.org/) and GigaGalaxy (http://gigagalaxy.net/); 3) The wrapped pipeline will also be available in the Galaxy Toolshed for download and installation by groups that have local Galaxy infrastructure and prefer to keep their data private; 4) The pipeline will also be wrapped in Docker containers so that it can be downloaded and run under Unix. It will have a modular construction covering the major functions: uploading RNA-seq data, quality control and trimming (if needed), read mapping and transcript assembly using different assembly programs. Separate automated pipelines for Illumina short read and single molecule sequencing will be included along with stringent quality controls such as splice junction assessment (archived through SJ and SJ phase databases). Merging of different assemblies (new and existing) and further quality control to remove redundancy, fragments, etc. are performed in the Transcript Evaluation Suite (TES). TES provides evaluation metrics to help biologists understand the quality and completeness of the RTD generated.
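As an illustration of the splice junction assessment step, the sketch below keeps a transcript only if every one of its junctions is found in a junction table with a canonical intron motif and enough supporting reads. It is a simplified stand-in, not the RTDBox code: the `sj_db` dictionary only loosely plays the role of an SJ database, and the names `JunctionEvidence`, `filter_by_junctions` and the `min_reads=3` default are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Junction = Tuple[str, int, int]  # (chromosome, intron start, intron end)


@dataclass(frozen=True)
class JunctionEvidence:
    motif: str  # donor-acceptor dinucleotides, e.g. "GT-AG"
    reads: int  # number of reads spanning the junction


# Commonly recognised intron border motifs.
CANONICAL_MOTIFS = {"GT-AG", "GC-AG", "AT-AC"}


def junction_passes(ev: JunctionEvidence, min_reads: int = 3) -> bool:
    """A junction passes if its motif is canonical and it has enough read support."""
    return ev.motif in CANONICAL_MOTIFS and ev.reads >= min_reads


def filter_by_junctions(
    transcripts: Dict[str, List[Junction]],
    sj_db: Dict[Junction, JunctionEvidence],
    min_reads: int = 3,  # hypothetical threshold, chosen only for illustration
) -> List[str]:
    """Return IDs of transcripts whose junctions are all known and all pass.

    Transcripts without junctions (mono-exonic) pass trivially in this sketch.
    """
    kept = []
    for tid, junctions in transcripts.items():
        if all(j in sj_db and junction_passes(sj_db[j], min_reads) for j in junctions):
            kept.append(tid)
    return kept


transcripts = {
    "tA": [("chr1", 100, 200), ("chr1", 300, 400)],
    "tB": [("chr1", 100, 200), ("chr1", 300, 450)],  # second junction is non-canonical
    "tC": [],
}
sj_db = {
    ("chr1", 100, 200): JunctionEvidence("GT-AG", 12),
    ("chr1", 300, 400): JunctionEvidence("GC-AG", 3),
    ("chr1", 300, 450): JunctionEvidence("CT-AC", 40),
}
print(filter_by_junctions(transcripts, sj_db))
```

Requiring every junction to pass is deliberately strict: one poorly supported junction is enough to discard the whole transcript model. The read threshold would in practice depend on sequencing depth and the number of samples.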
Planned Impact
The main output of this work will be development and provision of the automated computational pipeline, RTDBox, to construct high quality RTDs for the plant research community and beyond. The major impact will be the uptake of the RTDBox by different plant communities to generate RTDs for different plant species, cultivars or ecotypes. We envisage two significant primary impacts of the pipeline and software:
1. the ability of plant researchers to carry out high quality RNA-seq analysis of gene expression more quickly and accurately to improve understanding of gene regulation and identification of novel genes in biological processes.
2. the means to evaluate the quality of existing and future transcript assemblies. Current literature and databases contain thousands of mis-annotated transcript isoforms with insufficient quality control; the pipeline will permit rapid re-analysis and clean-up of such data as part of processing of a new RTD for analysis of RNA-seq.
The main challenge is to raise awareness of the importance and opportunities of having high quality, comprehensive RTDs. To ensure speedy uptake and exploitation of RTDs, we have three Impact Objectives:
1. Inform the plant community of the value of the use of the RTD well ahead of a primary release of RTDBox allowing groups to design and plan RNA-seq experiments and even apply for funding to make an RTD.
2. Inform the plant community of the value of working at the transcript level for differential expression data analyses including AS and improving accuracy of downstream analyses (e.g. gene and splicing networks).
3. Release the RTDBox to the plant community as soon as possible through a range of platforms for ease of access and monitor uptake.
To achieve these objectives, we have four Impact Activities:
1) Publicise the need and importance of RTDs and encourage the use of the RTDBox in plant communities. The PI/Co-Is will emphasise the benefits of RTDs and the importance of a comprehensive and accurate transcript annotation on downstream analysis at national and international meetings, invited seminars, plant science community newsletters, social media and publications. In particular, we will contact plant science research group leaders in the UK with details of the project and, in a highly interactive way, we will visit the 10-12 main University and Institute plant science departments/groupings in the UK to make presentations on the value and advantages of RTD construction in the 6-9 month period of the grant.
2) Ensuring that potential beneficiaries have the opportunity to engage fully with the research. By the end of the first year, RTDBox will be released on GitHub, a publicly available Galaxy server and other platforms (e.g. Docker). We will provide a user-friendly graphical user interface and detailed user manuals on how to use RTDBox, and use online methods to monitor access and obtain feedback for improvement. We will commit to maintaining the RTD Galaxy server for at least two years after the project and will try to obtain funding for longer.
3) Release RTDs for tomato, potato and lettuce for improved RNA-seq analysis. We will contact the research groups responsible for genome annotation and resources in tomato, lettuce and potato in preparation for the release of the species RTDs. These RTDs will be made available on other genome browsers and genome resource websites (e.g. IGB, Ensembl and Gramene). We can monitor the downloads for these databases and associated citations for long term success.
4) Public engagement and PDRA career development. We regularly have opportunities for public engagement at the University of Dundee and James Hutton Institute and the PI/Co-I and PDRA will take part. We will provide the PDRA with formal mentoring and appraisal with a focus on supporting career development. JHI has a formal programme of appraisal for PDRAs designed to identify training needs and opportunities to develop a career path.
Organisations
- James Hutton Institute (Lead Research Organisation)
- Council for Agricultural Research and Agricultural Economy Analysis (Collaboration)
- Murdoch University (Collaboration)
- University of Zurich (Collaboration)
- Carlsberg Group (Collaboration)
- Zhejiang University (Collaboration)
- Okayama University (Collaboration)
- Australian National University (ANU) (Collaboration)
- University of York (Collaboration)
- University of Silesia (Collaboration)
- Martin Luther University of Halle-Wittenberg (Collaboration)
- IPK Gatersleben (Collaboration)
- Helmholtz Association of German Research Centres (Collaboration)
- University of Cambridge (Collaboration)
- Academy of Sciences of the Czech Republic (Collaboration)
- University of Oxford (Collaboration)
- University of Saskatchewan (Collaboration)
- Leibniz Association (Collaboration)
- Indiana University (Collaboration)
- University of Adelaide (Collaboration)
- University of Dundee (Collaboration)
Publications
Zhang R
(2022)
A high-resolution single-molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis.
in Genome biology
Rapazote-Flores P
(2019)
BaRTv1.0: an improved barley reference transcript dataset to determine accurate changes in the barley transcriptome using RNA-seq.
in BMC genomics
Coulter M
(2022)
BaRTv2: a highly resolved barley reference transcriptome for accurate transcript-specific RNA-seq quantification.
in The Plant journal : for cell and molecular biology
Ding P
(2021)
Chromatin accessibility landscapes activated by cell-surface and intracellular immune receptors.
in Journal of experimental botany
Jabre I
(2021)
Differential nucleosome occupancy modulates alternative splicing in Arabidopsis thaliana.
in The New phytologist
Harvey S
(2020)
Downy Mildew effector HaRxL21 interacts with the transcriptional repressor TOPLESS to promote pathogen susceptibility.
in PLoS pathogens
Description | Over the last 10 years, we developed an RTD for Arabidopsis (AtRTD2) based on extensive Illumina short-read sequences. Through a series of iterations, we developed computational methods to identify and retain high-confidence transcripts while removing false transcripts. AtRTD2 greatly increased the accuracy of the quantification allowing, for example, the identification of novel transcription and splicing factors in response to cold. The approach has now been translated to other plant and crop (and animal) species, such as barley, potato, rice and oil palm. Currently, transcript sequence catalogs for most plant species are incomplete, missing large numbers of transcripts, and for those with RNA-seq data, out-of-date analysis programs have produced large numbers of false transcripts. In the past 5 years, we have 1) improved and formalized the short-read assembly method and pipeline; 2) developed a novel computational method to define transcripts accurately from PacBio Iso-seq data; 3) developed an R package that allows PacBio Iso-seq data analysis using the above method; 4) developed a software solution that authenticates users by email so that they can access their analyses; 5) built a web interface that allows users to carry out the analysis and control the analysis process. |
Exploitation Route | We will develop a fully automated pipeline (RTDBox) that can be used by scientists with basic bioinformatics skills or bioinformaticians with little experience in transcriptomics. The program will also be designed to allow the incremental improvement of the RTD with the automatic incorporation of any new RNA sequencing data (Illumina, PacBio, Nanopore). Within the pipeline, we have developed a transcript evaluation suite that will provide evaluation metrics to help biologists identify and remove mis-constructed transcripts from assembly programs as well as understand the quality and completeness of the RTD generated. All our experience and expertise will be brought together to make user-friendly software that allows plant scientists to measure gene expression more accurately, thereby improving the exploration of biological processes across the globe. To date, the pipeline has been used to construct transcript references for over 10 species, including potato, barley, lettuce, tomato, and raspberry. It has also been employed in a barley pan-transcriptome project to construct transcript references for 20 different barley cultivars. The RTDBox can be used to generate transcript annotations for fast and accurate quantification using RNA-seq data, and the 3D RNA-seq pipeline developed in my group can be used to analyse differential gene expression and alternative splicing. Currently, we are collaborating with clinicians to apply our methods to sequencing data in humans. |
Sectors | Agriculture, Food and Drink; Environment; Healthcare; Pharmaceuticals and Medical Biotechnology |
URL | https://rtdbox.hutton.ac.uk/#/ |
Description | The RTDBox we developed has been applied to over 10 species, including rice, barley, potato, tomato and lettuce. The IP generated from RTDBox will be transferred into the spin-off company SHARP Genomic Analytical to produce a commercial offering for advanced transcriptomics analysis for agricultural, environmental, clinical and pharmaceutical applications. |
First Year Of Impact | 2024 |
Sector | Agriculture, Food and Drink; Environment; Healthcare; Pharmaceuticals and Medical Biotechnology |
Impact Types | Economic |
Description | Australia Partnering Award: International pooling for advanced cereal science - IPAC |
Amount | £47,766 (GBP) |
Funding ID | BB/V018299/1 |
Organisation | Biotechnology and Biological Sciences Research Council (BBSRC) |
Sector | Public |
Country | United Kingdom |
Start | 04/2022 |
End | 03/2025 |
Description | Create new opportunities to exploit barley resources and accelerate breeding |
Amount | £30,612 (GBP) |
Funding ID | BB/V018906/1 |
Organisation | Biotechnology and Biological Sciences Research Council (BBSRC) |
Sector | Public |
Country | United Kingdom |
Start | 06/2021 |
End | 03/2025 |
Description | Follow on fund |
Amount | £249,956 (GBP) |
Funding ID | APP2126 |
Organisation | Biotechnology and Biological Sciences Research Council (BBSRC) |
Sector | Public |
Country | United Kingdom |
Start | 12/2023 |
End | 11/2025 |
Description | Pioneer award |
Amount | £199,727 (GBP) |
Funding ID | BB/Y513192/1 |
Organisation | Biotechnology and Biological Sciences Research Council (BBSRC) |
Sector | Public |
Country | United Kingdom |
Start | 02/2024 |
End | 06/2025 |
Description | SEFARI Workshop: 3D RNA-seq App - A flexible and powerful tool for differential expression and alternative splicing analysis of RNA-seq data for biologists |
Amount | £9,807 (GBP) |
Organisation | Scottish Environment, Food and Agriculture Research Institutes Gateway |
Sector | Charity/Non Profit |
Start | 11/2020 |
End | 11/2020 |
Description | The Generation Gap - Mechanisms of maternal control on grain |
Amount | £88,838 (GBP) |
Funding ID | BB/W002590/1 |
Organisation | Biotechnology and Biological Sciences Research Council (BBSRC) |
Sector | Public |
Country | United Kingdom |
Start | 03/2022 |
End | 09/2025 |
Title | 3D RNA-seq App - A flexible and powerful tool for differential expression and alternative splicing analysis of RNA-seq data for biologists |
Description | RNA-sequencing (RNA-seq) analysis of gene expression and alternative splicing should be routine and robust but is often a bottleneck for biologists because of different and complex analysis programs and reliance on specialized bioinformatics skills. We have developed the '3D RNA-seq' App, an R shiny App and web-based pipeline for the comprehensive analysis of RNA-seq data from any organism. It represents an easy-to-use, flexible and powerful tool for analysis of both gene and transcript-level gene expression to identify differential gene/transcript expression, differential alternative splicing and differential transcript usage (3D) as well as isoform switching from RNA-seq data. 3D RNA-seq integrates state-of-the-art differential expression analysis tools and adopts best practice for RNA-seq analysis. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2019 |
Provided To Others? | Yes |
Impact | The program is designed to be run by biologists with minimal bioinformatics experience (or by bioinformaticians), allowing lab scientists to analyse their own RNA-seq data. It achieves this by operating through a user-friendly graphical interface that automates the data flow through the programs in the pipeline. The comprehensive analysis performed by 3D RNA-seq is extremely rapid and accurate, can handle complex experimental designs, allows user setting of statistical parameters, visualizes the results through graphics and tables, and generates publication-quality figures such as heat-maps, expression profiles and GO enrichment plots. The manuscript has been cited 14 times in just over one year, and >4400 users globally have used 3D RNA-seq for their RNA-seq analysis, a quarter of whom are returning, regular users who have used the tool on multiple occasions. |
URL | http://3drnaseq.hutton.ac.uk/ |
Title | RTDBox |
Description | RTDBox is a computational pipeline that allows scientists to construct a high-quality transcript reference, which enables fast and accurate quantification of gene expression using RNA-seq data. We have established cutting-edge methods for filtering misassembled transcripts from Illumina short-read assemblies and PacBio Iso-seq long reads. We also provide a web interface that allows this analysis to be carried out quickly and easily without coding. It is in a testing phase and will be made available to the public once it has been thoroughly tested. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2021 |
Provided To Others? | No |
Impact | The transcriptome reference plays a key role in gene expression quantification, as an incomplete or misassembled transcriptome often leads to erroneous quantification. Our new pipeline will allow the rapid construction of a high-quality transcriptome reference, incorporating a range of stringent filters to remove mis-assembled transcripts. For the PacBio long-read pipeline, we also developed a method that defines transcript start and end sites accurately, which not only improves the accuracy of gene expression quantification but also allows the study of transcriptional regulation, such as polyadenylation and alternative splicing. |
URL | https://rtdbox.hutton.ac.uk/#/ |
Title | Additional file 1 of A high-resolution single-molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis |
Description | Additional file 1: Table S1. Plant material for RNA samples for Iso-Seq. Table S2. Read statistics for Iso-seq libraries. Table S3A and B. Number and percentage of splice junctions with sequencing mismatches in positions L1 to L30 for A) upstream (left) and B) downstream (right) of splice junctions. Table S4. Position Weight Matrix scores for consensus splice site sequences of introns. Table S5A and B. Filtering of SJs on basis of mismatches in each position. Table S6. Sequence motifs for validation of TSS and TES sites. Table S7. Number of genes and transcripts contributed to AtIso from each Iso-seq library. Table S8. Saturation curve of the number of unique genes and transcripts added to AtIso with the addition of each library. Table S9. AtRTD3 - Transcript characteristics and translations from TransFeat. Table S10A. TranSuite output of AtRTD3 for mono-exonic/multi-exonic genes with single or multiple transcript isoforms; B Comparison of TranSuite output of AtRTD3 gene and transcript characterisation. Table S11. AtRTD3 - novel genes. Table S12. Functional analysis of transcripts from novel genes in AtRTD3 with TRAPID 2.0. Table S13. AtRTD3 - Chimeric Genes and transcripts. Table S14. Frequency of AS event type among AtRTD3, AtIso and Araport11. Table S15A. Frequency of AS event type among AtRTD3, AtIso and Araport11. Table S15B. Gene descriptions of genes containing non-stop RNAs. |
Type Of Material | Database/Collection of data |
Year Produced | 2022 |
Provided To Others? | Yes |
URL | https://springernature.figshare.com/articles/dataset/Additional_file_1_of_A_high-resolution_single-m... |
Title | LsRTDv1: A reference transcript dataset for accurate transcript-specific expression analysis in lettuce |
Description | Accurate quantification of gene and transcript-specific expression, with the underlying knowledge of precise transcript isoforms, is crucial to understanding many biological processes. Analysis of RNA sequencing data has benefited from the development of alignment-free algorithms which enhance the precision and speed of expression analysis. However, such algorithms require a reference transcriptome. Here we present a reference transcript dataset (LsRTDv1) for lettuce, combining long- and short-read sequencing with publicly available transcriptome annotations, and filtering to keep only transcripts with high-confidence splice junctions and transcriptional start and end sites. LsRTDv1 is a valuable resource for the investigation of transcriptional and alternative splicing regulation in lettuce. |
Type Of Material | Database/Collection of data |
Year Produced | 2024 |
Provided To Others? | Yes |
URL | https://datadryad.org/stash/dataset/doi:10.5061/dryad.xwdbrv1m8 |
Description | 3D RNA-seq training workshop |
Organisation | Australian National University (ANU) |
Country | Australia |
Sector | Academic/University |
PI Contribution | We delivered a 3D RNA-seq training workshop at the Australian National University (https://www.eventbrite.com.au/e/3d-rna-seq-workshop-tickets-556207510637). |
Collaborator Contribution | The participants provided feedback on how to improve both the 3D RNA-seq tool and the training. |
Impact | not available yet |
Start Year | 2023 |
Description | Barley Pan-Transcriptome |
Organisation | Academy of Sciences of the Czech Republic |
Department | Institute of Experimental Botany |
Country | Czech Republic |
Sector | Academic/University |
PI Contribution | Principal Investigators and coordinators |
Collaborator Contribution | Funding, data, data analysis, data interpretation, paper writing, project management |
Impact | https://doi.org/10.21203/rs.3.rs-3787876/v1 |
Start Year | 2019 |
Description | Barley Pan-Transcriptome |
Organisation | Carlsberg Group |
Department | Carlsberg Research Centre |
Country | Denmark |
Sector | Private |
PI Contribution | Principal Investigators and coordinators |
Collaborator Contribution | Funding, data, data analysis, data interpretation, paper writing, project management |
Impact | https://doi.org/10.21203/rs.3.rs-3787876/v1 |
Start Year | 2019 |
Description | Barley Pan-Transcriptome |
Organisation | Council for Agricultural Research and Agricultural Economy Analysis |
Country | Italy |
Sector | Public |
PI Contribution | Principal Investigators and coordinators |
Collaborator Contribution | Funding, data, data analysis, data interpretation, paper writing, project management |
Impact | https://doi.org/10.21203/rs.3.rs-3787876/v1 |
Start Year | 2019 |
Description | Barley Pan-Transcriptome |
Organisation | Helmholtz Association of German Research Centres |
Department | Helmholtz Zentrum Munchen |
Country | Germany |
Sector | Academic/University |
PI Contribution | Principal Investigators and coordinators |
Collaborator Contribution | Funding, data, data analysis, data interpretation, paper writing, project management |
Impact | https://doi.org/10.21203/rs.3.rs-3787876/v1 |
Start Year | 2019 |
Description | Barley Pan-Transcriptome |
Organisation | Indiana University |
Department | School of Medicine |
Country | United States |
Sector | Academic/University |
PI Contribution | Principal Investigators and coordinators |
Collaborator Contribution | Funding, data, data analysis, data interpretation, paper writing, project management |
Impact | https://doi.org/10.21203/rs.3.rs-3787876/v1 |
Start Year | 2019 |
Description | Barley Pan-Transcriptome |
Organisation | Leibniz Association |
Department | Leibniz Institute of Plant Genetics and Crop Plant Research |
Country | Germany |
Sector | Charity/Non Profit |
PI Contribution | Principal Investigators and coordinators |
Collaborator Contribution | Funding, data, data analysis, data interpretation, paper writing, project management |
Impact | https://doi.org/10.21203/rs.3.rs-3787876/v1 |
Start Year | 2019 |
Description | Barley Pan-Transcriptome |
Organisation | Martin Luther University of Halle-Wittenberg |
Country | Germany |
Sector | Academic/University |
PI Contribution | Principal Investigators and coordinators |
Collaborator Contribution | Funding, data, data analysis, data interpretation, paper writing, project management |
Impact | https://doi.org/10.21203/rs.3.rs-3787876/v1 |
Start Year | 2019 |
Description | Barley Pan-Transcriptome |
Organisation | Murdoch University |
Country | Australia |
Sector | Academic/University |
PI Contribution | Principal Investigators and coordinators |
Collaborator Contribution | Funding, data, data analysis, data interpretation, paper writing, project management |
Impact | https://doi.org/10.21203/rs.3.rs-3787876/v1 |
Start Year | 2019 |
Description | Barley Pan-Transcriptome |
Organisation | Okayama University |
Country | Japan |
Sector | Academic/University |
PI Contribution | Principal Investigators and coordinators |
Collaborator Contribution | Funding, data, data analysis, data interpretation, paper writing, project management |
Impact | https://doi.org/10.21203/rs.3.rs-3787876/v1 |
Start Year | 2019 |
Description | Barley Pan-Transcriptome |
Organisation | University of Adelaide |
Country | Australia |
Sector | Academic/University |
PI Contribution | Principal Investigators and coordinators |
Collaborator Contribution | Funding, data, data analysis, data interpretation, paper writing, project management |
Impact | https://doi.org/10.21203/rs.3.rs-3787876/v1 |
Start Year | 2019 |
Description | Barley Pan-Transcriptome |
Organisation | University of Saskatchewan |
Country | Canada |
Sector | Academic/University |
PI Contribution | Principal Investigators and coordinators |
Collaborator Contribution | Funding, data, data analysis, data interpretation, paper writing, project management |
Impact | https://doi.org/10.21203/rs.3.rs-3787876/v1 |
Start Year | 2019 |
Description | Barley Pan-Transcriptome |
Organisation | University of Zurich |
Country | Switzerland |
Sector | Academic/University |
PI Contribution | Principal Investigators and coordinators |
Collaborator Contribution | Funding, data, data analysis, data interpretation, paper writing, project management |
Impact | https://doi.org/10.21203/rs.3.rs-3787876/v1 |
Start Year | 2019 |
Description | Barley Pan-Transcriptome |
Organisation | Zhejiang University |
Country | China |
Sector | Academic/University |
PI Contribution | Principal Investigators and coordinators |
Collaborator Contribution | Funding, data, data analysis, data interpretation, paper writing, project management |
Impact | https://doi.org/10.21203/rs.3.rs-3787876/v1 |
Start Year | 2019 |
Description | RTDBox will be validated on three crop species: lettuce, tomato and potato |
Organisation | University of Cambridge |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | RTDBox is being developed to automate the construction of comprehensive, high-quality transcriptomes for plant species from high-throughput sequencing data. We have budgeted for Illumina short-read sequencing and PacBio sequencing for three exemplar crop species: lettuce (in collaboration with Prof Katherine Denby at the University of York), tomato (in collaboration with Prof David Baulcombe at the University of Cambridge) and potato (in collaboration with Dr Ingo Hein at the University of Dundee). I have contacted all of these collaborators and informed them of the project schedule so that they are ready to make RNA available for sequencing. |
Collaborator Contribution | Discussions and plans were made with all collaborators on how to proceed with the generation and preparation of the samples. |
Impact | no outputs yet |
Start Year | 2019 |
Description | RTDBox will be validated on three crop species: lettuce, tomato and potato |
Organisation | University of Dundee |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | RTDBox is being developed to automate the construction of comprehensive, high-quality transcriptomes for plant species from high-throughput sequencing data. We have budgeted for Illumina short-read sequencing and PacBio sequencing for three exemplar crop species: lettuce (in collaboration with Prof Katherine Denby at the University of York), tomato (in collaboration with Prof David Baulcombe at the University of Cambridge) and potato (in collaboration with Dr Ingo Hein at the University of Dundee). I have contacted all of these collaborators and informed them of the project schedule so that they are ready to make RNA available for sequencing. |
Collaborator Contribution | Discussions and plans were made with all collaborators on how to proceed with the generation and preparation of the samples. |
Impact | no outputs yet |
Start Year | 2019 |
Description | RTDBox will be validated on three crop species: lettuce, tomato and potato |
Organisation | University of York |
Department | Department of Biology |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | RTDBox is being developed to automate the construction of comprehensive, high-quality transcriptomes for plant species from high-throughput sequencing data. We have budgeted for Illumina short-read sequencing and PacBio sequencing for three exemplar crop species: lettuce (in collaboration with Prof Katherine Denby at the University of York), tomato (in collaboration with Prof David Baulcombe at the University of Cambridge) and potato (in collaboration with Dr Ingo Hein at the University of Dundee). I have contacted all of these collaborators and informed them of the project schedule so that they are ready to make RNA available for sequencing. |
Collaborator Contribution | Discussions and plans were made with all collaborators on how to proceed with the generation and preparation of the samples. |
Impact | no outputs yet |
Start Year | 2019 |
Description | barley long read analysis for heat stress |
Organisation | University of Silesia |
Country | Poland |
Sector | Academic/University |
PI Contribution | Using our established short-read pipeline, we are testing and collecting user feedback through a collaboration with a research group at the University of Silesia in Katowice, led by Dr. Agata Daszkowska, to analyse PacBio long-read sequencing data for the study of heat stress in barley. |
Collaborator Contribution | The research group from the University of Silesia in Katowice, led by Dr. Agata Daszkowska, has provided advice on how to improve the accessibility of the RTDBox pipeline. |
Impact | not available yet |
Start Year | 2022 |
Description | common bean RTD |
Organisation | IPK Gatersleben |
Country | Germany |
Sector | Private |
PI Contribution | Using our established short-read pipeline, we are testing and collecting user feedback through a collaboration with a research group from IPK. Dr. Beate Fraust visited us in Dundee and we trained her to use RTDBox to develop an RTD for common bean. |
Collaborator Contribution | Feedback was provided on what to improve in RTDBox. |
Impact | common bean RTD |
Start Year | 2022 |
Description | lettuce RTD |
Organisation | University of York |
Department | Department of Biology |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | My team will use the PacBio Iso-seq and Illumina sequencing data generated from samples harvested at Prof Denby's lab to build a high-quality lettuce transcriptome with the pipeline established in this project. |
Collaborator Contribution | Prof Katherine Denby generated RNA from diverse tissues and experimental conditions for Illumina and PacBio sequencing. |
Impact | A high-quality lettuce transcriptome that allows fast and accurate gene expression quantification from RNA-seq data. |
Start Year | 2021 |
Description | tomato RTD |
Organisation | University of Oxford |
Department | Department of Plant Sciences |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | We are using the RTDBox pipeline we developed to construct a comprehensive, high-quality RTD for tomato. |
Collaborator Contribution | Sara Lopez Gomollon generated the tomato samples and extracted the RNA that was sent for sequencing. |
Impact | n/a |
Start Year | 2021 |
Title | 3D RNA-seq: a powerful and flexible tool for rapid and accurate differential expression and alternative splicing analysis of RNA-seq data for biologists |
Description | The 3D RNA-seq App is an R Shiny App and web-based pipeline for the comprehensive analysis of RNA-seq data from any organism. It is an easy-to-use, flexible and powerful tool for the analysis of both gene-level and transcript-level expression, identifying differential gene/transcript expression, differential alternative splicing and differential transcript usage (3D), as well as isoform switching, from RNA-seq data. 3D RNA-seq integrates state-of-the-art differential expression analysis tools and adopts best practice for RNA-seq analysis. The program is designed to be run by biologists with minimal bioinformatics experience (or by bioinformaticians), allowing lab scientists to analyse their own RNA-seq data. It achieves this through a user-friendly graphical interface that automates the data flow through the programs in the pipeline. The analysis performed by 3D RNA-seq is rapid and accurate, can handle complex experimental designs, allows users to set statistical parameters, visualizes the results through graphics and tables, and generates publication-quality figures such as heat-maps, expression profiles and GO enrichment plots. |
Type Of Technology | Webtool/Application |
Year Produced | 2019 |
Open Source License? | Yes |
Impact | The publication has been cited 14 times and more than 4,400 users worldwide have used the tool for their RNA-seq analysis, a quarter of whom are regular, returning users who have used it on multiple occasions. |
URL | http://3drnaseq.hutton.ac.uk |
Title | RTDBox |
Description | RTDBox is a computational pipeline that allows scientists to construct a high-quality transcript reference, which enables fast and accurate quantification of gene expression from RNA-seq data. We have established cutting-edge methods for filtering misassembled transcripts from Illumina short-read assemblies and PacBio Iso-seq long reads. We also provide a web interface that allows this analysis to be carried out quickly and easily without coding. |
Type Of Technology | Webtool/Application |
Year Produced | 2021 |
Open Source License? | Yes |
Impact | The transcriptome reference plays a key role in gene expression quantification, as an incomplete or misassembled transcriptome often leads to erroneous quantification. Our new pipeline allows a high-quality transcriptome reference to be constructed quickly, incorporating a range of stringent filters to remove misassembled transcripts. For the PacBio long-read pipeline, we also developed a method that accurately defines transcript start and end sites, which not only improves the accuracy of gene expression quantification but also allows the study of transcriptional regulation, such as polyadenylation and alternative splicing. |
URL | https://rtdbox.hutton.ac.uk/#/ |
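The junction-based filtering idea mentioned in the RTDBox description can be illustrated with a minimal sketch. This is not RTDBox's actual implementation: the `Transcript` class, the `filter_transcripts` function and the tuple layout of a junction are hypothetical names chosen for illustration, and the only rule assumed here is that a transcript is kept when every one of its splice junctions appears in a set of junctions already judged high-confidence.

```python
from dataclasses import dataclass

# Hypothetical representation of a splice junction:
# (chromosome, strand, intron_start, intron_end).
Junction = tuple[str, str, int, int]


@dataclass(frozen=True)
class Transcript:
    transcript_id: str
    junctions: tuple[Junction, ...]  # empty for mono-exonic transcripts


def filter_transcripts(transcripts, trusted_junctions):
    """Keep transcripts whose splice junctions are all in the trusted set.

    Mono-exonic transcripts have no junctions, so they pass this check
    trivially; a real pipeline would assess them by other criteria.
    """
    return [
        t for t in transcripts
        if all(j in trusted_junctions for j in t.junctions)
    ]


if __name__ == "__main__":
    # Illustrative data only; coordinates are made up.
    trusted = {("Chr1", "+", 100, 200), ("Chr1", "+", 300, 400)}
    candidates = [
        Transcript("T1", (("Chr1", "+", 100, 200), ("Chr1", "+", 300, 400))),
        Transcript("T2", (("Chr1", "+", 100, 250),)),  # unsupported junction
        Transcript("T3", ()),  # mono-exonic
    ]
    kept = filter_transcripts(candidates, trusted)
    print([t.transcript_id for t in kept])  # ['T1', 'T3']
```

Using a set for the trusted junctions keeps each membership test constant-time on average, which matters when a transcript catalogue contains hundreds of thousands of junctions.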
Description | 3D RNA-seq Training at La Trobe University at Melbourne, Australia |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Twelve PhD students and postdocs at La Trobe University in Melbourne, Australia, attended the 3D RNA-seq training workshop, which equipped them with skills for analysing RNA-seq data. An anonymous survey showed that 100% of participants would recommend the training course and the 3D RNA-seq tool to others. |
Year(s) Of Engagement Activity | 2023 |
URL | https://www.eventbrite.com.au/e/3d-rna-seq-app-a-flexible-and-powerful-tool-for-analysis-of-rna-seq-... |
Description | Deliver 3D RNA-seq training workshop at Australian National University |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Professional Practitioners |
Results and Impact | Wenbin and I delivered a two-day workshop on our software 3D RNA-seq (https://3drnaseq.hutton.ac.uk/app_direct/3DRNAseq/) for the analysis of transcriptomics data. Sessions were held on Wednesday 1 March 2023 and Thursday 2 March 2023, from 9:00am to 12:00 noon each day, in Seminar Rooms 1 & 2. |
Year(s) Of Engagement Activity | 2023 |
URL | https://www.eventbrite.com.au/e/3d-rna-seq-workshop-tickets-556207510637 |
Description | poster presentation at RECOMB 2022 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | I presented a poster titled "Novel computational methods for high-resolution single molecule sequencing-based transcriptomes in Arabidopsis and barley" at RECOMB 2022, San Diego, engaging with 20+ scientists. |
Year(s) Of Engagement Activity | 2022 |
Description | presentation at International Conference on Arabidopsis Research (ICAR) conference |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Presented a talk titled "A high-resolution single molecule sequencing based Arabidopsis transcriptome using novel methods of Iso-seq analysis" at the International Conference on Arabidopsis Research (ICAR), Belfast, 20-24 June 2022. |
Year(s) Of Engagement Activity | 2022 |
Description | transcriptome data analysis training workshop |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | We held a transcriptome data analysis training workshop covering the development of high-quality reference transcript datasets for accurate quantification (Zhang et al., 2017; Zhang et al., 2022) and the use of 3D RNA-seq for comprehensive, high-quality gene expression analysis. The 3D RNA-seq app (Guo et al., 2021) was developed at the James Hutton Institute; it has over 8,700 users globally and has been cited 49 times in plant, animal and human studies since 2019. The workshop was attended by 18 participants, comprising students, post-docs and permanent staff from the IPK and including three participants who travelled from Poland. All participants had a chance to run through the app with a test dataset and were keen to use it on their own datasets afterwards. Overwhelmingly positive feedback was received through different channels. |
Year(s) Of Engagement Activity | 2022 |
URL | https://www.denbi.de/training/1469-3d-rna-seq-a-flexible-and-powerful-tool-for-differential-expressi... |