Development of SUPPA for alternative splicing analysis from RNA-seq in plants across multiple conditions

Lead Research Organisation: University of Dundee
Department Name: School of Life Sciences

Abstract

Genes are the repositories of hereditary information and proteins are the machines that carry out the functions of living cells. The term 'gene expression' usually refers to the process by which a gene gives rise to a protein. In eukaryotes, gene expression is complex: when a protein-coding gene is expressed, its DNA sequence is first copied by transcription into a precursor messenger RNA (pre-mRNA). The pre-mRNA undergoes several processing steps to form mature messenger RNAs (mRNAs), which direct synthesis of the corresponding protein (translation). An extremely important RNA processing step called alternative splicing (AS) generates different mRNA transcripts (i.e. isoforms) from the same gene and thereby modulates transcript and protein levels and functions. The majority of protein-coding genes undergo AS, and the relative amounts of AS isoforms change dynamically as cells and organisms develop and grow. High-throughput methods such as RNA sequencing (RNA-seq) can now generate data on tens of thousands of transcripts from cells at particular stages of development or under different conditions. To analyse the dynamic changes in transcripts and AS, and to understand how they are regulated, we need computational tools that accurately measure these different mRNA transcript isoforms in such large datasets. The tool we will develop here will enable high-resolution analysis of dynamic changes in gene expression at the level of individual transcripts and AS events.

Being able to distinguish the abundance of different transcript isoforms is important because one of the main approaches scientists use to associate genes with functions is to monitor gene expression: i.e. where and when genes are switched on or off, and at what level. The RNA-seq technology and programmes to analyse the data are continually being improved. Significant recent advances are the release of computational programmes that can quantify transcript isoform abundance (e.g. Sailfish, Salmon, Kallisto) and can generate measures of AS (e.g. SUPPA) from large datasets very quickly.
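As a concrete illustration of the kind of AS measure SUPPA generates: SUPPA summarises each AS event as a percent spliced-in (PSI) value, computed from transcript abundances as the summed abundance of the transcripts that contain the inclusion form of the event divided by the summed abundance of the transcripts that contain either form. The minimal Python sketch below assumes per-sample TPM values from a quantifier such as Salmon or Kallisto; the function name, transcript IDs and numbers are hypothetical, and it is not SUPPA's own code.

```python
from typing import Dict, Iterable, Optional

def event_psi(tpm: Dict[str, float],
              inclusion: Iterable[str],
              exclusion: Iterable[str]) -> Optional[float]:
    """Percent spliced-in (PSI) of one AS event in one sample (illustrative sketch).

    tpm: transcript ID -> abundance (e.g. TPM) in a single sample.
    inclusion / exclusion: IDs of the transcripts containing each form of the event.
    Returns None when neither form is expressed, because PSI is then undefined.
    """
    inc = sum(tpm.get(t, 0.0) for t in inclusion)
    exc = sum(tpm.get(t, 0.0) for t in exclusion)
    total = inc + exc
    if total == 0:
        return None
    return inc / total

# Hypothetical gene with three isoforms: T1 and T2 include an exon, T3 skips it.
sample = {"T1": 12.0, "T2": 3.0, "T3": 5.0}
print(event_psi(sample, inclusion=["T1", "T2"], exclusion=["T3"]))  # 0.75
```

Because PSI is a ratio of isoform abundances, it can change even when total gene expression does not, which is why transcript-level quantification matters for AS analysis.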

We have been using these programmes to analyse RNA-seq data from Arabidopsis. We also have an excellent experimental system that allows us to validate the RNA-seq results. Detailed comparisons have helped us to identify a number of discrepancies and issues with SUPPA, where it does not accurately report on, for example, complex AS events. We have already taken initial steps to improve SUPPA, with good success, and here we aim to modify it in a number of ways so that it 1) accurately measures AS, and 2) can be applied to experiments with multiple conditions, such as time-course series. This is significant because it will allow clustering of AS responses, correlation of AS indices with gene expression, and the building of splicing networks to understand the regulation of AS.
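One of the analyses mentioned above, correlating AS indices with gene expression, can be sketched as a Pearson correlation between the PSI profile of an event and the expression profile of a gene across the points of a time course. This is a minimal illustration under assumed inputs: the profiles are hypothetical, Pearson correlation is only one possible choice of similarity measure, and in practice a library routine (e.g. `numpy.corrcoef` or `scipy.stats.pearsonr`) would normally be used.

```python
import math
from typing import Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical time course: PSI of one AS event and TPM of a candidate regulator gene
psi_profile = [0.80, 0.72, 0.55, 0.40, 0.35]
regulator_tpm = [10.0, 14.0, 22.0, 30.0, 33.0]
r = pearson(psi_profile, regulator_tpm)
print(round(r, 3))  # strongly negative: PSI falls as the regulator rises
```

Applied over many events and genes, such pairwise scores are one possible starting point for clustering AS responses or sketching candidate splicing networks.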

Research groups in the UK and around the world use RNA-seq to analyse gene expression in plants and animals. There are currently many limitations in quantifying transcripts and AS dynamically; the improved version of SUPPA will deliver this function. It will be relevant not only to Arabidopsis but also to other plant and crop species, and to animal and human studies.

Technical Summary

The regulation of gene expression is essential to plant growth and development. Alternative splicing (AS) generates more than one transcript isoform per gene and occurs in up to 60-70% of intron-containing genes in plants. In Arabidopsis, using ultra-deep RNA-seq and new computational methods of analysis, we have generated dynamic, transcript-specific data that allow us to interpret the contribution of individual transcript isoforms to overall gene expression. However, a comprehensive analysis of AS requires computational tools that can deal with complex AS events and identify statistically significant differences in AS among tens of thousands of transcripts across two or more conditions.

We are the first plant group to apply SUPPA (a programme developed to analyse AS transcripts in human cancers) to plant RNA-seq data. The programme works well for a number of genes/AS events but does not deal well with complex AS events. It was also developed for binary comparisons (e.g. cancer cells versus normal cells) and needs substantial modification to handle multiple conditions, accounting for variability among biological replicates when analysing differential AS.
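Handling multiple conditions with biological replicates means that a change in PSI between conditions has to be judged against the variability among replicates. The sketch below illustrates the idea for one event across consecutive time points: it compares replicate-mean PSI values and flags intervals where the change (dPSI) exceeds both a hypothetical cut-off and the observed replicate spread. It is a crude heuristic written for illustration, not the statistical test implemented in SUPPA or SUPPA2; the function name, threshold and data are assumptions.

```python
from statistics import mean
from typing import Dict, List, Tuple

def delta_psi_across_conditions(
    psi: Dict[str, List[float]],  # condition -> PSI of one event in each biological replicate
    order: List[str],             # conditions in time-course order
    min_delta: float = 0.1,       # hypothetical effect-size cut-off
) -> List[Tuple[str, str, float, bool]]:
    """Change in replicate-mean PSI between consecutive conditions for one event.

    An interval is flagged when |dPSI| >= min_delta AND |dPSI| is larger than the
    widest replicate spread (max - min) in either condition. This is a crude guard
    against replicate noise, not a statistical test.
    """
    results = []
    for a, b in zip(order, order[1:]):
        d = mean(psi[b]) - mean(psi[a])
        noise = max(max(psi[a]) - min(psi[a]), max(psi[b]) - min(psi[b]))
        flagged = abs(d) >= min_delta and abs(d) > noise
        results.append((a, b, d, flagged))
    return results

# Hypothetical event measured in three biological replicates at three time points
event = {"T0": [0.70, 0.72, 0.68], "T1": [0.69, 0.73, 0.70], "T2": [0.40, 0.45, 0.42]}
for a, b, d, flag in delta_psi_across_conditions(event, ["T0", "T1", "T2"]):
    print(f"{a}->{b}: dPSI={d:+.3f} flagged={flag}")
```

Here only the T1 to T2 interval is flagged; the small T0 to T1 change is within replicate noise. A real differential AS method would replace the spread check with a proper model of replicate variability.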

We have taken initial steps to demonstrate that SUPPA can be modified and have identified further areas for improvement. This proposal has significant advantages. Firstly, it is a collaboration with Prof. Eduardo Eyras (Barcelona), the developer of SUPPA; secondly, we have probably the most extensive RNA-seq time-course dataset in plants, in terms of both the resolution of the time series (26 time points) and the depth of the RNA-seq data; and thirdly, we have extensive validation data generated on the same RNA samples, which will be used to test and assess the planned modifications and their application to multiple time-point data.

The output will be improved versions of SUPPA software applicable to the analysis of RNA-seq data of plant/crop species (as well as animal/human).

Planned Impact

The main impact of this work will be the programme to analyse AS in RNA-seq data from both plants and animals. In particular, it will impact researchers examining gene function and global expression changes at the transcript level. As such the main beneficiaries and users are the research sector, both academic and industrial.

The main challenge to maximising the impact of the new software tool is to raise awareness of its potential and utility among the people most likely to use it and benefit from it in their research. This will be done in a timely fashion so that other researchers can begin to use the software to re-analyse existing RNA-seq datasets and to analyse new and future RNA-seq experiments.

The main Impact Objectives are to:

1) Inform the alternative splicing community of the software for AS analysis of RNA-seq data and its uses and applications while it is being developed and tested;
2) Release the new SUPPA software to the community as soon as it is finalised and disseminate it further from specific websites used regularly by the community.

To achieve these objectives:
1. The PIs/Co-Is will ensure community awareness by contacting research groups in the plant and animal AS community with details of the programme and how it will benefit them;
2. The PIs/Co-Is will present progress on the development at national and international conferences and meetings such as the annual UK RNA and EURASNET meetings as well as through invited seminars;
3. The new SUPPA will be released to relevant groups as soon as it is finalised and made widely available on our websites following publication;
4. Public engagement activities;
5. Training and mentoring of the PDRA.

Publications

 
Description We identified areas of improvement for the SUPPA program as applied particularly to plant alternative splicing (AS) data. The issue was that SUPPA performs pairwise comparisons of different transcripts from the same gene to identify the type of AS event and the degree of AS between the transcripts. For genes with only two transcripts, this worked well. However, plant data can be more complex because the small size of many introns often creates additional AS events in the same region. We tested a number of ways to make the definition of events more flexible and to group transcripts with the same splice sites. The developer modified the program, the modified versions were run on our data, and we analysed the output for improvements. A number of iterations helped to define the parameters and led to the improved SUPPA2 program. We are using SUPPA2 in our analyses and it has also been used by other plant groups.
Exploitation Route Ultimately, SUPPA will be used to analyse expression and alternative splicing in different plant and crop species and will aid identification of key genes.
Sectors Agriculture, Food and Drink
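Grouping transcripts with the same splice sites, as described above, can be illustrated by reducing each transcript to its set of introns (pairs of donor and acceptor positions) and grouping transcripts whose intron sets are identical; comparing the intron sets of two groups then shows which splice sites differ between them. This Python sketch is a hypothetical illustration rather than SUPPA2's event definition: it assumes 1-based inclusive exon coordinates on a single strand, and the transcript IDs and coordinates are invented.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

Exon = Tuple[int, int]    # (start, end): 1-based, inclusive, single strand (assumed)
Intron = Tuple[int, int]  # (first intronic base, last intronic base)

def introns(exons: List[Exon]) -> FrozenSet[Intron]:
    """Introns implied by a transcript's exons, i.e. its splice-site pairs."""
    ex = sorted(exons)
    return frozenset((end1 + 1, start2 - 1)
                     for (_, end1), (start2, _) in zip(ex, ex[1:]))

def group_by_splice_sites(tx: Dict[str, List[Exon]]) -> List[List[str]]:
    """Group transcripts whose intron sets are identical.

    Transcripts that differ only in their transcript start or end positions
    end up in the same group, so they need not be compared separately.
    """
    groups: Dict[FrozenSet[Intron], List[str]] = defaultdict(list)
    for tid, exons in tx.items():
        groups[introns(exons)].append(tid)
    return sorted(sorted(ids) for ids in groups.values())

# Hypothetical gene: T1 and T2 share all splice sites; T3 skips the middle exon.
tx = {
    "T1": [(100, 200), (300, 400), (500, 600)],
    "T2": [(90, 200), (300, 400), (500, 650)],
    "T3": [(100, 200), (500, 600)],
}
print(group_by_splice_sites(tx))  # [['T1', 'T2'], ['T3']]

# Splice-site pairs unique to each side of the T1 / T3 comparison
a, b = introns(tx["T1"]), introns(tx["T3"])
print(sorted(a - b), sorted(b - a))  # [(201, 299), (401, 499)] [(201, 499)]
```

Two introns on one side replaced by a single longer intron spanning them on the other is the signature of exon skipping; with many small introns in the same region, several such differences can overlap, which is what makes plant events harder to define.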

 
Title New program to detect significant isoform switches in time-series data - Time-Series Isoform Switch (TSIS)
Description One aspect of alternative splicing (AS) is isoform switching, where the relative abundance of different isoforms of the same gene switches under different conditions. Isoform switches are used, for example, in cancer diagnostics. Three existing programs identify isoform switches in pairwise sample comparisons, but TSIS is the only program that can identify significant isoform switches in time-series RNA-seq data.
Type Of Material Data analysis technique 
Year Produced 2017 
Provided To Others? Yes  
Impact To date there are a small number of citations. However, its application is clearly demonstrated in the Calixto et al. (2018) paper and we expect more uptake.
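The core idea behind isoform switching in a time series can be sketched as finding the intervals where the ranking of two isoforms of the same gene reverses, i.e. where the sign of the difference in their abundances changes between consecutive time points. This minimal Python sketch assumes mean abundances per time point are already available; TSIS itself applies further scoring and filtering that is not reproduced here, and the function name and data are hypothetical.

```python
from typing import List, Sequence, Tuple

def switch_points(iso1: Sequence[float], iso2: Sequence[float],
                  times: Sequence[float]) -> List[Tuple[float, float]]:
    """Time intervals in which two isoforms of one gene swap their ranking.

    iso1 / iso2: mean abundance (e.g. TPM) of each isoform at each time point.
    Returns (t_before, t_after) for every interval where the sign of
    iso1 - iso2 changes; time points with an exact tie are skipped.
    """
    if not (len(iso1) == len(iso2) == len(times)):
        raise ValueError("inputs must have equal length")
    diffs = [a - b for a, b in zip(iso1, iso2)]
    points = []
    prev_i = None  # index of the last time point with a non-zero difference
    for i, d in enumerate(diffs):
        if d == 0:
            continue
        if prev_i is not None and (d > 0) != (diffs[prev_i] > 0):
            points.append((times[prev_i], times[i]))
        prev_i = i
    return points

# Hypothetical time course (hours) for two isoforms of one gene
hours = [0, 3, 6, 9, 12]
isoform_a = [20.0, 18.0, 9.0, 6.0, 15.0]
isoform_b = [5.0, 7.0, 12.0, 14.0, 8.0]
print(switch_points(isoform_a, isoform_b, hours))  # [(3, 6), (9, 12)]
```

A crossing alone says nothing about significance; deciding whether a switch is real requires replicate information and criteria such as how long each ranking persists, which is the part a dedicated tool like TSIS addresses.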
 
Description Collaboration to improve the utility of SUPPA, a programme to analyse alternative splicing in RNA-seq data 
Organisation Pompeu Fabra University
Department Department of Experimental and Health Sciences
Country Spain 
Sector Academic/University 
PI Contribution In developing the pipeline to analyse RNA-seq data in Arabidopsis, we used the original version of SUPPA and compared its results with our experimental data. This validation raised some issues with SUPPA, which were then corrected by the Eyras team (giving SUPPA-var). We are collaborating in this grant to further improve SUPPA for use with plant RNA-seq data and to deal with more complex alternative splicing events.
Collaborator Contribution Eduardo Eyras and his team own the SUPPA programme and continue to improve its utility. Suggestions arising from our studies are implemented in SUPPA by Eduardo Eyras and his team.
Impact We have contributed to an improvement in SUPPA. SUPPA is widely used world-wide but has been developed mainly for the analysis of AS in human cancer.
Start Year 2015
 
Description Alternatively spliced genes as novel cold responsive genes in Arabidopsis (Cristiane Calixto/ASPB Hawaii/2017) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Talk in RNA Biology session of ASPB Plant Biology meeting Hawaii, USA 2017
Year(s) Of Engagement Activity 2017
 
Description Presentation at Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Germany - Rapid and dynamic alternative splicing impacts the cold response transcriptome in Arabidopsis - given by Prof John W S Brown
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Invited seminar at IPK-Gatersleben; multiple discussions with different group leaders and interested parties
Year(s) Of Engagement Activity 2018
 
Description Presentation at Molecular Cell Physiology Department, University of Bielefeld, Bielefeld, Germany - Rapid and dynamic alternative splicing impacts the cold response transcriptome in Arabidopsis - given by Prof. John W S Brown
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Invited seminar - discussions with PhD students and research groups interested in our advanced approaches.
Year(s) Of Engagement Activity 2018
 
Description Presentation at 6th UK RNA Splicing Workshop - Rapid cold-induced alternative splicing in Arabidopsis involves a complex network of regulators - given by Prof John W S Brown 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact The presentation covered our research on cold-induced changes in expression and alternative splicing. It also presented a new RNA-seq analysis method/tool designed for biologists, which attracted massive interest.
Year(s) Of Engagement Activity 2018
 
Description Presentation at GARNet2018: a plant science showcase at University of York - in Large Scale Biology section - "Genome-wide alternative splicing" - given by Dr Cristiane Calixto
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Invited to speak about our genome-wide methods for analysis of RNA-seq data for gene expression and alternative splicing
Year(s) Of Engagement Activity 2018
 
Description Presentation at SEB Annual Meeting, Florence - Rapid cold-induced alternative splicing in Arabidopsis involves a complex network of regulators - given by Dr Nikoleta Tzioutziou 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presentation in section: Plant Temperature responses: Shaping Development and Enhancing Survival?

Presented our novel RNA-seq analysis methods for time-course analysis (paper published in Plant Cell: Calixto et al. 2018).
Year(s) Of Engagement Activity 2018
 
Description RNA-sequencing meeting (University of Dundee and James Hutton Institute)
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact We organised an "RNA-sequencing afternoon" workshop for people involved in RNA-seq analyses at the University of Dundee and the James Hutton Institute. The purpose was to raise awareness of new approaches in this very fast-moving field.
Year(s) Of Engagement Activity 2017
 
Description Rapid and dynamic alternative splicing impacts the Arabidopsis cold response (Cristiane Calixto/IGC Symposium - Lisbon/2017) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Invited talk at the IGC Symposium 2017 - Plant RNA Biology. Lisbon, 27-28 September 2017 (talk given by Cristiane Calixto)
Year(s) Of Engagement Activity 2017
 
Description Reference Transcript Datasets for RNA-sequencing in plants (Cris Calixto/ASPB Hawaii, 2017) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact ASPB Plant Biology, Hawaii, June 24-28, 2017, Bioinformatics Session (talk given by Cristiane Calixto) - Reference transcript datasets: the value of accurate transcriptomes for gene expression and alternative splicing analysis
Year(s) Of Engagement Activity 2017