Development and benchmarking of improved computational methods for transcript-level expression analysis using RNA-seq data

Lead Research Organisation: University of Liverpool

Department Name: Institute of Integrative Biology

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Technical Summary

RNA-seq technology enables the discovery and quantification of multiple transcripts for each gene, including different gene isoforms and different allelic forms. We propose to develop a Bayesian inference approach that infers the concentration of the different transcripts present in a sample from a probabilistic model of mapped reads. The Bayesian approach will capture the inherent uncertainty in our estimates of transcript expression levels that arises from mapping ambiguity, technical noise, limited read depth and biological noise. We will include the possibility of discovering unannotated isoforms. Using a read-level probabilistic model will allow us to incorporate information about read density biases and read mapping quality scores. We will apply the model to quantify allele-specific isoform expression. This is particularly challenging in complex genomes such as that of hexaploid wheat, where a gene can be expressed from each of three constituent diploid genomes. We will develop a transcript-level benchmark dataset for method evaluation, in which different gene isoforms are spiked in at known concentrations against a natural background. R code implementing our methods for transcript-level inference and benchmarking will be disseminated through the Bioconductor project. We will also extend the existing puma Bioconductor package for noise propagation in microarray analysis so that its methods can be applied to transcript-level expression data with an associated multivariate uncertainty distribution.
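The read-level Bayesian model described above can be illustrated with a minimal sketch. It assumes that each read's alignment likelihood p(r|m) to each transcript m has already been computed, for example from fragment position and mapping quality. A two-step Gibbs sampler then alternates between two draws: it assigns every read to a transcript of origin, and it samples the transcript proportions θ from a Dirichlet distribution. The spread of the retained θ samples expresses the uncertainty caused by reads that map ambiguously to several isoforms. This is a generic mixture-model sampler written for illustration, not the project's implementation. The function name `gibbs_transcript_abundance`, its parameters and the toy data are hypothetical. It also omits transcript-length correction, bias modelling and unannotated isoforms.

```python
import numpy as np


def gibbs_transcript_abundance(read_likelihoods, n_iter=2000, burn_in=500,
                               alpha=1.0, seed=0):
    """Draw posterior samples of transcript read fractions (hypothetical helper).

    read_likelihoods: array (n_reads, n_transcripts); entry [n, m] is an
    assumed precomputed p(read n | transcript m), 0 where read n does not
    align to transcript m. Every read must align to at least one transcript,
    and alpha (the symmetric Dirichlet prior) must be positive.
    Returns an array (n_iter - burn_in, n_transcripts) of sampled proportions.
    """
    rng = np.random.default_rng(seed)
    lik = np.asarray(read_likelihoods, dtype=float)
    n_reads, n_tx = lik.shape
    theta = np.full(n_tx, 1.0 / n_tx)  # start from uniform proportions
    samples = []
    for it in range(n_iter):
        # Step 1: sample each read's transcript of origin given theta.
        weights = lik * theta
        probs = weights / weights.sum(axis=1, keepdims=True)
        u = rng.random((n_reads, 1))
        assign = (probs.cumsum(axis=1) < u).sum(axis=1)  # inverse-CDF draw
        assign = np.minimum(assign, n_tx - 1)  # guard against rounding error
        counts = np.bincount(assign, minlength=n_tx)
        # Step 2: sample theta from its Dirichlet full conditional.
        theta = rng.dirichlet(alpha + counts)
        if it >= burn_in:
            samples.append(theta)
    return np.array(samples)


# Toy data: 300 reads unique to t0, 100 unique to t2, and 100 reads that map
# equally well to t0 and t1 (mapping ambiguity).
lik = np.vstack([
    np.tile([1.0, 0.0, 0.0], (300, 1)),
    np.tile([0.0, 0.0, 1.0], (100, 1)),
    np.tile([1.0, 1.0, 0.0], (100, 1)),
])
samples = gibbs_transcript_abundance(lik)
print("posterior mean:", samples.mean(axis=0))
print("posterior sd:  ", samples.std(axis=0))
# Posterior probability that t0 is more abundant than t2.
print("P(theta0 > theta2):", np.mean(samples[:, 0] > samples[:, 2]))
```

Alternating these two conditional draws is a standard data-augmentation scheme for mixture models. Here θ is the fraction of reads originating from each transcript. Converting it to a transcript concentration would need a per-transcript effective-length correction, which this sketch leaves out. Summaries such as the posterior probability in the last line are the kind of uncertainty-aware quantity that downstream noise-propagation methods can consume.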

Planned Impact

Communication and Engagement: We will publish papers in open access peer-reviewed journals so that the academic community is made aware of developments. Software will be implemented as open source Bioconductor packages. A public benchmark will promote better practice by allowing a publicly available comparison of competing methods. We have close links to TGAC and the other MRC hubs, and we will ensure that all of these groups are made aware of the tools developed and their applications. The CGR, as a NERC and MRC hub, also works with a large bioinformatics community and will train new users in working with this software.

Collaboration and Co-production: The investigators are also engaged in many other BBSRC projects that can adopt the methodology developed here to add value to their work. These projects also provide excellent application data for this proposal. Many of them involve short-read sequencing of economically important species and comparative analysis against model species. We will identify other projects where the software can be deployed and ensure that their feedback is reflected in its development.

Exploitation and Application: As this tool will be deployed primarily for academic research, we do not intend to seek intellectual property protection for it. It will be made freely available to the user community under a suitable open source license.

Capacity and Involvement: We supervise BBSRC- and MRC-funded PhD students who will benefit from this research because they will use the software directly, and we regularly employ sixth form students to undertake research activities in the lab. Both SITRAN and the CGR undertake a wide range of outreach activities aimed at industry, the academic community and the general public, and they actively engage with the media at local, national and international levels.

Impact Activity Deliverables and Milestones: Computational Biology developments will be presented at international conferences. Four key papers and associated software will be published along with a benchmarking website.

Funded Value:

£315,943

Funded Period:

Sep 2012 - Sep 2015

Funder:

BBSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

BB/J007994/1

Principal Investigator:

Neil Hall

Research Subject:

Genetics & development (28%)

Mathematical sciences (28%)

Omic sciences & technologies (14%)

Tools, technologies & methods (28%)

Research Topic:

Bioinformatics (28%)

Gene action & regulation (28%)

Statistics & Appl. Probability (28%)

Transcriptomics (14%)

Organisations

People	ORCID iD
Neil Hall (Principal Investigator)
Anthony Hall (Co-Investigator)	http://orcid.org/0000-0002-1806-020X

Publications


D'Amore R (2016) A comprehensive benchmarking study of protocols and sequencing platforms for 16S rRNA community profiling. BMC Genomics.

Gardiner L (2017) Hidden variation in polyploid wheat drives local adaptation.

Gardiner LJ (2015) A genome-wide survey of DNA methylation in hexaploid wheat. Genome Biology.

Key Findings

Description	We have generated multiple RNA-seq datasets that are being used to develop improved algorithms for the BitSeq software package.
Exploitation Route	Once released, our data will be used by the plant genomics field, and the software can be used by anyone carrying out RNA-seq experiments.
Sectors	Agriculture, Food and Drink; Pharmaceuticals and Medical Biotechnology
URL	https://code.google.com/p/bitseq/

Impact Summary

Description	The software developed as part of this work (puma) has been used widely by other research groups for RNA-seq analysis.
First Year of Impact	2016
Sector	Agriculture, Food and Drink; Environment; Pharmaceuticals and Medical Biotechnology

Policy Influence

Description	Membership of BBSRC Transformative Technology Panel
Geographic Reach	National
Policy Influence Type	Membership of a guideline committee
Impact	The Transformative Technology strategy advisory panel has influenced BBSRC policy on data-intensive bioscience and the big ideas pipeline.

Further Funding

Description	A computational cloud framework for the study of gene families
Amount	£181,000 (GBP)
Funding ID	BB/N023145/1
Organisation	Biotechnology and Biological Sciences Research Council (BBSRC)
Sector	Public
Country	United Kingdom
Start	04/2017
End	09/2018

Description	International Wheat Yield Partnership (IWYP)
Amount	$2,000,000 (USD)
Organisation	Biotechnology and Biological Sciences Research Council (BBSRC)
Sector	Public
Country	United Kingdom
Start	01/2016
End	01/2019

Collaboration

Description	Collaboration with The University of Manchester (Rattray)
Organisation	University of Manchester
Country	United Kingdom
Sector	Academic/University
PI Contribution	The group at Liverpool provided expertise in bioscience and molecular biology. We undertook benchmarking RNA-seq experiments and applied the tools developed to different biological problems.
Collaborator Contribution	The Rattray group provided expertise in statistics and software engineering.
Impact	None
Start Year	2013
