Integrated Expression Analysis and E-support using Bayesian Models for Affymetrix Exon and Gene Arrays

Lead Research Organisation: Imperial College London
Department Name: School of Public Health

Abstract

DNA microarrays measure the expression of thousands of genes simultaneously. In the past decade they have become the standard technology for studying gene expression differences between experimental conditions, for example between individuals suffering from certain diseases and unaffected individuals. The latest generation of Affymetrix microarrays has increased the resolution of genomic expression detection from the gene level down to the exon level, enabling measurement of different splice variants. Currently there are very few tools that take advantage of the additional information contained within these Exon arrays. The aim of this project is to develop powerful and robust methods for Exon array analysis, including alternative splice detection and gene expression analyses. The software will be made available for researchers to download and use on their own system. In addition, data sets from microarray experiments can be very large and complex. The use of such high-throughput technology brings challenges for researchers in terms of data analysis and storage. Many biologists do not have direct collaboration with statisticians and bioinformaticians who are able to advise on and carry out data analysis. Thus there is a need for readily available analysis tools, in a form accessible to the biological community. Therefore this project also aims to present the developed analysis methods in a user-friendly analysis environment, along with technical support and training software. The work will build on an existing resource called EMAAS, created for microarray data analysis and management, which has been developed by the Bioinformatics Support Service at Imperial College. EMAAS uses Grid clusters of computers, which allow very fast data analysis, and is accessed via a user-friendly web interface.

Technical Summary

This project has two parallel aims: firstly, to develop novel methods for analysing the new Affymetrix Exon and Gene arrays, including measuring different splice variants, and secondly, to provide E-support for integrated Bayesian analysis of microarray data. We will develop a user-friendly web interface to a Grid-enabled system, allowing researchers to benefit from a powerful computing facility without having to deal directly with the underlying technology. We will adapt existing software for Bayesian analysis to be part of the microarray analysis portal EMAAS, which has been developed by the Bioinformatics Support Service at Imperial College. A series of separate analysis stages will be integrated into a single model, allowing for simultaneous parameter estimation and propagation of uncertainty. We will develop a new Bayesian model for analysing Exon and Gene arrays, which will include the facility to study alternative splice variants. This model will be made publicly available via the R software repository Bioconductor, and included in the integrated expression analysis pipeline. Additionally, we will provide training and support tools for Bayesian analysis, tutorials for our particular models and worked examples using publicly available data.

Publications

 
Description Two novel analysis methods were produced: one for exploiting data from Affymetrix Gene Expression Microarrays and one for high-throughput sequencing of RNA (known as RNA-seq). In both cases, the results are the quantification of gene expression levels and the detection of genes whose expression differs significantly between experimental conditions. For each of the two new methods, publicly available, open-source software has been created.
Exploitation Route The software is freely available to any researcher with Microarray or RNA-seq data.
Sectors Healthcare, Manufacturing (including Industrial Biotechnology), Pharmaceuticals and Medical Biotechnology

URL http://www.bgx.org.uk/software.html
 
Title mmbgx 
Description mmbgx: a novel method for estimating expression levels for isoforms, using Exon and Gene Array data. 
Type Of Material Data analysis technique 
Year Produced 2010 
Provided To Others? Yes  
Impact
 
Title mmseq 
Description mmseq: a novel method for estimating expression levels for isoforms and haplotype-specific isoforms, using RNA-seq data.
Type Of Material Data analysis technique 
Year Produced 2011 
Provided To Others? Yes  
Impact
 
Description Chattaway/Bangham proteomics 
Organisation Imperial College London
Department Department of Medicine
Country United Kingdom 
Sector Academic/University 
PI Contribution Statistical analysis of high-dimensional proteomics biomarker data for multiple sclerosis.
Collaborator Contribution Clinical trial, Experimental work.
Impact Multi-disciplinary: Medicine, Virology, Statistics. Lewin A et al. (2016), Free serum haemoglobin is associated with brain atrophy in secondary progressive multiple sclerosis. Wellcome Open Res 2016, doi: 10.12688/wellcomeopenres.9967.2. Kirk P, Witkover A, Bangham CR, Richardson S, Lewin AM, Stumpf MP (2013), Balancing the robustness and predictive performance of biomarkers. J. Comp. Biol. December 2013, 20(12): 979-989. Kirk P, Witkover A, Courtney A, Lewin A, Wait R, Stumpf M, Richardson S, Taylor G and Bangham C (2011), Plasma proteome analysis in HTLV-1-associated myelopathy/tropical spastic paraparesis. Retrovirology. 2011 Oct 12;8:81.
Start Year 2009
 
Title mmbgx 
Description Software for estimating expression at the isoform level and detecting differential splicing using whole-transcript Affymetrix arrays (PMID 19854940). Open-source, freely available.
Type Of Technology Software 
Year Produced 2010 
Open Source License? Yes  
Impact No actual Impacts realised to date 
URL http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2800219/
 
Title mmseq 
Description Pipeline for the analysis of RNA-seq high-throughput sequencing data; it produces expression estimates of haplotype-specific isoforms.
Type Of Technology Software 
Year Produced 2011 
Impact No actual Impacts realised to date 
URL http://www.rna-seqblog.com/tag/data-analysis-pipeline/