Integrated Expression Analysis and E-support using Bayesian Models for Affymetrix Exon and Gene Arrays
Lead Research Organisation:
Imperial College London
Department Name: School of Public Health
Abstract
DNA microarrays measure the expression of thousands of genes simultaneously. In the past decade they have become the standard technology for studying gene expression differences between experimental conditions, for example between individuals suffering from certain diseases and unaffected individuals. The latest generation of Affymetrix microarrays has increased the resolution of genomic expression detection from the gene level down to the exon level, enabling measurement of different splice variants. Currently there are very few tools that take advantage of the additional information contained within these Exon arrays. The aim of this project is to develop powerful and robust methods for Exon array analysis, including alternative splice detection and gene expression analyses. The software will be made available for researchers to download and use on their own system. In addition, data sets from microarray experiments can be very large and complex. The use of such high-throughput technology brings challenges for researchers in terms of data analysis and storage. Many biologists do not have direct collaboration with statisticians and bioinformaticians who are able to advise on and carry out data analysis. Thus there is a need for readily available analysis tools in a form accessible to the biological community. This project therefore also aims to present the developed analysis methods in a user-friendly analysis environment, along with technical support and training software. The work will build on an existing resource called EMAAS, created for microarray data analysis and management, which has been developed by the Bioinformatics Support Service at Imperial College. EMAAS uses Grid clusters of computers, which allow very fast data analysis, and is accessed via a user-friendly web interface.
Technical Summary
This project has two parallel aims: firstly, to develop novel methods for analysing the new Affymetrix Exon and Gene arrays, including measuring different splice variants; and secondly, to provide E-support for integrated Bayesian analysis of microarray data. We will develop a user-friendly web interface to a Grid-enabled system, allowing researchers to benefit from a powerful computing facility without having to deal directly with the underlying technology. We will adapt existing software for Bayesian analysis to be part of the microarray analysis portal EMAAS, which has been developed by the Bioinformatics Support Service at Imperial College. A series of separate analysis stages will be integrated, allowing for simultaneous parameter estimation and propagation of uncertainty. We will develop a new Bayesian model for analysing Exon and Gene arrays, which will include the facility to study alternative splice variants. This model will be made publicly available via the R software repository Bioconductor and included in the integrated expression analysis pipeline. Additionally, we will provide training and support tools for Bayesian analysis, tutorials for our particular models, and worked examples using publicly available data.
Publications
Chambers JC
(2014)
The South Asian genome.
in PloS one
Jänes J
(2015)
A comparative study of RNA-seq analysis strategies.
in Briefings in bioinformatics
Kirk P
(2013)
Balancing the robustness and predictive performance of biomarkers.
in Journal of computational biology : a journal of computational molecular cell biology
Kirk PD
(2011)
Plasma proteome analysis in HTLV-1-associated myelopathy/tropical spastic paraparesis.
in Retrovirology
Kulinskaya E
(2009)
Testing for linkage and Hardy-Weinberg disequilibrium.
in Annals of human genetics
Kulinskaya E
(2009)
On fuzzy familywise error rate and false discovery rate procedures for discrete distributions
in Biometrika
Lewin A
(2016)
Free serum haemoglobin is associated with brain atrophy in secondary progressive multiple sclerosis.
in Wellcome open research
Thillai M
(2012)
Sarcoidosis and tuberculosis cytokine profiles: indistinguishable in bronchoalveolar lavage but different in blood.
in PloS one
Turro E
(2011)
Haplotype and isoform specific expression estimation using multi-mapping RNA-seq reads
in Genome Biology
Description | Two novel analysis methods were produced: one for data from Affymetrix Gene Expression Microarrays and one for high-throughput sequencing of RNA (known as RNA-seq). Both methods quantify gene expression levels and detect genes that differ significantly between experimental conditions. For each of the two new methods, publicly available, open-source software has been created. |
Exploitation Route | The software is freely available to any researcher with microarray or RNA-seq data. |
Sectors | Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology |
URL | http://www.bgx.org.uk/software.html |
Title | mmbgx |
Description | mmbgx: a novel method for estimating expression levels for isoforms, using Exon and Gene Array data. |
Type Of Material | Data analysis technique |
Year Produced | 2010 |
Provided To Others? | Yes |
Title | mmseq |
Description | mmseq: a novel method for estimating expression levels for isoforms and haplotype-specific isoforms, using RNA-seq data. |
Type Of Material | Data analysis technique |
Year Produced | 2011 |
Provided To Others? | Yes |
Description | Chattaway/Bangham proteomics |
Organisation | Imperial College London |
Department | Department of Medicine |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Statistical analysis of high-dimensional proteomics biomarker data for multiple sclerosis. |
Collaborator Contribution | Clinical trial, Experimental work. |
Impact | Multi-disciplinary: Medicine, Virology, Statistics. Lewin A et al. (2016), Free serum haemoglobin is associated with brain atrophy in secondary progressive multiple sclerosis. Wellcome Open Res 2016, doi: 10.12688/wellcomeopenres.9967.2. Kirk P, Witkover A, Bangham CR, Richardson S, Lewin AM, Stumpf MP (2013), Balancing the robustness and predictive performance of biomarkers. J. Comp. Biol. December 2013, 20(12): 979-989. Kirk P, Witkover A, Courtney A, Lewin A, Wait R, Stumpf M, Richardson S, Taylor G and Bangham C (2011), Plasma proteome analysis in HTLV-1-associated myelopathy/tropical spastic paraparesis. Retrovirology. 2011 Oct 12;8:81. |
Start Year | 2009 |
Title | mmbgx |
Description | Software for estimating expression at the isoform level and detecting differential splicing using whole-transcript Affymetrix arrays (PMID 19854940). Open-source and freely available. |
Type Of Technology | Software |
Year Produced | 2010 |
Open Source License? | Yes |
Impact | No actual Impacts realised to date |
URL | http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2800219/ |
Title | mmseq |
Description | Pipeline for analysing RNA-seq high-throughput sequencing data; it produces expression estimates of haplotype-specific isoforms. |
Type Of Technology | Software |
Year Produced | 2011 |
Impact | No actual Impacts realised to date |
URL | http://www.rna-seqblog.com/tag/data-analysis-pipeline/ |