Managing the Data Explosion in Post-Genomic Biology with Fast Bayesian Computational Methods

Lead Research Organisation: University of Cambridge

Department Name: Engineering

Abstract

Rapid technological advances in molecular biology are providing an unprecedented opportunity to investigate the basic processes of life. This 'post-genomic' phase of molecular biology has resulted in an explosion of typically high-dimensional structured data from new technologies for transcriptomics (microarrays), proteomics and metabolomics. Such data require novel mathematical, statistical and computational methods for their interpretation and analysis. This proposal focuses on the development of statistical and computational methods for the analysis of such data, using novel approaches from the fields of machine learning and nonparametric Bayesian statistics. The project involves a close collaboration of scientists with expertise in machine learning and statistics, bioinformatics and molecular biology. The new software tools will be developed in the context of real-world scientific problems, such as: elucidating signalling networks in plant stress responses; metabolic regulation in Streptomyces bacteria, major producers of antibiotics; and delineating the molecular mechanisms contributing to mitochondrial dysfunction in obesity and diabetes. The scientific goal of the project will be to apply these novel methods to modelling bioinformatics data, but the methods developed will be broadly applicable across a number of fields.

Funded Value:

£255,580

Funded Period:

Jun 08 - Jun 11

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/F028628/1

Principal Investigator:

Zoubin Ghahramani

Research Subject:

Info. & commun. Technol. (45%)

Mathematical sciences (25%)

Tools, technologies & methods (30%)

Research Topic:

Artificial Intelligence (45%)

Bioinformatics (20%)

Statistics & Appl. Probability (25%)

Theoretical biology (10%)

Organisations

University of Cambridge (Lead Research Organisation)

People	ORCID iD
Zoubin Ghahramani (Principal Investigator)

Publications


Cooke EJ (2011) Bayesian hierarchical clustering for microarray time series data with replicates and outlier measurements in BMC Bioinformatics

Knowles D (2011) Nonparametric Bayesian sparse factor models with application to gene expression modeling in The Annals of Applied Statistics

Orbanz P (2011) Projective limit random probabilities on Polish spaces in Electronic Journal of Statistics

Orbanz P (2009) Construction of nonparametric Bayesian models from parametric Bayes equations in Advances in Neural Information Processing Systems 22 - Proceedings of the 2009 Conference

Orbanz P. Unit-rate Poisson representations of completely random measures

Lacoste-Julien S (2011) Approximate inference for the loss-calibrated Bayesian

Savage RS (2009) R/BHC: fast Bayesian hierarchical clustering for microarray data in BMC Bioinformatics

Savage RS (2010) Discovering transcriptional modules by Bayesian data integration in Bioinformatics (Oxford, England)

Williamson S (2010) Dependent Indian buffet processes in Journal of Machine Learning Research

Key Findings
Impact Summary
Further Funding


Description	We identified six key computational and scientific challenges, which we addressed in this project: (1) developing fast algorithms and software tools for Bayesian hierarchical clustering, (2) novel algorithms for clustering time series data, (3) new non-parametric models for finding overlapping clusters, (4) new non-parametric models for context-dependent clustering, (5) developing an integrated software toolkit implementing the algorithms in (1), (2), (3) and (4), and (6) closed-loop modelling, hypothesis generation, and experimentation on the biological pathways discovered.
Exploitation Route	Although the immediate scientific goals of the project are to apply these novel methods to modelling bioinformatics data, the methods developed will be broadly applicable across many disciplines. Examples include clustering stocks with different price dynamics in finance, clustering regions with different growth patterns in economics, and signal processing applications. We therefore anticipate that academic researchers, and ultimately industrial and commercial concerns, in these fields will be long-term beneficiaries of this research.
Sectors	Digital/Communication/Information Technologies (including Software); Financial Services and Management Consultancy; Healthcare; Pharmaceuticals and Medical Biotechnology
URL	http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software/


Description	The immediate beneficiaries of the project have been our experimental collaborators at Warwick, who have already generated extensive datasets from microarray analysis of gene expression time series and of the effects of a variety of knockout mutants, experimental treatments or clinical conditions on gene expression patterns. The wider beneficiaries of the research have been the community of molecular biology researchers who have used our software in high-throughput data analysis. To ensure that the outputs of our EPSRC-supported research are widely disseminated, versions of our code have been released as open-source MATLAB code or through the R/Bioconductor environment. Although the immediate scientific goals of the project have been to apply these novel methods to modelling bioinformatics data, the methods developed will be broadly applicable across many disciplines. Examples include clustering stocks with different price dynamics in finance, clustering regions with different growth patterns in economics, and signal processing applications. We therefore anticipate that academic researchers, and ultimately industrial and commercial concerns, in these fields will be long-term beneficiaries of this research.
First Year Of Impact	2012
Sector	Agriculture, Food and Drink; Healthcare
Impact Types	Societal; Economic


Description	EPSRC
Amount	£289,422 (GBP)
Funding ID	EP/I026827/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start


Description	EPSRC
Amount	£1,158,512 (GBP)
Funding ID	EP/I036575/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start


Description	Medical Research Council
Amount	£436,500 (GBP)
Funding ID	MRC Biostatistics Fellowship
Organisation	Medical Research Council (MRC)
Sector	Public
Country	United Kingdom
Start
