Managing the Data Explosion in Post-Genomic Biology with Fast Bayesian Computational Methods
Lead Research Organisation:
University of Cambridge
Department Name: Engineering
Abstract
Rapid technological advances in molecular biology are providing an unprecedented opportunity to investigate the basic processes of life. This 'post-genomic' phase of molecular biology has resulted in an explosion of typically high-dimensional structured data from new technologies for transcriptomics (microarrays), proteomics and metabolomics. Such data require novel mathematical, statistical and computational methods for their interpretation and analysis. This proposal focuses on the development of statistical and computational methods for the analysis of such data, using novel approaches from the fields of machine learning and nonparametric Bayesian statistics. The project involves a close collaboration of scientists with expertise in machine learning and statistics, bioinformatics and molecular biology. The new software tools will be developed in the context of real-world scientific problems, such as: elucidating signalling networks in plant stress responses; metabolic regulation in the bacteria Streptomyces, major producers of antibiotics; and delineating the molecular mechanisms contributing to mitochondrial dysfunction in obesity and diabetes. The scientific goal of the project will be to apply these novel methods to modelling bioinformatics data, but the methods developed will be broadly applicable across a number of fields.
Organisations
People |
Zoubin Ghahramani (Principal Investigator) |
Publications
Cooke EJ
(2011)
Bayesian hierarchical clustering for microarray time series data with replicates and outlier measurements.
in BMC bioinformatics
Knowles D
(2011)
Nonparametric Bayesian sparse factor models with application to gene expression modeling
in The Annals of Applied Statistics
Orbanz P
(2011)
Projective limit random probabilities on Polish spaces
in Electronic Journal of Statistics
Orbanz P
(2009)
Construction of nonparametric Bayesian models from parametric Bayes equations
in Advances in Neural Information Processing Systems 22 - Proceedings of the 2009 Conference
Lacoste-Julien S
(2011)
Approximate inference for the loss-calibrated Bayesian
Savage RS
(2010)
Discovering transcriptional modules by Bayesian data integration.
in Bioinformatics (Oxford, England)
Savage RS
(2009)
R/BHC: fast Bayesian hierarchical clustering for microarray data.
in BMC bioinformatics
Williamson S
(2010)
Dependent Indian buffet processes
in Journal of Machine Learning Research
Description | We identified six key computational and scientific challenges, which we addressed in this project: (1) developing fast algorithms and software tools for Bayesian hierarchical clustering, (2) novel algorithms for clustering time series data, (3) new nonparametric models for finding overlapping clusters, (4) new nonparametric models for context-dependent clustering, (5) developing an integrated software toolkit implementing the algorithms in (1), (2), (3) and (4), and (6) closed-loop modelling, hypothesis generation, and experimentation on the biological pathways discovered. |
Exploitation Route | Although the immediate scientific goal of the project is to apply these novel methods to modelling bioinformatics data, the methods developed will be broadly applicable across many disciplines. Examples include clustering stocks with different price dynamics in finance, clustering regions with different growth patterns in economics, and signal processing applications. We therefore anticipate that academic researchers, and ultimately industrial and commercial concerns in these fields, will be long-term beneficiaries of this research. |
Sectors | Digital/Communication/Information Technologies (including Software); Financial Services and Management Consultancy; Healthcare; Pharmaceuticals and Medical Biotechnology |
URL | http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software/ |
Description | The immediate beneficiaries of the project have been our experimental collaborators at Warwick, who have already generated extensive datasets from microarray analysis of gene expression time series and of the effects of a variety of knockout mutants, experimental treatments or clinical conditions on gene expression patterns. The wider beneficiaries of the research have been the community of molecular biology researchers who have used our software in high-throughput data analysis. To ensure that the outputs of our EPSRC-supported research are widely disseminated, versions of our code have been released as open-source MATLAB code or through the R/Bioconductor environment. Although the immediate scientific goal of the project has been to apply these novel methods to modelling bioinformatics data, the methods developed will be broadly applicable across many disciplines. Examples include clustering stocks with different price dynamics in finance, clustering regions with different growth patterns in economics, and signal processing applications. We therefore anticipate that academic researchers, and ultimately industrial and commercial concerns in these fields, will be long-term beneficiaries of this research. |
First Year Of Impact | 2012 |
Sector | Agriculture, Food and Drink; Healthcare |
Impact Types | Societal; Economic |
Description | EPSRC |
Amount | £289,422 (GBP) |
Funding ID | EP/I026827/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start |
Description | EPSRC |
Amount | £1,158,512 (GBP) |
Funding ID | EP/I036575/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start |
Description | Medical Research Council |
Amount | £436,500 (GBP) |
Funding ID | MRC Biostatistics Fellowship |
Organisation | Medical Research Council (MRC) |
Sector | Public |
Country | United Kingdom |
Start |