Bayesian Computation in Systems and Synthetic Biology

Lead Research Organisation: University of Warwick
Department Name: Warwick Systems Biology Centre

Abstract

This proposal is to request support for visits to the Centre of Excellence in Genomic Sciences (CEGS) at the University of Southern California, and Caltech (Mayo Laboratory), to exchange ideas, develop new lines of research and explore collaborative ventures. The underlying scientific theme which links these visits is the development of advanced tools for Bayesian statistical modelling, and their application to two challenging problems in systems and synthetic biology - the understanding of genetic variation and the design of novel protein molecules. This proposal surmounts traditional academic disciplinary boundaries and lies at the interface of biophysics, genomics and computational statistics.

A key feature that distinguishes the modern approach to systems biology is the aim of linking mathematical and statistical modelling with the huge volume and diversity of contemporary cellular and molecular data, such as that coming from high-throughput, genome-wide and imaging technologies. One of the most important challenges facing modern biology and medicine is to understand how the genetic variation between individuals (the genotype) translates into the type of variation we can see or measure, such as blood pressure (the phenotype), and how environment influences this relationship. Although considerable progress has been made in recent years in identifying regulatory genes and modules in various organisms, there is still limited knowledge about downstream gene regulatory networks, and about how variation in these networks results in phenotypic differences, and is, in turn, affected by the environment.

The Centre of Excellence in Genomic Sciences (CEGS) at the University of Southern California, directed by Professor Simon Tavare FRS, is one of only 11 CEGS funded by the National Institutes of Health, with a focus on the use of the heterogeneous data produced by modern genomics technologies to understand genetic variation. Professor Tavare is internationally recognised for his work at the interface of statistics, probability and the biological and medical sciences. He has made important contributions to the study of combinatorial stochastic processes, population genetics and statistical bioinformatics. The visit to CEGS will provide an unparalleled opportunity to interact with a wide range of researchers, including molecular biologists, population geneticists, genetic epidemiologists, statisticians, computer scientists, and mathematicians, who are focused on these problems.

Whilst systems biology attempts to understand the design principles underpinning biological processes, synthetic biology attempts to apply this understanding to the design and construction of novel biological functions and systems not found in nature. One facet of synthetic biology is protein design, in which our increasing understanding of the principles underlying protein structure and function is being applied in the redesign of existing proteins, or the design of novel proteins.

Professor Steve Mayo is one of the pioneers of the field of protein design and a member of the US National Academy of Sciences; the focus of his laboratory at Caltech is the use of theoretical, computational, and experimental approaches to study structural biology, and in particular to develop quantitative methods for protein design. Caltech was rated the world's number one university in the 2011-2012 Times Higher Education global ranking of the top 200 universities. The visit to Professor Mayo's laboratory will provide a unique opportunity to interact with a wide range of researchers applying theoretical, computational, and experimental approaches to the study of protein design, protein sequence evolution and protein-protein recognition, in a world-class environment.

Planned Impact

The project will have an impact on a number of end-users of statistical modelling techniques in biomedicine, computational science, and other data intensive fields.

At the core of this proposal is the development of novel algorithmic tools for Bayesian computation. The contributions in this area will offer the prospect of new modalities for addressing important and challenging problems in systems and synthetic biology. In the forthcoming era of personal genomic medicine, genome sequences and other forms of high-throughput data such as gene expression, alternative splicing, DNA methylation, histone acetylation, and protein abundances will be routinely measured for large numbers of people. New approaches to synthetic protein design offer the long-term potential for novel biotechnological and therapeutic applications. Statistical and computational methodology will be essential to realizing the promise of these technological developments. The visits described in this proposal will provide exemplar projects and preliminary results, which will form the basis of future collaborative grant proposals, either to the Human Frontier Science Programme or to domestic funding agencies.

Publications

 
Description At the core of this proposal is the development of novel algorithmic tools for Bayesian computation. The contributions in this area will offer the prospect of new modalities for addressing important and challenging problems in systems and synthetic biology. In the forthcoming era of personal genomic medicine, genome sequences and other forms of high-throughput data such as gene expression, alternative splicing, DNA methylation, histone acetylation, and protein abundances will be routinely measured for large numbers of people. New approaches to synthetic protein design offer the long-term potential for novel biotechnological and therapeutic applications. Statistical and computational methodology will be essential to realizing the promise of these technological developments. The visits described in this proposal have provided exemplar projects and preliminary results, which will form the basis of future collaborative grant proposals.
Exploitation Route CEGS has a collaborative agreement with the Colon Cancer Family Registry (CCFR) study based at USC, which has initiated an active, integrated, and comprehensive plan of research on the etiology of colorectal cancer, including a wide range of studies on genetic/epigenetic-environmental interactions. This collaboration between CEGS and the CCFR provides an ideal route by which the computational and statistical methods for integrating and analyzing heterogeneous data, developed to address the broad questions of genotype-phenotype mapping outlined in this proposal, may then be tested using human data sets with a direct disease focus.
Sectors Agriculture, Food and Drink; Healthcare; Pharmaceuticals and Medical Biotechnology

 
Description The main concrete outputs of my interactions with the USC researchers to date have been the submission of a $15.7m Center of Excellence in Genomic Sciences grant proposal entitled "Closed-loop modeling and experimentation for developmental gene regulatory networks" to the US National Institutes of Health, a proposal to the NSF/BBSRC pilot scheme entitled "Combining systems biology and population genomics to discover the molecular nature of adaptation in Medicago truncatula", and an outline application to the Leverhulme Trust entitled "Exploring the energy landscapes of proteins and peptides with Bayesian computation".
First Year Of Impact 2014
 
Title MDI-GPU 
Description Accelerating integrative modelling for genomic scale data using GP-GPU computing 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact publication 
URL http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software/
 
Title OSCI 
Description Inferring Orthologous Gene Regulatory Networks Using Interspecies Data Fusion 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact publication 
URL http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software/