A computer array approach to accelerating the functional prediction of biological systems

Lead Research Organisation: University of Aberdeen
Department Name: School of Medical Sciences

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Technical Summary

Predicting the systems responsible for controlling biological processes is now possible thanks to the widespread availability of multiple genome sequences, the increased speed and accuracy of proteomic and microarray analyses, and the development of novel, powerful computer-based algorithms. We have pioneered novel bioinformatic approaches that will allow for the prediction of components of the biological systems that contribute to human and animal health and microbial pathogenesis. This bioinformatic expertise has generated a number of novel algorithms that allow for the simultaneous analysis of massive genomic and microarray-derived data sets for the prediction of enhancer-gene linkage (Starkey, MacKenzie), yeast transcriptional profiling (Brown), the prediction of replication origins (Donaldson, Starkey), the prediction of protein-protein interactions (Ritchie) and the predictive modelling of translation termination and elongation efficiencies (Stansfield, Starkey). All of the applicants have the expertise to test these predictions in the lab. Because of the nature of the algorithms that we are developing, and the number and large size of the genomic and microarray-derived data sets to be analysed, the conventional desktop computers currently available to us lack sufficient processing power. In order to carry out these analyses successfully, we are requesting funds to purchase and maintain a 32-node dual AMD Opteron Cluster System computer array that will use our unique algorithms to quickly analyse massive data sets and thus speed up the prediction of biological system components by at least two orders of magnitude. Access to the AMD Opteron Cluster System will greatly accelerate our ability to predict the function of a variety of different biological system components and increase our knowledge of these biological systems and how they may be involved in increasing animal disease susceptibility and microbial pathogenesis.


 
Description This Research Equipment Initiative scheme provided half the funding (£46k) required to purchase a High Performance Computer (HPC) array that allowed high-speed whole-genome analyses. Using this HPC we combined whole-genome comparative genomics with the existing dbSNP database to examine SNP densities across the whole conserved human genome. In this way we were able to demonstrate the following phenomena: (1) The vast majority of conservation in the human genome is non-coding. (2) SNP densities were reduced in the most conserved regions of the human genome. (3) SNP densities were lowest in the intronic and exonic components of the conserved human genome. (4) By comparison, SNP densities were significantly higher in the conserved intergenic genome despite its being conserved to an identical degree.
From these observations we were able to conclude that (1) the majority of conserved and, by extrapolation, functional information is contained in the non-coding genome; (2) the exonic and intronic components of the conserved genome are under identical levels of high purifying selective pressure; and (3) the conserved intergenic component provides the plasticity required for adaptive evolution and may also be the major reservoir of disease-causing polymorphisms. These observations have since been validated by GWAS showing that 88% of disease-causing SNPs occur in non-coding regions of the genome. The results of this study were published in BMC Genomics [1].
In addition to this study, the HPC has been used to host an online web site called RegSNP (http://viis.abdn.ac.uk/regsnp/Home.aspx) [2] that allows researchers to predict the effects of different alleles of SNPs on transcription factor binding sites. This facility also allows the prediction of SNPs in linkage disequilibrium (LD) and is currently being updated to allow the detection of LD between GWAS-associated SNPs and SNPs in highly conserved regions.
In addition, Starkey has used the HPC in work that undertakes data mining of a multi-parameter model landscape through the use of Monte Carlo methods allied to statistical techniques, focused on the global nitrogen model [3,4].
[1] Davidson, S., Starkey, A. & MacKenzie, A. (2009). 'Evidence of uneven selective pressure on different subsets of the conserved human genome: implications for the significance of intronic and intergenic DNA'. BMC Genomics, 10:614.
[2] Davidson, S., Starkey, A., MacKenzie, RegSNP - Predicting Allele Specific Differences in Transcription Factor - DNA binding. http://viis.abdn.ac.uk/regsnp/Home.aspx
[3] Starkey, A., Robinson, D. "Monte Carlo simulation of the global nitrogen cycle", CSC'09.
[4] (submitted for publication) David Robinson, Calum Burgoyne and Andrew Starkey (2012). Nitrogen fluxes in a steady-state global nitrogen cycle.
Exploitation Route Many people either gain little therapeutic benefit from currently available drug therapies or suffer serious side effects. Identification of drug response stratification loci will greatly accelerate the delivery of novel drugs to market by allowing more targeted selection of patient cohorts for drug testing during development and, once a drug is on the market, by providing an avenue to more focussed prescribing to the patients who would benefit, thus delivering on many of the promises of personalised medicine. This ability will prove hugely profitable to the drugs industry whilst greatly improving patient care. We are currently exploring ways of using the techniques developed on our high performance computer array to identify the polymorphisms that contribute to drug response stratification in the human genome.
Sectors Healthcare, Pharmaceuticals and Medical Biotechnology
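The SNP density comparison described in the findings above comes down to counting SNPs inside each class of conserved region and normalising by the total length of that class. The record does not describe how the analysis was implemented, so the following is only a minimal sketch of that arithmetic. The function name `snp_density_per_kb`, the half-open interval convention and the single-chromosome input are all assumptions made for illustration.

```python
from bisect import bisect_left
from collections import defaultdict


def snp_density_per_kb(regions, snp_positions):
    """Return SNPs per kilobase for each region class (hypothetical helper).

    regions: iterable of (region_class, start, end) tuples on one chromosome,
             using half-open coordinates [start, end). Assumed not to overlap.
    snp_positions: iterable of integer SNP coordinates on the same chromosome.
    """
    snps = sorted(snp_positions)
    total_bp = defaultdict(int)
    total_snps = defaultdict(int)
    for region_class, start, end in regions:
        total_bp[region_class] += end - start
        # Binary search counts the SNPs with start <= position < end.
        total_snps[region_class] += bisect_left(snps, end) - bisect_left(snps, start)
    # Skip classes with no covered bases to avoid dividing by zero.
    return {
        region_class: 1000.0 * total_snps[region_class] / bp
        for region_class, bp in total_bp.items()
        if bp > 0
    }
```

Called with conserved regions labelled, for example, "exonic", "intronic" and "intergenic", the function returns one density per label, which is the quantity the four findings compare. A real analysis would also need per-chromosome handling and a significance test, neither of which is shown here.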

 
Title RegSNP 
Description RegSNP is a new computer algorithm for predicting the effects of SNPs on transcription factor binding to DNA. The algorithm was released online in the form of an easy-to-use web site. 
Type Of Material Improvements to research infrastructure 
Year Produced 2009 
Provided To Others? Yes  
Impact The use of RegSNP allowed us to predict the effects of SNPs identified within the regulatory regions under study. We are also aware of the use of this web site by other researchers overseas. 
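The record does not say how RegSNP scores an allele, so the sketch below is not the RegSNP algorithm. It illustrates one common, generic way of estimating an allele-specific difference in transcription factor binding: build a log-odds position weight matrix from base counts, then compare the best-scoring site overlapping the SNP for each allele. All names (`log_odds_matrix`, `best_window_score`, `allele_effect`), the pseudocount, the uniform background and the forward-strand-only scan are assumptions made for illustration.

```python
import math

BASES = "ACGT"


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds scores.

    counts: list of dicts mapping base -> observed count, one dict per motif position.
    """
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2(((column.get(b, 0) + pseudocount) / total) / background)
            for b in BASES
        })
    return matrix


def best_window_score(seq, matrix):
    """Highest motif score over all forward-strand windows in seq."""
    width = len(matrix)
    return max(
        sum(matrix[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    )


def allele_effect(flank5, ref, alt, flank3, matrix):
    """Score change (alt minus ref) for the best site overlapping the SNP.

    Assumes single-base alleles and flanks of at least len(matrix) - 1 bases.
    """
    width = len(matrix)
    left = flank5[-(width - 1):] if width > 1 else ""
    right = flank3[:width - 1]
    ref_score = best_window_score(left + ref + right, matrix)
    alt_score = best_window_score(left + alt + right, matrix)
    return alt_score - ref_score
```

A negative result suggests the alternative allele weakens the best predicted site and a positive result suggests it strengthens it. Scanning the reverse strand and calibrating scores against a background distribution are left out to keep the sketch short.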
 
Title Reporter gene transgenic lines 
Description The material consists of a number of transgenic lines that contain reporter gene constructs made with enhancers identified during the time frame of the current project. 
Type Of Material Model of mechanisms or symptoms - mammalian in vivo 
Year Produced 2008 
Provided To Others? Yes  
Impact The material provided allows for the in vivo assessment of the tissue-specific and inducible properties of enhancers identified by comparative genomics. These novel models allow for the production of persuasive in vivo data on the properties of novel enhancers. The regulatory regions currently being modelled in these lines include enhancers and promoters for Galanin, TAC1, CGRP, NPY, BDNF and CNR1. 
 
Title Transgenic model 
Description Transgenic model of ECR2-TAC1prom-LacZ. Currently stored as frozen embryos. 
Type Of Material Model of mechanisms or symptoms - non-mammalian in vivo 
Year Produced 2008 
Provided To Others? Yes  
Impact These transgenic lines are now frozen in liquid nitrogen but have contributed critical data to four different research articles in good journals. They are available to the research community as frozen embryos. 
 
Title Transgenic mouse 
Description Mouse transgenic for GAL5.1-LacZ construct 
Type Of Material Model of mechanisms or symptoms - mammalian in vivo 
Year Produced 2010 
Provided To Others? Yes  
Impact Publication of the LacZ expression patterns produced in the brain of this mouse has generated tremendous interest in the GAL5.1 enhancer sequence. 
 
Description Development of a novel algorithm to define SNP effects on DNA binding sites 
Organisation University of Aberdeen
Department School of Engineering
Country United Kingdom 
Sector Academic/University 
PI Contribution We have provided intellectual input and have guided the project from the biological perspective.
Collaborator Contribution Dr Starkey has brought expertise in computer software engineering.
Impact We have one paper in press (BMC Genomics) and one to be submitted very soon to BMC computational biology that describes the development of a novel web site that allows researchers to predict the effects of non-coding polymorphisms on the binding of transcription factors.
Start Year 2006
 
Description GWA Studies, Gene Regulatory Variation and Disease 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact Seminar/workshop

No actual impacts realised to date.
Year(s) Of Engagement Activity 2011
 
Description Gene regulation, SNPs and disease 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Many interesting questions were asked and views shared

I hope to have stimulated young people in the audience to pursue science as a career
Year(s) Of Engagement Activity 2011,2012
 
Description Gene regulatory mechanisms and Chronic Pain 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Type Of Presentation Keynote/Invited Speaker
Geographic Reach International
Primary Audience Participants in your research or patient groups
Results and Impact Seminar/workshop Invited seminar at the University of Strathclyde, Glasgow

No actual impacts realised to date.
Year(s) Of Engagement Activity 2008
 
Description Gene regulatory mechanisms and Inflammatory Pain 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Type Of Presentation Keynote/Invited Speaker
Geographic Reach Local
Primary Audience Participants in your research or patient groups
Results and Impact Seminar/workshop Invited seminar at the University of Liverpool

No actual impacts realised to date.
Year(s) Of Engagement Activity 2009
 
Description Gene regulatory variation in health and disease 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact Seminar/workshop

The talk stimulated a great deal of interest from the audience
Year(s) Of Engagement Activity 2011
 
Description RegSNP - Predicting Allele Specific Differences in Transcription Factor - DNA binding 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This website permits non-experts in biotechnology to identify the transcription factor binding sites most affected by specific SNPs. The website also displays LD data, disease information, and whether a particular SNP is GWAS-associated or alters susceptibility to epigenetic modification.

Our web site has informed the research of many other researchers.
Year(s) Of Engagement Activity 2009,2010,2011,2012,2013,2014
URL http://viis.abdn.ac.uk/regsnp/Home.aspx