Development of network analysis tool BioLayout Express3D

Lead Research Organisation: University of Edinburgh
Department Name: Genomic Technology and Informatics

Abstract

Enormous amounts of data pertaining to the functions of genes and proteins and their interactions in the cell have now been generated by a range of techniques, including but not limited to expression profiling, mass spectrometry, RNAi and yeast two-hybrid (Y2H) assays. Such functional genomics and proteomics approaches, when combined with computational biology and the emerging discipline of systems biology, finally allow us to begin comprehensive mapping of cellular and molecular networks and pathways. One of the main difficulties we currently face is how best to integrate these disparate data sources and use them to better understand biological systems. Visualisation and analysis of biological data as networks is becoming an increasingly important approach to exploring a variety of biological relationships. Such approaches have already been used successfully in the study of sequence similarity, protein structure, protein interactions and evolution. Shifting biological data into a graph/network paradigm allows one to use algorithms, techniques, ideas and statistics previously developed in graph theory, engineering, computer science and computational systems biology. In networks derived from biological data, nodes are usually genes, transcripts or proteins, while edges tend to represent experimentally determined similarities or functional linkages between them. While network analysis of biological data has shown great promise, little attention has been paid to microarray data. These data are now abundant, generally of high quality and consist of the type of high-dimensional data for which such approaches are well suited. We have developed a new program called BioLayout Express3D that constructs networks from microarray expression data. This is achieved by measuring the similarity between individual gene expression profiles and, where two profiles are similar (i.e. their similarity exceeds a defined threshold), connecting them with an edge.
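The edge-building step described above can be illustrated with a short Java sketch. It assumes Pearson correlation as the similarity measure (the technical summary lists probe-to-probe Pearson correlation among the built-in calculations) and treats a pair of profiles as connected when the coefficient is at or above the threshold. The class and method names (`CorrelationNetworkSketch`, `pearson`, `buildEdges`) are hypothetical, and this naive all-pairs loop is not the tool's optimised routine.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: builds an undirected co-expression network by
 * connecting every pair of expression profiles whose Pearson correlation
 * coefficient is at or above a user-defined threshold (assumed rule).
 */
final class CorrelationNetworkSketch {

    /** Pearson correlation coefficient between two equal-length profiles. */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("profiles must have equal length >= 2");
        }
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        if (varX == 0 || varY == 0) {
            // A flat profile has no defined correlation; return 0.0 so it is
            // not connected under any positive threshold.
            return 0.0;
        }
        return cov / Math.sqrt(varX * varY);
    }

    /**
     * Returns edges as {i, j} index pairs (i < j) for every pair of rows in
     * profiles whose correlation is >= threshold.
     */
    static List<int[]> buildEdges(double[][] profiles, double threshold) {
        List<int[]> edges = new ArrayList<>();
        for (int i = 0; i < profiles.length; i++) {
            for (int j = i + 1; j < profiles.length; j++) {
                if (pearson(profiles[i], profiles[j]) >= threshold) {
                    edges.add(new int[] {i, j});
                }
            }
        }
        return edges;
    }
}
```

For example, with profiles {1, 2, 3, 4}, {2, 4, 6, 8} and {4, 3, 2, 1} and a threshold of 0.9, only the first two profiles (r = 1) are connected; the third is perfectly anti-correlated with both (r = -1). The all-pairs comparison grows quadratically with the number of probes, which is why large datasets call for optimised routines.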
Where there are groups of co-expressed genes within a given dataset, the corresponding nodes form cliques of highly interconnected nodes. Given the complexity of the data from modern array platforms, tools that provide a means of visualising and analysing large amounts of data are very much needed. The current version of BioLayout Express3D can construct graphs comprising over 10,000 nodes and 1 million edges. Visual representation of the graphs is enhanced by a unique layout algorithm combined with an OpenGL graphics engine that renders the network graphs in 3D space. Laying out data in this manner has a number of distinct advantages. The position of each node (gene) within the network can be determined relative to its immediate neighbours, i.e. the genes closest to it in expression (those with which it shares edges). This visualisation also allows the user to quickly identify by eye structures and features in the graph that would not otherwise have been obvious. Definition of these structures is further enhanced by a graph-based clustering algorithm, Markov clustering (MCL). Using this approach, large graphs can be divided into groups of highly connected nodes, i.e. expression clusters of co-expressed genes.

Having now looked at numerous 1- and 2-colour microarray expression datasets, varying in size from fewer than 20 chips to over 200, we are very happy with the basic performance of the tool. However, we urgently need to add features that will extend its analytical capabilities. The other area in which BioLayout Express3D is likely to play an important role is in modelling other types of biological relationships. In particular, we have begun to use the tool to construct graphs based on protein similarity and, above all, networks based on large-scale interaction and pathway datasets. In this respect the tool is showing great promise compared with other available software packages, but again it needs further development to enhance its capabilities in this area.
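Markov clustering alternates an expansion step (multiplying a column-stochastic matrix of random-walk transition probabilities by itself) with an inflation step (raising each entry to a power r > 1 and renormalising the columns), which strengthens flow within densely connected regions and weakens it between them. The hypothetical `MclSketch` class below is a minimal dense-matrix sketch for small unweighted graphs, assuming added self-loops, an expansion power of 2 and a caller-supplied inflation value. It is not the MCL implementation used in BioLayout Express3D, which has to handle graphs far larger than a dense n x n matrix allows.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of Markov clustering (MCL) on a small, undirected,
 * unweighted graph stored as a dense adjacency matrix. Not BioLayout
 * Express3D's implementation: no sparse storage and no pruning.
 */
final class MclSketch {

    static List<Set<Integer>> cluster(boolean[][] adj, double inflation, int maxIter) {
        int n = adj.length;
        double[][] m = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                // Adjacency plus self-loops, a common MCL preprocessing step.
                m[i][j] = (adj[i][j] || i == j) ? 1.0 : 0.0;
            }
        }
        normaliseColumns(m); // each column becomes a probability distribution

        for (int iter = 0; iter < maxIter; iter++) {
            double[][] next = multiply(m, m); // expansion with power 2
            for (double[] row : next) {       // inflation: element-wise power
                for (int j = 0; j < n; j++) {
                    row[j] = Math.pow(row[j], inflation);
                }
            }
            normaliseColumns(next);
            boolean converged = maxDiff(m, next) < 1e-9;
            m = next;
            if (converged) {
                break;
            }
        }

        // Rows that keep non-zero mass are attractors; the columns they hold
        // form one cluster. Identical member sets are reported once.
        List<Set<Integer>> clusters = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Set<Integer> members = new LinkedHashSet<>();
            for (int j = 0; j < n; j++) {
                if (m[i][j] > 1e-6) {
                    members.add(j);
                }
            }
            if (!members.isEmpty() && !clusters.contains(members)) {
                clusters.add(members);
            }
        }
        return clusters;
    }

    private static void normaliseColumns(double[][] m) {
        for (int j = 0; j < m.length; j++) {
            double sum = 0;
            for (double[] row : m) {
                sum += row[j];
            }
            if (sum > 0) {
                for (double[] row : m) {
                    row[j] /= sum;
                }
            }
        }
    }

    private static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length;
        double[][] c = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < n; k++) {
                for (int j = 0; j < n; j++) {
                    c[i][j] += a[i][k] * b[k][j];
                }
            }
        }
        return c;
    }

    private static double maxDiff(double[][] a, double[][] b) {
        double d = 0;
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < a[i].length; j++) {
                d = Math.max(d, Math.abs(a[i][j] - b[i][j]));
            }
        }
        return d;
    }
}
```

Applied to two disconnected triangles with an inflation value of 2.0, the sketch returns the clusters {0, 1, 2} and {3, 4, 5}. The inflation value controls cluster granularity: higher values tend to give smaller, tighter clusters.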

Technical Summary

Conventional analysis techniques are generally pairwise: an individual relationship between two biological entities is studied without considering higher-order interactions with their neighbours. Graph and network analysis techniques allow the position of a biological entity to be explored in the context of its local neighbourhood in the graph and of the network as a whole.

BioLayout Express3D has evolved from a program called BioLayout, which was originally written as a general approach for the representation and analysis of relatively small networks of various types and complexity. BioLayout Express3D is the product of an 18-month programme of extension and modification of the core BioLayout system, designed specifically, but not exclusively, to facilitate a new approach to the analysis of microarray expression data. These improvements include:

1. Built-in probe-to-probe Pearson correlation calculation and storage
2. Built-in network building and clustering for gene-expression data
3. Layout of large graphs
4. Highly optimised routines for layout and correlation calculation
5. 3D visualisation of network graphs
6. Input of multiple annotation classes
7. Implementation of the Markov clustering routine (MCL)
8. Expression profile viewer for single- and dual-colour analyses
9. Class annotation viewer

BioLayout Express3D is written entirely in Java and is distributed as a portable jar file that runs on Windows, Mac, Linux and other operating systems. We believe that this new approach represents a significant advance on previous analytical techniques for microarray data. The approach we have taken is novel and overcomes some of the intrinsic problems associated with the visualisation and clustering of expression data (and of large network graphs derived from other data sources). Together, we believe we have developed an approach and tool that potentially will have wide application to microarray data and beyond.
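Items 3 and 5 in the list of improvements above (layout of large graphs and 3D visualisation) rest on positioning nodes so that connected nodes sit close together. The document does not describe BioLayout Express3D's own layout algorithm, so the sketch below swaps in a generic Fruchterman-Reingold-style force-directed layout in 3D purely for illustration: every pair of nodes repels with force k^2/d, every edge attracts with force d^2/k, and each node's move per iteration is capped by a "temperature" that cools linearly. The class name `ForceLayout3DSketch`, its parameters (`k`, `iterations`, `seed`) and the cooling schedule are all assumptions.

```java
import java.util.Random;

/**
 * Hypothetical sketch of a Fruchterman-Reingold-style force-directed layout
 * in three dimensions. Illustrative only; not BioLayout Express3D's own
 * layout algorithm.
 */
final class ForceLayout3DSketch {

    /**
     * Returns an n x 3 array of node coordinates.
     *
     * @param n          number of nodes
     * @param edges      undirected edges as {u, v} index pairs
     * @param k          assumed ideal edge length
     * @param iterations number of cooling steps
     * @param seed       seed for the random initial placement
     */
    static double[][] layout(int n, int[][] edges, double k, int iterations, long seed) {
        Random rnd = new Random(seed);
        double[][] pos = new double[n][3];
        for (double[] p : pos) {
            for (int d = 0; d < 3; d++) {
                p[d] = (rnd.nextDouble() * 2.0 - 1.0) * k; // random start in a cube
            }
        }
        double t0 = 0.5 * k; // initial temperature = maximum step per iteration
        for (int it = 0; it < iterations; it++) {
            double t = t0 * (1.0 - (double) it / iterations); // linear cooling
            double[][] disp = new double[n][3];
            for (int u = 0; u < n; u++) {           // repulsion between all pairs
                for (int v = u + 1; v < n; v++) {
                    accumulate(pos, disp, u, v, false, k);
                }
            }
            for (int[] e : edges) {                 // attraction along edges
                accumulate(pos, disp, e[0], e[1], true, k);
            }
            for (int u = 0; u < n; u++) {           // move, capped at temperature t
                double len = Math.sqrt(disp[u][0] * disp[u][0]
                        + disp[u][1] * disp[u][1] + disp[u][2] * disp[u][2]);
                if (len > 0) {
                    double scale = Math.min(len, t) / len;
                    for (int d = 0; d < 3; d++) {
                        pos[u][d] += disp[u][d] * scale;
                    }
                }
            }
        }
        return pos;
    }

    /** Adds the force between u and v to both displacement vectors. */
    private static void accumulate(double[][] pos, double[][] disp, int u, int v,
                                   boolean attractive, double k) {
        double[] delta = new double[3]; // vector from v to u
        double dist = 0;
        for (int d = 0; d < 3; d++) {
            delta[d] = pos[u][d] - pos[v][d];
            dist += delta[d] * delta[d];
        }
        dist = Math.sqrt(dist);
        if (dist < 1e-9) {
            return; // coincident nodes: direction undefined, so skip this pair
        }
        // Repulsion k^2/d pushes u away from v; attraction d^2/k pulls u toward v.
        double magnitude = attractive ? -(dist * dist / k) : (k * k / dist);
        for (int d = 0; d < 3; d++) {
            double f = magnitude * delta[d] / dist;
            disp[u][d] += f;
            disp[v][d] -= f;
        }
    }
}
```

With these two forces, a pair of nodes joined by a single edge settles at distance k, where repulsion and attraction balance (k^2/d = d^2/k). The all-pairs repulsion loop is quadratic per iteration, so layouts of graphs with many thousands of nodes typically approximate it, for example with spatial grids or octrees.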

Publications

 
Description See BB/I001107/1, which is an extension of this work.
Exploitation Route The website currently receives hits from approx. 700 users a month from around the globe and is used 1500-2000 times a month. This tool has been used in analyses that have contributed to over 40 publications. The technology and know-how developed are currently en route to being commercialised and developed further by a new spin-out company called Kajeka.
Sectors Digital/Communication/Information Technologies (including Software), Pharmaceuticals and Medical Biotechnology

URL http://www.biolayout.org/about/publications/
 
Description The output of this work is summarised under the grant BB/I001107/1, which is an extension of this work.
First Year Of Impact 2007
Sector Digital/Communication/Information Technologies (including Software), Pharmaceuticals and Medical Biotechnology
Impact Types Economic

 
Title BioLayout Express3D 
Description BioLayout Express3D is a powerful tool for the visualization and analysis of network graphs. Network-based approaches are becoming increasingly popular for the analysis of complex systems of interaction and high-dimensional data. Networks can be produced from a wide variety of relationships between entities. In biology these include interactions between individuals, disease transmission, sequence similarity, metabolic pathways, protein interactions, pathways, regulatory cascades, gene expression and clinical data. This tool represents the product of over 15 years of research and development and uses a combination of high-end 3D graphics, algorithms and user-friendly graphical interfaces to allow the user to explore and better analyse their data.
Type Of Material Data analysis technique 
Year Produced 2007 
Provided To Others? Yes  
Impact The website currently receives hits from approx. 700 users a month from around the globe and is used 1500-2000 times a month. This tool has been used in analyses that have contributed to over 40 publications. The technology and know-how developed are currently en route to being commercialised and developed further by a new spin-out company called Kajeka.
URL http://www.biolayout.org/
 
Title Virtually Immune 
Description Virtually Immune is a resource that aims to support the graphical and computational modelling of immune pathways. The tools and resources presented on the site are primarily designed for biologists, both to use directly and to help them develop their own models of the systems that interest them. Virtually Immune introduces: a standardised system for presenting pathway knowledge; a user-friendly platform for modelling; pathway resources which capture and collate information from hundreds of papers; and a platform for knowledge exchange within the scientific community.
Type Of Material Computer model/algorithm 
Year Produced 2014 
Provided To Others? Yes  
Impact As a result of setting up this website as part of phase 1 of the NC3Rs CRACK IT challenge, we are beginning to direct people to it as a resource for our modelling efforts. Two papers describing the tools and our modelling approach are in preparation.
URL http://www.virtuallyimmune.org/
 
Title macrophages.com 
Description Macrophages.com is an online resource for those interested in macrophages and their role as major effector cells in innate and adaptive immunity. This website is designed to act as a centralised resource for the worldwide community of scientists interested in different aspects of macrophage biology. Included here are: collections of macrophage images in different tissues; expression datasets and transcriptional analyses; analysis of protein expression in macrophages; a gene-centred information portal for macrophage-expressed genes; macrophage pathway resources; curated key publications and reviews on aspects of macrophage biology; useful online resources; and news and events.
Type Of Material Database/Collection of data 
Year Produced 2009 
Provided To Others? Yes  
Impact This site is currently receiving hits from over a 1,00 unique users per week. 
URL http://www.macrophages.com/
 
Title BioLayout Express3D 
Description This is a network analysis tool and represents the product of over 15 years of research and development. It uses a combination of high-end 3D graphics, algorithms and user-friendly graphical interfaces to allow the user to explore and better analyse their data.
IP Reference  
Protection Copyrighted (e.g. software)
Year Protection Granted 2014
Licensed Yes
Impact The website currently receives hits from approx. 700 users a month from around the globe and is used 1500-2000 times a month. This tool has been used in analyses that have contributed to over 40 publications. The technology and know-how developed are currently en route to being commercialised and developed further by a new spin-out company called Kajeka.
 
Title BioLayout Express3D 
Description BioLayout Express3D is a powerful tool for the visualization and analysis of network graphs. Network-based approaches are becoming increasingly popular for the analysis of complex systems of interaction and high-dimensional data. Networks can be produced from a wide variety of relationships between entities. In biology these include interactions between individuals, disease transmission, sequence similarity, metabolic pathways, protein interactions, pathways, regulatory cascades, gene expression and clinical data. This tool represents the product of over 15 years of research and development and uses a combination of high-end 3D graphics, algorithms and user-friendly graphical interfaces to allow the user to explore and better analyse their data.
Type Of Technology Software 
Year Produced 2007 
Open Source License? Yes  
Impact The website currently receives hits from approx. 700 users a month from around the globe and is used 1500-2000 times a month. This tool has been used in analyses that have contributed to over 40 publications. 
URL http://www.biolayout.org/
 
Company Name Kajeka Ltd 
Description Company offering network analysis tools and services for the analysis of high-dimensional data, based on the IP and know-how behind BioLayout Express3D.
Year Established 2014 
Impact Currently establishing the company as part of the UP accelerator, Edinburgh (www.upaccelerator.com)
 
Company Name Fios Genomics Ltd 
Description Fios Genomics is a provider of an extensive range of bioinformatics data analysis services to Pharma, CROs and academia for drug discovery & development and applied research across all species. They provide access to a combined resource of in-house bioinformaticians, statisticians and biologists working together to analyse and interpret your genomic, transcriptomic and proteomic data. 
Year Established 2008 
Impact Supply of data analysis services to pharma, CROs and academic groups. The company currently employs 7 full-time staff.
Website http://www.fiosgenomics.com
 
Description School visits - Prestonfield primary school, Edinburgh 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact Talks to P4 and P5 classes about biological research, especially DNA, as part of work week. BioLayout was demonstrated and the pupils asked lots of questions.

The teacher told me that after my talks some of the children said they wanted to be scientists.
Year(s) Of Engagement Activity 2012, 2013, 2014