Efficient Biological Networks Discovery and Analysis

Lead Research Organisation: University of Liverpool
Department Name: Computer Science

Abstract

The hopping PhD student will spend six months in the CCBM in the IIB at Liverpool University, working closely with pure biologists and computational biologists. The main purpose of this placement is for the discipline hopper to acquire a good understanding of biological networks. The project will also lay the foundation for further development of the respective computational tools and foster research collaboration between the two research groups, in Algorithms and in Systems Biology, and their respective collaborators.

The main research challenges to be addressed within this project include:

1) NETWORK MODULARIZATION: Network modularization consists of identifying portions of a large network that share certain characteristics. Most of the available methods that perform well use a definition of network module based on connectivity. Some of the more advanced approaches instead aim to integrate multi-level information within a module (e.g. agglomeration of several gene properties and gene relationships in the module search) and are therefore more suitable for representing biological complexity. Unfortunately, these tend to perform well only in small networks (<1000 nodes) and fail for larger networks, which are of real interest to biologists. In search of efficient solutions we will look into new promising clustering methods. The group led by Prof Gasieniec (PI) is currently developing a tool, "Graph Draw", designed for the analysis of real datasets gathered from a wide range of social networking media. It manipulates the layout of the data to produce a meaningful representation of the information, which can then be analysed to achieve specific goals. Metrics used in Graph Draw include degree centrality, closeness centrality, betweenness centrality, PageRank and transitivity, amongst others. This joint project is expected to build further on the success of Graph Draw in the context of complex biological network analysis.

2) NETWORK VISUALISATION: The visualization and visual analysis of biological networks is one of the key techniques for coping with the enormous amount of data. In particular, the layout of networks should follow established biological drawing conventions [19]. In general, visualization methods for the life sciences should allow for the layout and navigation of biological networks, both for their static presentation and for their interactive exploration. Such methods need to adhere to constraints that originate from recognized textbook and poster layouts, from generally accepted drawing conventions within the life-science community, and from standardization initiatives such as MIM (Molecular Interaction Maps) and SBGN (Systems Biology Graphical Notation). The Graph Draw tool also provides some visualisation based on force-directed graph drawing algorithms. Further extensions, including multilayer presentation and animation, are also sought within this project.
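To make the connectivity metrics named under challenge 1 concrete, the following is a minimal Python sketch of three of them (degree centrality, closeness centrality and transitivity) on an undirected graph stored as an adjacency dictionary. It is an illustration only and is not taken from Graph Draw; the function names and the toy graph with nodes g1 to g4 are hypothetical.

```python
from collections import deque


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def closeness_centrality(adj):
    """Reachable nodes divided by the sum of shortest-path distances (BFS)."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


def transitivity(adj):
    """Closed connected triples divided by all connected triples."""
    closed = 0
    triples = 0
    for nbrs in adj.values():
        nbrs = sorted(nbrs)
        d = len(nbrs)
        triples += d * (d - 1) // 2
        # A triple centred here is closed when its two end nodes are adjacent.
        for i in range(d):
            for j in range(i + 1, d):
                if nbrs[j] in adj[nbrs[i]]:
                    closed += 1
    return closed / triples if triples else 0.0


# Hypothetical toy network: a triangle g1-g2-g3 with g4 attached to g3.
toy = {
    "g1": {"g2", "g3"},
    "g2": {"g1", "g3"},
    "g3": {"g1", "g2", "g4"},
    "g4": {"g3"},
}

if __name__ == "__main__":
    print(degree_centrality(toy))
    print(closeness_centrality(toy))
    print(transitivity(toy))
```

All three measures here depend only on connectivity, which is the limitation noted above for module definitions; betweenness centrality and PageRank are omitted to keep the sketch short.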
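The force-directed drawing mentioned under challenge 2 can be sketched as follows. This is a simplified scheme in the style of Fruchterman and Reingold, chosen here as an assumption, since the text does not say which force-directed algorithm Graph Draw uses. The function name, parameters and cooling schedule are hypothetical, and no biological drawing conventions (such as SBGN constraints) are enforced.

```python
import math
import random


def force_directed_layout(nodes, edges, iterations=200, size=1.0, seed=0):
    """Return a dict mapping each node to an (x, y) position.

    nodes is a list of node labels; edges is a list of (u, v) pairs.
    """
    rng = random.Random(seed)
    pos = {v: [rng.uniform(0, size), rng.uniform(0, size)] for v in nodes}
    k = size / math.sqrt(len(nodes))  # preferred distance between nodes
    temperature = size / 10           # maximum step length, reduced each round

    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}

        # Every pair of nodes repels, with force k^2 / distance.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[u][0] += dx / dist * force
                disp[u][1] += dy / dist * force
                disp[v][0] -= dx / dist * force
                disp[v][1] -= dy / dist * force

        # Connected nodes attract, with force distance^2 / k.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            disp[u][0] -= dx / dist * force
            disp[u][1] -= dy / dist * force
            disp[v][0] += dx / dist * force
            disp[v][1] += dy / dist * force

        # Move each node along its net force, capped by the temperature.
        for v in nodes:
            dx, dy = disp[v]
            length = math.hypot(dx, dy) or 1e-9
            step = min(length, temperature)
            pos[v][0] += dx / length * step
            pos[v][1] += dy / length * step

        temperature *= 0.95  # cool down so the layout settles

    return {v: (p[0], p[1]) for v, p in pos.items()}


if __name__ == "__main__":
    layout = force_directed_layout(
        ["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")]
    )
    for node, (x, y) in layout.items():
        print(node, round(x, 3), round(y, 3))
```

The all-pairs repulsion step costs quadratic time per iteration, which is one reason layouts of this kind need further engineering before they scale to the larger networks discussed under challenge 1.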

Planned Impact

The research and dissemination activities planned in the proposed project have the potential to directly impact the environmental science community and, in the longer term, are bound to have a profound impact on the development of omics-based platforms for understanding and modeling environmental biodiversity.

A key component of this project is to promote networking activities between the two departments directly involved in the hopping and more broadly involving key members of the UK and international community. We plan to achieve this by organizing a workshop where our teams will present the innovative approaches explored in this project and external speakers from relevant areas of science will be invited to present ideas and solutions to specific challenges in the integration of environmental "omics" data within the framework of network science. The direct involvement of leading researchers in the area of network analysis will promote knowledge exchange that will result in the development of long term collaborative frameworks, which we expect will be supported by additional funding (e.g. EPSRC international workshops).

Establishing a strong connection between experts in network analysis from computer science and other non-biology fields will also contribute to enhance UK leadership in environmental informatics, which we expect will play a key role in generating innovation in the foreseeable future.

Publications

 
Description The main aim of this project was to seek efficient ways to analyze biological network data through further study and application of methods known in Computer Science and related fields. The main objectives of the project included determining the current research needs and suitable algorithmic solutions, as well as designing software prototypes, including formatting of input data and provision of network analysis mechanisms, and concluding with efficient dissemination of the results. The adopted research mechanism was the relocation of a CS PhD student, the discipline hopper, who worked with the Life Sciences team for six months. Another important mechanism adopted for this project was a thematic workshop co-organized with the International Environmental 'Omics Synthesis Conference, which took place in Cardiff in September 2013. Finally, the work was supported by several incoming and outgoing research visits associated with research and software development discussions with the respective specialists. We identified a CS PhD student, Thomas Gorry (named researcher on the project proposal), who worked closely with Prof. Falciani's team in the Department of Functional and Comparative Genomics at the Institute of Integrative Biology. Apart from the leader, the team comprised two researchers, Dr Philipp Antczak and Mr John Herbert. The team had been working on ways to construct and explore biological networks in meaningful ways and, for this purpose, had developed a software pipeline scheme. The main purpose of the developed software is to help the user better understand gene expression data through suitable construction and further analysis of (hidden) networks. Previously the CS team had developed early network analysis software allowing analysis, manipulation and visualization of generic networked data.
The software developed as part of this project provides a pipeline that conveniently incorporates a number of previously known data processing and analysis mechanisms that were handled manually. More particularly, all elements/features of the pipeline are introduced as additional modules to the existing network analysis/visualization software tool. The main features incorporated into the software include:
- the ability to load data in the form of an Expression Matrix and a ranked list/lists of genes (with options to view and select the required information from these files)
- the automated use of SOTA, the Self Organizing Tree Algorithm, to cluster the genes provided (the user has control over the input parameters if required)
- gene set enrichment capabilities (with user control over input parameters)
- functional annotation of genes and clusters (with user control over input parameters)
- automated use of the ARACNE algorithm to construct the network (the user is given the choice of building either a network based on individual genes or the clusters created by the SOTA algorithm)
- various coloring capabilities to aid with cluster visualization in the main window of the software
- data display windows to enable users to easily click on a gene or cluster in the network and gain access to the data procured from the above features
Statement from the discipline hopper: "I feel that this opportunity to work closely with a team from a different background and on a topic has not only provided me with an insight of how other departments and groups operate but also allowed me to expand my knowledge in the area of environmental biology. Taking an active part in this project allowed me to attend the first annual International Environmental 'Omics Synthesis Conference in Cardiff and I had the opportunity to showcase our software in the workshop that was attached to the main event.
Furthermore, we were able to showcase the newly developed software further to members of the biological research community during a teaching workshop on Bioinformatics hosted at the Department of Computer Science and run by Prof Falciani's team. Finally, we are currently putting together a research paper in collaboration with Prof Falciani and his team that will discuss functionality of the software on the example of analysis of respective environmental data sets owned by the group. We intend to publish this paper in a good quality journal."
Exploitation Route We are seeking further applications of the developed software. We are currently investigating possible interest, and in turn support, from the CRUK Technology Board. We are also seeking ways to commercialize certain parts of our work with the help of the University of Liverpool legal team. We believe that the software tool can be of great use to biological researchers because of its ease of use, its capability to handle large complex networks, and the features incorporated above. A user does not need experience with the algorithms used, and all the information produced by these algorithms is presented clearly, is easy to explore in the software, and is stored in a user-specified file. The user can also export the network in compatible formats for use in other network visualization and analysis software.
Sectors Digital/Communication/Information Technologies (including Software),Environment

 
Description Building the PTM map of the human genome through commensal computing
Amount £232,708 (GBP)
Funding ID BB/L005239/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 01/2014 
End 01/2017
 
Description NeST - Network Sciences and Technologies
Amount £30,000 (GBP)
Funding ID NeST 
Organisation University of Liverpool 
Sector Academic/University
Country United Kingdom
Start 01/2013 
 
Description Targeting suitable stem cell donors through ML based modelling, classification, clustering and prediction
Amount £10,000 (GBP)
Organisation University of Liverpool 
Sector Academic/University
Country United Kingdom
Start 11/2018 
End 07/2019
 
Title Pipelined network analyser (including visualisation) 
Description The main purpose of the developed software is to help the user better understand gene expression data through suitable construction and further analysis of (hidden) networks. Previously the CS team had developed early network analysis software allowing analysis, manipulation and visualization of generic networked data. The software developed as part of this project provides a pipeline that conveniently incorporates a number of data processing and analysis mechanisms already used by biologists but previously handled manually. More particularly, all elements/features of the pipeline are introduced as additional modules to the existing network analysis/visualization software tool. The main features incorporated into the software include:
- the ability to load data in the form of an Expression Matrix and a ranked list/lists of genes (with options to view and select the required information from these files)
- the automated use of SOTA, the Self Organising Tree Algorithm, to cluster the genes provided (the user has control over the input parameters if required)
- gene set enrichment capabilities (with user control over input parameters)
- functional annotation of genes and clusters (with user control over input parameters)
- automated use of the ARACNE algorithm to construct networks (the user is given the choice of building either a network based on individual genes or the clusters created by the SOTA algorithm)
- various colouring capabilities to aid with cluster visualization in the main window of the software
- data display windows to enable users to easily click on a gene or cluster in the network and gain access to the data procured from the above features
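The network construction step listed above can be illustrated with a minimal Python sketch in the spirit of ARACNE: score every gene pair by mutual information, keep pairs above a threshold, then apply the data processing inequality to drop the weakest edge of every fully connected gene triplet. This is not the project's code. The histogram-based mutual information estimate is a simplification chosen for brevity, and the function names, parameters and example expression values are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def discretise(values, bins=3):
    """Equal-width binning of one gene's expression profile."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant profiles fall into bin 0
    return [min(int((x - lo) / width), bins - 1) for x in values]


def mutual_information(xs, ys):
    """Histogram estimate of mutual information (in nats) of two sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        c / n * math.log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items()
    )


def aracne_like(expression, mi_threshold=0.0, bins=3):
    """Return {frozenset({gene_a, gene_b}): mutual information} for kept edges.

    expression maps each gene name to a list of expression values.
    """
    genes = list(expression)
    binned = {g: discretise(expression[g], bins) for g in genes}
    mi = {
        frozenset((a, b)): mutual_information(binned[a], binned[b])
        for a, b in combinations(genes, 2)
    }
    edges = {e for e, score in mi.items() if score > mi_threshold}

    # Data processing inequality: in each triangle of candidate edges, the
    # weakest edge is treated as an indirect interaction and marked for removal.
    removed = set()
    for a, b, c in combinations(genes, 3):
        triangle = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(e in edges for e in triangle):
            removed.add(min(triangle, key=lambda e: mi[e]))

    return {e: mi[e] for e in edges - removed}


# Hypothetical example: g2 tracks g1 and g3 tracks g2, so the g1-g3 link
# is the weakest of the three and is expected to be pruned.
example = {
    "g1": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    "g2": [0, 0, 0, 0, 1, 1, 1, 1, 1, 0],
    "g3": [1, 0, 0, 0, 1, 0, 1, 1, 1, 0],
}

if __name__ == "__main__":
    for edge, score in aracne_like(example, bins=2).items():
        print(sorted(edge), round(score, 4))
```

The same function could be applied to cluster-level profiles instead of individual genes, which corresponds to the choice offered to the user between gene-based and SOTA-cluster-based networks.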
Type Of Technology Software 
Year Produced 2013 
Impact The software produced so far should be seen as a proof of concept. The model and the solutions proved to be correct including suitability for our partners in Life Sciences. However, we still need to make several important amendments to the developed syst