Development of a graph-theoretic approach to predict protein function by integrating large scale heterogeneous data

Lead Research Organisation: Royal Holloway University of London
Department Name: Computer Science

Abstract

The list of organisms with completed genome sequence is continuously growing and this has led to the identification of thousands of genes whose function is still unknown. These genes could potentially be involved in important biological cell functions and could represent important targets for diagnostic and pharmacogenomics studies and be of industrial and agronomical importance. A major undertaking for biology is therefore that of identifying the function of these uncharacterized genes on a genomic scale. The challenge for bioinformatics is then to devise algorithmic methods that, given a gene, can predict a hypothesis for its function that can then be validated by wet-lab assays. Luckily, new experimental techniques have become available, producing data which offer clues about protein function and can therefore be employed for function prediction, e.g. protein interaction data, gene expression data. Some experimental and computational data have a natural representation as networks (e.g. protein interaction data), others are inherently 'one-dimensional' (e.g. sequence patterns). Three facts have recently become clear: while each data type contains important information that can help in determining the function of a protein, no single data type by itself suffices; large-scale functional inference greatly improves by integrating evidence from different sources; for those data types which can be represented as networks, the best results are obtained by algorithms that take advantage of the networks' topologies. So far, methods that make functional inferences on networks are very limited in the type of data they can integrate, while methods that can integrate a greater variety of data do not take advantage of the networks' topologies. 
I intend to investigate a general method that can integrate essentially any data type currently available, taking into account its intrinsic structure: it takes advantage of the graph topology for network data, and it can integrate this evidence together with one-dimensional information. I shall develop graph-theoretical methods that use the diffusion of information over graphs to generate functional evidence from network data. This evidence is then combined with other one-dimensional information using machine learning techniques. The strength of the methodology lies in its ability to use diverse sets of noisy data, and to combine them to obtain sound statistical inferences; the weak signals contained in each dataset are enhanced by integrating the data. The methodology will be first developed on Yeast, and I shall then transfer this approach to higher organisms such as C. elegans, D. melanogaster, A. thaliana, and H. sapiens. For all these organisms the performance of the algorithms will then be evaluated 'in silico' by means of test sets; that is, I shall verify the accuracy of the methods at predicting the function of genes whose annotation is known. The approach will then be tested 'in vivo' on a sub-network of genes that form signalling pathways (MAPK signalling) and function to transmit information from receptors to gene expression. MAPK pathway components are highly diversified in the model plant, Arabidopsis thaliana, with 123 components. For many of these we do not know how they connect up and what their biological functions are. These will be predicted by the algorithms and then functionally tested by silencing their expression using RNA interference and in mutant lines. I shall also design and implement stand-alone and web-based software tools incorporating the algorithms developed. 
The applications will enable the biologist to apply the algorithms easily through a user-friendly interface and to visualize the relevant biological networks, thus making the inference process transparent and providing an explanation for the functional annotation predicted by the system. A web tool will also be created. All these tools will be made freely available to the scientific community.

Technical Summary

Statistically sound large-scale protein function prediction can be obtained only by integrating evidence from different sources. Functional inference methods that exploit the topologies of biological networks offer good performance. But so far such methods are limited in the type of data they can integrate, while methods that can integrate a greater variety of data do not take advantage of the networks' topologies. I propose a general method that can integrate essentially any data type available, taking into account the intrinsic structure of each data type: it uses graph-theoretic methods to produce functional evidence from network data, and it integrates it with evidence from one-dimensional information using machine learning techniques. Defining function in terms of the Gene Ontology, I shall collect datasets for S. cerevisiae, C. elegans, D. melanogaster, A. thaliana and H. sapiens. Algorithm development and testing will be done on S. cerevisiae. I shall then verify how these methods transfer to the other organisms. Performance on these organisms will be evaluated 'in silico', by means of test sets. The approach will also be tested 'in vivo' by predicting the Biological Process for a group of MAP kinases that belong to the signalling pathways of A. thaliana. These predictions will be tested through functional assays: (1) an RNAi screen and quantitative measurements of MAPK signalling outputs, MAPK activities and promoter activations in cultured Arabidopsis cells; (2) quantitative phenotypic tests for selected phenotypes in cell differentiation (e.g. stomata development) and stress responses. I shall design and implement stand-alone and web-based software tools incorporating the algorithms developed. These will enable the biologist to easily apply the algorithms through a user-friendly interface; visualization tools will make the functional inference process transparent to the user. All these tools will be made freely available to the scientific community.
 
Title Artist in residence Kerry Lemon 
Description Drawing of plants with increased understanding how development shapes growth 
Type Of Art Artwork 
Year Produced 2014 
Impact Stimulating discussions with students. Media release. Planned exhibition. 
URL http://www.kerrylemon.co.uk/
 
Description The list of organisms with completed genome sequence is continuously growing and this has led to the identification of thousands of genes whose function is still unknown. These genes could potentially be involved in important biological cell functions and could represent important targets for diagnostic and pharmacogenomics studies and be of industrial and agronomical importance. A major undertaking for biology is therefore that of identifying the function of these uncharacterized genes on a genomic scale. The challenge for bioinformatics is then to devise algorithmic methods that, given a gene, can predict a hypothesis for its function that can then be validated by wet-lab assays.

In this grant we focused our attention on the problem of predicting protein function for organisms for which little or no experimental data is available and the only available information is the set of protein sequences. This is a relevant problem with important implications for both industry and human health; it is the case, for example, for newly sequenced bacterial genomes. We successfully developed a new method for solving this problem based on a recent development in computer science: the diffusion of information over graphs. These methods emulate the way in which heat diffuses along a metal bar.
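
The heat analogy can be illustrated with a minimal sketch of label diffusion on a weighted graph. This is not the method developed in the grant; it is a generic propagation scheme, and the function name, the parameter alpha, the iteration count and the toy network are all assumptions made for illustration.

```python
def diffuse_labels(adjacency, seeds, alpha=0.8, iterations=50):
    """Spread seed scores over a weighted undirected graph.

    adjacency: dict mapping node -> dict of neighbour -> edge weight
    seeds: dict mapping node -> initial score (e.g. 1.0 for annotated proteins)
    alpha: share of a node's score taken from its neighbours at each step
           (assumed value, chosen only for this illustration)
    """
    nodes = list(adjacency)
    scores = {n: seeds.get(n, 0.0) for n in nodes}
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            total_weight = sum(adjacency[n].values())
            if total_weight > 0:
                # Weighted average of the neighbours' current scores.
                neighbour_avg = sum(
                    w * scores[m] for m, w in adjacency[n].items()
                ) / total_weight
            else:
                neighbour_avg = 0.0
            # Mix diffused evidence with the node's own seed label.
            updated[n] = alpha * neighbour_avg + (1 - alpha) * seeds.get(n, 0.0)
        scores = updated
    return scores


# Hypothetical toy network: a chain of four proteins, only "A" is annotated.
network = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"B": 1.0, "D": 1.0},
    "D": {"C": 1.0},
}
scores = diffuse_labels(network, seeds={"A": 1.0})
```

In this toy chain the score decreases with distance from the annotated protein, much as temperature falls off along a bar heated at one end.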

We also developed two further methods that predict protein function by grouping proteins into families. The first method, called GFam (Gene Family Annotation and Maintenance), groups proteins so that members of the same group share a common domain architecture, and hence function. The second, SCPS (Spectral Clustering of Protein Sequences), groups proteins according to their sequence similarity: similar proteins are likely to have evolved from a common ancestor and are therefore likely to share a similar function.
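
The idea of grouping proteins by shared domain architecture can be sketched as follows. This is only an illustration of the grouping step, not GFam's implementation; the protein identifiers and domain names are hypothetical, and an architecture is assumed here to be simply the ordered list of a protein's domains.

```python
from collections import defaultdict


def group_by_architecture(domain_assignments):
    """Group proteins whose ordered domain lists are identical.

    domain_assignments: dict mapping protein id -> list of domain names,
    ordered from N-terminus to C-terminus (assumed input format).
    """
    families = defaultdict(list)
    for protein, domains in domain_assignments.items():
        # The tuple of domains acts as the key that defines a family.
        families[tuple(domains)].append(protein)
    return dict(families)


# Hypothetical proteins and domain names, for illustration only.
assignments = {
    "protA": ["kinase", "SH2"],
    "protB": ["kinase"],
    "protC": ["kinase", "SH2"],
}
families = group_by_architecture(assignments)
```

Here protA and protC fall into the same family because they share the same architecture, while protB forms a family of its own.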

Our research in protein function prediction also led to the development of novel methods for inference and structure discovery in biological networks. This included ClusterONE, an algorithm for detecting protein complexes from experimental data, and GOSSTO, a method for quantifying the functional similarity between two genes.

Importantly, we applied these methods within a collaboration project with the labs of Andrew Emili (University of Toronto) and Edward Marcotte (University of Texas, Austin) which was aimed at detecting human protein complexes - the fundamental molecular machineries in the cell. We were able to obtain the largest catalogue to date of human protein complexes from cell culture. In total, we detected 622 complexes encompassing 2,634 distinct proteins. Notably, the majority (62%; 385/622) of the complexes were previously unknown (i.e., only 237 were already present in curated public databases). This catalogue constitutes a first draft of human protein complexes and therefore provides a glimpse into the global physical molecular organization of human cells.

An important output of this project is a set of user-friendly and reliable software packages implementing the algorithms that we developed. We have created a piece of software for every algorithm developed in this project, namely: S2F, GFam, SCPS, ClusterONE, GOSSTO. These tools allow biologists and bioinformaticians to deploy our methods easily, without needing to re-implement our algorithms. All our software packages are freely available to the scientific community as downloadable applications from the lab website. Some of our tools are also available as web applications hosted on our servers. The high number of downloads of our tools testifies to their importance for the scientific community; for example, ClusterONE has already been downloaded 4801 times.
Exploitation Route The problem of protein function prediction is central in today's biology. Possible beneficiaries include:

1. The biological community at large, interested in comprehensive annotation of genomes.

2. The medical community, since elucidating human gene function can help us associate genes with certain human diseases.

3. Agriculture: predicting function for plant genes should enable us to design genetic methods to improve plant performance. Particularly, the signaling pathways on which we worked in this grant are important for plant adaptation to environmental changes.

4. Pharmaceutical companies looking to attack specific pathways.

5. New sequencing efforts: our software enables scientists to rapidly assign putative function to new genes in freshly sequenced organisms without conducting expensive functional assays.
Sectors Agriculture, Food and Drink,Chemicals,Energy,Environment,Healthcare,Manufacturing, including Industrial Biotechnology,Pharmaceuticals and Medical Biotechnology

URL http://www.paccanarolab.org
 
Description We are in contact with two research groups who have been using the output of S2F for organisms of high practical interest for crop production and for biofuel production.

1. Pablo Sotelo, from the Universidad Nacional de Asuncion (Paraguay), is working with the fungus Macrophomina phaseolina, a plant pathogen affecting more than 500 plant species (many crops among them, including soya). S2F has been used to produce a functional annotation for the proteome of this fungus, which is the first step for its characterization. This work is aimed at finding better and more targeted mechanisms for pest control. Paraguay is one of the largest producers of soya in the world, and this work has important economic implications for the country.

2. Matteo Pellegrini (University of California, Los Angeles) leads a lab with a high interest in algal genomics. The lab is currently sequencing the genome of the unicellular alga Cyclotella cryptica, a model organism for lipid accumulation. This work has application in the biofuel production industry. The Pellegrini lab has been using the functional predictions provided by S2F to annotate this algal genome.

Importantly, the algorithms we developed for specific biological networks can be applied to other types of networks. Therefore, some of our algorithms have impact not only on those problems for which we originally developed them, but also on different problems in Systems Biology as well as in other disciplines such as Pharmacology, Medicine or even Social Networks. For example, we originally developed ClusterONE for detecting protein complexes from protein interaction networks. However, ClusterONE is a general algorithm for overlapping clustering on weighted large-scale networks. Other research groups have therefore successfully applied ClusterONE and proved its usefulness in several different domains. Some examples include:

1. Medicine: Clustering a genome-scale network obtained by integrating SNP array, gene expression microarray, array-CGH, CGH, GWAS and gene mutation data. This study was aimed at identifying key functional modules in lung adenocarcinoma.

2. Pharmacology: Associating drugs with protein domains in the context of myocardial infarction.

3. Pharmacology: Studying the mechanisms of adverse side effects of Torcetrapib, a drug being developed to treat hypercholesterolemia (elevated cholesterol levels) and prevent cardiovascular disease (its development was halted in 2006).

4. Social Networks: Detecting communities in Social Networks.

Our research on diffusion methods for protein function prediction led to the development of methods for inference and structure discovery in biological networks. We applied some of these methods within a collaboration project with the labs of Andrew Emili (University of Toronto) and Edward Marcotte (University of Texas, Austin) which was aimed at detecting human protein complexes. In particular, for this project we deployed: ClusterONE, our algorithm for detecting overlapping protein complexes from PPI networks; GOSSTO, our method for calculating semantic similarities on the Gene Ontology; an information diffusion method we developed for denoising protein interaction data. The protein interaction networks identified experimentally in Emili's lab were enriched with networks generated using comparative genomics approaches in Marcotte's lab. Then, in our lab, we integrated this network with a semantic similarity graph (obtained using GOSSTO), applied our denoising procedure, and finally clustered the resulting graph using ClusterONE. We thus obtained the largest catalogue to date of human protein complexes from cell culture. The human protein complexes repository contains all the data generated in this study in an easily navigable format. These include all the pairwise protein interactions obtained through integration of the experimental data with public genomic evidence and the subunit composition of the 622 putative protein complexes obtained by clustering using ClusterONE.

In our group, we have used S2F to participate in the second CAFA challenge, a protein function prediction competition. Although this activity is within the academic domain, we think it has been important for acquiring visibility and engaging further collaborations. Finally, GFam was successfully used on Arabidopsis and the family groupings it provided were included in the TAIR10 genome release.
First Year Of Impact 2012
Sector Agriculture, Food and Drink,Chemicals,Energy,Environment,Healthcare,Pharmaceuticals and Medical Biotechnology
Impact Types Economic

 
Description 11. Ara-MKK-D: A bioinformatics and systems biology approach for the functional analysis of a growth-regulating MAP kinase pathway in Arabidopsis.
Amount €189,670 (EUR)
Funding ID 41909 
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 10/2007 
End 10/2009
 
Description ABI innovation
Amount $1,203,514 (USD)
Organisation National Science Foundation (NSF) 
Sector Public
Country United States
Start 09/2017 
End 09/2020
 
Description BBSRC Tools and Resources Development Fund
Amount £114,257 (GBP)
Funding ID BB/K004131/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 07/2012 
End 12/2013
 
Description EU, Marie Curie Fellowship to Dr Beatrix Horvath
Amount €309,235 (EUR)
Organisation Marie Sklodowska-Curie Actions 
Sector Charity/Non Profit
Country Global
Start 03/2013 
End 05/2015
 
Description EU, Marie Curie Fellowship to Dr Fabio Manfredini (with Prof Mark Brown)
Amount €221,606 (EUR)
Organisation Marie Sklodowska-Curie Actions 
Sector Charity/Non Profit
Country Global
Start 04/2014 
End 04/2016
 
Description EU, Marie Curie Fellowship to Dr Papdi Csaba (with Prof L. Bogre)
Amount €221,606 (EUR)
Organisation Marie Sklodowska-Curie Actions 
Sector Charity/Non Profit
Country Global
Start 04/2013 
End 04/2015
 
Description Inference of RBR network and dynamic RBR complexes during leaf development.
Amount €319,888 (EUR)
Funding ID 330789 
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 03/2013 
End 03/2015
 
Description MAPK signalling network to adapt leaf growth to drought conditions.
Amount €221,765 (EUR)
Funding ID 330713 
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 05/2013 
End 05/2015
 
Description Molecular signatures: a systems biology tool to understand how leaf development is constrained by drought.
Amount €121,869 (EUR)
Funding ID 255035 
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 08/2010 
End 07/2011
 
Description Newton International Fellowship to Dr Tamas Nepusz
Amount £98,000 (GBP)
Organisation The Royal Society 
Sector Charity/Non Profit
Country United Kingdom
Start 02/2009 
End 02/2011
 
Title Purification of protein complexes 
Description Use genomic tagged GFP lines for rapid purification of protein complexes and identification of protein complex components 
Type Of Material Biological samples 
Year Produced 2016 
Provided To Others? Yes  
Impact Established collaborations and accepted manuscript in EMBO J in 2017 
 
Title mutant lines, antibodies, GFP-tagged lines 
Description Tools for lipid signalling kinases, MAPKs, E2F-RBR such as antibodies, mutant lines, GFP-tagged lines 
Type Of Material Cell line 
Provided To Others? Yes  
Impact Shared research materials facilitate research in other groups 
 
Title ClusterONE 
Description ClusterONE (Clustering with Overlapping Neighborhood Expansion) is a graph clustering algorithm that is able to handle weighted graphs and readily generates overlapping clusters. Owing to these properties, it is especially useful for detecting protein complexes in protein-protein interaction networks with associated confidence values. ClusterONE is available as a standalone command-line application, as a plugin to Cytoscape or ProCope and as a web application. 
Type Of Material Computer model/algorithm 
Year Produced 2012 
Provided To Others? Yes  
Impact ClusterONE was one of the key steps in our Soluble Human Protein Complexes project, which provided the largest catalogue to date of human protein complexes from cell culture. The original publication describing the ClusterONE algorithm has received in excess of 130 citations so far (Google Scholar). 
URL http://www.paccanarolab.org/clusterone
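
The neighbourhood-expansion idea behind this entry can be sketched in a few lines. The code below is a simplified illustration, not the ClusterONE implementation: it assumes a cohesiveness score of the form internal weight / (internal weight + boundary weight + penalty x group size), grows a single group greedily from one seed, and omits the merging of overlapping groups and any filtering. The function names, the penalty parameter and the toy network are assumptions.

```python
def cohesiveness(graph, group, penalty=0.0):
    """Internal edge weight relative to internal plus boundary weight."""
    w_in = 0.0
    w_bound = 0.0
    for u in group:
        for v, w in graph[u].items():
            if v in group:
                w_in += w / 2.0  # each internal edge is seen from both ends
            else:
                w_bound += w
    denom = w_in + w_bound + penalty * len(group)
    return w_in / denom if denom > 0 else 0.0


def grow_group(graph, seed, penalty=0.0):
    """Greedily add or remove single vertices while cohesiveness improves."""
    group = {seed}
    best = cohesiveness(graph, group, penalty)
    while True:
        boundary = {v for u in group for v in graph[u] if v not in group}
        candidates = [group | {v} for v in boundary]
        if len(group) > 1:
            candidates += [group - {u} for u in group]
        if not candidates:
            return group
        top = max(candidates, key=lambda g: cohesiveness(graph, g, penalty))
        score = cohesiveness(graph, top, penalty)
        if score <= best:
            return group  # no single-vertex change improves the group
        group, best = top, score


# Hypothetical network: two tight triangles joined by one weak interaction.
ppi = {
    "a": {"b": 1.0, "c": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"a": 1.0, "b": 1.0, "d": 0.1},
    "d": {"c": 0.1, "e": 1.0, "f": 1.0},
    "e": {"d": 1.0, "f": 1.0},
    "f": {"d": 1.0, "e": 1.0},
}
complex_a = grow_group(ppi, "a")
complex_d = grow_group(ppi, "d")
```

Growing from a seed in either triangle stops at that triangle, because adding a vertex across the weak bridge would lower the cohesiveness of the group.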
 
Title ConSAT 
Description ConSAT is a database of Consensus Signature Architectures. A consensus architecture is a set of non-overlapping domain assignments (considering insertions) which tries to define uniquely each protein. These architectures are used for prediction of GO categories, and to assign weighted words derived from mining PubMed abstracts. The database is available at http://paccanarolab.org/consat 
Type Of Material Database/Collection of data 
Year Produced 2014 
Provided To Others? Yes  
Impact The results contained in this database are currently being used by two research groups who are actively working with organisms of high practical interest for crop production and for biofuel production (Pablo Sotelo, Universidad Nacional de Asuncion (Paraguay); Matteo Pellegrini, University of California, Los Angeles (USA)). 
URL http://paccanarolab.org/consat
 
Title Disease Similarity 
Description We introduce a MeSH-based method that accurately quantifies similarity between heritable diseases at molecular level. This method effectively brings together the existing information about diseases that is scattered across the vast corpus of biomedical literature. We prove that sets of MeSH terms provide a highly descriptive representation of heritable disease and that the structure of MeSH provides a natural way of combining individual MeSH vocabularies. We show that our measure can be used effectively in the prediction of candidate disease genes. 
Type Of Material Computer model/algorithm 
Year Produced 2015 
Provided To Others? Yes  
Impact There are no impacts yet; this work appeared only about 3 months ago. 
 
Title GFAM 
Description GFam (Gene Family Annotation and Maintenance) is a command-line tool for automatic functional annotation of gene families. GFam offers a framework for complete genome initiatives and model organism resources to build domain-based gene families, derive meaningful functional labels and maintain family annotation across genome releases seamlessly. Our approach constitutes a unified system for grouping proteins based on evolutionary and functional relationships. 
Type Of Material Computer model/algorithm 
Year Produced 2012 
Provided To Others? Yes  
Impact The family groupings provided by GFam for Arabidopsis were included in the tenth (and last) release of TAIR (The Arabidopsis Information Resource). The dataset produced with our method can be found at ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR10_genome_release/TAIR10_domain_architectures.tab.t10 
URL http://www.paccanarolab.org/gfam
 
Title Landis 
Description Disease similarity measures quantify the distance between disease modules on the interactome. These measures can provide a starting point for in-depth exploration of the diseases at molecular level, and are of particular relevance for orphan diseases. LanDis is an explorable database, containing the disease similarities of 28.5 million pairs of heritable diseases. These are calculated by summarising the existing phenotype information about diseases through large scale analysis of hand curated data. 
Type Of Material Database/Collection of data 
Year Produced 2016 
Provided To Others? Yes  
Impact The paper presenting this database/model is still under review, so most scientists are not yet aware of its existence. However, I have already presented it at conferences and meetings, receiving extremely good feedback from everyone who tried it, especially clinician scientists. 
URL http://www.paccanarolab.org/landis/
 
Title MAPK 
Description This is a general repository of MAPK sequences and orthologues in the plant kingdom. Orthologues were inferred using the InParanoid and Plaza orthologue identifier programs. This site also contains pointers to published evidence for constructing MAPK networks in Arabidopsis, yeast and human, including high-throughput and targeted experiments. The base dataset included here appeared in the paper by Dóczi, Ökrész, Romero, Paccanaro and Bögre (see reference). 
Type Of Material Database/Collection of data 
Year Produced 2012 
Provided To Others? Yes  
Impact The original paper has been cited more than 20 times (Google Scholar). 
URL http://paccanarolab.org/static_content/MAPKevol/index.html
 
Title S2F 
Description S2F (Sequence to Function) is a software package implementing our diffusion-based method for predicting protein function in organisms for which little or no experimental data is available and the only available information is the set of protein sequences. Protein function is predicted with respect to terms in the Gene Ontology (GO). For a given protein the system provides a probability distribution over the GO terms, which is consistent with the ontology structure, i.e. the probability of a more general term is always higher than the probability of a more specific one. The stand-alone package is self-contained, including tools for generating a set of initial seed functional labels to diffuse as well as methods for inferring the biological networks onto which to diffuse the labels. 
Type Of Material Computer model/algorithm 
Year Produced 2012 
Provided To Others? Yes  
Impact The results obtained by this algorithm are currently being used by two research groups who are actively working with organisms of high practical interest for crop production and for biofuel production (Pablo Sotelo, Universidad Nacional de Asuncion (Paraguay); Matteo Pellegrini, University of California, Los Angeles (USA)). 
URL http://paccanarolab.org/s2f
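
The consistency property described in this entry (a more general term never scores below a more specific one) can be illustrated with a small sketch. This is not how S2F computes its probabilities; it is one simple way to post-process arbitrary scores so that they respect an ontology, assuming the ontology is acyclic. The term names and scores are placeholders, not real GO identifiers or predictions.

```python
def enforce_consistency(parents, scores):
    """Raise each ancestor's score to at least that of its descendants.

    parents: dict mapping term -> list of its direct parent terms
             (assumed to describe an acyclic ontology)
    scores:  dict mapping term -> predicted score in [0, 1]
    """
    consistent = dict(scores)
    changed = True
    while changed:
        changed = False
        for term, term_parents in parents.items():
            for parent in term_parents:
                if consistent.get(parent, 0.0) < consistent.get(term, 0.0):
                    # A general term must score at least as high as
                    # any of its more specific children.
                    consistent[parent] = consistent[term]
                    changed = True
    return consistent


# Placeholder ontology: root <- binding <- protein_binding.
ontology = {
    "protein_binding": ["binding"],
    "binding": ["root"],
    "root": [],
}
raw = {"root": 0.5, "binding": 0.3, "protein_binding": 0.9}
fixed = enforce_consistency(ontology, raw)
```

After the adjustment every ancestor of the specific term carries a score at least as high as the term itself, so the output can be read consistently along any path to the root.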
 
Title SemanticSimilarity 
Description The introduction of ontologies for gene functional annotation allows us to compare genes by quantifying the similarity of the terms with which they are annotated. These comparisons are important as they contribute to the inference of functional relationships between gene products by providing a perspective that complements both experimental information and sequence-based approaches. The proposed measure, which we call the random walk contribution (RWC), can be integrated with any standard semantic similarity measure, which we call the host similarity measure (HSM), to yield an integrated similarity measure (ISM) that takes into account the whole ontology structure. In other words, our random walk similarity measure is a kind of 'add-on' to one's favourite underlying similarity measure. 
Type Of Material Computer model/algorithm 
Year Produced 2012 
Provided To Others? Yes  
Impact One of the key steps in our Soluble Human Complexes project was the application of our Semantic Similarity method for calculating semantic similarities between human genes on the Gene Ontology. To date, the publication containing the method itself has been cited 22 times (Google Scholar). 
URL http://www.paccanarolab.org/static_content/gosim/
 
Title SolubleComplexes 
Description Our research on diffusion methods for protein function prediction led to the development of methods for inference and structure discovery in biological networks. We applied some of these methods within a collaboration project with the labs of Andrew Emili (University of Toronto) and Edward Marcotte (University of Texas, Austin) which was aimed at detecting human protein complexes. In particular, for this project we deployed: ClusterONE, our algorithm for detecting overlapping protein complexes from PPI networks; GOSSTO, our method for calculating semantic similarities on the Gene Ontology; an information diffusion method we developed for denoising protein interaction data. The protein interaction networks identified experimentally in Emili's lab were enriched with networks generated using comparative genomics approaches in Marcotte's lab. Then, in my lab, we integrated this network with a semantic similarity graph (obtained using GOSSTO), applied our denoising procedure, and finally clustered the resulting graph using ClusterONE. We thus obtained the largest catalogue to date of human protein complexes from cell culture. The human protein complexes repository contains all the data generated in this study in an easily navigable format. These include all the pairwise protein interactions obtained through integration of the experimental data with public genomic evidence and the subunit composition of the 622 putative protein complexes obtained by clustering using ClusterONE. 
Type Of Material Database/Collection of data 
Year Produced 2012 
Provided To Others? Yes  
Impact The original publication in which this dataset was first released has been cited, to date, more than 100 times (Google Scholar). 
URL http://human.med.utoronto.ca/php/data_download.php
 
Title mutation3d 
Description A new algorithm and Web server, mutation3D (http://mutation3d.org), proposes driver genes in cancer by identifying clusters of amino acid substitutions within tertiary protein structures. We demonstrated the feasibility of using a 3D clustering approach to implicate proteins in cancer based on explorations of single proteins using the mutation3D Web interface. 
Type Of Material Computer model/algorithm 
Year Produced 2016 
Provided To Others? Yes  
Impact No notable impacts yet; the paper only appeared about a month ago. 
URL http://mutation3d.org/
 
Description Albrecht Von Arnim 
Organisation University of Tennessee
Department Department of Geography
Country United States 
Sector Academic/University 
PI Contribution TOR and S6K signalling, EBP1
Collaborator Contribution regulation of translation, making constructs for root meristem specific analysis of translatome and translational regulation
Impact project partner, manuscripts in preparation
Start Year 2015
 
Description Cancer genomics -- Haiyuan Yu (Cornell University) 
Organisation Cornell University
Country United States 
Sector Academic/University 
PI Contribution We recently started a collaboration with the Yu lab in the field of cancer genomics, where we contributed to the development of a clustering method to predict cancer mutation hotspots in proteins. We used our expertise in clustering methods to provide an efficient solution and integrate it into a comprehensive analysis pipeline.
Collaborator Contribution Prof Yu and his lab have great expertise in the field of cancer genomics. They have contributed the biological question and the data.
Impact A journal paper describing the method is currently under review in BMC Biology. The collaboration is multi-disciplinary involving biologists and computational scientists.
Start Year 2013
 
Description Clustering of protein interaction networks -- Haiyuan Yu (Cornell University) 
Organisation Cornell University
Country United States 
Sector Academic/University 
PI Contribution We developed ClusterONE, a new method for protein complex detection using clustering on protein-protein interaction networks.
Collaborator Contribution Haiyuan Yu is an expert in protein-protein interaction screening and prediction, and he proposed different ways to evaluate the quality of the predictions. He also gave important feedback on the method. ClusterONE was published in 2012 in Nature Methods (see below).
Impact This is an interdisciplinary collaboration between molecular biologists (Yu lab) and computational scientists (our lab). The collaboration has produced one clustering algorithm for detecting protein complexes from protein-protein interaction networks, and its corresponding implementation (ClusterONE). The details of the publication are the following: T. Nepusz, H. Yu, and A. Paccanaro, Detecting overlapping protein complexes in protein-protein interaction networks, Nature Methods, vol. 9, pp. 471-472, 2012. The software (ClusterONE) is available on our website ( http://www.paccanarolab.org/clusterone/ ) and is released under a free software license (it can be freely downloaded, executed and, if desired, modified).
Start Year 2011
 
Description Development of a web resource for protein functional annotation -- Raj Sasidharan (BASF) 
Organisation BASF
Country Germany 
Sector Private 
PI Contribution We developed ConSAT, a tool for protein functional annotation using protein consensus domain architectures. In this project a new algorithm was developed and a web resource (ConSAT) with precomputed results was created (available at http://paccanarolab.org/consat ). The method includes three different types of functional prediction methods, two assigning Gene Ontology terms from the protein architecture, and one assigning weighted English words.
Collaborator Contribution Rajkumar Sasidharan's help was very important for the development of this project, mainly in two areas: first, he provided expert knowledge in structural biology; second, he gave feedback on the usability of the web server, leading to its improvement.
Impact The project's main output is the above-referenced website. Publications are currently being written. The collaboration is multi-disciplinary, involving biologists and computational scientists.
Start Year 2012
 
Description Disease gene prioritisation by the combination of gene networks -- Giorgio Valentini (Milan) 
Organisation University of Milan
Country Italy 
Sector Academic/University 
PI Contribution We preprocessed, cleaned and provided a set of biological datasets to Giorgio Valentini to assist in the development of several methods for combining gene networks for disease-gene prioritisation (that is, finding new causative genes for diseases). We provided, among others, several semantic similarity networks among sets of human genes. We also suggested new evaluation measures for this task.
Collaborator Contribution Giorgio Valentini developed a set of algorithms for finding new disease-gene associations. In that context he proposed many different ways in which gene networks (both weighted and unweighted) could be combined to produce a single network in which two linked genes are expected to share an underlying disease. The new predictions are given as an output of the paper (available at http://homes.di.unimi.it/re/suppmat/genesmeshnetwpred/supmatTBL1.html ).
Impact Apart from the URL mentioned above, the collaboration led to the following publication: G. Valentini, A. Paccanaro, H. Caniza, A. E. Romero, and M. Re, An extensive analysis of disease-gene associations using network integration and fast kernel-based gene prioritization methods, Artificial Intelligence in Medicine, 61 (2), 63-78.
Start Year 2013
 
Description Dr Tamas Meszaros 
Organisation Semmelweis University
Country Hungary 
Sector Academic/University 
PI Contribution In vitro translation of RBR and E2Fs and CDK kinases for protein-protein interaction and phosphorylation studies. In vitro translation of MAPKs and MKKs. Study protein-protein interaction and activation.
Collaborator Contribution In vitro protein interaction and phosphorylation screen
Impact Joint publications and projects
Start Year 2015
 
Description Dr Zoltan Magyar 
Organisation Hungarian Academy of Sciences (MTA)
Department Biological Research Centre (BRC)
Country Hungary 
Sector Academic/University 
PI Contribution Working on RBR-E2F, connecting translational regulation and cell cycle
Collaborator Contribution Providing antibodies and mutants in the RBR-E2F pathway
Impact research papers, collaboration with Bayern Crop Science
 
Description Drug side effect prediction (with Mark Gerstein and Shantao Li, Yale University) 
Organisation Yale University
Country United States 
Sector Academic/University 
PI Contribution We have developed a new method for predicting side effects of drugs. Our preliminary results show that our method represents a great improvement with respect to the existing state of the art in terms of side effect prediction. Moreover, it is the first method that can predict the expected frequency of side effects in the population.
Collaborator Contribution They are helping us to explain some aspects of our models in terms of the underlying biology, biochemistry and pharmacology.
Impact A journal article is in preparation. The collaboration is multi-disciplinary involving biologists and computer scientists.
Start Year 2017
 
Description Enhancer prediction using epigenetic signals in different mouse tissues (with Mark Gerstein and Mengting Gu, Yale University) 
Organisation Yale University
Department Department of Molecular Biophysics and Biochemistry
Country United States 
Sector Academic/University 
PI Contribution We apply machine learning, signal processing and pattern recognition methods to improve enhancer prediction for different tissues in the mouse genome. Preliminary results indicate that ensemble methods perform better than other classifiers. More advanced methods for feature extraction, such as deep learning, are going to be tested on the data.
Collaborator Contribution Members of the Gerstein Lab developed a pattern recognition method called matched filters for enhancer prediction. However, our preliminary results show that advanced machine learning may improve prediction accuracy. The Gerstein Lab supplied the data and will interpret the results in the context of enhancers and promoters in the genome.
Impact The collaboration is multi-disciplinary involving biologists and computer scientists.
Start Year 2017
 
Description Finding evolutionary relations between plant MAPKs -- Laszlo Bogre (Royal Holloway) 
Organisation Royal Holloway, University of London
Department School of Biological Sciences
Country United Kingdom 
Sector Academic/University 
PI Contribution We collaborated with the Bogre lab to elucidate the evolutionary relations between the Mitogen-activated protein kinases (MAPKs) of different model plants. Using computational techniques we were able to map some of these relations, ultimately leading to the construction of the 'Plant MAPK Network Resource', available at http://www.paccanarolab.org/static_content/MAPKevol/ .
Collaborator Contribution Prof Bogre and his team provided us with their MAPK dataset, their expert knowledge in the field and their biological questions. This led to the improvement of our methods for ortholog detection. The collaboration is still ongoing and we are currently developing new computational methods to detect relations between MAPKs and substrates.
Impact This project has two outputs: one web resource (the plant MAPK network resource, see above) and one joint publication: R. Dóczi, L. Ökrész, A. E. Romero, A. Paccanaro, and L. Bögre, Exploring the evolutionary path of plant MAPK networks, Trends in Plant Science, vol. 17, iss. 9, pp. 518-525, 2012. The collaboration is multi-disciplinary involving biologists and computational scientists.
Start Year 2011
 
Description Functional prediction for Cyclotella cryptica -- Matteo Pellegrini (UCLA) 
Organisation University of California, Los Angeles (UCLA)
Country United States 
Sector Academic/University 
PI Contribution The Pellegrini Lab is interested in better understanding certain metabolic pathways in the alga Cyclotella cryptica. This alga is economically important to the growing algal biofuels industry because of its high levels of lipid production. An important step towards understanding those pathways is to provide a functional annotation of the organism's genes. Our contribution to Prof Pellegrini's research has been to provide a functional annotation of this alga using ConSAT and S2F (the function annotation tools that we developed in the context of our grants).
Collaborator Contribution Though this collaboration is still ongoing, feedback from the Pellegrini lab has been incorporated into our tool to make it more usable.
Impact We expect journal publications to be written soon. The collaboration is multi-disciplinary involving biologists and computational scientists.
Start Year 2012
 
Description Functional prediction for Macrophomina phaseolina -- Pablo Sotelo (Universidad Nacional de Asuncion) 
Organisation National University of Asuncion
Country Paraguay 
Sector Academic/University 
PI Contribution We have provided the Sotelo lab with a complete functional annotation of the fungus Macrophomina phaseolina. This was done using both S2F and ConSAT, our systems for protein function prediction. Macrophomina phaseolina has recently been sequenced and is responsible for a disease affecting many crops, particularly soya, of which Paraguay is one of the largest producers in the world. Our contribution will ultimately help both the development of new pesticides to fight this fungus and research into genetically modified varieties of soya that are resistant to it.
Collaborator Contribution The Sotelo lab has been providing us with feedback on the accuracy of our predictions, which helps us to improve our system.
Impact This is a multidisciplinary collaboration between computational scientists (Paccanaro lab) and life scientists (Sotelo lab). We expect to produce a joint publication in the near future as an output of this collaboration.
Start Year 2014
 
Description GFam, a tool to predict protein architectures -- Raj Sasidharan (UCLA, TAIR) 
Organisation BASF
Country Germany 
Sector Private 
PI Contribution Our contribution (the GFam software) was motivated by the need of TAIR (The Arabidopsis Information Resource, a public hub for plant genome research) for an automatic tool to curate the functional categories assigned in the official release of the Arabidopsis thaliana genome. GFam was specifically created for this purpose, although it was published as a general tool for protein function annotation. GFam was used to produce the tenth official release of the functional annotation of Arabidopsis thaliana, the model organism for plants.
Collaborator Contribution Several of the ideas implemented in GFam came from the semi-manual procedures used in TAIR by Raj Sasidharan and others to perform functional annotation of protein sequences.
Impact The GFam families for Arabidopsis can be found in TAIR (ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR10_genome_release/TAIR10_domain_architectures.tab.t10). The collaboration also led to a publication: R. Sasidharan, T. Nepusz, D. Swarbreck, E. Huala, and A. Paccanaro, GFam: a platform for automatic annotation of gene families, Nucleic Acids Research, vol. 40, iss. 19, p. 152, 2012. The collaboration is multi-disciplinary involving biologists and computational scientists.
Start Year 2008
 
Description GFam, a tool to predict protein architectures -- Raj Sasidharan (UCLA, TAIR) 
Organisation University of California, Los Angeles (UCLA)
Country United States 
Sector Academic/University 
PI Contribution Our contribution (the GFam software) was motivated by the need of TAIR (The Arabidopsis Information Resource, a public hub for plant genome research) for an automatic tool to curate the functional categories assigned in the official release of the Arabidopsis thaliana genome. GFam was specifically created for this purpose, although it was published as a general tool for protein function annotation. GFam was used to produce the tenth official release of the functional annotation of Arabidopsis thaliana, the model organism for plants.
Collaborator Contribution Several of the ideas implemented in GFam came from the semi-manual procedures used in TAIR by Raj Sasidharan and others to perform functional annotation of protein sequences.
Impact The GFam families for Arabidopsis can be found in TAIR (ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR10_genome_release/TAIR10_domain_architectures.tab.t10). The collaboration also led to a publication: R. Sasidharan, T. Nepusz, D. Swarbreck, E. Huala, and A. Paccanaro, GFam: a platform for automatic annotation of gene families, Nucleic Acids Research, vol. 40, iss. 19, p. 152, 2012. The collaboration is multi-disciplinary involving biologists and computational scientists.
Start Year 2008
 
Description Gene prioritisation for lymphoma growth on mutagenesis study 
Organisation Medical Research Council (MRC)
Department MRC Clinical Sciences Centre (CSC)
Country United Kingdom 
Sector Public 
PI Contribution Prediction of lymphoma growth stage by analysis of gene clonality values from a sample. Prioritisation of genes selected from broad loci sources involved in lymphomagenesis. This process yielded a set of about 20 genes selected for further studies.
Collaborator Contribution Studies of mutagenesis-induced lymphoma in over 500 mice, with the corresponding sample clonality analysis. Ongoing gene relevance analysis.
Impact Studies are still ongoing on the relevance of the selected genes. We expect to obtain a publication about this work when the process finishes. The study is multi-disciplinary and it comprises the following disciplines: cancer genomics, molecular biotechnology, systems biology, computer science, big data analysis, bioinformatics.
Start Year 2015
 
Description GoSSTo, a Tool for computing Gene Ontology Semantic Similarities -- Giorgio Valentini (University of Milan) 
Organisation University of Milan
Country Italy 
Sector Academic/University 
PI Contribution We developed GoSSTo, a command-line tool to compute semantic similarities between gene products. The tool implements an algorithm previously published by our group, with the aim of making it accessible to any researcher. We also implemented GoSSToWeb, a web server providing easier access to this tool for biological researchers.
Collaborator Contribution Giorgio Valentini and his lab helped with the development of the web interface of our recently published tool for computing semantic similarities, and also provided user feedback on the command-line tool.
Impact The output consists of our software tools (GoSSTo and GoSSToWeb). Our web tool, available at www.paccanarolab.org/gosstoweb , has had over 50 registered users and 70 submitted jobs thus far. Moreover, the collaboration resulted in the following publication: H. Caniza, A. E. Romero, S. Heron, H. Yang, A. Devoto, M. Frasca, M. Mesiti, G. Valentini, and A. Paccanaro, GOssTo: a user-friendly stand-alone and web tool for calculating semantic similarities on the Gene Ontology, Bioinformatics, vol. 30, pp. 2235-2236, 2014. A preliminary version of this paper was submitted and accepted to the ISMB conference in 2013: H. Caniza, A. E. Romero, S. Heron, H. Yang, M. Frasca, M. Mesiti, G. Valentini, and A. Paccanaro. 'GOssTo and GOssToWeb: user-friendly tools for calculating semantic similarities on the Gene Ontology.' Bio-Ontologies SIG 2013-ISMB 2013 (2013).
Start Year 2012
 
Description Human Protein Complexes -- Emili (Un. Toronto), Marcotte (Un. Texas, Austin) 
Organisation University of Toronto
Country Canada 
Sector Academic/University 
PI Contribution Our research on diffusion methods for protein function prediction led to the development of methods for inference and structure discovery in biological networks. We applied some of these methods within a collaboration project with the labs of Andrew Emili (University of Toronto) and Edward Marcotte (University of Texas, Austin) which was aimed at detecting human protein complexes. In particular, for this project we deployed: ClusterONE, our algorithm for detecting overlapping protein complexes from PPI networks; GOSSTO, our method for calculating semantic similarities on the Gene Ontology; and an information diffusion method we developed for denoising protein interaction data. The protein interaction networks identified experimentally in Emili's lab were enriched with networks generated using comparative genomics approaches in Marcotte's lab. Then, in my lab, we integrated this network with a semantic similarity graph (obtained using GOSSTO), applied our denoising procedure, and finally clustered the resulting graph using ClusterONE. We thus obtained the largest catalogue to date of human protein complexes from cell culture.
Collaborator Contribution The protein interaction networks identified experimentally in Emili's lab were enriched with networks generated using comparative genomics approaches in Marcotte's lab. Then, in my lab, we integrated this network with a semantic similarity graph (obtained using GOSSTO), applied our denoising procedure, and finally clustered the resulting graph using ClusterONE.
Impact 1) The human protein complexes repository contains all the data generated in this study in an easily navigable format. These include all the pairwise protein interactions obtained through integration of the experimental data with public genomic evidence and the subunit composition of the 622 putative protein complexes obtained by clustering using ClusterONE. 2) P. C. Havugimana, T. G. Hart, T. Nepusz, H. Yang, A. L. Turinsky, Z. Li, P. I. Wang, D. R. Boutz, V. Fong, S. Phanse, M. Babu, S. A. Craig, P. Hu, C. Wan, J. Vlasblom, V. U. Dar, A. Bezginov, G. W. Clark, G. C. Wu, S. J. Wodak, E. R. Tillier, A. Paccanaro, E. M. Marcotte, and A. Emili A census of human soluble protein complexes Cell, vol. 150, iss. 5, pp. 1068-1081, 2012. The collaboration is multi-disciplinary involving biologists and computational scientists.
Start Year 2009
 
Description Learning disease-gene associations by exploiting disease similarities (with Mark Gerstein, Yale University) 
Organisation Yale University
Department Department of Molecular Biophysics and Biochemistry
Country United States 
Sector Academic/University 
PI Contribution We recently developed a disease similarity measure and calculated all the disease-disease similarities between OMIM diseases. We established a prior disease-gene association probability and provided training and testing datasets for the learning. We fitted the model.
Collaborator Contribution They developed a Lipschitz diffusion model, which we used to spread the disease-gene associations through the interactome, and a fully functional, fast implementation of the algorithm.
Impact The collaboration is multi-disciplinary involving biologists and computer scientists.
Start Year 2017
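
The idea of spreading disease-gene associations through the interactome can be illustrated with a generic random walk with restart. This is only a minimal sketch: it is not the Lipschitz diffusion model mentioned above, and the function name `diffuse`, the `restart` parameter and the toy four-gene network are assumptions made purely for illustration.

```python
def diffuse(adjacency, seeds, restart=0.5, iterations=50):
    """Generic random walk with restart on an undirected graph.

    adjacency: dict mapping each node to the set of its neighbours (symmetric).
    seeds: dict mapping known disease genes to a positive prior weight.
    Returns a dict of scores that sum to 1.
    """
    nodes = sorted(adjacency)
    total = sum(seeds.values())
    # Normalised prior: all initial evidence sits on the seed genes.
    prior = {n: seeds.get(n, 0.0) / total for n in nodes}
    scores = dict(prior)
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            # Each neighbour m shares its score equally among its own neighbours.
            incoming = sum(scores[m] / len(adjacency[m]) for m in adjacency[n])
            updated[n] = restart * prior[n] + (1.0 - restart) * incoming
        scores = updated
    return scores


# Hypothetical toy interactome: a chain g1 - g2 - g3 - g4 with one known disease gene.
toy_network = {
    "g1": {"g2"},
    "g2": {"g1", "g3"},
    "g3": {"g2", "g4"},
    "g4": {"g3"},
}
toy_scores = diffuse(toy_network, seeds={"g1": 1.0})
```

In this toy example the score decays with distance from the seed gene, so genes close to known disease genes in the network are ranked first as candidates.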
 
Description Network-based Genome Analysis Reveals Structural and Functional Properties of Genes (with Mark Gerstein and Koon-Kiu Yan, Yale University) 
Organisation Yale University
Country United States 
Sector Academic/University 
PI Contribution We have analysed the spatial proximity of all pathway genes (KEGG Database) across various cancer cell lines. Our preliminary results provide strong evidence for a relationship between disease pathways and cancer. The study also helps identify candidate genes for a number of diseases.
Collaborator Contribution They have successfully applied network community detection techniques to Hi-C data (three-dimensional architecture of genomes) in order to identify topologically associating domains (TADs) of genomic regions.
Impact The collaboration is multi-disciplinary involving biologists and computer scientists.
Start Year 2017
 
Description Objective of the project is to elucidate the mechanism of action of a drug for multiple sclerosis 
Organisation Imperial College London
Department Faculty of Medicine
Country United Kingdom 
Sector Academic/University 
PI Contribution To analyse transcriptomics data obtained from a trial on human patients using network medicine approaches.
Collaborator Contribution They hosted a trial with human patients and extracted transcriptomics data at different times.
Impact No outputs yet. This collaboration is multidisciplinary involving: computer science, network science, machine learning, medicine, biology and pharmacology.
Start Year 2015
 
Description Pavla Binarova 
Organisation Academy of Sciences of the Czech Republic
Country Czech Republic 
Sector Academic/University 
PI Contribution Analysing RBR phosphorylation and interaction with microtubules.
Collaborator Contribution Microtubules, cell biology
Impact research papers, joint projects
Start Year 2010
 
Description Robert Doczi. MAPK evolutionary network, MAPK substrate prediction. 
Organisation Hungarian Academy of Sciences (MTA)
Department Centre for Agricultural Research (ATK)
Country Hungary 
Sector Academic/University 
PI Contribution The Paccanaro group analysed MAPK docking sites, and MAPK-MKK interaction surfaces when there is no canonical docking site.
Collaborator Contribution Developed a high throughput in vivo MAPK activation screen
Impact publications. Multidisciplinary collaboration. Computer Science, Biology
Start Year 2010
 
Title CONSAT 
Description ConSAT is a terminal-based application which can be used to functionally annotate a set of proteins using their consensus domain architectures. Proteins are assigned Gene Ontology terms based on the domain composition of the architecture and on the experimentally determined terms already known for proteins with a given architecture. To help produce a description of a protein sequence, it also assigns weighted English words derived from mining PubMed articles. ConSAT is written in Python. 
Type Of Technology Software 
Year Produced 2014 
Open Source License? Yes  
Impact ConSAT has been used to produce the database of the same name (see 'databases'), which is being used in two external collaborations (with Pablo Sotelo and Matteo Pellegrini, see 'collaborations'). ConSAT has been used for our participation in the second CAFA challenge, organized by an international research community of more than 50 research groups devoted to the study of protein function prediction methods. 
URL http://paccanarolab.org/ConSAT
 
Title ClusterONE 
Description ClusterONE (Clustering with Overlapping Neighborhood Expansion) is a graph clustering algorithm that is able to handle weighted graphs and readily generates overlapping clusters. Owing to these properties, it is especially useful for detecting protein complexes in protein-protein interaction networks with associated confidence values. ClusterONE is available as a standalone command-line application, as a plugin to Cytoscape or ProCope. 
Type Of Technology Software 
Year Produced 2012 
Open Source License? Yes  
Impact For the creation of the Human protein complexes repository (http://human.med.utoronto.ca/) the standalone version of ClusterONE was used to produce the putative protein complexes. This project provided the largest catalogue to date of human protein complexes from cell culture. All versions of the ClusterONE Cytoscape plugin have been downloaded a total of 4801 times, with 5 releases produced so far. The ClusterONE publication has in excess of 130 citations. 
URL http://paccanarolab.org/clusterone
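
To make the notion of growing clusters on a weighted graph more concrete, the following is a simplified sketch of greedy neighbourhood expansion. It assumes a cohesiveness score of the form w_in / (w_in + w_bound + p|V|), as we understand it from the ClusterONE publication. The function names, the `penalty` parameter and the toy network are illustrative assumptions, and the sketch omits steps of the real tool such as node removal, merging of overlapping groups and filtering.

```python
from collections import defaultdict


def build_graph(edges):
    """Build a symmetric weighted adjacency map from (u, v, weight) triples."""
    graph = defaultdict(dict)
    for u, v, w in edges:
        graph[u][v] = w
        graph[v][u] = w
    return graph


def cohesiveness(graph, group, penalty=0.0):
    """Internal edge weight relative to internal plus boundary weight."""
    w_in = 0.0
    w_bound = 0.0
    for u in group:
        for v, w in graph[u].items():
            if v in group:
                w_in += w  # every internal edge is seen from both endpoints
            else:
                w_bound += w
    w_in /= 2.0
    denominator = w_in + w_bound + penalty * len(group)
    return w_in / denominator if denominator > 0 else 0.0


def grow_from_seed(graph, seed, penalty=0.0):
    """Greedily add the neighbouring node that most increases cohesiveness."""
    group = {seed}
    score = cohesiveness(graph, group, penalty)
    while True:
        frontier = {v for u in group for v in graph[u]} - group
        best, best_score = None, score
        for v in sorted(frontier):
            candidate = cohesiveness(graph, group | {v}, penalty)
            if candidate > best_score:
                best, best_score = v, candidate
        if best is None:
            return group
        group.add(best)
        score = best_score


# Hypothetical toy network: two tight triangles joined by one low-confidence edge.
toy_graph = build_graph([
    ("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
    ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0),
    ("c", "d", 0.1),
])
complex_from_a = grow_from_seed(toy_graph, "a")
complex_from_f = grow_from_seed(toy_graph, "f")
```

Because each group is grown independently from its own seed, groups grown from different seeds may share nodes, which is how overlapping clusters can arise in this kind of scheme.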
 
Title GFAM 
Description GFam (Gene Family Annotation and Maintenance) is a command-line tool for automatic functional annotation of gene families. GFam offers a framework for complete genome initiatives and model organism resources to build domain-based gene families, derive meaningful functional labels and maintain family annotation across genome releases seamlessly. Our approach constitutes a unified system for grouping proteins based on evolutionary and functional relationships. 
Type Of Technology Software 
Year Produced 2012 
Open Source License? Yes  
Impact The family groupings provided by GFam for Arabidopsis were included in TAIR10 genome release. The results are available from the official TAIR (The Arabidopsis Information Resource) website: ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR10_genome_release/TAIR10_domain_architectures.tab.t10 
URL http://paccanarolab.org/gfam
 
Title GOSSTO 
Description Semantic similarity calculations aim to provide a quantifiable measure of the functional relatedness of genes by assessing the similarity of the functional terms with which they are annotated. GOSSTO (Gene Ontology Semantic Similarity Tool) is a tool for calculating this measure with respect to Gene Ontology terms. It implements an improved diffusion-based measure developed in this project, as well as several well-established measures, such as Resnik, Lin, Jiang and simUI. GOSSTO also includes extension capabilities that enable the user to add new similarity measures. GOSSTO is available as a standalone command-line application running on Windows, GNU/Linux and MacOS, as well as a web tool. The web tool is available at www.paccanarolab.org/gosstoweb 
Type Of Technology Software 
Year Produced 2014 
Open Source License? Yes  
Impact For the creation of the Human protein complexes repository (http://human.med.utoronto.ca/) the standalone version of GOSSTO was used to compute semantic similarities between human genes in the Gene Ontology. This project provided the largest catalogue to date of human protein complexes from cell culture. Our web tool, available at www.paccanarolab.org/gosstoweb has had over 50 registered users and 70 submitted jobs thus far. 
URL http://paccanarolab.org/gossto
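
As an illustration of one of the well-established measures named above, the sketch below computes the Resnik similarity between two terms: the information content of their most informative common ancestor. The tiny ontology, the annotation counts and all function names are hypothetical and are not taken from GOSSTO or from the Gene Ontology.

```python
import math

# Hypothetical toy ontology: each term maps to its direct parents.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "dna_binding": ["binding"],
    "rna_binding": ["binding"],
}

# Hypothetical number of genes directly annotated with each term.
DIRECT_COUNTS = {"root": 0, "binding": 2, "catalysis": 8,
                 "dna_binding": 6, "rna_binding": 4}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    found = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def information_content():
    """IC(t) = -log p(t), where p(t) counts annotations to t or its descendants."""
    cumulative = {term: 0 for term in PARENTS}
    for term, count in DIRECT_COUNTS.items():
        for ancestor in ancestors(term):
            cumulative[ancestor] += count
    total = sum(DIRECT_COUNTS.values())
    return {t: -math.log(c / total) for t, c in cumulative.items() if c > 0}


def resnik(term_a, term_b):
    """Information content of the most informative common ancestor."""
    ic = information_content()
    common = ancestors(term_a) & ancestors(term_b)
    return max(ic[t] for t in common)


sim_siblings = resnik("dna_binding", "rna_binding")  # share the ancestor "binding"
sim_distant = resnik("dna_binding", "catalysis")     # share only "root"
```

Terms whose only common ancestor is the root receive a similarity of zero, because the root is implied by every annotation and therefore carries no information.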
 
Title JustClust 
Description JustClust is a tool for analysing biological data with cluster analysis. JustClust can handle many formats of data and cluster the data with many state-of-the-art techniques. The aim of JustClust is to provide an easy-to-use application which can perform any analysis on any data. 
Type Of Technology Software 
Year Produced 2014 
Open Source License? Yes  
Impact The manuscript is currently being finalised. 
URL http://paccanarolab.org/justclust
 
Title Landis 
Description Disease similarity measures quantify the distance between disease modules on the interactome. These measures can provide a starting point for in-depth exploration of the diseases at molecular level, and are of particular relevance for orphan diseases. LanDis is a freely available web-based interactive tool that allows domain experts, medical doctors and the larger community to graphically navigate the landscape of human disease similarities. LanDis is designed to explore the similarity landscape of over 28.5 million pairs of heritable diseases, introducing a fully interactive and navigable plot in which diseases are represented as nodes and their pairwise similarity as the links joining them. 
Type Of Technology Webtool/Application 
Year Produced 2016 
Impact The paper presenting this webtool is still under review, so most scientists are not yet aware of its existence. However, I have already presented it at conferences and meetings, receiving very positive feedback from everyone who tried it, especially clinician scientists. 
URL http://www.paccanarolab.org/landis
 
Title S2F 
Description S2F (Sequence-to-Function) is a software package implementing our diffusion-based method for predicting protein function in organisms for which little or no experimental data is available and the only available information is the set of protein sequences. Protein function is predicted with respect to terms in the Gene Ontology (GO). For a given protein the system provides a probability distribution over the GO terms, which is consistent with the ontology structure, i.e. the probability of a more general term is always higher than the probability of a more specific one. The stand-alone package is self-contained, including tools for generating a set of initial seed functional labels to diffuse as well as methods for inferring the biological networks onto which to diffuse the labels. 
Type Of Technology Software 
Year Produced 2014 
Open Source License? Yes  
Impact The results obtained using S2F are currently being used by two research groups who are actively working with organisms of high practical interest for crop production and for biofuel production (Pablo Sotelo, Universidad Nacional de Asuncion (Paraguay); Matteo Pellegrini, University of California, Los Angeles (USA)). S2F has been used for our participation in two CAFA challenges, organized by an international research community of more than 50 research groups devoted to the study of protein function prediction methods. 
URL http://paccanarolab.org/s2f
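
The consistency property described above (a more general term is never less probable than a more specific one) can be illustrated with a small post-processing step. This is one simple way to enforce such a constraint, shown under the assumption of a toy ontology, and it is not necessarily how S2F does it. The ontology, the scores and the function names are hypothetical.

```python
# Hypothetical toy ontology: each term maps to its direct parents.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "dna_binding": ["binding"],
    "rna_binding": ["binding"],
}


def strict_ancestors(term, parents):
    """Return all ancestors of a term, excluding the term itself."""
    found = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(parents[current])
    return found


def make_consistent(scores, parents):
    """Raise each ancestor's score to at least that of its descendants."""
    consistent = dict(scores)
    for term, value in scores.items():
        for ancestor in strict_ancestors(term, parents):
            consistent[ancestor] = max(consistent.get(ancestor, 0.0), value)
    return consistent


# Raw scores in which a specific term outscores its more general parent.
raw_scores = {"root": 0.9, "binding": 0.3, "dna_binding": 0.7, "rna_binding": 0.1}
consistent_scores = make_consistent(raw_scores, PARENTS)
```

After this step, the score for "binding" is raised to match that of "dna_binding", so no term scores higher than any of its ancestors.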
 
Title SCPS 
Description SCPS (Spectral Clustering of Protein Sequences) is an efficient, user-friendly, scalable and multi-platform implementation of a spectral clustering method for clustering homologous proteins. SCPS also implements connected component analysis and hierarchical clustering, integrates TribeMCL and interfaces with external tools such as Cytoscape and NCBI BLAST. 
Type Of Technology Software 
Year Produced 2010 
Open Source License? Yes  
Impact The paper is classified as 'highly accessed' on the journal website. The work has been cited 28 times already. Many of the papers citing SCPS make use of the software for large scale clustering of protein sequences in practical, real world applications. 
URL http://paccanarolab.org/scps
 
Title mutation3D 
Description mutation3D is a functional prediction and visualization tool for studying the spatial arrangement of amino acid substitutions on protein models and structures. It is intended to be used to identify clusters of amino acid substitutions arising from somatic cancer mutations across many patients in order to identify functional hotspots and fuel downstream hypotheses. It is also useful for clustering other kinds of mutational data, or simply as a tool to quickly assess relative locations of amino acids in proteins. 
Type Of Technology Webtool/Application 
Year Produced 2016 
Impact It is still too early to assess impact; the tool was released about a month ago. 
URL http://mutation3d.org/
 
Description Artist in Residence Kerry Lemon 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Public/other audiences
Results and Impact Stimulating discussions to bridge the gap between science and artistic thinking

Artistic drawing with understanding of plant development
Year(s) Of Engagement Activity 2014,2015,2016,2017
URL http://www.kerrylemon.co.uk/
 
Description Bristol2012 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other academic audiences (collaborators, peers etc.)
Results and Impact The talks led to interesting discussions and finding new contacts

Some plans were made for future collaboration
Year(s) Of Engagement Activity 2012
 
Description Cambridge2013 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other academic audiences (collaborators, peers etc.)
Results and Impact Talks about our research and methods with peers

Setting up collaboration activities with our peers
Year(s) Of Engagement Activity 2013
 
Description ClusterONE press release 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact We advertised on the Royal Holloway college website the publication of the ClusterONE algorithm and of its accompanying software in Nature Methods. The advertisements sparked a lot of interest in the algorithm in the college.

As a consequence of the advertisement, we were approached by biologists in the School of Biological Sciences at Royal Holloway with whom we started collaborating for clustering large scale experimental co-expression networks that they were producing.
Year(s) Of Engagement Activity 2012
URL https://www.royalholloway.ac.uk/computerscience/news/newsarticles/researchersalgorithmpublishedinsci...
 
Description Co-PI Talk 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact The Co-PI gave a talk at a conference organized for grammar school pupils at Aylesbury Grammar School. The talk discussed the complexities of signalling pathways and why they are important.

As the conference focused on plant biology, a number of pupils decided to find out more about options for studying Biological Sciences after high school.
Year(s) Of Engagement Activity 2010
 
Description Cornell2010 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other academic audiences (collaborators, peers etc.)
Results and Impact Meeting with researchers and interesting work discussions

Elaboration of plans for future collaboration
Year(s) Of Engagement Activity 2010
 
Description Cornell2013 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other academic audiences (collaborators, peers etc.)
Results and Impact The talk sparked discussions with other scientists. The feedback I obtained was useful for my current research. The talk was important to advertise my research and to make contacts for future collaborations.

A collaboration was initiated with the group of Prof. Haiyuan Yu on a new joint research project aimed at finding hotspot mutations in cancer proteins. The collaboration is ongoing and a paper is currently under review in BMC Biology.
Year(s) Of Engagement Activity 2013
 
Description GlaxoSmithKline2013 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other academic audiences (collaborators, peers etc.)
Results and Impact Engagement with contacts and discussions of mutual interests

Plans for collaboration with some contacts made
Year(s) Of Engagement Activity 2013
 
Description ISMB 2010 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other academic audiences (collaborators, peers etc.)
Results and Impact The poster presentation generated interest and positive feedback from the participants of the event

Other participants provided interesting ideas that helped us with our research
Year(s) Of Engagement Activity 2010
URL http://www.iscb.org/archive/conferences/iscb/ismb2010.html
 
Description ISMB BioOntologies SIG 2013 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other academic audiences (collaborators, peers etc.)
Results and Impact The poster presentation led to some interesting conversations, and new contacts were made

The feedback from the activity was useful for the further development of our research
Year(s) Of Engagement Activity 2013
URL http://www.iscb.org/ismbeccb2013-program/ismbeccb2013-satellite-meetings#bio
 
Description ISMB NetBIO SIG 2013 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other academic audiences (collaborators, peers etc.)
Results and Impact After the poster presentation, some contacts were made and we had interesting discussions of the presented work

We analysed our work with other researchers, who helped us improve it further
Year(s) Of Engagement Activity 2013
URL http://www.iscb.org/ismbeccb2013-program/ismbeccb2013-satellite-meetings#netbio
 
Description Invited participation in experts' roundtable at The Bioinformatics Strategy Meeting in London 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact I participated in an Experts' roundtable together with other academics and members of Industry
Year(s) Of Engagement Activity 2016
 
Description London Area Plant Molecular Sciences 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact Increased the cohesion of the plant science community in the London area

Yearly meetings were held for 10 years
Year(s) Of Engagement Activity Pre-2006,2006,2007,2008,2009,2010
 
Description MRC2012 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other academic audiences (collaborators, peers etc.)
Results and Impact Discussions about biological problems under study in their community that we could help with

Establishing links with biologists and creating collaboration networks
Year(s) Of Engagement Activity 2012
 
Description Milan2009-Biology 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other academic audiences (collaborators, peers etc.)
Results and Impact The talks continued with an analysis of some other problems related to the research we discussed

Some plans were made for future collaboration with our new contacts
Year(s) Of Engagement Activity 2009
 
Description Milan2009-CS 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other academic audiences (collaborators, peers etc.)
Results and Impact The talks generated interesting discussions and we met some contacts

Some plans for future collaboration were made with the University
Year(s) Of Engagement Activity 2009
 
Description NIPS 2008 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other academic audiences (collaborators, peers etc.)
Results and Impact After the poster presentation, we made interesting contacts and we had positive discussions about our work

We received feedback that allowed further development of our research
Year(s) Of Engagement Activity 2008
URL http://nips.cc/Conferences/2008/
 
Description Poster ClusterONE 2013 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other academic audiences (collaborators, peers etc.)
Results and Impact The poster presentation led to discussions on the work with fellow researchers

The feedback provided by our peers was useful for further development
Year(s) Of Engagement Activity 2013
URL http://www.iscb.org/ismbeccb2013
 
Description RHUL Open Days 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Public/other audiences
Results and Impact The University opens to the public and each department presents a showcase of its research, in a way which is accessible to a wider, non-specialist audience.
This generated interest in the research done by the CS Department.

Many students joined the Computer Science Department
Year(s) Of Engagement Activity 2009,2010,2011,2012,2013,2014
 
Description School visits 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Schools
Results and Impact Increased awareness of plant research

Increased interest and motivation among school pupils
Year(s) Of Engagement Activity Pre-2006,2006,2007,2008,2009,2010,2011,2012,2013,2014
 
Description Science Club at the Desborough School 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact Dr Safina Khan organized a Science Club at the Desborough School (Maidenhead, UK) for a period of one year. This consisted of weekly one-hour meetings during which pupils performed experiments designed by Dr Khan and discussed scientific ideas with her, including concepts from this project.
This generated interest and discussions among the students.

Dr Khan recently obtained a grant, funded by the Royal Society Partnership Grant Scheme, to continue this work together with the Desborough School.
Year(s) Of Engagement Activity 2009
 
Description Talks to the groups of Martin Wilkins and Paul Matthews -- summer 2015 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Other audiences
Results and Impact I presented our recent results in the area of Network Medicine to Prof Martin Wilkins and Prof Paul Matthews and their groups (I gave two separate talks) at the Department of Medicine, Imperial College, Hammersmith Hospital. The talks sparked interesting discussions and were the beginning of a very interesting collaboration with the lab of Prof Matthews in the area of Multiple Sclerosis.
Year(s) Of Engagement Activity 2015
 
Description Taster courses 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact One-day courses open to school pupils. They enquired about the courses that Computer Science departments offer and about possibilities for future study.

Some students chose to follow the lead we gave them and engaged in Computer Science studies in our department.
Year(s) Of Engagement Activity 2009,2010,2011,2012,2013,2014
 
Description UCA-Py2009 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other academic audiences (collaborators, peers etc.)
Results and Impact The talks generated interest and requests from some students for more information on our work

We recruited a full-time PhD student for the Computer Science department at Royal Holloway
Year(s) Of Engagement Activity 2009
 
Description UCAS open days 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Schools
Results and Impact During this talk I try to convey to school pupils what computer science is and why it is an exciting field of study.
Often the talk sparked questions and discussions.

A high percentage of school pupils who came to the talk decided to study Computer Science and many of these chose to study it in our department at Royal Holloway.
Year(s) Of Engagement Activity 2008,2009,2010,2011,2012,2013,2014
 
Description UCLondon2012 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other academic audiences (collaborators, peers etc.)
Results and Impact Using the talks as a medium we met with multiple peers and engaged in interesting conversations

We developed plans based on the conversations we had with some peers
Year(s) Of Engagement Activity 2012
 
Description UNA-Py2009 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other academic audiences (collaborators, peers etc.)
Results and Impact We presented our research and the departmental study programs, which led to requests for more information and to meeting research contacts

We extended our contact network for collaborations
Year(s) Of Engagement Activity 2009
 
Description Venice2009 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other academic audiences (collaborators, peers etc.)
Results and Impact Our talk sparked some interesting discussions with new research contacts

Some plans to collaborate with the researchers were made
Year(s) Of Engagement Activity 2009
 
Description Venice2012 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other academic audiences (collaborators, peers etc.)
Results and Impact Our presentation led to meetings with contacts

We developed some plans for collaborations
Year(s) Of Engagement Activity 2009,2012
 
Description Yale2010 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other academic audiences (collaborators, peers etc.)
Results and Impact Talks about our work and meeting with new contacts

Construction of plans for future collaboration
Year(s) Of Engagement Activity 2010
 
Description talk at Galway -- May 2015 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact I presented my work at the School of Mathematics, Statistics and Applied Mathematics at Galway University, Ireland. The talk sparked discussions with other scientists. The feedback I obtained was useful for my current research. The talk was important to advertise my research and to make contacts for future collaborations.
Year(s) Of Engagement Activity 2015