Development of a graph-theoretic approach to predict protein function by integrating large scale heterogeneous data
Lead Research Organisation:
Royal Holloway University of London
Department Name: Computer Science
Abstract
The list of organisms with completed genome sequence is continuously growing and this has led to the identification of thousands of genes whose function is still unknown. These genes could potentially be involved in important biological cell functions and could represent important targets for diagnostic and pharmacogenomics studies and be of industrial and agronomical importance. A major undertaking for biology is therefore that of identifying the function of these uncharacterized genes on a genomic scale. The challenge for bioinformatics is then to devise algorithmic methods that, given a gene, can predict a hypothesis for its function that can then be validated by wet-lab assays. Luckily, new experimental techniques have become available, producing data which offer clues about protein function and can therefore be employed for function prediction, e.g. protein interaction data, gene expression data. Some experimental and computational data have a natural representation as networks (e.g. protein interaction data), others are inherently 'one-dimensional' (e.g. sequence patterns). Three facts have recently become clear: while each data type contains important information that can help in determining the function of a protein, no single data type by itself suffices; large-scale functional inference greatly improves by integrating evidence from different sources; for those data types which can be represented as networks, the best results are obtained by algorithms that take advantage of the networks' topologies. So far, methods that make functional inferences on networks are very limited in the type of data they can integrate, while methods that can integrate a greater variety of data do not take advantage of the networks' topologies. 
I intend to investigate a general method that can integrate essentially any data type currently available, taking into account its intrinsic structure: it takes advantage of the graph topology for network data, and it can integrate this evidence together with one-dimensional information. I shall develop graph-theoretical methods that use the diffusion of information over graphs to generate functional evidence from network data. This evidence is then combined with other one-dimensional information using machine learning techniques. The strength of the methodology lies in its ability to use diverse sets of noisy data, and to combine them to obtain sound statistical inferences; the weak signals contained in each dataset are enhanced by integrating the data. The methodology will first be developed on yeast, and I shall then transfer this approach to higher organisms such as C. elegans, D. melanogaster, A. thaliana, and H. sapiens. For all these organisms the performance of the algorithms will then be evaluated 'in silico' by means of test sets; that is, I shall verify the accuracy of the methods at predicting the function of genes whose annotation is known. The approach will then be tested 'in vivo' on a sub-network of genes that form signalling pathways (MAPK signalling) and function to transmit information from receptors to gene expression. MAPK pathway components are highly diversified in the model plant, Arabidopsis thaliana, with 123 components. For many of these we do not know how they connect up or what their biological functions are. These will be predicted by the algorithms and then functionally tested by silencing their expression using RNA interference and in mutant lines. I shall also design and implement stand-alone and web-based software tools incorporating the algorithms developed. 
The applications will enable biologists to apply the algorithms easily through a user-friendly interface and to visualize the relevant biological networks, thus making the inference process transparent and providing an explanation for the functional annotation predicted by the system. A web tool will also be created. All these tools will be made freely available to the scientific community.
Technical Summary
Statistically sound large-scale protein function prediction can be obtained only by integrating evidence from different sources. Functional inference methods that exploit the topologies of biological networks offer good performance, but so far such methods are limited in the type of data they can integrate, while methods that can integrate a greater variety of data do not take advantage of the networks' topologies. I propose a general method that can integrate essentially any data type available, taking into account the intrinsic structure of each data type: it uses graph-theoretic methods to produce functional evidence from network data, and it integrates this with evidence from one-dimensional information using machine learning techniques. Defining function in terms of the Gene Ontology, I shall collect datasets for S. cerevisiae, C. elegans, D. melanogaster, A. thaliana and H. sapiens. Algorithm development and testing will be done on S. cerevisiae. I shall then verify how these methods transfer to the other organisms. Performance on these organisms will be evaluated 'in silico', by means of test sets. The approach will also be tested 'in vivo' by predicting the Biological Process for a group of MAP kinases that belong to the signalling pathways of A. thaliana. These predictions will be tested through functional assays: (1) an RNAi screen and quantitative measurements of MAPK signalling outputs, MAPK activities and promoter activations in cultured Arabidopsis cells; (2) quantitative phenotypic tests for selected phenotypes in cell differentiation (e.g. stomata development) and stress responses. I shall design and implement stand-alone and web-based software tools incorporating the algorithms developed. These will enable biologists to apply the algorithms easily through a user-friendly interface; visualization tools will make the functional inference process transparent to the user. All these tools will be made freely available to the scientific community.
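The idea of diffusing functional evidence over a network can be illustrated with a minimal sketch. It is not the method developed in this project: it assumes a standard label-propagation scheme on a symmetrically normalised adjacency matrix, and the names `diffuse_labels`, `alpha`, `adjacency` and `seed_labels` are hypothetical, chosen only for this example.

```python
import numpy as np


def diffuse_labels(adjacency, seed_labels, alpha=0.8):
    """Diffuse seed labels over a weighted, undirected graph.

    Returns the fixed point of F <- alpha * S @ F + (1 - alpha) * Y,
    where S is the symmetrically normalised adjacency matrix and Y holds
    the seed labels. alpha (assumed, 0 <= alpha < 1) controls how far
    evidence spreads from the seeds.
    """
    W = np.asarray(adjacency, dtype=float)
    Y = np.asarray(seed_labels, dtype=float)
    degree = W.sum(axis=1)
    # Isolated nodes have degree 0; give them a zero scaling factor
    # instead of dividing by zero.
    safe_degree = np.where(degree > 0, degree, 1.0)
    inv_sqrt = np.where(degree > 0, 1.0 / np.sqrt(safe_degree), 0.0)
    S = inv_sqrt[:, None] * W * inv_sqrt[None, :]
    n = W.shape[0]
    # Closed-form solution of the fixed-point equation.
    return (1.0 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, Y)


# Toy example: four proteins on a chain 0-1-2-3; only protein 0 is annotated.
chain = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])
scores = diffuse_labels(chain, [1.0, 0.0, 0.0, 0.0])
print(scores)  # evidence decreases with distance from the annotated protein
```

In this toy chain every protein receives some evidence for the function, and the amount falls off with network distance from the annotated protein. Such scores could then serve as one input feature alongside one-dimensional evidence.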
Organisations
- Royal Holloway University of London (Lead Research Organisation)
- Hungarian Academy of Sciences (MTA) (Collaboration)
- University of Milan (Collaboration)
- Royal Holloway, University of London (Collaboration)
- University of California, Los Angeles (UCLA) (Collaboration)
- University of Tennessee (Collaboration)
- Imperial College London (Collaboration)
- Academy of Sciences of the Czech Republic (Collaboration)
- Yale University (Collaboration)
- Cornell University (Collaboration)
- Universidad Nacional de Asunción (Collaboration)
- University of Toronto (Collaboration)
- Semmelweis University (Collaboration)
- Medical Research Council (MRC) (Collaboration)
- BASF (Collaboration)
Publications
Menges M
(2008)
Comprehensive gene expression atlas for the Arabidopsis MAP kinase signalling pathways.
in The New phytologist
Hu P
(2009)
Global functional atlas of Escherichia coli encompassing previously uncharacterized proteins.
in PLoS biology
Gianoulis TA
(2009)
Quantifying environmental adaptation of metabolic pathways in metagenomics.
in Proceedings of the National Academy of Sciences of the United States of America
Umbrasaite J
(2010)
MAPK phosphatase AP2C3 induces ectopic proliferation of epidermal cells leading to stomata development in Arabidopsis.
in PloS one
Nepusz T
(2010)
SCPS: a fast implementation of a spectral method for detecting protein families on a genome-wide scale.
in BMC bioinformatics
Dóczi R
(2011)
Mitogen-activated protein kinase activity and reporter gene assays in plants.
in Methods in molecular biology (Clifton, N.J.)
Havugimana PC
(2012)
A census of human soluble protein complexes.
in Cell
Sasidharan R
(2012)
GFam: a platform for automatic annotation of gene families.
in Nucleic acids research
Bhat P
(2012)
Computational selection of transcriptomics experiments improves Guilt-by-Association analyses.
in PloS one
Nepusz T
(2012)
Detecting overlapping protein complexes in protein-protein interaction networks.
in Nature methods
Dóczi R
(2012)
Exploring the evolutionary path of plant MAPK networks.
in Trends in plant science
Abbruscato P
(2012)
OsWRKY22, a monocot WRKY gene, plays a role in the resistance response to blast.
in Molecular plant pathology
Yang H
(2012)
Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty.
in Bioinformatics (Oxford, England)
Radivojac P
(2013)
A large-scale evaluation of computational protein function prediction.
in Nature methods
Caniza H
(2014)
GOssTo: a stand-alone application and a web tool for calculating semantic similarities on the Gene Ontology.
in Bioinformatics (Oxford, England)
Nepusz T
(2014)
Springer Handbook of Bio-/Neuroinformatics
Pérez-Salamó I
(2014)
The heat shock factor A4A confers salt tolerance and is regulated by oxidative stress and the mitogen-activated protein kinases MPK3 and MPK6.
in Plant physiology
Valentini G
(2014)
An extensive analysis of disease-gene associations using network integration and fast kernel-based gene prioritization methods.
in Artificial intelligence in medicine
Smieszek SP
(2014)
Progressive promoter element combinations classify conserved orthogonal plant circadian gene expression modules.
in Journal of the Royal Society, Interface
Caniza H
(2015)
A network medicine approach to quantify distance between hereditary disease modules on the interactome.
in Scientific reports
Nagy SK
(2015)
Activation of AtMPK9 through autophosphorylation that makes it independent of the canonical MAPK cascades.
in The Biochemical journal
Galeano D
(2016)
Drug targets prediction using chemical similarity
Jiang Y
(2016)
An expanded evaluation of protein function prediction methods shows an improvement in accuracy.
in Genome biology
Meyer MJ
(2016)
mutation3D: Cancer Gene Prediction Through Atomic Clustering of Coding Variants in the Structural Proteome.
in Human mutation
Manfredini F
(2017)
Neurogenomic Signatures of Successes and Failures in Life-History Transitions in a Key Insect Pollinator.
in Genome biology and evolution
Webster P
(2018)
Subclonal mutation selection in mouse lymphomagenesis identifies known cancer loci and suggests novel candidates.
in Nature communications
Cáceres JJ
(2019)
Disease gene prediction for molecularly uncharacterized diseases.
in PLoS computational biology
Galeano D
(2019)
Predicting the Frequency of Drug Side effects
Webster P
(2019)
Author Correction: Subclonal mutation selection in mouse lymphomagenesis identifies known cancer loci and suggests novel candidates.
in Nature communications
Gliozzo J
(2020)
Network modeling of patients' biomolecular profiles for clinical phenotype/outcome prediction.
in Scientific reports
Ye C
(2020)
The corrected gene proximity map for analyzing the 3D genome organization using Hi-C data.
in BMC bioinformatics
Galeano D
(2020)
Predicting the frequencies of drug side effects.
in Nature communications
Torres M
(2021)
Protein function prediction for newly sequenced organisms
in Nature Machine Intelligence
McDonald JT
(2021)
Role of miR-2392 in driving SARS-CoV-2 infection.
in Cell reports
Galeano D
(2022)
Machine learning prediction of side effects for drugs in clinical trials.
in Cell reports methods
Santos SS
(2022)
Machine learning and network medicine approaches for drug repositioning for COVID-19.
in Patterns (New York, N.Y.)
Gliozzo J
(2022)
Heterogeneous data integration methods for patient similarity networks.
in Briefings in bioinformatics
Title | Artist in residence Kerry Lemon |
Description | Drawings of plants informed by an increased understanding of how development shapes growth |
Type Of Art | Artwork |
Year Produced | 2014 |
Impact | Stimulating discussions with students. Media release. Planned exhibition. |
URL | http://www.kerrylemon.co.uk/ |
Description | The list of organisms with completed genome sequences is continuously growing, and this has led to the identification of thousands of genes whose function is still unknown. These genes could potentially be involved in important biological cell functions and could represent important targets for diagnostic and pharmacogenomics studies and be of industrial and agronomical importance. A major undertaking for biology is therefore that of identifying the function of these uncharacterized genes on a genomic scale. The challenge for bioinformatics is then to devise algorithmic methods that, given a gene, can predict a hypothesis for its function that can then be validated by wet-lab assays. In this grant we focused on the problem of predicting protein function for organisms for which little or no experimental data is available and the only available information is the set of protein sequences. This is a relevant problem with important implications for both industry and human health - it is the case, for example, of newly sequenced bacterial genomes. We successfully developed a new method for solving this problem based on a recent development in computer science: the diffusion of information over graphs. These methods emulate the way in which heat diffuses along a metal bar. We also developed two further methods that predict protein function by grouping proteins into families. The first method, called GFam (Gene Family Annotation and Maintenance), groups proteins in such a way that proteins in the same group share a common domain architecture, and hence function. The second, SCPS (Spectral Clustering of Protein Sequences), groups proteins according to their sequence similarity - similar proteins are likely to have evolved from a common ancestor and are therefore likely to share a similar function. Our research in protein function prediction also led to the development of novel methods for inference and structure discovery in biological networks. 
These included ClusterONE, an algorithm for detecting protein complexes from experimental data, and GOSSTO, a method for quantifying the functional similarity between two genes. Importantly, we applied these methods within a collaboration project with the labs of Andrew Emili (University of Toronto) and Edward Marcotte (University of Texas, Austin), which was aimed at detecting human protein complexes - the fundamental molecular machineries in the cell. We were able to obtain the largest catalogue to date of human protein complexes from cell culture. In total, we detected 622 complexes encompassing 2,634 distinct proteins. Notably, the majority (62%; 385/622) of the complexes were previously unknown (i.e., only 237 were already present in curated public databases). This catalogue constitutes a first draft of human protein complexes and therefore provides a glimpse into the global physical molecular organization of human cells. An important output of this project is a set of user-friendly and reliable software packages implementing the algorithms that we developed. We have created a piece of software for every algorithm developed in this project, namely: S2F, GFam, SCPS, ClusterONE, GOSSTO. These tools allow biologists and bioinformaticians to easily deploy our methods, without the need to re-implement our algorithms. All our software packages are freely available to the scientific community as downloadable applications from the lab website. Some of our tools are also available as web applications hosted on our servers. The high number of downloads of our tools testifies to their importance for the scientific community; for example, ClusterONE has already been downloaded 4801 times. |
Exploitation Route | The problem of protein function prediction is central in today's biology. Possible beneficiaries include: 1. The biological community at large, interested in comprehensive annotation of genomes. 2. The medical community, since elucidating human gene function can help us associate genes with certain human diseases. 3. Agriculture: predicting function for plant genes should enable us to design genetic methods to improve plant performance. Particularly, the signaling pathways on which we worked in this grant are important for plant adaptation to environmental changes. 4. Pharmaceutical companies looking to attack specific pathways. 5. New sequencing efforts: our software enables scientists to rapidly assign putative function to new genes in freshly sequenced organisms without conducting expensive functional assays. |
Sectors | Agriculture, Food and Drink; Chemicals; Energy; Environment; Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology |
URL | http://www.paccanarolab.org |
Description | We are in contact with two research groups who have been using the output of S2F for organisms of high practical interest for crop production and for biofuel production. 1) Pablo Sotelo, from the Universidad Nacional de Asuncion (Paraguay) is working with the fungus Macrophomina phaseolina, a plant pathogen affecting more than 500 plant species (many crops among them, including soya). S2F has been used to produce a functional annotation for the proteome of this fungus, which is the first step for its characterization. This work is aimed at finding better and more targeted mechanisms for pest-control. Paraguay is one of the largest producers of soya in the world, and this work has important economic implications for the country. 2) Matteo Pellegrini (University of California, Los Angeles) leads a lab with a high interest in algal genomics. The lab is currently sequencing the genome of the unicellular alga Cyclotella cryptica, a model organism for lipid accumulation. This work has application in the biofuel production industry. The Pellegrini lab has been using the functional predictions provided by S2F to annotate this algal genome. Importantly, the algorithms we developed for specific biological networks can be applied to other types of networks. Therefore, some of our algorithms have impact not only on those problems for which we originally developed them, but also on different problems in Systems Biology as well as in other disciplines such as Pharmacology, Medicine or even Social Networks. For example, we originally developed ClusterONE for detecting protein complexes from protein interaction networks. However, ClusterONE is a general algorithm for overlapping clustering on weighted large scale networks. Therefore other research groups have successfully applied ClusterONE and proved its usefulness in several different domains. Some examples include: 1. 
Medicine: Clustering a genome-scale network obtained by integrating SNP array, gene expression microarray, array-CGH, CGH, GWAS and gene mutation data. This study was aimed at identifying key functional modules in lung adenocarcinoma. 2. Pharmacology: Associating drugs with protein domains in the context of myocardial infarction. 3. Pharmacology: Studying the mechanisms of adverse side effects of Torcetrapib, a drug being developed to treat hypercholesterolemia (elevated cholesterol levels) and prevent cardiovascular disease (its development was halted in 2006). 4. Social Networks: Detecting communities in Social Networks. Our research on diffusion methods for protein function prediction led to the development of methods for inference and structure discovery in biological networks. We applied some of these methods within a collaboration project with the labs of Andrew Emili (University of Toronto) and Edward Marcotte (University of Texas, Austin), which was aimed at detecting human protein complexes. In particular, for this project we deployed: ClusterONE, our algorithm for detecting overlapping protein complexes from PPI networks; GOSSTO, our method for calculating semantic similarities on the Gene Ontology; and an information diffusion method we developed for denoising protein interaction data. The protein interaction networks identified experimentally in Emili's lab were enriched with networks generated using comparative genomics approaches in Marcotte's lab. Then, in our lab, we integrated this network with a semantic similarity graph (obtained using GOSSTO), applied our denoising procedure, and finally clustered the resulting graph using ClusterONE. We thus obtained the largest catalogue to date of human protein complexes from cell culture. The human protein complexes repository contains all the data generated in this study in an easily navigable format. 
These include all the pairwise protein interactions obtained through integration of the experimental data with public genomic evidence and the subunit composition of the 622 putative protein complexes obtained by clustering using ClusterONE. In our group, we have used S2F to participate in the second CAFA challenge, a protein function prediction competition. Although this activity is within the academic domain, we think it has been important for gaining visibility and establishing further collaborations. Finally, GFam was successfully used on Arabidopsis and the family groupings it provided were included in the TAIR10 genome release. |
First Year Of Impact | 2012 |
Sector | Agriculture, Food and Drink; Chemicals; Energy; Environment; Healthcare; Pharmaceuticals and Medical Biotechnology |
Impact Types | Economic |
Description | 11. Ara-MKK-D: A bioinformatics and systems biology approach for the functional analysis of a growth-regulating MAP kinase pathway in Arabidopsis. |
Amount | € 189,670 (EUR) |
Funding ID | 41909 |
Organisation | European Commission |
Sector | Public |
Country | European Union (EU) |
Start | 09/2007 |
End | 10/2009 |
Description | ABI innovation |
Amount | $1,203,514 (USD) |
Organisation | National Science Foundation (NSF) |
Sector | Public |
Country | United States |
Start | 08/2017 |
End | 09/2020 |
Description | BBSRC Tools and Resources Development Fund |
Amount | £114,257 (GBP) |
Funding ID | BB/K004131/1 |
Organisation | Biotechnology and Biological Sciences Research Council (BBSRC) |
Sector | Public |
Country | United Kingdom |
Start | 06/2012 |
End | 12/2013 |
Description | EU, Marie Curie Fellowship to Dr Beatrix Horvath |
Amount | € 309,235 (EUR) |
Organisation | Marie Sklodowska-Curie Actions |
Sector | Charity/Non Profit |
Country | Global |
Start | 03/2013 |
End | 05/2015 |
Description | EU, Marie Curie Fellowship to Dr Fabio Manfredini (with Prof Mark Brown) |
Amount | € 221,606 (EUR) |
Organisation | Marie Sklodowska-Curie Actions |
Sector | Charity/Non Profit |
Country | Global |
Start | 03/2014 |
End | 04/2016 |
Description | EU, Marie Curie Fellowship to Dr Papdi Csaba (with Prof L. Bogre) |
Amount | € 221,606 (EUR) |
Organisation | Marie Sklodowska-Curie Actions |
Sector | Charity/Non Profit |
Country | Global |
Start | 03/2013 |
End | 04/2015 |
Description | Inference of RBR network and dynamic RBR complexes during leaf development. |
Amount | € 319,888 (EUR) |
Funding ID | 330789 |
Organisation | European Commission |
Sector | Public |
Country | European Union (EU) |
Start | 03/2013 |
End | 03/2015 |
Description | MAPK signalling network to adapt leaf growth to drought conditions. |
Amount | € 221,765 (EUR) |
Funding ID | 330713 |
Organisation | European Commission |
Sector | Public |
Country | European Union (EU) |
Start | 04/2013 |
End | 05/2015 |
Description | Molecular signatures: a systems biology tool to understand how leaf development is constrained by drought. |
Amount | € 121,869 (EUR) |
Funding ID | 255035 |
Organisation | European Commission |
Sector | Public |
Country | European Union (EU) |
Start | 07/2010 |
End | 07/2011 |
Description | Newton International Fellowship to Dr Tamas Nepusz |
Amount | £98,000 (GBP) |
Organisation | The Royal Society |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 02/2009 |
End | 02/2011 |
Title | Purification of protein complexes |
Description | Use of genomically tagged GFP lines for rapid purification of protein complexes and identification of protein complex components |
Type Of Material | Biological samples |
Year Produced | 2016 |
Provided To Others? | Yes |
Impact | Established collaborations and accepted manuscript in EMBO J in 2017 |
Title | mutant lines, antibodies, GFP-tagged lines |
Description | Tools for lipid signalling kinases, MAPKs, E2F-RBR such as antibodies, mutant lines, GFP-tagged lines |
Type Of Material | Cell line |
Provided To Others? | Yes |
Impact | Shared research materials facilitate research in other groups |
Title | ClusterONE |
Description | ClusterONE (Clustering with Overlapping Neighborhood Expansion) is a graph clustering algorithm that is able to handle weighted graphs and readily generates overlapping clusters. Owing to these properties, it is especially useful for detecting protein complexes in protein-protein interaction networks with associated confidence values. ClusterONE is available as a standalone command-line application, as a plugin to Cytoscape or ProCope, and as a web application. |
Type Of Material | Computer model/algorithm |
Year Produced | 2012 |
Provided To Others? | Yes |
Impact | ClusterONE was one of the key steps in our Soluble Human Protein Complexes project, which provided the largest catalogue to date of human protein complexes from cell culture. The original publication describing the ClusterONE algorithm has received in excess of 130 citations so far (Google Scholar). |
URL | http://www.paccanarolab.org/clusterone |
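How greedy neighbourhood expansion on a weighted graph can yield overlapping clusters may be easier to see in a small sketch. This is a simplified illustration and not the ClusterONE implementation: the cohesiveness score used here (internal edge weight divided by internal plus boundary edge weight) is an assumption made for the example, only node addition is shown, and all function names are hypothetical.

```python
from itertools import combinations


def build_graph(edges):
    """Build a symmetric adjacency map {node: {neighbour: weight}}."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph


def cohesiveness(graph, group):
    """Internal edge weight divided by internal plus boundary edge weight."""
    internal = 0.0
    boundary = 0.0
    for u in group:
        for v, w in graph[u].items():
            if v in group:
                internal += w / 2.0  # each internal edge is seen from both ends
            else:
                boundary += w
    total = internal + boundary
    return internal / total if total > 0 else 0.0


def grow_cluster(graph, seed):
    """Greedily add the neighbour that most increases cohesiveness."""
    group = {seed}
    while True:
        frontier = {v for u in group for v in graph[u]} - group
        best_node, best_score = None, cohesiveness(graph, group)
        for candidate in sorted(frontier):  # sorted for deterministic ties
            score = cohesiveness(graph, group | {candidate})
            if score > best_score:
                best_node, best_score = candidate, score
        if best_node is None:
            return group  # no neighbour improves the score: stop growing
        group.add(best_node)


# Toy network: two fully connected groups of four proteins sharing protein X.
edges = [(u, v, 1.0) for u, v in combinations(["A", "B", "C", "X"], 2)]
edges += [(u, v, 1.0) for u, v in combinations(["D", "E", "F", "X"], 2)]
network = build_graph(edges)

print(sorted(grow_cluster(network, "A")))  # ['A', 'B', 'C', 'X']
print(sorted(grow_cluster(network, "D")))  # ['D', 'E', 'F', 'X']
```

Growing independently from different seeds lets protein X end up in both groups, which is the kind of overlap a partitioning method could not represent.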
Title | ConSAT |
Description | ConSAT is a database of Consensus Signature Architectures. A consensus architecture is a set of non-overlapping domain assignments (considering insertions) that aims to define each protein uniquely. These architectures are used for the prediction of GO categories, and to assign weighted words derived from mining PubMed abstracts. The database is available at http://paccanarolab.org/consat |
Type Of Material | Database/Collection of data |
Year Produced | 2014 |
Provided To Others? | Yes |
Impact | The results contained in this database are currently being used by two research groups who are actively working with organisms of high practical interest for crop production and for biofuel production (Pablo Sotelo, Universidad Nacional de Asuncion (Paraguay); Matteo Pellegrini, University of California, Los Angeles (USA)). |
URL | http://paccanarolab.org/consat |
Title | Disease Similarity |
Description | We introduce a MeSH-based method that accurately quantifies similarity between heritable diseases at the molecular level. This method effectively brings together the existing information about diseases that is scattered across the vast corpus of biomedical literature. We prove that sets of MeSH terms provide a highly descriptive representation of heritable disease and that the structure of MeSH provides a natural way of combining individual MeSH vocabularies. We show that our measure can be used effectively in the prediction of candidate disease genes. |
Type Of Material | Computer model/algorithm |
Year Produced | 2015 |
Provided To Others? | Yes |
Impact | There are no impacts yet; this work appeared only about 3 months ago. |
Title | GFam |
Description | GFam (Gene Family Annotation and Maintenance) is a command-line tool for automatic functional annotation of gene families. GFam offers a framework for complete genome initiatives and model organism resources to build domain-based gene families, derive meaningful functional labels and maintain family annotation across genome releases seamlessly. Our approach constitutes a unified system for grouping proteins based on evolutionary and functional relationships. |
Type Of Material | Computer model/algorithm |
Year Produced | 2012 |
Provided To Others? | Yes |
Impact | The family groupings provided by GFam for Arabidopsis were included in the tenth (and last) release of TAIR (The Arabidopsis Information Resource). The dataset produced with our method can be found at ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR10_genome_release/TAIR10_domain_architectures.tab.t10 |
URL | http://www.paccanarolab.org/gfam |
Title | LanDis |
Description | Disease similarity measures quantify the distance between disease modules on the interactome. These measures can provide a starting point for in-depth exploration of the diseases at the molecular level, and are of particular relevance for orphan diseases. LanDis is an explorable database containing the disease similarities of 28.5 million pairs of heritable diseases. These are calculated by summarising the existing phenotype information about diseases through large-scale analysis of hand-curated data. |
Type Of Material | Database/Collection of data |
Year Produced | 2016 |
Provided To Others? | Yes |
Impact | The paper presenting this database/model is still under review, so most scientists are not aware of its existence yet. However, I have already presented it at conferences and meetings, receiving extremely good feedback from everyone who tried it, especially clinician scientists. |
URL | http://www.paccanarolab.org/landis/ |
Title | MAPK |
Description | This is a general repository of MAPK sequences and orthologues in the plant kingdom. Orthologues were inferred using the InParanoid and Plaza orthologue identifier programs. This site also contains pointers to published evidence for constructing MAPK networks in Arabidopsis, yeast and human, including high-throughput and targeted experiments. The base dataset included here appeared in the paper by Dóczi, Ökrész, Romero, Paccanaro and Bögre (see reference). |
Type Of Material | Database/Collection of data |
Year Produced | 2012 |
Provided To Others? | Yes |
Impact | The original paper has been cited more than 20 times (Google Scholar). |
URL | http://paccanarolab.org/static_content/MAPKevol/index.html |
Title | S2F |
Description | S2F (Sequence to Function) is a software package implementing our diffusion-based method for predicting protein function in organisms for which little or no experimental data is available and the only available information is the set of protein sequences. Protein function is predicted with respect to terms in the Gene Ontology (GO). For a given protein the system provides a probability distribution over the GO terms, which is consistent with the ontology structure, i.e. the probability of a more general term is always higher than the probability of a more specific one. The stand-alone package is self-contained, including tools for generating a set of initial seed functional labels to diffuse as well as methods for inferring the biological networks onto which to diffuse the labels. |
Type Of Material | Computer model/algorithm |
Year Produced | 2012 |
Provided To Others? | Yes |
Impact | The results obtained by this algorithm are currently being used by two research groups who are actively working with organisms of high practical interest for crop production and for biofuel production (Pablo Sotelo, Universidad Nacional de Asuncion (Paraguay); Matteo Pellegrini, University of California, Los Angeles (USA)). |
URL | http://paccanarolab.org/s2f |
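The consistency property described for S2F (a more general GO term never scoring below a more specific one) can be illustrated with a small sketch. This is not the S2F procedure: it assumes one simple way of enforcing the property, namely raising each term's score to the maximum over its descendants, and the toy hierarchy, the term labels and the function name `make_consistent` are hypothetical.

```python
def make_consistent(scores, parents):
    """Raise every ancestor's score to at least that of each descendant.

    scores:  {term: predicted score in [0, 1]}
    parents: {term: list of direct parent terms} describing an acyclic hierarchy
    """
    consistent = dict(scores)
    for term, score in scores.items():
        # Walk up from this term and lift any ancestor that scores lower.
        stack = list(parents.get(term, ()))
        seen = set()
        while stack:
            ancestor = stack.pop()
            if ancestor in seen:
                continue
            seen.add(ancestor)
            if consistent.get(ancestor, 0.0) < score:
                consistent[ancestor] = score
            stack.extend(parents.get(ancestor, ()))
    return consistent


# Toy three-level hierarchy (labels are illustrative, not real GO identifiers).
parents = {
    "MAPK_cascade": ["signal_transduction"],
    "signal_transduction": ["biological_process"],
}
raw = {"biological_process": 0.6, "signal_transduction": 0.3, "MAPK_cascade": 0.7}

print(make_consistent(raw, parents))
# {'biological_process': 0.7, 'signal_transduction': 0.7, 'MAPK_cascade': 0.7}
```

Here the raw prediction for the specific term exceeds those of its ancestors, so both ancestors are lifted to match it; predictions that already respect the hierarchy are left unchanged.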
Title | SemanticSimilarity |
Description | The introduction of ontologies for gene functional annotation allows us to compare genes by quantifying the similarity of the terms with which they are annotated. These comparisons are important as they contribute to the inference of functional relationships between gene products by providing a perspective that complements both experimental information and sequence-based approaches. The proposed measure, which we call the random walk contribution (RWC) can be integrated with any standard semantic similarity measure, which we call host similarity measure (HSM), to yield an integrated similarity measure (ISM) that takes into account the whole ontology structure. In other words our random walk similarity measure is a kind of 'add on' to one's favourite underlying similarity measure. |
Type Of Material | Computer model/algorithm |
Year Produced | 2012 |
Provided To Others? | Yes |
Impact | One of the key steps in our Soluble Human Complexes project was the application of our Semantic Similarity method for calculating semantic similarities between human genes on the Gene Ontology. To date, the publication containing the method itself has been cited 22 times (Google Scholar). |
URL | http://www.paccanarolab.org/static_content/gosim/ |
Title | SolubleComplexes |
Description | Our research on diffusion methods for protein function prediction led to the development of methods for inference and structure discovery in biological networks. We applied some of these methods within a collaboration project with the labs of Andrew Emili (University of Toronto) and Edward Marcotte (University of Texas, Austin) which was aimed at detecting human protein complexes. In particular, for this project we deployed: ClusterONE, our algorithm for detecting overlapping protein complexes from PPI networks; GOSSTO, our method for calculating semantic similarities on the Gene Ontology; an information diffusion method we developed for denoising protein interaction data. The protein interaction networks identified experimentally in Emili's lab were enriched with networks generated using comparative genomics approaches in Marcotte's lab. Then, in my lab, we integrated this network with a semantic similarity graph (obtained using GOSSTO), applied our denoising procedure, and finally clustered the resulting graph using ClusterONE. We thus obtained the largest catalogue to date of human protein complexes from cell culture. The human protein complexes repository contains all the data generated in this study in an easily navigable format. These include all the pairwise protein interactions obtained through integration of the experimental data with public genomic evidence and the subunit composition of the 622 putative protein complexes obtained by clustering using ClusterONE. |
Type Of Material | Database/Collection of data |
Year Produced | 2012 |
Provided To Others? | Yes |
Impact | The original publication in which this dataset was first released has been cited, to date, more than 100 times (Google Scholar). |
URL | http://human.med.utoronto.ca/php/data_download.php |
Title | mutation3d |
Description | A new algorithm and Web server, mutation3D (http://mutation3d.org), proposes driver genes in cancer by identifying clusters of amino acid substitutions within tertiary protein structures. We demonstrated the feasibility of using a 3D clustering approach to implicate proteins in cancer based on explorations of single proteins using the mutation3D Web interface. |
Type Of Material | Computer model/algorithm |
Year Produced | 2016 |
Provided To Others? | Yes |
Impact | No notable impacts yet; the paper appeared only about a month ago. |
URL | http://mutation3d.org/ |
Description | Albrecht Von Arnim |
Organisation | University of Tennessee |
Department | Department of Geography |
Country | United States |
Sector | Academic/University |
PI Contribution | TOR and S6K signalling, EBP1 |
Collaborator Contribution | regulation of translation, making constructs for root meristem specific analysis of translatome and translational regulation |
Impact | project partner, manuscripts in preparation |
Start Year | 2015 |
Description | Cancer genomics -- Haiyuan Yu (Cornell University) |
Organisation | Cornell University |
Country | United States |
Sector | Academic/University |
PI Contribution | We recently started a collaboration with the Yu lab in the field of cancer genomics, where we contributed to the development of a clustering method to predict cancer mutation hotspots in proteins. We used our expertise in clustering methods to provide an efficient solution and integrate it into a comprehensive analysis pipeline. |
Collaborator Contribution | Prof Yu and his lab have great expertise in the field of cancer genomics. They have contributed the biological question and the data. |
Impact | A journal paper describing the method is currently under review in BMC Biology. The collaboration is multi-disciplinary involving biologists and computational scientists. |
Start Year | 2013 |
Description | Clustering of protein interaction networks -- Haiyuan Yu (Cornell University) |
Organisation | Cornell University |
Country | United States |
Sector | Academic/University |
PI Contribution | We developed ClusterONE, a new method for protein complex detection using clustering on protein-protein interaction networks. |
Collaborator Contribution | Haiyuan Yu is an expert in protein-protein interaction screening and protein-protein interaction prediction, and he proposed different ways to evaluate the quality of the predictions. He also gave important feedback on the method. ClusterONE was published in 2012 in Nature Methods (see below). |
Impact | This is an interdisciplinary collaboration between molecular biologists (Yu lab) and computational scientists (our lab). The collaboration has produced one clustering algorithm for detecting protein complexes from protein-protein interaction networks, and its corresponding implementation (ClusterONE). The details of the publication are the following: T. Nepusz, H. Yu, and A. Paccanaro Detecting overlapping protein complexes in protein-protein interaction networks Nature Methods, vol. 9, pp. 471-472, 2012. The software (ClusterONE) is available on our website ( http://www.paccanarolab.org/clusterone/ ) and is released under a free software license (it can be freely downloaded, executed and, if desired, modified). |
Start Year | 2011 |
Description | Development of a web resource for protein functional annotation -- Raj Sasidharan (BASF) |
Organisation | BASF |
Country | Germany |
Sector | Private |
PI Contribution | We developed ConSAT, a tool for protein functional annotation using protein consensus domain architectures. In this project a new algorithm was developed and a web resource (ConSAT) with precomputed results was created (available at http://paccanarolab.org/consat ). The method includes three different types of functional prediction methods, two assigning Gene Ontology terms from the protein architecture, and one assigning English weighted words. |
Collaborator Contribution | Rajkumar Sasidharan's help was very important for the development of this project in two main respects: first, he provided expert knowledge in structural biology; second, he gave feedback on the usability of the web server, leading to its improvement. |
Impact | The project's main output is the website referenced above. Publications are currently being written. The collaboration is multi-disciplinary involving biologists and computational scientists. |
Start Year | 2012 |
Description | Disease gene prioritisation by the combination of gene networks -- Giorgio Valentini (Milan) |
Organisation | University of Milan |
Country | Italy |
Sector | Academic/University |
PI Contribution | We preprocessed, cleaned and provided a set of biological datasets to Giorgio Valentini to assist in the development of several methods of gene networks combination for disease-gene prioritisation (that is, finding new causative genes for diseases). We provided, among others, several semantic similarity networks among sets of human genes. We also suggested new evaluation measures for this task. |
Collaborator Contribution | Giorgio Valentini developed a set of algorithms for finding new disease-gene associations. In that context he proposed many different ways in which different gene networks (both weighted and unweighted) could be combined into a single network in which a link between two genes indicates that they are expected to share an underlying disease. The new predictions are provided as an output of the paper (available at http://homes.di.unimi.it/re/suppmat/genesmeshnetwpred/supmatTBL1.html ). |
Impact | Apart from the above mentioned URL, the collaboration led to the following publication: G Valentini, A Paccanaro, H Caniza, AE Romero, M Re An extensive analysis of disease-gene associations using network integration and fast kernel-based gene prioritization methods Artificial Intelligence in Medicine 61 (2), 63-78 |
Start Year | 2013 |
Description | Dr Tamas Meszaros |
Organisation | Semmelweis University |
Country | Hungary |
Sector | Academic/University |
PI Contribution | In vitro translation of RBR and E2Fs and CDK kinases for protein-protein interaction and phosphorylation studies. In vitro translation of MAPKs and MKKs. Study protein-protein interaction and activation. |
Collaborator Contribution | In vitro protein interaction and phosphorylation screen |
Impact | Joint publications, projects |
Start Year | 2015 |
Description | Dr Zoltan Magyar |
Organisation | Hungarian Academy of Sciences (MTA) |
Department | Biological Research Centre (BRC) |
Country | Hungary |
Sector | Academic/University |
PI Contribution | Working on RBR-E2F, connecting translational regulation and cell cycle |
Collaborator Contribution | Providing antibodies and mutants in the RBR-E2F pathway |
Impact | research papers, collaboration with Bayern Crop Science |
Description | Drug side effect prediction (with Mark Gerstein and Shantao Li, Yale University) |
Organisation | Yale University |
Country | United States |
Sector | Academic/University |
PI Contribution | We have developed a new method for predicting side effects of drugs. Our preliminary results show that our method represents a great improvement with respect to the existing state of the art in terms of side effect prediction. Moreover, it is the first method that can predict the expected frequency of side effects in the population. |
Collaborator Contribution | They are helping us to provide an explanation of some aspects of our models in terms of the biology/biochemistry/pharmacology. |
Impact | A journal article is in preparation. The collaboration is multi-disciplinary involving biologists and computer scientists. |
Start Year | 2017 |
Description | Enhancer prediction using epigenetic signals in different mouse tissues (with Mark Gerstein and Mengting Gu, Yale University) |
Organisation | Yale University |
Department | Department of Molecular Biophysics and Biochemistry |
Country | United States |
Sector | Academic/University |
PI Contribution | Apply machine learning, signal processing and pattern recognition methods for improving the performance of the enhancer prediction for different tissues in the mouse genome. Preliminary results indicate that ensemble methods perform better than other classifiers. More advanced methods for feature extraction such as deep learning are going to be tested on the data. |
Collaborator Contribution | Members of the Gerstein Lab developed a pattern recognition method called matched filters for enhancer prediction. However, our preliminary results show that advanced machine learning may improve prediction accuracy. The Gerstein Lab supplied the data and will interpret the results in the context of enhancers and promoters in the genome. |
Impact | The collaboration is multi-disciplinary involving biologists and computer scientists. |
Start Year | 2017 |
Description | Finding evolutionary relations between plant MAPKs -- Laszlo Bogre (Royal Holloway) |
Organisation | Royal Holloway, University of London |
Department | School of Biological Sciences |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | We collaborated with the Bogre lab in the elucidation of the evolutionary relations between the different Mitogen-activated protein kinases (MAPKs) in different model plants. Using computational techniques we were able to depict some of these relations, ultimately leading to the construction of the 'Plant MAPK Network Resource', available at http://www.paccanarolab.org/static_content/MAPKevol/ . |
Collaborator Contribution | Prof Bogre and his team provided us with their MAPK dataset, their expert knowledge in the field and their biological questions. This led to the improvement of our methods for ortholog detection. The collaboration is still ongoing and we are currently developing new computational methods to detect relations between MAPKs and substrates. |
Impact | The outputs of this project are two: one web resource (the plant MAPK network resource, see above) and one joint publication: R. Dóczi, L. Ökrész, A. E. Romero, A. Paccanaro, and L. Bögre Exploring the evolutionary path of plant MAPK networks Trends in Plant Science, vol. 17, iss. 9, pp. 518-525, 2012. The collaboration is multi-disciplinary involving biologists and computational scientists. |
Start Year | 2011 |
Description | Functional prediction for Cyclotella cryptica -- Matteo Pellegrini (UCLA) |
Organisation | University of California, Los Angeles (UCLA) |
Country | United States |
Sector | Academic/University |
PI Contribution | The Pellegrini Lab is interested in better understanding certain metabolic pathways of the alga Cyclotella cryptica. This alga is economically important to the growing algal biofuels industry because of its high levels of lipid production. An important step towards understanding those pathways is to provide a functional annotation of the genes of the organism. Our contribution to Prof Pellegrini's research has been to provide a functional annotation of this alga using ConSAT and S2F (the function annotation tools that we developed in the context of our grants). |
Collaborator Contribution | Though this collaboration is still ongoing, feedback from the Pellegrini lab has been incorporated into our tool to make it more usable. |
Impact | We expect journal publications to be written soon. The collaboration is multi-disciplinary involving biologists and computational scientists. |
Start Year | 2012 |
Description | Functional prediction for Macrophomina phaseolina -- Pablo Sotelo (Universidad Nacional de Asuncion) |
Organisation | National University of Asuncion |
Country | Paraguay |
Sector | Academic/University |
PI Contribution | We have provided the Sotelo lab with a complete functional annotation of the fungus Macrophomina phaseolina. This was done using both S2F and ConSAT, our systems for protein function prediction. Macrophomina phaseolina has recently been sequenced and is responsible for a plague affecting many crops, particularly soya, of which Paraguay is one of the largest producers in the world. Our contribution will ultimately help both the development of new pesticides to fight this fungus and research into genetically modified varieties of soya that are resistant to it. |
Collaborator Contribution | The Sotelo lab has been providing us with feedback on the accuracy of our predictions, which is very helpful for improving our system. |
Impact | This is a multidisciplinary collaboration between computational scientists (Paccanaro lab) and life scientists (Sotelo lab). We expect to produce a joint publication in the near future as an output of this collaboration. |
Start Year | 2014 |
Description | GFam, a tool to predict protein architectures -- Raj Sasidharan (UCLA, TAIR) |
Organisation | BASF |
Country | Germany |
Sector | Private |
PI Contribution | Our contribution (the GFam software) was motivated by the need of TAIR (The Arabidopsis Information Resource, a public hub for Arabidopsis genome data) for an automatic tool to curate the functional categories assigned in the official release of the Arabidopsis thaliana genome. GFam was specifically created for this purpose, although it was published as a general tool for protein function annotation. GFam was used to produce the tenth official release of the functional annotation of Arabidopsis thaliana, the model organism for plants. |
Collaborator Contribution | Several of the ideas implemented in GFam came from the semi-manual procedures used in TAIR by Raj Sasidharan and others to perform functional annotation of protein sequences. |
Impact | The GFam families for Arabidopsis can be found in TAIR (ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR10_genome_release/TAIR10_domain_architectures.tab.t10). The collaboration also led to a publication: R. Sasidharan, T. Nepusz, D. Swarbreck, E. Huala, and A. Paccanaro GFam: a platform for automatic annotation of gene families Nucleic Acids Research, vol. 40, iss. 19, p. e152, 2012. The collaboration is multi-disciplinary involving biologists and computational scientists. |
Start Year | 2008 |
Description | GFam, a tool to predict protein architectures -- Raj Sasidharan (UCLA, TAIR) |
Organisation | University of California, Los Angeles (UCLA) |
Country | United States |
Sector | Academic/University |
PI Contribution | Our contribution (the GFam software) was motivated by the need of TAIR (The Arabidopsis Information Resource, a public hub for Arabidopsis genome data) for an automatic tool to curate the functional categories assigned in the official release of the Arabidopsis thaliana genome. GFam was specifically created for this purpose, although it was published as a general tool for protein function annotation. GFam was used to produce the tenth official release of the functional annotation of Arabidopsis thaliana, the model organism for plants. |
Collaborator Contribution | Several of the ideas implemented in GFam came from the semi-manual procedures used in TAIR by Raj Sasidharan and others to perform functional annotation of protein sequences. |
Impact | The GFam families for Arabidopsis can be found in TAIR (ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR10_genome_release/TAIR10_domain_architectures.tab.t10). The collaboration also led to a publication: R. Sasidharan, T. Nepusz, D. Swarbreck, E. Huala, and A. Paccanaro GFam: a platform for automatic annotation of gene families Nucleic Acids Research, vol. 40, iss. 19, p. e152, 2012. The collaboration is multi-disciplinary involving biologists and computational scientists. |
Start Year | 2008 |
Description | Gene prioritisation for lymphoma growth on mutagenesis study |
Organisation | Medical Research Council (MRC) |
Department | MRC Clinical Sciences Centre (CSC) |
Country | United Kingdom |
Sector | Public |
PI Contribution | Prediction of lymphoma growth stage by analysis of gene clonality values from a sample. Prioritisation of genes selected from broad loci sources involved in lymphomagenesis. This process yielded a set of about 20 genes selected for further studies. |
Collaborator Contribution | Mutagenesis developed lymphoma studies on over 500 mice, with the corresponding sample clonality analysis. Ongoing gene relevance analysis. |
Impact | Studies are still ongoing on the relevance of the selected genes. We expect to obtain a publication about this work when the process finishes. The study is multi-disciplinary and it comprises the following disciplines: cancer genomics, molecular biotechnology, systems biology, computer science, big data analysis, bioinformatics. |
Start Year | 2015 |
Description | GoSSTo, a Tool for computing Gene Ontology Semantic Similarities -- Giorgio Valentini (University of Milan) |
Organisation | University of Milan |
Country | Italy |
Sector | Academic/University |
PI Contribution | We developed GoSSTo, a command-line tool to compute semantic similarities between gene products. The tool implements an algorithm previously published by our group, with the aim of making it accessible to any researcher. We also implemented GoSSToWeb, a web server providing easier access to this tool for biological researchers. |
Collaborator Contribution | Giorgio Valentini and his lab provided help for the development of the web interface of our tool for computing semantic similarities which was recently published, and also provided user feedback on the command line tool. |
Impact | The output consists of our software tools (GoSSTo and GoSSToWeb). Our web tool, available at www.paccanarolab.org/gosstoweb, has had over 50 registered users and 70 submitted jobs thus far. Moreover, the collaboration resulted in the following publication: H. Caniza, A. E. Romero, S. Heron, H. Yang, A. Devoto, M. Frasca, M. Mesiti, G. Valentini, and A. Paccanaro, GOssTo: a user-friendly stand-alone and web tool for calculating semantic similarities on the Gene Ontology Bioinformatics, vol. 30, pp. 2235-2236, 2014. A preliminary version of this paper was submitted and accepted to the ISMB conference in 2013: H. Caniza, A. E. Romero, S. Heron, H. Yang, M. Frasca, M. Mesiti, G. Valentini, and A. Paccanaro. 'GOssTo and GOssToWeb: user-friendly tools for calculating semantic similarities on the Gene Ontology.' Bio-Ontologies SIG 2013-ISMB 2013 (2013). |
Start Year | 2012 |
Description | Human Protein Complexes -- Emili (Un. Toronto), Marcotte (Un. Texas, Austin) |
Organisation | University of Toronto |
Country | Canada |
Sector | Academic/University |
PI Contribution | Our research on diffusion methods for protein function prediction led to the development of methods for inference and structure discovery in biological networks. We applied some of these methods within a collaboration project with the labs of Andrew Emili (University of Toronto) and Edward Marcotte (University of Texas, Austin) which was aimed at detecting human protein complexes. In particular, for this project we deployed: ClusterONE, our algorithm for detecting overlapping protein complexes from PPI networks; GOSSTO, our method for calculating semantic similarities on the Gene Ontology; an information diffusion method we developed for denoising protein interaction data. The protein interaction networks identified experimentally in Emili's lab were enriched with networks generated using comparative genomics approaches in Marcotte's lab. Then, in my lab, we integrated this network with a semantic similarity graph (obtained using GOSSTO), applied our denoising procedure, and finally clustered the resulting graph using ClusterONE. We thus obtained the largest catalogue to date of human protein complexes from cell culture. |
Collaborator Contribution | The protein interaction networks identified experimentally in Emili's lab were enriched with networks generated using comparative genomics approaches in Marcotte's lab. Then, in my lab, we integrated this network with a semantic similarity graph (obtained using GOSSTO), applied our denoising procedure, and finally clustered the resulting graph using ClusterONE. |
Impact | 1) The human protein complexes repository contains all the data generated in this study in an easily navigable format. These include all the pairwise protein interactions obtained through integration of the experimental data with public genomic evidence and the subunit composition of the 622 putative protein complexes obtained by clustering using ClusterONE. 2) P. C. Havugimana, T. G. Hart, T. Nepusz, H. Yang, A. L. Turinsky, Z. Li, P. I. Wang, D. R. Boutz, V. Fong, S. Phanse, M. Babu, S. A. Craig, P. Hu, C. Wan, J. Vlasblom, V. U. Dar, A. Bezginov, G. W. Clark, G. C. Wu, S. J. Wodak, E. R. Tillier, A. Paccanaro, E. M. Marcotte, and A. Emili A census of human soluble protein complexes Cell, vol. 150, iss. 5, pp. 1068-1081, 2012. The collaboration is multi-disciplinary involving biologists and computational scientists. |
Start Year | 2009 |
Description | Learning disease-gene associations by exploiting disease similarities (with Mark Gerstein, Yale University) |
Organisation | Yale University |
Department | Department of Molecular Biophysics and Biochemistry |
Country | United States |
Sector | Academic/University |
PI Contribution | We recently developed a disease similarity measure and calculated all the disease-disease similarities between OMIM diseases. We established a prior disease-gene association probability and provided training and testing datasets for the learning. We fitted the model. |
Collaborator Contribution | Developed a Lipschitz diffusion model that we used to spread the disease-gene associations through the interactome, and a fully functional fast implementation of the algorithm. |
Impact | The collaboration is multi-disciplinary involving biologists and computer scientists. |
Start Year | 2017 |
Description | Network-based Genome Analysis Reveals Structural and Functional Properties of Genes (with Mark Gerstein and Koon-Kiu Yan, Yale University) |
Organisation | Yale University |
Country | United States |
Sector | Academic/University |
PI Contribution | We have analysed the spatial proximity of all pathway genes (KEGG Database) across various cancer cell lines. Our preliminary results provide strong evidence for a relationship between disease pathways and cancer. The study also helps identify candidate genes for a number of diseases. |
Collaborator Contribution | They have successfully applied network community detection techniques to Hi-C data (three-dimensional architecture of genomes) in order to identify topologically associating domains (TADs) of genomic regions. |
Impact | The collaboration is multi-disciplinary involving biologists and computer scientists. |
Start Year | 2017 |
Description | Objective of the project is to elucidate the mechanism of action of a drug for multiple sclerosis |
Organisation | Imperial College London |
Department | Faculty of Medicine |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | To analyse transcriptomics data obtained from a trial on human patients using network medicine approaches. |
Collaborator Contribution | They hosted a trial with human patients and extracted transcriptomics data at different times. |
Impact | No outputs yet. This collaboration is multidisciplinary involving: computer science, network science, machine learning, medicine, biology and pharmacology. |
Start Year | 2015 |
Description | Pavla Binarova |
Organisation | Academy of Sciences of the Czech Republic |
Country | Czech Republic |
Sector | Academic/University |
PI Contribution | Analysing RBR phosphorylation and interaction with microtubules. |
Collaborator Contribution | Microtubules, cell biology |
Impact | research papers, joint projects |
Start Year | 2010 |
Description | Robert Doczi. MAPK evolutionary network, MAPK substrate prediction. |
Organisation | Hungarian Academy of Sciences (MTA) |
Department | Centre for Agricultural Research (ATK) |
Country | Hungary |
Sector | Academic/University |
PI Contribution | The Paccanaro group analysed MAPK docking sites, and MAPK-MKK interaction surfaces when there is no canonical docking site. |
Collaborator Contribution | Developed a high throughput in vivo MAPK activation screen |
Impact | publications. Multidisciplinary collaboration. Computer Science, Biology |
Start Year | 2010 |
Title | CONSAT |
Description | ConSAT is a terminal-based application which can be used to functionally annotate a set of proteins using their consensus domain architectures. Proteins are assigned Gene Ontology terms based on the domain composition of the architecture and on the experimentally known terms of proteins with a given architecture. To help produce a description of a protein sequence, it also assigns weighted English words derived from mining PubMed articles. ConSAT is written in Python. |
Type Of Technology | Software |
Year Produced | 2014 |
Open Source License? | Yes |
Impact | ConSAT has been used to produce the database of the same name (see 'databases'), which is being used in two external collaborations (with Pablo Sotelo and Matteo Pellegrini, see 'collaborations'). ConSAT has been used for our participation in the second CAFA challenge, organized by an international research community of more than 50 research groups devoted to the study of protein function prediction methods. |
URL | http://paccanarolab.org/ConSAT |
Title | ClusterONE |
Description | ClusterONE (Clustering with Overlapping Neighborhood Expansion) is a graph clustering algorithm that is able to handle weighted graphs and readily generates overlapping clusters. Owing to these properties, it is especially useful for detecting protein complexes in protein-protein interaction networks with associated confidence values. ClusterONE is available as a standalone command-line application and as a plugin for Cytoscape or ProCope. |
Type Of Technology | Software |
Year Produced | 2012 |
Open Source License? | Yes |
Impact | For the creation of the Human protein complexes repository (http://human.med.utoronto.ca/) the standalone version of ClusterONE was used to produce the putative protein complexes. This project provided the largest catalogue to date of human protein complexes from cell culture. All versions of the ClusterONE Cytoscape plugin have been downloaded a total of 4801 times, with 5 releases produced so far. The ClusterONE publication has in excess of 130 citations. |
URL | http://paccanarolab.org/clusterone |
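The idea of growing clusters on a weighted graph by neighbourhood expansion can be sketched as follows. This is a simplified illustration only: the toy network, the function names and the cohesiveness score used here (internal edge weight divided by internal plus boundary edge weight) are assumptions made for the example, and the released ClusterONE software contains further steps that are not shown.

```python
# Weighted undirected toy network: edge -> confidence (hypothetical values).
EDGES = {
    ("A", "B"): 0.9, ("A", "C"): 0.8, ("B", "C"): 0.9,
    ("C", "D"): 0.2,
    ("D", "E"): 0.9, ("D", "F"): 0.8, ("E", "F"): 0.9,
}


def build_adjacency(edges):
    """Turn an edge dictionary into a symmetric adjacency mapping."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def cohesiveness(group, adj):
    """Internal edge weight divided by internal plus boundary edge weight."""
    w_in = w_out = 0.0
    for u in group:
        for v, w in adj[u].items():
            if v in group:
                w_in += w / 2.0  # each internal edge is visited from both ends
            else:
                w_out += w
    total = w_in + w_out
    return w_in / total if total else 0.0


def grow_cluster(seed, adj):
    """Greedily add the neighbouring node that most improves cohesiveness."""
    group = {seed}
    best = cohesiveness(group, adj)
    while True:
        frontier = {v for u in group for v in adj[u]} - group
        candidates = [(cohesiveness(group | {v}, adj), v)
                      for v in sorted(frontier)]
        if not candidates:
            break
        score, node = max(candidates)
        if score <= best:
            break  # no neighbour improves the group any further
        group.add(node)
        best = score
    return group


ADJ = build_adjacency(EDGES)
# One cluster is grown from every node; identical results are collapsed.
clusters = {frozenset(grow_cluster(seed, ADJ)) for seed in ADJ}
```

Because each cluster is grown independently from its own seed, groups grown from different seeds may share nodes, which is how a scheme of this kind can yield overlapping clusters. In this toy network the two dense triangles are recovered and the weak edge between them is left as boundary.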
Title | GFAM |
Description | GFam (Gene Family Annotation and Maintenance) is a command-line tool for automatic functional annotation of gene families. GFam offers a framework for complete genome initiatives and model organism resources to build domain-based gene families, derive meaningful functional labels and maintain family annotation across genome releases seamlessly. Our approach constitutes a unified system for grouping proteins based on evolutionary and functional relationships. |
Type Of Technology | Software |
Year Produced | 2012 |
Open Source License? | Yes |
Impact | The family groupings provided by GFam for Arabidopsis were included in TAIR10 genome release. The results are available from the official TAIR (The Arabidopsis Information Resource) website: ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR10_genome_release/TAIR10_domain_architectures.tab.t10 |
URL | http://paccanarolab.org/gfam |
Title | GOSSTO |
Description | Semantic similarity calculations aim to provide a quantifiable measure of functional relatedness of genes by assessing the similarity of the functional terms with which they are annotated. GOSSTO (Gene Ontology Semantic Similarity Tool) is a tool for calculating this measure with respect to Gene Ontology terms. It implements an improved diffusion-based measure developed in this project, as well as several well-established measures, such as those proposed by Resnik, Lin and Jiang, and simUI. Powerful extension capabilities are included in GOSSTO, enabling the user to extend it with new similarity measures. GOSSTO is available as a standalone command-line application running on Windows, GNU/Linux and MacOS as well as a web tool. The webtool is available at www.paccanarolab.org/gosstoweb |
Type Of Technology | Software |
Year Produced | 2014 |
Open Source License? | Yes |
Impact | For the creation of the Human protein complexes repository (http://human.med.utoronto.ca/) the standalone version of GOSSTO was used to compute semantic similarities between human genes in the Gene Ontology. This project provided the largest catalogue to date of human protein complexes from cell culture. Our web tool, available at www.paccanarolab.org/gosstoweb has had over 50 registered users and 70 submitted jobs thus far. |
URL | http://paccanarolab.org/gossto |
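Two of the well-established measures named in the GOSSTO description, Resnik and simUI, can be illustrated on a toy ontology. The sketch below uses their standard textbook definitions; the ontology, the gene annotations and all function names are hypothetical, and the code does not reproduce GOSSTO's implementation or its interface.

```python
import math

# Toy ontology (hypothetical term names): child term -> list of parent terms.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "dna_binding": ["binding"],
    "kinase": ["catalysis", "binding"],
}

# Hypothetical direct annotations: gene -> set of terms.
ANNOTATIONS = {
    "g1": {"kinase"},
    "g2": {"dna_binding"},
    "g3": {"kinase", "dna_binding"},
    "g4": {"catalysis"},
}


def with_ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def induced_terms(gene):
    """All terms a gene is annotated with, directly or through ancestors."""
    terms = set()
    for term in ANNOTATIONS[gene]:
        terms |= with_ancestors(term)
    return terms


def information_content(term):
    """-log of the fraction of genes annotated with the term.

    Assumes every term has at least one annotated gene, as in this toy set.
    """
    count = sum(1 for gene in ANNOTATIONS if term in induced_terms(gene))
    return -math.log(count / len(ANNOTATIONS))


def resnik(term_a, term_b):
    """Information content of the most informative common ancestor."""
    common = with_ancestors(term_a) & with_ancestors(term_b)
    return max(information_content(term) for term in common)


def sim_ui(gene_a, gene_b):
    """Shared induced terms divided by the union of induced terms."""
    a, b = induced_terms(gene_a), induced_terms(gene_b)
    return len(a & b) / len(a | b)
```

Resnik compares two terms through the rarest ancestor they share, while simUI compares two genes through the overlap of their induced term sets, so the two operate at different levels (term versus gene).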
Title | JustClust |
Description | JustClust is a tool for analysing biological data with cluster analysis. JustClust can handle many formats of data and cluster the data with many state-of-the-art techniques. The aim of JustClust is to provide an easy-to-use application which can perform any analysis on any data. |
Type Of Technology | Software |
Year Produced | 2014 |
Open Source License? | Yes |
Impact | The manuscript is currently being finalised. |
URL | http://paccanarolab.org/justclust |
Title | Landis |
Description | Disease similarity measures quantify the distance between disease modules on the interactome. These measures can provide a starting point for in-depth exploration of the diseases at molecular level, and are of particular relevance for orphan diseases. LanDis is a freely available web-based interactive tool that allows domain experts, medical doctors and the larger community to graphically navigate the landscape of human disease similarities. LanDis is designed to explore the similarity landscape of over 28.5 million pairs of heritable diseases, introducing a fully interactive and navigable plot in which diseases are represented as nodes and their pairwise similarity as the links joining them. |
Type Of Technology | Webtool/Application |
Year Produced | 2016 |
Impact | The paper presenting this web tool is still under review, so most scientists are not yet aware of its existence. However, I have already presented it at conferences and meetings, receiving extremely good feedback from everyone who tried it, especially clinician scientists. |
URL | http://www.paccanarolab.org/landis |
Title | S2F |
Description | S2F (Sequence-to-Function) is a software package implementing our diffusion-based method for predicting protein function in organisms for which little or no experimental data is available and the only available information is the set of protein sequences. Protein function is predicted with respect to terms in the Gene Ontology (GO). For a given protein the system provides a probability distribution over the GO terms, which is consistent with the ontology structure, i.e. the probability of a more general term is always higher than the probability of a more specific one. The stand-alone package is self-contained, including tools for generating a set of initial seed functional labels to diffuse as well as methods for inferring the biological networks onto which to diffuse the labels. |
Type Of Technology | Software |
Year Produced | 2014 |
Open Source License? | Yes |
Impact | The results obtained using S2F are currently being used by two research groups who are actively working with organisms of high practical interest for crop production and for biofuel production (Pablo Sotelo, Universidad Nacional de Asuncion (Paraguay); Matteo Pellegrini, University of California, Los Angeles (USA)). S2F has been used for our participation in two CAFA challenges, organized by an international research community of more than 50 research groups devoted to the study of protein function prediction methods. |
URL | http://paccanarolab.org/s2f |
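The ontology-consistency property mentioned in the S2F description (a more general term is never less probable than a more specific one) can be illustrated with the minimal sketch below. It raises each term's score to the maximum score found among its descendants. This is one simple way of enforcing the property and is an assumption made for illustration, not necessarily the procedure used by S2F; the toy ontology and the scores are hypothetical.

```python
# Hypothetical toy ontology: term -> set of direct children (more specific terms).
CHILDREN = {
    "root": {"binding", "catalysis"},
    "binding": {"dna_binding", "rna_binding"},
    "catalysis": set(),
    "dna_binding": set(),
    "rna_binding": set(),
}


def make_consistent(scores):
    """Return scores in which every term scores at least as high as each descendant."""
    consistent = {}

    def visit(term):
        # Memoised depth-first traversal: children are resolved before their parent.
        if term not in consistent:
            best = scores.get(term, 0.0)
            for child in CHILDREN[term]:
                best = max(best, visit(child))
            consistent[term] = best
        return consistent[term]

    for term in CHILDREN:
        visit(term)
    return consistent


# Hypothetical raw per-term scores for one protein; "binding" is inconsistent
# because its child "dna_binding" has a higher score.
raw = {
    "root": 0.5,
    "binding": 0.2,
    "catalysis": 0.1,
    "dna_binding": 0.7,
    "rna_binding": 0.05,
}
fixed = make_consistent(raw)
print(fixed)
```

After the correction, "binding" and "root" are both raised to the score of "dna_binding", while terms that were already consistent keep their original scores.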
Title | SCPS |
Description | SCPS (Spectral Clustering of Protein Sequences) is an efficient, user-friendly, scalable and multi-platform implementation of a spectral clustering method for clustering homologous proteins. SCPS also implements connected component analysis and hierarchical clustering, integrates TribeMCL and interfaces with external tools such as Cytoscape and NCBI BLAST. |
Type Of Technology | Software |
Year Produced | 2010 |
Open Source License? | Yes |
Impact | The paper is classified as 'highly accessed' on the journal website. The work has been cited 28 times already. Many of the papers citing SCPS make use of the software for large scale clustering of protein sequences in practical, real world applications. |
URL | http://paccanarolab.org/scps |
Title | mutation3D |
Description | mutation3D is a functional prediction and visualization tool for studying the spatial arrangement of amino acid substitutions on protein models and structures. It is intended to be used to identify clusters of amino acid substitutions arising from somatic cancer mutations across many patients in order to identify functional hotspots and fuel downstream hypotheses. It is also useful for clustering other kinds of mutational data, or simply as a tool to quickly assess relative locations of amino acids in proteins. |
Type Of Technology | Webtool/Application |
Year Produced | 2016 |
Impact | It is still too early to assess impact, as the tool was released about a month ago. |
URL | http://mutation3d.org/ |
Description | Artist in Residence Kerry Lemon |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Public/other audiences |
Results and Impact | Stimulating discussions to bridge the gap between science and artistic thinking. Artistic drawing with an understanding of plant development. |
Year(s) Of Engagement Activity | 2014,2015,2016,2017 |
URL | http://www.kerrylemon.co.uk/ |
Description | Bristol2012 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other academic audiences (collaborators, peers etc.) |
Results and Impact | The talks led to interesting discussions and new contacts. Some plans were made for future collaboration. |
Year(s) Of Engagement Activity | 2012 |
Description | Cambridge2013 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other academic audiences (collaborators, peers etc.) |
Results and Impact | Talks about our research and methods with peers. Setting up collaboration activities with our peers. |
Year(s) Of Engagement Activity | 2013 |
Description | ClusterONE press release |
Form Of Engagement Activity | A press release, press conference or response to a media enquiry/interview |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Media (as a channel to the public) |
Results and Impact | We advertised on the Royal Holloway college website the publication of the ClusterONE algorithm and of its accompanying software in Nature Methods. The advertisement sparked a lot of interest in the algorithm in the college. As a consequence, we were approached by biologists in the School of Biological Sciences at Royal Holloway, with whom we started collaborating on clustering the large-scale experimental co-expression networks that they were producing. |
Year(s) Of Engagement Activity | 2012 |
URL | https://www.royalholloway.ac.uk/computerscience/news/newsarticles/researchersalgorithmpublishedinsci... |
Description | Co-PI Talk |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Schools |
Results and Impact | The Co-PI gave a talk at a conference organized for grammar school pupils at Aylesbury Grammar School. The talk discussed the complexities of signalling pathways and why they are important. As the conference focused on plant biology, a number of pupils decided to find out more about options for studying Biological Sciences after high school. |
Year(s) Of Engagement Activity | 2010 |
Description | Cornell2010 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other academic audiences (collaborators, peers etc.) |
Results and Impact | Meetings with researchers and interesting discussions of our work. Elaboration of plans for future collaboration. |
Year(s) Of Engagement Activity | 2010 |
Description | Cornell2013 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other academic audiences (collaborators, peers etc.) |
Results and Impact | The talk sparked discussions with other scientists. The feedback I obtained was useful for my current research. The talk was important to advertise my research and to make contacts for future collaborations. A collaboration was initiated with the group of Prof. Haiyuan Yu for a new joint research project aimed at finding hotspot mutations in Cancer proteins. The collaboration is ongoing and a paper is currently under review in BMC Biology. |
Year(s) Of Engagement Activity | 2013 |
Description | GlaxoSmithKline2013 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other academic audiences (collaborators, peers etc.) |
Results and Impact | Engagement with contacts and discussions of mutual interests. Plans for collaboration were made with some of the contacts. |
Year(s) Of Engagement Activity | 2013 |
Description | ISMB 2010 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other academic audiences (collaborators, peers etc.) |
Results and Impact | The poster presentation generated interest and positive feedback from the participants of the event. Other participants provided interesting ideas that helped us in our research. |
Year(s) Of Engagement Activity | 2010 |
URL | http://www.iscb.org/archive/conferences/iscb/ismb2010.html |
Description | ISMB BioOntologies SIG 2013 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other academic audiences (collaborators, peers etc.) |
Results and Impact | The poster presentation led to some interesting conversations and new contacts. The feedback from the activity was useful for the further development of our research. |
Year(s) Of Engagement Activity | 2013 |
URL | http://www.iscb.org/ismbeccb2013-program/ismbeccb2013-satellite-meetings#bio |
Description | ISMB NetBIO SIG 2013 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other academic audiences (collaborators, peers etc.) |
Results and Impact | After the poster presentation, some contacts were made and we had interesting discussions of the presented work. We analysed our work with other researchers, which helped us improve it further. |
Year(s) Of Engagement Activity | 2013 |
URL | http://www.iscb.org/ismbeccb2013-program/ismbeccb2013-satellite-meetings#netbio |
Description | Invited participation in experts' roundtable at The Bioinformatics Strategy Meeting in London |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | I participated in an experts' roundtable together with other academics and members of industry. |
Year(s) Of Engagement Activity | 2016 |
Description | London Area Plant Molecular Sciences |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Professional Practitioners |
Results and Impact | Increased the cohesion of the plant science community in the London area. Meetings were repeated yearly for 10 years. |
Year(s) Of Engagement Activity | Pre-2006,2006,2007,2008,2009,2010 |
Description | MRC2012 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other academic audiences (collaborators, peers etc.) |
Results and Impact | Discussions about biological problems under study in their community that we could help with. Establishing links with biologists and creating collaboration networks. |
Year(s) Of Engagement Activity | 2012 |
Description | Milan2009-Biology |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other academic audiences (collaborators, peers etc.) |
Results and Impact | The talks were followed by analysis of some further problems related to the research we presented. Some plans were made for future collaboration with our new contacts. |
Year(s) Of Engagement Activity | 2009 |
Description | Milan2009-CS |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other academic audiences (collaborators, peers etc.) |
Results and Impact | The talks generated interesting discussions and we met some new contacts. Some plans for future collaboration were made with the University. |
Year(s) Of Engagement Activity | 2009 |
Description | NIPS 2008 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other academic audiences (collaborators, peers etc.) |
Results and Impact | After the poster presentation, we made interesting contacts and had positive discussions about our work. We received feedback that allowed further development of our research. |
Year(s) Of Engagement Activity | 2008 |
URL | http://nips.cc/Conferences/2008/ |
Description | Poster ClusterONE 2013 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other academic audiences (collaborators, peers etc.) |
Results and Impact | The poster presentation led to discussions on the work with fellow researchers. The feedback provided by our peers was useful for further development. |
Year(s) Of Engagement Activity | 2013 |
URL | http://www.iscb.org/ismbeccb2013 |
Description | RHUL Open Days |
Form Of Engagement Activity | Participation in an open day or visit at my research institution |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Public/other audiences |
Results and Impact | The University opens to the public and each department presents a showcase of its research in a way that is accessible to a wider, non-specialist audience. This generated interest in the research done by the CS Department. Many students joined the Computer Science Department. |
Year(s) Of Engagement Activity | 2009,2010,2011,2012,2013,2014 |
Description | School visits |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Schools |
Results and Impact | Increased awareness of plant research. Increased interest and motivation among school pupils. |
Year(s) Of Engagement Activity | Pre-2006,2006,2007,2008,2009,2010,2011,2012,2013,2014 |
Description | Science Club at the Desborough School |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Schools |
Results and Impact | Dr Safina Khan organized a Science Club at the Desborough School (Maidenhead, UK) for a period of one year. This consisted of weekly one-hour meetings during which pupils performed experiments designed by Dr Khan and discussed scientific ideas with her, including concepts from this project. This generated interest and discussion among the students. Recently Dr Khan obtained a grant from the Royal Society Partnership Grant Scheme, together with the Desborough School, to continue this work. |
Year(s) Of Engagement Activity | 2009 |
Description | Talks to the groups of Martin Wilkins and Paul Matthews -- summer 2015 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Other audiences |
Results and Impact | I presented our recent results in the area of Network Medicine to Prof Martin Wilkins and Prof Paul Matthews and their groups (I gave two separate talks) at the Department of Medicine, Imperial College, Hammersmith Hospital. The talks sparked interesting discussions and marked the beginning of a very interesting collaboration with the lab of Prof Matthews in the area of Multiple Sclerosis. |
Year(s) Of Engagement Activity | 2015 |
Description | Taster courses |
Form Of Engagement Activity | Participation in an open day or visit at my research institution |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Schools |
Results and Impact | One-day courses open to school pupils. They enquired about the courses that Computer Science departments offer and about possibilities for future study. Some students chose to follow the lead we gave them and took up Computer Science studies in our department. |
Year(s) Of Engagement Activity | 2009,2010,2011,2012,2013,2014 |
Description | UCA-Py2009 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other academic audiences (collaborators, peers etc.) |
Results and Impact | The talks generated interest and requests from some students for more information on our work. We recruited a full-time PhD student for the Computer Science department at Royal Holloway. |
Year(s) Of Engagement Activity | 2009 |
Description | UCAS open days |
Form Of Engagement Activity | Participation in an open day or visit at my research institution |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Schools |
Results and Impact | During this talk I try to convey to school pupils what computer science is and why it is an exciting field of study. The talk often sparked questions and discussions. A high percentage of school pupils who came to the talk decided to study Computer Science and many of these chose to study it in our department at Royal Holloway. |
Year(s) Of Engagement Activity | 2008,2009,2010,2011,2012,2013,2014 |
Description | UCLondon2012 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other academic audiences (collaborators, peers etc.) |
Results and Impact | Through the talks we met multiple peers and engaged in interesting conversations. We developed plans based on the discussions we had with some of these peers. |
Year(s) Of Engagement Activity | 2012 |
Description | UNA-Py2009 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other academic audiences (collaborators, peers etc.) |
Results and Impact | We presented our research and the departmental study programmes, which led to requests for more information and to meetings with research contacts. We extended our network of contacts for collaborations. |
Year(s) Of Engagement Activity | 2009 |
Description | Venice2009 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other academic audiences (collaborators, peers etc.) |
Results and Impact | Our talk sparked some interesting discussions with new research contacts. Some plans to collaborate with the researchers were made. |
Year(s) Of Engagement Activity | 2009 |
Description | Venice2012 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other academic audiences (collaborators, peers etc.) |
Results and Impact | Our presentation led to meetings with contacts. We developed some plans for collaborations. |
Year(s) Of Engagement Activity | 2009,2012 |
Description | Yale2010 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other academic audiences (collaborators, peers etc.) |
Results and Impact | Talks about our work and meetings with new contacts. Construction of plans for future collaboration. |
Year(s) Of Engagement Activity | 2010 |
Description | talk at Galway -- May 2015 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | I presented my work at the School of Mathematics, Statistics and Applied Mathematics at Galway University, Ireland. The talk sparked discussions with other scientists. The feedback I obtained was useful for my current research. The talk was important to advertise my research and to make contacts for future collaborations. |
Year(s) Of Engagement Activity | 2015 |