Graphical Models for Relational Data: New Challenges and Solutions

Lead Research Organisation: University of Cambridge
Department Name: Engineering

Abstract

Data often come in the form of objects and relationships: for instance, a library consists of books that cite each other; proteins bind to other proteins according to a variety of patterns; a network of online customers is formed by people who indicate which other customers give reliable product recommendations. Such relationships can be used to predict the behavior and properties of each object. For instance, if a particular news article cites several sports articles, this is evidence that the article is likely to be about sports. We propose novel ways of exploring this relational information. The first task is precisely how to predict the properties of an object (e.g., the class of a news article) based on other objects that share a relationship with it (e.g., the other articles that are cited by or cite our target). We show that there are important forms of relationship that are not properly treated by current methods, and propose a new methodology to account for such relations. The second task focuses on ways to measure the similarity of relational structures. For instance, if we know that two proteins physically interact inside a yeast cell, can we infer which other pairs of proteins are linked in a similar way? We show how to formulate problems like this using probabilistic models, and develop novel ways of discovering patterns in relational data with applications to a variety of real-world problems.

Publications

Silva R. (2009) Factorial mixture of Gaussians and the marginal independence model in Journal of Machine Learning Research

Silva R. (2009) Hidden common cause relations in relational learning in Advances in Neural Information Processing Systems 20 - Proceedings of the 2007 Conference

Silva R. (2010) Ranking relations using analogies in biological and information networks in The Annals of Applied Statistics

Lacoste-Julien S (2013) SIGMa

 
Description Data come in the form of objects and relationships: for instance, a library consists of books that cite each other; proteins bind to other proteins according to a variety of patterns; a network of customers is formed by people who indicate which other customers are trusted reviewers. Such relationships can be used to predict the behavior and properties of each object. For instance, if a particular news article cites several sports articles, this is evidence that the article is likely to be about sports. We have developed novel ways of exploring this relational information. The first task is precisely how to predict the properties of an object (e.g., the class of a news article) based on other objects that share a relationship with it (e.g., the other articles that are cited by or cite our target). We showed that there are important forms of relationship that are not properly treated by current methods, and developed a new methodology to account for such relations. The second task is how to measure the similarity of relational structures. For instance, if two proteins are physically interacting inside a yeast cell, which other pairs of proteins are linked in a similar way? We showed which probabilistic models correspond to this question, and developed novel ways of discovering patterns in relational data with applications in molecular biology. We also explored aspects of causal learning, and how to combine large databases of relational data.
Exploitation Route Relational data are ubiquitous. Our methods can be used in many areas of Data Science to understand and predict the relations between objects.
Sectors Digital/Communication/Information Technologies (including Software)

 
Description Due to the widespread availability of relational data, our work can be directly used in a variety of domains. For instance, companies that want to automatically generate metadocuments by classifying groups of text files (e.g., the pages generated automatically by Google News) will benefit from a new approach to classifying relational objects: in their case, objects are text documents, and relationships are citations or hyperlinks between documents. Biologists who want to unveil new patterns of protein-protein interactions will benefit from new tools that measure the similarity of relational structures. Moreover, our work has also had a direct impact on theoretical machine learning. We developed new families of graphical models and inference algorithms that solve problems which cannot be treated with current machine learning methods.
First Year Of Impact 2012
Sector Digital/Communication/Information Technologies (including Software), Healthcare
Impact Types Economic

 
Description Google
Amount £55,000 (GBP)
Funding ID Google Research Award 
Organisation Google 
Sector Private
Country United States
Start 06/2009 
 
Description Microsoft
Amount £83,600 (GBP)
Funding ID Award 
Organisation Microsoft Research 
Sector Private
Country Global
Start  
 
Description Microsoft
Amount £66,000 (GBP)
Funding ID PhD Scholarship 
Organisation Microsoft Research 
Sector Private
Country Global
Start