Graphical Models for Relational Data: New Challenges and Solutions

Lead Research Organisation: University of Cambridge

Department Name: Engineering

Abstract

Data often come in the form of objects and relationships: for instance, a library consists of books that cite each other; proteins bind to other proteins according to a variety of patterns; a network of online customers is formed by people who indicate which other customers give reliable product recommendations. Such relationships can be used to predict the behavior and properties of each object. For instance, if a particular news article cites several sports articles, this is evidence that the article is likely to be about sports. We propose novel ways of exploring this relational information. The first task is how to predict the properties of an object (e.g., the class of a news article) based on other objects that share a relationship with it (e.g., the other articles that are cited by or cite our target). We show that there are important forms of relationship that are not properly treated by current methods, and propose a new methodology to account for such relations. The second task focuses on ways to measure similarity of relational structures. For instance, if we know that two proteins physically interact inside a yeast cell, can we infer which other pairs of proteins are linked in a similar way? We show how to formulate problems like this using probabilistic models, and develop novel ways of discovering patterns in relational data with applications to a variety of real-world problems.

Funded Value:

£190,575

Funded Period:

Oct 08 - Sep 10

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/F026641/1

Principal Investigator:

Zoubin Ghahramani

Research Subject:

Info. & commun. Technol. (90%)

Tools, technologies & methods (10%)

Research Topic:

Artificial Intelligence (80%)

Bioinformatics (10%)

Information & Knowledge Mgmt (10%)

Organisations

University of Cambridge (Lead Research Organisation)

People	ORCID iD
Zoubin Ghahramani (Principal Investigator)
Ricardo Silva (Researcher Co-Investigator)

Publications

Author Name

Title Publication Date Published

Wallach H (2010) Learning the Structure of Deep Sparse Graphical Models

Lacoste-Julien S (2013) SIGMa

Silva R (2008) Hidden Common Cause Relations in Relational Learning

Lacoste-Julien S (2011) Approximate inference for the loss-calibrated Bayesian

Silva R (2010) Ranking relations using analogies in biological and information networks in The Annals of Applied Statistics

Silva R (2009) Factorial mixture of Gaussians and the marginal independence model in Journal of Machine Learning Research

Silva R (2009) Hidden common cause relations in relational learning in Advances in Neural Information Processing Systems 20 - Proceedings of the 2007 Conference

Key Findings
Impact Summary
Further Funding


Description	Data come in the form of objects and relationships: for instance, a library consists of books that cite each other; proteins bind to other proteins according to a variety of patterns; a network of customers is formed by people who indicate which other customers are trusted reviewers. Such relationships can be used to predict the behavior and properties of each object. For instance, if a particular news article cites several sports articles, this is evidence that the article is likely to be about sports. We have developed novel ways of exploring this relational information. The first task is how to predict the properties of an object (e.g., the class of a news article) based on other objects that share a relationship with it (e.g., the other articles that are cited by or cite our target). We showed that there are important forms of relationship that are not properly treated by current methods, and developed a new methodology to account for such relations. The second task is how to measure similarity of relational structures. For instance, if two proteins are physically interacting inside a yeast cell, which other pairs of proteins are linked in a similar way? We showed which probabilistic models correspond to this question, and developed novel ways of discovering patterns in relational data with applications in molecular biology. We also explored aspects of causal learning, and how to combine large databases of relational data.
Exploitation Route	Relational data are ubiquitous. Our methods can be used in many areas of Data Science to understand and predict the relations between objects.
Sectors	Digital/Communication/Information Technologies (including Software)


Description	Due to the widespread availability of relational data, our work can be directly used in a variety of domains. For instance, companies that want to automatically generate metadocuments based on classifying groups of text files (e.g., the pages generated automatically by Google News) will benefit from a new approach to classifying relational objects: in their case, objects are text documents, and relationships are citations or hyperlinks between documents. Biologists who want to uncover new patterns of protein-protein interactions will benefit from new tools that measure similarity of relational structures. Moreover, our work has also had a direct impact on theoretical machine learning. We developed new families of graphical models and inference algorithms which solve problems that cannot be treated with current machine learning methods.
First Year Of Impact	2012
Sector	Digital/Communication/Information Technologies (including Software),Healthcare
Impact Types	Economic


Description	Google
Amount	£55,000 (GBP)
Funding ID	Google Research Award
Organisation	Google
Sector	Private
Country	United States
Start	06/2009


Description	Microsoft
Amount	£83,600 (GBP)
Funding ID	Award
Organisation	Microsoft Research
Sector	Private
Country	Global
Start


Description	Microsoft
Amount	£66,000 (GBP)
Funding ID	PhD Scholarship
Organisation	Microsoft Research
Sector	Private
Country	Global
Start
