Distill A Knowledge Graph from Unstructured Text via Deep Learning Technologies

Lead Research Organisation: University of Oxford
Department Name: Computer Science

Abstract

Summary: A knowledge graph (KG) is a structured graphical representation of semantic knowledge and relations, in which nodes denote entities and edges represent the relations between them. As an effective way to store and search knowledge, knowledge graphs have been applied in many intelligent systems and have attracted considerable research interest in recent years. Many knowledge graphs have been constructed and published, such as Freebase, Wikidata, DBpedia, ConceptNet and YAGO, but none of them fully meets the requirements of different applications, because knowledge about the world is continuously evolving and being updated. Moreover, constructing a knowledge graph, especially a large one with millions or even billions of nodes and edges, remains a challenging problem, as it requires large amounts of expert labour to produce structured information.
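To make the node/edge definition concrete, the sketch below stores a KG as a set of (head, relation, tail) triples with an index of outgoing edges, so that one-hop lookups such as "where is X located?" are cheap. It is a minimal in-memory illustration under our own assumptions: the class `KnowledgeGraph`, its methods and the example facts are hypothetical and are not taken from Freebase, Wikidata or any other KG named above, which use their own data models and query languages.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Hypothetical minimal in-memory KG: nodes are entities, labelled edges are relations."""

    def __init__(self):
        self.triples = set()               # every fact as (head, relation, tail)
        self.out_edges = defaultdict(set)  # index: head -> {(relation, tail)}

    def add(self, head, relation, tail):
        """Insert one fact and update the outgoing-edge index."""
        self.triples.add((head, relation, tail))
        self.out_edges[head].add((relation, tail))

    def neighbours(self, entity, relation=None):
        """Entities reachable from `entity` by one edge, optionally filtered by relation."""
        return {t for r, t in self.out_edges[entity] if relation is None or r == relation}


kg = KnowledgeGraph()
kg.add("Oxford", "located_in", "England")
kg.add("University of Oxford", "located_in", "Oxford")
kg.add("University of Oxford", "instance_of", "University")
print(kg.neighbours("University of Oxford", "located_in"))  # {'Oxford'}
```

Keeping an explicit index alongside the raw triple set trades memory for fast neighbour queries; large-scale KG stores make the same trade-off with several indexes over the triple positions.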
To overcome this issue, many approaches have attempted to build KGs automatically from unstructured text; these can broadly be classified into three categories: supervised approaches, semi-supervised approaches and distant supervision. Although existing approaches have made some progress, many problems and challenges remain unsolved.
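As an illustration of the third category, distant supervision generates training data by aligning an existing KG with raw text: any sentence mentioning both the head and the tail of a known triple is labelled with that triple's relation. The sketch below is a simplified, hypothetical version that assumes entities can be matched by exact substring (real systems rely on named-entity recognition and entity linking). The second example sentence shows the noisy labels this heuristic is known to produce.

```python
def distant_label(sentences, triples):
    """Hypothetical distant-supervision labeller.

    Pairs each sentence with every KG relation whose head and tail both appear in it.
    Assumes exact substring matching stands in for entity recognition and linking.
    """
    labelled = []
    for sentence in sentences:
        for head, relation, tail in triples:
            if head in sentence and tail in sentence:
                labelled.append((sentence, head, tail, relation))
    return labelled


triples = {("Barack Obama", "born_in", "Honolulu")}
sentences = [
    "Barack Obama was born in Honolulu in 1961.",
    "Barack Obama gave a speech in Honolulu.",  # noisy match: not about birth
    "Honolulu is the capital of Hawaii.",       # only one entity, so no label
]
for example in distant_label(sentences, triples):
    print(example)
```

Both of the first two sentences are labelled `born_in`, although only the first expresses that relation; this label noise is one of the reasons distant supervision remains an active research problem.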
In this project, we aim to explore different ways of better understanding unstructured data and extracting the structured information needed to construct knowledge graphs. This involves many subtasks, such as entity identification, link prediction and referential disambiguation, each of which remains challenging due to the difficulties and subtleties of language and the transient nature of knowledge (e.g. facts are continuously evolving). We aim to develop novel techniques for automated knowledge graph construction by leveraging methods from Natural Language Processing (NLP), Machine Learning, and Knowledge Representation and Reasoning. Such techniques would enable, for instance, intelligent search over large bodies of documents using highly scalable query processing technologies over KGs.
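Of the subtasks listed, link prediction asks which missing edge (head, relation, ?) is most plausible. One common family of models, translation-based embeddings such as TransE, scores a triple by how close the head vector plus the relation vector lands to the tail vector. The sketch below is only an illustration of that scoring rule and is not the method proposed in this project: the 2-D embeddings are hand-picked hypothetical values, whereas a real model learns them from the graph.

```python
import math

# Hypothetical hand-picked 2-D embeddings for illustration; a real model learns these.
entity_emb = {
    "Oxford": (1.0, 0.0),
    "England": (1.0, 1.0),
    "France": (0.0, 3.0),
}
relation_emb = {"located_in": (0.0, 1.0)}


def transe_score(head, relation, tail):
    """TransE-style plausibility: higher (closer to 0) when head + relation is close to tail."""
    h, r, t = entity_emb[head], relation_emb[relation], entity_emb[tail]
    translated = tuple(hv + rv for hv, rv in zip(h, r))
    return -math.dist(translated, t)  # negative Euclidean distance


def rank_tails(head, relation, candidates):
    """Predict the missing tail of (head, relation, ?) by sorting candidates by score."""
    return sorted(candidates, key=lambda t: transe_score(head, relation, t), reverse=True)


print(rank_tails("Oxford", "located_in", ["France", "England"]))  # ['England', 'France']
```

Here "Oxford" + "located_in" lands exactly on "England" (distance 0), while "France" is about 2.24 away, so "England" is ranked first.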
Our objective is, given a corpus of unstructured text, to automatically and efficiently construct a high-quality knowledge graph with minimal human intervention. This would be useful for a variety of downstream tasks, such as web search (e.g. improving the relevance and quality of results in search engines like Google and Bing), question answering (e.g. better understanding natural language queries and giving accurate answers in applications like Microsoft Cortana and Apple Siri), and recommendation systems (e.g. recommending more accurate and relevant products to customers on e-commerce websites like Amazon and Taobao).
This project falls within the EPSRC research areas of artificial intelligence technologies, natural language processing and databases, primarily within the ICT theme.


Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
EP/T517811/1      |              |              | 01/10/2020 | 30/09/2025 |
2426711           | Studentship  | EP/T517811/1 | 01/10/2020 | 31/03/2024 |