Combining Semantic Technologies and Machine Learning for the (semi-) automatic annotation of data.

Lead Research Organisation: University of Oxford
Department Name: Computer Science

Abstract

Context:
Identifying real-world entities and their structural relations in different data formats is crucial for many applications, including data integration, data cleaning, data mining, and knowledge discovery. In many domains, this is still a very labour-intensive task because metadata is missing, incomplete or obfuscated. The growth in global data volume makes manual annotation of data infeasible and calls for a more scalable solution. New techniques for (semi-) automatic data annotation would therefore add considerable value in these domains.

Goals/Objectives:
The goal of the project is to investigate the (semi-) automatic annotation of data with meaningful metadata that describes its underlying structure and semantics. Preliminary research from the Information Systems group at the University of Oxford has shown that combining semantic technologies (such as large online knowledge graphs and ontologies) with state-of-the-art machine learning techniques (such as deep neural networks and semantic embeddings) delivers very promising results for the annotation of relational data. This indicates great potential for further research on annotating more complex structural relationships in relational data, as well as on annotating other structured, semi-structured or unstructured data formats.

Novelty of the research methodology:
Only a limited number of publications combine semantic technologies (e.g. online knowledge graphs and ontologies) with deep learning (e.g. deep neural networks and semantic embeddings) for the automatic annotation of data. Additionally, most of this work focuses on relational data. The combination of these technologies and their application to different data formats constitute a novel research methodology unique to this project.

Alignment to EPSRC's strategies and research areas (which EPSRC research area the project relates to):
This project falls within the EPSRC Information and Communication Technologies research area.
It is especially related to topics such as:
Information Systems
Databases
Machine Learning/Artificial Intelligence

Company collaborators:
The Information Systems group in Oxford collaborates with the Siemens research group in Munich on this project. The Siemens team contributes realistic use cases as well as considerable research expertise and resources in this area.

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/T517653/1       -             -             01/10/2019  30/09/2024  -
2248787            Studentship   EP/T517653/1  01/10/2019  30/09/2022  Maximilian Pfluger