Categorical Natural Language Processing

Lead Research Organisation: University of Oxford
Department Name: Computer Science

Abstract

This project falls within the EPSRC "Information and Communication Technologies (ICT), Natural Language Processing" research area.

This project is aimed at developing categorical Natural Language Processing (NLP) algorithms and implementing them for tasks such as meaning extraction for automated question answering, as well as other NLP tasks.

Current state-of-the-art voice recognition technologies such as Alexa, Google Assistant, Siri etc. rely on Natural Language Processing tasks. So do search engines like Google and online translators like Google Translate. Natural Language Processing tasks are often performed via Machine Learning algorithms and/or distributional methods, i.e. Corpus-based models of meaning.

In recent years (Coecke, 2010), Category Theory has been applied in order to create compositional distributional models of meaning for vector-space semantics. The novelty of such models consists of the possibility of composing word meanings to extract the meaning of a sentence (or a text). The compositionality is mediated by the Grammar, which is modelled over a suitable compact closed grammar category. Compositional models have been used to construct Oxford & Google's automated reasoning software Quantomatic. Moreover, compositional models of meaning were proven efficient for a variety of NLP tasks (E. Grefenstette and M. Sadrzadeh, 2011), and often able to outperform non-compositional ones.

This project aims to research new categorical models for computational semantics and explore the connection between the Machine Learning approaches and the categorical compositional ones. The main objectives are the following:

- To improve the current mathematical framework for the model of meaning; this will be obtained by extending my thesis' research. The latter explored the categorical properties of a class of meaning categories in the light of linguistics applications. Moreover, it researched how to model meaning update mechanism in those categories. Further research on these aspects is essential to address more complicated tasks in computational semantics, including Automated Question Answering, Machine Translation etc.
- To implement categorical compositional models of meaning on real data, for specific NLP tasks, e.g. Automated Question Answering.
- To combine Deep Learning techniques with the categorical ones in order to develop categorical compositional meaning-extraction algorithms based not on distributional techniques but on neural networks.

The methodology will be mathematical and computational. In fact, this project will provide an innovative approach to NLP tasks due to the combination of machine learning and mathematical structures. It promises new applications for Category Theory and potential improvements in some of the current performances for specific NLP tasks. Moreover, this project will provide implementations to current categorical compositional models of meaning, opening the road to more potential computational applications.

Publications

10 25 50

Studentship Projects

Project Reference Relationship Related To Start End Student Name
EP/R513295/1 01/10/2018 30/09/2023
2219572 Studentship EP/R513295/1 01/10/2019 09/10/2021 Irene Rizzo