A Unified Model of Compositional and Distributional Semantics: Theory and Applications

Lead Research Organisation: University of Sussex

Department Name: Sch of Engineering and Informatics

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Funded Value:

£370,065

Funded Period:

Sep 12 - Oct 15

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/I037458/1

Principal Investigator:

David Weir

Research Subject:

Info. & commun. Technol. (60%)

Linguistics (40%)

Research Topic:

Artificial Intelligence (60%)

Comput./Corpus Linguistics (40%)

Organisations

University of Sussex (Lead Research Organisation)

People
David Weir (Principal Investigator)
William Keller (Co-Investigator)

Publications

Bollegala D (2013) Cross-Domain Sentiment Classification Using a Sentiment Sensitive Thesaurus in IEEE Transactions on Knowledge and Data Engineering

Clarke D (2015) Fast Semantic Parsing with a Tensor Kernel in International Journal of Computational Linguistics and Applications

Clarke D (2015) Efficiency in ambiguity: Two models of probabilistic semantics for natural language in IWCS 2015 - Proceedings of the 11th International Conference on Computational Semantics

Weeds J (2014) Learning to Distinguish Hypernyms and Co-Hyponyms in Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics

Weeds J E (2017) When a Red Herring is Not a Red Herring: Using Compositional Methods to Detect Non-Compositional Phrases

Weir D (2016) Aligning Packed Dependency Trees: A Theory of Composition for Distributional Semantics in Computational Linguistics

Key Findings

Description	We have discovered a new way to conceive of distributional composition, and developed a theoretical framework called Anchored Packed Trees (APTs) that implements this conception. We have developed a software implementation of this theory and have demonstrated that it can achieve state-of-the-art performance on several key tasks. We have demonstrated that it is possible for a machine learning model to learn to distinguish between different ontological relationships that distributional similarity measures typically conflate. We have compared the effectiveness of a wide variety of proposals regarding how to compose distributional representations of meaning. A distinctive feature of this work is that it is a so-called extrinsic evaluation, focussing on impact on practical applications rather than accuracy on artificial test sets. This has provided greater clarity as to where further research effort in this area should be placed. We have devised a novel approach to distributional composition that involves higher-order dependency relations and are investigating applications of the approach in a number of NLP contexts. We have collaborated in a research initiative investigating ways in which to map distributional representations from one domain to another, and have shown that this very general approach can lead to effective cross-domain methods on a variety of tasks.
Exploitation Route	The methods being developed in this project (and in other related projects) are showing significant potential in applied natural language processing contexts. In general, this is a result of the fact that, in a machine learning scenario, these methods make it possible to generalise from knowledge of individual word forms to knowledge of the semantics that these words denote.
Sectors	Digital/Communication/Information Technologies (including Software); Government, Democracy and Justice

Impact Summary

Description	Our findings are being used in several ongoing projects being undertaken with commercial collaborators, in particular, projects funded by Innovate UK (formerly TSB) where we are building applied Natural Language Processing tools. The impact of this project on these applied projects concerns the way that it is enabling the creation of more robust language processing tools. For example, in one project, where we are interfacing with large product databases, we are using distributional methods arising out of this project to create less brittle database matching algorithms.
First Year Of Impact	2014
Sector	Digital/Communication/Information Technologies (including Software)
Impact Types	Economic
