An integrated model of syntactic and semantic prediction in human language processing

Lead Research Organisation: University of Edinburgh

Department Name: Sch of Informatics

Abstract

When humans process language, they do so incrementally: they compute the meaning of a sentence on a word-by-word basis, rather than waiting until they reach the end of the sentence. As a consequence, readers and listeners have to constantly update their interpretations as new input becomes available. Experimental evidence shows that they also make predictions about upcoming input: for example, when hearing a verb such as eat, the listener predicts that an object such as soup is likely to follow. The prediction process has two components: syntactic prediction, i.e., the structure of the upcoming input is anticipated (after eat, an object is likely, but a subject isn't), and semantic prediction, i.e., the meaning of the upcoming input is anticipated (after eat, a noun referring to edible things is likely, but one referring to abstract things isn't).

Previous research has developed computational models of either syntactic or semantic prediction in human sentence processing. But there are currently no models that capture both processes in a single framework, despite clear experimental evidence that humans rely on both types of information when generating predictions. The aim of this project is to develop a model of human sentence processing that integrates syntactic and semantic prediction; such a model will not only make it possible to investigate an important theoretical question in psycholinguistics, but also has important potential applications in natural language processing.

Our model will bring together two key approaches in sentence processing. On the syntactic side, we will develop an incremental, probabilistic parser that generates syntactic predictions. This parser will be based on an extension of the Tree-adjoining Grammar (TAG) formalism, which in previous work has been shown to capture prediction data. The parser will be combined with a distributional model of semantics, which is the standard way of modeling word meaning in cognitive science; we will extend this model to capture sentential meaning, thus making it amenable to integration with a parser. Three distinct ways of achieving such an integration will be pursued, each corresponding to a theoretical position in psycholinguistics: the autonomous processing view, which holds that syntax and semantics operate independently; the syntax-first view, which holds that semantic processing has access to syntax, but not vice versa; and the interactive processing view, according to which the two components freely exchange information.

By implementing these three approaches, and evaluating the resulting predictions against data from eye-tracking and priming experiments, we will be able to shed light on a key question in psycholinguistics, viz., how syntactic and semantic processing interact.

Apart from this theoretical contribution, the project also has a practical aim: a computational model of human sentence processing can be used to determine which parts of a text are hard to understand. This information can be used to provide feedback to human writers, score essays, or correct the output of automatic language generation systems. In order to assess the potential for such applications, we will focus on one particular problem, viz., text simplification. We will develop a system that takes input text and makes it easier to read, e.g., for language-impaired readers or for language learners. Our integrated model of syntax and semantics will be used to pinpoint the difficult parts of a text, which will then be replaced by simplified passages using a technique called integer linear programming, which has previously been used successfully for text rewriting. The resulting simplified texts will be evaluated for their intelligibility in studies with human readers.
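
As a rough illustration of semantic prediction with a distributional model, the following Python sketch ranks candidate nouns by how similar their vectors are to a vector standing for typical objects of eat. It is a minimal sketch under stated assumptions and not the project's model: the context words, the co-occurrence counts, the prototype vector, and the function names are all invented for illustration.

```python
import math

# Toy co-occurrence counts, invented for illustration (not project data).
# Each noun is described by how often it appears near four context words.
CONTEXTS = ["spoon", "hot", "theory", "argue"]
VECTORS = {
    "soup": [8.0, 9.0, 0.0, 0.0],
    "bread": [3.0, 5.0, 0.0, 1.0],
    "idea": [0.0, 1.0, 9.0, 7.0],
}

# Hypothetical vector summarising nouns that typically follow "eat".
EAT_OBJECT_PROTOTYPE = [6.0, 7.0, 0.0, 0.0]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def rank_candidates(prototype, vectors):
    """Return (word, similarity) pairs, most similar to the prototype first."""
    scored = [(word, cosine(prototype, vec)) for word, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = rank_candidates(EAT_OBJECT_PROTOTYPE, VECTORS)
for word, score in ranking:
    print(f"{word}: {score:.3f}")
```

With these toy counts, the nouns for edible things score higher than the abstract noun, mirroring the eat example in the abstract. A model of the kind proposed would additionally have to compose word vectors into sentential meaning and combine the scores with the parser's syntactic predictions, which this sketch does not attempt.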

Planned Impact

Benefits for users of simplified texts

The proposed project will further our understanding of human language processing, through the development of a model that predicts word-by-word processing effort for text. The project will also develop a system for text simplification that utilizes this model of processing effort. The aim of this system is to detect passages in a text that are hard to process and automatically replace them with simplified text. The beneficiaries of high-quality text simplification are people with language impairments like aphasia, who often encounter problems in understanding written text, second language learners (by aiding the construction of texts that are of the desired linguistic complexity), and users that face other cognitive demands at the same time (e.g., while driving). The quality of life of these users would be enhanced by the proposed text simplification system. This impact could be realized as soon as a commercial version of the text simplification system is available.

Benefits for language-impaired individuals

An accurate model of human processing effort has the potential to improve the diagnosis of language impairments (e.g., aphasia), as it makes it possible to compare model simulations with the actual processing behavior (e.g., reading times) of impaired individuals. It is also conceivable that the model is useful for the diagnosis of learning disabilities (e.g., autism) that affect linguistic abilities. The primary beneficiaries of such an application would be the language-impaired individuals themselves, as well as health care professionals involved in the diagnosis and treatment of such individuals. Ultimately, this would result in a positive impact on the quality of life and health in the UK, as well as contributing to the effectiveness of public services. The time scale for this impact to be realized is 5-10 years.

Benefits for the language technology industry

The size of the language technology industry in the EU was 8.4 billion euros in 2008, with a projected growth rate of 10% annually [1]. The text simplification system to be developed by this project can be commercialized and will therefore benefit this industry, contributing to global economic performance, and specifically the economic competitiveness of the UK. Commercialization can begin as soon as the project is finished. Apart from text simplification, the model of human processing difficulty envisaged in this project will be useful for a range of other language technology applications. This includes tutoring systems that provide feedback to human writers, pinpointing passages that are difficult to read and need editing. Another possible application is essay scoring, which is already widely automated for proficiency tests such as TOEFL. In a machine translation system, a model of processing difficulty could be used to identify badly translated passages, thus improving translation system output (or automating system evaluation). For these benefits to the language technology industry to be realized, significant further development will be required, with a realistic time scale of 5-10 years.

[1] http://ec.europa.eu/dgs/translation/publications/studies/size_of_language_industry_en.pdf

Funded Value:

£329,561

Funded Period:

Sep 11 - Feb 15

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/I032916/1

Principal Investigator:

Frank Keller

Research Subject:

Info. & commun. Technol. (80%)

Linguistics (20%)

Research Topic:

Artificial Intelligence (30%)

Cognitive Science Appl. in ICT (10%)

Comput./Corpus Linguistics (20%)

Human Communication in ICT (40%)

Organisations

University of Edinburgh (Lead Research Organisation)

People	ORCID iD
Frank Keller (Principal Investigator)
Mirella Lapata (Co-Investigator)

Publications

Sangati F (2013) Incremental Tree Substitution Grammar for Parsing and Sentence Prediction. In Transactions of the Association for Computational Linguistics

Konstas I (2014) Incremental Semantic Role Labeling with Tree Adjoining Grammar. In Proceedings of the Conference on Empirical Methods in Natural Language Processing

Konstas I (2015) Semantic Role Labeling improves incremental parsing

Silberer C (2016) Grounded Models of Semantic Representation

Blacoe W (2012) A Comparison of Vector-based Representations for Semantic Composition. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning

Blacoe W (2013) A Quantum-Theoretic Approach to Distributional Semantics. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Key Findings

Description	We developed a computational model of human language processing that exhibits key cognitive features. In particular, the model builds structures incrementally, and integrates both syntactic and semantic information.
Exploitation Route	Other researchers can download our model and test it in experimental and modelling studies, or extend it.
Sectors	Digital/Communication/Information Technologies (including Software), Education
URL	https://github.com/sinantie/PLTAG

Impact Summary

Description	Google has shown an interest in using our incremental parsing technology to improve web search and web question answering.
First Year Of Impact	2015
Sector	Digital/Communication/Information Technologies (including Software)
Impact Types	Economic
