Robust Incremental Semantic Resources for Dialogue

Lead Research Organisation: Queen Mary University of London
Department Name: Sch of Electronic Eng & Computer Science

Abstract

When humans process language, they do so incrementally, understanding and producing sentences on a word-by-word basis. In conversation, we easily switch roles between speaker and hearer mid-sentence, taking turns speaking and listening to show attention, clarify information or add detail when needed, interactively contributing to a shared, emerging picture of what we mean. If we want human-computer dialogue systems to be natural, efficient and easy to use, they must behave as incrementally as humans do: understanding and reacting interactively on a word-by-word basis rather than insisting on fully-formed sentences. We would prefer a system which behaves as in (1) below to the more familiar but annoying (2), or even the more patient but less interactive (3):

(1)
Usr: I'd like er [pause] . . .
Sys: Yes?
Usr: a ticket to Paris from, hang on . . .
Sys: Paris, France?
Usr: right, from London please.
Sys: OK, checking for London to Paris.

(2)
Usr: I'd like er [pause] . . .
Sys: I'm sorry, I don't understand. Please state your destination.

(3)
Usr: I'd like er [pause] . . .
Usr: a ticket to Paris from, hang on . . .
Usr: from London please.
Sys: OK. Do you mean Paris, France?

Previous research has developed computational models of dialogue which can behave incrementally, allowing the kind of interaction shown in (1); but they currently rely on hand-written rules or statistical models to relate words to actions and concepts. These lack the ability to express the complex meanings that human language is so good at conveying, and are time-consuming to create for any new system, domain or task. Instead, we need incremental models which deal with semantics, updating some representation of meaning as each word is heard or spoken, and which can be automatically learned from data; but general methods for doing this are currently lacking. This project will bridge this gap, providing a linguistically-based, learnable framework for incremental semantic interpretation and generation, which can be used to improve and extend existing dialogue systems.
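As a rough illustration of what "updating some representation of meaning as each word is heard" can look like, the sketch below processes the user's words from example (1) one at a time and exposes the partial meaning after every word. It is a deliberately simple keyword-based toy, exactly the kind of hand-written mapping the paragraph above criticises, and not the project's approach; the lexicon, the slot names and the `IncrementalState` class are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical toy lexicon: these city names, cue words and fillers are
# assumptions for illustration, not resources produced by the project.
CITIES = {"paris", "london"}
SLOT_CUES = {"to": "destination", "from": "origin"}
FILLERS = {"er", "um", "uh"}


@dataclass
class IncrementalState:
    """Partial meaning built so far, plus the slot the next city would fill."""
    slots: dict = field(default_factory=dict)
    pending_slot: Optional[str] = None

    def consume(self, word: str) -> dict:
        """Update the state with one word and return a snapshot of the slots."""
        token = word.lower().strip(",.?!")
        if token in FILLERS:
            pass  # hesitation: leave the state unchanged rather than failing
        elif token in SLOT_CUES:
            self.pending_slot = SLOT_CUES[token]
        elif token in CITIES and self.pending_slot is not None:
            self.slots[self.pending_slot] = token.capitalize()
            self.pending_slot = None
        return dict(self.slots)


def interpret(words):
    """Yield (word, partial meaning) after every single word."""
    state = IncrementalState()
    for word in words:
        yield word, state.consume(word)


utterance = "I'd like er a ticket to Paris from, hang on, from London please".split()
trace = list(interpret(utterance))
for word, meaning in trace:
    print(f"{word:>8} -> {meaning}")
```

Because a partial meaning is available after every word, a system built this way could react mid-utterance (for example, asking "Paris, France?" as soon as the destination is filled) instead of waiting for a complete sentence.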

The project will start from recent work in theoretical linguistics and dialogue modelling which has produced the incremental semantic processing framework Dynamic Syntax (Kempson et al., 2001). This shows promise in modelling complex incremental dialogue, but is currently under-developed from a practical point of view, needing time-consuming expert hand-crafting, and missing a link between action planning and language generation. This project will address these issues. First, we will develop methods for automatically learning Dynamic Syntax grammars from data, allowing other researchers to easily produce and use their own versions in their own systems. Second, we will develop its methods for generating language so that it can be integrated with the way dialogue systems plan their actions on the fly. These new capabilities will be implemented computationally and evaluated on real data. Together, they will then be used to build a demonstration dialogue system which can behave incrementally, and will be packaged into a publicly available toolkit for researchers to develop their own incremental, semantic dialogue systems.

Planned Impact

Benefits for users of human-computer dialogue systems:

Language-based or speech-enabled human-computer interfaces are common, and are becoming more common as mobile phone use grows, technology improves and computer games become more sophisticated. Users of these interfaces now include general internet users (via language-based interfaces to remote databases and networked applications); mobile phone/device users (via speech-based interfaces to local and remote devices); computer games players (via dialogue-based interaction with virtual characters); and visually impaired individuals (via screen-free dialogue interaction). The quality of life of these users would be enhanced by the increased naturalness and interactivity of the incremental dialogue systems that will be enabled by this research. Realising this impact will involve transfer of the tools and toolkit resulting from the project to other academic and commercial research groups, with subsequent commercial development requiring timescales of 2-5 years (see below).

Benefits for the language technology industry:

Dialogue systems are used by many commercial applications (telephone call-handling, customer response, ticket-booking etc.), and the size of the language technology industry in the EU was 8.4 billion euros in 2008, with a projected growth rate of 10% annually [1]. The resources and toolkit to be developed by this project can lead to commercialisable dialogue system applications, and will therefore benefit this industry in future, contributing to global economic performance, and specifically the economic competitiveness of the UK. Commercialisation will require significant further development, but the compatibility of the resources with existing approaches to dialogue system implementation means timescales could be short (e.g. 2-5 years, as witnessed by similar developments such as the CHAT or LetsGo! systems).

Benefits for school and university students:

The tools and toolkit produced will provide resources for use in university coursework within QMUL and beyond (resources will be released publicly under an open-source license), and for use in school teaching on relevant technology and language-related courses. We will facilitate realisation of this impact by publishing a schools-targeted article in the cs4fn magazine and website, by presenting at events such as the Big Bang and National Science and Engineering Week, and by developing a web-based demonstration of the technology.

Benefits for academic researchers:

Many dialogue researchers are currently engaged in research into incrementality in dialogue modelling and dialogue system implementation, but this is a young and emerging sub-field. The project results, and resulting toolkit and resources, will directly provide resources for continued development of such systems, providing general incremental semantic processing capabilities for interpretation and generation and easy deployment via learning from data and integration with general agent planning methods. Through continued research into incremental systems, we expect commercialisation options to begin over the next few years, as was the case with non-incremental dialogue system research in the preceding decades.

[1] http://www.langtech.co.uk/us/knowledge-center/doc_download/10-study-on-the-size-of-the-language-industry-in-the-eu.html

Publications

Eshghi A (2013) Probabilistic Induction for an Incremental Semantic Grammar in 10th International Conference on Computational Semantics (IWCS)

Eshghi A (2013) Incremental Grammar Induction from Child-Directed Dialogue Utterances in 4th Annual Workshop on Cognitive Modeling and Computational Linguistics (CMCL)

Eshghi A; Hough J; Purver M; Kempson R; Gregoromichelaki E (2012) From Quantification to Conversation

Eshghi A (2012) Inducing lexical entries for an incremental semantic grammar in 7th International Workshop on Constraint Solving and Language Processing (CSLP)

Ginzburg J; Purver M (2012) From Quantification to Conversation

Healey PG (2014) Divergence in dialogue in PLoS ONE

Healey PG (2016) Better late than Now-or-Never: The case of interactive repair phenomena in Behavioral and Brain Sciences

Hough J (2014) Probabilistic Type Theory for Incremental Dialogue Processing in EACL 2014 Workshop on Type Theory and Natural Language Semantics (TTNLS)

Hough J (2012) Processing Self-Repairs in an Incremental Type-Theoretic Dialogue System in 16th SemDial Workshop on the Semantics and Pragmatics of Dialogue (SeineDial)

 
Description The RISER project developed a framework to allow computers to understand and produce natural language as used in dialogue in a way that is more natural and human-like than before: it works incrementally, in a word-by-word fashion, and mirrors the way humans can repair or complete each other's utterances. By combining and extending the existing frameworks of Dynamic Syntax (DS) and Type Theory with Records (TTR), this new model can produce meaning representations fully incrementally, and generate language from them in the same way. The adaptation and integration of DS and TTR have allowed the framework to work with structured representations of meaning, compatible with current state-of-the-art information-state-based dialogue systems. The use of TTR type relations now allows the generation of natural language sentences from purely semantic meaning representations (rather than linguistically informed trees as in earlier DS frameworks), permitting its use with standard dialogue system managers. The addition of a graph-based representation of the language parsing process and context has also provided a model of the disfluencies and self-repairs common in natural conversation. The project also developed a method for learning grammars (specific instantiations within the general framework) from data, allowing researchers to produce incremental semantic grammars for any domain without needing expert linguistic insight. This has been tested by learning a grammar from real adult-child conversation data, simulating the language acquisition process of a child; the grammar learned was then able to interpret unseen data with semantic accuracy of 80%.
Exploitation Route Dialogue systems (both text- and speech-based) are becoming widely used in industry (telephone answering and routing systems, computer games, web interfaces, mobile phone assistants such as Siri). Improvements in the human-like quality of conversation can directly benefit these applications. The DYLAN framework and software produced on the RISER project have since been used at Heriot-Watt University within the EU FP7 SpaceBook project and EPSRC BABBLE project (EP/M01553X/1) to help develop practical personal spoken dialogue systems.
Sectors Creative Economy; Digital/Communication/Information Technologies (including Software); Education; Healthcare; Leisure Activities, including Sports, Recreation and Tourism; Culture, Heritage, Museums and Collections

URL http://cogsci.eecs.qmul.ac.uk/oldpages/RISER/
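The description above mentions TTR type relations between structured meaning representations. As a minimal sketch of one such relation, the code below checks record-type subtyping in a heavily simplified setting: a record type is a plain dictionary from field labels to basic type names or nested record types, and one record type counts as a subtype of another if it has every field of the other with a subtype in each. Dependent and manifest fields are left out, and the basic type hierarchy, the field labels and the function names are assumptions for illustration rather than the DYLAN implementation.

```python
# Hypothetical hierarchy of basic types: child -> parent (assumption).
BASIC_PARENT = {"city": "location", "location": "entity", "person": "entity"}


def is_basic_subtype(sub, sup):
    """True if basic type `sub` equals `sup` or lies below it in the hierarchy."""
    while sub is not None:
        if sub == sup:
            return True
        sub = BASIC_PARENT.get(sub)
    return False


def is_subtype(sub, sup):
    """Simplified record-type subtyping.

    `sub` is a subtype of `sup` if every field in `sup` also appears in `sub`
    with a type that is itself a subtype. Extra fields in `sub` are allowed.
    """
    if isinstance(sup, dict):
        if not isinstance(sub, dict):
            return False
        return all(
            label in sub and is_subtype(sub[label], sup_type)
            for label, sup_type in sup.items()
        )
    if isinstance(sub, dict):
        return False
    return is_basic_subtype(sub, sup)


# A complete goal meaning and a partial meaning built part-way through an utterance.
goal = {"event": "booking", "destination": "city", "origin": "city"}
partial = {"event": "booking", "destination": "location"}

print(is_subtype(goal, partial))  # the goal is at least as specific as the partial meaning
print(is_subtype(partial, goal))  # the partial meaning still lacks the origin field
```

A check of this kind is one plausible way to decide, word by word, whether the meaning produced so far is still compatible with a purely semantic goal; it is offered as an illustration of the idea, not as the project's algorithm.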
 
Description The main outputs of the project were new methods for incremental language processing, and induction from weakly labelled data. These methods were used to produce a public research software toolkit, DYLAN, available online and providing modules for dialogue researchers to integrate within other dialogue systems; leading to new research projects in the UK, Germany and Sweden, including BABBLE (EPSRC EP/M01553X/1) investigating new ways to learn dialogue systems. They also produced a prototype for social media analysis used to form a new spinout, Chatterbox Labs Ltd, now trading as a private limited company helping companies understand their markets. They also led to follow-on funding for joint research with two independent companies. In one case, IESO Digital Health Ltd, this investigated how to use language processing to help automatically monitor progress and improve outcomes in mental health therapy - this led to methods now being used by them in industry, and applications for patents to protect these worldwide in 2014 (extended to a US patent application in 2017). In the other case, Quality Health Ltd, this investigated how to use language processing to discover patients' opinions and experiences of healthcare expressed in social media or surveys - this is being applied through an ongoing project funded by the Care Quality Commission.
First Year Of Impact 2012
Sector Creative Economy; Digital/Communication/Information Technologies (including Software); Healthcare
Impact Types Societal; Economic; Policy & public services

 
Description CreativeWorks London (CMSI)
Amount £15,000 (GBP)
Organisation Creativeworks London 
Sector Charity/Non Profit
Country United Kingdom
Start 02/2013 
End 11/2013
 
Description EPSRC Pump-Priming (PPAT)
Amount £24,000 (GBP)
Funding ID EP/J501360/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 01/2012 
End 06/2012
 
Description EPSRC standard responsive mode
Amount £278,283 (GBP)
Funding ID EP/M01553X/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 04/2015 
End 09/2017
 
Description EU FP7 (ConCreTe)
Amount € 500,000 (EUR)
Funding ID 611733 
Organisation European Commission 
Department Seventh Framework Programme (FP7)
Sector Public
Country European Union (EU)
Start 10/2013 
End 09/2016
 
Description Innovation Fund (AOTD)
Amount £10,000 (GBP)
Organisation Queen Mary University of London 
Department Queen Mary Innovation
Sector Private
Country United Kingdom
Start 10/2013 
End 03/2014
 
Description Innovation Fund (RC)
Amount £8,000 (GBP)
Organisation Queen Mary University of London 
Department Queen Mary Innovation
Sector Private
Country United Kingdom
Start 11/2013 
End 03/2015
 
Description Innovation Fund (SLADE)
Amount £26,000 (GBP)
Organisation Queen Mary University of London 
Department Queen Mary Innovation
Sector Private
Country United Kingdom
Start 04/2015 
End 10/2015
 
Description Innovation Voucher (TEMPO)
Amount £5,000 (GBP)
Organisation Innovate UK 
Sector Public
Country United Kingdom
Start 02/2016 
End 06/2016
 
Description Proof of Concept Fund (Chatterbox)
Amount £46,000 (GBP)
Organisation Queen Mary University of London 
Department Queen Mary Innovation
Sector Private
Country United Kingdom
Start 10/2012 
End 04/2013
 
Description SMART Development of Prototype (Chatterbox)
Amount £250,000 (GBP)
Funding ID 720149 
Organisation Innovate UK 
Sector Public
Country United Kingdom
Start 01/2013 
End 12/2013
 
Description SMART Proof of Market (Chatterbox)
Amount £25,000 (GBP)
Funding ID 700081 
Organisation Innovate UK 
Sector Public
Country United Kingdom
Start 01/2012 
End 03/2012
 
Description Text Severity (Invitation to tender)
Amount £25,000 (GBP)
Organisation Care Quality Commission (CQC) 
Sector Public
Country United Kingdom
Start 03/2018 
End 05/2018
 
Title Annotations for conversational repair 
Description We provide a set of annotations for self- and other-repair phenomena relating to two sub-parts of the British National Corpus. 
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact Computational repair detection experiments, with method and results detailed in Matthew Purver, Julian Hough and Christine Howes (2018). Computational Models of Miscommunication Phenomena. Topics in Cognitive Science, 2018. 
URL https://osf.io/w4dmz/
 
Description Chatterbox Labs Ltd 
Organisation Chatterbox Analytics Ltd
Country United Kingdom 
Sector Private 
PI Contribution Research and development of methods for social media analysis
Collaborator Contribution Research and development of methods for social media analysis
Impact Spin-out creation, collaborative projects via CreativeWorks London, TSB
Start Year 2012
 
Description IESO Digital health 
Organisation IESO Digital Health Ltd
Country United Kingdom 
Sector Private 
PI Contribution Research & development of methods of analysing language in clinical therapy and predicting treatment outcomes.
Collaborator Contribution Contribution of datasets and consultancy time
Impact Patent application, IP and licensing agreements - see relevant sections.
Start Year 2013
 
Description Quality Health Ltd 
Organisation Quality Health Limited
Country United Kingdom 
Sector Private 
PI Contribution Research and development of methods for health service evaluation via natural language processing of social media and survey data.
Collaborator Contribution Subcontracted research from InnovateUK voucher; joint research via Care Quality Commission contract; consultancy time.
Impact Research under InnovateUK voucher (TEMPO project). Research under Care Quality Commission tender (SEVERE project.)
Start Year 2016
 
Title ANALYSING TEXT-BASED MESSAGES SENT BETWEEN PATIENTS AND THERAPISTS 
Description A computer-implemented method comprising: obtaining text from text-based messages sent between a patient and a therapist providing psychological therapy; determining at least one feature of the text; and determining a characteristic of the patient and/or the therapist using the at least one feature. 
IP Reference WO2016071659 
Protection Patent application published
Year Protection Granted 2016
Licensed Yes
Impact Industrial collaboration with IESO Digital Health Ltd (previously PsychologyOnline Ltd); this IP has been licensed to them (Dec 2016) for use in software supporting their online cognitive behavioural therapy service (available via the NHS), by automatically identifying condition severity and predicting outcomes. Extended in 2017 to US patent application WO2016071659A1
 
Title Social media sentiment and emotion detection software 
Description Method and software for emotion detection in social media text using distant supervision and supervised classification - licensed to new spinout company Chatterbox Analytics Ltd. 
IP Reference  
Protection Copyrighted (e.g. software)
Year Protection Granted 2011
Licensed Yes
Impact Software license taken up by new spinout company Chatterbox Analytics Ltd.
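The record above names distant supervision as the labelling strategy. A minimal sketch of the general idea follows: instead of hand-annotating messages, a surface marker inside each message is treated as a noisy label, the marker is stripped from the text, and the resulting pairs can then train an ordinary supervised classifier. The marker set, the label names and the `distant_label` function are assumptions for illustration; the record does not say which markers the licensed software uses.

```python
# Hypothetical marker-to-label table (assumption, not taken from the software).
MARKERS = {":)": "happy", ":-)": "happy", ":(": "sad", ":-(": "sad"}


def distant_label(message):
    """Return (text without markers, label), or None if the label is unclear.

    A message is kept only when its markers agree on exactly one label, so
    unmarked and contradictory messages are dropped from the training set.
    """
    found = {label for marker, label in MARKERS.items() if marker in message}
    if len(found) != 1:
        return None
    label = found.pop()
    text = message
    for marker in MARKERS:
        # Remove the marker so a classifier cannot simply memorise it.
        text = text.replace(marker, " ")
    return " ".join(text.split()), label


messages = [
    "great gig tonight :)",
    "train cancelled again :-(",
    "no markers here",
    "mixed feelings :) :(",
]
training_pairs = [pair for pair in map(distant_label, messages) if pair is not None]
print(training_pairs)
```

The pairs in `training_pairs` would be the input to the supervised classification step; any standard text classifier could be trained on them.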
 
Title Conversational repair detection 
Description Computational repair detection software, with method and experimental results detailed in Matthew Purver, Julian Hough and Christine Howes (2018). Computational Models of Miscommunication Phenomena. Topics in Cognitive Science, 2018. 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact Used in successful computational experiments, with method and results detailed in Matthew Purver, Julian Hough and Christine Howes (2018). Computational Models of Miscommunication Phenomena. Topics in Cognitive Science, 2018. 
URL http://onlinelibrary.wiley.com/doi/10.1111/tops.12324/full
 
Title DYLAN 
Description An open-source software platform for incremental dialogue modelling, including natural language understanding and generation, and data-driven grammar induction. 
Type Of Technology Software 
Year Produced 2012 
Open Source License? Yes  
Impact Use by other international teams in research 
URL http://dylan.sf.net
 
Title DiaSim 
Description DiaSim ("Dialogue Similarity") is an open-source Java project for calculating lexical, syntactic and semantic similarity in dialogue corpora, including within- and between-speaker similarity and comparison to various randomly re-ordered baselines - see Patrick Healey, Matthew Purver and Christine Howes (2014). Divergence in Dialogue. PLoS ONE 9(6): e98598, 2014. Available for download from SourceForge. 
Type Of Technology Software 
Year Produced 2014 
Open Source License? Yes  
Impact Used for computational experiments showing new results counter to existing theories of dialogue alignment - see Patrick Healey, Matthew Purver and Christine Howes (2014). Divergence in Dialogue. PLoS ONE 9(6): e98598, 2014. 
URL http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0098598
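To make the comparison against randomly re-ordered baselines concrete, the sketch below measures word overlap between each turn and the preceding turn by a different speaker, then compares that score with the average obtained after shuffling which text goes with which position. DiaSim itself is a Java project and its actual measures are not specified in this record; the Jaccard overlap, the function names and the sample dialogue here are assumptions chosen only to illustrate the experimental design.

```python
import random


def jaccard(a, b):
    """Word-set overlap between two turns, from 0.0 (disjoint) to 1.0 (identical)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def mean_adjacent_similarity(turns):
    """Mean overlap between each turn and the previous turn by another speaker."""
    scores = [
        jaccard(prev_text, text)
        for (prev_speaker, prev_text), (speaker, text) in zip(turns, turns[1:])
        if prev_speaker != speaker
    ]
    return sum(scores) / len(scores) if scores else 0.0


def shuffled_baseline(turns, runs=100, seed=0):
    """Average similarity when turn texts are randomly re-ordered.

    Speaker order is kept fixed and only the texts are shuffled, so the
    baseline reflects chance-level overlap for the same vocabulary.
    """
    rng = random.Random(seed)
    speakers = [speaker for speaker, _ in turns]
    texts = [text for _, text in turns]
    total = 0.0
    for _ in range(runs):
        shuffled = texts[:]
        rng.shuffle(shuffled)
        total += mean_adjacent_similarity(list(zip(speakers, shuffled)))
    return total / runs


dialogue = [
    ("A", "a ticket to paris"),
    ("B", "paris france"),
    ("A", "right from london"),
]
observed = mean_adjacent_similarity(dialogue)
baseline = shuffled_baseline(dialogue)
print(f"observed={observed:.3f} baseline={baseline:.3f}")
```

If the observed score is no higher than the shuffled baseline, the speakers are not repeating each other's words more than chance would predict, which is the kind of comparison the record describes.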
 
Title London Twitter Emotion Map 
Description An online live visualisation of emotional content across London expressed on Twitter. 
Type Of Technology Webtool/Application 
Year Produced 2010 
Impact Exhibited publicly at public QMUL research events with at least 100 visitors. 
URL http://www.eecs.qmul.ac.uk/~mpurver/emomap/
 
Title Reel Reviews 
Description Android mobile application for automatic mining of movie reviews from social media text 
Type Of Technology Webtool/Application 
Year Produced 2014 
Impact Application downloaded by over 500 people; average review score 4.8/5 
URL http://www.qappsonline.com/apps/reelreviews/
 
Title Sentimental 
Description iPhone mobile application to demonstrate sentiment detection techniques: automatically mines positive and negative sentiment topics being discussed in your geographical area by mining social media. 
Type Of Technology Webtool/Application 
Year Produced 2011 
Impact Over 650 downloads; intermediate step in establishing spinout company Chatterbox Labs Ltd; since withdrawn as company's commercial offering established. 
 
Title Sentimental APIs 
Description A set of online APIs for sentiment and emotion detection in online social media text; intermediate step in establishing spinout company Chatterbox Labs Ltd; since withdrawn as company's commercial offering established. 
Type Of Technology Webtool/Application 
Year Produced 2012 
Impact Used by hundreds of users; intermediate step in establishing spinout company Chatterbox Labs Ltd; since withdrawn as company's commercial offering established. 
URL http://mashape.com/sentimental
 
Company Name Chatterbox Labs Ltd 
Description Chatterbox Labs Ltd (established 2011 as Chatterbox Analytics Ltd) produces software to analyse and categorise sentiment, topic and intention in social media messages. 
Year Established 2011 
Impact Supplies services to various organisations, including a national broadcaster, a national energy supplier and a public relations agency.
Website http://chatterbox.co/
 
Description Extending and Learning an Incremental Semantic Grammar 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other academic audiences (collaborators, peers etc.)
Results and Impact Extending and Learning an Incremental Semantic Grammar.

Invited talk, Computational Linguistics group, Department of Linguistics, University of Potsdam.

Increased collaboration between international groups
Year(s) Of Engagement Activity 2012
URL http://www.eecs.qmul.ac.uk/~mpurver/papers/purver-et-al12potsdam-talk.pdf
 
Description FACULTI online video 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact An online video describing the project research, for public information, was recorded & released via FACULTI Media
Year(s) Of Engagement Activity 2013
URL http://facultimedia.com/data-driven-learning-in-an-incremental-grammar-framework/
 
Description Invited lecture at clinical workshop (ZiF, Bielefeld University, Language Processing for Diagnosis and Prediction in Mental Health) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Invited lecture to begin an interdisciplinary workshop on Language Processing and Clinical Differential Diagnosis at the Centre for Interdisciplinary Research, University of Bielefeld. Attended by many clinical researchers, clinical practitioners and health professionals, who then discussed possible future research and proposals applying multiple disciplines to mental health care improvement.
Year(s) Of Engagement Activity 2015
URL http://www.uni-bielefeld.de/(en)/ZiF/AG/2015/11-26-Frank.html
 
Description Probabilistic Induction for an Incremental Semantic Grammar 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other academic audiences (collaborators, peers etc.)
Results and Impact Invited talk, Faculty of Linguistics and Literature, University of Bielefeld.

Increased collaboration between international groups
Year(s) Of Engagement Activity 2012
URL http://www.eecs.qmul.ac.uk/~mpurver/papers/purver-et-al12bielefeld-talk.pdf
 
Description Talk and online video (University of Ljubljana, Analysing Dialogue for Diagnosis and Prediction in Mental Health) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Invited talk "Analysing Dialogue for Diagnosis and Prediction in Mental Health" attended by 50 academics and research students, which led to new collaborative research and ongoing H2020 funding proposal. Talk put online, now with 97 views.
Year(s) Of Engagement Activity 2017
URL http://videolectures.net/jota_purver_mental_health/
 
Description Using Conversations to Understand Influence and Interaction 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact About 50 people attended our talk at an industry one-day conference "Chinwag Insight: Psychology of Online Influence" in London, May 2012.

Interest in the research and spinout company, continuing industry contacts
Year(s) Of Engagement Activity 2012
URL http://chinwag.com/insight/psychology