Scaling up Statistical Spoken Dialogue Systems for real user goals using automatic belief state compression

Lead Research Organisation: Heriot-Watt University
Department Name: School of Mathematical and Computer Sciences

Abstract

Spoken dialogue systems (SDS) are increasingly being deployed in a variety of commercial applications, ranging from traditional call centre automation (e.g. travel information) to new "troubleshooting" or customer self-service lines (e.g. help fixing broken internet connections). SDS are notoriously fragile (especially to speech recognition errors), do not offer natural ease of use, and do not adapt to different users. One of the main problems for SDS is to maintain an accurate view of the user's goals in the conversation (e.g. find a good Indian restaurant nearby, or repair a broadband connection) under uncertainty, and thereby to compute the optimal next system dialogue action (e.g. offer a restaurant, ask for clarification). Recent research in statistical spoken dialogue systems (SSDS) has successfully addressed aspects of these problems but, we shall show, it is currently hamstrung by an impoverished representation of user goals, which has been adopted to enable tractable learning with standard techniques.

In the field as a whole, currently only small and unrealistic dialogue problems (usually fewer than 100 searchable entities) are tackled with statistical learning methods, for reasons of computational tractability. In addition, current user goal state approximations in SSDS make it impossible to represent some plausible user goals, e.g. someone who wants to know about nearby cheap restaurants and high-quality ones further away. This renders dialogue management sub-optimal and makes it impossible to deal adequately with the following types of user utterance: "I'm looking for French or Italian food" and "Not Italian, unless it's expensive". User utterances with negations and disjunctions of various sorts are very natural, and exploit the full power of natural language input, but current SSDS are unable to process them adequately. Moreover, much work in dialogue system evaluation shows that real user goals are generally sets of items with different features, rather than a single item: people like to explore possible trade-offs between features of items.

Our main proposal is therefore: a) to develop realistic large-scale SSDS with an accurate, extended representation of user goals, and b) to use new Automatic Belief Compression (ABC) techniques to plan over the large state spaces thus generated. Techniques such as Value-Directed Compression demonstrate that compressible structure can be found automatically in the SSDS domain (for example, compressing a test problem of 433 states to 31 basis functions). These techniques have their roots in methods for handling the large state spaces required for robust robot navigation in real environments, and may lead to breakthroughs in the development of robust, efficient, and natural human-computer dialogue systems, with the potential to radically improve the state of the art in dialogue management.
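To make the compression idea concrete, the following is a minimal sketch in standard POMDP notation, following the published Value-Directed Compression formulation (Poupart and Boutilier, 2002); the symbols used here (b, F, T, r) are illustrative and are not taken from the proposal itself.

% Belief tracking: the dialogue manager maintains a distribution b over
% hidden user-goal states s, updated after each system action a and
% observation o (the recognised user utterance):
\[
  b'(s') \;=\; \frac{O(o \mid s', a)\,\sum_{s} T(s' \mid s, a)\, b(s)}{\Pr(o \mid a, b)}
\]
% Value-Directed Compression seeks a linear map F with k rows and n
% columns, k << n, giving a compressed belief \tilde{b} = F b. Writing
% the unnormalised update as b' = T^{a,o} b, the compression is lossless
% for planning whenever rewards and dynamics factor through F:
\[
  r_a = F^{\top} \tilde{r}_a
  \qquad\text{and}\qquad
  F\, T^{a,o} = \tilde{T}^{a,o} F \quad \text{for all } a, o,
\]
% in which case V(b) = \tilde{V}(F b), so planning can run entirely in
% the k-dimensional compressed space (e.g. 31 basis functions rather
% than 433 states in the test problem mentioned above).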

Publications

 
Description This project has helped to develop more robust, efficient, and natural human-computer speech interfaces. Such interfaces are increasingly used in everyday life -- for example in the Apple iPhone speech interface "Siri" and Google's "Now" and Voice Search applications. In this project we experimented with new computational models and statistical machine learning methods for tackling two main problems for such interfaces: 1) allowing users of speech systems to express more complex and natural goals, and 2) scaling these systems up to handle larger spoken dialogue problems. To do this, we invented new representations of complex user goals (for example "I want French food, or else Italian if there's one close to me"), and we investigated techniques for "Automatic Belief Compression" that allow such large-scale, high-dimensional computational problems to be reduced to a lower, more tractable dimension, as sketched below.
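As a rough illustration of the general idea (and not the project's exact algorithms, which are described in the project publications), a linear belief compression can be sketched as a truncated principal-component decomposition over sampled belief vectors. The function names, the use of plain PCA, and the toy data below are assumptions for exposition only.

import numpy as np

def fit_belief_compression(beliefs, k):
    # beliefs: (m, n) array; each row is a probability distribution over
    # n dialogue states (e.g. logged from a running system).
    mean = beliefs.mean(axis=0)
    # SVD of the centred sample matrix yields the principal subspace.
    _, _, vt = np.linalg.svd(beliefs - mean, full_matrices=False)
    return mean, vt[:k]                      # basis has shape (k, n)

def compress(b, mean, basis):
    # Project an n-dimensional belief onto k compressed coordinates.
    return basis @ (b - mean)

def decompress(b_tilde, mean, basis):
    # Reconstruct an approximate belief and renormalise it.
    b = mean + basis.T @ b_tilde
    b = np.clip(b, 0.0, None)                # remove tiny negative values
    return b / b.sum()

# Toy usage: 500 sampled beliefs over 433 states, compressed to 31
# dimensions (mirroring the 433-to-31 example in the abstract).
rng = np.random.default_rng(0)
samples = rng.dirichlet(np.full(433, 0.1), size=500)
mean, basis = fit_belief_compression(samples, k=31)
b = samples[0]
b_hat = decompress(compress(b, mean, basis), mean, basis)
print("L1 reconstruction error:", np.abs(b - b_hat).sum())

One design note: plain PCA minimises squared reconstruction error, which is not ideal for probability vectors; the robot-navigation work mentioned in the abstract uses an exponential-family variant (E-PCA) for exactly this reason, and this project compared the effectiveness of several such compression methods.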



In practical terms, we developed and deployed real telephone-based speech interfaces that implemented these ideas, and we tested them both in simulation and with members of the public, using crowdsourcing methods. We collected and analysed data from 2,193 calls by 85 users.


Our key findings have been that methods for automatically compressing such problems can produce speech systems that are almost as effective as those for which expert human designers have hand-crafted a suitable lower-dimensional problem space. We also developed new knowledge about the effectiveness of a variety of automatic compression methods. In addition, we developed a new method for automatic belief compression which overcomes several problems with previous approaches.



We published a number of conference papers reporting this work, and contributed to 2 books on new statistical learning methods for the development of speech interfaces. We have recently written 3 journal papers reporting our findings.
Exploitation Route The outputs of this research can be used in industrial and commercial development of novel speech and natural language interfaces, such as future variants and extensions of Apple's iPhone speech interface Siri and Google's Now and Voice Search applications. Similar future application domains include interaction with virtual characters in areas such as education, healthcare, games, automated customer service, and human-robot interaction. Other applications are in hands-busy and eyes-busy operating situations, such as while driving and in medical contexts, where speech interfaces to information services are useful. In addition, advanced and natural speech interfaces are useful for blind, disabled, and ageing users who cannot easily use traditional interaction devices such as keyboards and screens. Finally, speech interfaces can be used to open up information services for illiterate users, for example in some developing countries. More generally, this research can be used in new interfaces and technologies for human-computer interaction -- in particular in future speech interfaces and multimodal systems (interfaces which combine human communication channels such as speech, gesture, facial expression, body pose, gaze, graphics, natural language, and touch). The research allows more natural expression of user search goals using natural language, and develops computational methods for decision-making in such systems. The exploitation routes are therefore primarily in speech and multimodal interfaces, for example those used with mobile phones, in cars, in human-robot interaction, or by disabled users.
Sectors Digital/Communication/Information Technologies (including Software)

URL https://sites.google.com/site/abcpomdp/
 
Description The outputs of this research are useful in industrial and commercial development of novel speech and natural language interfaces, such as future variants and extensions of Apple's iPhone speech interface Siri, Microsoft's Cortana, and Google's Now and Voice Search applications. Similar future application domains include interaction with virtual characters in areas such as education, healthcare, games, automated customer service, and human-robot interaction. Other applications are in hands-busy and eyes-busy operating situations, such as while driving and in medical contexts, where speech interfaces to information services are useful. In addition, advanced and natural speech interfaces are useful for blind, disabled, and ageing users who cannot easily use traditional interaction devices such as keyboards and screens. Finally, such advanced speech interfaces can be used to open up information services for illiterate users, for example in some developing countries.
First Year Of Impact 2012
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Societal, Economic

 
Description Amazon Alexa Challenge 2017
Amount $100,000 (USD)
Organisation Amazon.com 
Sector Private
Country United States
Start 11/2016 
End 11/2017
 
Description EC FP7 ICT grant: SpaceBook
Amount £645,000 (GBP)
Funding ID 270019 
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 03/2011 
End 02/2014
 
Description EC FP7 ICT project: JAMES: Joint Action for Multimodal Embodied Social Systems
Amount € 3,209,918 (EUR)
Funding ID 270435 
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 02/2011 
End 09/2014
 
Description ERC Advanced Research Grant (STAC)
Amount € 1,930,000 (EUR)
Funding ID 269427 
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 06/2011 
End 05/2017
 
Description Horizon 2020 ICT : MuMMER project - Multimodal Mall Entertainment Robot
Amount € 900,000 (EUR)
Funding ID 688147 
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 03/2016 
End 02/2020
 
Title ABC dialogue management algorithms 
Description A set of new statistical algorithms for spoken dialogue management -- see Crook et al. 2014 
Type Of Material Computer model/algorithm 
Year Produced 2014 
Provided To Others? Yes  
Impact Some of the algorithms developed are used for dialogue management in current/recent projects such as the EC FP7 projects PARLANCE, SpaceBook, and JAMES 
URL https://sites.google.com/site/abcpomdp/home
 
Title Spoken dialogue data - ABC 
Description A collection of real user spoken dialogues with our automated dialogue systems, as described in Crook et al. 2014 
Type Of Material Database/Collection of data 
Year Produced 2014 
Provided To Others? Yes  
Impact Use of data in EC FP7 projects such as SpaceBook and PARLANCE 
URL https://sites.google.com/site/abcpomdp/home
 
Title End-to-end statistical spoken dialogue systems software and architecture 
Description Automated spoken dialogue system using a fully statistical end-to-end architecture (see publications). 
Type Of Technology Webtool/Application 
Year Produced 2009 
Impact Used in subsequent projects, e.g. an EPSRC follow-on project and EC FP7 projects such as SpaceBook and PARLANCE