Scaling up Statistical Spoken Dialogue Systems for real user goals using automatic belief state compression
Lead Research Organisation:
Heriot-Watt University
Department Name: School of Mathematical and Computer Sciences
Abstract
Spoken dialogue systems (SDS) are increasingly being deployed in a variety of commercial applications ranging from traditional Call Centre automation (e.g. travel information) to new "troubleshooting" or customer self-service lines (e.g. help fixing broken internet connections). SDS are notoriously fragile (especially to speech recognition errors), do not offer natural ease of use, and do not adapt to different users. One of the main problems for SDS is to maintain an accurate view of the user's goals in the conversation (e.g. find a good Indian restaurant nearby, or repair a broadband connection) under uncertainty, and thereby to compute the optimal next system dialogue action (e.g. offer a restaurant, ask for clarification). Recent research in statistical spoken dialogue systems (SSDS) has successfully addressed aspects of these problems but, we shall show, it is currently hamstrung by an impoverished representation of user goals, which has been adopted to enable tractable learning with standard techniques.
In the field as a whole, currently only small and unrealistic dialogue problems (usually less than 100 searchable entities) are tackled with statistical learning methods, for reasons of computational tractability. In addition, current user goal state approximations in SSDS make it impossible to represent some plausible user goals, e.g. someone who wants to know about nearby cheap restaurants and high-quality ones further away. This renders dialogue management sub-optimal and makes it impossible to deal adequately with the following types of user utterance: "I'm looking for French or Italian food" and "Not Italian, unless it's expensive". User utterances with negations and disjunctions of various sorts are very natural, and exploit the full power of natural language input, but current SSDS are unable to process them adequately. Moreover, much work in dialogue system evaluation shows that real user goals are generally sets of items with different features, rather than a single item. People like to explore possible trade-offs between features of items.
Our main proposal is therefore to: a) develop realistic large-scale SSDS with an accurate, extended representation of user goals, and b) use new Automatic Belief Compression (ABC) techniques to plan over the large state spaces thus generated. Techniques such as Value-Directed Compression demonstrate that compressible structure can be found automatically in the SSDS domain (for example compressing a test problem of 433 states to 31 basis functions). These techniques have their roots in methods for handling the large state spaces required for robust robot navigation in real environments, and may lead to breakthroughs in the development of robust, efficient, and natural human-computer dialogue systems, with the potential to radically improve the state-of-the-art in dialogue management.
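As a rough illustration of what "maintaining an accurate view of the user's goals under uncertainty" involves, the sketch below performs one Bayesian belief update over a handful of candidate user goals. It is not the project's implementation: the goal names, the probabilities, and the `belief_update` helper are all hypothetical, and the dependence on the system action is omitted for brevity.

```python
def belief_update(belief, transition, obs_likelihood):
    """One Bayesian belief update: b'(s') is proportional to P(o | s') * sum_s P(s' | s) * b(s).

    belief:         dict mapping each state to its current probability
    transition:     dict of dicts, transition[s][s2] = P(s2 | s)
    obs_likelihood: dict mapping each state to P(observation | state)
    """
    states = list(belief)
    # Prediction step: push the current belief through the goal-transition model.
    predicted = {
        s2: sum(transition[s1][s2] * belief[s1] for s1 in states) for s2 in states
    }
    # Correction step: weight each state by how well it explains the observation.
    unnormalised = {s: obs_likelihood[s] * predicted[s] for s in states}
    total = sum(unnormalised.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in unnormalised.items()}


# Hypothetical user goals, including a disjunctive one (all values are assumptions).
goals = ["french", "italian", "french_or_italian"]
prior = {g: 1.0 / len(goals) for g in goals}
# Assume the user's goal does not change between turns (identity transition).
stay = {g: {g2: 1.0 if g == g2 else 0.0 for g2 in goals} for g in goals}
# Assumed likelihood of the recogniser hearing "italian" under each goal.
heard_italian = {"french": 0.1, "italian": 0.6, "french_or_italian": 0.3}

posterior = belief_update(prior, stay, heard_italian)
```

The cost of this update grows with the number of states, which is why richer goal representations (sets, disjunctions, negations) quickly become expensive and motivate compressing the belief space before planning.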
Publications
Zhuoran Wang (2015). On the Linear Belief Compression of POMDPs: A re-examination of current methods.
Rieser V (2011). Reinforcement Learning for Adaptive Dialogue Systems.
Rieser V (2010). Optimising information presentation for spoken dialogue systems. In Proceedings of the Annual Meeting of the Association for Computational Linguistics.
Rieser V (2010). Empirical Methods in Natural Language Generation.
Rieser V (2014). Natural Language Generation as Incremental Planning Under Uncertainty: Adaptive Information Presentation for Statistical Dialogue Systems. In IEEE/ACM Transactions on Audio, Speech, and Language Processing.
Rieser V (2011). Learning and Evaluation of Dialogue Strategies for New Applications: Empirical Methods for Optimization from Small Data Sets. In Computational Linguistics.
Paul Crook (2010). Representing Uncertainty about Complex User Goals in Statistical Dialogue Systems. In SIGDIAL.
Paul Crook (2012). A Statistical Spoken Dialogue System using Complex User Goals and Lossless Value Directed Compression. In EACL.
Paul Crook (2011). Parallel Computing and Practical Constraints when applying the Standard POMDP Belief Update Formalism to Spoken Dialogue Management. In International Workshop Series on Spoken Dialogue Systems Technology (IWSDS).
Description | This project has helped to develop more robust, efficient, and natural human-computer speech interfaces. Such interfaces are becoming used more frequently in everyday life -- for example in the Apple iPhone speech interface "Siri" and Google's "Now" and Voice Search applications. In this project we experimented with new computational models and statistical machine learning methods for tackling two main problems for such interfaces: 1) allowing users of speech systems to express more complex and natural goals, and 2) scaling these systems up to handle larger spoken dialogue problems. To do this, we invented new representations of complex user goals (for example "I want french food, or else italian if there's one close to me"), and we investigated techniques for "Automatic Belief Compression" that allow such large-scale, high-dimensional computational problems to be reduced in size to a lower, more tractable dimension. In practical terms we developed and deployed real telephone-based speech interfaces that implemented these ideas, and we tested them both in simulation and with members of the public, using crowdsourcing methods. We collected and analysed data from 2193 calls from 85 users. Our key findings have been that methods for automatically compressing such problems can produce speech systems which are almost as effective as those where expert human designers have hand-crafted a suitable lower-dimensional problem space. We also developed new knowledge about the effectiveness of a variety of different automatic compression methods. In addition, we developed a new method for automatic belief compression, which overcomes several problems with previous approaches. We published a number of conference papers reporting this work, and contributed to 2 books on new statistical learning methods for the development of speech interfaces. We have recently written 3 journal papers reporting our findings. |
Exploitation Route | The outputs of this research can be used in industrial and commercial development of novel speech and natural language interfaces such as future variants and extensions of Apple's iPhone speech interface Siri and Google's Now and Voice Search applications. Similar future application domains would be in interaction with virtual characters in areas such as education, healthcare, games, automated customer service, and human-robot interaction. Other applications are in hands-busy and eyes-busy operating situations, such as while driving and in medical contexts, where speech interfaces to information services are useful. In addition, advanced and natural speech interfaces are useful for blind, disabled, and ageing users who cannot easily use traditional interaction devices such as keyboards and screens. Finally, speech interfaces can be used to open up information services for illiterate users, for example in some developing countries. This research can be used in new interfaces and technologies for human-computer interaction - in particular for future speech interfaces and multimodal systems (these are interfaces which combine human communication channels such as speech, gesture, facial expression, body pose, gaze, graphics, natural language, and touch). The research allows more natural expression of user search goals using natural language, and develops computational methods for decision-making in such systems. The exploitation routes are therefore primarily in new interfaces and technologies for human-computer interaction, for example in human-robot interaction and in speech and multimodal interfaces, such as speech interfaces used with mobile phones, in cars, or for disabled users. |
Sectors | Digital/Communication/Information Technologies (including Software) |
URL | https://sites.google.com/site/abcpomdp/ |
Description | The outputs of this research are useful in industrial and commercial development of novel speech and natural language interfaces such as future variants and extensions of Apple's iPhone speech interface Siri, Microsoft's Cortana, and Google's Now and Voice Search applications. Similar future application domains would be in interaction with virtual characters in areas such as education, healthcare, games, automated customer service, and human-robot interaction. Other applications are in hands-busy and eyes-busy operating situations, such as while driving and in medical contexts, where speech interfaces to information services are useful. In addition, advanced and natural speech interfaces are useful for blind, disabled, and ageing users who cannot easily use traditional interaction devices such as keyboards and screens. Finally, such advanced speech interfaces can be used to open up information services for illiterate users, for example in some developing countries. |
First Year Of Impact | 2012 |
Sector | Digital/Communication/Information Technologies (including Software) |
Impact Types | Societal, Economic |
Description | Amazon Alexa Challenge 2017 |
Amount | $100,000 (USD) |
Organisation | Amazon.com |
Sector | Private |
Country | United States |
Start | 11/2016 |
End | 11/2017 |
Description | EC FP7 ICT grant: SpaceBook |
Amount | £645,000 (GBP) |
Funding ID | 270019 |
Organisation | European Commission |
Sector | Public |
Country | European Union (EU) |
Start | 03/2011 |
End | 02/2014 |
Description | EC FP7 ICT project: JAMES: Joint Action for Multimodal Embodied Social Systems |
Amount | € 3,209,918 (EUR) |
Funding ID | 270435 |
Organisation | European Commission |
Sector | Public |
Country | European Union (EU) |
Start | 02/2011 |
End | 09/2014 |
Description | ERC Advanced Research Grant (STAC) |
Amount | € 1,930,000 (EUR) |
Funding ID | 269427 |
Organisation | European Commission |
Sector | Public |
Country | European Union (EU) |
Start | 05/2011 |
End | 05/2017 |
Description | Horizon 2020 ICT : MuMMER project - Multimodal Mall Entertainment Robot |
Amount | € 900,000 (EUR) |
Funding ID | 688147 |
Organisation | European Commission |
Sector | Public |
Country | European Union (EU) |
Start | 03/2016 |
End | 02/2020 |
Title | ABC dialogue management algorithms |
Description | A set of new statistical algorithms for spoken dialogue management -- see Crook et al. 2014 |
Type Of Material | Computer model/algorithm |
Year Produced | 2014 |
Provided To Others? | Yes |
Impact | Some of the algorithms developed are used for dialogue management in current/recent projects such as EC FP7 PARLANCE and SpaceBook and JAMES |
URL | https://sites.google.com/site/abcpomdp/home |
Title | Spoken dialogue data - ABC |
Description | A collection of real user spoken dialogues with our automated dialogue systems, as described in Crook et al. 2014 |
Type Of Material | Database/Collection of data |
Year Produced | 2014 |
Provided To Others? | Yes |
Impact | Use of data in EC FP7 projects such as SpaceBook and PARLANCE |
URL | https://sites.google.com/site/abcpomdp/home |
Title | End-to-end statistical spoken dialogue systems software and architecture |
Description | Automated spoken dialogue system using a fully statistical end-to-end architecture (see publications). |
Type Of Technology | Webtool/Application |
Year Produced | 2009 |
Impact | Used in later projects, e.g. an EPSRC follow-on project and EC FP7 projects such as SpaceBook and PARLANCE