Open Domain Statistical Spoken Dialogue Systems

Lead Research Organisation: University of Cambridge
Department Name: Engineering

Abstract

Spoken Dialogue Systems (SDS) encompass the technologies required to build effective man-machine interfaces which depend primarily on voice. To date, they have mostly been deployed in telephone-based call-centre applications such as banking, billing queries and travel information, and they are built using hand-crafted rules.

The recent introduction of Apple Siri and Google Now has moved voice-based interfaces into the mainstream. These virtual personal assistants (VPAs) offer the potential to revolutionise the way we interact with machines, and they open the way to properly control and manage the emerging Internet of Things - the rapidly growing network of smart devices which lack any form of conventional user interface. However, current personal assistants are built using the same technology as limited-domain spoken dialogue systems, and they cannot sustain conversational dialogues except within the limited domains which they have been explicitly programmed to handle.

Very recent work on statistical SDS has demonstrated that such a system can not only adapt and improve its performance within the domain for which it was designed, but can also automatically extend its coverage to include new, hitherto unseen concepts. This suggests that it should be possible to build on the progress achieved in limited-domain statistical SDS to design a radically new form of spoken dialogue system (and hence VPA) which extends and adapts with use to cover an ever-wider range of conversational topics. The design of such a system is the focus of this research proposal.

The key idea is to integrate the latest statistical dialogue technology into a wide-coverage knowledge graph (such as Freebase) which contains not only ontological information about entities but also the operations that can be applied to those entities (e.g. find flight information, book a hotel room, buy an ebook, etc.).
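As a concrete illustration, the sketch below shows one way a node in such a graph might pair ontological information about an entity class with the operations that apply to its instances. The ConceptNode class, its fields and the 'book' operation are purely illustrative assumptions made for this proposal, not Freebase's actual schema or API:

```python
# Hypothetical knowledge-graph node: ontological facts plus applicable operations.
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class ConceptNode:
    """One concept in the knowledge graph, e.g. 'hotel' or 'flight'."""
    name: str
    parent: Optional["ConceptNode"] = None  # super-class in the ontology
    attributes: Dict[str, str] = field(default_factory=dict)  # ontological facts
    operations: Dict[str, Callable[..., str]] = field(default_factory=dict)  # e.g. 'book'


# 'hotel' is a kind of 'bookable' entity, so the generic booking operation
# applies to it even though the operation is defined on the super-class.
bookable = ConceptNode(
    "bookable",
    operations={"book": lambda name, **slots: f"Booked {name} with {slots}"},
)
hotel = ConceptNode(
    "hotel",
    parent=bookable,
    attributes={"location": "string", "stars": "integer"},
)

# Operations are resolved by walking up the class hierarchy.
node, op = hotel, None
while node is not None and op is None:
    op = node.operations.get("book")
    node = node.parent
print(op("Hotel du Vin", nights=2))  # -> Booked Hotel du Vin with {'nights': 2}
```

Resolving operations by walking up the hierarchy also anticipates the inheritance scheme used to manage data sparsity, described in the architecture below.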

The implementation of a single monolithic spoken dialogue system capable of interpreting and responding to every conceivable user request is simply not practicable. Hence, rather than trying to broaden the coverage of existing SDS, a novel distributed system architecture is proposed with three key features:

1. the three essential components of an SDS (semantic decoder, dialogue manager and response generator) are distributed across the knowledge graph. In essence, every node in the graph can recognise when it is being referred to and can respond appropriately.

2. when the user speaks, all semantic decoders are listening. Based on the activation levels of the decoder outputs, a topic tracker identifies which concept is in focus and activates its dialogue policy.

3. all components are statistical, enabling them to be adapted automatically on-line using unsupervised adaptation. Data sparsity is managed by ensuring that the top-level nodes in the class hierarchy have well-trained components. Initially, lower-level, more specialised concepts simply inherit the required statistical models from their super-classes. As the system interacts with users and more data is collected, these lower-level components acquire sufficient data to train their own dedicated statistical models. A sketch of how features 2 and 3 fit together is given below.
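The following self-contained sketch illustrates one possible realisation of features 2 and 3. The keyword-matching "decoder", the MIN_EXAMPLES threshold and all class and function names are illustrative assumptions; a real system would use trained statistical semantic decoders and dialogue policies:

```python
# Illustrative sketch: topic tracking by decoder activation (feature 2) and
# policy inheritance from super-classes under data sparsity (feature 3).
from dataclasses import dataclass
from typing import List, Optional

MIN_EXAMPLES = 1000  # assumed data threshold before a node trains its own policy


@dataclass
class Node:
    name: str
    keywords: List[str]            # stand-in for a trained semantic decoder
    parent: Optional["Node"] = None
    num_examples: int = 0          # in-domain dialogues collected so far

    def activation(self, utterance: str) -> float:
        """Toy decoder: fraction of this node's keywords found in the utterance."""
        words = utterance.lower().split()
        return sum(w in words for w in self.keywords) / len(self.keywords)

    def policy_owner(self) -> "Node":
        """Feature 3: fall back on the super-class until enough data exists."""
        node = self
        while node.num_examples < MIN_EXAMPLES and node.parent is not None:
            node = node.parent
        return node


def track_topic(nodes: List[Node], utterance: str) -> Node:
    """Feature 2: all decoders 'listen'; the highest activation takes focus."""
    return max(nodes, key=lambda n: n.activation(utterance))


# Example: 'hotel' is new and data-sparse, so it borrows 'venue's policy.
venue = Node("venue", keywords=["book", "reserve"], num_examples=5000)
hotel = Node("hotel", keywords=["hotel", "room", "night"], parent=venue, num_examples=12)

focus = track_topic([venue, hotel], "I need a hotel room for two nights")
print(focus.name, "->", focus.policy_owner().name)  # hotel -> venue
```

As the hypothetical num_examples count for a node grows past the threshold, that node would train and use its own dedicated models, which is the mechanism by which coverage extends with use.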

The end result is a system that continually learns on-line. It starts with a limited and stilted conversational style, but the more it is used, the more fluent it becomes; as users explore new topics, the system adapts and extends its capability to handle them. Since many users can use the system simultaneously, learning can be fast and can accommodate live updates of the underlying data, all of which are characteristics that a virtual personal assistant must have to be genuinely useful.

Planned Impact

The principal goal of this research is to extend the theory and practice of spoken dialogue systems to support conversational interaction in unrestricted open domains. This enabling technology is critical to the development of accessible and widely available general-purpose human-computer interfaces, especially virtual personal assistants (VPAs).

VPAs offer the potential to revolutionise the way we interact with machines. They are being introduced to the public via smart phones, but they are actually device-independent -- all they really need is an audio channel to a remote server via the internet. This is a disruptive technology with a relatively low barrier to entry and high impact. VPAs have the potential to change not only the way we interact with machines, but also the infrastructure and economic models that underlie much of the digital economy, since they provide an opportunity to capture users in much the same way that Facebook and Amazon try to capture their users today. This work therefore has the potential to have a direct impact on UK competitiveness.

VPAs will also become essential because speech is the only way to properly control and manage the emerging Internet of Things -- the rapidly growing network of smart devices for which conventional user interfaces are either ergonomically difficult (e.g. Apple Watch, Google Glass) or inappropriate (e.g. home devices such as thermostats, fridges, etc.). Furthermore, users will need all of these devices to be integrated into their highly personalised digital worlds, with a consistent single point of contact. VPAs are the obvious way to achieve this. Given the UK's dependence on the service industry and the knowledge economy, it is essential that it has the technology and expertise to compete in this space.

In order to ensure that the research outputs of this project can be exploited to the benefit of the UK, the Cambridge Dialogue Group is working closely with a Cambridge-based SME, VocalIQ Ltd (www.vocaliq.com), in which Cambridge University is a major shareholder. VocalIQ is developing automatic self-learning spoken interfaces for applications in a variety of areas including automobiles, home automation and education. VocalIQ will collaborate on system development, provide advice on commercial deployment issues, and provide access to a platform that allows the prototype system to be tested on real users. This direct interaction with a local SME should ensure that the benefits of the research outputs are realised during and soon after project completion.

Through recent and current research projects, the group also has working collaborations with Yahoo Iberia (Mika), Toshiba Cambridge Research Laboratory (Stylianou) and General Motors Advanced Technical Centre (Tzirkel-Hancock).

The two members of research staff working directly on the project will further enhance their skills in machine learning, natural language processing and human-computer interaction, as well as the systems engineering skills needed to make complex real-time systems accessible to large segments of the public. There will also be three research students associated with the project. All will develop similar skills, which will eventually feed into the workforce.

Finally, Cambridge will be launching a new MPhil in Machine Learning, Speech and Language Technology in October 2015, and this project will provide a catalyst for Masters projects and eventually for further PhD projects.