Babble: domain-general methods for learning natural spoken dialogue systems

Lead Research Organisation: Heriot-Watt University
Department Name: S of Mathematical and Computer Sciences

Abstract

The demand for future conversational speech technologies is estimated to reach a market value of $3 billion by 2020 (Grand View Research, 2014). Our proposed technology will provide vital foundations and impetus for the rapid development of a next generation of naturally interactive conversational interfaces with deep language understanding, in areas as diverse as healthcare, human-robot interaction, wearables, home automation, education, games, and assistive technologies.

Future conversational speech interfaces should allow users to interact with machines using everyday spontaneous language to achieve everyday needs. A commercial example with quite basic capabilities is Apple's Siri. However, even today's limited speech interfaces are very difficult and time-consuming to develop for new applications: their key components currently need to be tailor-made by experts for specific application domains, relying either on hand-written rules or statistical methods that depend on large amounts of expensive, domain-specific, human-annotated dialogue data. The components thus produced are of little or no use for any new application domain, resulting in expensive and time-consuming development cycles.

One key underlying reason for this status quo is that general, scalable methods for natural language understanding (NLU), dialogue management (DM), and language generation (NLG) are not yet available for spoken dialogue. Current domain-general methods for language processing are sentence-based, and so perform fairly well on written text, but they quickly run into difficulties with spoken dialogue, because ordinary conversation is highly fragmentary and incremental: it naturally happens word-by-word rather than sentence-by-sentence, proceeding through half-starts, suggested add-ons, pauses, interruptions, and corrections -- without respecting the boundaries of sentences. And it is precisely these properties that contribute to the feeling of being engaged in a normal, natural conversation -- a feeling which current state-of-the-art speech interfaces fail to produce.
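To make the incremental idea concrete, here is a toy word-by-word processing loop (purely illustrative: the function name, the hesitation filter, and the restart rule are all invented here, and are far simpler than any real incremental grammar):

```python
def incremental_understand(words):
    """Yield a growing partial interpretation after every word,
    instead of waiting for the end of the utterance."""
    partial = []
    for word in words:
        if word in {"uh", "um"}:     # skip simple hesitations
            continue
        if word == "no":             # toy self-correction: restart
            partial = []
            continue
        partial.append(word)
        yield " ".join(partial)      # a hypothesis is available immediately

# A disfluent utterance: "book a ... uh, no ... cancel my flight"
hypotheses = list(incremental_understand(
    ["book", "a", "uh", "no", "cancel", "my", "flight"]))
```

A sentence-based pipeline would emit only the final string; the incremental version exposes a usable hypothesis after every word, which is what allows a system to interrupt, back-channel, or complete the user's utterance mid-stream.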

We propose to solve these two problems together by, for the first time:

(1) combining domain-general, incremental, and scalable approaches to NLU, DM, and NLG;

(2) developing machine learning algorithms to automatically create working speech interfaces from data, using (1).

We propose a new method, "BABBLE", in which speech systems can be trained to interact naturally with humans, much like a child who experiments with new combinations of words to discover their usefulness (though doing this offline, to avoid annoying real users!).

BABBLE will be deployed as a developer kit and as mobile speech Apps for public use and engagement, and will also generate large dialogue data sets for scientific and industry use.

This new method will not require expensive data annotation or expert developers, enabling the easy creation of new speech interfaces that advance the state of the art by interacting more naturally, and therefore more successfully and engagingly, with users.

New advances have been made in key areas relevant to this proposal: incremental grammars, formal semantic models of dialogue, and sample-efficient machine learning methods. The opportunity to combine and develop these approaches has arisen only recently, and now makes major advances in spoken dialogue technology possible.

Planned Impact

There are 3 main arenas of impact for this work: commercial, societal, and academic.
The major societal impact of this work is in widening access to information technology, and making it more efficient and engaging for users through the use of spoken conversation.
The core technology for natural conversational speech interfaces developed here has the potential to benefit several important groups of users:

* Internet users, mobile App users, and wearable technology users via improved conversational interfaces for control and search;
* Visually impaired individuals via better speech interfaces;
* The elderly and disabled, via automated assisted independent living using dialogue-based interaction;
* Users of immersive virtual reality interfaces (using speech for control and search);
* Computer Game players, e.g. via improved dialogue-based interaction with virtual characters;
* Users of educational technologies, using conversational virtual characters for learning;
* People interacting with robots (spoken Human-Robot Interaction).

In all of the above areas, the important economic impact of the work is also in lowering costs for industry through the automation of speech interface development -- a central objective of this proposal. Companies will benefit through the lowering or removal of costs associated with hiring expert system developers and data annotators. These advances will make speech interfaces a more affordable technology, deployable more rapidly in wider contexts.
Academic and industrial researchers will also benefit via the provision of new, open resources for conversational system development.

To realise the full impact of the research, we will therefore focus on the following pathways:

* BABBLE mobile Apps: public engagement with the developed systems released as speech Apps on smartphones, mobile devices, and wearables (like Siri and Google Now);

* Open data releases: the large anonymised spoken dialogue data-sets generated by the project will be of interest to academic researchers and industry;

* BABBLE Toolkit: software and tools developed will be released for academic and industrial use;

* Interdisciplinary publications and research visits: combining research from two previously disconnected communities: wide-coverage grammars of Natural Language and statistical approaches to automated dialogue systems;

* Demonstrations: showcasing the BABBLE technology at events such as the Scottish Informatics and Computer Science Alliance (SICSA) demofests and EC ICT events, which bring together academia with industrial developers;

* Robotarium demos: showcasing BABBLE systems for human-robot interaction applications at the Robotarium (EPSRC Infrastructure Grant, 2013);

* Impact workshop: organised at the end of the project to amplify these avenues to impact, inviting researchers from both academia and industry;

* Spin-out company: the BABBLE technology will be integrated into a spin-out speech technology company which is currently being developed at Heriot-Watt by the PI, under an "Impact Acceleration" grant.

To reach the general public as well as interested academic and industry researchers, we will also use a range of social media such as Twitter and YouTube, as we have done successfully for a number of past projects, as well as traditional websites. We will also further develop our contacts with the press and media, which have led to a number of newspaper, radio, and television outputs describing the PI's research. Our advisory board member Dr. Matthew Purver, founder and Chief Data Scientist of Chatterbox, will advise us on technology transfer and commercialisation of dialogue and language technology (see letters of support).
 
Description We have developed a method which allows the automatic construction of natural spoken dialogue systems (or conversational agents), such as Alexa and Siri, from very small amounts of unannotated data. This reduces the need for expert developers, and so reduces cost.
This is an advance because previous methods required large amounts of data to be collected and annotated, which is both expensive and time-consuming.
Moreover, our method supports more natural human conversation than many previous systems, because it processes language word-by-word, rather than needing to wait for the end of an utterance, as current systems like Alexa do.
We have applied this method in a number of dialogue systems. We also entered the Amazon Alexa Challenge in 2017, to build a socially intelligent conversational agent, and were one of only 3 teams who made it to the final. We competed again in 2018 and also made it to the final. We have also developed and tested a method for creating conversational agents that can learn novel word meanings from a human tutor (best paper award, ACL Robo-NLP 2017). We also released 2 new datasets for conversational agent development. The BABBLE software is available as open-source. We are in the process of spinning out a company (in 2020) from the university which focusses on conversational AI.
Exploitation Route This technology can be used by developers of future speech and dialogue interfaces -- to more rapidly and cheaply develop natural spoken dialogue interfaces for conversational devices and services.
We have also released new dialogue data and software for use by others.
Sectors Creative Economy; Digital/Communication/Information Technologies (including Software); Education; Financial Services, and Management Consultancy; Healthcare; Culture, Heritage, Museums and Collections; Retail

URL https://sites.google.com/site/hwinteractionlab/babble
 
Description This project has potential economic impact in developing a prototype method for creating dialogue systems from small amounts of data, without the need for expert developers, thus reducing the cost of developing such systems substantially. We were involved in several knowledge transfer collaborations with industry, including new funded projects with Voysis.com (£103K), Speech-Graphics.com/DataLab (£110K), and Amazon.com ($350K, for the 2017 and 2018 Amazon Alexa Challenges). These collaborations use our technical expertise and experience in dialogue system development gained during the project. We were also featured on the BBC documentaries "6 robots and us" in 2017 and "The Joy of AI" in 2018, and appeared on "Tomorrow's World Live" in 2018. We have also created a new MSc programme in AI with Speech and Multimodal Interaction, where methods and techniques developed as part of this project are taught to students in new courses on Conversational Agents. 30 students took this course in 2018, 35 in 2019, and about 70 in 2020. We are in the process of spinning out a company (in 2020) from the university which focusses on conversational AI.
First Year Of Impact 2017
Sector Digital/Communication/Information Technologies (including Software); Education
Impact Types Cultural, Societal, Economic

 
Description Amazon Alexa Challenge 2017
Amount $100,000 (USD)
Organisation Amazon.com 
Sector Private
Country United States
Start 11/2016 
End 11/2017
 
Description Amazon Alexa Challenge 2018
Amount $250,000 (USD)
Organisation Amazon.com 
Sector Private
Country United States
Start 01/2018 
End 11/2018
 
Description Commercial Contract - Voysis
Amount £103,390 (GBP)
Organisation Voysis Ltd 
Sector Private
Country Ireland
Start 10/2017 
End 09/2018
 
Description DataLab
Amount £110,033 (GBP)
Organisation Government of Scotland 
Department Scottish Funding Council
Sector Public
Country United Kingdom
Start 07/2018 
End 06/2019
 
Title BURCHAK human-human dialogue dataset 
Description A new freely available human-human dialogue data set for interactive learning of word meanings from a human tutor. The data has been collected using the DiET Chat Tool (Healey et al., 2003; Mills and Healey, submitted) with a novel task, where a Learner needs to learn invented visual attribute words (such as "burchak" for square) from a tutor. 
Type Of Material Database/Collection of data 
Year Produced 2017 
Provided To Others? Yes  
Impact Other researchers in the field have already started using this dataset, which can be used to train conversational agents that can learn new word meanings from a human tutor. No dataset for this task existed prior to this. 
URL https://arxiv.org/abs/1709.10431
 
Title NIPS 2015 deep RL data 
Description Data for results reported in the paper: Strategic Dialogue Management via Deep Reinforcement Learning, NIPS 2015. 
Type Of Material Database/Collection of data 
Year Produced 2016 
Provided To Others? Yes  
Impact The first application of Deep Reinforcement Learning methods to dialogue management problems. 
 
Title bAbI+ dataset 
Description bAbI+ is an extension of the bAbI Task 1 dialogues with everyday incremental dialogue phenomena (hesitations, restarts, and corrections), which model the disfluencies and communication problems of everyday spoken interaction in real-world environments. See https://www.researchgate.net/publication/319128941_Challenging_Neural_Dialogue_Models_with_Natural_Data_Memory_Networks_Fail_on_Incremental_Phenomena, http://aclweb.org/anthology/D17-1235 
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact Released as part of ParlAI from Facebook research see: http://www.parl.ai/static/docs/tasks.html 
URL http://www.parl.ai/static/docs/tasks.html
 
Title BABBLE software 
Description This is an implementation, continuously under development, of: (a) Dynamic Syntax and Type Theory with Records (DS-TTR) (Cann et al. 2015; Eshghi et al. 2012), a word-by-word incremental semantic grammar, especially suited to dialogue processing, with semantic and contextual representations produced in the TTR formalism. The implementation contains the following: (1) depth-first and breadth-first parsers and generators, based on hand-crafted domain-specific lexicons that cover a broad range of structures, including relative clauses and tense; (2) a prototype incremental dialogue system, DyLan, based on Jindigo (Skantze & Hjalmarsson, 2010), but using the Dynamic Syntax (Kempson et al. 2001; Cann et al. 2005) parser/generator; (3) a grammar induction module which learns DS incremental grammars from data (Eshghi et al. 2013). This has been updated with improvements made under the EPSRC BABBLE project, specifically to incorporate an interactive parser based on Eshghi et al.'s (2015) model of feedback in dialogue. (b) A dialogue system for interactive learning of visually grounded language from a human partner, which uses DS-TTR for dialogue processing and grounding (Yu et al. 2016). (c) An integration of DS-TTR with Reinforcement Learning, allowing incremental dialogue systems to be automatically induced from raw, unannotated dialogue examples (Eshghi & Lemon, 2014; Kalatzis et al. 2016). (a) has been updated and improved continuously throughout the BABBLE project; (b) and (c) have been produced exclusively within the BABBLE project. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact Demonstration at SemDial 2015 conference. Demo at INLG 2016 conference. Demo at Semdial 2016 conference. Contribution to best paper award at ACL Robo-NLP 2017. Demonstration system VOILA at SIGDIAL 2017. 
URL https://bitbucket.org/dylandialoguesystem
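As a rough illustration of the policy-learning side of (c) above -- inducing dialogue behaviour from reward alone, with no annotated data -- here is a minimal tabular Q-learning sketch. The states, actions, and reward scheme below are invented stand-ins; the actual system works over DS-TTR parse states rather than hand-named ones:

```python
import random

random.seed(0)

STATES = ["start", "asked", "done"]
ACTIONS = ["ask_destination", "confirm", "book"]
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}

def step(state, action):
    """Toy deterministic environment: reward arrives only when the
    actions are taken in the right order."""
    if state == "start" and action == "ask_destination":
        return "asked", 0.0
    if state == "asked" and action == "confirm":
        return "done", 0.0
    if state == "done" and action == "book":
        return "start", 1.0      # successful booking; episode restarts
    return state, -0.1           # unhelpful move

alpha, gamma, epsilon = 0.5, 0.9, 0.2
state = "start"
for _ in range(2000):
    if random.random() < epsilon:                     # explore
        action = random.choice(ACTIONS)
    else:                                             # exploit
        action = max(ACTIONS, key=lambda a: Q[(state, a)])
    next_state, reward = step(state, action)
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    state = next_state

# Greedy policy induced purely from the reward signal
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES}
```

The point of the sketch is that the learned policy emerges from a scalar reward signal alone, with no dialogue annotations; combining this with an incremental grammar is what lets whole systems be induced from raw dialogue examples.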
 
Title Multi-Task Learning for Domain-General Spoken Disfluency Detection in Dialogue Systems 
Description Code for the paper "Multi-Task Learning for Domain-General Spoken Disfluency Detection in Dialogue Systems" (Igor Shalyminov, Arash Eshghi, and Oliver Lemon), SemDial 2018. 
Type Of Technology Software 
Year Produced 2018 
Impact Spontaneous spoken dialogue is often disfluent, containing pauses, hesitations, self-corrections and false starts. Processing such phenomena is essential in understanding a speaker's intended meaning and controlling the flow of the conversation. Furthermore, this processing needs to be word-by-word incremental, to allow further downstream processing to begin as early as possible and so handle real spontaneous human conversational behaviour. In addition, from a developer's point of view, it is highly desirable to be able to develop systems which can be trained from 'clean' examples while also being able to generalise to the very diverse disfluent variations on the same data -- thereby enhancing both data-efficiency and robustness. In this paper, we present a multi-task LSTM-based model for incremental detection of disfluency structure, which can be hooked up to any component for incremental interpretation (e.g. an incremental semantic parser), or else simply used to 'clean up' the current utterance as it is being produced. We train the system on the Switchboard Dialogue Acts (SWDA) corpus and present its accuracy on this dataset. Our model outperforms prior neural network-based incremental approaches by about 10 percentage points on SWDA while employing a simpler architecture. To test the model's generalisation potential, we evaluate the same model on the bAbI+ dataset, without any additional training. bAbI+ is a dataset of synthesised goal-oriented dialogues where we control the distribution of disfluencies and their types. This shows that our approach has good generalisation potential, and sheds more light on which types of disfluency might be amenable to domain-general processing. 
URL https://github.com/ishalyminov/multitask_disfluency_detection
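The incremental input/output behaviour described above can be illustrated with a toy rule-based stand-in (the real model is a multi-task LSTM trained on SWDA; the rules and tag labels below are invented for illustration only):

```python
def tag_incrementally(words):
    """Tag each word as it arrives: 'e' for an edit term (filled pause),
    'rps' for a repair onset (toy cue: verbatim repeat of the previous
    content word), 'f' for a fluent word."""
    prev_content = None
    for word in words:
        if word in {"uh", "um"}:
            tag = "e"
        elif word == prev_content:
            tag = "rps"
        else:
            tag = "f"
        if tag != "e":
            prev_content = word
        yield word, tag   # a label is emitted after every word

utterance = "book a uh a flight to um to paris".split()
tagged = list(tag_incrementally(utterance))
```

Because a label is available after every word, a downstream incremental component can discard edit terms and repair onsets as soon as they are detected, rather than re-processing the whole utterance at its end.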
 
Title SimpleDS -- Deep Reinforcement Learning for dialogue management 
Description SimpleDS is a simple dialogue system trained with deep reinforcement learning. In contrast to other dialogue systems, this system selects dialogue actions directly from raw (noisy) text of the last system and user responses. The motivation is to train dialogue agents with as little human intervention as possible. 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact This software is now being used in several projects in our lab. It led to our NIPS 2015 paper and was demonstrated at the IWSDS 2016 conference. 
URL https://github.com/cuayahuitl/SimpleDS
 
Description 2nd Dynamic Syntax Conference 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact This was an academic conference, website here: https://sites.google.com/site/seconddsconf/
Year(s) Of Engagement Activity 2018
URL https://sites.google.com/site/seconddsconf/
 
Description National TV - demonstration and discussion on "The Joy of AI" documentary, BBC4 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Demonstrated a conversational AI system on the BBC4 documentary "The Joy of AI", hosted by Jim Al-Khalili -- international reach. The broadcast generated many follow-up requests for collaboration.
Year(s) Of Engagement Activity 2018
URL https://www.bbc.co.uk/programmes/p06jt7j4
 
Description National TV: demonstration on BBC's Tomorrow's World Live 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Prof Lemon gave a live demonstration of a conversational AI system and discussed the science behind it on Tomorrow's World Live -- a TV broadcast with international reach and a large audience.
Year(s) Of Engagement Activity 2018
URL https://www.bbc.co.uk/programmes/p06vvw9h
 
Description Poster at Alan Turing Institute Deep Learning workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Edinburgh University hosted the "Alan Turing Institute Deep Learning workshop" in 2015. We presented our work to practitioners and postgraduate students.
Year(s) Of Engagement Activity 2015
URL http://workshops.inf.ed.ac.uk/deep/deepATI/
 
Description Research Visit to Dialogue Systems Group, Bielefeld 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Eshghi visited the DSG research group at Bielefeld University. This included a presentation of BABBLE project ideas & results to the group, as well as one-on-one discussion sessions with individual members of the group.
Year(s) Of Engagement Activity 2016
 
Description Research visit to the Centre for Linguistic Theory and Studies in Probability, Gothenburg 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Eshghi visited the CLASP centre, a research group at Gothenburg University. This included a presentation of BABBLE project ideas & results to the group, as well as one-on-one discussion sessions with its individual members.
Year(s) Of Engagement Activity 2015
 
Description SICSA workshop on Conversational AI 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact We organised a one-day workshop on Conversational AI on 12/12/2018 -- see https://sites.google.com/site/workshoponconversationalai/home
There were about 60 attendees from all over the UK, including industry and media (e.g. BBC, Voysis).
Year(s) Of Engagement Activity 2018
URL https://sites.google.com/site/workshoponconversationalai/home
 
Description TED-x technology demonstration session 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact Heriot-Watt hosted the TEDx event described here: http://www.tedxhwu.com/

I ran a demo session for participants -- a mix of business, media, professionals, and students -- showcasing some of our dialogue and robot technology.
Year(s) Of Engagement Activity 2015
URL http://www.tedxhwu.com/
 
Description What does a robot think about when no one is around 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Public/other audiences
Results and Impact A public outreach activity at the Science Festival in Edinburgh in March 2018.
Year(s) Of Engagement Activity 2018