Nonparametric Learning for Situated Data-to-Text Generation: Helping People to Understand Uncertain Data

Lead Research Organisation: Heriot-Watt University
Department Name: School of Mathematical and Computer Sciences

Abstract

Information overload is a pervasive problem in many environments, particularly those in which human decision making is based on extensive data sets. Data-to-text systems have been shown to successfully address this problem by automatically generating textual descriptions of the underlying data. However, when translating (numerical) data into words, an appropriate level of precision needs to be chosen. The following example is from a system which summarises medical time series data for neonatal care: "At 17:24 T1 is 35.7 and T2 is 34.5C" (Gatt et al., 2009). This summary is clearly targeted at experts, such as doctors or nurses, who need precise information for decision making. However, other users, such as visiting parents, might be happier with a description such as "In the evening your baby had normal temperature."

In this project, we will build a data-to-text system that automatically determines the appropriate level of precision for a given context by using statistical machine learning methods. These methods can learn an optimal generation policy from real data and promise to be more robust to new situations than hand-written rules by human experts.
We will also investigate novel feedback-based non-parametric state estimation methods to reduce the data annotation cost for data-to-text systems. Typically, the first step in creating such systems is to manually interpret and align the raw data sources. However, this step is very costly, as human experts need to be trained for this task. Our new methods promise to allow data-to-text systems to be applied rapidly to new domains.

The domain we will be targeting for this initial project is pedestrian navigation, where the task is to translate uncertain user positions into walking instructions. The underlying data uncertainty here arises from several sources, such as the user's speech signal, the GPS location, estimated viewshed, walking direction and speed. We will test our learnt data-to-text generation strategy by integrating it into an existing system and running an evaluation with real users.

One of the outcomes of this project is a data-driven linguistic view on the question of "how to communicate uncertainty", which is an active interdisciplinary research area, including researchers from medicine, law, environmental modelling and climate change.
In future work we will also investigate how the proposed framework transfers to new domains, such as natural language generation from medical data, weather forecasts, or output from complex environmental models.

Planned Impact

The overall aim of this research is to provide better interfaces for people to understand "big" data more intuitively. As such, the outcomes of this research have three main groups of beneficiaries: (1) academic researchers investigating how to (automatically) communicate data, (2) policy makers who need to communicate their findings, and (3) the general public, who need to make decisions based on (uncertain) data.

(1) Within the academic community, this proposal aims to bridge the gap between two disciplines which are both concerned with decision support: data-to-text Natural Language Generation (NLG) systems, and interdisciplinary research on communicating uncertainty. While other disciplines, such as medicine, environmental modelling, climate change or weather forecasting, strongly promote the need for communicating data uncertainty, data-to-text systems still assume that their underlying data is precise and correct. If automatic data-to-text systems are to be widely used within decision support, they must have mechanisms to communicate uncertain data in an effective way. This research will contribute a principled study and data-driven framework for generating descriptions of underlying data uncertainty.

(2) The developed models will benefit not only academics from other disciplines, but also policy makers, such as the Intergovernmental Panel on Climate Change (IPCC). Currently, the IPCC prescribes a standardised mapping of data uncertainty into words, which is widely recognised and applied beyond climate change research. However, the guidelines by the IPCC are not grounded in linguistic research and have been reported to be problematic in their use. In future, these guidelines could be informed by the outcomes of this research.

(3) Finally, the long-term beneficiary of this research is the general public, who in their daily life have to make decisions based on vast amounts of unstructured information becoming more readily available. For example, the British government recently announced in a white paper that it will be greatly expanding the amount of data which it shares with the rest of us (http://www.guardian.co.uk/politics/2012/jun/27/public-services-data-published-price). However, most people lack the skills and tools to access and interpret large data sets. The overall aim of this research is to provide better direct access to data through user-friendly interfaces, which help people to understand data more intuitively and support decision making.
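
To make the idea of a "standardised mapping of data uncertainty into words" in point (2) concrete, the following Python sketch maps a probability to the core likelihood terms of the IPCC uncertainty guidance (for example, "very likely" for 90-100%). The published ranges overlap ("likely" covers 66-100% and so contains "very likely"), so the sketch returns the narrowest applicable term, with ties resolved by list order. That selection rule and the function name `ipcc_likelihood_term` are assumptions made for this illustration; they are not part of the IPCC guidance and not the method developed in this project.

```python
# Core IPCC likelihood scale as (term, lower bound %, upper bound %).
# Integer percentages keep the range widths exact.
IPCC_SCALE = [
    ("virtually certain", 99, 100),
    ("very likely", 90, 100),
    ("likely", 66, 100),
    ("about as likely as not", 33, 66),
    ("unlikely", 0, 33),
    ("very unlikely", 0, 10),
    ("exceptionally unlikely", 0, 1),
]


def ipcc_likelihood_term(p: float) -> str:
    """Return the narrowest IPCC likelihood term whose range contains p.

    p is a probability in [0, 1]. Choosing the narrowest range is an
    assumption of this sketch (hypothetical tie-breaking rule).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    percent = p * 100
    candidates = [
        (upper - lower, term)
        for term, lower, upper in IPCC_SCALE
        if lower <= percent <= upper
    ]
    # min() with a key returns the first entry of minimal width,
    # so ties fall back to the order of IPCC_SCALE.
    return min(candidates, key=lambda c: c[0])[1]


if __name__ == "__main__":
    for p in (0.995, 0.95, 0.7, 0.5, 0.2, 0.05):
        print(f"{p:.3f} -> {ipcc_likelihood_term(p)}")
```

A fixed table like this is exactly the kind of hand-written rule that the proposal aims to complement with mappings learnt from data on how people actually interpret such phrases.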

For a description of how these impact goals will be realised and how their success will be measured, please see "Part III: Pathways to Impact" of this proposal.
 
Description This project has provided new insights on how to communicate risk and uncertainty for decision support. We present a comparison of different information presentations for uncertain data and, for the first time, measure their effects on human decision-making. We show that the use of Natural Language Generation (NLG) improves decision-making under uncertainty, compared to state-of-the-art graphics-based representation methods. In a task-based study with 442 adults, we found that presentations using NLG lead to 24% better decision-making on average than the graphical presentations, and to 44% better decision-making when NLG is combined with graphics. We also show that women achieve significantly better results when presented with NLG output (an 87% increase on average compared to graphical presentations).
Exploitation Route Our results provide further insights into the question of "how to communicate uncertainty". This is an active research question in areas such as environmental research, medicine, climate change, and weather forecasting. As such, our findings will help related disciplines to develop better interfaces and give advice to practitioners. For example, our findings have sparked interest from the insurance industry, as well as interest in improving the presentation of search results from search engines, such as Google.
Sectors Digital/Communication/Information Technologies (including Software); Education; Environment; Financial Services, and Management Consultancy; Healthcare

URL https://understandinguncertainty.org/women-listen-and-men-look-how-best-communicate-risk-support-decision-making
 
Description Our main results were published at ACL 2016, the premier conference in the field, as well as in the IEEE Computational Intelligence Magazine (Impact Factor 6.3). Since then, the following impacts have been created: 1) We were invited to contribute to a popular blog on Communicating Uncertainty by Prof David Spiegelhalter (Cambridge). 2) A private insurance company is interested in extending the WeatherGame to measure personal risk-taking behaviour. 3) The released data set is used by other research institutions, including the University of Aberdeen and Tilburg University. 4) Our results have influenced research in related fields, e.g. presenting online search results [Voskarides et al., 2016]. 5) We have presented our work to practitioners at Health Informatics Scotland. 6) We have consulted the MetOffice based on our findings. 7) Our team was awarded 3rd place in the Amazon Alexa Challenge 2017. 8) The papers published as part of this research have been cited over 50 times.
First Year Of Impact 2014
Sector Education; Environment; Financial Services, and Management Consultancy; Healthcare
Impact Types Societal,Economic

 
Description New MSc Programme in Speech and Multimodal Interaction
Geographic Reach National 
Policy Influence Type Influenced training of practitioners or researchers
Impact Verena Rieser created a new postgraduate MSc programme at Heriot-Watt, which aims to educate highly employable experts in creating conversational multimodal interfaces. The programme recently received 6 fully funded studentships from the DataLab/Scottish Funding Council.
URL http://www.macs.hw.ac.uk/cs/pgcourses/aiws.htm
 
Description DataLab MSc scholarships
Amount £36,000 (GBP)
Organisation Government of Scotland 
Department Scottish Funding Council
Sector Public
Country United Kingdom
Start 09/2017 
End 08/2018
 
Description DataLab knowledge exchange UK Industry
Amount £114,000 (GBP)
Organisation Government of Scotland 
Department Scottish Funding Council
Sector Public
Country United Kingdom
Start 12/2016 
End 12/2017
 
Description EPSRC Impact Acceleration
Amount £45,000 (GBP)
Organisation Heriot-Watt University 
Sector Academic/University
Country United Kingdom
Start 11/2017 
End 10/2018
 
Description EPSRC Standard Grant
Amount £520,000 (GBP)
Funding ID EP/N017536/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Academic/University
Country United Kingdom
Start 06/2016 
End 05/2019
 
Description EPSRC Standard Grant
Amount £454,000 (GBP)
Funding ID EP/M005429/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Academic/University
Country United Kingdom
Start 03/2015 
End 02/2018
 
Description James Watt PhD Scholarship
Amount £40,000 (GBP)
Organisation Heriot-Watt University 
Sector Academic/University
Country United Kingdom
Start 08/2016 
End 07/2019
 
Description SICSA Conference and workshop organisation
Amount £700 (GBP)
Organisation SICSA Scottish Informatics and Computer Science Alliance 
Sector Academic/University
Country United Kingdom
Start 03/2015 
End 03/2015
 
Description SICSA Postdoctoral and Early Career Researcher Exchanges (PECE)
Amount £2,028 (GBP)
Organisation SICSA Scottish Informatics and Computer Science Alliance 
Sector Academic/University
Country United Kingdom
Start  
 
Title Game-based online data collection: Educational Gaming 
Description Running laboratory-based experiments is costly. We have used and further developed a method for collecting data online, using a game-based setup. In contrast to conventional crowd-sourcing, participants are not paid crowd-workers, but members of the general public. The incentive for the participants is that the game is fun to play and also educational. We have tested this method in two different set-ups: the WeatherGame and the creation of the REAL corpus. For example, in the WeatherGame the participants improve their understanding of risk and uncertainty. The REAL corpus consists of human-generated and human-evaluated object descriptions in spatial real-world images. Participants were able to test their ability to uniquely identify and describe complex scenes.
Type Of Material Improvements to research infrastructure 
Year Produced 2015 
Provided To Others? Yes  
Impact We have gathered two large corpora using only a fraction of the time and effort of a conventional lab-based experiment. We have advised the University of Reading, Department of Meteorology, who are planning to use a similar setup for their studies.
URL http://www.macs.hw.ac.uk/InteractionLab/weathergame/
 
Title REAL corpus 
Description The REAL (Referring Expressions Anchored Language) corpus contains a collection of images of real-world urban scenes together with verbal descriptions of target objects generated by humans, paired with data on how successfully other people were able to identify the same object based on these descriptions. In total, the corpus contains 32 images with on average 27 descriptions per image and 3 verifications for each description. The data has been provided by our collaborators (Universities of Edinburgh and Stirling). Within the project, we completed the corpus by annotating a variety of linguistically motivated features and also released the data via the ELRA repository.
Type Of Material Database/Collection of data 
Year Produced 2016 
Provided To Others? Yes  
Impact The data will be released in May 2016 as part of our LREC submission (Bartie et al., 2016). 
URL http://www.timemirror.com/lrec2016.html
 
Title WeatherGame corpus 
Description We collected data in order to study the effect of uncertain data on decision making. We therefore designed the Extended Weather Game, which is an extension of the MetOffice's Weather Game (Stephens et al., 2011). In this one-player game, the player has to choose where to send an ice-cream seller in order to maximise sales, given uncertain weather forecasts for four weeks and two locations. We recruited 442 unique players (197 females, 241 males, 4 non-disclosed) using social media. We collected 450 unique game instances (just a few people played the game twice).
Type Of Material Database/Collection of data 
Provided To Others? No  
Impact 2 conference papers published, 1 conference paper in submission, 1 journal paper in prep. 
URL http://www.macs.hw.ac.uk/InteractionLab/weathergame/
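
The participant figures in the WeatherGame corpus description above can be cross-checked with a few lines of Python. This is only an arithmetic sanity check on the reported numbers; the variable names are chosen for this illustration, and the reading that each surplus game instance corresponds to one repeat play is an assumption.

```python
# Figures as reported in the WeatherGame corpus description.
participants = {"female": 197, "male": 241, "non-disclosed": 4}
unique_players = 442
game_instances = 450

# The gender breakdown should add up to the number of unique players.
total = sum(participants.values())

# Surplus game instances over unique players; assuming nobody played more
# than twice, this equals the number of players who played a second time.
repeat_games = game_instances - unique_players

# Share of each group among unique players, in percent.
shares = {group: 100 * n / unique_players for group, n in participants.items()}

if __name__ == "__main__":
    print("breakdown total:", total)
    print("repeat games:", repeat_games)
    for group, share in shares.items():
        print(f"{group}: {share:.1f}%")
```

The breakdown sums to the reported 442 players, and the 8 surplus game instances are consistent with the statement that only a few people played twice.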
 
Description Aalto University, Helsinki, Finland 
Organisation Aalto University
Country Finland 
Sector Academic/University 
PI Contribution We collaborated with Aalto University on data-to-text technologies for runners. In particular, we crowd-sourced data from runners to train machine learning models that describe the suitability of a running track. The models are to be used on a wearable device.
Collaborator Contribution Aalto University collected the data from runners and we developed the models.
Impact David McGookin, Dimitra Gkatzia and Helen Hastie. Supporting Exploratory Navigation for Runners Through Geographic Area Classification with Crowd-Sourced Data. In Proc. of the 17th International Conference on Human-Computer Interaction with Mobile Devices and Services (MobileHCI). Copenhagen, Denmark, 2015 (2015 acceptance rate: 25.2%).
Start Year 2014
 
Description Amazon Alexa Challenge 2017, 2018 
Organisation Amazon.com
Country United States 
Sector Private 
PI Contribution My team was selected to participate in the Amazon Alexa Challenge in two consecutive years: 2017 and 2018. The aim of this challenge is to build a social chat bot that can converse coherently and engagingly with humans on popular topics for 20 minutes. For the 2017 round, we were one of 12 teams selected out of a pool of over 100 applicants. For the 2018 round, we were one of eight teams selected out of ca. 200 applicants.
Collaborator Contribution We received a generous gift of $100,000 (2017) and $250,000 (2018) and various in-kind contributions worth ca. $100k for both years, e.g. free training and access to Amazon Web Services, Alexa-enabled devices, a weekly class with one of Amazon's senior researchers, and invited research visits to Amazon HQ in Seattle (including sponsored travel for the team). We won 3rd prize in the 2017 challenge, which included a $50,000 cash prize for the students.
Impact Increased recognition and visibility of my research group and department.
Start Year 2016
 
Description EmoTech North Industry Knowledge Exchange 
Organisation EmoTech Ltd
Country United Kingdom 
Sector Private 
PI Contribution We collaborate on designing and implementing a conversational interface for Olly the Robot, an in-home robot with conversational capabilities developed by Emotech Ltd. The Olly robot recently won 4 awards for Innovation at the CES showcase. (The CES Innovation Awards is an annual competition honoring outstanding design and engineering in consumer technology products from around the world.) It was recently showcased at CES '17 (http://www.bbc.com/news/technology-38504512). The project outcome will directly contribute to the Olly product of Emotech. Emotech will release 1000-1500 units in June/July via a Kickstarter program to gauge early adopter feedback. Full commercial release is expected in Q3/4 2017 at a retail price of $600-800 per unit. The revenue of Emotech Ltd in 2017 is estimated to be £2m, and is expected to grow to £20-40m in 2018. Emotech North Ltd will be an NLP (Natural Language Processing) hub for Emotech. Its growth will create more employment positions and more collaborations with other industry partners and universities in Scotland.
Collaborator Contribution Cash contribution of £58k to support RA. Invited research visit to London (1 week) fully supported.
Impact Robotics hardware, neuroscience, human-computer interaction
Start Year 2016
 
Description MetOffice 
Organisation Meteorological Office UK
Country United Kingdom 
Sector Public 
PI Contribution I initiated a collaboration with the MetOffice in order to extend their successful WeatherGame. In this game, the participants have to help an ice cream seller to locate his van in order to maximise the chances of sunshine. In the original game, these weather-related probabilities were presented as graphics. In our version, risk and uncertainty are also described verbally. We were able to show that textual output is at least as effective as graphics only. The best results were obtained using multimodal (text + graphics) output, confirming previous research. In addition, we were also able to show that text is particularly helpful for female participants.
Collaborator Contribution The MetOffice provided us with a software license for the WeatherGame.
Impact This collaboration involved experts from Meteorology, Numerical Modeling, Computer Science, and Linguistics. The created outcomes are: (1) Multimodal corpus (to be released) with data from 442 participants. (2) 2 publications (2 in prep.) (3) Updated WeatherGame software. (4) We have advised the University of Reading, Department of Meteorology, who are planning to use a similar setup for their studies.
Start Year 2014
 
Description Spatial Reference with GeoScience 
Organisation University of Edinburgh
Department School of Geosciences Edinburgh
Country United Kingdom 
Sector Academic/University 
PI Contribution We have collaborated with the Universities of Stirling and Edinburgh to collect a corpus of spatial referring expressions, that is, the expressions humans use to refer to objects in visual scenes where there is a lot of uncertainty. Our main contribution was to annotate the corpus with linguistic features and analyse the data.
Collaborator Contribution The Universities of Stirling and Edinburgh collected the data and designed the experimental setup for the data collection.
Impact Disciplines involved: GeoScience, Computer Science, Linguistics. Outputs: 1 new data set/corpus; 2 publications at high ranking international conferences (EMNLP, LREC).
Start Year 2014
 
Description Spatial Reference with GeoScience 
Organisation University of Stirling
Country United Kingdom 
Sector Academic/University 
PI Contribution We have collaborated with the Universities of Stirling and Edinburgh to collect a corpus of spatial referring expressions, that is, the expressions humans use to refer to objects in visual scenes where there is a lot of uncertainty. Our main contribution was to annotate the corpus with linguistic features and analyse the data.
Collaborator Contribution The Universities of Stirling and Edinburgh collected the data and designed the experimental setup for the data collection.
Impact Disciplines involved: GeoScience, Computer Science, Linguistics. Outputs: 1 new data set/corpus; 2 publications at high ranking international conferences (EMNLP, LREC).
Start Year 2014
 
Description 1st Workshop on Data-to-text Generation 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact The 1st Workshop on Data-to-text Generation covers a broad spectrum of areas, including generating textual descriptions from data, decision support systems that facilitate data access using natural language, information presentation from data, and summarisation from data. It also aims to bridge the gap between Natural Language Generation and Data Science.
We received 25 submissions, 6 of which were presented as talks and 19 as posters.
One of the outcomes of this event is that this workshop will now be an annual event, following a similar informal format, as unanimously decided by the attendees.
Year(s) Of Engagement Activity 2015
URL http://www.macs.hw.ac.uk/InteractionLab/d2t/
 
Description Diversity and inclusion in academic ICT research 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Study participants or study members
Results and Impact I am taking part in the focus group Diversity and inclusion in academic ICT research run by the EPSRC and organised by Edinburgh Napier University.
Year(s) Of Engagement Activity 2017
URL https://www.epsrc.ac.uk/newsevents/news/ictdiversityinclusionresearch/
 
Description EXPLORATHON Afternoon at Edinburgh Zoo 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Schools
Results and Impact EXPLORATHON 2015 is Scotland's European Researchers' Night and took place on 25 September 2015. The project joined other researchers from Heriot-Watt University to engage with the general public (mainly school children) at Edinburgh Zoo for an afternoon. Our aim was to help children experience the impact of uncertainty on decision making by playing our weather game. A total of 79 participants interacted with the weather game, of whom 55 were children between 8 and 14 years old.
A positive side effect of this activity was that we gathered data on how children act under uncertainty and also tested their numeracy skills. To our knowledge, this is the first instance of such a study being conducted. The data will be published as part of a forthcoming publication (in preparation).
Year(s) Of Engagement Activity 2015
URL http://www.explorathon.co.uk/edinburgh/zoo
 
Description Interview for international news (WDR) 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact Interview for German national radio. Almost the whole feature was about our group and our research.
Year(s) Of Engagement Activity 2018
URL https://www1.wdr.de/mediathek/audio/wdr3/wdr3-kulturfeature/audio-sprich-mit-mir---versuche-mit-masc...
 
Description Interview for national news (Telegraph) 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Media (as a channel to the public)
Results and Impact Interview for the Telegraph about Women in AI
Year(s) Of Engagement Activity 2019
URL https://www.telegraph.co.uk/technology/2019/03/08/artificial-intelligence-has-gender-problem-meet-pi...
 
Description Invited blog post on Understanding Uncertainty 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact I was invited to write an article for "Understanding Uncertainty", the blog by Prof Spiegelhalter (Winton Professor for the Public Understanding of Risk at Cambridge University), summarising my research on multimodal information presentation to communicate risk for decision support.
Year(s) Of Engagement Activity 2016
URL https://understandinguncertainty.org/women-listen-and-men-look-how-best-communicate-risk-support-dec...
 
Description Invited industry talk at Thomson Reuters 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Verena Rieser was invited to present her research to Thomson Reuters via an online seminar. This seminar will be broadcast to all research employees of Thomson Reuters worldwide.
Year(s) Of Engagement Activity 2017
 
Description NESTA interview - 12 women shaping AI 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact Media interview and article published by NESTA (global innovation foundation)
Year(s) Of Engagement Activity 2019
URL https://www.nesta.org.uk/feature/12-women-ai/
 
Description Native Scientist German School Outreach 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Schools
Results and Impact Verena Rieser engaged school children in her research. The half-day event was organised by Alleman Fun (German Saturday School) and Native Scientist. The engagement activity was held in German.
Year(s) Of Engagement Activity 2016
URL http://www.macs.hw.ac.uk/RoboticsLab/news/german-native-scientist-volunteers-reaching-out-to-childre...
 
Description Online WeatherGame 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact The WeatherGame was played by 442 online participants. It is designed to demonstrate how uncertainty can affect decision making and possible outcomes. People playing the game reported an increased awareness of the risk and uncertainty associated with decisions.

The WeatherGame was widely advertised via the MetOffice blog and various Twitter accounts, including Prof. Spiegelhalter's (Winton Professor for the Public Understanding of Risk, Cambridge University).

We were contacted and asked for advice by the University of Reading, Department of Meteorology, who are planning to use a similar setup for their studies.
Year(s) Of Engagement Activity 2015
URL http://blog.metoffice.gov.uk/2015/10/15/heriot-watt-university-revives-weather-game/
 
Description Organisation and programme chair for 9th International Conference on Natural Language Generation 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Programme chair, local host and organisation for 9th International Conference on Natural Language Generation.
Year(s) Of Engagement Activity 2016
URL http://www.macs.hw.ac.uk/InteractionLab/INLG2016/#
 
Description Plenary keynote at 1st workshop on NLP for Conversational AI (ACL2019, Florence) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Invited plenary keynote at 1st workshop on NLP for Conversational AI (ACL2019, Florence)
Year(s) Of Engagement Activity 2019
URL https://sites.google.com/view/nlp4convai/program?authuser=0
 
Description Plenary keynote at 2nd workshop on Vocal Interactivity in-and-between Humans, Animals and Robots (VIHAR-2019) (Turing Institute, London) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Invited plenary keynote at 2nd workshop on Vocal Interactivity in-and-between Humans, Animals and Robots (VIHAR-2019) (Turing Institute, London)
Year(s) Of Engagement Activity 2019
URL http://vihar-2019.vihar.org/keynotes/
 
Description Plenary keynote at IVA 2019 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Invited plenary keynote at 19th International Conference on Intelligent Virtual Agents (IVA 2019, Paris)
Year(s) Of Engagement Activity 2019
URL https://iva2019.sciencesconf.org/
 
Description Women@CS 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Undergraduate students
Results and Impact Verena Rieser organises a local support group for female students studying Computer Science, inspired by the "Sisters Clubs" in American universities. The goal is to attract and retain female UG students to study CS.
Year(s) Of Engagement Activity 2016