nlvis: Natural Language Interaction for Visual Data Analysis

Lead Research Organisation: City, University of London
Department Name: Computing

Abstract

The unprecedented increase in the amount, variety and value of data has been significantly transforming the way that scientific research is carried out and businesses operate. As data sources become increasingly diverse and complex, analysis approaches in which the human and the computer operate in collaboration have proven to be an effective way to derive actionable observations. This is achieved through an iterative human-computer dialogue where the knowledge and creativity of the human meet the power of computation. In such human-in-the-loop data analysis approaches, interactive visualisation methods are core facilitators of this dialogue. However, these methods still rely on conventional, often unintuitive interaction mechanisms that can introduce unnecessary complexities into the process. There is an urgent need to rethink how analysts interact with visualisations in data-intensive analysis situations. Recent advances in natural language based interaction methodologies offer promising avenues to address this need.

This project aims to develop a fundamental understanding of how analysts can use natural language elements to perform visualisation-empowered data analysis, and to use that understanding to develop a framework where natural language and visualisation based interactions operate in harmony. The project then aims to demonstrate how such a multi-modal interaction scheme can radically transform the analysts' experience, with the goal of achieving significant improvements in the value and the volume of actionable observations generated.

Within the project, we will initially identify and develop a taxonomy of natural language interaction elements for describing visualisations and for carrying out a visual data analysis process. Here, we will inform our investigation with findings from data collected through crowd-based survey methodologies. We will then design a conceptual framework that facilitates an iterative data analysis process through interactions with both natural language and visualisation elements. We will make use of the data analysis and visualisation related language taxonomy from the earlier stage to define the scope and the capabilities of the interaction elements.

The project will then move on to realising its vision through a prototype in which the conceptual framework will operate with the help of an established conversational interface mechanism. The prototype will involve a combination of natural language and visual interaction capabilities and will also incorporate underlying computational capacities. We will then evaluate our approaches through a series of carefully designed use-cases that encompass common visual analysis scenarios. Our success criteria will be to achieve enhanced engagement and improved productivity during the visual analysis of complex data-intensive problems.

Potential beneficiaries of the outputs of this project range widely, from academic researchers and professional data analysts to the data analysis industry and the general public. For visualisation and visual analytics researchers, the findings will benefit those who are working on understanding user intent and mechanisms of sense-making in interactive visual analysis processes. For businesses that offer visualisation-empowered solutions to their customers (according to some reports, the visualisation market size is expected to reach $2.8 billion by 2020), the framework developed will provide the basis for new forms of products that are easier to learn and engage with. For professional data analysts, the novel interaction capability will offer a more fluid and natural experience, improving their efficiency and positively impacting the quality of observations. For the general public, natural interaction mechanisms will provide an enhanced experience when using data-intensive products that are becoming widely adopted.

Planned Impact

The project plans two main pathways to impact and a number of side activities to widen the impact further. The two main activities are Knowledge Transfer through Project Partnership and a public engagement workshop. We also plan to organise more informal public engagement activities and to increase the local outreach of the project. The PI will use the last 6 months of the project to focus on widening its impact. The following are the main pathways to impact:

KNOWLEDGE AND TECHNOLOGY TRANSFER THROUGH PROJECT PARTNERSHIP: The first and most concrete pathway to impact is through the project partner company, Redsift. They are looking into innovative ways to inform the development of their conversational analysis solution and the data analytics capabilities within their product range. The impact of the research results along this pathway will therefore be almost immediate.

DISSEMINATION AND ENGAGEMENT WORKSHOP: The second pathway to impact is through an engagement and dissemination activity that aims to bring together experts from a wide range of academic, industrial and governmental institutions, and to facilitate a lively discussion around the topic of "The role of the user/interaction in Data Science".

ACADEMIC PUBLICATIONS: During the project, we aim to produce a short paper and a workshop paper for the EuroVis 2018 conference to share the initial work-in-progress results. We then aim to produce a number of scientific papers: i) one detailing the observations from WP1 and the vocabulary related to visualisation, task, and analytical intent, and ii) one describing the design process of the conceptual framework along with the results of the evaluation study. After the project has ended, we plan to work on position papers and other workshop papers that will discuss the implications of using natural language approaches within visual analysis.

PUBLIC ENGAGEMENT: We aim to engage in activities where we can present project results to the general public. We will aim to take part in digital technology related activities that are organised in popular, social spaces. Candidate events include Digital Design Week at the Victoria & Albert Museum and appropriately themed events within Science Museum activities, such as Science Museum Lates.

OPEN ACCESS TO PROJECT RESULTS: We will make sure that all of the project's outputs are available as open source. Any published paper will be made available through the Open Access System of CITY. Wherever there are no limitations, we will ensure the anonymity of the data and make the data available along with the research papers through services such as GitHub.
 
Description The project led to a number of scientific articles which explored the theoretical framework to integrate visualisation and natural language. Through an experimental study conducted as part of the project, we identified the language used by a wide range of participants in describing verbally how correlation relations are seen in visualisations. The study advances the understanding of how people perceive visual representations of data and which analytical features are most prominent. As well as providing an understanding of how visualisations work, the results from the paper inform researchers on how well visual representations communicate the intended messages. On top of this, we contribute a novel methodology to help researchers study such visual representations through the use of crowd-sourced studies and semi-automated analysis techniques.
Exploitation Route The framework for combining verbal and visual representations of data developed in this project is now informing further work we carry out in this space, and we have started work to apply these techniques to both health services related decision making and the involvement of citizen scientists in biodiversity research. In applications where information is communicated to a wider audience, such multimodal representations are likely to increase the engagement, understandability and usefulness of data artefacts.
Sectors Digital/Communication/Information Technologies (including Software)

URL https://www.gicentre.net/nlvis
 
Description The learnings from this project in explainability of computational models informed work being conducted with partners at Redsift. Interfaces and technical solutions informed by the learnings in this project have been realised within the production system of the company, built for a cyber security context. This was achieved through a brief consultancy-type project carried out between researchers from the university and the company. The resulting product is now at a prototype stage, and opportunities to productise the prototype are now being explored through potential further UKRI funding. The results of the project have been shared with researchers from Elsevier's accessibility team to inform their research and development on accessible visualisations in their publication services.
First Year Of Impact 2019
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Economic

 
Description City, University of London, Pump-priming Fund: Investigating Interactive Visualisation Techniques for AI Explainability
Amount £5,000 (GBP)
Organisation City, University of London 
Sector Academic/University
Country United Kingdom
Start 01/2019 
End 12/2019
 
Description DECIDE - Delivering Enhanced Biodiversity Information with Adaptive Citizen Science and Intelligent Digital Engagements
Amount £96,631 (GBP)
Funding ID NE/V003143/1 
Organisation Natural Environment Research Council 
Sector Public
Country United Kingdom
Start 09/2020 
End 09/2022
 
Title Empirical Data on Verbal Descriptions of Scatterplots of Varying Levels of Correlation 
Description This research dataset is the first empirical dataset to establish the relations between scatterplot visualisations of varying levels of correlation and their verbal descriptions. The resource also contains an analysis workflow that involves natural language processing and semi-manual thematic analysis. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact Given the recency of the release of this data, there are not yet notable benefits. 
URL https://github.com/nlvis/wec
 
Description Partnership with Redsift 
Organisation Redsift
Country United Kingdom 
Sector Private 
PI Contribution We have been, and still are, collaborating on a prototype that would embed conversation-based agent software that can work with the technical infrastructure they have at Redsift. The work is still under development.
Collaborator Contribution The partnership has so far involved Redsift staff giving directions and input to the development side of the work. Redsift Limited had already shown interest through a support letter that was part of the initial application.
Impact We are not reporting any concrete outcome from this collaboration yet. The software code that we are listing under the software category relates to the collaboration and was built by incorporating guidance from Redsift staff. However, further work is needed to generate impact through this.
Start Year 2018
 
Title Conversation-based analysis supporting agent system for Jupyter Notebook 
Description This repository contains the code and instructions for incorporating a conversational-agent-based solution to support interactive data analysis processes through a combination of interactive visualisation and natural language interaction. The system builds on a number of Python libraries and runs embedded within the Jupyter computing environment. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact This is a first version of this software. The software underpins the technical infrastructure to support ongoing academic investigations within nlVis. The system will be further developed to make multimodal interactive processes possible for a wide pool of users within the Python computing environment. Further evidence will be reported once the software is adopted by analysts and developers. 
URL https://gitlab.city.ac.uk/nlvis
 
Description Honoured Speaker at the 11th Peking University Visualization Summer School, July 2019 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Invited speaker and lecturer at the 11th Peking University Visualization Summer School, delivered to a select audience of students from across the country, as well as industrial speakers. Several interesting links were built with partners and students in Beijing, China, which are likely to continue with joint papers and projects.
Year(s) Of Engagement Activity 2019
 
Description Human-Data-Algorithm Interactions in Data Science 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Invited talk at the Visual Computing Forum organised by the University of Bergen. This event was organised by the Center for Data Science and the Visualisation Group at the university and was attended by over 30 researchers from an interdisciplinary mix of backgrounds. These included researchers in medicine and biology who are interested in applying a number of the techniques in their own research.
Year(s) Of Engagement Activity 2021
URL https://vis.uib.no/events/vcf-cagatay-turkay/
 
Description Invited talk at IEEE VIS 2021 conference 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact The results of the project were presented as an invited talk at the IEEE VIS 2021 Conference, which took place online. This is the premier conference in visualisation and is attended by the visualisation research community as well as a wide range of practitioners. Several researchers and practitioners, including Elsevier's accessibility team and researchers from Worcester Polytechnic Institute, have reached out to discuss the results further.
Year(s) Of Engagement Activity 2021
URL http://ieeevis.org/year/2021/welcome
 
Description The Inquisitive Data Scientist: Facilitating Well-Informed Data Science through Visual Analytics 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact The talk introduced the general concepts of using visualisation and interaction within the context of data analysis and was held at the VRVIS research centre in Vienna, Austria. The talk reported some of the initial outcomes from the nlVis project to the practitioners and technology developers who build solutions for data modellers and analysts in various sectors. The talk sparked interest from researchers and developers at VRVIS in relation to the visual analysis solutions they are building and offering to their clients. They would like to investigate the role of natural language interaction in their solutions further.
Year(s) Of Engagement Activity 2018
 
Description The inner lives of visualisations: Studying how visualisations of statistical concepts are perceived and understood 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact This is a talk delivered at the Royal Statistical Society's RSS2021 conference. The talk presented research outcomes from the nlVis project to an audience of over 40 participants from various backgrounds, a mix of statistics practitioners and researchers. Along with other research outcomes, the talk highlighted studies conducted within the nlVis project and drew attention to the role of verbal descriptions in understanding how visual depictions of data operate. The talk raised interest and prompted further discussions with a number of practitioners, including researchers from Microsoft Research.
Year(s) Of Engagement Activity 2021
URL https://rss.org.uk/training-events/events/events-2021/conferences/rss-2021-international-conference/