CONVER-SE: Conversational Programming for Smart Environments

Lead Research Organisation: University of Sussex
Department Name: Sch of Engineering and Informatics

Abstract

Smart environments are designed to react intelligently to the needs of those who visit, live and work in them. For example, the lights can come on when it gets dark in a living room, or a video exhibit can play in the correct language when a museum visitor approaches it. However, we lack intuitive ways for users without technical backgrounds to understand and reconfigure the behaviours of such environments, and there is considerable public mistrust of automated environments. Whilst there are tools that let users view and change the rules defining smart environment behaviours without having programming knowledge, they have not seen wide uptake beyond technology enthusiasts. One drawback of existing tools is that they pull attention away from the environment in question, requiring users to translate from real world objects to abstract screen-based representations of them. New programming tools that allow users to harness their understanding of, and references to, objects in the real world could greatly increase trust and uptake of smart environments.

This research will investigate how users understand and describe smart environment behaviours whilst in situ, and use the findings to develop more intuitive programming tools. For example, a tool could let someone simply say that they want a lamp to come on when it gets dark, and point at it to identify it. Speech interfaces are now widely used in intelligent personal assistants, but the functionality is largely limited to issuing immediate commands or setting simple reminders. In practice, there are many challenges with using speech interfaces for programming tasks, and idealised interactions such as the lamp example are far from simple. In many cases, the research used to design programming interfaces for everyday users is carried out in research labs rather than in real home or workplace settings, and the people invited to take part in design and evaluation studies are often university students or staff, or people with an existing interest or background in technology. These interfaces often fall down once they are taken out of the small set of toy usage scenarios in which they were designed and tested and handed to everyday users.

This research investigates the challenges of using speech for programming, and evaluates ways to mitigate them, including conversational prompts, use of gesture and proximity data to avoid ambiguity, and provision of default behaviours that can be customised. In this project, we focus primarily on smart home scenarios, and we will carry out our studies in real domestic settings. Speech interfaces are increasingly being used in these scenarios, but there is no support for querying, debugging and altering the behaviours through speech.

We will recruit participants with no programming background, including older and disabled users, who are often highlighted as people who could benefit from smart home technology but are rarely included in studies of this sort. We will carry out interviews in people's homes to understand how they naturally describe rules for smart environments, taking into account speech, gesture and location. We will look for any errors or unclear elements in the rules they describe, and investigate how far prompts from researchers can help them express the rules clearly. We will also explore how far participants can customise default behaviours presented to them. This data will inform the design of a conversational interface that harnesses the approaches that worked with human prompts, which we will then test in real-world settings. Some elements of the system will be controlled by a human researcher, but the system will simulate the experience of interacting with an intelligent conversational interface. This will allow us to identify fruitful areas to pursue in developing fully functional conversational programming tools, which may also be useful in museums, education, agriculture and robotics.

Planned Impact

This research has potential to unlock engagement with computation in smart environments for a wide range of people and organisations. The primary beneficiaries are the project partners, and everyday users and organisations that use or develop IoT technologies for smart environments. The more intuitive end-user programming approaches that will be developed as a result of this research will make it easier for non-technical users to understand and change the behaviours of smart environments, which is an important step in increasing trust and uptake of the IoT technologies that power them. The potential benefits for end-users are extensive, as more intuitive tools mean greater control and greater customisability. There is also good potential for increasing understanding of automated environments, and understanding of computation more generally. Kinaesthetic and tangible activities are often used to support learning of computer science concepts, as many people find understanding concrete manifestations of an idea an important first step when developing abstract conceptual knowledge.

The first key group of potential industry beneficiaries are those offering smart home devices and software. To increase uptake of IoT technologies there is a need to increase trust in and understanding of them. The findings from this research will lead to the development of more intuitive tools that allow interrogation of default rules, live debugging and re-authoring. More intuitive tools can increase comprehension and sense of control, leading in turn to greater trust.

Other key beneficiaries include the developers behind intelligent personal assistants, such as Amazon Alexa and Google Assistant. These tools support users in running predefined behaviours, but at present users cannot edit or author new behaviours through the conversational interface. Instead, users must log in to a screen-based end-user programming tool such as IFTTT and compose a rule defining a new behaviour. Offering debugging, editing and authoring of behaviours through a conversational interface could reduce barriers to engagement, greatly simplify the implementation and testing loop, and increase trust by making the rules defining behaviours more transparent.

There is potential for broader impact beyond the smart home context. Within the scope of this project we actively investigate the potential for impact in the museum sector through meetings and evaluation workshops with heritage professionals and museum technology specialists. There is increasing interest in the use of embedded technologies to support visitor engagement in museums, but in many cases museums do not have staff with extensive computing backgrounds. This means that interactive exhibits are often installed and maintained by external companies, making it harder for heritage professionals with domain expertise to be actively involved in the design of exhibits, and making exhibits expensive and time-consuming to change once installed. There is potential for multimodal conversational interfaces to support museum staff in configuring digitally augmented tours and exhibits in situ.

In future work, there is good potential to explore applications in human-robot interaction and smart agriculture. The benefits of in situ authoring of rules and harnessing of contextual data could support users in each of these application areas. Additionally, there are potential educational technology applications. Given that the computing curriculum now starts at age 5, it is increasingly important to find new approaches to engaging with computation in a meaningful way, which do not require learning a formal syntax and which support the sensory-motor stage of learning, where embodied activities are important. Using the new approaches developed through this research, young children could engage with IoT concepts in an active way before they have developed writing skills.
 
Description Key findings are summarised below:

How end-users without programming experience understand and describe smart environment behaviours in their homes:
- Descriptions broadly map to the trigger-action rule format, but triggers are less rigid, more complex and more ambiguous than is often supported by existing platforms (see the sketch after this list).
- Descriptions often include example scenarios embedded into the context of the home.
- Important exceptions often only emerge after prompting by the system or a researcher.
- Specific devices and sensors are often referenced explicitly, with behaviours described as settings for a given device rather than as abstract rules.
- Some behaviours map more closely to process-based models of automation, particularly complex sequences designed to mimic natural routines for security purposes.
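
To make the contrast concrete, the sketch below shows roughly how a rule of this kind might be represented. It is purely illustrative: the class and field names are hypothetical and are not taken from any existing platform or from the project's own tools. The point is that participants' descriptions implied multiple triggers, exceptions and concrete example scenarios, rather than the single rigid trigger-action pair many platforms assume.

```python
# Illustrative sketch only: these class and field names are hypothetical and are
# not taken from the CONVER-SE implementation or any commercial platform.
from dataclasses import dataclass, field


@dataclass
class Trigger:
    """A condition over a named device or sensor, e.g. ambient light below a threshold."""
    device: str     # participants usually named a specific device or sensor
    condition: str  # e.g. "ambient_light < 20 lux"


@dataclass
class Rule:
    """A trigger-action rule with the extra structure participants' descriptions implied."""
    triggers: list[Trigger]   # several conditions, not a single rigid trigger
    actions: list[str]        # e.g. ["turn on living-room lamp"]
    exceptions: list[Trigger] = field(default_factory=list)  # often only surfaced after prompting
    example_scenario: str = ""  # descriptions were grounded in concrete home scenarios


# Roughly how "the lamp should come on when it gets dark, unless nobody is home"
# might be captured in this hypothetical structure:
lamp_rule = Rule(
    triggers=[Trigger("living-room light sensor", "ambient_light < 20 lux")],
    actions=["turn on living-room lamp"],
    exceptions=[Trigger("presence sensor", "nobody home")],
    example_scenario="Evenings when we're watching TV and it gets dark outside",
)
```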

User preferences for end-user programming in smart home contexts:
- Many users expressed preferences for information over automation, preferring notifications and queries of status over automated actions, and wanting to retain the power to veto actions.
- Automation is usually only preferred when a user is unable to carry out an action themselves, for example when they are asleep or not home, or because they have a vision or mobility impairment that makes manual control difficult or impossible.
- Users often described a wish to set up automated behaviours that would control the environment for other household members, particularly children and pets.
- Some users expressed a wish to be able to set up more complex routines by demonstrating or recording interlinked sequences of behaviours.

Design implications for voice user interface support for end-user programming in the home:
- Voice interaction can be helpful for queries, understanding current rules, simple edits and live debugging.
- It is possible to author simple rules from scratch using voice interaction, but this should not be the default mode for sighted users, as it places high cognitive demands on them.
- Highly structured stepwise authoring and editing is reasonably intuitive for users, but current natural language understanding approaches struggle to identify intents from shorter utterances (a brief illustrative sketch follows this list).
- Programming by demonstration with voice could support authoring of more complex routines.
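
As an illustration of the stepwise style referred to above, the sketch below walks through a two-turn authoring dialogue. The intent names, slot structure and the naive keyword matcher are assumptions introduced purely for illustration; they are not the natural language understanding approach evaluated in the project.

```python
# Hypothetical sketch of structured, stepwise rule authoring by voice.
# Intent names, slots and the keyword matcher are illustrative assumptions only.
from dataclasses import dataclass
from typing import Optional


@dataclass
class AuthoringState:
    """Slots filled step by step over several short turns, rather than one long utterance."""
    trigger: Optional[str] = None
    action: Optional[str] = None

    def next_prompt(self) -> str:
        if self.trigger is None:
            return "When should this happen?"
        if self.action is None:
            return "What should happen then?"
        return f"So: when {self.trigger}, {self.action}. Shall I save that rule?"


def classify(utterance: str) -> str:
    """Very rough intent guess; short fragments are genuinely ambiguous, as noted above."""
    text = utterance.lower()
    if any(word in text for word in ("when", "if", "whenever")):
        return "provide_trigger"
    if any(word in text for word in ("turn", "switch", "set", "play")):
        return "provide_action"
    return "unknown"


state = AuthoringState()
for utterance in ("when it gets dark in the living room", "turn the lamp on"):
    intent = classify(utterance)
    if intent == "provide_trigger":
        state.trigger = utterance
    elif intent == "provide_action":
        state.action = utterance
    print(state.next_prompt())
```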

Key future research questions identified:
- How can voice interaction support programming by demonstration for complex routines?
- To what extent can voice interaction support transparency of end-user programming in multi-user households?
- How does end-user programming for home automation reflect and influence power dynamics in families and multi-user households?
Exploitation Route The following research questions could be usefully pursued in academic and industry research:
- How can voice interaction support programming by demonstration for complex routines?
- To what extent can voice interaction support transparency of end-user programming in multi-user households?
- How does end-user programming for home automation reflect and influence power dynamics in families and multi-user households?

Those designing voice user interface support for end-user programming in the home should consider the implications of these findings for their work:
- Voice interaction can be helpful for queries, understanding current rules, simple edits and live debugging.
- It is possible to author simple rules from scratch using voice interaction, but this should not be the default mode for sighted users, as it places high cognitive demands on them.
- Highly structured stepwise authoring and editing is reasonably intuitive for users, but current natural language understanding approaches struggle to identify intents from shorter utterances.
- Programming by demonstration with voice could support authoring of more complex routines.
Sectors Communities and Social Services/Policy,Digital/Communication/Information Technologies (including Software),Healthcare

URL https://blogs.sussex.ac.uk/conver-se/
 
Description Findings and methods from this research have contributed to an emerging collaboration with an educational technology SME, which is exploring how natural language interactions can support debugging and understanding in the context of learning programming at secondary schools. So far this has led to meetings and workshops, and a joint funding application, with more applications planned. Covid-related delays to the analysis and write-up of the work mean that other new avenues for impact are still being explored.
First Year Of Impact 2022
Sector Digital/Communication/Information Technologies (including Software),Education
Impact Types Societal,Economic

 
Description Natural language support in an IDE - School Research Development Award
Amount £2,231 (GBP)
Organisation University of Sussex 
Sector Academic/University
Country United Kingdom
Start 07/2022 
End 07/2023
 
Title CONVER-SE Dialogue Manager 
Description Dialogue Manager (DM) is a C#/.NET application for 'Wizard of Oz' prototyping of voice user interfaces, which allows researchers to generate and play audio clips to simulate interaction (a minimal illustrative sketch of the core idea follows this entry).
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact This tool was used in the first study on the CONVER-SE project. Additionally, two other research groups have requested access to this tool to date, for use in their own research on voice user interfaces (one in the School of Psychology at University of Sussex and one in the School of Computer Science at University of Nottingham). 
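
The sketch below illustrates the core 'Wizard of Oz' idea behind a tool like this: a researcher selects pre-recorded prompts to play in response to what a participant says, so the participant experiences something like a live conversational interface. It is a minimal Python sketch written for illustration only; the actual Dialogue Manager is a C#/.NET application, and the clip names, file paths and choice of audio player here are assumptions.

```python
# Illustrative Python sketch of the Wizard-of-Oz idea behind a dialogue manager.
# The real tool is a C#/.NET application; clip names, paths and the audio player
# used here are assumptions made for this sketch.
import subprocess
from pathlib import Path

# The wizard has a bank of pre-recorded prompts rather than live speech synthesis.
CLIPS = {
    "greet": Path("clips/greeting.wav"),
    "ask_trigger": Path("clips/when_should_this_happen.wav"),
    "ask_action": Path("clips/what_should_happen.wav"),
    "confirm": Path("clips/confirm_rule.wav"),
}


def play(clip: Path) -> None:
    """Play a clip with an external player; swap in whatever audio backend is available."""
    subprocess.run(["afplay", str(clip)], check=False)  # e.g. macOS; use "aplay" on Linux


def wizard_loop() -> None:
    """The researcher types a clip key in response to what the participant says."""
    while True:
        key = input(f"Clip to play {sorted(CLIPS)} (or 'quit'): ").strip()
        if key == "quit":
            break
        if key in CLIPS:
            play(CLIPS[key])
        else:
            print("No such clip.")


if __name__ == "__main__":
    wizard_loop()
```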
 
Description Interview for international magazine 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact I was interviewed by a US reporter who was visiting Brighton to profile the city for an international audience in 'PC Mag' magazine.
Year(s) Of Engagement Activity 2018
URL https://uk.pcmag.com/news/118874/welcome-to-the-silicon-seaside
 
Description Invited Talk - Programming Tools for Non-Programmers, Nerd Nite Brighton - 19th July 2018 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Presented research from the CONVER-SE project to a general audience of 90 people at a monthly public event in Brighton City Centre. I gave a 20-minute talk and engaged in questions and discussions with the audience afterwards. After the talk I was approached by an industry professional, an academic I had not previously met, and a potential participant, all of whom expressed interest in finding out more about the project and how to get involved.
Year(s) Of Engagement Activity 2018
URL https://brighton.nerdnite.com/2018/07/05/nnb-53-chocolate-rhinos-programming/
 
Description Smart Homes event 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact 30 attendees, including representatives from industry and third sector organisations, as well as academics and undergraduate and postgraduate students, came to an event on designing smart homes with underrepresented user groups, which included a presentation on findings from the CONVER-SE project. There were a number of discussions about links with other projects and businesses, and conversations are ongoing (the event took place last month).
Year(s) Of Engagement Activity 2020
URL https://blogs.sussex.ac.uk/conver-se/symposium/