GLANCE: GLAnceable Nuances for Contextual Events

Lead Research Organisation: University of Bristol
Department Name: Computer Science

Abstract

This project will develop and validate exciting novel ways in which people can interact with the world via cognitive wearables - intelligent on-body computing systems that aim to understand the user and the context and, importantly, are prompt-less and useful. Specifically, we will focus on the automatic production and display of what we call glanceable guidance. Eschewing traditional, intricate 3D Augmented Reality approaches, whose usefulness has been difficult to demonstrate, glanceable guidance aims to synthesize the nuances of complex tasks into short snippets that are ideal for wearable computing systems, interfere less with the user, and are easier to learn and use.

There are two key research challenges. The first is to mine information from long, raw and unscripted wearable video of real user-object interactions in order to generate the glanceable supports. The second is to automatically detect the user's moments of uncertainty, during which support should be provided without an explicit prompt.

The project aims to address the following fundamental problems:
1. Improve the detection of the user's attention by robustly determining the periods of time that correspond to task-relevant object interactions from a continuous stream of wearable visual and inertial sensors.
2. Provide assistance only when it is needed by building models of the user, context and task from autonomously identified micro-interactions by multiple users, focusing on models that can facilitate guidance.
3. Identify and predict action uncertainty from wearable sensing, in particular gaze patterns and head motions.
4. Detect and weigh user expertise for the identification of task nuances towards the optimal creation of real-time tailored guidance.
5. Design and deliver glanceable guidance that acts in a seamless and prompt-less manner during task performance with minimal interruptions, based on autonomously built models.

GLANCE is underpinned by a rich programme of experimental work and rigorous validation across a variety of interaction tasks and user groups. Populations to be tested include both skilled individuals and the general population, for tasks that include assembly, using novel equipment (e.g. an unknown coffee maker), and repair (e.g. replacing a bicycle gear cable). The project also tightly incorporates the development of working demonstrations.
In collaboration with our partners, the project will explore high-value impact cases related to health care, towards assisted living, and in industrial settings, focusing on assembly and maintenance tasks.

Our team is a collaboration between Computer Science, to develop the novel data mining and computer vision algorithms, and Behavioural Science, to understand when and how users need support.

Planned Impact

Receiving on-the-spot training, being able to automatically document industrial processes, and being able to better understand users of advanced assistive technology have important and widespread positive implications across many industries.

It is often overlooked that most things that are made, repaired or maintained involve unscripted processes performed by a small number of expert, or at least skilled, individuals. This is a concern in many economies; in the UK, the lack of training and the failure to increase productivity compared to other economies is of great concern according to the Bank of England [4].

Many industries within the manufacturing sector can benefit from methods that, with no interruption or extra burden on their workers, allow the transfer of know-how. Systems that observe, learn, and separate the relevant from the irrelevant will be of great help. Training is a prime example of where this process can play a significant role.

High-technology sectors such as aerospace have a number of situations where word of mouth is a commonplace yet clearly brittle knowledge transfer method. Consider the case of a turbine engine manufactured by one company, sold to another and maintained by a third party. It is not uncommon that when an engine fails for the first time, six years or so after its construction, the small team of about a dozen individuals who built it won't be available to repair it. It can also be the case that no one has repaired a specific issue with such an engine before, so the first repair will demand a substantial amount of time, skill and exploration, and thus carry a high cost. Passing on these first, precious few instances of knowledge to train others who will require it is of great potential benefit. The benefit increases further if it simplifies the transmission of this knowledge across geographical locations, languages and cultures. We have long-term aspirations for the impact of the work we propose, but our industry partner sees potential benefits in a number of situations in the short-to-mid-term horizon (5-10 years).

Being able to distill from observations of people doing things what is important, and being able to present it in ways that are intuitive and unobtrusive, is what our project aims to build the foundations for; this has applications beyond industry.

Many perfectly physically able and keen individuals of all ages sharing knowledge that supports one another in situ will have a huge impact on how empowered people feel. This is hinted at by the number of times people resort to searching the web for videos on how to do things, in many cases only to face a difficult sifting process of separating the experts from the amateurs and the irrelevant. Systems that can do this separation automatically will have great social and even cultural implications, as people will feel more confident trying to do things themselves.

In the longer term, cognitive wearable systems could be of benefit in supporting memory or other neurological conditions. Reminding patients how to carry out their daily routines, or supporting their rehabilitation, can have positive effects on self-esteem and recovery.

Overall, devising more useful methods that better understand people's uncertainty, and that build models to gain insight into issues such as task relevance or expertise, will have many applications anywhere anyone needs help to do or to document actions.
 
Description We have developed new algorithms for the processing and parsing of wearable camera imagery. These have been published at Computer Vision conferences, and we will next pursue further impact in this area.
Exploitation Route Our datasets are widely used in the Computer Vision community investigating egocentric perception, and have also been used in Augmented Reality and Robotics.
Sectors Digital/Communication/Information Technologies (including Software), Education

 
Description The work we developed on skill determination from video is opening new directions for other researchers in Computer Vision to expand video understanding. The papers published in this space at IEEE CVPR have been well cited. The research has also contributed to the creation of Computer Vision datasets that are novel to the community. This work has directly resulted in the EPIC-Skills dataset and the EPIC-Tent dataset, and partially in the EPIC-Kitchens dataset, the latter led by D. Damen. These datasets have together been widely used in egocentric vision research, including applications beyond Computer Vision such as Augmented Reality systems and Robotics. The team in this project has also organised multiple workshops in the EPIC@X series at major academic Computer Vision venues, including CVPR 2020, where we distributed funds to the egocentric vision community to create further datasets. The work in this project continues, with at least one new PhD student pursuing further work directly derived from this research.
First Year Of Impact 2017
Sector Digital/Communication/Information Technologies (including Software)
 
Description UMPIRE: United Model for the Perception of Interactions in visuoauditory REcognition
Amount £1,001,838 (GBP)
Funding ID EP/T004991/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 02/2020 
End 01/2025
 
Title EPIC Skills dataset 2018 
Description The first dataset that aims to capture varying levels of Skill for people doing daily living tasks. 
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact This will allow us and researchers worldwide to study the problem of visually determining skill. 
 
Title EPIC-Kitchens dataset 
Description The largest dataset in first-person (egocentric) vision; multi-faceted non-scripted recordings in native environments - i.e. the wearers' homes, capturing all daily activities in the kitchen over multiple days. Annotations are collected using a novel `live' audio commentary approach. 32 kitchens in 4 cities; head-mounted camera; 55 hours of recording (Full HD, 60fps); 11.5M frames; multi-language narrations; 39,594 action segments; 454,255 object bounding boxes; 125 verb classes; 331 noun classes 
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact This dataset is becoming the standard for egocentric research. To date (March 2020), it has been cited 143 times. 
URL https://epic-kitchens.github.io/2020
 
Title EPIC-Tent 
Description An outdoor video dataset annotated with action labels, collected from 29 participants wearing two head-mounted cameras (GoPro and SMI eye tracker) while assembling a camping tent. In total, this is over 7 hours of recordings. Tent assembly includes manual interactions with non-rigid objects such as spreading the tent, securing guylines, reading instructions, and opening a tent bag. An interesting aspect of the dataset is that it reflects participants' proficiency in completing or understanding the task. This leads to participant differences in action sequences and action durations. Our dataset also has several new types of annotations for two synchronised egocentric videos. These include task errors, self-rated uncertainty and gaze position, in addition to the task action labels. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
URL https://data.bris.ac.uk/data/dataset/2ite3tu1u53n42hjfh3886sa86/
 
Description Collaboration with INRIA 
Organisation The National Institute for Research in Computer Science and Control (INRIA)
Country France 
Sector Public 
PI Contribution Collaboration and exchange of a student with Prof Ivan Laptev, Senior Researcher (Directeur de Recherche), WILLOW project-team. The collaboration led to an internship by student Hazel Doughty, who visited their team, and resulted in follow-up work for an accepted CVPR paper.
Collaborator Contribution The interaction resulted in new ideas to integrate language and video. Specifically on using adverbs.
Impact CVPR Paper: Hazel Doughty, Ivan Laptev, Walterio W. Mayol-Cuevas, Dima Damen. Action Modifiers: Learning From Adverbs in Instructional Videos. CVPR 2020: 865-875
Start Year 2020
 
Description Collaboration with Kyoto University 
Organisation University of Kyoto
Country Japan 
Sector Academic/University 
PI Contribution This collaboration with the group of Prof Yuichi Nakamura has led to exchange visits from Kyoto to Bristol and also to joint publications. The collaboration is still ongoing.
Collaborator Contribution Provided material artefacts for data collection, provided work for papers, and overall research discussions for new directions.
Impact Longfei Chen, Yuichi Nakamura, Kazuaki Kondo, Walterio W. Mayol-Cuevas: Hotspot Modeling of Hand-Machine Interaction Experiences from a Head-Mounted RGB-D Camera. IEICE Transactions 102-D(2): 319-330 (2019)
Longfei Chen, Yuichi Nakamura, Kazuaki Kondo, Dima Damen, Walterio W. Mayol-Cuevas: Hotspots Integrating of Expert and Beginner Experiences of Machine Operations through Egocentric Vision. MVA 2019: 1-6
Longfei Chen, Kazuaki Kondo, Yuichi Nakamura, Dima Damen, Walterio W. Mayol-Cuevas: Hotspots detection for machine operation in egocentric vision. MVA 2017: 223-226
Start Year 2017
 
Title Action modifiers code for CVPR 2020 paper 
Description Code and data for the CVPR 2020 paper 'Action Modifiers: Learning from Adverbs in Instructional Videos'. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact Early days for this work. 
URL https://github.com/hazeld/action-modifiers
 
Title Rank aware attention network 
Description Deep computational model for the ranking of skill based on attention. This accompanies the CVPR 2019 paper: Hazel Doughty, Walterio W. Mayol-Cuevas, Dima Damen: The Pros and Cons: Rank-Aware Temporal Attention for Skill Determination in Long Videos. CVPR 2019: 7862-7871. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact This software accompanies an academic paper. The impact is still to be measured. 
URL https://github.com/hazeld/rank-aware-attention-network
 
Description EPIC@CVPR2019 workshop organisation 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Organisation of this international workshop, through which we are leading the egocentric perception community. This event was an official workshop at CVPR 2019.
Year(s) Of Engagement Activity 2019
URL https://www.eyewear-computing.org/EPIC_CVPR19/
 
Description EPIC@CVPR20: The Sixth International Workshop on Egocentric Perception, Interaction and Computing in conjunction with CVPR 2020; 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This is a workshop we organised on the topic of egocentric perception at a top Computer Vision venue: CVPR 2020.
Year(s) Of Engagement Activity 2020
URL http://www.eyewear-computing.org/EPIC_CVPR20/
 
Description EPIC@ECCV20: The Seventh International Workshop on Egocentric Perception, Interaction and Computing in conjunction with ECCV 2020; -Workshop organisation 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact We organised another instance of this workshop on egocentric perception at a top Computer Vision venue ECCV 2020.
Year(s) Of Engagement Activity 2020
URL http://www.eyewear-computing.org/EPIC_ECCV20/
 
Description EPIC@ICCV2019 workshop organisation 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Organisation of this international workshop, through which we are leading the egocentric perception community. This event was an official workshop at ICCV 2019.
Year(s) Of Engagement Activity 2019
URL https://www.eyewear-computing.org/EPIC_ICCV19/
 
Description Egocentric Perception Interaction and Computing Workshop at ICCV 2017 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact We led the organisation of this workshop, which we intend to run as a series over the duration of the project and beyond. This new EPIC@X series of workshops aims to bring together the various communities relevant to egocentric perception, including Computer Vision, Multimedia, HCI and the Visual Sciences, and is planned to be held at the major conferences in these fields. EPIC@ICCV will accept Full Papers for novel work and Extended Abstracts for ongoing or already published work. Both research and application works related to Egocentric Perception, Interaction and Computing are encouraged, including those that can be demonstrated or are in the prototype stage. We co-organised this with colleagues from universities in the US, Italy and Germany, with whom we are reaching out for expanded impact and collaboration.
Year(s) Of Engagement Activity 2017
URL http://www.eyewear-computing.org/EPIC_ICCV17/
 
Description Inaugural Keynote, Encuentro Nacional de Computación (ENC) Conference 2021 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Opening keynote at the main annual event organised by the Mexican Society of Computer Sciences. This is the principal academic event in Computer Science in Mexico. It is attended (this year online) by students from undergraduate to postdoctoral level, as well as by principal investigators and the scientific leadership of CS in Mexico.
Year(s) Of Engagement Activity 2021
URL http://computo.fismat.umich.mx/enc2021/
 
Description Keynote at 2nd Workshop on Applications of Egocentric Vision EgoApp. 25th International Conference on Pattern Recognition (ICPR 2020) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Keynote at specialist workshop
Year(s) Of Engagement Activity 2021
URL https://egoappworkshop2020.wordpress.com/
 
Description Keynote at Assistive Computer Vision and Robotics (ACVR) workshop, Aug 28th, ECCV 2020. 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Keynote at specialist workshop
Year(s) Of Engagement Activity 2020
URL https://iplab.dmi.unict.it/acvr2020/
 
Description Keynote at Mexican Academy of Computing (Academia Mexicana de Computacion), Annual event 2018 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Keynote at the annual event of this association of academics in Mexico
Year(s) Of Engagement Activity 2018
 
Description Keynote at XR Day Exploring the future of XR design and technology, University of Washington. USA 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Keynote at specialist event
Year(s) Of Engagement Activity 2020
 
Description Organization of Workshop on egocentric activity EPIC @ ECCV 2018 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This is another workshop organised on the topic of Egocentric Perception, Interaction and Computing. It is the second in the series, and another is coming up in 2019. These workshops are the only events dedicated to the topic of the grant and attract an international audience at top venues.
Year(s) Of Engagement Activity 2018
URL http://www.eyewear-computing.org/EPIC_ECCV18/