GLANCE: GLAnceable Nuances for Contextual Events

Lead Research Organisation: University of Bristol
Department Name: Computer Science

Abstract

This project will develop and validate exciting novel ways in which people can interact with the world via cognitive wearables - intelligent on-body computing systems that aim to understand the user and the context and, importantly, are prompt-less and useful. Specifically, we will focus on the automatic production and display of what we call glanceable guidance. Eschewing traditional, intricate 3D Augmented Reality approaches, whose usefulness has been difficult to demonstrate, glanceable guidance aims to synthesize the nuances of complex tasks into short snippets that are ideal for wearable computing systems, interfere less with the user, and are easier to learn and use.

There are two key research challenges. The first is to mine information from long, raw and unscripted wearable video of real user-object interactions in order to generate the glanceable supports. The second is to automatically detect the user's moments of uncertainty, during which support should be provided without an explicit prompt.

The project aims to address the following fundamental problems:
1. Improve the detection of the user's attention by robustly determining the periods of time that correspond to task-relevant object interactions from a continuous stream of wearable visual and inertial sensors.
2. Provide assistance only when it is needed by building models of the user, context and task from micro-interactions autonomously identified across multiple users, focusing on models that can facilitate guidance.
3. Identify and predict action uncertainty from wearable sensing, in particular gaze patterns and head motions (a minimal illustrative sketch follows this list).
4. Detect and weigh user expertise for the identification of task nuances, towards the optimal creation of real-time tailored guidance.
5. Design and deliver glanceable guidance that acts in a seamless and prompt-less manner during task performance with minimal interruptions, based on autonomously built models.
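As a purely illustrative sketch of objective 3 - not part of the project's published methods - one simple proxy for action uncertainty combines gaze dispersion with head-motion energy over a sliding window. All function names, window lengths and constants below are hypothetical assumptions chosen only to illustrate the kind of wearable signals involved.

```python
# Hypothetical sketch: a gaze/head-motion uncertainty proxy (illustration only).
# Assumptions (not from the project): gaze arrives as normalised (x, y) points,
# head motion as gyroscope magnitude, and scattered gaze combined with a
# relatively still head is treated as a crude "hesitation" signal.
import numpy as np

def uncertainty_score(gaze_xy: np.ndarray, gyro_mag: np.ndarray,
                      window: int = 60) -> np.ndarray:
    """Return a per-frame score in [0, 1]; higher means more likely uncertain."""
    n = len(gaze_xy)
    scores = np.zeros(n)
    for t in range(n):
        lo = max(0, t - window)
        g = gaze_xy[lo:t + 1]
        h = gyro_mag[lo:t + 1]
        dispersion = g.std(axis=0).mean()      # spread of recent gaze points
        stillness = 1.0 / (1.0 + h.mean())     # low head motion -> high value
        scores[t] = np.tanh(5.0 * dispersion) * stillness
    return scores

# Toy usage with synthetic signals (1,000 frames).
rng = np.random.default_rng(0)
gaze = rng.normal(0.5, 0.05, size=(1000, 2))          # mostly stable fixations
gaze[400:500] += rng.normal(0.0, 0.2, size=(100, 2))  # burst of scattered gaze
gyro = np.abs(rng.normal(0.2, 0.05, size=1000))
print(uncertainty_score(gaze, gyro)[450])             # elevated during the burst
```

In the project itself such signals would feed learned models rather than hand-set thresholds; the sketch only indicates the form of the sensor streams.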

GLANCE is underpinned by a rich program of experimental work and rigorous validation across a variety of interaction tasks and user groups. Populations to be tested include both skilled individuals and the general population, for tasks that include assembly, using novel equipment (e.g. an unfamiliar coffee maker), and repair (e.g. replacing a bicycle gear cable). It also tightly incorporates the development of working demonstrations.
In collaboration with our partners, the project will explore high-value impact cases related to healthcare, towards assisted living, and in industrial settings, focusing on assembly and maintenance tasks.

Our team is a collaboration between Computer Science, to develop the novel data mining and computer vision algorithms, and Behavioral Science, to understand when and how users need support.

Planned Impact

Receiving on-the-spot training, being able to automatically document industrial processes and being able to better understand users of advanced assistive technology have important and widespread positive implications across many industries.

It is often ignored that most things that are made, repaired or maintained involve non-scripted processes performed by a small number of expert, or at least skilled, individuals. This is of concern in many economies; in the UK, the lack of training and of the ability to increase productivity compared to other economies is of great concern according to the Bank of England [4].

Many industries within the manufacturing sector can benefit from methods that, with no interruption or extra burden on their workers, allow for the transfer of know-how. Systems that observe, learn and extract the relevant from the irrelevant will be of great help. Training is a prime example of where this process can play a significant role.

High technology sectors such as aerospace have a number of situations where word of mouth is a commonplace yet clearly brittle knowledge transfer method. Consider the case of a turbine engine manufactured by one company, sold to another and maintained by a third party. It is not uncommon that when an engine fails for the first time, six years or so after its construction, the small team of about a dozen individuals that built it won't be available to repair it. It can also be the case that no one has repaired a specific issue with such an engine before, so the first repair takes a substantial amount of time, skill and exploration, and therefore comes at high cost. Passing on these precious first few instances of knowledge to train others who will require it is of great potential benefit. The benefit increases further if it simplifies the transmission of this knowledge across geographical locations, languages and cultures. We have long-term aspirations for the impact of the work we propose, but our industry partner sees potential benefits in a number of situations in the short- to mid-term horizon (5-10 years).

Being able to distill what is important from observations of people doing things, and to present it in ways that are intuitive and unobtrusive, is what our project aims to build the foundations for, and this has applications beyond industry.

Many perfectly physically able and keen individuals of many ages sharing knowledge that can support one another in situ will have a huge impact on how empowered people feel. This is hinted at by the number of times people resort to searching the web for videos on how to do things, in many cases only to face a difficult sifting process to separate experts from amateurs from the irrelevant. Systems that can do this separation automatically will have great social and even cultural implications, as people will feel more confident in trying and doing things themselves.

In the longer term, cognitive wearable systems could be of benefit in supporting people with memory or other neurological conditions. Reminding patients how to do things in their daily routines, or supporting their rehabilitation, can have positive effects on self-esteem and recovery.

Overall, devising more useful methods that better understand people's uncertainty, and that build models to gain insight into issues such as task relevance or expertise, will have many applications anywhere anyone needs help to perform or to document actions.
 
Description We have developed new algorithms for the processing and parsing of wearable camera imagery. These have been published in Computer Vision conferences, and we will next move to further impact in this area.
Exploitation Route Our datasets are widely used in the Computer Vision community investigating egocentric perception, and they have also been used in Augmented Reality and Robotics.
Sectors Digital/Communication/Information Technologies (including Software), Education

 
Description The work we developed on skill determination from video is opening new directions for other researchers in Computer Vision to expand on video understanding. The papers published in this space at IEEE CVPR have been well cited. The research has also contributed to the creation of Computer Vision datasets that are novel to the community. This work has directly resulted in the EPIC-Skills and EPIC-Tent datasets, and contributed partially to the EPIC-Kitchens dataset, the latter led by D. Damen. These datasets have been widely used in Egocentric Vision research, including applications beyond Computer Vision such as Augmented Reality systems and Robotics. The team in this project has also organised multiple workshops in the EPIC@X series at major academic Computer Vision venues, including CVPR 2020, where we distributed funds to the Egocentric Vision community to create further datasets. The work in this project continues, with at least one new PhD student pursuing further work directly derived from this research.
First Year Of Impact 2017
Sector Digital/Communication/Information Technologies (including Software)
 
Description UMPIRE: United Model for the Perception of Interactions in visuoauditory REcognition
Amount £1,001,838 (GBP)
Funding ID EP/T004991/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 02/2020 
End 01/2025
 
Title EPIC Skills dataset 2018 
Description The first dataset that aims to capture varying levels of skill for people performing daily living tasks. 
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact This will allow us and researchers worldwide to study the problem of visually determining skill. 
 
Title EPIC-Kitchens dataset 
Description The largest dataset in first-person (egocentric) vision: multi-faceted, non-scripted recordings in native environments - i.e. the wearers' homes - capturing all daily activities in the kitchen over multiple days. Annotations are collected using a novel 'live' audio commentary approach. The dataset comprises 32 kitchens in 4 cities, recorded with a head-mounted camera: 55 hours of recording (Full HD, 60fps, 11.5M frames), multi-language narrations, 39,594 action segments, 454,255 object bounding boxes, 125 verb classes and 331 noun classes. 
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact This dataset is becoming the standard for egocentric research. To date (March 2020), it has been cited 143 times. 
URL https://epic-kitchens.github.io/2020
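As a purely illustrative sketch of working with action-segment annotations of the kind listed above: the file name and column names used below (verb_class, noun_class, start_frame, stop_frame) are assumptions and should be checked against the dataset documentation at the URL above.

```python
# Hypothetical sketch of summarising action-segment annotations.
# File name and column names are assumptions, not confirmed by this record.
import pandas as pd

def summarise_actions(csv_path: str) -> None:
    df = pd.read_csv(csv_path)
    df["num_frames"] = df["stop_frame"] - df["start_frame"]
    print(f"{len(df)} action segments")
    print(f"{df['verb_class'].nunique()} verb classes, "
          f"{df['noun_class'].nunique()} noun classes")
    print("mean segment length:", df["num_frames"].mean(), "frames")

# summarise_actions("EPIC_train_action_labels.csv")  # hypothetical file name
```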
 
Title EPIC-Tent 
Description An outdoor video dataset annotated with action labels, collected from 29 participants wearing two head-mounted cameras (GoPro and SMI eye tracker) while assembling a camping tent. In total, this is over 7 hours of recordings. Tent assembly includes manual interactions with non-rigid objects such as spreading the tent, securing guylines, reading instructions, and opening a tent bag. An interesting aspect of the dataset is that it reflects participants' proficiency in completing or understanding the task. This leads to participant differences in action sequences and action durations. Our dataset also has several new types of annotations for two synchronised egocentric videos. These include task errors, self-rated uncertainty and gaze position, in addition to the task action labels. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
URL https://data.bris.ac.uk/data/dataset/2ite3tu1u53n42hjfh3886sa86/
 
Description Collaboration with INRIA 
Organisation The National Institute for Research in Computer Science and Control (INRIA)
Country France 
Sector Public 
PI Contribution Collaboration and student exchange with Prof Ivan Laptev, Senior Researcher (Directeur de Recherche) in the WILLOW project-team. The collaboration led to an internship for student Hazel Doughty, who visited their team; this resulted in follow-up work for an accepted CVPR paper.
Collaborator Contribution The interaction resulted in new ideas for integrating language and video, specifically on using adverbs.
Impact CVPR Paper: Hazel Doughty, Ivan Laptev, Walterio W. Mayol-Cuevas, Dima Damen. Action Modifiers: Learning From Adverbs in Instructional Videos. CVPR 2020: 865-875
Start Year 2020
 
Description Collaboration with Kyoto University 
Organisation University of Kyoto
Country Japan 
Sector Academic/University 
PI Contribution This collaboration with the group of Prof Yuichi Nakamura has led to exchange visits from Kyoto to Bristol and also to joint publications. The collaboration is still ongoing.
Collaborator Contribution Provided material artefacts for data collection, provided work for papers, and overall research discussions for new directions.
Impact Joint publications from the collaboration:
Longfei Chen, Yuichi Nakamura, Kazuaki Kondo, Walterio W. Mayol-Cuevas: Hotspot Modeling of Hand-Machine Interaction Experiences from a Head-Mounted RGB-D Camera. IEICE Transactions 102-D(2): 319-330 (2019).
Longfei Chen, Yuichi Nakamura, Kazuaki Kondo, Dima Damen, Walterio W. Mayol-Cuevas: Hotspots Integrating of Expert and Beginner Experiences of Machine Operations through Egocentric Vision. MVA 2019: 1-6.
Longfei Chen, Kazuaki Kondo, Yuichi Nakamura, Dima Damen, Walterio W. Mayol-Cuevas: Hotspots detection for machine operation in egocentric vision. MVA 2017: 223-226.
Start Year 2017
 
Title Action modifiers code for CVPR 2020 paper 
Description Code and data for the CVPR 2020 paper 'Action Modifiers: Learning from Adverbs in Instructional Videos'. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact Early days for this work. 
URL https://github.com/hazeld/action-modifiers
 
Title Rank aware attention network 
Description Deep computational model for the ranking of skill based on attention. This accompanies the CVPR 2019 paper: Hazel Doughty, Walterio W. Mayol-Cuevas, Dima Damen: The Pros and Cons: Rank-Aware Temporal Attention for Skill Determination in Long Videos. CVPR 2019: 7862-7871. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact This software accompanies an academic paper. The impact is still to be measured. 
URL https://github.com/hazeld/rank-aware-attention-network
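As a purely illustrative sketch of the general idea behind ranking skill with attention - not the released implementation at the URL above - pairs of videos can be compared via attention-pooled features and a margin ranking loss; the dimensions, names and toy data below are assumptions for illustration only.

```python
# Hypothetical sketch: pairwise skill ranking with temporal-attention pooling,
# trained with a margin ranking loss so the "better" video scores higher.
# This is not the paper's released model; consult the repository for that.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionRanker(nn.Module):
    def __init__(self, feat_dim: int = 1024):
        super().__init__()
        self.attn = nn.Linear(feat_dim, 1)   # per-segment attention logit
        self.score = nn.Linear(feat_dim, 1)  # skill score head

    def forward(self, segments: torch.Tensor) -> torch.Tensor:
        # segments: (batch, time, feat_dim) pre-extracted video features
        weights = torch.softmax(self.attn(segments), dim=1)  # (B, T, 1)
        pooled = (weights * segments).sum(dim=1)             # (B, feat_dim)
        return self.score(pooled).squeeze(-1)                # (B,)

model = AttentionRanker()
better = torch.randn(4, 20, 1024)   # toy features of higher-skill videos
worse = torch.randn(4, 20, 1024)    # toy features of lower-skill videos
target = torch.ones(4)              # "first input should rank higher"
loss = F.margin_ranking_loss(model(better), model(worse), target, margin=1.0)
loss.backward()
print(float(loss))
```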
 
Description EPIC@CVPR2019 workshop organisation 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Organisation of this international workshop, through which we are leading the egocentric perception community. This event was an official workshop at CVPR 2019.
Year(s) Of Engagement Activity 2019
URL https://www.eyewear-computing.org/EPIC_CVPR19/
 
Description EPIC@CVPR20: The Sixth International Workshop on Egocentric Perception, Interaction and Computing, in conjunction with CVPR 2020
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This is a workshop we organised on the topic of egocentric perception at a top Computer Vision venue, CVPR 2020.
Year(s) Of Engagement Activity 2020
URL http://www.eyewear-computing.org/EPIC_CVPR20/
 
Description EPIC@ECCV20: The Seventh International Workshop on Egocentric Perception, Interaction and Computing, in conjunction with ECCV 2020 - Workshop organisation 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact We organised another instance of this workshop on egocentric perception at a top Computer Vision venue, ECCV 2020.
Year(s) Of Engagement Activity 2020
URL http://www.eyewear-computing.org/EPIC_ECCV20/
 
Description EPIC@ICCV2019 workshop organisation 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Organisation of this international workshop, through which we are leading the egocentric perception community. This event was an official workshop at ICCV 2019.
Year(s) Of Engagement Activity 2019
URL https://www.eyewear-computing.org/EPIC_ICCV19/
 
Description Egocentric Perception Interaction and Computing Workshop at ICCV 2017 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact We led the organisation of this workshop, which we intend to run as a series over the duration of the project and beyond. This new EPIC@X series of workshops aims to bring together the various communities relevant to egocentric perception, including Computer Vision, Multimedia, HCI and the Visual Sciences, and is planned to be held at the major conferences in these fields. EPIC@ICCV will accept Full Papers for novel work, and Extended Abstracts for ongoing or already published work. Both research and application works related to Egocentric Perception, Interaction and Computing are encouraged, including those that can be demonstrated or are in the prototype stages. We co-organised this with colleagues from universities in the US, Italy and Germany, to whom we are reaching out for expanded impact and collaboration.
Year(s) Of Engagement Activity 2017
URL http://www.eyewear-computing.org/EPIC_ICCV17/
 
Description Inaugural Keynote, Encuentro Nacional de Computación (ENC) Conference 2021 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Opening Keynote at the main annual event organized by the Mexican Society of Computer Sciences. This is the principal academic event in Computer Science in Mexico. It is attended (this year online) by students from undergraduate to postdoctoral level, as well as by principal investigators and the scientific leadership of CS in Mexico.
Year(s) Of Engagement Activity 2021
URL http://computo.fismat.umich.mx/enc2021/
 
Description Keynote at 2nd Workshop on Applications of Egocentric Vision EgoApp. 25th International Conference on Pattern Recognition (ICPR 2020) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Keynote at specialist workshop
Year(s) Of Engagement Activity 2021
URL https://egoappworkshop2020.wordpress.com/
 
Description Keynote at Assistive Computer Vision and Robotics (ACVR) workshop, Aug 28th, ECCV 2020. 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Keynote at specialist workshop
Year(s) Of Engagement Activity 2020
URL https://iplab.dmi.unict.it/acvr2020/
 
Description Keynote at Mexican Academy of Computing (Academia Mexicana de Computacion), Annual event 2018 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Keynote at the annual event of this association of academics in Mexico
Year(s) Of Engagement Activity 2018
 
Description Keynote at XR Day Exploring the future of XR design and technology, University of Washington. USA 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Keynote at specialist event
Year(s) Of Engagement Activity 2020
 
Description Organization of Workshop on egocentric activity EPIC @ ECCV 2018 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This is another workshop organised on the topic of Egocentric Perception, Interaction and Computing. It is the second in the series, with another one coming up in 2019. These workshops are the only events dedicated to the topic of the grant and attract an international audience at top venues.
Year(s) Of Engagement Activity 2018
URL http://www.eyewear-computing.org/EPIC_ECCV18/