Expertise Determination for Glance-able Guidance

Lead Research Organisation: University of Bristol
Department Name: Computer Science


The aim of this project is to extend prior work on glanceable guidance by automatically determining expertise
from raw video, so that guides can be generated for novice users. Glanceable guidance differs significantly from
traditional Augmented Reality (AR), which requires guidance and annotations to be manually crafted: glanceable
guidance creates guides from raw, unscripted, wearable camera footage. Footage from wearable cameras,
particularly how-to guides, has become popular on sites such as YouTube. Although these collections of videos
are extremely useful, they are the product of a time-consuming editing process. Furthermore, these videos
are created by a mixture of experts and amateurs, and users need explicit prompts and basic prior
knowledge to know what to look for. Automatically determining the expertise of users in unscripted videos will
ensure that the videos displayed to users are helpful, while extending the number of tasks for which guidance can
be displayed.
Prior Work
In [1], Damen et al. proposed a method for automatic discovery of task-relevant objects - these are objects with
which a person interacts during task performance. By collecting data from multiple users, multiple instances of
interactions with the same object were captured. In [2], Leelasawassuk et al. followed the work with an approach
that estimates temporal and spatial attention periods from inertial measurements, trained by physically attaching
a wearable gaze tracker to Google Glass, and learning a regression model.
Expertise Determination
The aim of automatically determining expertise in video is to go beyond using a single simple success criterion,
such as completion time, as an evaluation metric of expertise. This will involve investigating
novel approaches to modelling expertise.
Previous attempts at expertise determination from video have been limited to highly structured tasks in highly
specific domains such as surgery [3]. These works often lack a fine-grained approach to determining the expertise level
of the person completing the task and use coarse categories such as beginner, intermediate or expert.
To begin extending this work I aim to create a method to rank the expertise level of individual performances of a
task, which works for multiple different tasks.
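As an illustration only, ranking performances from pairwise judgements could be sketched as below. This is a minimal sketch under stated assumptions, not the project's method: the hinge-style margin loss, the function names, the video identifiers and the scores are all hypothetical, and the scores stand in for the output of whatever model is eventually used.

```python
from typing import Dict, List, Tuple


def pairwise_margin_loss(score_better: float, score_worse: float,
                         margin: float = 1.0) -> float:
    """Hinge-style loss: zero once the better video outscores the worse by `margin`."""
    return max(0.0, margin - (score_better - score_worse))


def pairwise_accuracy(scores: Dict[str, float],
                      pairs: List[Tuple[str, str]]) -> float:
    """Fraction of annotated (better, worse) pairs that the scores order correctly."""
    correct = sum(1 for better, worse in pairs if scores[better] > scores[worse])
    return correct / len(pairs)


def rank_videos(scores: Dict[str, float]) -> List[str]:
    """Order video identifiers from highest to lowest predicted skill."""
    return sorted(scores, key=lambda video: scores[video], reverse=True)


# Hypothetical per-video skill scores (assumed model outputs, not real data).
scores = {"video_a": 2.5, "video_b": 1.0, "video_c": 1.8}

# Hypothetical annotations: the first video in each pair was judged more skilled.
pairs = [("video_a", "video_b"), ("video_a", "video_c"), ("video_b", "video_c")]

ranking = rank_videos(scores)
accuracy = pairwise_accuracy(scores, pairs)
```

Scoring each performance with a single number gives a finer-grained ordering than fixed categories, and pairwise accuracy measures agreement with the annotators without requiring an absolute skill label for any video.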
I will start by looking at a range of existing and new methods in visual learning, while considering their effect on
glanceable guides.
Overall this work aims to develop new methods and datasets for expertise determination in multiple tasks.
[1] Damen, Dima, Leelasawassuk, Teesid, Haines, Osian, Calway, Andrew and Mayol-Cuevas, Walterio
(2014). You-Do-I-Learn: Discovering Task Relevant Objects and their Modes of Interaction from Multi-User
Egocentric Video. British Machine Vision Conference (BMVC).
[2] Leelasawassuk, Teesid, Damen, Dima and Mayol-Cuevas, Walterio (2015). Estimating Visual Attention from
a Head Mounted IMU. International Symposium on Wearable Computers (ISWC).
[3] Zia, Aneeq, Sharma, Yachna, Bettadapura, Vinay, Sarin, Eric L., Ploetz, Thomas, Clements, Mark A. and
Essa, Irfan (2016). Automated video-based assessment of surgical skills for training and evaluation in medical
schools. International Journal of Computer Assisted Radiology and Surgery.



Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/N509619/1                                   01/10/2016  30/09/2021
1793135            Studentship   EP/N509619/1  12/09/2016  11/09/2020  Hazel Doughty
Description We have developed a method capable of accurately ranking participants' skill at a previously seen task from video. We have then extended this method so that it first picks out the important video segments that contribute to this judgement, as a human would.
Exploitation Route This method could be used to aid video retrieval. For instance, sites such as YouTube currently rely on user ratings to return useful videos. For instructional videos it is important that people learn from someone skilled at the task, so skill determination could help.

The rank-aware attention method from our most recent paper could also be used for other complex ranking tasks where attention is useful.
Sectors Digital/Communication/Information Technologies (including Software),Education

Title EPIC-Skills 2019 
Description EPIC-Skills 2019 is a dataset released with the publication of our paper 'Who's Better? Who's Best? Pairwise Deep Ranking for Skill Determination'. It contains a collection of videos from several tasks (dough-rolling, drawing and chopstick-using) ranked in terms of the skill displayed in the videos. There are roughly 40 videos per task. For these videos we collected annotations of video pairs indicating which video in the pair shows higher skill. We use these videos to perform research in automatic skill determination and to explore new areas for skill determination, namely daily living tasks.
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact It allowed us to train and validate the method in our paper 'Who's Better? Who's Best? Pairwise Deep Ranking for Skill Determination'. It has encouraged more work in skill determination and allowed others to validate their methods, e.g. 'Manipulation-skill Assessment from Videos with Spatial Attention Network' - Zhenqiang Li, Yifei Huang, Minjie Cai, Yoichi Sato (2018).