UMPIRE: United Model for the Perception of Interactions in visuoauditory REcognition

Lead Research Organisation: University of Bristol

Department Name: Computer Science

Abstract

Humans interact with tens of objects daily, at home (e.g. cooking/cleaning) or outdoors (e.g. ticket machines/shopping bags), during working (e.g. assembly/machinery) or leisure hours (e.g. playing/sports), individually or collaboratively. When observing people interacting with objects, our vision, assisted by the sense of hearing, is the main tool to perceive these interactions. Let's take the example of boiling water in a kettle. We observe the actor press a button and wait, we hear the water boil and see the kettle's light go off, before the water is used for, say, preparing tea. The perception process is formed from understanding intentional interactions (called ideomotor actions) as well as reactive actions to dynamic stimuli in the environment (referred to as sensorimotor actions). As observers, we understand and can ultimately replicate such interactions using our sensory input, along with our underlying complex cognitive processes of event perception. Evidence in behavioural sciences demonstrates that these human cognitive processes are highly modularised, and these modules collaborate to achieve our outstanding human-level perception.

However, current approaches in artificial intelligence are lacking in their modularity and accordingly their capabilities. To achieve human-level perception of object interactions, including online perception when the interaction results in mistakes (e.g. water is spilled) or risks (e.g. boiling water is spilled), this fellowship focuses on informing computer vision and machine learning models, including deep learning architectures, from well-studied cognitive behavioural frameworks.

Deep learning architectures have achieved superior performance, compared to their hand-crafted predecessors, on video-level classification; however, their performance on fine-grained understanding within the video remains modest. Current models are easily fooled by similar motions or incomplete actions, as shown by recent research. This fellowship focuses on empowering these models through modularisation, a principle proven since the 50s in Fodor's Modularity of Mind, and frequently studied by cognitive psychologists in controlled lab environments. Modularity of high-level perception, along with the power of deep learning architectures, will bring a previously unexplored understanding to video analysis.

The targeted perception, of daily and rare object interactions, will lay the foundations for applications including assistive technologies using wearable computing, and robot imitation learning. We will work closely with three industrial partners to pave potential knowledge transfer paths to applications.

Additionally, the fellowship will actively engage international researchers through workshops, benchmarks and public challenges on large datasets, to encourage other researchers to address problems related to fine-grained perception in video understanding.

Planned Impact

The fellowship focuses on learning a model for understanding human-object interactions, using visual and auditory sensors, with novel capabilities. The model will be capable of understanding the actor's hierarchy of goals and predicting upcoming interactions. The model will also be able to map the perceived interaction into a set of steps that could be replicated by a robot, tested within a simulated environment.

By enhancing the capabilities of computer vision models for recognising human-object interaction, the fellowship has wide-ranging potential impact on future technologies. The economic and societal impacts are intertwined here: industry would be the prime beneficiary, building new technology, while individuals would be the end users. I summarise the potential through three application areas, impactful on the UK's national capabilities in several industries, and opening opportunities previously unexplored.

1) Assistive Technologies
Every individual can benefit from assistive technologies for object interactions. For example, reminding a person whether they had added salt to their meal or securely closed a water tap are capabilities of the UMPIRE model. Further assistance specialised for the elderly or people with impairments can be envisaged, where alarms are raised in cases of unsafe interactions. Several start-ups have attempted to use assistive technologies in daily interactions. These, however, rely on specialised sensors integrated with every instrument (e.g. one sensor per tap to detect running water). Instead, this project promises human-level cognition using general visuo-auditory sensors, not specialised for the action. Through a model that can understand and detect the interaction's consequences and changes to the environment (e.g. if water is still pouring then the water source has not been secured), the potential for assistive technologies will be widely enhanced. To realise this impact, the fellowship will engage with the Samsung AI Centre Cambridge, where assistive wearable technologies are under development.

2) Robotics and Beyond
A key capability of the UMPIRE model is actionable perception, i.e. a step-by-step procedure for an artificial agent to replicate the object interaction. This capability will be impactful to people working on vision for robotics. Teaching a robot how to 'open a can' by demonstrating the interaction is a main objective for effective household robotics. In this fellowship, I work closely with NVIDIA, originators of the open-source simulation development kits Isaac and PhysX, to prepare for this impact.

3) Entertainment and Gaming
Virtual and augmented reality games can now integrate a three-dimensional avatar in our home, running around our sofas and tables. However, object interaction perception would enhance the ability to integrate these games with our everyday tasks, combining life with fun. Through perceiving object interactions, avatars would be able to simulate opening your kitchen tap, with augmented water flowing. Currently, such potential requires hand-coded graphics. Using a model for interaction perception would enable novel entertainment applications.

In this fellowship, I will engage with the first two impact areas, but note gaming as a potential for further exploration. Due to the large commercial potential, the fellowship will have a commercialisation plan, developed through consultation with Ultrahaptics and SAIC towards a spin-out and/or knowledge transfer.

In addition to the economic and societal impact, the fellowship has an impact on integrating two very active research communities, particularly in the UK: cognitively-inspired human behaviour and data-driven computer vision. New research directions can emerge introducing tools for data-driven research to cognitive psychologists.

Funded Value:

£1,001,838

Funded Period:

Feb 20 - Jan 25

Funder:

EPSRC

Project Status:

Active

Project Category:

Fellowship

Project Reference:

EP/T004991/1

Principal Investigator:

Dima Damen

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Artificial Intelligence (10%)

Human Communication in ICT (10%)

Image & Vision Computing (70%)

Vision & Senses - ICT appl. (10%)

Organisations

People	ORCID iD
Dima Damen (Principal Investigator / Fellow)
Iain Gilchrist (Researcher)

Publications

Damen D (2020) The EPIC-KITCHENS Dataset: Collection, Challenges and Baselines

Damen D (2020) Rescaling Egocentric Vision

Damen D (2021) Rescaling Egocentric Vision: Collection, Pipeline and Challenges for EPIC-KITCHENS-100 in International Journal of Computer Vision

Damen D (2021) The EPIC-KITCHENS Dataset: Collection, Challenges and Baselines. in IEEE transactions on pattern analysis and machine intelligence

Darkhalil A (2022) EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations

Doughty H (2020) Action Modifiers: Learning From Adverbs in Instructional Videos

Flanagan K (2023) Learning Temporal Sentence Grounding From Narrated EgoVideos

Fragomeni A (2023) Computer Vision - ACCV 2022 - 16th Asian Conference on Computer Vision, Macao, China, December 4-8, 2022, Proceedings, Part IV

Grauman K (2022) Ego4D: Around the World in 3,000 Hours of Egocentric Video

Huh J (2023) Epic-Sounds: A Large-Scale Dataset of Actions that Sound

Key Findings
Impact Summary
Policy Influence
Further Funding
Research Databases and Models
Collaboration
Software and Technical Products
Engagement Activities


Description	(2024) To date, understanding videos has restricted itself to what is within the camera's field of view. Videos are partial representations of a continuing world outside the frame of the camera. Research as part of this award has made a transformative change to this limitation, by studying videos within the 3D scene, placing cameras, objects, humans and transformations in a world coordinate frame. Within this frame, fine-grained understanding of object transformations has been thoroughly explored, identifying not only interactions between hands and objects but also interactions between tools (e.g. a hand is holding a glove, that is in turn holding a pot, that contains food on a hot stove). This level of detailed understanding of object interactions is being tackled by this award, producing a more informative fine-grained representation of the world. Additionally, the world is a combination of multiple sensors. In this award, we understand both the complementary and the contradictory nature of audio and video when observing object interactions. At times, audio can provide an ambiguous signal when studied alone. At other times, audio can offer the only confident insight into the materials and properties of object interactions. Together, UMPIRE is building an interpretable model of every frame in a video. This is achieved through a combination of models, labelled data and software that is publicly available for researchers to explore new insights into understanding video.
Exploitation Route	Representing dynamic egocentric videos in 3D space has been achieved through a breakthrough approach with accompanying code: https://epic-kitchens.github.io/epic-fields/ The large-scale benchmark EPIC Sounds is now publicly available: https://epic-kitchens.github.io/epic-sounds/ The large-scale benchmark VISOR is now publicly available: http://epic-kitchens.github.io/VISOR/ The large-scale dataset EGO4D is now publicly available: https://ego4d-data.org/ Currently 5 published benchmarks are available for researchers to compare their methods on a hidden test set. Winners of the first round will be announced in June 2021 alongside a workshop in CVPR 2021: https://epic-kitchens.github.io/2021#challenges
Sectors	Creative Economy Digital/Communication/Information Technologies (including Software) Leisure Activities including Sports Recreation and Tourism


Description	One aspect of this project has now contributed to industrial impact: the recently released massive-scale dataset Ego4D. Read here: https://www.bristol.ac.uk/news/2021/october/ego4d.html
First Year Of Impact	2022
Sector	Digital/Communication/Information Technologies (including Software)
Impact Types	Economic


Description	Consultancy to DeepMind
Geographic Reach	Multiple continents/international
Policy Influence Type	Influenced training of practitioners or researchers


Description	Visual AI: An Open World Interpretable Visual Transformer
Amount	£5,912,096 (GBP)
Funding ID	EP/T028572/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	12/2020
End	11/2025


Title	EPIC Fields: Marrying 3D Geometry and Video Understanding
Description	We introduce EPIC Fields, an augmentation of EPIC-KITCHENS with 3D camera information. Similar to other datasets for neural rendering, EPIC Fields removes the complex and expensive step of reconstructing cameras using photogrammetry, and allows researchers to focus on more interesting modeling problems. We illustrate the challenge of photogrammetry in egocentric videos and propose several technical innovations to address them.
Type Of Material	Database/Collection of data
Year Produced	2023
Provided To Others?	Yes
Impact	Upcoming
URL	https://epic-kitchens.github.io/epic-fields/


Title	EPIC-KITCHENS VISOR
Description	We introduce VISOR, a new dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, which comes with a new set of challenges not encountered in current video segmentation datasets. Specifically, we need to ensure both short- and long-term consistency of pixel-level annotations as objects undergo transformative interactions, e.g. an onion is peeled, diced and cooked - where we aim to obtain accurate pixel-level annotations of the peel, onion pieces, chopping board, knife, pan, as well as the acting hands. VISOR introduces an annotation pipeline, AI-powered in parts, for scalability and quality. Data published under the Creative Commons Attribution-NonCommercial 4.0 International License.
Type Of Material	Database/Collection of data
Year Produced	2022
Provided To Others?	Yes
Impact	The first dataset for video object segmentations during object interactions where objects are undergoing drastic transformations. This work is testing the limit of previous approaches for tracking or segmentations. An ongoing open challenge is available to the research community.
URL	https://data.bris.ac.uk/data/dataset/2v6cgv1x04ol22qp9rm9x2j6a7/


Title	EPIC-KITCHENS-100
Description	Extended Footage for EPIC-KITCHENS dataset, to 100 hours of footage.
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	Yes
Impact	5 open benchmarks are available for researchers to utilise. To date, the dataset has been downloaded more than 2.3K times by researchers from 42 different countries.
URL	http://epic-kitchens.github.io/


Title	Epic-Sounds: A Large-scale Dataset of Actions That Sound
Description	We introduce EPIC-SOUNDS, a large-scale dataset of audio annotations capturing temporal extents and class labels within the audio stream of the egocentric videos. We propose an annotation pipeline where annotators temporally label distinguishable audio segments and describe the action that could have caused this sound. We identify actions that can be discriminated purely from audio, through grouping these free-form descriptions of audio into classes. For actions that involve objects colliding, we collect human annotations of the materials of these objects (e.g. a glass object being placed on a wooden surface), which we verify from visual labels, discarding ambiguities. Overall, EPIC-SOUNDS includes 78.4k categorised segments of audible events and actions, distributed across 44 classes as well as 39.2k non-categorised segments. We train and evaluate two state-of-the-art audio recognition models on our dataset, highlighting the importance of audio-only labels and the limitations of current models to recognise actions that sound.
Type Of Material	Database/Collection of data
Year Produced	2023
Provided To Others?	Yes
Impact	A standard benchmark for testing audio-visual models. Already being cited in major publications
URL	https://epic-kitchens.github.io/epic-sounds/


Title	Frame Attributions in Video Models - Interactive Dashboard
Description	Interactive Dashboard to assess the impact of individual frames in a video on current recognition models
Type Of Material	Data analysis technique
Year Produced	2020
Provided To Others?	Yes
Impact	-
URL	https://play-fair.willprice.dev


Description	ASPIRE - research network on human-centered vision and media technologies
Organisation	University of Tokyo
Country	Japan
Sector	Academic/University
PI Contribution	ASPIRE program for funding research visits between Bristol and Tokyo for the next 5 years focusing on human-centred vision. The costs of travel in both directions are covered by the University of Tokyo
Collaborator Contribution	First visiting student from Japan expected to arrive in July and first visit of postdoc from Bristol is planned for Nov.
Impact	-
Start Year	2024


Description	Adobe - Bristol
Organisation	Adobe Inc.
Country	United States
Sector	Private
PI Contribution	Funds for researching GenAI for action understanding
Collaborator Contribution	Supporting the hiring of a PhD student, offered via a charitable donation.
Impact	Ongoing
Start Year	2023


Description	Ego4D Consortium Collaboration
Organisation	Carnegie Mellon University
Country	United States
Sector	Academic/University
PI Contribution	Collecting the largest and most diverse dataset of egocentric videos
Collaborator Contribution	The project was inspired by my prior EPIC-KITCHENS project and I am a founding member of this consortium
Impact	Public dataset for research and commercial purposes of 3670 hours collected by 923 participants in 74 cities around the world
Start Year	2021


Description	Ego4D Consortium Collaboration
Organisation	Facebook
Country	United States
Sector	Private
PI Contribution	Collecting the largest and most diverse dataset of egocentric videos
Collaborator Contribution	The project was inspired by my prior EPIC-KITCHENS project and I am a founding member of this consortium
Impact	Public dataset for research and commercial purposes of 3670 hours collected by 923 participants in 74 cities around the world
Start Year	2021


Description	Ego4D Consortium Collaboration
Organisation	Georgia Institute of Technology
Country	United States
Sector	Academic/University
PI Contribution	Collecting the largest and most diverse dataset of egocentric videos
Collaborator Contribution	The project was inspired by my prior EPIC-KITCHENS project and I am a founding member of this consortium
Impact	Public dataset for research and commercial purposes of 3670 hours collected by 923 participants in 74 cities around the world
Start Year	2021


Description	Ego4D Consortium Collaboration
Organisation	Indian Institute of Technology Hyderabad
Country	India
Sector	Academic/University
PI Contribution	Collecting the largest and most diverse dataset of egocentric videos
Collaborator Contribution	The project was inspired by my prior EPIC-KITCHENS project and I am a founding member of this consortium
Impact	Public dataset for research and commercial purposes of 3670 hours collected by 923 participants in 74 cities around the world
Start Year	2021


Description	Ego4D Consortium Collaboration
Organisation	Indiana University Bloomington
Country	United States
Sector	Academic/University
PI Contribution	Collecting the largest and most diverse dataset of egocentric videos
Collaborator Contribution	The project was inspired by my prior EPIC-KITCHENS project and I am a founding member of this consortium
Impact	Public dataset for research and commercial purposes of 3670 hours collected by 923 participants in 74 cities around the world
Start Year	2021


Description	Ego4D Consortium Collaboration
Organisation	King Abdullah University of Science and Technology (KAUST)
Department	KAUST Supercomputing Laboratory
Country	Saudi Arabia
Sector	Academic/University
PI Contribution	Collecting the largest and most diverse dataset of egocentric videos
Collaborator Contribution	The project was inspired by my prior EPIC-KITCHENS project and I am a founding member of this consortium
Impact	Public dataset for research and commercial purposes of 3670 hours collected by 923 participants in 74 cities around the world
Start Year	2021


Description	Ego4D Consortium Collaboration
Organisation	Massachusetts Institute of Technology
Country	United States
Sector	Academic/University
PI Contribution	Collecting the largest and most diverse dataset of egocentric videos
Collaborator Contribution	The project was inspired by my prior EPIC-KITCHENS project and I am a founding member of this consortium
Impact	Public dataset for research and commercial purposes of 3670 hours collected by 923 participants in 74 cities around the world
Start Year	2021


Description	Ego4D Consortium Collaboration
Organisation	National University of Singapore
Country	Singapore
Sector	Academic/University
PI Contribution	Collecting the largest and most diverse dataset of egocentric videos
Collaborator Contribution	The project was inspired by my prior EPIC-KITCHENS project and I am a founding member of this consortium
Impact	Public dataset for research and commercial purposes of 3670 hours collected by 923 participants in 74 cities around the world
Start Year	2021


Description	Ego4D Consortium Collaboration
Organisation	Universidad de Los Andes, Chile
Country	Chile
Sector	Academic/University
PI Contribution	Collecting the largest and most diverse dataset of egocentric videos
Collaborator Contribution	The project was inspired by my prior EPIC-KITCHENS project and I am a founding member of this consortium
Impact	Public dataset for research and commercial purposes of 3670 hours collected by 923 participants in 74 cities around the world
Start Year	2021


Description	Ego4D Consortium Collaboration
Organisation	University of Catania
Country	Italy
Sector	Academic/University
PI Contribution	Collecting the largest and most diverse dataset of egocentric videos
Collaborator Contribution	The project was inspired by my prior EPIC-KITCHENS project and I am a founding member of this consortium
Impact	Public dataset for research and commercial purposes of 3670 hours collected by 923 participants in 74 cities around the world
Start Year	2021


Description	Ego4D Consortium Collaboration
Organisation	University of Minnesota
Country	United States
Sector	Academic/University
PI Contribution	Collecting the largest and most diverse dataset of egocentric videos
Collaborator Contribution	The project was inspired by my prior EPIC-KITCHENS project and I am a founding member of this consortium
Impact	Public dataset for research and commercial purposes of 3670 hours collected by 923 participants in 74 cities around the world
Start Year	2021


Description	Ego4D Consortium Collaboration
Organisation	University of Pennsylvania
Country	United States
Sector	Academic/University
PI Contribution	Collecting the largest and most diverse dataset of egocentric videos
Collaborator Contribution	The project was inspired by my prior EPIC-KITCHENS project and I am a founding member of this consortium
Impact	Public dataset for research and commercial purposes of 3670 hours collected by 923 participants in 74 cities around the world
Start Year	2021


Description	Ego4D Consortium Collaboration
Organisation	University of Tokyo
Country	Japan
Sector	Academic/University
PI Contribution	Collecting the largest and most diverse dataset of egocentric videos
Collaborator Contribution	The project was inspired by my prior EPIC-KITCHENS project and I am a founding member of this consortium
Impact	Public dataset for research and commercial purposes of 3670 hours collected by 923 participants in 74 cities around the world
Start Year	2021


Description	University of Oxford - Audio-visual Fusion for Egocentric Videos
Organisation	University of Oxford
Department	Department of Engineering Science
Country	United Kingdom
Sector	Academic/University
PI Contribution	This collaboration started earlier than the award but continues to form strong research impacts during the award. Listed are the contributions within this award.
Collaborator Contribution	BMVC2021 Paper and Associated Codebase and Models. EPIC-Sounds Dataset - Published in ICASSP 2023.
Impact	(2019) E Kazakos, A Nagrani, A Zisserman, D Damen. EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition. International Conference on Computer Vision (ICCV). (2021) E Kazakos, A Nagrani, A Zisserman, D Damen. Slow-Fast Auditory Streams for Audio Recognition. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). (2021) E Kazakos, J Huh, A Nagrani, A Zisserman, D Damen. With a Little Help from my Temporal Context: Multimodal Egocentric Action Recognition. British Machine Vision Conference (BMVC). (2023) J Huh, J Chalk, E Kazakos, D Damen, A Zisserman. EPIC-SOUNDS: A Large-Scale Dataset of Actions that Sound. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).
Start Year	2018


Description	VISOR Benchmark: VIdeo Segmentations and Object Relations
Organisation	Procter & Gamble
Country	United States
Sector	Private
PI Contribution	Working to collect a new benchmark of pixel-level objects and relations
Collaborator Contribution	Established and leading the collaboration.
Impact	VISOR Dataset - https://data.bris.ac.uk/data/dataset/2v6cgv1x04ol22qp9rm9x2j6a7 Publication: (2022) A Darkhalil, D Shan, B Zhu, J Ma, A Kar, R Higgins, S Fidler, D Fouhey, D Damen. EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations. Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track. Ongoing Challenges - next cycle for June 2024.
Start Year	2021


Description	VISOR Benchmark: VIdeo Segmentations and Object Relations
Organisation	University of Michigan
Country	United States
Sector	Academic/University
PI Contribution	Working to collect a new benchmark of pixel-level objects and relations
Collaborator Contribution	Established and leading the collaboration.
Impact	VISOR Dataset - https://data.bris.ac.uk/data/dataset/2v6cgv1x04ol22qp9rm9x2j6a7 Publication: (2022) A Darkhalil, D Shan, B Zhu, J Ma, A Kar, R Higgins, S Fidler, D Fouhey, D Damen. EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations. Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track. Ongoing Challenges - next cycle for June 2024.
Start Year	2021


Description	VISOR Benchmark: VIdeo Segmentations and Object Relations
Organisation	University of Toronto
Country	Canada
Sector	Academic/University
PI Contribution	Working to collect a new benchmark of pixel-level objects and relations
Collaborator Contribution	Established and leading the collaboration.
Impact	VISOR Dataset - https://data.bris.ac.uk/data/dataset/2v6cgv1x04ol22qp9rm9x2j6a7 Publication: (2022) A Darkhalil, D Shan, B Zhu, J Ma, A Kar, R Higgins, S Fidler, D Fouhey, D Damen. EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations. Neural Information Processing Systems (NeurIPS) Datasets and Benchmarks Track. Ongoing Challenges - next cycle for June 2024.
Start Year	2021


Title	Auditory Slow-Fast
Description	Recognising actions using auditory signal only
Type Of Technology	Software
Year Produced	2021
Open Source License?	Yes
Impact	Paper won outstanding paper at ICASSP 2021 - 3 papers selected out of 1400 papers. Well-referenced - 46 stars. In a follow-up work by DeepMind [https://arxiv.org/pdf/2111.12124.pdf] this work is referred to as: "We find the Slowfast architecture is good at learning rich representations required by different domains", extending this work to speech and music audio.
URL	https://github.com/ekazakos/auditory-slow-fast


Title	EPIC Fields Code
Description	This section contains the pipeline for the dataset introduced in our paper, "EPIC Fields: Marrying 3D Geometry and Video Understanding." We aim to bridge the domains of 3D geometry and video understanding, leading to innovative advancements in both areas.
Type Of Technology	Software
Year Produced	2023
Open Source License?	Yes
Impact	Upcoming
URL	https://github.com/epic-kitchens/epic-Fields-code


Title	Explainable Video Understanding
Description	Frame Attributions in Video Models
Type Of Technology	Software
Year Produced	2020
Open Source License?	Yes
Impact	A corresponding interactive dashboard is available for people to experiment with explainable models.
URL	http://play-fair.uksouth.cloudapp.azure.com/?uid=137966&n-frames=10


Title	Multimodal Temporal Context Network (MTCN)
Description	Audio-Visual Recognition of Object Interactions - New Architecture and Modes
Type Of Technology	Software
Year Produced	2021
Open Source License?	Yes
Impact	Used as baseline by other researchers
URL	https://github.com/ekazakos/MTCN


Title	Temporal-Relational Cross-Transformers (TRX)
Description	Software suite for few-shot action recognition with novel cross-transformer architecture and model (CVPR 2021 paper)
Type Of Technology	Software
Year Produced	2021
Open Source License?	Yes
Impact	Code is highly appreciated by the community (62 stars), and already compared to 10 different follow-up methods
URL	https://github.com/tobyperrett/trx


Title	Video Object Segmentation
Description	Software for Video object segmentation and tracking throughout transformations
Type Of Technology	Software
Year Produced	2022
Open Source License?	Yes
Impact	Starter code for using EPIC-KITCHENS VISOR annotations
URL	https://github.com/epic-kitchens/VISOR-VIS


Description	10th EPIC Workshop
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	10th iteration of our international workshop with a round of international challenges and winners announced along with a technical report and a round table.
Year(s) Of Engagement Activity	2022
URL	https://epic-workshop.org/EPIC_CVPR22/


Description	Compositional and Multimodal Perception of Object Interactions
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Keynote at International Challenge on Compositional and Multimodal Perception held alongside European Conference on Computer Vision (ECCV)
Year(s) Of Engagement Activity	2020
URL	https://www.youtube.com/watch?v=zgwg1K77LBs&feature=youtu.be


Description	Human-Centric Object Interactions - A Fine-Grained Perspective from Egocentric Videos
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Keynote at the first international workshop on deep learning for human-centric activity understanding, held alongside International Conference on Pattern Recognition (ICPR)
Year(s) Of Engagement Activity	2020
URL	http://staff.ustc.edu.cn/~tzzhang/dl-hau2020/program.html


Description	Human-Centric Object Interactions - A Fine-Grained Perspective from Egocentric Videos
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Talk at 1st International Workshop On Human-Centric Multimedia Analysis held alongside ACM Multimedia
Year(s) Of Engagement Activity	2020
URL	https://hcma2020.github.io


Description	Keynote - IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	Main Keynote on the Opportunities in Egocentric Vision
Year(s) Of Engagement Activity	2024
URL	https://wacv2024.thecvf.com


Description	Keynote - International Conference on Machine Vision Applications (MVA)
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	Keynote in a Major International Conference
Year(s) Of Engagement Activity	2023
URL	https://www.mva-org.jp/mva2023/


Description	Naturally Limited Videos of Fine-Grained Actions
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	In this talk, I'll present the case for collecting unscripted video datasets in their native environments, introducing naturally long-tailed datasets. Using such resource, I will present my group's approaches to zero-shot action retrieval [ICCV 2019], few-shot recognition [CVPR 2020], domain adaptation [CVPR 2020, ArXiv] and unsupervised learning [CVPR 2022].
Year(s) Of Engagement Activity	2022
URL	https://sites.google.com/view/l3d-ivu/program


Description	Research Visit: Berkeley AI Research Laboratory
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Professional Practitioners
Results and Impact	Research Visit at BAIR for extending research collaboration and engaging in interesting discussions with researchers in Computer Vision, AI and Robotics
Year(s) Of Engagement Activity	2023


Description	Seventh International Workshop on Egocentric Perception, Interaction and Computing (EPIC)
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	More than 200 researchers attended a full day workshop on egocentric perception, contributing talks, keynotes and poster presentations.
Year(s) Of Engagement Activity	2020
URL	https://eyewear-computing.org/EPIC_ECCV20/


Description	Sixth International Workshop on Egocentric Perception, Interaction and Computing
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	150 researchers from academia and industry attended a virtual international workshop where the latest research on fine-grained action recognition was discussed and presented.
Year(s) Of Engagement Activity	2020
URL	https://eyewear-computing.org/EPIC_CVPR20/


Description	Talk: Learning from Narrated Videos of Everyday Tasks
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Talk on Learning from Narrated Videos of Everyday Tasks at the CVPR2020 workshop on Instructional Videos
Year(s) Of Engagement Activity	2020
URL	https://drive.google.com/file/d/1nMr6wanv9fQFjbJNP9ZjDQBMNVq8kUIT/view


Description	Video Understanding - an Egocentric Perspective
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	Presentations at the 6th Summer School on AI
Year(s) Of Engagement Activity	2022
URL	https://cvit.iiit.ac.in/summerschool2022/


Description	Video Understanding: A Tutorial
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	Participation in the International Computer Vision Summer School
Year(s) Of Engagement Activity	2022
URL	https://iplab.dmi.unict.it/icvss2022/
