UMPIRE: United Model for the Perception of Interactions in visuoauditory REcognition

Lead Research Organisation: University of Bristol
Department Name: Computer Science

Abstract

Humans interact with tens of objects daily, at home (e.g. cooking/cleaning) or outdoors (e.g. ticket machines/shopping bags), during working (e.g. assembly/machinery) or leisure hours (e.g. playing/sports), individually or collaboratively. When observing people interacting with objects, our vision, assisted by our sense of hearing, is the main tool we use to perceive these interactions. Take the example of boiling water in a kettle. We observe the actor press a button, wait, hear the water boil and see the kettle's light go off before the water is used for, say, preparing tea. The perception process is formed from understanding intentional interactions (called ideomotor actions) as well as reactive actions to dynamic stimuli in the environment (referred to as sensorimotor actions). As observers, we understand and can ultimately replicate such interactions using our sensory input, along with the underlying complex cognitive processes of event perception. Evidence in the behavioural sciences demonstrates that these human cognitive processes are highly modularised, and that these modules collaborate to achieve our outstanding human-level perception.

However, current approaches in artificial intelligence lack this modularity and, accordingly, these capabilities. To achieve human-level perception of object interactions, including online perception when the interaction results in mistakes (e.g. water is spilled) or risks (e.g. boiling water is spilled), this fellowship focuses on informing computer vision and machine learning models, including deep learning architectures, with well-studied cognitive behavioural frameworks.

Deep learning architectures have achieved superior performance, compared to their hand-crafted predecessors, on video-level classification; however, their performance on fine-grained understanding within a video remains modest. Current models are easily fooled by similar motions or incomplete actions, as recent research has shown. This fellowship focuses on empowering these models through modularisation, a long-standing principle in cognitive science, articulated in Fodor's The Modularity of Mind and frequently studied by cognitive psychologists in controlled lab environments. Modularity of high-level perception, combined with the power of deep learning architectures, will bring a previously unexplored understanding to video analysis.

The targeted perception of daily and rare object interactions will lay the foundations for applications including assistive technologies using wearable computing, and robot imitation learning. We will work closely with three industrial partners to pave potential knowledge-transfer paths towards applications.

Additionally, the fellowship will actively engage the international research community through workshops, benchmarks and public challenges on large datasets, encouraging other researchers to address problems related to fine-grained perception in video understanding.

Planned Impact

The fellowship focuses on learning a model for understanding human-object interactions, using visual and auditory sensors, with novel capabilities. The model will be capable of understanding the actor's hierarchy of goals and of predicting upcoming interactions. The model will also be able to map the perceived interaction into a set of steps that could be replicated by a robot, tested within a simulated environment.

By enhancing the capabilities of computer vision models for recognising human-object interactions, the fellowship can have far-reaching impact on future technologies. The economic and societal impacts are intertwined: industry would be the prime beneficiary, building new technology, while individuals would be the end users. I summarise the potential through three application areas, with impact on the UK's national capabilities across several industries and opening up previously unexplored opportunities.

1) Assistive Technologies
Every individual can benefit from assistive technologies for object interactions. For example, reminding a person whether they have added salt to their meal or securely closed a water tap are capabilities of the UMPIRE model. Further assistance specialised for the elderly or people with impairments can be envisaged, where alarms are raised in cases of unsafe interactions. Several start-ups have attempted to use assistive technologies in daily interactions. These, however, rely on specialised sensors integrated with every instrument (e.g. one sensor per tap to detect running water). Instead, this project promises human-level cognition using general visuo-auditory sensors that are not specialised for the action. Through a model that can understand and detect the interaction's consequences and changes to the environment (e.g. if water is still pouring then the water source has not been secured), the potential for assistive technologies will be widely enhanced. To realise this impact, the fellowship will engage with the Samsung AI Centre Cambridge, where assistive wearable technologies are under development.

2) Robotics and Beyond
A key capability of the UMPIRE model is actionable perception, i.e. a step-by-step procedure for an artificial agent to replicate the object interaction. This capability will be impactful for people working on vision for robotics. Teaching a robot how to 'open a can' by demonstrating the interaction is a main objective for effective household robotics. In this fellowship, I work closely with NVIDIA, originators of the open-source simulation development kits Isaac and PhysX, to prepare for this impact.

3) Entertainment and Gaming
Virtual and augmented reality games can now integrate a three-dimensional avatar into our home, running around our sofas and tables. However, object interaction perception would enhance the ability to integrate these games with our everyday tasks, combining life with fun. Through perceiving object interactions, avatars would be able to simulate opening your kitchen tap with augmented water flowing. Currently, such potential requires hand-coded graphics. Using a model for interaction perception would enable novel entertainment applications.

In this fellowship, I will engage with the first two impact areas, but note gaming as an area for further exploration. Due to the large commercial potential, the fellowship will have a commercialisation plan, developed through consultation with Ultrahaptics and SAIC, towards a spin-out and/or knowledge transfer.

In addition to its economic and societal impact, the fellowship has an impact on integrating two very active research communities, particularly in the UK: cognitive behavioural science and data-driven computer vision. New research directions can emerge, introducing data-driven research tools to cognitive psychologists.
 
Description (2023) Fine-grained understanding of object transformations has been thoroughly explored through a new set of annotations for Video Object Segmentations and Hand-Object Relations. The research conducted this year has enabled an extensive study of how the two hands interact with the same or different objects during activities, as well as of tool use. These findings were not available to the research community prior to this year's progress on the fellowship.
====
(2022) A new understanding of the multi-modal nature of hand-object interactions has been achieved by this award. How asynchronous audio-visual data can contribute to understanding ongoing actions in long videos will change the potential for assistive technologies. New methods and prototypes have been built.
====
(2021) An interpretable model of the importance of every frame in a video for deciding which action is taking place has been published, with an interactive dashboard available from: http://play-fair.uksouth.cloudapp.azure.com/?uid=137966&n-frames=10
This work challenges a common assumption in video models, namely that sampling more frames always improves a model's performance.
Additionally, we have published a large number of models for the EPIC-KITCHENS dataset, available for researchers to compare their methods on the same benchmark.
Exploitation Route The large-scale benchmark VISOR is now publicly available: http://epic-kitchens.github.io/VISOR/
The large-scale dataset EGO4D is now publicly available: https://ego4d-data.org/
Currently, 5 published benchmarks are available for researchers to compare their methods on a hidden test set. Winners of the first round will be announced in June 2021 alongside a workshop at CVPR 2021: https://epic-kitchens.github.io/2021#challenges
Sectors Creative Economy, Digital/Communication/Information Technologies (including Software), Leisure Activities, including Sports, Recreation and Tourism

 
Description One aspect of this project has now contributed to industrial impact: the recently released massive-scale dataset Ego4D. Read more here: https://www.bristol.ac.uk/news/2021/october/ego4d.html
First Year Of Impact 2022
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Economic

 
Description Consultancy to DeepMind
Geographic Reach Multiple continents/international 
Policy Influence Type Influenced training of practitioners or researchers
 
Description Visual AI: An Open World Interpretable Visual Transformer
Amount £5,912,096 (GBP)
Funding ID EP/T028572/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 12/2020 
End 11/2025
 
Title EPIC-KITCHENS VISOR 
Description We introduce VISOR, a new dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, which comes with a new set of challenges not encountered in current video segmentation datasets. Specifically, we need to ensure both short- and long-term consistency of pixel-level annotations as objects undergo transformative interactions, e.g. an onion is peeled, diced and cooked, where we aim to obtain accurate pixel-level annotations of the peel, onion pieces, chopping board, knife and pan, as well as the acting hands. VISOR introduces an annotation pipeline, AI-powered in parts, for scalability and quality. Data are published under the Creative Commons Attribution-NonCommercial 4.0 International License. A minimal annotation-loading sketch is given after this entry.
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
Impact The first dataset of video object segmentations during object interactions in which objects undergo drastic transformations. This work tests the limits of previous approaches to tracking and segmentation. An ongoing open challenge is available to the research community.
URL https://data.bris.ac.uk/data/dataset/2v6cgv1x04ol22qp9rm9x2j6a7/
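The sketch below illustrates how polygon-style segment annotations of the kind VISOR provides could be rasterised into per-entity binary masks. It is a minimal illustration only: the JSON layout and the field names "image_size", "annotations", "segments" and "name" are assumed placeholders, not the official VISOR schema, so the real keys should be taken from the dataset documentation.

```python
# Minimal sketch: rasterise polygon segment annotations into per-entity binary masks.
# The JSON structure and field names below are assumed placeholders, not the
# official VISOR schema; consult the dataset documentation for the real layout.
import json

import numpy as np
from PIL import Image, ImageDraw


def polygon_to_mask(polygon, width, height):
    """Fill a list of (x, y) vertices into a boolean mask of shape (height, width)."""
    canvas = Image.new("L", (width, height), 0)
    ImageDraw.Draw(canvas).polygon([tuple(p) for p in polygon], outline=1, fill=1)
    return np.array(canvas, dtype=bool)


def load_entity_masks(annotation_file):
    """Return a dict mapping entity name (e.g. 'onion', 'left hand') to its mask."""
    with open(annotation_file) as f:
        record = json.load(f)
    width, height = record["image_size"]        # assumed field name
    masks = {}
    for entity in record["annotations"]:        # assumed field name
        mask = np.zeros((height, width), dtype=bool)
        for polygon in entity["segments"]:      # assumed field name
            mask |= polygon_to_mask(polygon, width, height)
        masks[entity["name"]] = mask            # assumed field name
    return masks
```

The official starter code for the released annotations is linked under the Video Object Segmentation software entry below (https://github.com/epic-kitchens/VISOR-VIS).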
 
Title EPIC-KITCHENS-100 
Description Extended Footage for EPIC-KITCHENS dataset, to 100 hours of footage. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact 5 open benchmarks are available for researchers to utilise. To date, the dataset has been downloaded more than 2.3K times by researchers from 42 different countries. 
URL http://epic-kitchens.github.io/
 
Title Frame Attributions in Video Models - Interactive Dashboard 
Description Interactive dashboard to assess the impact of individual frames in a video on current recognition models. A simplified attribution sketch follows this entry. 
Type Of Material Data analysis technique 
Year Produced 2020 
Provided To Others? Yes  
Impact
URL https://play-fair.willprice.dev
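To make the idea behind frame attribution concrete, the sketch below scores each frame by how much the model's confidence in the predicted class drops when that frame is removed. This is only a simplified leave-one-frame-out approximation under an assumed `model` callable (frames in, class probabilities out); the published work uses a more principled, Shapley-style attribution, for which the dashboard and code linked above should be consulted.

```python
# Simplified leave-one-frame-out attribution (illustrative only; the published
# method uses a Shapley-style formulation rather than single-frame ablation).
# `model` is an assumed placeholder: it maps a sequence of frames to a vector
# of class probabilities.
from typing import Callable, List, Sequence

import numpy as np


def frame_attributions(
    model: Callable[[Sequence], np.ndarray],
    frames: List,
    target_class: int,
) -> np.ndarray:
    """Attribution of frame i = score(all frames) - score(all frames except i)."""
    full_score = model(frames)[target_class]
    attributions = np.zeros(len(frames))
    for i in range(len(frames)):
        ablated = frames[:i] + frames[i + 1:]
        attributions[i] = full_score - model(ablated)[target_class]
    return attributions
```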
 
Description Ego4D Consortium Collaboration 
Organisation Carnegie Mellon University
Country United States 
Sector Academic/University 
PI Contribution Collecting the largest and most diverse dataset of egocentric videos
Collaborator Contribution The project was inspired by my prior EPIC-KITCHENS project and I am a founding member of this consortium
Impact Public dataset for research and commercial purposes of 3670 hours collected by 923 participants in 74 cities around the world
Start Year 2021
 
Description Ego4D Consortium Collaboration 
Organisation Facebook
Country United States 
Sector Private 
PI Contribution Collecting the largest and most diverse dataset of egocentric videos
Collaborator Contribution The project was inspired by my prior EPIC-KITCHENS project and I am a founding member of this consortium
Impact Public dataset for research and commercial purposes of 3670 hours collected by 923 participants in 74 cities around the world
Start Year 2021
 
Description Ego4D Consortium Collaboration 
Organisation Georgia Institute of Technology
Country United States 
Sector Academic/University 
PI Contribution Collecting the largest and most diverse dataset of egocentric videos
Collaborator Contribution The project was inspired by my prior EPIC-KITCHENS project and I am a founding member of this consortium
Impact Public dataset for research and commercial purposes of 3670 hours collected by 923 participants in 74 cities around the world
Start Year 2021
 
Description Ego4D Consortium Collaboration 
Organisation Indian Institute of Technology Hyderabad
Country India 
Sector Academic/University 
PI Contribution Collecting the largest and most diverse dataset of egocentric videos
Collaborator Contribution The project was inspired by my prior EPIC-KITCHENS project and I am a founding member of this consortium
Impact Public dataset for research and commercial purposes of 3670 hours collected by 923 participants in 74 cities around the world
Start Year 2021
 
Description Ego4D Consortium Collaboration 
Organisation Indiana University Bloomington
Country United States 
Sector Academic/University 
PI Contribution Collecting the largest and most diverse dataset of egocentric videos
Collaborator Contribution The project was inspired by my prior EPIC-KITCHENS project and I am a founding member of this consortium
Impact Public dataset for research and commercial purposes of 3670 hours collected by 923 participants in 74 cities around the world
Start Year 2021
 
Description Ego4D Consortium Collaboration 
Organisation King Abdullah University of Science and Technology (KAUST)
Department KAUST Supercomputing Laboratory
Country Saudi Arabia 
Sector Academic/University 
PI Contribution Collecting the largest and most diverse dataset of egocentric videos
Collaborator Contribution The project was inspired by my prior EPIC-KITCHENS project and I am a founding member of this consortium
Impact Public dataset for research and commercial purposes of 3670 hours collected by 923 participants in 74 cities around the world
Start Year 2021
 
Description Ego4D Consortium Collaboration 
Organisation Massachusetts Institute of Technology
Country United States 
Sector Academic/University 
PI Contribution Collecting the largest and most diverse dataset of egocentric videos
Collaborator Contribution The project was inspired by my prior EPIC-KITCHENS project and I am a founding member of this consortium
Impact Public dataset for research and commercial purposes of 3670 hours collected by 923 participants in 74 cities around the world
Start Year 2021
 
Description Ego4D Consortium Collaboration 
Organisation National University of Singapore
Country Singapore 
Sector Academic/University 
PI Contribution Collecting the largest and most diverse dataset of egocentric videos
Collaborator Contribution The project was inspired by my prior EPIC-KITCHENS project and I am a founding member of this consortium
Impact Public dataset for research and commercial purposes of 3670 hours collected by 923 participants in 74 cities around the world
Start Year 2021
 
Description Ego4D Consortium Collaboration 
Organisation Universidad de Los Andes, Chile
Country Chile 
Sector Academic/University 
PI Contribution Collecting the largest and most diverse dataset of egocentric videos
Collaborator Contribution The project was inspired by my prior EPIC-KITCHENS project and I am a founding member of this consortium
Impact Public dataset for research and commercial purposes of 3670 hours collected by 923 participants in 74 cities around the world
Start Year 2021
 
Description Ego4D Consortium Collaboration 
Organisation University of Catania
Country Italy 
Sector Academic/University 
PI Contribution Collecting the largest and most diverse dataset of egocentric videos
Collaborator Contribution The project was inspired by my prior EPIC-KITCHENS project and I am a founding member of this consortium
Impact Public dataset for research and commercial purposes of 3670 hours collected by 923 participants in 74 cities around the world
Start Year 2021
 
Description Ego4D Consortium Collaboration 
Organisation University of Minnesota
Country United States 
Sector Academic/University 
PI Contribution Collecting the largest and most diverse dataset of egocentric videos
Collaborator Contribution The project was inspired by my prior EPIC-KITCHENS project and I am a founding member of this consortium
Impact Public dataset for research and commercial purposes of 3670 hours collected by 923 participants in 74 cities around the world
Start Year 2021
 
Description Ego4D Consortium Collaboration 
Organisation University of Pennsylvania
Country United States 
Sector Academic/University 
PI Contribution Collecting the largest and most diverse dataset of egocentric videos
Collaborator Contribution The project was inspired by my prior EPIC-KITCHENS project and I am a founding member of this consortium
Impact Public dataset for research and commercial purposes of 3670 hours collected by 923 participants in 74 cities around the world
Start Year 2021
 
Description Ego4D Consortium Collaboration 
Organisation University of Tokyo
Country Japan 
Sector Academic/University 
PI Contribution Collecting the largest and most diverse dataset of egocentric videos
Collaborator Contribution The project was inspired by my prior EPIC-KITCHENS project and I am a founding member of this consortium
Impact Public dataset for research and commercial purposes of 3670 hours collected by 923 participants in 74 cities around the world
Start Year 2021
 
Description University of Oxford - Audio-visual Fusion for Egocentric Videos 
Organisation University of Oxford
Department Department of Engineering Science
Country United Kingdom 
Sector Academic/University 
PI Contribution Shared publication and code base with Prof Zisserman and PhD student Arsha Nagrani
Collaborator Contribution ICCV 2019 publication and code base
Impact (2019) E Kazakos, A Nagrani, A Zisserman, D Damen. EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition. International Conference on Computer Vision (ICCV). (2021) E Kazakos, A Nagrani, A Zisserman, D Damen. Slow-Fast Auditory Streams for Audio Recognition. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). (2021) E Kazakos, J Huh, A Nagrani, A Zisserman, D Damen. With a Little Help from my Temporal Context: Multimodal Egocentric Action Recognition. British Machine Vision Conference (BMVC).
Start Year 2018
 
Description VISOR Benchmark: VIdeo Segmentations and Object Relations 
Organisation Procter & Gamble
Country United States 
Sector Private 
PI Contribution Working to collect a new benchmark of pixel-level objects and relations
Collaborator Contribution Established and leading the collaboration.
Impact Ongoing - both dataset and research paper expected this summer
Start Year 2021
 
Description VISOR Benchmark: VIdeo Segmentations and Object Relations 
Organisation University of Michigan
Country United States 
Sector Academic/University 
PI Contribution Working to collect a new benchmark of pixel-level objects and relations
Collaborator Contribution Established and leading the collaboration.
Impact Ongoing - both dataset and research paper expected this summer
Start Year 2021
 
Description VISOR Benchmark: VIdeo Segmentations and Object Relations 
Organisation University of Toronto
Country Canada 
Sector Academic/University 
PI Contribution Working to collect a new benchmark of pixel-level objects and relations
Collaborator Contribution Established and leading the collaboration.
Impact Ongoing - both dataset and research paper expected this summer
Start Year 2021
 
Title Auditory Slow-Fast 
Description Recognising actions using the auditory signal only. An illustrative preprocessing sketch follows this entry. 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact The paper won an outstanding paper award at ICASSP 2021 (3 papers selected out of 1400). The code is well referenced (46 GitHub stars). In follow-up work by DeepMind [https://arxiv.org/pdf/2111.12124.pdf], which extends this work to speech and music audio, it is described as: "We find the Slowfast architecture is good at learning rich representations required by different domains". 
URL https://github.com/ekazakos/auditory-slow-fast
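As a small illustration of recognising actions from sound alone, the sketch below converts a waveform into a log-mel spectrogram, the typical input representation for spectrogram-based networks of this kind. The parameter values (sample rate, FFT size, hop length, number of mel bins) are illustrative defaults, not necessarily those used in the repository; the linked code should be consulted for the actual pipeline.

```python
# Illustrative preprocessing for audio-only action recognition: waveform to
# log-mel spectrogram. Parameter values are illustrative defaults, not
# necessarily those used by the Auditory Slow-Fast repository.
import librosa
import numpy as np


def log_mel_spectrogram(path, sr=24000, n_fft=1024, hop_length=256, n_mels=128):
    """Load an audio file and return a (n_mels, time) log-mel spectrogram in dB."""
    waveform, sr = librosa.load(path, sr=sr, mono=True)
    mel = librosa.feature.melspectrogram(
        y=waveform, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
    )
    return librosa.power_to_db(mel, ref=np.max)
```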
 
Title Explainable Video Understanding 
Description Frame Attributions in Video Models 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact A corresponding interactive dashboard is available for people to experiment with explainable models. 
URL http://play-fair.uksouth.cloudapp.azure.com/?uid=137966&n-frames=10
 
Title Multimodal Temporal Context Network (MTCN) 
Description Audio-Visual Recognition of Object Interactions - New Architecture and Modes 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact Used as baseline by other researchers 
URL https://github.com/ekazakos/MTCN
 
Title Temporal-Relational Cross-Transformers (TRX) 
Description Software suite for few-shot action recognition with novel cross-transformer architecture and model (CVPR 2021 paper) 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact The code is well received by the community (62 GitHub stars), and the method has already been compared against in 10 different follow-up works. 
URL https://github.com/tobyperrett/trx
 
Title Video Object Segmentation 
Description Software for Video object segmentation and tracking throughout transformations 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact Starter code for using EPIC-KITCHENS VISOR annotations 
URL https://github.com/epic-kitchens/VISOR-VIS
 
Description 10th EPIC Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The 10th iteration of our international workshop, featuring a round of international challenges; winners were announced alongside a technical report and a round table.
Year(s) Of Engagement Activity 2022
URL https://epic-workshop.org/EPIC_CVPR22/
 
Description Compositional and Multimodal Perception of Object Interactions 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Keynote at International Challenge on Compositional and Multimodal Perception held alongside European Conference on Computer Vision (ECCV)
Year(s) Of Engagement Activity 2020
URL https://www.youtube.com/watch?v=zgwg1K77LBs&feature=youtu.be
 
Description Human-Centric Object Interactions - A Fine-Grained Perspective from Egocentric Videos 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Keynote at the first international workshop on deep learning for human-centric activity understanding, held alongside International Conference on Pattern Recognition (ICPR)
Year(s) Of Engagement Activity 2020
URL http://staff.ustc.edu.cn/~tzzhang/dl-hau2020/program.html
 
Description Human-Centric Object Interactions - A Fine-Grained Perspective from Egocentric Videos 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Talk at 1st International Workshop On Human-Centric Multimedia Analysis held alongside ACM Multimedia
Year(s) Of Engagement Activity 2020
URL https://hcma2020.github.io
 
Description Naturally Limited Videos of Fine-Grained Actions 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact In this talk, I'll present the case for collecting unscripted video datasets in their native environments, introducing naturally long-tailed datasets. Using such resources, I will present my group's approaches to zero-shot action retrieval [ICCV 2019], few-shot recognition [CVPR 2020], domain adaptation [CVPR 2020, ArXiv] and unsupervised learning [CVPR 2022].
Year(s) Of Engagement Activity 2022
URL https://sites.google.com/view/l3d-ivu/program
 
Description Research Visit: Berkeley AI Research Laboratory 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Research Visit at BAIR for extending research collaboration and engaging in interesting discussions with researchers in Computer Vision, AI and Robotics
Year(s) Of Engagement Activity 2023
 
Description Seventh International Workshop on Egocentric Perception, Interaction and Computing (EPIC) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact More than 200 researchers attended a full day workshop on egocentric perception, contributing talks, keynotes and poster presentations.
Year(s) Of Engagement Activity 2020
URL https://eyewear-computing.org/EPIC_ECCV20/
 
Description Sixth International Workshop on Egocentric Perception, Interaction and Computing 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact 150 researchers from academia and industry attended a virtual international workshop where the latest research on fine-grained action recognition was discussed and presented.
Year(s) Of Engagement Activity 2020
URL https://eyewear-computing.org/EPIC_CVPR20/
 
Description Talk: Learning from Narrated Videos of Everyday Tasks 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Talk on Learning from Narrated Videos of Everyday Tasks at the CVPR2020 workshop on Instructional Videos
Year(s) Of Engagement Activity 2020
URL https://drive.google.com/file/d/1nMr6wanv9fQFjbJNP9ZjDQBMNVq8kUIT/view
 
Description Video Understanding - an Egocentric Perspective 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Presentations at the 6th Summer School on AI
Year(s) Of Engagement Activity 2022
URL https://cvit.iiit.ac.in/summerschool2022/
 
Description Video Understanding: A Tutorial 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Participation in the International Computer Vision Summer School
Year(s) Of Engagement Activity 2022
URL https://iplab.dmi.unict.it/icvss2022/