Transfer Learning for Frame-based Activity Recognition

Lead Research Organisation: University of Bristol
Department Name: Computer Science

Abstract

Activity recognition is an important task in home surveillance, both for monitoring the wellbeing of children, pets and the elderly and for security purposes. Despite the increasing number of video monitors used in households, few provide smart monitoring. This research will therefore work towards visual activity recognition that detects and classifies actions which can be used to assess the health of an individual.
Most research on activity recognition has focused on recognising a single high-level action in a video (e.g. playing football or making a sandwich); however, in the context of home surveillance, frame-based activity recognition provides more meaningful information, labelling low-level actions (e.g. put down plate or pick up mug) for each frame as soon as the action occurs.
State-of-the-art methods for activity recognition incorporate Convolutional (CNN) and/or Recurrent Neural Networks (RNN). With these methods, incorporating temporal information across a video greatly improves recognition accuracy. Popular approaches include extracting features for classification from both RGB and Optical Flow frames using CNNs, or training Long Short-Term Memory units (LSTMs) on the RGB features extracted by CNNs. These techniques have shown varying success across datasets, and for frame-based action recognition in particular they provide only a small benefit over hand-crafted features. This is partly due to the scarcity of training data, given the difficulty of collecting and annotating datasets for actions.
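To make the CNN + LSTM route above concrete, the following is a minimal sketch in PyTorch (the backbone, class count and hidden size are illustrative assumptions, not the project's actual model): a pretrained CNN extracts per-frame features and an LSTM produces a label for every frame.

import torch
import torch.nn as nn
import torchvision.models as models

class FrameLSTMClassifier(nn.Module):
    def __init__(self, num_classes=10, hidden_size=512):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop the fc layer
        self.lstm = nn.LSTM(input_size=512, hidden_size=hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, clip):                     # clip: (batch, time, 3, 224, 224)
        b, t = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1))     # (b*t, 512, 1, 1) per-frame CNN features
        feats = feats.flatten(1).view(b, t, -1)  # (b, t, 512)
        out, _ = self.lstm(feats)                # hidden state at every frame
        return self.fc(out)                      # (b, t, num_classes): one prediction per frame

model = FrameLSTMClassifier()
logits = model(torch.randn(2, 8, 3, 224, 224))   # 2 clips of 8 frames -> (2, 8, 10)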
Transfer learning has been shown to improve the accuracy of object recognition when the target environment has little training data. By pre-training neural networks on larger object datasets, models can learn features that are also relevant to the target environment before being fine-tuned on the small amount of data available from it. Transfer learning in the temporal domain, which has been shown to be important for action recognition, has received little research attention.
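As an illustration of this transfer-learning recipe, the sketch below (assuming a torchvision ImageNet-pretrained ResNet-18 and a placeholder count of five target classes) freezes the pretrained backbone and fine-tunes only a new classification head on the small target dataset.

import torch
import torch.nn as nn
import torchvision.models as models

# Start from a CNN pretrained on a large dataset (ImageNet here).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained backbone: its generic visual features transfer to the new environment.
for p in model.parameters():
    p.requires_grad = False

# Replace the head for the target environment's action classes (5 is a placeholder).
model.fc = nn.Linear(model.fc.in_features, 5)

# Fine-tune only the new head on the small target dataset.
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)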
This work will focus on designing models that improve the accuracy of frame-based recognition of low-level actions and that transfer well to environments lacking large amounts of training data.

Publications


Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/N509619/1                                   01/10/2016  30/09/2021
1941917            Studentship   EP/N509619/1  01/10/2017  31/07/2021  Jonathan Munro
 
Description Mathematical models of fine-grained actions and interactions such as "cutting a tomato" or "tightening a bolt" have a wide range of applications in assistive technologies in homes as well as in industry. Currently, such models perform poorly when deployed in new, unseen environments, as the model has over-fitted to its training environment. This work has shown that, with only unlabelled data from a target environment, which is cheap and easy to collect, models can be adapted to perform well in the deployed environment.
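For illustration, below is a minimal sketch of adversarial feature alignment with a gradient reversal layer (in the style of DANN), one ingredient of this line of work rather than the full published method; the layer sizes, class count and batch data are illustrative assumptions. Labelled source data trains the classifier, while unlabelled target data only feeds a domain discriminator whose reversed gradients push the feature extractor towards domain-invariant features.

import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.clone()

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None   # reverse the gradient flowing into the encoder

feature_extractor = nn.Sequential(nn.Linear(1024, 256), nn.ReLU())
classifier = nn.Linear(256, 8)          # 8 action classes (illustrative)
discriminator = nn.Linear(256, 2)       # predicts source vs. target

src, tgt = torch.randn(16, 1024), torch.randn(16, 1024)   # stand-in features
src_labels = torch.randint(0, 8, (16,))                   # only the source is labelled

f_src, f_tgt = feature_extractor(src), feature_extractor(tgt)
cls_loss = nn.functional.cross_entropy(classifier(f_src), src_labels)

feats = torch.cat([f_src, f_tgt])
dom_labels = torch.cat([torch.zeros(16), torch.ones(16)]).long()
dom_loss = nn.functional.cross_entropy(
    discriminator(GradReverse.apply(feats, 1.0)), dom_labels)

loss = cls_loss + dom_loss   # backward() trains features to fool the discriminator
loss.backward()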
Exploitation Route Academia and industry may use the methods in the publications to improve fine-grained action recognition in target environments. Researchers may be inspired by this work to improve domain adaptation methods for fine-grained action recognition.
Sectors Digital/Communication/Information Technologies (including Software)

 
Description Research and software produced in collaboration with Naver Labs Europe.
First Year Of Impact 2020
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Economic

 
Title EPIC-KITCHENS-100 
Description Extended footage for the EPIC-KITCHENS dataset, to 100 hours of footage. For automatic annotations, see the separate dataset at: https://doi.org/10.5523/bris.3l8eci2oqgst92n14w2yqi5ytu 10/09/2020 **N.b. please also see ERRATUM published at https://github.com/epic-kitchens/epic-kitchens-100-annotations/blob/master/README.md#erratum** 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact This provided a substantial extension to the EPIC-KITCHENS dataset, with data collected 2 years later. We introduced 6 new challenges to the research community to advance video understanding. 
URL https://data.bris.ac.uk/data/dataset/2g1n6qdydwa9u22shpxqzp0t8m/
 
Title EPIC-Kitchens 
Description Largest dataset in first-person vision, fully annotated with open challenges for object detection, action recognition and action anticipation 
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact Open challenges with 15 different universities and research centres competing to win the relevant challenges. 
URL http://epic-kitchens.github.io
 
Description Domain Adaptation for Action Retrieval 
Organisation NAVER LABS Europe
Country France 
Sector Public 
PI Contribution Research collaboration including regular meetings and a scheduled internship for April 2020. The scheduled internship was held remotely due to COVID-19.
Collaborator Contribution Funded internship
Impact -
Start Year 2020
 
Description EPIC-Kitchens Dataset Collection 
Organisation University of Catania
Country Italy 
Sector Academic/University 
PI Contribution Collaboration to collect the largest cross-location dataset of egocentric non-scripted daily activities
Collaborator Contribution Effort time of partners (Dr Sanja Fidler and Dr Giovanni Maria Farinella) in addition to time of their research team members (Dr Antonino Furnari and Mr David Acuna)
Impact ECCV 2018 publication, TPAMI publication under review
Start Year 2017
 
Description EPIC-Kitchens Dataset Collection 
Organisation University of Toronto
Country Canada 
Sector Academic/University 
PI Contribution Collaboration to collect the largest cross-location dataset of egocentric non-scripted daily activities
Collaborator Contribution Effort time of partners (Dr Sanja Fidler and Dr Giovanni Maria Farinella) in addition to time of their research team members (Dr Antonino Furnari and Mr David Acuna)
Impact ECCV 2018 publication, TPAMI publication under review
Start Year 2017
 
Title Code to reproduce results for the Multi-modal Domain Adaptation for Fine-grained Action Recognition 
Description This contains Python code to replicate the results of the publication: Multi-modal Domain Adaptation for Fine-grained Action Recognition. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact This code will allow users to adapt fine-grained action recognition to new unlabelled domains. 
URL https://github.com/jonmun/MM-SADA-code
 
Description Oral Presentation for CVPR 2020 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Oral presentation of my publication, Multi-modal Domain Adaptation for Fine-grained Action Recognition, to the research community attending CVPR 2020.
Year(s) Of Engagement Activity 2020
URL http://cvpr2020.thecvf.com/
 
Description PAISS, Artificial Intelligence Summer School 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact University students and industry practitioners met for a summer school with talks from leading researchers in academia and industry.
Year(s) Of Engagement Activity 2018
 
Description Poster at BMVA Symposium on Video Understanding 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Poster presentation to researchers from academia and industry.
Year(s) Of Engagement Activity 2019
URL https://dimadamen.github.io/bmva_symposium_2019/#cfp
 
Description Poster at ICCV 2019 Workshop on Multi-modal Video Analysis and Moments in Time Challenge 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Presented a poster presentation to a number of academics from University and industry.
Year(s) Of Engagement Activity 2019
URL https://sites.google.com/view/multimodalvideo/home