Transfer Learning for Frame-based Activity Recognition
Lead Research Organisation:
University of Bristol
Department Name: Computer Science
Abstract
Activity recognition is an important task for home surveillance, both to monitor the wellbeing of children, pets and the elderly and for security purposes. Despite the increasing number of video monitors used in households, few provide smart monitoring. This research will therefore work towards visual activity recognition to detect and classify actions that can be used to determine the health of an individual.
Most research on activity recognition has focused on recognising a single high-level action per video (e.g. playing football or making a sandwich). In the context of home surveillance, however, frame-based activity recognition provides more meaningful information: it identifies low-level actions (e.g. put down plate or pick up mug) for each frame, as soon as the action occurs.
State-of-the-art methods for activity recognition incorporate Convolutional (CNN) and/or Recurrent Neural Networks (RNN). With these methods, incorporating temporal information across a video greatly improves recognition accuracy. Popular approaches include extracting features from both RGB and optical flow frames using CNNs and using them for classification, or training Long Short-Term Memory units (LSTMs) on the RGB features extracted by CNNs. These techniques have shown varying success across datasets; for frame-based action recognition in particular, they provide only a small benefit over hand-crafted features. This lack of success is partly due to the scarcity of training data, which stems from the difficulty of collecting and annotating action datasets.
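The two-stream idea described above can be sketched in miniature. The following is a hypothetical illustration only (random placeholder logits, made-up shapes), not the project's implementation: per-frame class scores from an RGB stream and an optical-flow stream are late-fused by averaging their probabilities, and each frame then receives a low-level action label.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    """Row-wise softmax over class logits."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Stand-ins for per-frame class logits from two CNNs: one on RGB frames,
# one on stacked optical-flow frames (random placeholders, hypothetical).
n_frames, n_classes = 8, 5
rgb_logits = rng.normal(size=(n_frames, n_classes))
flow_logits = rng.normal(size=(n_frames, n_classes))

# Late fusion: average the two streams' class probabilities per frame,
# then predict a low-level action label for every frame.
fused = 0.5 * (softmax(rgb_logits) + softmax(flow_logits))
frame_predictions = fused.argmax(axis=1)
```

Averaging probabilities (rather than logits) is one common fusion choice; the LSTM-based alternative mentioned above would instead feed the per-frame CNN features through a recurrent layer before classification.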
Transfer learning has been shown to improve the accuracy of object recognition when the target environment has little training data. By pre-training neural networks on larger object datasets, the models can learn features that are also relevant to the target environment before being fine-tuned on the small amount of data available there. Transfer learning in the temporal domain, which has been shown to be important for action recognition, has received little research attention.
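The pre-train/fine-tune recipe above can be sketched as follows. This is a toy illustration under assumed names and shapes, not the project's actual method: a "pretrained" feature extractor (here just a fixed random projection standing in for a CNN backbone) is frozen, and only a small linear classifier head is trained on the scarce target-environment data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a CNN backbone pretrained on a large source dataset.
# In practice this would be e.g. an ImageNet-pretrained network; here it
# is a fixed random projection (hypothetical, for illustration only).
W_backbone = rng.normal(size=(64, 16))

def extract_features(x):
    """Frozen pretrained feature extractor: not updated during fine-tuning."""
    return np.maximum(x @ W_backbone, 0.0)  # ReLU features

# Tiny labelled set from the target environment (few examples, 2 classes).
x_target = rng.normal(size=(20, 64))
y_target = rng.integers(0, 2, size=20)

# Fine-tune only a linear head: logistic regression by gradient descent.
feats = extract_features(x_target)
w = np.zeros(feats.shape[1])
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-(feats @ w)))       # sigmoid predictions
    grad = feats.T @ (p - y_target) / len(y_target)
    w -= 0.5 * grad                               # only the head is updated

train_acc = np.mean((feats @ w > 0) == y_target)
```

Freezing the backbone keeps the features learned from the large source dataset intact; with more target data, later backbone layers could also be unfrozen and fine-tuned at a small learning rate.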
This work will focus on designing models that improve the accuracy of frame-based recognition of low-level actions and that transfer well to different environments lacking large amounts of training data.
People | ORCID iD |
---|---|
Dima Damen (Primary Supervisor) | |
Jonathan Munro (Student) | |
Publications
Damen D
(2018)
Scaling Egocentric Vision: The EPIC-KITCHENS Dataset
Damen D
(2021)
The EPIC-KITCHENS Dataset: Collection, Challenges and Baselines.
in IEEE transactions on pattern analysis and machine intelligence
Studentship Projects
Project Reference | Relationship | Related To | Start | End | Student Name |
---|---|---|---|---|---|
EP/N509619/1 | | | 30/09/2016 | 29/09/2021 | |
1941917 | Studentship | EP/N509619/1 | 30/09/2017 | 30/07/2021 | Jonathan Munro |
Description | Mathematical models of fine-grained actions and interactions such as "cutting a tomato" or "tightening a bolt" have a wide range of applications in assistive technologies in homes as well as in industry. Currently, such models perform poorly when deployed in new, unseen environments, as the model has over-fitted to its training environment. This work has shown that with only unlabelled data from a target environment, which is cheap and easy to collect, models can be adapted to perform well in the deployed environment. |
Exploitation Route | Academia and industry may use the methods in the publications to improve fine-grained action recognition in target environments. Researchers may be inspired by this work to improve domain adaptation methods for fine-grained action recognition. |
Sectors | Digital/Communication/Information Technologies (including Software) |
Description | Research and software made in collaboration with Naver Labs Europe. |
First Year Of Impact | 2020 |
Sector | Digital/Communication/Information Technologies (including Software) |
Impact Types | Economic |
Title | EPIC-KITCHENS-100 |
Description | Extended Footage for EPIC-KITCHENS dataset, to 100 hours of footage. For automatic annotations, see separate dataset at: https://doi.org/10.5523/bris.3l8eci2oqgst92n14w2yqi5ytu 10/09/2020 **N.b. please also see ERRATUM published at https://github.com/epic-kitchens/epic-kitchens-100-annotations/blob/master/README.md#erratum** |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
Impact | This provided a substantial extension to the EPIC-KITCHENS dataset, with data collected 2 years later. We introduced 6 new challenges to the research community to advance video understanding. |
URL | https://data.bris.ac.uk/data/dataset/2g1n6qdydwa9u22shpxqzp0t8m/ |
Title | EPIC-Kitchens |
Description | Largest dataset in first-person vision, fully annotated with open challenges for object detection, action recognition and action anticipation |
Type Of Material | Database/Collection of data |
Year Produced | 2018 |
Provided To Others? | Yes |
Impact | Open challenges with 15 different universities and research centres competing for winning the relevant challenges. |
URL | http://epic-kitchens.github.io |
Description | Domain Adaptation for Action Retrieval |
Organisation | NAVER LABS Europe |
Country | France |
Sector | Public |
PI Contribution | Research collaboration including regular meetings and a scheduled internship for April 2020. The scheduled internship was remote due to COVID-19. |
Collaborator Contribution | Funded internship |
Impact | - |
Start Year | 2020 |
Description | EPIC-Kitchens Dataset Collection |
Organisation | University of Catania |
Country | Italy |
Sector | Academic/University |
PI Contribution | Collaboration to collect the largest cross-location dataset of egocentric non-scripted daily activities |
Collaborator Contribution | Effort time of partners (Dr Sanja Fidler and Dr Giovanni Maria Farinella) in addition to time of their research team members (Dr Antonino Furnari and Mr David Acuna) |
Impact | ECCV 2018 publication, TPAMI publication under review |
Start Year | 2017 |
Description | EPIC-Kitchens Dataset Collection |
Organisation | University of Toronto |
Country | Canada |
Sector | Academic/University |
PI Contribution | Collaboration to collect the largest cross-location dataset of egocentric non-scripted daily activities |
Collaborator Contribution | Effort time of partners (Dr Sanja Fidler and Dr Giovanni Maria Farinella) in addition to time of their research team members (Dr Antonino Furnari and Mr David Acuna) |
Impact | ECCV 2018 publication, TPAMI publication under review |
Start Year | 2017 |
Title | Code to reproduce results for the Multi-modal Domain Adaptation for Fine-grained Action Recognition |
Description | This contains Python code to replicate the results of the publication: Multi-modal Domain Adaptation for Fine-grained Action Recognition. |
Type Of Technology | Software |
Year Produced | 2020 |
Open Source License? | Yes |
Impact | This code will allow users to adapt fine-grained action recognition to new unlabelled domains. |
URL | https://github.com/jonmun/MM-SADA-code |
Description | Oral Presentation for CVPR 2020 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Oral presentation of my publication Multi-modal Domain Adaptation for Fine-grained Action Recognition to the research community who attended CVPR 2020. |
Year(s) Of Engagement Activity | 2020 |
URL | http://cvpr2020.thecvf.com/ |
Description | PAISS, Artificial Intelligence Summer School |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | University students and industry met for a summer school with talks from leading academics in University and Industry. |
Year(s) Of Engagement Activity | 2018 |
Description | Poster at BMVA Symposium on Video Understanding |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Poster presentation to academics in University and industry. |
Year(s) Of Engagement Activity | 2019 |
URL | https://dimadamen.github.io/bmva_symposium_2019/#cfp |
Description | Poster at ICCV 2019 Workshop on Multi-modal Video Analysis and Moments in Time Challenge |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Presented a poster to a number of academics from university and industry. |
Year(s) Of Engagement Activity | 2019 |
URL | https://sites.google.com/view/multimodalvideo/home |