Tensorial modeling of dynamical systems for gait and activity recognition

Lead Research Organisation: Oxford Brookes University

Department Name: Faculty of Tech, Design and Environment

Abstract

Biometrics such as face, iris, or fingerprint recognition have received growing attention in the last decade, as automatic identification systems for surveillance and security have started to enjoy widespread diffusion. They suffer, however, from two major limitations: they cannot be used at a distance, and they require user cooperation, assumptions that are impractical in real-world scenarios. Interestingly, psychological studies show that people are capable of recognizing their friends just from the way they walk, even when their gait is poorly represented by a point-light display. Gait has several advantages over other biometrics, as it can be measured at a distance, is difficult to disguise or occlude, can be identified even in low-resolution images, and is non-cooperative in nature. Furthermore, gait and face biometrics can be easily integrated for human identity recognition.
Despite its attractive features, though, gait identification is still far from being ready to be deployed in practice. What limits its adoption in real-world scenarios is the influence of a large number of nuisance factors which affect the appearance and dynamics of the gait. These include, for instance, walking surface, lighting and camera setup (viewpoint), but also footwear and clothing, objects carried, time of execution and walking speed. Similar issues are shared by other applications of motion classification, such as action and activity recognition. Multilinear or tensorial models, in which a number of (nuisance) factors linearly mix to generate what we observe (in our case the walking gait), have recently been shown to describe the influence of such factors, for instance in the context of face recognition. However, video sequences are more complex objects than single images.
We first need to represent video footage in a compact way.
Encoding the dynamics of videos by means of some sort of dynamical model has proven effective in both action recognition and gait identification, in situations in which the dynamics is critically discriminative. Besides, the actions of interest have to be temporally segmented from a video sequence, while actions of sometimes very different lengths might have to be compared. Dynamical representations are very effective in coping with temporal detection and compression, and indeed several researchers have explored the idea of encoding motions via linear, nonlinear, stochastic or chaotic dynamical systems.
In this project, therefore, we propose to develop a novel, general framework for the classification of video sequences (with a focus on the walking gait), based on the application of tensorial decomposition techniques to image sequences represented as realizations of suitable dynamical models.
The proposed framework will allow us to deal in a principled way with the nuisance factors which greatly affect identification from gait and activity recognition. The main goal is to push towards a more widespread diffusion of gait ID, as a concrete contribution to enhancing the security levels in the country in the current, uncertain scenarios. With their implications for crime prevention and security, biometrics and surveillance are fast-growing business areas, a fact reflected by the increasing number of government-sponsored initiatives in the area in most advanced economies. In addition, the techniques devised in this proposal are extendable to action and identity recognition with immense commercial exploitation potential, ranging from content-based video retrieval from repositories such as YouTube, to HMI, to interactive video games, etcetera.
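The two-stage idea described in this abstract (a dynamical model per sequence, followed by a tensor decomposition across factors) can be illustrated with a minimal sketch. It is not the project's actual method: it assumes a plain linear dynamical system fitted by SVD, a basis-invariant descriptor built from the eigenvalue magnitudes of the transition matrix, and a higher-order SVD as the tensor decomposition. All function names (`fit_lds`, `dynamics_descriptor`, `hosvd`, `reconstruct`) are hypothetical, and random arrays stand in for real gait videos indexed by identity and viewpoint.

```python
import numpy as np


def fit_lds(frames, n_states=3):
    """Fit x[t+1] = A x[t], y[t] = C x[t] to a (n_pixels, T) array via SVD."""
    U, s, Vt = np.linalg.svd(frames, full_matrices=False)
    C = U[:, :n_states]                          # observation matrix
    X = np.diag(s[:n_states]) @ Vt[:n_states]    # state trajectory, (n_states, T)
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])     # least-squares state transition
    return A, C


def dynamics_descriptor(frames, n_states=3):
    """Sorted eigenvalue magnitudes of A (unchanged by a change of state basis)."""
    A, _ = fit_lds(frames, n_states)
    return np.sort(np.abs(np.linalg.eigvals(A)))


def unfold(tensor, mode):
    """Mode-n unfolding: rows index the chosen mode, columns everything else."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def hosvd(tensor):
    """Higher-order SVD: one orthonormal factor matrix per mode plus a core."""
    factors = [np.linalg.svd(unfold(tensor, m), full_matrices=False)[0]
               for m in range(tensor.ndim)]
    core = tensor
    for m, U in enumerate(factors):
        core = np.moveaxis(np.tensordot(U.T, core, axes=(1, m)), 0, m)
    return core, factors


def reconstruct(core, factors):
    """Multiply the core by each factor matrix along its own mode."""
    out = core
    for m, U in enumerate(factors):
        out = np.moveaxis(np.tensordot(U, out, axes=(1, m)), 0, m)
    return out


# Assumption: synthetic data, one random "video" per (identity, viewpoint) pair.
rng = np.random.default_rng(0)
n_ids, n_views, n_pixels, n_frames = 4, 3, 20, 30
tensor = np.stack([
    np.stack([dynamics_descriptor(rng.standard_normal((n_pixels, n_frames)))
              for _ in range(n_views)])
    for _ in range(n_ids)
])                                   # shape: (identity, viewpoint, feature)

core, factors = hosvd(tensor)
identity_factors = factors[0]        # one row of coefficients per identity
viewpoint_factors = factors[1]       # one row of coefficients per viewpoint
```

In a sketch of this kind, the rows of `identity_factors` play the role of identity signatures separated from the viewpoint factor; a real system would need descriptors and decompositions chosen for actual gait data.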

Planned Impact

Within the private sector, companies whose core business is in semi-automatic surveillance or biometrics would be the primary beneficiaries of a robust gait identification system. Most of them focus at the moment on cooperative biometrics such as face or iris recognition: investing in behavioral techniques ahead of the rest of the market could provide them with a significant competitive edge. As large-scale tensor modeling can be useful in many applications, however, companies active in vision or medical imaging, physicians specialized in biometric analysis and sports team labs could also commercially benefit from this research. We expect policy-makers and government agencies to be attracted by the idea of novel surveillance and biometric systems able to improve the general level of security in the country, and that of sensitive areas in particular. Examples are airport management authorities such as BAA, railway companies, underground and public transport authorities (e.g., Transport for London). The wider public will of course be the ultimate beneficiary of any improvement in the security level in public places and transport. A successful outcome of the project will likely boost the competitiveness of the above-mentioned private sector actors, to the economic competitive advantage of the United Kingdom. Companies active in biometrics and surveillance will benefit from privileged access to cutting-edge technology in behavioral biometrics. The public sector and the government could potentially see the security level of the country dramatically improved by the adoption of some of the outcomes of this research. The infrastructure is basically already there, as millions of active CCTV cameras make the United Kingdom one of the most surveilled western nations. The UK citizen's quality of life will arguably benefit too, in the longer term, from the deployment of behavioral biometric systems as anti-terrorist measures. 
Among the indirect benefits to the wider public it is worth mentioning the impact on people's health of providing physicians with automatic support in diagnosis based on sophisticated data (e.g., 3D MRI scans) as an application of tensor models. Realistic timescales for such benefits vary with the specific target. To start seeing commercial applications in an initially limited context (e.g. biometric access), an additional three or four years of R&D after the end of the proposed research would be needed, possibly in partnership with a company. It is safer to assume a longer time scale when targeting a wider application to security in public areas. Government or agency support will in this case be crucial. The research assistant working full time on the project will develop a variety of skills he/she could exploit in many possible future employment scenarios. These include: project and website management, Matlab coding and use of sophisticated statistical toolboxes, report and paper writing, professional networking. The proposer belongs to the Oxford Brookes Computer Vision Group, which has very well established channels for technology transfer from research to product and a track record of achieving this. Intellectual Property Rights management and exploitation will be handled by the Research and Business Development Office (RBDO) at Oxford Brookes University. To disseminate our results we plan to arrange seminar days with relevant companies active in the Oxford/London area and beyond. Towards the end of the project, public agencies such as Transport for London or BAA could be contacted with concrete results to show in order for a proof of concept arrangement to be set up. The Vision Group already enjoys links with HMGCC, the government centre of excellence. More details are given in the impact plan.

Funded Value:

£98,363

Funded Period:

Jun 11 - Jan 14

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/I018719/1

Principal Investigator:

Fabio Cuzzolin

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Image & Vision Computing (100%)

Organisations

People	ORCID iD
Fabio Cuzzolin (Principal Investigator)

Publications

Antonucci A (2015) Robust classification of multivariate time series by imprecise hidden Markov models in International Journal of Approximate Reasoning

Cuzzolin F (2014) Learning Pullback HMM Distances. in IEEE transactions on pattern analysis and machine intelligence

Cuzzolin F (2017) Metric learning for Parkinsonian identification from IMU gait measurements in Gait & Posture

De Rosa R (2017) Active Incremental Recognition of Human Activities in a Streaming Context in Pattern Recognition Letters

De Rosa R. (2014) Online action recognition via nonparametric incremental learning in Proceedings of BMVC 2014

Fabio Cuzzolin (Author) (2013) Belief modeling regression for pose estimation

Fabio Cuzzolin (Author) (2012) Learning discriminative space-time actions from weakly labelled videos

Gong W (2018) A Belief-Theoretical Approach to Example-Based Pose Estimation in IEEE Transactions on Fuzzy Systems

Sapienza M (2013) Learning Discriminative Space-Time Action Parts from Weakly Labelled Videos in International Journal of Computer Vision

Wenjuan Gong (Author) (2013) Fisher Tensor Decomposition for Unconstrained Gait Recognition

Key Findings
Impact Summary
Further Funding
Collaboration
Engagement Activities


Description	The project is exploring different routes to the computer analysis of human motion captured by traditional cameras. In particular, we are interested in testing the possibility of recognizing people's identities at a distance, from the way they walk, or from their gesturing style. The problem is made extremely difficult by the presence of numerous factors which affect recognition, such as different camera viewpoints, illumination conditions and clothing. We have also explored the use of new modeling techniques to automatically learn, represent and recognize complex human activities such as those captured by videos stored on YouTube and elsewhere. In the first step, crucial information needs to be extracted from these videos. Based on those measurements, the system automatically learns which parts of the video are the most "discriminative" (relevant for recognition purposes) and assembles them in a coherent hierarchy able to describe complex activities, or the presence of multiple actors. We have obtained significant preliminary results which show that, via these new modelling techniques, recognition rates considerably improve over the current state of the art. Most significantly, we can also localize the presence of an action in space and time within a given video sequence.
Exploitation Route	The societal impact and market potential of reliable automatic action recognition is enormous. Human-machine interfaces allowing humans to gesturally interact with their laptops, smartphones and even cars are being envisaged right now. ABI Research forecast that 600 million smartphones with gesture recognition features will be shipped in 2017 (http://blog.geoactivegroup.com/2012/07/new-applications-for-gesture.html). EyeSight (http://www.eyesight-tech.com/) already produces software solutions that "allow users to control mobile and portable devices with simple hand gestures". Smart rooms are being imagined, in which people are assisted in their everyday activities by distributed intelligence in their own homes (switching lights when they move through the rooms, interpreting their gestures to replace remote controls and switches, etcetera). At the Consumer Electronics Show in Las Vegas in January, Mercedes-Benz showed an experimental system (DICE) which lets drivers perform basic functions with a hand gesture. Given our rapidly ageing population, semiautomatic assistance to non-autonomous elderly people and remote clinical monitoring are rapidly gaining interest. A hand-gesture recognition system that enables doctors to manipulate digital images during medical procedures has recently been tested at the Washington Hospital (www.whcenter.org). Security personnel can be assisted by algorithms able to bring anomalous events to their attention for surveillance purposes, improving the general level of security of the European Union (and of sensitive areas such as airports or train stations in particular) in uncertain times such as ours. In the US, DARPA's Video and Image Retrieval and Analysis Tool (VIRAT) and Persistent Stare Exploitation and Analysis System (PerSEAS) programs may soon enable better warfighter analysis of huge amounts of data generated from multiple types of sensors. 
Companies are investing in "behavioral" biometrics, based on people's distinctive gait patterns, to achieve a significant competitive edge. Finally, techniques able to efficiently mine the thousands of videos people post on, say, Facebook or YouTube are sorely needed: the potential of a "drag and drop" application, similar to that set up by Google for images, able to retrieve videos with the same "semantic" content is easy to imagine. All these companies are investing heavily in internet video retrieval as the next level in the browsing experience. Truly robust action recognition is likely to contribute enormously (via significant gains in productivity) to boosting all these economic sectors in the near future. A number of routes are open for the exploitation of the results of this project. New consoles (e.g. Microsoft's Kinect) have opened up novel directions in the gaming industry: yet, these only track the user's movements, without any real interpretation of their actions which could "spice up" the gaming experience. Intelligent action recognition can render games which merely track the user's body posture out of fashion: however, gesture recognition with Kinect is still in its infancy (http://www.youtube.com/watch?v=H1wIQ2o4INo). We are exploring the possibility of a collaboration with Sony Entertainment (which has a successful history of KTP projects with our group) along these lines. We are now setting up a partnership with BMW Group, who have a plant in Oxford, involving their funding of a PhD student who will study the introduction of visual recognition technologies in their industrial processes (logistics and production alike). In collaboration with Oxford Brookes University's Movement Science group we are conducting work on the use of machine learning techniques for the early diagnosis of dementia and Parkinson's, recently published in Gait & Posture. 
More recently, we are exploring the use of multimodal data for the diagnosis of diabetes and the monitoring of the awareness of patients recovering from strokes and heart attacks. The results of this project have led to spin off grant applications to the Leverhulme Trust, EPSRC (Healthcare call), and others.
Sectors	Digital/Communication/Information Technologies (including Software),Education,Healthcare,Leisure Activities, including Sports, Recreation and Tourism,Manufacturing, including Industrial Biotechnology,Security and Diplomacy
URL	http://cms.brookes.ac.uk/staff/FabioCuzzolin/projects.html


Description	Our findings have been the basis for follow up research and collaborations in a number of application scenarios: human action recognition in the wild, the early diagnosis of dementia and diabetes via machine learning in healthcare, the improvement of industrial processes by means of visual recognition tools, the design of robotic assistant surgeons for laparoscopy powered by the visual recognition of actions and events in the surgical cavity.
First Year Of Impact	2014
Sector	Healthcare,Manufacturing, including Industrial Biotechnology
Impact Types	Societal,Economic


Description	Horizon 2020
Amount	€ 4,315,640 (EUR)
Funding ID	779813
Organisation	European Union
Sector	Public
Country	European Union (EU)
Start	01/2018
End	12/2020


Description	Collaboration with BMW Group
Organisation	Bayerische Motoren Werke (BMW)
Country	Germany
Sector	Academic/University
PI Contribution	This collaboration with BMW Group aims at providing visual recognition technologies which can enable the group to improve their processes, affecting a number of departments from production, to logistics, paint, body in white. Marko Kosenina of their IT Research department is coordinating the effort from BMW's side.
Collaborator Contribution	BMW is discussing what form their contribution is going to take. Several project topics are under discussion, and consultancy seems to be the most attractive option.
Impact	Negotiations are still in the making.
Start Year	2016


Description	Collaboration with BMW Group
Organisation	University of Oxford
Country	United Kingdom
Sector	Academic/University
PI Contribution	This collaboration with BMW Group aims at providing visual recognition technologies which can enable the group to improve their processes, affecting a number of departments from production, to logistics, paint, body in white. Marko Kosenina of their IT Research department is coordinating the effort from BMW's side.
Collaborator Contribution	BMW is discussing what form their contribution is going to take. Several project topics are under discussion, and consultancy seems to be the most attractive option.
Impact	Negotiations are still in the making.
Start Year	2016


Description	Collaboration with Oxford University on Action Recognition in the wild
Organisation	University of Oxford
Country	United Kingdom
Sector	Academic/University
PI Contribution	This is a collaboration with Professor Philip Torr concerning the preparation of a new EPSRC proposal on the use of deep learning techniques for online action recognition and future action prediction. The contribution of our research team is focused on the deep learning part, and on the detection and tracking of appropriate "action tubes" from the input video sequences. We also provide an HPC on which to run the necessary tests.
Collaborator Contribution	Prof Torr's group contributes with their expertise on scene understanding in order to integrate scene context with motion analysis for a comprehensive understanding of the environment and the events taking place there.
Impact	An EPSRC proposal on the topic is in preparation and will be submitted in early 2016.
Start Year	2013


Description	Collaboration with Oxford University on neural video captioning
Organisation	University of Oxford
Country	United Kingdom
Sector	Academic/University
PI Contribution	This is a collaboration with Professor Thomas Lukasiewicz of Oxford University on a novel framework for neural video captioning based on analysing the semantic content of videos. The contribution of our research group is on extracting semantic information in the form of a plot or storyline using novel discriminative deformable part-based models published in outcomes listed under this project. This can provide an attention model for the neural network to focus on the 'important' parts of a video.
Collaborator Contribution	Oxford University's group led by Professor Lukasiewicz will work on the suitable methodologies to teach the network that different sentences can be semantically equivalent, so that the network is able to describe a new video using its own words.
Impact	This collaboration has led to an outline Project Grant submitted to the Leverhulme Trust. The outline proposal has been accepted on March 3 2017, and the Trust has invited us to submit a full proposal within 9 months. We plan to submit the full proposal by March 21 2017.
Start Year	2016


Description	Collaboration with University of Malta on the multilinear classification of EEG signals for BCI applications
Organisation	University of Malta
Country	Malta
Sector	Academic/University
PI Contribution	The project has led to a collaboration in progress with Professor Kenneth Camilleri of the University of Malta on the possible application of the tensorial models developed here to EEG classification for Brain Computer Interfaces: https://www.um.edu.mt/eng/sce/research/biomedical/braincomputerinterfacing The group in Malta is collecting EEG data on which to apply our tensorial classification methodology, with a view to a future EU partnership.
Collaborator Contribution	The partners have extensive expertise in EEG classification. They have captured a significant amount of data in the form suitable to be processed by a multilinear classifier.
Impact	We have recently submitted an Outline Research Grant application to the Leverhulme Trust. The proposal was invited for full submission in January 2016. We will submit it by the March 21 2016 deadline.
Start Year	2014


Description	Collaboration with the Movement Science Group on the diagnosis of dementia using machine learning techniques
Organisation	Oxehealth Ltd
Country	United Kingdom
Sector	Private
PI Contribution	This is a collaboration with Professor Helen Dawes of the Movement Science group on the formulation of a machine learning framework for the early diagnosis of dementia, applied to data captured by smartphone devices. Our group contributes its expertise on advanced machine learning techniques, such as those published in the publication outcomes related to this project. In 2017 the collaboration was extended to Oxehealth, the award-winning Oxford University spinoff active in healthcare monitoring via video cameras. This is developing into two separate proposal submissions: a KTP via InnovateUK, with Oxehealth, and a new EPSRC Healthcare Technologies grant proposal to be submitted by April 2017.
Collaborator Contribution	The Movement Science group contributes their 12 years of expertise on the topic, their link with a company producing smartphone apps and user groups, and a huge amount of data already in their possession. Oxehealth will contribute their infrastructure, data and expertise in the monitoring of health conditions in home environments.
Impact	We submitted a proposal to the SIDD call in April 2014. The proposal received very high scores (6, 6, 5 and 3) but, disappointingly, was not selected for funding. As mentioned above, we will submit a bid to the 2017 Healthcare Technologies call, and a separate KTP application with Oxehealth as industrial partner.
Start Year	2012


Description	Collaboration with the Movement Science Group on the diagnosis of dementia using machine learning techniques
Organisation	Oxford Brookes University
Country	United Kingdom
Sector	Academic/University
PI Contribution	This is a collaboration with Professor Helen Dawes of the Movement Science group on the formulation of a machine learning framework for the early diagnosis of dementia, applied to data captured by smartphone devices. Our group contributes its expertise on advanced machine learning techniques, such as those published in the publication outcomes related to this project. In 2017 the collaboration was extended to Oxehealth, the award-winning Oxford University spinoff active in healthcare monitoring via video cameras. This is developing into two separate proposal submissions: a KTP via InnovateUK, with Oxehealth, and a new EPSRC Healthcare Technologies grant proposal to be submitted by April 2017.
Collaborator Contribution	The Movement Science group contributes their 12 years of expertise on the topic, their link with a company producing smartphone apps and user groups, and a huge amount of data already in their possession. Oxehealth will contribute their infrastructure, data and expertise in the monitoring of health conditions in home environments.
Impact	We submitted a proposal to the SIDD call in April 2014. The proposal received very high scores (6, 6, 5 and 3) but, disappointingly, was not selected for funding. As mentioned above, we will submit a bid to the 2017 Healthcare Technologies call, and a separate KTP application with Oxehealth as industrial partner.
Start Year	2012


Description	Partnership with CREATEC
Organisation	Createc
Country	United Kingdom
Sector	Private
PI Contribution	This is a collaboration in the making with CREATEC, a spinoff of Oxford University, for setting up a KTP in the field of action detection for the automatic annotation of sports footage.
Collaborator Contribution	CREATEC provide the business case, the links with customers, and the data necessary to design and test these new methodologies.
Impact	Not yet.
Start Year	2017


Description	9th Ambassadors' Roundtable on Artificial Intelligence
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Policymakers/politicians
Results and Impact	Prof Cuzzolin was invited to speak at the Ambassadors' Roundtable on Artificial Intelligence, held on 27th February 2018 at the Royal Society, in London. The meeting provided a good opportunity to share current thinking from UK and Israeli experts and other professionals in the field, with the strong support of the Ambassador of Israel and the Foreign and Commonwealth Office through the British Ambassador to Israel.
Year(s) Of Engagement Activity	2018


Description	BMW Knowledge Day
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Industry/Business
Results and Impact	BMW Group, with their Cowley plant, invited Prof Cuzzolin and Dr Fridolin Wild, a colleague from the Department of Computing, to speak at the group's annual Knowledge Day. The presentation was attended by groups of attendees at several BMW plants in the UK, Germany and elsewhere. The presentation, titled "Disruptive visual AI for smart factories and cars", was directly inspired by the group's work in action detection and recognition which was kickstarted by this EPSRC grant.
Year(s) Of Engagement Activity	2017


Description	Publication of an article in International Innovation magazine
Form Of Engagement Activity	A magazine, newsletter or online publication
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Other academic audiences (collaborators, peers etc.)
Results and Impact	The article prompted several requests from the editor regarding further publication of our results.
Year(s) Of Engagement Activity	2013
URL	http://www.research-europe.com/index.php/international-innovation/


Description	Risk Group LLC: invited podcast on "Advances in Artificial Intelligence: Gesture and Action Recognition"
Form Of Engagement Activity	A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Prof. Fabio Cuzzolin, Head of Artificial Intelligence and Vision at Oxford Brookes University, Oxford, United Kingdom, participated in Risk Roundup to discuss "Advances in Artificial Intelligence: Human and Non-Human Gesture and Action Recognition". How would we define and describe a man-machine or machine-machine interface, and why is it relevant to understanding Artificial Intelligence? As a mediator between human (and non-human) users and machines, a man-machine or machine-machine interface is basically a system that takes care of the entire human-non-human communication process. It is responsible for the delivery of the machine or computer knowledge, functionality and available information, in a way that is compatible with the end-user's communication channels, be it human or non-human. It then translates the user's (human or non-human) actions (user input) into a form (instructions/commands) that is understandable by a machine. As increasingly complex Artificial Intelligence based systems, products and services rapidly emerge across nations, more user-friendly man-machine or machine-machine interfaces are becoming increasingly necessary for their effective utilization, and consequently for the success that they were designed for. Published on Risk Group: https://www.riskgroupllc.com/advances-in-artificial-intelligence-human-and-non-human-gesture-and-action-recognition/
Year(s) Of Engagement Activity	2016
URL	https://www.riskgroupllc.com/news/press-releases/advances-in-artificial-intelligence-human-and-non-h...


Description	Towards machines that can read your mind
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Public/other audiences
Results and Impact	Professor Fabio Cuzzolin explored in his Professorial Lecture how intelligent machines can negotiate a complex world, fraught with uncertainty, and how to enable machines to deal, in the safest possible way, with situations they have never encountered. Interacting naturally with human beings and their complex environments will only be possible if machines are able to put themselves in people's shoes: to guess their goals, beliefs and intentions - in other words, to read our minds. Fabio explains just how visual artificial intelligence can be provided with this mind-reading ability. Watch it with slides on the Brookes Open Lecture series web site: https://lecturecapture.brookes.ac.uk/Mediasite/Play/9c48ee97ce964dc6a3389836dcacfc0b1d PDF slides are available here.
Year(s) Of Engagement Activity	2018
URL	https://www.facebook.com/oxfordbrookes/videos/10156698398637908/
