DEFORM: Large Scale Shape Analysis of Deformable Models of Humans

Lead Research Organisation: Imperial College London
Department Name: Computing

Abstract

Computer vision is currently witnessing a paradigm shift. Standard hand-crafted features, such as the Scale Invariant Feature Transform (SIFT) and Histograms of Oriented Gradients (HoG), are being replaced by filters learned via Deep Convolutional Neural Networks (DCNNs). Furthermore, for applications that involve deformable objects, such as human bodies, faces and hands (e.g., detection, tracking, recognition, etc.), traditional statistical or physics-based deformable models are being combined with DCNNs to very good effect. This progress has been made possible by the abundance of complex visual data in the Big Data era, spread mostly through the Internet via web services such as YouTube, Flickr and Google Images. This abundance has led to the development of huge databases (such as ImageNet, Microsoft COCO and 300W) consisting of visual data captured "in-the-wild". Furthermore, the scientific and industrial community has undertaken large-scale annotation tasks. For example, my group and I have made a huge effort to annotate over 30K facial images and 500K video frames with a large number of facial landmarks, and the COCO team has annotated thousands of body images with body joints. All the above annotations generally refer to a set of sparse object parts and/or their segments, which can be annotated by humans (e.g., through crowdsourcing). In order to take the next step in the automatic understanding of scenes in general, and of humans and their actions in particular, the community needs to acquire dense 3D information. Even though the collection of 2D intensity images is now a relatively easy and inexpensive process, the collection of high-resolution 3D scans of deformable objects, such as humans and their (body) parts, still remains an expensive and laborious process. This is the principal reason why very limited efforts have been made to collect large-scale databases of 3D faces, heads, hands and bodies.
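
To make concrete what such a statistical deformable model is, the following is a minimal sketch (in Python/NumPy) of a linear, PCA-based 3D shape model of the kind popularised by 3D Morphable Models. The data, dimensions and variable names here are illustrative assumptions, not DEFORM's actual code or data.

```python
import numpy as np

# Illustrative only: a minimal linear statistical shape model, assuming we
# already have K registered scans, each with N vertices in dense
# correspondence (same vertex ordering across all scans).
K, N = 100, 5000
scans = np.random.rand(K, 3 * N)           # placeholder for real registered scans

mean_shape = scans.mean(axis=0)            # the average shape, a (3N,) vector
centred = scans - mean_shape

# PCA via SVD: rows of Vt are the principal deformation components.
_, singular_values, Vt = np.linalg.svd(centred, full_matrices=False)
n_components = 20
components = Vt[:n_components]             # (n_components, 3N)
stddevs = singular_values[:n_components] / np.sqrt(K - 1)

# A new shape instance is the mean plus a weighted sum of components:
# s = s_mean + sum_i alpha_i * sigma_i * U_i
alphas = np.random.randn(n_components)     # latent shape parameters
new_shape = mean_shape + (alphas * stddevs) @ components
vertices = new_shape.reshape(N, 3)         # back to an (N, 3) vertex array
```

Fitting such a model to an image then amounts to finding the latent parameters (the alphas, plus pose and camera) whose rendered shape best explains the observed pixels, which is where the combination with DCNNs comes in.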

In DEFORM, I propose to perform a large-scale collection of high-resolution 4D (i.e., dynamic 3D) sequences of humans. Furthermore, I propose new lines of research to provide high-quality annotations of the correspondences between 2D intensity "in-the-wild" images and the dense 3D structure of deformable objects' shapes, in particular those of humans and their parts. Establishing dense 2D-to-3D correspondences can effortlessly solve many image-level tasks, such as landmark (part) localisation, dense semantic part segmentation and the estimation of deformations (i.e., behaviour), as the sketch below illustrates.
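
As a hedged illustration of this claim, the sketch below assumes a hypothetical network output: for every foreground pixel of an image, the index of its corresponding vertex on a template 3D mesh. Given such a dense 2D-to-3D correspondence map, landmark localisation and dense part segmentation reduce to simple look-ups. All names and sizes are illustrative, not part of DEFORM.

```python
import numpy as np

# Illustrative sketch, not DEFORM's actual pipeline. Assume a network has
# produced, for an H x W image, a dense correspondence map assigning each
# foreground pixel the index of its corresponding template-mesh vertex.
H, W, N_VERTICES = 256, 256, 5000
corr_map = np.random.randint(-1, N_VERTICES, size=(H, W))   # -1 = background

# Landmark localisation: a 2D landmark is simply the pixel (or pixels) that
# map to the template vertex defining that landmark (e.g., the nose tip).
NOSE_TIP_VERTEX = 42                        # hypothetical template vertex id
ys, xs = np.nonzero(corr_map == NOSE_TIP_VERTEX)
if len(xs) > 0:
    nose_tip_2d = (xs.mean(), ys.mean())    # average if several pixels match

# Dense semantic part segmentation: label each pixel with the semantic part
# of the template vertex it corresponds to.
vertex_to_part = np.random.randint(0, 10, size=N_VERTICES)  # template labels
part_map = np.where(corr_map >= 0, vertex_to_part[corr_map.clip(min=0)], -1)
```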

Planned Impact

The impact of the DEFORM technology will be enormous, as it will enable the creation of important new applications of ICT in basic research, medicine, healthcare/bioengineering, wearable devices, robotics, virtual/augmented reality (VR/AR), digital economy and business, to name a few. More precisely, the impact of DEFORM spans many different fields, including, but not limited to:

- Computer Vision and Machine Learning: the algorithms and statistical models developed in DEFORM can revolutionise the automatic analysis and understanding of humans in images and videos.
- VR, graphics and computer games: the statistical models of human face/body/hand shape and texture can be used to create large numbers of realistic human models for populating VR worlds and games (currently, the cost of creating content for VR applications is one of the reasons impeding progress in the field).
- Medicine, anthropology and forensics: the statistical models can be used to create normative statistical distributions for bodies and hands. In order to maximise the impact of the collected data, a clinician will be involved in the data collection.
- Bio-engineering, wearables and prosthetics: the statistical models of the 3D shape of bodies and hands can be used to design personalised prosthetic parts and wearable devices.

The research programme of DEFORM provides excellent opportunities for public engagement. In particular, the database collection at the Science Museum, London (SML) will give my research team an opportunity to interact with thousands of people and provide them with a clear understanding of the uses and limitations of the technology. My team will also record the views, ideas and concerns of the public regarding the use of technologies relevant to DEFORM. A dynamic website will host the research and data. Its sections and the associated social media (e.g., a dedicated Twitter feed) will be directed at non-scientists. Team members will regularly contribute to the website's blog, Twitter feed and podcast to explain their work. I will also exploit outreach opportunities for face-to-face engagement, such as the British Science Festival and the Royal Society Summer Science Exhibition, providing training for researchers as needed.

We believe that the technology developed in this project has very high potential for commercialisation. In particular, the developed statistical models of high-resolution 3D bodies, hands and faces could be licensed to industries working in computer vision, graphics, VR, AR and movie post-production. I already have extensive experience of close collaboration with industry and of licensing research outcomes. I will draw on this experience to work with industry to exploit opportunities for the commercialisation of the developed technology. The industrial project partners will also help in this direction. To ensure the potential for commercial exploitation, I will protect the developed IP where appropriate (e.g., via patents, filed if and when appropriate, before dissemination to the community).
 
Description We have developed the first large-scale statistical models of the human head and face. We have developed methodologies for face recognition that have been made publicly available and are on the NIST leaderboard (https://pages.nist.gov/frvt/html/frvt11.html). We have developed methods for synthesising faces. We have started a large-scale data collection at the Science Museum, London for hands and bodies. From this collection we have developed large-scale statistical models of the body and hand, which have been presented in top venues.
Exploitation Route The code has been made publicly available and is now used by many practitioners. The paper that describes the work has already received over 500 citations, even though it was published only 8 months ago.
Sectors Digital/Communication/Information Technologies (including Software)

URL https://github.com/deepinsight/insightface
 
Description During the project, Imperial College London collaborated with Ariel AI Inc/Ltd within the EPSRC project DEFORM. Ariel AI Inc/Ltd licensed certain technology that was developed in the project. Ariel AI Inc/Ltd was recently acquired by Snap Inc [1]. Using the developed technology, a new kit for 3D body and hand tracking has been released by Snap Inc [2]. [1] https://www.cnbc.com/2021/01/26/snap-acquires-ariel-ai-to-boost-snapchat-augmented-reality-features.html [2] https://www.linkedin.com/posts/iasonas-kokkinos-1a4593157_augmentedreality-snapchat-activity-6771000735577448448-Iyan/
First Year Of Impact 2020
Sector Creative Economy, Digital/Communication/Information Technologies (including Software)
Impact Types Societal, Economic

 
Company Name ARIEL AI LTD 
Description Powering the next generation of consumer experiences on mobile devices through pixel-accurate, real-time 3D Human Perception and Reconstruction. 
Year Established 2018 
Impact The company was co-founded by several members of the DEFORM project. The company licensed technology developed in the EPSRC DEFORM project and was recently acquired by Snap Inc [1]. [1] https://www.cnbc.com/2021/01/26/snap-acquires-ariel-ai-to-boost-snapchat-augmented-reality-features.html
Website https://www.arielai.com/