DEFORM: Large Scale Shape Analysis of Deformable Models of Humans

Lead Research Organisation: Imperial College London

Department Name: Computing

Abstract

Computer vision is currently witnessing a paradigm shift. Standard robust features, such as the Scale Invariant Feature Transform (SIFT) and Histograms of Oriented Gradients (HoGs), are being replaced by learnable filters via the application of Deep Convolutional Neural Networks (DCNNs). Furthermore, for applications (e.g., detection, tracking, and recognition) that involve deformable objects, such as human bodies, faces, and hands, traditional statistical or physics-based deformable models are combined with DCNNs with very good results. This progress is due to the abundance of complex visual data in the Big Data era, spread mostly through the Internet via web services such as YouTube, Flickr, and Google Images. This abundance has led to the development of huge databases (such as ImageNet, Microsoft COCO, and 300W) consisting of visual data captured "in-the-wild". Furthermore, the scientific and industrial community has undertaken large-scale annotation tasks. For example, my group and I have made huge efforts to annotate over 30K facial images and 500K video frames with regard to a large number of facial landmarks. The COCO team has annotated thousands of body images with regard to body joints. All the above annotations generally refer to a set of sparse parts of objects and/or their segments, which can be annotated by humans (e.g., through crowdsourcing). In order to take the next step in the automatic understanding of scenes in general, and of humans and their actions in particular, the community needs to acquire dense 3D information. Even though the collection of 2D intensity images is now a relatively easy and inexpensive process, the collection of high-resolution 3D scans of deformable objects, such as humans and their (body) parts, remains an expensive and laborious process. This is the principal reason why very limited efforts have been made to collect large-scale databases of 3D faces, heads, hands, bodies, etc.

In DEFORM, I propose to perform large-scale collection of high-resolution 4D sequences of humans. Furthermore, I propose new lines of research in order to provide high-quality annotations of the correspondences between 2D intensity "in-the-wild" images and the dense 3D structure of deformable objects' shapes, in particular those of humans and their parts. Establishing dense 2D-to-3D correspondences can effortlessly solve many image-level tasks such as landmark (part) localisation, dense semantic part segmentation, estimation of deformations (i.e., behaviour), etc.

Planned Impact

The impact of the DEFORM technology will be enormous, as it will enable the creation of important new applications of ICT in basic research, medicine, healthcare/bioengineering, wearable devices, robotics, virtual/augmented reality (VR/AR), digital economy and business, to name a few. More precisely, the impact of DEFORM spans many different fields, including, but not limited to:

- Computer Vision and Machine Learning: the algorithms and statistical models developed in DEFORM can revolutionise automatic analysis and understanding of humans in images and videos.
- VR, graphics and computer games: the statistical models of human face/body/hand shape and texture can be used for creating huge amounts of realistic human models for populating VR worlds and games (currently the cost of creating content for VR applications is one of the reasons impeding progress in the field).
- Medicine, anthropology and forensics: the statistical models can be used to create normative statistical distributions for bodies and hands. In order to maximise the impact of the collected data a clinician will be involved in data collection.
- Bio-engineering, wearables and prosthetics: the statistical models of the 3D shape of bodies and hands can be used to design personalised prosthetic parts and wearable devices.

The research programme of DEFORM provides excellent opportunities for public engagement. In particular, the database collection at the Science Museum London (SML) will give my research team an opportunity to interact with thousands of people and provide them with a clear understanding of the uses and limitations of the technology. My team will also record the views, ideas and concerns of the public regarding the use of technologies relevant to DEFORM. A dynamic website will host the research and data. Its sections and social media (e.g. a dedicated Twitter feed) will be directed at non-scientists. Team members will regularly contribute to the website's blog, Twitter feed and podcast to explain their work. I will exploit outreach opportunities for face-to-face engagement, such as the British Science Festival and the Royal Society Summer Science Exhibition, providing training for researchers as needed.

I believe that the technology developed in this project has very high potential for commercialisation. In particular, the developed statistical models of high-resolution 3D bodies, hands and faces could be licensed to industries working in computer vision, graphics, VR, AR and movie post-production. I already have extensive experience of close collaboration with industry and of licensing research outcomes. I will draw on this experience to work with industry and exploit opportunities for commercialisation of the developed technology. The industrial project partners will also help in this direction. To ensure the potential for commercial exploitation, I will protect the developed IP where appropriate (e.g. via patents) before dissemination to the community.

Funded Value:

£1,350,282

Funded Period:

Jan 19 - Oct 24

Funder:

EPSRC

Project Status:

Active

Project Category:

Fellowship

Project Reference:

EP/S010203/1

Principal Investigator:

Stefanos Zafeiriou

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Image & Vision Computing (100%)

Organisations

People	ORCID iD
Stefanos Zafeiriou (Principal Investigator / Fellow)

Publications



Alexandridis K (2022) Inverse Image Frequency for Long-tailed Image Recognition

Alexandridis KP (2023) Inverse Image Frequency for Long-Tailed Image Recognition in IEEE Transactions on Image Processing

Babiloni F (2023) Adaptive Spiral Layers for Efficient 3D Representation Learning on Meshes

Bahri M (2021) Shape My Face: Registering 3D Face Scans by Surface-to-Surface Translation in International Journal of Computer Vision

Bahri M (2021) Binary Graph Neural Networks

Bouritsas G (2019) Neural 3D Morphable Models: Spiral Convolutional Networks for 3D Shape Representation Learning and Generation

Chrysos G (2020) RoCGAN: Robust Conditional GAN in International Journal of Computer Vision

Chrysos G (2020) P-nets: Deep Polynomial Neural Networks

Chrysos G (2019) PolyGAN: High-Order Polynomial Generators

Key Findings
Impact Summary
Spin Outs


Description	We have developed the first large-scale statistical models for the human head and face. We have developed face recognition methodologies that were made publicly available and appear on the NIST leaderboard (https://pages.nist.gov/frvt/html/frvt11.html). We have developed methods for synthesizing faces. We have started a large-scale data collection at the Science Museum of London for hands and bodies. From this collection we have developed large-scale statistical models for the body and hand, which have been presented in top venues. InsightFace (https://insightface.ai/), publicly available software for face analysis, has been released and is now curated by the community. The GitHub repository has received over 20K stars and has been downloaded by over 100K people. The Handy hand model was made publicly available (https://github.com/rolpotamias/handy). It is now used by hundreds of researchers worldwide. A large-scale head and face model was made publicly available (https://github.com/steliosploumpis/Universal_Head_3DMM). A tongue model and dataset were made publicly available (https://github.com/steliosploumpis/tongue#public-release-tongue-dataset). A generative model of facial texture was made publicly available (https://github.com/barisgecer/TBGAN).
Exploitation Route	The code has been made publicly available and is now used by many practitioners (20K stars). The paper that describes the work has already received over 6,000 citations even though it was published only 8 months ago. The publicly available models are used by over 100K users.
Sectors	Digital/Communication/Information Technologies (including Software)
URL	https://insightface.ai/


Description	During the EPSRC project DEFORM, Imperial College London collaborated with Ariel AI Inc/Ltd. Ariel AI Inc/Ltd licensed certain technology that was developed in the project. Ariel AI Inc/Ltd was subsequently acquired by Snap Inc [1]. Using the developed technology, Snap Inc has released a new kit for 3D body and hand tracking [2]. [1] https://www.cnbc.com/2021/01/26/snap-acquires-ariel-ai-to-boost-snapchat-augmented-reality-features.html [2] https://www.linkedin.com/posts/iasonas-kokkinos-1a4593157_augmentedreality-snapchat-activity-6771000735577448448-Iyan/
First Year Of Impact	2020
Sector	Creative Economy, Digital/Communication/Information Technologies (including Software)
Impact Types	Societal, Economic


Company Name	Ariel AI
Description	Ariel AI develops 3D modelling software for mobile phones.
Year Established	2018
Impact	The company was co-founded by many members of the DEFORM project. It licensed technology developed in the EPSRC DEFORM project and was subsequently acquired by Snap Inc. [1] [1] https://www.cnbc.com/2021/01/26/snap-acquires-ariel-ai-to-boost-snapchat-augmented-reality-features.html
Website	http://www.arielai.com
