DEFORM: Large Scale Shape Analysis of Deformable Models of Humans

Lead Research Organisation: Imperial College London
Department Name: Computing

Abstract

Computer vision is currently witnessing a paradigm shift. Standard hand-crafted features, such as the Scale Invariant Feature Transform (SIFT) and Histograms of Oriented Gradients (HoG), are being replaced by filters learned via Deep Convolutional Neural Networks (DCNNs). Furthermore, for applications that involve deformable objects, such as human bodies, faces and hands (e.g., detection, tracking, recognition, etc.), traditional statistical or physics-based deformable models are being combined with DCNNs to very good effect. This progress has been made possible by the abundance of complex visual data in the Big Data era, spread mostly through the Internet via web services such as YouTube, Flickr and Google Images. This abundance has led to the development of huge databases (such as ImageNet, Microsoft COCO and 300W) consisting of visual data captured "in-the-wild". Furthermore, the scientific and industrial community has undertaken large-scale annotation tasks. For example, my group and I have made a huge effort to annotate over 30K facial images and 500K video frames with a large number of facial landmarks, and the COCO team has annotated thousands of body images with body joints. All the above annotations generally refer to a set of sparse object parts and/or their segments, which can be annotated by humans (e.g., through crowdsourcing). In order to take the next step in the automatic understanding of scenes in general, and of humans and their actions in particular, the community needs to acquire dense 3D information. Even though the collection of 2D intensity images is now a relatively easy and inexpensive process, the collection of high-resolution 3D scans of deformable objects, such as humans and their (body) parts, still remains an expensive and laborious process. This is the principal reason why very limited efforts have been made to collect large-scale databases of 3D faces, heads, hands and bodies.
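
To make concrete what such a statistical deformable model is, the following is a minimal sketch (in Python/NumPy) of a linear, PCA-based 3D shape model of the kind popularised by 3D Morphable Models. The data, dimensions and variable names here are illustrative assumptions, not DEFORM's actual code or data.

```python
import numpy as np

# Illustrative only: a minimal linear statistical shape model, assuming we
# already have K registered scans, each with N vertices in dense
# correspondence (same vertex ordering across all scans).
K, N = 100, 5000
scans = np.random.rand(K, 3 * N)           # placeholder for real registered scans

mean_shape = scans.mean(axis=0)            # the average shape, a (3N,) vector
centred = scans - mean_shape

# PCA via SVD: rows of Vt are the principal deformation components.
_, singular_values, Vt = np.linalg.svd(centred, full_matrices=False)
n_components = 20
components = Vt[:n_components]             # (n_components, 3N)
stddevs = singular_values[:n_components] / np.sqrt(K - 1)

# A new shape instance is the mean plus a weighted sum of components:
# s = s_mean + sum_i alpha_i * sigma_i * U_i
alphas = np.random.randn(n_components)     # latent shape parameters
new_shape = mean_shape + (alphas * stddevs) @ components
vertices = new_shape.reshape(N, 3)         # back to an (N, 3) vertex array
```

Fitting such a model to an image then amounts to finding the latent parameters (the alphas, plus pose and camera) whose rendered shape best explains the observed pixels, which is where the combination with DCNNs comes in.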

In DEFORM, I propose to perform a large-scale collection of high-resolution 4D (i.e., dynamic 3D) sequences of humans. Furthermore, I propose new lines of research to provide high-quality annotations of the correspondences between 2D intensity "in-the-wild" images and the dense 3D structure of deformable objects' shapes, in particular those of humans and their parts. Establishing dense 2D-to-3D correspondences can effortlessly solve many image-level tasks, such as landmark (part) localisation, dense semantic part segmentation and the estimation of deformations (i.e., behaviour), as the sketch below illustrates.
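
As a hedged illustration of this claim, the sketch below assumes a hypothetical network output: for every foreground pixel of an image, the index of its corresponding vertex on a template 3D mesh. Given such a dense 2D-to-3D correspondence map, landmark localisation and dense part segmentation reduce to simple look-ups. All names and sizes are illustrative, not part of DEFORM.

```python
import numpy as np

# Illustrative sketch, not DEFORM's actual pipeline. Assume a network has
# produced, for an H x W image, a dense correspondence map assigning each
# foreground pixel the index of its corresponding template-mesh vertex.
H, W, N_VERTICES = 256, 256, 5000
corr_map = np.random.randint(-1, N_VERTICES, size=(H, W))   # -1 = background

# Landmark localisation: a 2D landmark is simply the pixel (or pixels) that
# map to the template vertex defining that landmark (e.g., the nose tip).
NOSE_TIP_VERTEX = 42                        # hypothetical template vertex id
ys, xs = np.nonzero(corr_map == NOSE_TIP_VERTEX)
if len(xs) > 0:
    nose_tip_2d = (xs.mean(), ys.mean())    # average if several pixels match

# Dense semantic part segmentation: label each pixel with the semantic part
# of the template vertex it corresponds to.
vertex_to_part = np.random.randint(0, 10, size=N_VERTICES)  # template labels
part_map = np.where(corr_map >= 0, vertex_to_part[corr_map.clip(min=0)], -1)
```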

Planned Impact

The impact of the DEFORM technology will be enormous, as it will enable the creation of important new applications of ICT in basic research, medicine, healthcare/bioengineering, wearable devices, robotics, virtual/augmented reality (VR/AR), digital economy and business, to name a few. More precisely, the impact of DEFORM spans many different fields, including, but not limited to:

- Computer Vision and Machine Learning: the algorithms and statistical models developed in DEFORM can revolutionise the automatic analysis and understanding of humans in images and videos.
- VR, graphics and computer games: the statistical models of human face/body/hand shape and texture can be used to create large numbers of realistic human models for populating VR worlds and games (currently, the cost of creating content for VR applications is one of the reasons impeding progress in the field).
- Medicine, anthropology and forensics: the statistical models can be used to create normative statistical distributions for bodies and hands. In order to maximise the impact of the collected data, a clinician will be involved in the data collection.
- Bio-engineering, wearables and prosthetics: the statistical models of the 3D shape of bodies and hands can be used to design personalised prosthetic parts and wearable devices.

The research programme of DEFORM provides excellent opportunities for public engagement. In particular, the database collection at the Science Museum, London (SML) will give my research team an opportunity to interact with thousands of people and provide them with a clear understanding of the uses and limitations of the technology. My team will also record the views, ideas and concerns of the public regarding the use of technologies relevant to DEFORM. A dynamic website will host the research and data. Its sections and the associated social media (e.g., a dedicated Twitter feed) will be directed at non-scientists. Team members will regularly contribute to the website's blog, Twitter feed and podcast to explain their work. I will also exploit outreach opportunities for face-to-face engagement, such as the British Science Festival and the Royal Society Summer Science Exhibition, providing training for researchers as needed.

We believe that the technology developed in this project has very high potential for commercialisation. In particular, the developed statistical models of high-resolution 3D bodies, hands and faces could be licensed to industries working in computer vision, graphics, VR, AR and movie post-production. I already have extensive experience of close collaboration with industry and of licensing research outcomes. I will draw on this experience to work with industry to exploit opportunities for the commercialisation of the developed technology. The industrial project partners will also help in this direction. To ensure the potential for commercial exploitation, I will protect the developed IP where appropriate (e.g., via patents, filed if and when appropriate, before dissemination to the community).
 
Description We have developed the first large-scale statistical models of the human head and face. We have developed methodologies for face recognition that have been made publicly available and are on the NIST leaderboard (https://pages.nist.gov/frvt/html/frvt11.html). We have developed methods for synthesising faces. We have started a large-scale data collection at the Science Museum, London for hands and bodies. From this collection we have developed large-scale statistical models of the body and hand, which have been presented in top venues.
Exploitation Route The code has been made publicly available and is now used by many practitioners. The paper that describes the work has already received over 500 citations, even though it was published only 8 months ago.
Sectors Digital/Communication/Information Technologies (including Software)

URL https://github.com/deepinsight/insightface
 
Description During the project, Imperial College London collaborated with Ariel AI Inc/Ltd within the EPSRC project DEFORM. Ariel AI Inc/Ltd licensed certain technology that was developed in the project. Ariel AI Inc/Ltd was recently acquired by Snap Inc [1]. Using the developed technology, a new kit for 3D body and hand tracking has been released by Snap Inc [2]. [1] https://www.cnbc.com/2021/01/26/snap-acquires-ariel-ai-to-boost-snapchat-augmented-reality-features.html [2] https://www.linkedin.com/posts/iasonas-kokkinos-1a4593157_augmentedreality-snapchat-activity-6771000735577448448-Iyan/
First Year Of Impact 2020
Sector Creative Economy, Digital/Communication/Information Technologies (including Software)
Impact Types Societal, Economic

 
Company Name ARIEL AI LTD 
Description Powering the next generation of consumer experiences on mobile devices through pixel-accurate, real-time 3D Human Perception and Reconstruction. 
Year Established 2018 
Impact The company was co-founded by several members of the DEFORM project. The company licensed technology developed in the EPSRC DEFORM project and was recently acquired by Snap Inc [1]. [1] https://www.cnbc.com/2021/01/26/snap-acquires-ariel-ai-to-boost-snapchat-augmented-reality-features.html
Website https://www.arielai.com/