Diffusion Models for Single-Image 3D Reconstruction of Deformable Objects

3D reconstruction of deformable objects has a wealth of potential applications across various fields, including AR/VR, gaming, and animal behaviour research. However, creating and animating these 3D models requires significant effort and the expert knowledge of a 3D artist. This presents a significant barrier to creating diverse and abundant 3D environments, as well as to generating 3D assets for novel object categories.

The aim of this research is to create a deep learning model that automatically generates such assets from a single input image. This is a practical but challenging task, as the model must have an a priori understanding of the possible shapes and appearances of the object. Collecting 3D ground-truth data to learn this prior requires significant effort. Recently, there has been growing interest in learning these priors instead from widely available data: Internet images.

Learning from such data poses many challenges, such as the lack of multiview constraints, noisy data, occlusions, and a lack of diverse viewpoints. Given these challenges, existing state-of-the-art methods, such as MagicPony [1], are not capable of achieving accurate, high-fidelity results and are limited to specific object categories.

To address the above limitations, I plan to combine existing approaches [1] with powerful pretrained 2D text-to-image diffusion models [2, 3]. These models have the potential to provide additional priors, which would lead to more faithful 3D reconstructions. In addition, they can help the model generalize to unseen categories without the need to collect additional training images. This work could potentially result in a new state-of-the-art method.
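
One way such a diffusion prior could enter training is the score distillation sampling (SDS) objective introduced in [3]. As a sketch, assume the reconstruction model with parameters \theta renders an image x = g(\theta), and a frozen diffusion model predicts noise \hat{\epsilon}_\phi conditioned on a text prompt y. The gradient used in [3] then takes the form below. How this term would be combined with the reconstruction losses of [1] is an open design question for this project, not something established here.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}\bigl(\phi,\, x = g(\theta)\bigr)
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(z_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right],
\qquad z_t = \alpha_t\, x + \sigma_t\, \epsilon
```

Here t is the diffusion timestep, \epsilon is Gaussian noise, z_t is the noised rendering, and w(t) is a timestep-dependent weight.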

My research will fall within the following areas: Computer Vision, Deep Learning, AI Robustness, and AI Ethics.

[1] Wu, S., Li, R., Jakab, T., Rupprecht, C. and Vedaldi, A., 2022. MagicPony: Learning Articulated 3D Animals in the Wild. arXiv preprint arXiv:2211.12497.

[2] Rombach, R., Blattmann, A., Lorenz, D., Esser, P. and Ommer, B., 2022. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 10684-10695).

[3] Poole, B., Jain, A., Barron, J.T. and Mildenhall, B., 2022. DreamFusion: Text-to-3D using 2D Diffusion. arXiv preprint arXiv:2209.14988.

