6-DoF VR Video: Towards Immersive 360-degree VR Video with Motion Parallax

Lead Research Organisation: University of Bath
Department Name: Computer Science

Abstract

The goal of my Innovation Fellowship is to create a new form of immersive 360-degree VR video. We are massive consumers of visual information, and as new forms of visual media and immersive technologies are emerging, I want to work towards my vision of making people feel truly immersed in this new form of video content. Imagine, for instance, what it would be like to experience the International Space Station as if you were there - without leaving the comfort of your own home.

The Problem:
To feel truly immersed in virtual reality, one needs to be able to freely look around within a virtual environment and see it from the viewpoints of one's own eyes. Immersion requires 'freedom of motion' in six degrees of freedom ('6-DoF'), so that viewers see the correct views of an environment. As viewers move their heads, the objects they see should move relative to each other, at different speeds depending on their distance from the viewer. This is called motion parallax.
Viewers need to perceive correct motion parallax regardless of where they are (3 DoF) and where they are looking (+3 DoF). Currently, only computer-generated imagery (CGI) fully supports 6-DoF content with motion parallax, but it remains extremely challenging to match the visual realism of the real world with computer graphics models. Viewers therefore either lose photorealism (with CGI) or immersion (with existing VR video). To date, it is not possible to capture or view high-quality 6-DoF VR video of the real world.
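To make the effect of motion parallax concrete, the following minimal sketch (illustrative numbers only, not taken from the project) estimates how far objects at different distances appear to shift when the head translates sideways:

```python
# Back-of-the-envelope motion parallax: a sideways head translation of t metres
# makes a point at distance z metres appear to shift by roughly t / z radians,
# so near objects move much more than far ones.
import math

def parallax_deg(head_translation_m, distance_m):
    return math.degrees(math.atan2(head_translation_m, distance_m))

for z in (0.5, 2.0, 10.0):  # near, mid and far objects
    print(f"object at {z:4.1f} m shifts by {parallax_deg(0.1, z):4.1f} deg")
# A 10 cm head movement shifts an object at 0.5 m by ~11 deg, but one at 10 m by <1 deg.
```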

My Goal:
Virtual reality is a new kind of medium that requires new ways to author content. My goal is therefore to create a new form of immersive 360-degree VR video that overcomes the limitations of existing 360-degree VR video. This new form of VR content - 6-DoF VR video - will achieve unparalleled realism and immersion by providing freedom of head motion and motion parallax, which is a vital depth cue for the human visual system and entirely missing from existing 360-degree VR video.
Specifically, the aim of this Fellowship is to accurately and comprehensively capture real-world environments, including visual dynamics such as people and moving animals or plants, and to reproduce the captured environments and their dynamics in VR with photographic realism, correct motion parallax and overall depth perception. 6-DoF VR video is a new virtual reality capability that will be a significant step forward for overall immersion, realism and quality of experience.

My Approach:
To achieve 6-DoF VR video that enables photorealistic exploration of dynamic real environments in 360-degree virtual reality, my group and I will develop novel video-based capture, 3D reconstruction and rendering techniques. We will first explore different approaches for capturing static and, more challenging, dynamic 360-degree environments, including 360-degree cameras and multi-camera rigs. We will then reconstruct the 3D geometry of the environments from the captured imagery by extending multi-view geometry/photogrammetry techniques to handle dynamic 360-degree environments. Extending image-based rendering to 360-degree environments will enable 6-DoF motion within a photorealistic 360-degree environment with high visual fidelity, and will result in detailed 360-degree environments covering all possible viewing directions. We will first target 6-DoF 360-degree VR photographs (i.e. static scenes) and then extend our approach to 6-DoF VR videos.
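As a rough illustration of the image-based rendering idea described above, the sketch below (hypothetical code, not the project's implementation) blends the two captured viewpoints that are angularly closest to a desired viewing direction; real systems additionally warp the images using optical flow and reconstructed scene geometry, which is omitted here:

```python
# Pick the two cameras on the capture circle nearest to the desired viewing
# direction and blend their images with weights based on angular distance.
import numpy as np

def blend_views(images, cam_angles, view_angle):
    """images: list of HxWx3 arrays captured at cam_angles (radians) on a circle."""
    diff = (np.asarray(cam_angles) - view_angle + np.pi) % (2 * np.pi) - np.pi
    order = np.argsort(np.abs(diff))                 # indices sorted by angular distance
    i, j = order[0], order[1]                        # two nearest captured viewpoints
    wi = np.abs(diff[j]) / (np.abs(diff[i]) + np.abs(diff[j]) + 1e-8)
    return wi * images[i] + (1.0 - wi) * images[j]   # linear blend of the two views

# Toy usage: 72 viewpoints spaced 5 degrees apart around a circle.
angles = np.deg2rad(np.arange(0, 360, 5))
images = [np.full((4, 4, 3), a, dtype=np.float32) for a in angles]
novel = blend_views(images, angles, np.deg2rad(7.5))
```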

Project partners:
This Fellowship is supported by the following project partners in the UK and abroad: Foundry (London) is a leading developer of visual effects software for film, video and VR post-production, and is ideally suited to advise on industrial impact. REWIND (St Albans) is a leading creative VR production company that is keen to experiment with 6-DoF VR video. Reality7 (Hamburg, Germany) is a start-up working on cinematic VR video.

Planned Impact

Immersive technologies in all their forms, including 360-degree video, VR and AR, are a major opportunity for the UK's world-leading creative industries. These industries are worth over £84bn and employ 2.8 million people. Over $4bn of investment in immersive technologies is predicted, with the global market for immersive technologies forecast to reach $150bn by 2020.

Our project partners are indicative of the creative pipelines within this industry and highlight how different types of organisation might benefit from this research. For example, we anticipate that our research will inform hardware designers working to create the next generation of 360-degree capture equipment (such as Reality7), software providers for editing captured footage (Foundry), and commercial end users (REWIND). This encompasses content creation for VR, games, film and broadcast (Foundry, REWIND).

The ability to capture and reproduce a wide range of real-world environments with high visual fidelity is a significant capability that finds applications in many disciplines, such as education, visualisation, anthropology and cultural heritage.
Furthermore, technologies for the creation of new virtual environments will have widespread impact beyond the creative sector, for example in education, medicine, simulation and training. This is an expanding area of UK productivity which relies upon new tools and technologies to drive competitiveness.

All project partners will engage in intense periods of collaboration. Close working will disseminate not only research results, but also best practice and novel techniques across several industries. As a result, my team members will be trained as researchers with a strong understanding of industry needs and working practices, providing them with a strong foundation for their future careers.

Publications


Bertel T (2018) MegaParallax

Bertel T (2020) OmniPhotos: Casual 360° VR Photography in ACM Transactions on Graphics

Bertel T (2019) MegaParallax: Casual 360° Panoramas with Motion Parallax in IEEE Transactions on Visualization and Computer Graphics

Jang H (2022) Egocentric scene reconstruction from an omnidirectional video in ACM Transactions on Graphics

Kim H (2019) Neural style-preserving visual dubbing in ACM Transactions on Graphics

Meka A (2021) Real-time Global Illumination Decomposition of Videos in ACM Transactions on Graphics

Richardt C (2019) Capture4VR

 
Description This award has supported the development of several new techniques for capturing scenes in 3D and creating new views, for instance for immersive virtual reality photography and video. This includes state-of-the-art techniques for 360° optical flow, monocular depth estimation and scene geometry reconstruction. A key contribution enabled by this award is OmniPhotos, which introduces a new type of 360° VR photography that overcomes several limitations of previous approaches to enable fast, casual and robust capture of immersive real-world VR experiences. This provides a more realistic perception of panoramic environments, which is particularly useful for virtual reality applications. Our approaches are the first to enable casual consumers to capture and view high-quality 360° panoramas with motion parallax. Video experiences are enabled by our MatryODShka technique, which converts existing 360° 3D video into immersive video with motion parallax in real time.
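As a rough illustration of how multi-sphere images (the representation behind MatryODShka-style real-time view synthesis) produce novel views, the hypothetical sketch below composites concentric RGBA sphere layers back to front; it omits the warping of layers into the target viewpoint:

```python
# Composite multi-sphere image (MSI) layers, ordered far to near, with the
# standard "over" operator to form one novel view.
import numpy as np

def composite_msi(layers_rgb, layers_alpha):
    """layers_rgb: (L, H, W, 3), layers_alpha: (L, H, W, 1), ordered far to near."""
    out = np.zeros(layers_rgb.shape[1:], dtype=np.float32)
    for rgb, alpha in zip(layers_rgb, layers_alpha):   # far -> near
        out = rgb * alpha + out * (1.0 - alpha)        # alpha "over" compositing
    return out

# Toy example: 32 layers at 64x64 resolution.
L, H, W = 32, 64, 64
rgb = np.random.rand(L, H, W, 3).astype(np.float32)
alpha = np.random.rand(L, H, W, 1).astype(np.float32)
novel_view = composite_msi(rgb, alpha)
```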
Exploitation Route We have made the implementations of our main projects publicly available, so that they can be built on by other researchers in academia and industry. This lays the foundation for our findings to be productised by (mobile) camera manufacturers and hence find their way into consumer smartphones.
Sectors Creative Economy; Digital/Communication/Information Technologies (including Software); Education; Leisure Activities, including Sports, Recreation and Tourism; Culture, Heritage, Museums and Collections

URL https://richardt.name/pub
 
Description Research on Neural Style-Preserving Visual Dubbing (2019), and its predecessor Deep Video Portraits (2018), laid the foundations for the field of audio-driven generative animation. Co-authors founded the company Synthesia, which focuses on video synthesis technology and is valued at $1bn as of 2024. The research publications HoloGAN (2019) and BlockGAN (2020) made significant contributions to the field of generative adversarial networks (GANs). This is evidenced by around 500 and 200 citations, respectively, which together represent at least 700 publications building on our findings. Our research on 6-DoF images and videos helped shape the field of novel-view synthesis by laying important technical and algorithmic foundations. The field has seen an explosion in popularity and commercial applications since the invention of Neural Radiance Fields in 2020.
First Year Of Impact 2021
Sector Creative Economy; Digital/Communication/Information Technologies (including Software); Culture, Heritage, Museums and Collections
Impact Types Cultural, Economic

 
Title Dataset for "ExMaps: Long-Term Localization in Dynamic Scenes using Exponential Decay" 
Description This is the dataset that accompanies our publication "ExMaps: Long-Term Localization in Dynamic Scenes using Exponential Decay". The data was collected over a period of time using a custom ARCore-based Android app. It depicts a retail aisle. The images can be found in the "only_jpgs" sub-folders. The rest of the ARCore data, such as camera poses, can be found in the "data_all" sub-folders for each day that data was collected. The data can be used to run the benchmarks from the original paper. It can also be used to reconstruct point clouds using structure-from-motion (SfM) software (see the sketch after this entry).
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
Impact None yet. 
URL https://researchdata.bath.ac.uk/986/
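For illustration, a point-cloud reconstruction from the image folders could be run with COLMAP's Python bindings along the following lines. This is a sketch assuming the pycolmap package is installed; all paths below are hypothetical placeholders:

```python
# Sparse SfM reconstruction of one capture day with pycolmap (COLMAP bindings).
import os
import pycolmap

image_dir = "exmaps/day_01/only_jpgs"       # hypothetical path to the images
database_path = "exmaps/day_01/database.db"
output_path = "exmaps/day_01/sparse"
os.makedirs(output_path, exist_ok=True)

pycolmap.extract_features(database_path, image_dir)   # detect features per image
pycolmap.match_exhaustive(database_path)              # pairwise feature matching
maps = pycolmap.incremental_mapping(database_path, image_dir, output_path)
maps[0].write(output_path)                            # sparse point cloud + camera poses
```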
 
Title Datasets for "OmniPhotos: Casual 360° VR Photography" 
Description This dataset contains the raw and processed data used to validate the results of the paper. Each subdirectory in the Preprocessed and Unprocessed folders contains a 360° video captured in a circle at a different location in the world, using an Insta360 One X 360° camera on a rotating selfie stick; the subdirectories are named after these locations. Both the proprietary (.insv) video format and a stitched equirectangular (.mp4) video (used by our preprocessing pipeline) are included. In addition to these videos, each subdirectory contains the Input frames used by our software to display the scene, a Capture directory with structure-from-motion data for the scene, and a Config directory with the configuration files needed to run our software. In the Preprocessed directory, each subdirectory also contains a Cache directory with optical flow (.floss) files, a CSV file linking the .floss files to the relevant images in Input, and .obj files containing the scene-dependent proxy mesh (a deformed sphere) used to render the scene.
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact This dataset accompanies the code and binary release of our work OmniPhotos (Bertel et al., ACM Transactions on Graphics 2020). We have released a demo application that downloads parts of this dataset, so that anyone can try out our new casual 360° VR photography technique. 
URL https://researchdata.bath.ac.uk/id/eprint/948
 
Title Matterport3D 360° RGBD Dataset 
Description This dataset is an extension of Matterport3D that contains data to train and validate high-resolution 360° monocular depth estimation models. The data is structured in 90 folders belonging to 90 different buildings, storing a total of 9684 samples. Each sample consists of 4 files: the RGB equirectangular 360° image (.png), its ground-truth depth (.dpt), a visualisation of the ground-truth depth (.png), and the camera-to-world extrinsic parameters for the image (.txt), saved as 7 values: 3 for the camera centre and the last 4 for the XYWZ rotation quaternion (see the sketch after this entry). 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
Impact This dataset has already been used by other researchers for benchmarking their methods against the state of the art for high-resolution 360 monocular depth estimation. 
URL https://researchdata.bath.ac.uk/1126/
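A minimal sketch for turning the 7 extrinsic parameters into a 4x4 camera-to-world matrix is shown below; the quaternion component order is an assumption (x, y, z, w, as expected by SciPy) and should be checked against the dataset documentation:

```python
# Build a 4x4 camera-to-world matrix from the 7-value extrinsics file described above.
import numpy as np
from scipy.spatial.transform import Rotation

def load_cam_to_world(txt_path):
    c = np.loadtxt(txt_path)                   # 7 values: cx, cy, cz + quaternion (order assumed x, y, z, w)
    T = np.eye(4)
    T[:3, :3] = Rotation.from_quat(c[3:]).as_matrix()   # rotation part
    T[:3, 3] = c[:3]                           # camera centre in world coordinates
    return T
```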
 
Title DEFERRED NEURAL RENDERING FOR VIEW EXTRAPOLATION 
Description Novel images may be generated using an image generator implemented on a processor. The image generator may receive as input neural features selected from a neural texture atlas. The image generator may also receive as input one or more position guides identifying position information for a plurality of input image pixels. The novel images may be evaluated using an image discriminator to determine a plurality of optimization values by comparing each of the plurality of novel images with a respective one of a corresponding plurality of input images. Each of the novel images may be generated from a respective camera pose relative to an object identical to that of the respective one of the corresponding plurality of input images. The image generator and the neural features may be updated based on the optimization values and stored on a storage device. 
IP Reference WO2022098606 
Protection Patent / Patent application
Year Protection Granted 2022
Licensed Yes
Impact US Patent US11887256B2 granted.
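The sketch below illustrates the general deferred-neural-rendering idea summarised in the patent description above: neural features are sampled from a learned texture atlas using per-pixel UV "position guides" and decoded into an image by a CNN generator. It is a generic, hypothetical illustration (a single L1-supervised training step, no image discriminator), not the patented implementation:

```python
# Generic deferred neural rendering sketch in PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralTexture(nn.Module):
    def __init__(self, channels=16, size=512):
        super().__init__()
        # Learnable feature atlas (1 x C x H x W).
        self.atlas = nn.Parameter(torch.randn(1, channels, size, size) * 0.01)

    def forward(self, uv):  # uv: (B, H, W, 2) in [-1, 1]
        atlas = self.atlas.expand(uv.shape[0], -1, -1, -1)
        # Deferred sampling: look up neural features at the target view's UVs.
        return F.grid_sample(atlas, uv, align_corners=True)

class Generator(nn.Module):
    def __init__(self, in_channels=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, features):
        return self.net(features)

texture, generator = NeuralTexture(), Generator()
optim = torch.optim.Adam(list(texture.parameters()) + list(generator.parameters()), lr=1e-3)

# One training step on dummy data; a full system would add an adversarial loss
# from an image discriminator, as in the description above.
uv = torch.rand(1, 256, 256, 2) * 2 - 1     # per-pixel UV position guide for one view
target = torch.rand(1, 3, 256, 256)         # corresponding input photograph
pred = generator(texture(uv))
loss = F.l1_loss(pred, target)
optim.zero_grad(); loss.backward(); optim.step()
```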
 
Title 360MonoDepth: High-Resolution 360° Monocular Depth Estimation 
Description Code release for 360monodepth. With our framework, we achieve monocular depth estimation for high-resolution 360° images by aligning and blending perspective depth maps (see the sketch after this entry). 
Type Of Technology New/Improved Technique/Technology 
Year Produced 2022 
Open Source License? Yes  
Impact None yet. 
URL https://github.com/manurare/360monodepth
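As a sketch of the core operation behind this approach, the hypothetical helper below re-projects an equirectangular panorama into a perspective tangent view that could be fed to an off-the-shelf monocular depth network; aligning and blending the resulting per-view depth maps back into a 360° depth map is omitted:

```python
# Nearest-neighbour resampling of a perspective view from an equirectangular panorama.
import numpy as np

def tangent_view(equi, lon0, lat0, fov=np.pi / 2, size=256):
    """Perspective view of the given field of view centred at (lon0, lat0) in radians."""
    H, W = equi.shape[:2]
    f = 0.5 * size / np.tan(0.5 * fov)                       # focal length in pixels
    u, v = np.meshgrid(np.arange(size) - (size - 1) / 2,
                       np.arange(size) - (size - 1) / 2)
    d = np.stack([u / f, -v / f, np.ones_like(u)], axis=-1)  # camera rays (y up, z forward)
    d /= np.linalg.norm(d, axis=-1, keepdims=True)
    ca, sa = np.cos(-lat0), np.sin(-lat0)                    # pitch towards lat0
    cb, sb = np.cos(lon0), np.sin(lon0)                      # yaw towards lon0
    Rx = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    Ry = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    d = d @ (Ry @ Rx).T                                      # rays in world frame
    lon = np.arctan2(d[..., 0], d[..., 2])
    lat = np.arcsin(np.clip(d[..., 1], -1.0, 1.0))
    col = ((lon / (2 * np.pi) + 0.5) * W).astype(int) % W    # equirectangular lookup
    row = np.clip(((0.5 - lat / np.pi) * H).astype(int), 0, H - 1)
    return equi[row, col]

# Example: carve a panorama into 8 tangent views for a perspective depth network.
pano = np.zeros((1024, 2048, 3), dtype=np.uint8)             # placeholder panorama
views = [tangent_view(pano, lon, 0.0) for lon in np.linspace(-np.pi, np.pi, 8, endpoint=False)]
```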
 
Title 360° Optical Flow using Tangent Images 
Description Implementation of the BMVC 2021 paper "360° Optical Flow using Tangent Images". 
Type Of Technology New/Improved Technique/Technology 
Year Produced 2021 
Open Source License? Yes  
Impact None yet. 
URL https://github.com/yuanmingze/360OpticalFlow-TangentImages
 
Title OmniPhotos: Casual 360° VR Photography 
Description We publicly released the implementation of our publication "OmniPhotos: Casual 360° VR Photography" (Bertel et al., ACM Transactions on Graphics) on GitHub. This includes the preprocessing pipeline for creating new OmniPhotos, and a standalone application for viewing OmniPhotos on a desktop or in a virtual reality headset. We also make a demo available for people to try out. 
Type Of Technology New/Improved Technique/Technology 
Year Produced 2020 
Open Source License? Yes  
Impact None yet. 
URL https://github.com/cr333/OmniPhotos
 
Description BRLSI Talk on Towards more Immersive Panoramas 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact I gave a talk at the Bath Royal Literary and Scientific Institution (BRLSI) on 11 September 2019 on the topic "Towards more Immersive Panoramas". The talk was a ticketed public event and presented some of the research carried out in my team to a lay audience.
Year(s) Of Engagement Activity 2019
URL https://www.brlsi.org/events-proceedings/events/psc-visual-arts-towards-more-immersive-panoramas
 
Description Create a realistic VR experience using a normal 360-degree camera 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Media (as a channel to the public)
Results and Impact Press release: "Create a realistic VR experience using a normal 360-degree camera on 14 December 2020

TechXplore: https://techxplore.com/news/2020-12-realistic-vr-degree-camera.html
ScienceDaily: https://www.sciencedaily.com/releases/2020/12/201214123510.htm
Year(s) Of Engagement Activity 2020
URL https://www.bath.ac.uk/announcements/create-a-realistic-vr-experience-using-a-normal-360-degree-came...
 
Description Dagstuhl workshop on Real VR (July 2019) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact The Dagstuhl seminar brought together researchers and practitioners from academia and industry to discuss the state of the art, current challenges, and promising future research directions in Real VR. Real VR, as defined by the seminar participants, pursues two overarching goals: facilitating the import of real-world scenes into head-mounted displays (HMDs), and attaining perceptual realism in HMDs. The vision of Real VR is to enable experiencing movies, concerts and even live sports events in HMDs with the sense of immersion of really "being there", which is unattainable with today's technologies.

The workshop has led to a book that is currently in press:
Real VR - Immersive Digital Reality: How to Import the Real World into Head-Mounted Immersive Displays
Editors: Marcus Magnor and Alexander Sorkine-Hornung
https://www.springer.com/gp/book/9783030418151
Year(s) Of Engagement Activity 2019
URL https://www.dagstuhl.de/en/program/calendar/semhp/?semnr=19272
 
Description DynaVis 2021 workshop at CVPR 2021 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact This half-day workshop at the leading computer vision conference focused on dynamic scene reconstruction, which is motivated by potential applications in film and broadcast production together with the ultimate goal of automatic understanding of real-world scenes from distributed camera networks.

More than 2,000 people have watched the workshop live stream or its recording on YouTube at https://youtu.be/SVSNU6vKwwY.
Year(s) Of Engagement Activity 2021
URL https://dynavis.github.io/2021
 
Description Invited Speaker at the 5th Summer School on AI (August 2021) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Undergraduate students
Results and Impact I gave a talk titled Towards Reconstructing and Editing the Visual World at the 5th Summer School on AI in August 2021. There were about 200 attendees, mostly from Indian universities.
Year(s) Of Engagement Activity 2021
URL https://cvit.iiit.ac.in/summerschool2021/
 
Description Invited talk at IVRPA 360° Virtual Reality Panoramic Photography Conference 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The annual conference of the International Virtual Reality Photography Association (IVRPA) features four days of talks, workshops, panels and exhibitors, where attendees learn from and meet the key players at the forefront of the 360° VR industry, including photographers, software developers, camera hardware manufacturers, and media producers and companies. I was invited to present our latest research to this audience of professional practitioners and give them a glimpse of some of the upcoming 360° photography technologies.
Year(s) Of Engagement Activity 2019
URL https://ivrpa.org/news/megaparallax-towards-6-dof-360-panoramas/
 
Description Keynote Speaker at the OmniCV 2021 workshop at CVPR 2021 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Christian Richardt gave a keynote on the technology developed within this fellowship; the recording has been viewed more than 1,200 times.

Recording of the live stream: https://youtu.be/xa7Fl2mD4CA?t=19190
Year(s) Of Engagement Activity 2021
URL https://sites.google.com/view/omnicv2021
 
Description Organised a course at SIGGRAPH 2019 on Capture for VR 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Dr Christian Richardt organised a course at the leading computer graphics conference that provided a comprehensive overview of the latest progress in bringing photographs and video into VR. In this half-day course, we took the audience on a journey from VR photography to VR video that began more than a century ago but has accelerated tremendously in the last five years. We discussed both commercial state-of-the-art systems by Facebook, Google and Microsoft, and the latest research techniques and prototypes.
Year(s) Of Engagement Activity 2019
URL https://richardt.name/publications/capture4vr/
 
Description Pint of Science talk: VR Photography from the Victorians to today 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Christian Richardt was invited to give a talk at "Pint of Science". This is a locally run festival that invites researchers to give talks in a very informal atmosphere, direct to the public in a pub. The casual atmosphere promotes discussion and the opportunity for inclusion of members of the public who may not normally engage with complicated scientific research.
Year(s) Of Engagement Activity 2019
 
Description Press release: "AI could make dodgy lip sync dubbing a thing of the past" 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Media (as a channel to the public)
Results and Impact Press release: "AI could make dodgy lip sync dubbing a thing of the past" on 17 August 2018

"You thought fake news was bad? Deep fakes are where truth goes to die", The Guardian https://www.theguardian.com/technology/2018/nov/12/deep-fakes-fake-news-truth
"Can you tell a fake video from a real one?", ABC News (Australia) https://www.abc.net.au/news/2018-09-27/fake-news-part-one/10308638
"The New AI Tech Turning Heads in Video Manipulation", SingularityHub https://singularityhub.com/2018/09/03/the-new-ai-tech-turning-heads-in-video-manipulation-2/
"Deepfake Videos Are Ge?ng Impossibly Good", Gizmodo https://gizmodo.com/deepfake-videos-are-getting-impossibly-good-1826759848
"This new face-swapping AI is scarily realistic", Fast Company https://www.fastcompany.com/90175648/this-new-face-swapping-deep-fakes-ai-is-scarily-realistic
"AI can transfer human facial movements from one video to another" Engadget, https://www.engadget.com/2018/06/05/ai-transfer-facial-movements-from-one-video-to-another/
"Forget DeepFakes, Deep Video Portraits are way better (and worse)", TechCrunch https://techcrunch.com/2018/06/04/forget-deepfakes-deep-video-portraits-are-way-better-and-worse/
Year(s) Of Engagement Activity 2018
URL https://www.bath.ac.uk/announcements/ai-could-make-dodgy-lip-sync-dubbing-a-thing-of-the-past/
 
Description Swindon Science Festival of Tomorrow 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Public/other audiences
Results and Impact Reuben Lindroos and Christian Richardt presented our OmniPhotos work in the form of live demos via video call to a remote audience. The Festival of Tomorrow aims to support and encourage people of all ages to develop their love of discovery and creativity through the fields of Science, Technology, Engineering, Arts, and Mathematics (STEAM). UKRI was the sole principal partner supporting this virtual festival.
Year(s) Of Engagement Activity 2021
URL https://www.scienceswindon.com/festival-of-tomorrow