From Video to Virtual: Object indexing in 3D scenes from videos
Lead Research Organisation:
University of Oxford
Abstract
Brief description of the context of the research including potential impact:
It is difficult or sometimes impossible to obtain sufficiently large amounts of labelled or even
raw data to train 3D perception algorithms. Videos, on the other hand, have emerged as a
pragmatic solution to this challenge as they serve as a rich source of information, offering
the potential to effortlessly capture scenes and objects in their full 3D complexity. In this context,
the focus of this research is the pivotal task of automated extraction and indexing of objects
within 3D scenes, leveraging the convenience and ubiquity of video data. This project aims to
build algorithms that can reason about high-level concepts such as "objectness" to discover
objects in a scene, while also being able to reconstruct these objects and predict their precise
layout in the 3D model of the scene. If successful, our system could be used to generate virtual
models of 3D scenes from monocular videos, enabling realistic augmented
reality experiences. Furthermore, this system also has direct real-world applications -- e.g., to
keep track of inventory in retail systems or to help with stock management in warehouses.
Aims and Objectives
1. The primary objective is to develop a system that can automatically identify objects in a
3D scene and uniquely index them.
2. In addition to uniquely indexing the objects, our system should also be able to accurately
reconstruct the objects and predict their precise layout in the 3D model of the scene.
3. We aim to train this system in an unsupervised manner, without explicitly using 3D
data or annotations, as they are expensive to obtain. We will instead extract information
from cheaper sources such as videos capturing the scene or images of the scene from
multiple viewpoints.
4. The project will also explore using text as another modality to provide weak supervision
about the semantic properties of the scene or objects. Another possibility is to work on
dynamic scenes where motion can be a useful cue for identifying objects in the scene.
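As a purely illustrative sketch of the motion cue mentioned in objective 4 (the thresholding scheme and synthetic flow field are assumptions, not the project's actual method), pixels whose optical flow deviates from the dominant background/camera motion can be grouped into a candidate object mask:

```python
import numpy as np

def motion_object_mask(flow, thresh=1.0):
    """Hypothetical motion-based objectness cue.

    flow: (H, W, 2) per-pixel optical flow.
    Returns a boolean (H, W) mask of pixels whose motion deviates
    from the dominant (background) motion by more than `thresh`.
    """
    background = np.median(flow.reshape(-1, 2), axis=0)    # dominant motion
    residual = np.linalg.norm(flow - background, axis=-1)  # per-pixel deviation
    return residual > thresh

# Synthetic example: static background, one 3x3 region moving right by 3 px.
flow = np.zeros((8, 8, 2))
flow[2:5, 2:5, 0] = 3.0
mask = motion_object_mask(flow)
print(mask.sum())  # 9 pixels flagged as a moving-object candidate
```

In a real video the flow would come from an optical-flow estimator and the grouping would need to be far more robust, but the principle is the same: coherent independent motion is evidence of a distinct object.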
Novelty of the research methodology:
1. Identifying and cataloguing objects in a scene is challenging because we do
not possess a predefined list of objects; hence, we will develop algorithms to train a
system that can reason about "objectness" to discover objects in the scene.
2. Another difficulty comes from the cataloguing aspect, as we also need to verify whether a newly
found object corresponds to an already catalogued object in the scene(s). Hence, our
research will also focus on 3D shape-aware instance retrieval algorithms that can be
efficiently used for cataloguing.
3. Since 3D annotations are difficult to obtain, we will explore ways to incorporate other
sources of supervision such as text or motion cues from videos to train our models.
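The cataloguing step described in point 2 can be illustrated with a minimal sketch (the embedding space, similarity threshold, and linear scan are assumptions for illustration, not the project's actual retrieval algorithm): a newly discovered object is matched against the catalogue by nearest-neighbour search over shape embeddings, and either assigned an existing index or a new one.

```python
import numpy as np

class Catalogue:
    """Hypothetical object catalogue using cosine-similarity retrieval."""

    def __init__(self, match_thresh=0.9):
        self.embeddings = []           # one L2-normalised vector per indexed object
        self.match_thresh = match_thresh

    def index(self, embedding):
        """Return the object's index, adding a new entry if it is unseen."""
        e = np.asarray(embedding, dtype=float)
        e = e / np.linalg.norm(e)
        if self.embeddings:
            sims = np.stack(self.embeddings) @ e   # cosine similarities
            best = int(np.argmax(sims))
            if sims[best] >= self.match_thresh:
                return best                        # matches a catalogued object
        self.embeddings.append(e)
        return len(self.embeddings) - 1            # new unique index

cat = Catalogue()
a = cat.index([1.0, 0.0])    # first object -> index 0
b = cat.index([0.99, 0.05])  # near-duplicate embedding -> matched to index 0
c = cat.index([0.0, 1.0])    # distinct shape -> new index 1
```

A practical system would replace the linear scan with an approximate nearest-neighbour index and learn the embeddings to be invariant to viewpoint, which is where the 3D shape-awareness mentioned above comes in.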
Alignment to EPSRC's strategies and research areas (which EPSRC research area the
project relates to)
Image and vision computing (EPSRC link)
Any companies or collaborators involved: The advisors for this DPhil project will be Andrea
Vedaldi, Andrew Zisserman, João Henriques and Iro Laina from the Visual Geometry Group at
the University of Oxford.
Planned Impact
AIMS's impact will be felt across domains of acute need within the UK. We expect AIMS to benefit: UK economic performance, through start-up creation; existing UK firms, both through research and addressing skills needs; UK health, by contributing to cancer research, and quality of life, through the delivery of autonomous vehicles; UK public understanding of and policy related to the transformational societal change engendered by autonomous systems.
Autonomous systems are acknowledged by essentially all stakeholders as important to the future UK economy. PwC claim that there is a £232 billion opportunity offered by AI to the UK economy by 2030 (10% of GDP). AIMS has an excellent track record of leadership in spinout creation, and will continue to foster the commercial projects of its students, through the provision of training in IP, licensing and entrepreneurship. With the help of Oxford Science Innovation (investment fund) and Oxford University Innovation (technology transfer office), student projects will be evaluated for commercial potential.
AIMS will also concretely contribute to UK economic competitiveness by meeting the UK's needs for experts in autonomous systems. To meet this need, AIMS will train cohorts with advanced skills that span the breadth of AI, machine learning, robotics, verification and sensor systems. The relevance of the training to the needs of industry will be ensured by the industrial partnerships at the heart of AIMS. These partnerships will also ensure that AIMS will produce research that directly targets UK industrial needs. Our partners span a wide range of UK sectors, including energy, transport, infrastructure, factory automation, finance, health, space and other extreme environments.
The autonomous systems that AIMS will enable also offer the prospect of epochal change in the UK's quality of life and health. As put by former Digital Secretary Matt Hancock, "whether it's improving travel, making banking easier or helping people live longer, AI is already revolutionising our economy and our society." AIMS will help to realise this potential through its delivery of trained experts and targeted research. In particular, two of the four Grand Challenge missions in the UK Industrial Strategy highlight the positive societal impact underpinned by autonomous systems. The "Artificial Intelligence and data" challenge has as its mission to "Use data, Artificial Intelligence and innovation to transform the prevention, early diagnosis and treatment of chronic diseases by 2030". To this mission, AIMS will contribute the outputs of its research pillar on cancer research. The "Future of mobility" challenge highlights the importance that autonomous vehicles will have in making transport "safer, cleaner and better connected." To this challenge, AIMS offers the world-leading research of its robotic systems research pillar.
AIMS will further promote the positive realisation of autonomous technologies through direct influence on policy. The world-leading academics amongst AIMS's supervisory pool are well-connected to policy formation, e.g. Prof Osborne serving as a Commissioner on the Independent Commission on the Future of Work. Further, Dr Dan Mawson, Head of the Economy Unit, Economy and Strategic Analysis Team at BEIS, will serve as an advisor to AIMS, ensuring bidirectional influence between policy objectives and AIMS research and training.
Broad understanding of autonomous systems is crucial in making a society robust to the transformations they will engender. AIMS will foster such understanding through its provision of opportunities for AIMS students to directly engage with the public. Given the broad societal importance of getting autonomous systems right, AIMS will deliver core training on the ethical, governance, economic and societal implications of autonomous systems.
People

| Name | ORCID iD |
|---|---|
| Yash Bhalgat (Student) | |
Studentship Projects
| Project Reference | Relationship | Related To | Start | End | Student Name |
|---|---|---|---|---|---|
| EP/S024050/1 | | | 30/09/2019 | 30/03/2028 | |
| 2577387 | Studentship | EP/S024050/1 | 30/09/2021 | 29/09/2025 | Yash Bhalgat |