Understanding scenes and events through joint parsing, cognitive reasoning and lifelong learning
Lead Research Organisation:
University of Birmingham
Department Name: School of Computer Science
Abstract
Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.
People |
Ales Leonardis (Principal Investigator) |
Description | The University of Birmingham contributed to a better understanding of the role of physics within computer vision. It has identified the advantages and disadvantages of different representations for scenes and scene understanding, as well as different models for physical phenomena. It has developed methods for leveraging models of physics to enhance other computer vision tasks in which physics is not a primary factor. This research has resulted in interdisciplinary and international collaborations, in particular with MIT. Several strands of research within the project have yielded improvements in real-world robotic manipulation. In addition to the focus on physics, the University of Birmingham has developed a number of state-of-the-art techniques and evaluation mechanisms for tasks relevant to robotics, such as object pose estimation and tracking. These techniques and tools will have future impact as they are adopted by other researchers; work on tracking evaluation methods has already influenced, and continues to influence, the research community. The project's objective of collaboration between research groups and between disciplines was hindered by the onset of the COVID-19 pandemic. Collaborations continued over online channels, but the inability to visit research labs due to national and international travel restrictions reduced the effectiveness and outputs of those collaborations. |
Exploitation Route | The methods of leveraging physics for other computer vision tasks developed by the University of Birmingham have the potential to be incorporated into a broad range of computer vision tasks. We expect these to be adopted by the wider research community for individual tasks as appropriate. The researchers at the University of Birmingham intend to build upon their findings on the role of scene representations and physics, and the implications for other vision tasks, to develop interpretable representations with general utility across a range of vision tasks. |
Sectors | Digital/Communication/Information Technologies (including Software), Other
Description | Amazon Research Awards |
Amount | $80,000 (USD) |
Organisation | Amazon.com |
Sector | Private |
Country | United States |
Start |
Description | Babymind: Computational Models Of Sensorimotor Learning And Control |
Amount | £24,930 (GBP) |
Organisation | Ministry of Science, ICT and Future Planning
Sector | Public |
Country | Korea, Republic of |
Start | 01/2020 |
End | 12/2020 |
Description | CHIST-ERA (Object recognition and manipulation by robots: Data sharing and experiment reproducibility (ORMR)) |
Amount | €1,180,634 (EUR)
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 04/2019 |
End | 03/2022 |
Description | Paul and Yuanbi Ramsay Research Fund |
Amount | £2,719 (GBP) |
Organisation | University of Birmingham |
Sector | Academic/University |
Country | United Kingdom |
Start | 03/2017 |
End | 03/2017 |
Description | Tracking membrane receptor interactions via data-driven machine learning |
Amount | £119,134 (GBP) |
Organisation | Alan Turing Institute |
Sector | Academic/University |
Country | United Kingdom |
Start | 04/2020 |
End | 12/2021 |
Title | Multimodal visual and physical scene understanding data set |
Description | The data set consists of simulated scenes of complex collections of objects on tables. Data modalities include RGB images, depth images, 2D semantic maps, volumetric semantic maps, as well as object poses, physical interactions between objects, and object trajectories induced by physical perturbations to the scenes. |
Type Of Material | Database/Collection of data |
Year Produced | 2019 |
Provided To Others? | No |
Impact | Used mainly for project research; parts of the data set have also been used by students working on other projects. |
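The modalities listed above can be pictured as a per-scene record. The following is a minimal, illustrative sketch of one sample's layout; the field names, types, and pose encoding are assumptions for illustration, not the data set's actual schema:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# 3-D position plus unit quaternion (an assumed pose encoding)
Pose = Tuple[float, float, float, float, float, float, float]

@dataclass
class SceneSample:
    """One simulated table-top scene; field names are illustrative."""
    rgb: bytes                      # RGB image
    depth: bytes                    # depth image
    semantic_map_2d: bytes          # per-pixel object labels
    semantic_map_3d: bytes          # volumetric object labels
    object_poses: Dict[str, Pose] = field(default_factory=dict)
    interactions: List[Tuple[str, str]] = field(default_factory=list)  # contacting object pairs
    trajectories: Dict[str, List[Pose]] = field(default_factory=dict)  # poses after a perturbation

# Hypothetical usage: record a mug resting on the table
sample = SceneSample(rgb=b"", depth=b"", semantic_map_2d=b"", semantic_map_3d=b"")
sample.object_poses["mug"] = (0.1, 0.0, 0.75, 0.0, 0.0, 0.0, 1.0)
sample.interactions.append(("mug", "table"))
```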
Description | Collaboration with Josh Tenenbaum's group at MIT |
Organisation | Massachusetts Institute of Technology |
Country | United States |
Sector | Academic/University |
PI Contribution | Expertise on stability estimation and robust integration of stability estimation models into larger tasks, contributed via collaboration on a paper. |
Collaborator Contribution | Expertise on scene parsing and physical inference, contributed via leading the collaboration on the paper. |
Impact | NeurIPS 2018 paper. |
Start Year | 2018 |
Description | Collaboration with Mario Fritz's group |
Organisation | Helmholtz Association of German Research Centres |
Country | Germany |
Sector | Academic/University |
PI Contribution | Expert input and guidance, data sets, and implementation of algorithm models. The collaboration resulted in multiple papers. |
Collaborator Contribution | Expert input and guidance, data sets, and implementation of algorithm models. The collaboration resulted in multiple papers. |
Impact | Papers: W. Li, A. Leonardis, M. Fritz, Visual Stability Prediction for Robotic Manipulation, IEEE International Conference on Robotics and Automation; M. Wagner, H. Basevi, R. Shetty, W. Li, M. Malinowski, M. Fritz, A. Leonardis, Answering Visual What-If Questions: From Actions to Predicted Scene Descriptions, European Conference on Computer Vision, 521-537.
Start Year | 2016 |
Description | Collaboration with Mario Fritz's group |
Organisation | Max Planck Society |
Department | Max Planck Institute for Informatics |
Country | Germany |
Sector | Charity/Non Profit |
PI Contribution | Expert input and guidance, data sets, and implementation of algorithm models. The collaboration resulted in multiple papers. |
Collaborator Contribution | Expert input and guidance, data sets, and implementation of algorithm models. The collaboration resulted in multiple papers. |
Impact | Papers: W. Li, A. Leonardis, M. Fritz, Visual Stability Prediction for Robotic Manipulation, IEEE International Conference on Robotics and Automation; M. Wagner, H. Basevi, R. Shetty, W. Li, M. Malinowski, M. Fritz, A. Leonardis, Answering Visual What-If Questions: From Actions to Predicted Scene Descriptions, European Conference on Computer Vision, 521-537.
Start Year | 2016 |
Title | Software platform for physically realistic scene data generation |
Description | A software tool for generating physically realistic scenes and associated data. It physically simulates the scene creation process using the Bullet physics engine, and applies a bespoke renderer to generate multimodal colour, surface normal, reflectance, and semantic information. The tool also supports more sophisticated physical understanding by generating physical contact and force information. |
Type Of Technology | Software |
Year Produced | 2018 |
Impact | The tool has been used within the research group by PhD students within other projects, and is being adapted for use in the ongoing BURG robotic benchmarking project. |
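The simulation step this tool performs can be illustrated with a toy analogue. The actual tool uses the Bullet physics engine; the sketch below is only a stdlib stand-in, dropping point objects onto a table in one dimension with explicit-Euler integration and recording the contact information the paragraph describes. The function name and parameters are hypothetical:

```python
def settle_objects(drop_heights, dt=0.01, g=9.81, steps=2000):
    """Drop point objects onto a table at z = 0; report final heights and contacts.

    A toy analogue of physically simulated scene creation: the real tool
    delegates this to the Bullet engine and full rigid-body dynamics.
    """
    zs = list(drop_heights)            # current heights
    vs = [0.0] * len(zs)               # current vertical velocities
    contacts = [False] * len(zs)       # whether each object has touched the table
    for _ in range(steps):
        for i in range(len(zs)):
            vs[i] -= g * dt            # gravity
            zs[i] += vs[i] * dt        # explicit Euler position update
            if zs[i] <= 0.0:           # table contact: clamp and come to rest
                zs[i], vs[i] = 0.0, 0.0
                contacts[i] = True
    return zs, contacts

# Hypothetical usage: three objects released from different heights
final_z, in_contact = settle_objects([0.5, 1.0, 2.0])
```

In the real tool, the analogous record of which objects touch which (and with what force) is what supports the contact and force information mentioned above.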
Description | Article in Computer Vision News Magazine
Form Of Engagement Activity | A magazine, newsletter or online publication |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Media (as a channel to the public) |
Results and Impact | Article on project research in Computer Vision News magazine, in their ECCV special issue. Read by the computer vision community and more widely (e.g. by hobbyists).
Year(s) Of Engagement Activity | 2018 |
URL | http://www.rsipvision.com/ECCV2018-Monday/ |
Description | Talk at Amazon Research Awards Robotics Symposium 2019 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | Hector Basevi attended the Amazon Research Awards Robotics Symposium 2019 in Boston, USA, to present an extension of project research to constrained environments. The audience consisted of employees of Amazon and other invited research groups, and the talk was later uploaded to Twitch.tv, where it is publicly available.
Year(s) Of Engagement Activity | 2019 |
URL | https://www.twitch.tv/videos/511933761 |