Visual Question Answering focused on properties (materials, shapes) and relations of small objects

Lead Research Organisation: Imperial College London
Department Name: Electrical and Electronic Engineering

Abstract

The research focuses on learning various properties of objects (such as surface type, weight, size and shape) from two modalities of information: visual and textual. The main application of modelling a broad range of properties at different semantic levels is robotic manipulation. Robotic systems require detailed information about an object's shape, volume, material, mass, friction, etc. to perceive and grasp it correctly. The main objective of this research is closely related to the task of Visual Question Answering (VQA), which is to provide the most accurate answer to a natural language question about a given image. Another research objective is to provide a consistent and structured representation of the object properties inferred from the different modalities, and to predict the properties of unobserved instances of objects from the same category as known ones. The research will also concentrate on providing a new experimental framework for VQA focused on properties and relations, establishing a way to compare different systems used for the recognition of physical properties.

The approach explored in this research will be based primarily on deep neural networks, which are commonly used in classification and recognition tasks. For the visual data, image segmentation with neural networks will be explored, along with novel methods for inferring object properties from different views. Text data will be processed with recurrent neural networks, which are the state of the art in natural language processing. Both methods will extract features of objects (visual and textual), and an approach for fusing these features into a consistent scene representation will be proposed. To extract information from such a representation, the idea of symbolic program execution will be explored. Symbolic programs are small modules that filter the scene representation or compare the properties of two objects. Deep neural networks will be used to generate such a program, which is then executed to extract the queried property of the object. The research will develop methods to infer relevant properties of objects from existing natural language descriptions accompanied by pictures and illustrations. The developed approach will be evaluated in accordance with current VQA evaluation protocols so that it can be compared with existing methods.
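The symbolic program execution idea described above can be illustrated with a minimal Python sketch. This is not the project's implementation: the scene objects, the attribute names (shape, material, size_cm), the module names (filter, unique, query) and the hand-written program are all hypothetical, chosen only for illustration. In the proposed approach the program would be generated by a neural network from the question, and the scene representation would be inferred from images and text rather than written by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneObject:
    """One entry of a hypothetical structured scene representation."""
    name: str
    shape: str
    material: str
    size_cm: float


def filter_scene(scene, attribute, value):
    """Keep only the objects whose attribute equals the given value."""
    return [obj for obj in scene if getattr(obj, attribute) == value]


def same_property(a, b, attribute):
    """Compare one property of two objects."""
    return getattr(a, attribute) == getattr(b, attribute)


def execute(program, scene):
    """Run a list of (module, *args) steps, passing the result along."""
    state = scene
    for op, *args in program:
        if op == "filter":
            state = filter_scene(state, *args)
        elif op == "unique":
            # Assumes the preceding filters left exactly one object.
            if len(state) != 1:
                raise ValueError(f"expected one object, got {len(state)}")
            state = state[0]
        elif op == "query":
            state = getattr(state, *args)
        else:
            raise ValueError(f"unknown module: {op}")
    return state


# Hand-written scene; in practice this would come from the perception stage.
scene = [
    SceneObject("mug", shape="cylinder", material="ceramic", size_cm=9.0),
    SceneObject("block", shape="cube", material="wood", size_cm=4.0),
    SceneObject("can", shape="cylinder", material="metal", size_cm=12.0),
]

# "What is the cube made of?" expressed as a hypothetical symbolic program.
program = [("filter", "shape", "cube"), ("unique",), ("query", "material")]
answer = execute(program, scene)
print(answer)
```

Keeping each module small and separate from the scene representation means an answer can be traced step by step, which is one reason such programs are attractive for querying object properties.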

The research aims to provide a new, state-of-the-art approach for disentangling the visual and textual information obtained by neural networks. In addition, a new task and benchmark will be formulated with an explicit focus on analysing the capability of a method to infer the properties of items. Finally, the approach will be used to increase the autonomous capabilities of robots by providing them with rich and precise information about their surroundings.

EPSRC research areas:
- Artificial intelligence technologies
- Image and vision computing
- Natural language processing

Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
EP/R513052/1      |              |              | 01/10/2018 | 30/09/2023 |
2127907           | Studentship  | EP/R513052/1 | 01/10/2018 | 31/07/2022 | Michal Nazarczuk