Computer vision

Lead Research Organisation: Imperial College London
Department Name: Electrical and Electronic Engineering

Abstract

The research area alignments are: Image and vision computing, and Artificial intelligence technologies.

An image can be accurately described by the objects it shows and the spatial relations between them. For example, an image of a bedroom may be described by the sentence: "there is a bed next to a nightstand table with a lamp on the top of it". The aim of this Ph.D. research is the visual understanding of an image by detecting the objects it contains (i.e. recognising their category and location within the image) and then generating a textual description of the scene.
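The detect-then-describe idea above can be illustrated with a toy example. The sketch below assumes a detector has already produced category labels and bounding boxes; the `Detection` class, the `on_top_of` and `next_to` rules and the 5% tolerance are hypothetical hand-written heuristics for illustration only, not the method developed in this research, which aims to learn such descriptions from data.

```python
from dataclasses import dataclass
from itertools import combinations, permutations


@dataclass
class Detection:
    """One detected object (hypothetical format): category plus box in pixels.

    Image coordinates: x grows to the right, y grows downward.
    """
    label: str
    x1: float
    y1: float
    x2: float
    y2: float


def _horizontal_gap(a: Detection, b: Detection) -> float:
    # Positive: empty space between the boxes; negative: they overlap horizontally.
    return max(b.x1 - a.x2, a.x1 - b.x2)


def _vertical_overlap(a: Detection, b: Detection) -> float:
    # Positive when the two boxes share some image rows.
    return min(a.y2, b.y2) - max(a.y1, b.y1)


def on_top_of(a: Detection, b: Detection, tol: float = 0.05) -> bool:
    # Heuristic: boxes overlap horizontally and a's bottom edge lies close to b's top edge.
    return _horizontal_gap(a, b) < 0 and abs(a.y2 - b.y1) <= tol * (b.y2 - b.y1)


def next_to(a: Detection, b: Detection, tol: float = 0.05) -> bool:
    # Heuristic: side by side (shared rows, no horizontal overlap) with only a small gap.
    gap = _horizontal_gap(a, b)
    widest = max(a.x2 - a.x1, b.x2 - b.x1)
    return _vertical_overlap(a, b) > 0 and 0 <= gap <= tol * widest


def describe(dets: list[Detection]) -> list[str]:
    # "on top of" is directional, so test ordered pairs; "next to" is symmetric.
    phrases = [f"a {a.label} on top of a {b.label}"
               for a, b in permutations(dets, 2) if on_top_of(a, b)]
    phrases += [f"a {a.label} next to a {b.label}"
                for a, b in combinations(dets, 2) if next_to(a, b)]
    return phrases


scene = [
    Detection("bed", 0, 200, 300, 400),
    Detection("nightstand", 310, 250, 400, 400),
    Detection("lamp", 330, 180, 380, 250),
]
print(describe(scene))  # ['a lamp on top of a nightstand', 'a bed next to a nightstand']
```

On this made-up bedroom scene the two phrases cover the relations in the example sentence; a caption generator would then combine such phrases into fluent text.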
One of the main challenges in object recognition is collecting annotated training data. Rather than collecting and annotating images manually, we propose to develop methods for training object models on existing datasets collected in different domains or generated synthetically, an approach known as domain adaptation. The domain adaptation problem is two-fold. Firstly, visual domain shift needs to be addressed: objects in images from different datasets have different visual appearance (e.g. objects in real images vs. drawings). The appearance may differ even between images of real objects of the same category: a shirt in a stock photo looks different from the same shirt in a low-quality picture taken with a mobile phone under different illumination conditions.
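Visual domain shift can be made concrete by comparing feature statistics across domains. As an illustrative assumption (this metric is not part of the proposal), the sketch below computes the squared distance between the mean feature vectors of a source and a target dataset, which equals the squared maximum mean discrepancy (MMD) with a linear kernel. The feature values are made up; in practice they would come from a detector's backbone network.

```python
from statistics import fmean


def mean_vector(features: list[list[float]]) -> list[float]:
    # Average each feature dimension over all samples of one domain.
    return [fmean(col) for col in zip(*features)]


def linear_mmd(source: list[list[float]], target: list[list[float]]) -> float:
    """Squared MMD with a linear kernel: ||mean(source) - mean(target)||^2."""
    ms, mt = mean_vector(source), mean_vector(target)
    return sum((a - b) ** 2 for a, b in zip(ms, mt))


# Hypothetical 2-D features for two "photo" and two "drawing" images.
photos = [[1.0, 0.0], [3.0, 2.0]]
drawings = [[0.0, 4.0], [2.0, 6.0]]
print(linear_mmd(photos, drawings))  # 17.0
```

A value near zero means the two domains have similar average features; a large value signals a shift that a detector trained only on the source domain may not handle well.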
Popular deep learning object detection models, such as Fast/Faster R-CNN, YOLO and SSD, use annotated training data (i.e. a bounding box around each object and its category label) to learn the appearance of objects. However, when these models are applied to images in which the objects have a different appearance, their performance degrades. Domain adaptation research aims to reduce this degradation, but it has mainly focused on image classification problems (one category per image), and there has been little work on both recognising and localising objects in different visual domains. Developing an approach to object detection with domain adaptation is the first main objective of this thesis. To do so, we aim to align the instance-level features of the two domains, as in [1].
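To our understanding, [1] aligns instance-level features adversarially: a domain classifier predicts, for each region proposal, whether it comes from the source or the target domain, and it is attached to the detector through a gradient reversal layer, so that the detector's features are trained to confuse it. The sketch below shows only the loss and the reversal step in plain Python. The function and class names are hypothetical, the labelling convention (0 = source, 1 = target) is an assumption, and a real implementation would run inside a deep learning framework's automatic differentiation.

```python
import math
from typing import Sequence


def instance_domain_loss(probs: Sequence[Sequence[float]],
                         domains: Sequence[int],
                         eps: float = 1e-12) -> float:
    """Binary cross-entropy summed over images i and their region proposals j.

    probs[i][j]: classifier's probability that proposal j of image i is target-domain.
    domains[i]:  domain label of image i (assumed 0 = source, 1 = target).
    """
    loss = 0.0
    for image_probs, d in zip(probs, domains):
        for p in image_probs:
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            loss -= d * math.log(p) + (1 - d) * math.log(1.0 - p)
    return loss


class GradientReversal:
    """Identity on the forward pass; scales the gradient by -lam on the backward pass."""

    def __init__(self, lam: float = 1.0):
        self.lam = lam

    def forward(self, x: float) -> float:
        return x

    def backward(self, grad: float) -> float:
        # The classifier descends the domain loss, while the features receive the
        # negated gradient and therefore ascend it.
        return -self.lam * grad


# One source image with two proposals, one target image with one proposal.
loss = instance_domain_loss([[0.5, 0.5], [0.5]], [0, 1])
print(round(loss, 4))  # 2.0794, i.e. 3 * ln 2
```

When the classifier outputs 0.5 for every proposal it cannot tell the domains apart, and each proposal contributes ln 2 to the loss; the reversed gradient drives the detector's features to increase this loss, i.e. towards domain-invariant features.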
The second objective of this work is to propose methods that can describe images with natural text enriched by words that were not present in the annotated training data. Current models require pairs of images and text to learn suitable descriptions; thus, their vocabulary is limited to what is present in the given training pairs. A rich vocabulary is needed to generate more specific captions. For example, in the fashion or clothing retail market, one may want to use fashion-related attributes to describe images of a person in the street. The most popular methods extract global features from the image and feed them to a recurrent neural network (RNN) that generates a sentence description. Recently, more attention has been directed towards methods that first detect all the objects in the image, as well as their spatial relations, and then input this information into an RNN. Decoupling object detection from caption generation allows different detectors, with different outputs, to be used, and these detectors can be trained with data that require less intensive manual annotation than the image-sentence pairs typically used to train caption generators. Caption generation can then rely on the input from object detectors and on other text datasets with rich object descriptions that do not include image examples. We propose to use a method similar to [2] to learn pattern-like sentence structures and to use detectors trained in other domains to output more fine-grained results.
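The idea of separating sentence structure from the words supplied by detectors can be illustrated with simple slot filling. In [2] the template and its grounding in image regions are produced by a neural network; here, as a simplifying assumption, the template is fixed by hand, and the `<regionK>` slot syntax, the `fill_template` function and the fashion labels are all hypothetical.

```python
import re
from typing import List

# A slot such as <region0> refers to the object the detector found in region 0.
SLOT = re.compile(r"<region(\d+)>")


def fill_template(template: str, region_labels: List[str]) -> str:
    """Replace each <regionK> slot with the label a detector assigned to region K."""
    return SLOT.sub(lambda m: region_labels[int(m.group(1))], template)


template = "a person wearing a <region0> and <region1>"
generic = fill_template(template, ["jacket", "shoes"])           # generic detector
fashion = fill_template(template, ["denim jacket", "ankle boots"])  # fashion detector
print(generic)  # a person wearing a jacket and shoes
print(fashion)  # a person wearing a denim jacket and ankle boots
```

Swapping a generic detector for one trained on a fashion dataset changes the wording of the caption without retraining the part that produces the sentence structure.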

References:

[1] Y. Chen, W. Li, C. Sakaridis, D. Dai, and L. Van Gool. Domain adaptive Faster R-CNN for object detection in the wild. In CVPR, 2018.
[2] J. Lu, J. Yang, D. Batra, and D. Parikh. Neural Baby Talk. In CVPR, 2018.

Publications

Lopez-Rodriguez, Adrian (2019). Domain Adaptation for Object Detection via Style Consistency. In arXiv e-prints.

Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
EP/N509486/1      |              |              | 01/10/2016 | 31/03/2022 |
1859718           | Studentship  | EP/N509486/1 | 01/11/2016 | 31/10/2020 | Adrian Lopez-Rodriguez