Deep Learning-based Computer Vision Methods
Lead Research Organisation:
University of Oxford
Department Name: Engineering Science
Abstract
Visual understanding of 3D from images and video
This project falls within the EPSRC Information and Communication Technologies research area.
The objective of this research is to match, describe and categorise image and video content. The aim is to achieve human-like performance and beyond, for example in recognising configurations of parts and spatial layout, counting and delineating objects, or recognising human actions and interactions in videos, significantly surpassing the current limitations of computer vision systems and enabling new and far-reaching applications. This research will build on the success of discriminatively trained recognition systems for recognising visual content in images. The research will involve developing combined generative and discriminative models that can infer the attributes and shape of novel visual material. The new algorithms will learn automatically, building on recent breakthroughs in deep machine learning. They will be capable of weakly supervised learning, for example from images and videos downloaded from the internet, and will require very little human supervision.
This work has applications in image categorisation and search in large image and video datasets. Our vision is that anything visual should be searchable, in the manner of a Google search of the web: by specifying a query and having results returned immediately, irrespective of the size of the data. Such capabilities will have widespread application both for general image/video search (consider how Google's web search has opened up new areas) and for designing customised search solutions. Another application is the automatic counting of large numbers of objects in images. This is used in a variety of domains, from counting cells in medical images to counting cars in traffic or satellite data.
The ubiquity of digital imaging means that every UK citizen may potentially benefit from the Programme research in different ways. One example is an enhanced iPlayer that can search for where particular characters appear in a programme, or intelligently fast forward to the next 'hugging' sequence. A second is wider deployment of lower-cost imaging solutions in healthcare delivery. A third, also motivated by healthcare, is the use of new machine learning methods for validating targets for drug discovery based on microscopy images.
Organisations
People | ORCID iD |
---|---|
Andrew Zisserman (Primary Supervisor) | |
Erika Lu (Student) | |
Publications
Studentship Projects
Project Reference | Relationship | Related To | Start | End | Student Name |
---|---|---|---|---|---|
EP/N509711/1 | | | 30/09/2016 | 29/09/2021 | |
1940016 | Studentship | EP/N509711/1 | 30/09/2017 | 30/03/2021 | Erika Lu |