Classifying Images Regardless of Depictive Style

Lead Research Organisation: University of Bath
Department Name: Computer Science


Computers today can recognise objects in photographs. This ability forms the basis of many familiar applications, such as Facebook tagging, Google Image Search, Google Goggles, and automated passport checking at UK borders.

Yet a significant restriction remains: computers can only recognise objects in photographs. Their ability to recognise objects in
drawings and paintings - in artwork of any kind - is strictly limited. If this limitation can be overcome, many more applications will
become possible.

One is a new way to search the web for images, in which a drawing (say) is dragged from the desktop into a search bar
and both paintings and photographs are returned to the user (at the moment, a user gets back only the same sort of image as was dragged into the
search bar).

Another is the automated production of catalogues for taxonomy - which is important to scientists faced with tens of thousands of
microscopic creatures; species catalogues are hand-drawn at present, so automation would be a significant advance for them.

The output of the programme would also allow ordinary photographs to be converted to icons. This is not as dry as it sounds: it could help the visually impaired to gain access to photographic content. If photographs and drawings can be linked in the way this project has in mind, then
objects in photographs could be turned into icons rendered by a set of raised pins. There would then be a symbol for a car, say, not unlike one that might be drawn by a child - and in fact this is very close to the icons blind artists draw. This would allow the visually impaired to read photographs in newspapers or in textbooks, and to share the holiday snaps of family and friends.

This proposal is about building the basic technology that underpins these applications, and quite possibly others too. Key to it is lifting the barrier that computers of today face - allowing them to recognise objects no matter how they are depicted.

Planned Impact

The impact of this proposal will show in the near, medium, and longer terms.
It will show in the academic sector, in the industrial sector, and in wider society; the work is also media friendly.
We plan to pursue impact through all of these routes.

The academic beneficiaries are detailed in another section of this form and so are not repeated here,
except to say that medium- and longer-term impact requires further input from the academic communities.

Impact in the industrial sector has an obvious route via sophisticated search engines, as built by companies such as Google and Microsoft.
Internet search will benefit because the total of all images includes not just photographs but also artwork of all kinds.
The ability to pose queries as sketches and retrieve photographs is a current research topic; this proposal would not contribute directly,
but we do plan to build a prototype as our application (objective 4). We note that Google offers PhD studentships, a route we intend to pursue as this proposal progresses, with a view to constructing a more substantive web-search application; we regard this as a medium-term aim.

In terms of wider society, we find the possibility of allowing the visually impaired access to photographic content particularly appealing. In fact, because our philosophy is not to discriminate between depictive styles, our approach should allow the visually impaired access to a much wider variety of images than at present. Access is possible if we can summarise visual content so as to remove unwanted clutter; recognising objects and synthesising icons that can be felt allows people to read newspapers, share photos, and generally raises their standard of living.
This is a medium- to long-term aim.

The proposal is media friendly: the ability to drag a photograph of the Queen onto a search bar and have all of her portraits returned is something that would be of general interest. It is to the benefit of the research community as a whole and to research councils in particular to receive favourable publicity.
Description All current methods for object recognition, including ours, are able to recognise objects in photographs very well. However, no current method except ours is able to recognise objects in both photographs and art.

Our key findings
+ Released two cross-depiction datasets; previously, no such dataset existed for the cross-depiction problem.
+ Conducted extensive experiments to benchmark classification, domain adaptation, detection, and deep learning methods on the cross-depiction task. This helps the computer vision community to understand the performance of leading techniques in this new field and gives insights into potential solutions.
+ Developed a multi-labeled graph model with learned discriminative weights which is able to model object classes over a broad range of depictive styles.
+ Adapted the DPM (Deformable Part Model) with a cross-depiction expansion to bridge the gap between the photo and art domains, leading to a significant rise in performance.
+ Adapted a state-of-the-art deep learning method, Fast R-CNN (Region-based Convolutional Neural Network), to detect people in all kinds of art images.
+ Designed dual convolutional neural networks to simultaneously minimise the classification error and the domain discrepancy.
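The last finding, a dual-network objective that jointly minimises classification error and domain discrepancy, can be illustrated with a small sketch. The report does not specify which discrepancy measure was used, so the example below assumes a simple linear-kernel Maximum Mean Discrepancy (MMD) between photo-domain and art-domain feature batches; the function names and the weighting scheme are hypothetical, not taken from the project's implementation.

```python
def mean_feature(batch):
    """Element-wise mean of a list of feature vectors (one per image)."""
    n = len(batch)
    dim = len(batch[0])
    return [sum(vec[i] for vec in batch) / n for i in range(dim)]

def linear_mmd(photo_feats, art_feats):
    """Linear-kernel MMD: squared distance between the two domains' mean embeddings."""
    mp = mean_feature(photo_feats)
    ma = mean_feature(art_feats)
    return sum((p - a) ** 2 for p, a in zip(mp, ma))

def dual_loss(cls_loss, photo_feats, art_feats, weight=0.1):
    """Joint objective: classification error plus weighted domain discrepancy."""
    return cls_loss + weight * linear_mmd(photo_feats, art_feats)

# When the two domains' features have identical means, the discrepancy term
# vanishes and only the classification loss remains.
photos = [[1.0, 0.0], [0.0, 1.0]]
art = [[0.0, 1.0], [1.0, 0.0]]
print(linear_mmd(photos, art))           # 0.0
print(dual_loss(0.5, photos, art))       # 0.5
```

In practice the two convolutional networks would share or exchange weights and the discrepancy would be computed on learned features, but the shape of the objective is as above: driving the art-domain features towards the photo-domain features while preserving class separability.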
Exploitation Route This is a growing area in Computer Vision. Potential routes include web search engines, commercial companies requiring advanced indexing, and even converting photos to icons for (e.g.) blind people. We also plan to extend the work to 3D objects and to video.

The work is of broad value, with expressions of interest from the RNIB, British Library, car component manufacturers, CAD companies, and the creative sector.

We plan to meet BL and RNIB to discuss joint ways forward - in fact several grants with them are under development.

--- Update ---

We have now submitted an EPSRC proposal in the related area of Style Transfer, with Art Historians who curate and research.
We are developing a large scale proposal in Assistive Computing, with RNIB involved.
We are supervising a CSC student in Style Transfer, which simultaneously advances our agenda for UK/China collaboration.
Sectors Communities and Social Services/Policy,Creative Economy,Education,Healthcare,Culture, Heritage, Museums and Collections,Other

Description Interested new industrial partners include the RNIB, the charity Designability, and art historians. The work has supported ongoing projects with two companies: Ninja Theory and Disney. It has helped develop relations with Chinese, Canadian, and German academics. --- update --- A major proposal is now under development, pulling together two leading UK universities in a true cross-discipline collaboration of Computer Science, Psychology, Electronic Engineering, and Education, as well as national bodies including the RNIB. We will partner with UK software houses and EU hardware manufacturers.
First Year Of Impact 2019
Sector Creative Economy,Education,Culture, Heritage, Museums and Collections,Other
Impact Types Cultural,Societal,Economic

Title People-Art 
Description A collection of 4000+ images, each showing at least one human figure. The images come in a broad range of artistic styles. 
Type Of Material Database/Collection of data 
Year Produced 2015 
Provided To Others? Yes  
Impact The database shows the failure of all contemporary computer vision methods to detect people in artwork. It has motivated our current research direction. 
Title Photo-Art-50 
Description A collection of images; artwork of all kinds. This augments the famous CalTech-256 dataset with artwork it was previously lacking. 
Type Of Material Database/Collection of data 
Year Produced 2015 
Provided To Others? Yes  
Impact The data was used to show a failure mode for all contemporary recognition methods in computer vision; to explain the failure empirically; and then to address it. A URL will be published shortly.
Description Art History Through Recognition 
Organisation University of Tuzla
Department Philosophy Faculty
Country Bosnia and Herzegovina 
Sector Academic/University 
PI Contribution Plans to work on recognition in art-historical databases and to co-author publications. We provide technical developments.
Collaborator Contribution Plans to work on recognition in art-historical databases and to co-author publications. Provides access to databases, to European groups, and to critical assessment.
Impact Early stage of collaboration
Start Year 2018
Description Assessing Automatic Art by Appreciation
Organisation Carleton University
Country Canada 
Sector Academic/University 
PI Contribution We provide datasets, experiments, IP etc
Collaborator Contribution They provide datasets, experiments, IP etc
Impact Just started
Start Year 2019
Description Tactile Images for the Visually Impaired
Organisation Royal National Institute for Blind People
Country United Kingdom 
Sector Charity/Non Profit 
PI Contribution We provide technical development
Collaborator Contribution RNIB provide access to user groups
Impact Just started
Start Year 2019
Description Tactile Images for the Visually Impaired
Organisation University of Bath
Department Designability (Bath Institute of Medical Engineering)
Country United Kingdom 
Sector Academic/University 
PI Contribution We provide technical developments
Collaborator Contribution Designability will conduct field tests
Impact Just started
Start Year 2019
Description workshop organisation 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact We have organised and helped to organise the "Expressive" triad of workshops and the VisArt workshop. Our work in the area of cross-depiction and NPR is therefore helping to bring together computer graphics, computer vision, and cultural history, and to have impact with the British Library, the RNIB, as well as CAD companies and others.
Year(s) Of Engagement Activity 2014,2016