Visual Image Interpretation in Humans and Machines

Lead Research Organisation: University of Birmingham

Department Name: School of Psychology

Abstract

The sense of vision is so fundamental to humans that it is a largely automated process which appears to us as extremely easy. This would suggest that it should be easy to make a computer see like a human. In fact, this is a very difficult task because the biological visual system is very complex, occupying about one quarter of the human brain. Human vision is both highly effective and efficient. For example, it is capable of identifying around 10,000 different object categories and can learn new categories from single examples. This is achieved with a system requiring just 20 watts of power and weighing 1.4 kg. No computer system can match this performance for recognition ability, learning efficiency and power consumption.

One way to devise new computer vision methods is to understand how biological visual systems work. However, the complexity of vision has made this very difficult and some researchers have concentrated their efforts on understanding biological vision while others have sought independent solutions to specific problems in computer vision. For example, humans can read car number plates but we do so using a general purpose visual system that can also read gothic script and handwriting as well as performing a host of other tasks. Building a number plate recognition system to read letters in the same general way that humans do would be difficult. However, because number plates have a certain fixed format (they are always a certain, bright, colour, and the font is always a certain style and size) building a computer vision system just to read number plates, and nothing else, is a much simpler task.

There are some tasks that have not proved simple for computer vision and where understanding biological vision is likely to be essential to future success. One example is matching the appearance of two surfaces. Suppose you wanted to make artificial stone to look exactly like the real stones in a building. To get the recipe just right you would have to know not just the physical properties of the original stone (which probably cannot be matched exactly) but also how the human vision system is likely to perceive the stone. You can then pick a recipe that may not mimic the stone exactly but which will look just like the real stone to humans. Moreover, if you know how the visual system processes the colours and textures of surfaces you can build a computerised tool that can predict recipes automatically.

Another area of interest is computer graphics. One way to make computer graphics look convincing is to exactly model the physics of the thing you are trying to represent. However, such rendering methods are often very time consuming and computationally expensive. Because the human visual system does not see every detail in an object it is often possible to render graphics much more quickly and effectively using perceptual rendering techniques that exploit knowledge of how the human visual system will process each scene.

Because those researchers working on biological vision tend to be from Biology and Psychology backgrounds and those who research computer vision from Computer Science and Engineering backgrounds, there is often a gap in understanding between the two groups of researchers which makes it hard for them to work together on problems such as those outlined above. The aim of this Network is to bring such researchers closer together, both physically and scientifically, so that they can identify and work together on the challenging problems where success is most likely. We will achieve this by a series of away day style meetings and conferences and by funding junior scientists and PhD students to spend time working in another lab from a different discipline.

Planned Impact

Network members
The impact of the Network will be felt directly in terms of training for the PhD students and early career researchers who will benefit from the exchange visits. There is also the wider learning and development of understanding that participating in the Network will bring, via workshop and conference participation, as well as the direct collaboration within projects.

Research Community
The research community will benefit from having a nexus of researchers spanning a number of EPSRC research areas who will be able to provide a better focus for cross-discipline research in terms of the ICT, and broader EPSRC, remit. Research areas where this benefit will be felt include: Image and Vision Computing, Vision Hearing and Other Senses, Graphics and Visualisation, Human Computer Interaction, Artificial Intelligence Technologies, Robotics, and Displays. The Network will provide a coherent portfolio of multi-disciplinary grants. The research community will also gain a more clearly defined group of researchers able to comment on the multi-disciplinary grants that the above themes tend to generate.

UK Economy, Healthcare Professionals, Education professionals, Schools and Museums, Public
These stakeholders will benefit from having an established and coherent researcher base in the area of biological and computer vision. The Network will act as a facilitator for activity in this area and as such will not produce direct impact. Individual Network members will be responsible for translating and exploiting their own intellectual property. However, the potential application area for computer vision technology is huge, ranging from medical diagnostics and treatment to computer games and the film industry. We indicate some example application areas below, but this list is by no means exhaustive:

Appearance technology has applications in the textile, paint and automotive industries, where there is a desire to match the appearance of a synthetic product to some natural sample. The technology also has applications in retail, where samples presented online need to match the physical product. Medical applications could include the construction of prosthetics. Designers, artists and education professionals will also benefit from the ability to produce physical replicas of artefacts or surface treatments that match natural samples.

Perceptual rendering (rendering to match the perceptual rather than physical properties of an object) has applications in computer graphics, computer games, and the film industry where more realistic rendering can be achieved more economically.

Object recognition technology might directly contribute to 'home help' style robots capable of finding and retrieving objects from around a standard house or able to perform simple household tasks. Many industrial settings can also benefit from systems capable of recognising objects in a range of poses and lighting conditions. Automated vehicles may benefit from object, place, terrain and hazard recognition. Automatic face recognition technology has applications in security settings. Recognition applications for smartphones and wearable computers will help users to identify unusual objects, people and places and may also assist in the rehabilitation of stroke patients, who often have specific memory deficits. Object recognition technology can also improve medical diagnosis.

Active vision for robots will enable them to better respond to changing environments to find and manipulate objects. Gesture and emotion recognition will have applications in the design of human computer interfaces and educational software. Recognition of emotions will also have applications in security where it may be possible to identify malicious intent. Emotion recognition applications for wearable computers may also help those with autism.

Funded Value:

£121,299

Funded Period:

Apr 14 - Oct 17

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/L014564/1

Principal Investigator:

Andrew Schofield

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Image & Vision Computing (50%)

Vision & Senses - ICT appl. (50%)

Organisations

University of Birmingham (Lead Research Organisation)

People	ORCID iD
Andrew Schofield (Principal Investigator)

Publications

Lock J (2020) Experimental Analysis of a Spatialised Audio Interface for People with Visual Impairments in ACM Transactions on Accessible Computing

Schofield A (2018) Understanding images in biological and computer vision in Interface Focus

Key Findings

Description	Original Objectives: The principal aims of the Network will be:
1. To foster communication and joint projects between relevant research groups including those working on biological vision (human and non-human animals) and computer vision. The Network, in particular the Appearance theme, will also be relevant to those working in the area of computer graphics, rendering and special effects. [Objective Met]
2. To establish a series of grand challenges focused around well specified tasks where cross-over studies have a strong potential to provide robust solutions. [Objective Met]
3. To foster joint cross-discipline grant applications. [Objective Met]
4. To explore mechanisms to improve the utility of joint publications for both partners. [Objective Partly Met]
5. To equip individual PhD and post-doctoral scientists to be future leaders of cross-over research projects. [Objective Met]
6. To establish a lasting vehicle for supporting cross-over biological and computer vision projects. [Objective Partly Met]
7. To increase public engagement with the concept of biologically inspired computer vision. [Objective Partly Met]
As a Network grant, funding was given to establish a research network rather than to carry out original research. As such, there are no 'findings' to report; however, the Network has met most of its original objectives to bring together the biological and machine vision communities. We have done this by running 3 large workshops for 80 or more participants. These workshops have focused on network building, defining the key problems in the field and defining grand challenges. The result has been the publication of a Grand Challenges document which we hope will help to inform future grant writing and grant funding decisions. We have also funded a number of smaller workshops to discuss specific topics and a smaller number of grant writing workshops allowing network members to spend concentrated time writing multi-disciplinary grant applications.
We have also funded a small number of exchange visits between labs. We know of several new grant applications that have been submitted as a result of the Network which would not have been written if the ViiHM Network had not existed. Throughout, we have encouraged the participation of more junior researchers, actively promoting their attendance at both large and small workshops and encouraging them to run their own workshops funded by ViiHM. We have promoted the principle that interactions between the two communities can be fruitful by sponsoring symposia at larger conferences and via industry focused talks. The Network now has more than 200 members and we have established a route by which the Network's activities can be continued with the support of existing organisations including the Royal Society, British Machine Vision Association, and Applied Vision Association.
Exploitation Route	The Grand Challenges document poses three challenges: a theoretical challenge, a technical challenge and an application challenge. These are centred around the idea of an intelligent, cognitive assistant which might be deployed as a robot or in wearable technology. It should have low space and power requirements. This document outlines a roadmap by which human and machine vision combined may address this challenge and highlights spin-offs to other areas such as personal assistance robots, driverless cars, drones and similar technologies. We hope that this will be a resource for grant and product ideas into the future.
Sectors	Digital/Communication/Information Technologies (including Software),Electronics,Healthcare,Transport
URL	http://www.viihm.org.uk

Engagement Activities

Description	Opinion article for Trade Magazine
Form Of Engagement Activity	A magazine, newsletter or online publication
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Industry/Business
Results and Impact	Written online article outlining the ways in which biological vision might inform the construction of machine vision systems.
Year(s) Of Engagement Activity	2017
URL	https://www.imveurope.com/news/analysis-opinion/what-can-drones-learn-bees
