Structural analysis and interactive composition of visual media

Lead Research Organisation: CARDIFF UNIVERSITY

Department Name: Computer Science

Abstract

This project represents joint work between 12 leading Chinese universities and several other invited key partners in the UK and US.

The Internet, and other large-scale databases, form a significant resource of what may be termed "visual media": images, videos, 3D shape models, and so on. Internet text searches usually produce useful results. However, it can be much more difficult to find visual media, e.g. videos with specific content, or images similar to a picture in one's mind's eye. This is partly because most image search is based on text inputs, and partly because pictures are difficult to classify. It is easy for humans to "know" what an image contains, but image understanding by computer requires many tricky tasks: splitting an image into separate objects, and analysing their colour, their shape, and many other attributes. Better solutions for searching visual media would enable many applications in addition to search itself, and we will also look at one of them: the re-use of existing visual media when creating new visual media.

This project has four main goals.

The first is to investigate new approaches to structural analysis of visual media. This will include devising methods to find salient information (for example, what is the main object? what is irrelevant background? how is this object composed of parts?), and methods which process the information on different scales (small details may be just as important as overall shape, for example). The aim is to come up with hierarchical descriptions of the important information in visual media.

The second is to find efficient new approaches to comparing, classifying and searching visual media, based on the above hierarchical descriptions. We will also look at how sketches can provide a much more powerful means than text for users to describe what they want to find when searching.

The third area to be considered is editing and resynthesis of visual media. Structural analysis will provide more meaningful ways to select parts of an image than just, for example, all parts of the scene with a certain colour. In turn, this will simplify the process of editing visual media. Users will be able to apply consistent editing to scene elements with similar meaning (e.g. the user controls bending of one finger, and the computer applies a similar bend to the rest of the fingers of a hand, despite minor shape differences). More powerful search will also allow elements to be rapidly retrieved from visual media databases or the Internet and combined into new scenes, or included within existing images, with suitable adjustment for different lighting, etc. When video is processed, further considerations will be needed to ensure that results are consistent over time and vary smoothly as time progresses; the vast amounts of data involved in video processing make this a challenging problem.

The final area of work concerns the use of machine learning techniques to assist with all of the previous goals. The aim here is to learn automatically to recognise complex patterns, permitting software to make intelligent decisions based on visual data. Ultimately, a careful balance must be struck in which the user is firmly in control of the creative process, but the computer makes it easy for the user to produce the desired results.

Planned Impact

The ultimate beneficiaries of this research will be many and wide-ranging.
Improved search of visual media, e.g. for images or video on the Internet, could clearly benefit the public at large. Improved content analysis has potential applications in many areas such as security (where have I seen this person before? what are they doing?), summarisation (e.g. producing edited highlights of a sports match or movie), digital rights protection (finding unauthorised copies of digital media), and so on.
Interactive composition of digital media has many potential applications in computer aided design, digital entertainment, broadcasting, games, advertising, and other areas.
For example, incorporation of existing geometric mesh data has the potential to allow rapid design and re-use of complex shapes, helping CAD to go beyond its traditional remit of mechanical design to applications in artistic design.
The ability to synthesise video content from a mix of geometric models and existing video, and to edit video content in novel ways, promises large savings in the cost of movie production.

Apart from these established application areas, other areas are rapidly developing - current mobile phones already include graphics processors, and some are starting to employ 3D cameras and displays. The techniques devised during this project will help to provide an underpinning for a new generation of mobile applications.

Historically, the UK has been at the forefront of CAD software and digital entertainment software, and participation in internationally leading collaborations is essential to retain that position.

Funded Value:

£95,804

Funded Period:

Jan 12 - Apr 17

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/J009830/1

Principal Investigator:

Ralph Martin

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Computer Graphics & Visual. (50%)

Image & Vision Computing (50%)

Organisations

People
Ralph Martin (Principal Investigator)

Publications

Chen K (2014) Automatic semantic modeling of indoor scenes from low-quality RGB-D data using contextual information, in ACM Transactions on Graphics

Du SP (2013) Semiregular solid texturing from 2D image exemplars, in IEEE Transactions on Visualization and Computer Graphics

Hu S (2013) Internet visual media processing: a survey with graphics and vision applications, in The Visual Computer

Hu S (2013) PatchNet: a patch-based image representation for interactive library-driven image editing, in ACM Transactions on Graphics

Huang H (2014) Learning natural colors for image recoloring, in Computer Graphics Forum

Huang HZ (2016) Efficient, edge-aware, combined color quantization and dithering, in IEEE Transactions on Image Processing

Li XY (2013) Mixed-domain edge-aware image manipulation, in IEEE Transactions on Image Processing

Liu B (2014) Structure aware visual cryptography, in Computer Graphics Forum

Lu SP (2013) Timeline editing of objects in video, in IEEE Transactions on Visualization and Computer Graphics

Mu T (2014) A response time model for abrupt changes in binocular disparity, in The Visual Computer

Key Findings

Description	Many algorithms for processing visual media (images and video), for example for extrapolating an image into a larger one using data from online images, and for changing the times at which things happen in a video.
Exploitation Route	Implementation in commercial software; further research.
Sectors	Creative Economy; Digital/Communication/Information Technologies (including Software); Education; Leisure Activities including Sports, Recreation and Tourism; Culture, Heritage, Museums and Collections; Other
URL	http://ralph.cs.cf.ac.uk/publications.html

Impact Summary

Description	Various algorithms are in use by Chinese companies such as Tencent. A new journal, Computational Visual Media, has been set up by the UK and Chinese partners on this grant as a new outlet for this expanding area.
First Year Of Impact	2012
Sector	Digital/Communication/Information Technologies (including Software)
Impact Types	Societal Economic
