Scene Understanding using New Global Energy Models

Lead Research Organisation: Oxford Brookes University
Department Name: Faculty of Tech, Design and Environment


This proposal concerns scene understanding from video. Computer vision algorithms for individual tasks such as object recognition, detection and segmentation have now reached some level of maturity. The next challenge is to integrate all these algorithms and address the problem of scene understanding. The problem of scene understanding involves explaining the whole image by recognizing all the objects of interest within an image and their spatial extent or shape in 3D. The application to drive the research will be the problem of automated understanding of cities from video using computer vision, inspired by the availability of massive new data sets such as those of Google's Street View, Yotta (who have agreed to supply Oxford Brookes with data) and Microsoft's Photosynth. The scenario is as follows: a van drives around the roads of the UK; in the van are GPS equipment and multiple calibrated cameras, synchronized to capture and store an image every two metres, giving a massive data set. The task is to recognize objects of interest in the video, from road signs and other street furniture to particular buildings, to allow them to be located exactly on maps of the environment. A second scenario would be to perform scene understanding for indoor scenes such as a home or office, with video taken from a normal camera and a Z-cam.

Planned Impact

Impact Plan: The aim of this project is twofold: first, to engage in basic science; second, to produce a commercially useful set of outcomes that will improve UK competitiveness. The former will be a set of papers on object recognition combined with structure, models thereof and combinatorial algorithms to bring the work to fruition. The latter will be greatly helped by interaction with the Oxford Metrics Group, both 2d3 and Yotta, who have undertaken to meet with the project members regularly to ensure the outcomes are commercially useful. Professor Torr's contacts with Sony and Microsoft will enable him to steer the project along lines that should provide maximum benefit to the UK economy. This project clearly relates to the Digital Economy, which is highlighted as a key area in the EPSRC Delivery Plan 2008-11, and in particular to Transport and the Creative Industries, which are highlighted as areas of particular importance within the Digital Economy; thus this proposal lies exactly in accord with the fundamental directions outlined in the EPSRC's Delivery Plan. Transport Industry: Our starting target industry is the highways industry, which is interested in its 'asset inventories', e.g. location of street furniture, heights of bridges. The UK Government has now adopted a policy to implement Resource Accounting and Budgeting and Whole Government Accounting (WGA). The use of Asset Management Plans is essential to underpinning this policy. This is currently not obligatory, but is likely to be legislated within the next couple of years (i.e. become a requirement in order to apply for road maintenance funding). At that point, we would expect that every local authority in the UK will be required to provide an inventory for its entire road network, totalling about 400,000 km. The inventory would need to be updated annually. A typical rate for asset inventory is about 30/km. We expect that the USA will follow within the next 10 years, and the European market similarly.
Creative Industries: The second application that would be considered would be the identification of objects in indoor scenes. The scenes might be of rooms in the home, a public building or a workplace. Professor Torr works closely with Sony on the EyeToy; the EyeToy is typically placed in the living room, and being able to recognize objects within the living room would significantly help with the design of games. It is anticipated that the release of the time-of-flight camera as a peripheral for Microsoft's Project Natal would revolutionize not only the gaming industry but research in computer vision as well. With the deep penetration of the Xbox it would be expected that over five million units would be sold. That means five million Z-cams in people's living rooms. Currently research on time-of-flight cameras is not mainstream, but Natal will change this, and the commercial desire for such things as object recognition using Z-cam data will be immediate in games, HCI and advertising. Exploitation: Intellectual Property Rights management and exploitation will be managed by the Research and Business Development Office (RBDO) at Oxford Brookes University, which has access to financial and other resources to enable Intellectual Property and its commercial exploitation to be effectively managed, whilst maximizing the widespread dissemination of the research results. This includes, as appropriate, finance for patenting and proof-of-concept funding; IP, technology and market assessment; and resources for defining and implementing a commercialization strategy through licensing, a start-up company or other routes. The Oxford Brookes Computer Vision group already has an established record of exploiting IP, and interactions with several companies. Agreements are in place with Sony and OMG.


Description We are integrating our object recognition into glasses for the partially sighted. We are working with the chip company Apical to put it on mobile phones. We are working with Microsoft to put it in games. We are working with Technicolor to put the segmentation into films; see EP/I001107/2.
Sector Creative Economy,Electronics,Healthcare
Impact Types Societal,Economic

Description Towards Total Scene Understanding using Structured Models
Amount £2,200,000 (GBP)
Funding ID 321162 
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 12/2013 
End 02/2018
Description Towards Total Scene Understanding using Structured Models
Amount £2,200,000 (GBP)
Funding ID 321162 
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 01/2014 
End 01/2019
Description Real-time processing on Apical
Organisation Apical
Country United Kingdom 
Sector Private 
PI Contribution tech
Collaborator Contribution 160K fund student
Impact just started
Start Year 2014
Description Segmentation with Technicolor
Organisation Technicolor
Country United Kingdom 
Sector Private 
PI Contribution segmentation tech
Collaborator Contribution tech
Impact tech
Start Year 2013
Company Name OxSight
Description OxSight is a University of Oxford venture that uses the latest smart glasses to improve sight for blind and partially sighted people. OxSight's aim is to develop sight enhancing technologies to improve the quality of life for blind and partially sighted people around the world. Our current commercial products can enhance vision for people affected by conditions like glaucoma, diabetes and retinitis pigmentosa as well as some other degenerative eye diseases. 
Year Established 2016 
Impact see
Description Many mentions in News Media TV 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact Many mentions, including the BBC, newspapers, etc.

some of it listed here

google semantic paint

see also here
Year(s) Of Engagement Activity 2015