Scene Understanding using New Global Energy Models

Lead Research Organisation: University of Oxford
Department Name: Engineering Science


This proposal concerns scene understanding from video. Computer vision algorithms for individual tasks such as object recognition, detection and segmentation have now reached some level of maturity. The next challenge is to integrate all these algorithms and address the problem of scene understanding. The problem of scene understanding involves explaining the whole image by recognizing all the objects of interest within an image and their spatial extent or shape in 3D. The application to drive the research will be the problem of automated understanding of cities from video using computer vision, inspired by the availability of massive new data sets such as Google's Street View, Yotta (who have agreed to supply Oxford Brookes with data) and Microsoft's Photosynth. The scenario is as follows: a van drives around the roads of the UK; in the van are GPS equipment and multiple calibrated cameras, synchronized to capture and store an image every two metres, giving a massive data set. The task is to recognize objects of interest in the video, from road signs and other street furniture to particular buildings, to allow them to be located exactly on maps of the environment. A second scenario would be to perform scene understanding for indoor scenes such as a home or office, with video taken from a normal camera and a Z-cam.

Planned Impact

Impact Plan: The aim of this project is twofold: first, to engage in basic science; second, to produce a commercially useful set of outcomes that will improve UK competitiveness. The former will be a set of papers on object recognition combined with structure, models thereof and combinatorial algorithms to bring the work to fruition. The latter will be greatly helped by interaction with the Oxford Metrics Group (both 2d3 and Yotta), who have undertaken to meet with the project members regularly to ensure the outcomes are commercially useful. Professor Torr's contacts with Sony and Microsoft will enable him to steer the project along lines that should provide maximum benefit to the UK economy. This project clearly relates to the Digital Economy, which is highlighted as a key area in the EPSRC Delivery Plan 2008-11, and in particular to Transport and the Creative Industries, which are highlighted as areas of particular importance within the Digital Economy; thus this proposal lies exactly in accord with fundamental directions outlined in the EPSRC's Delivery Plan.
Transport Industry: Our starting target industry is the highways industry, which is interested in its 'asset inventories', e.g. location of street furniture, heights of bridges. The UK Government has now adopted a policy to implement Resource Accounting and Budgeting and Whole Government Accounting (WGA). The use of Asset Management Plans is essential to underpinning this policy. This is currently not obligatory, but is likely to be legislated within the next couple of years (i.e. a requirement in order to apply for road maintenance funding). At that point, we would expect that every local authority in the UK will be required to provide an inventory for their entire road network, totaling about 400,000km. The inventory would need to be updated annually. A typical rate for asset inventory is about 30/km. We expect that the USA will follow within the next 10 years, and the European market similarly.
Creative Industries: The second application that would be considered would be the identification of objects in indoor scenes. The scenes might be of rooms in the home, a public building or a workplace. Professor Torr works closely with Sony on the EyeToy; the EyeToy is typically placed in the living room, and being able to recognize objects within the living room would significantly help with the design of games. It is anticipated that the release of the time of flight camera as a peripheral for Microsoft's Project Natal would revolutionize not only the gaming industry but research in computer vision as well. With the deep penetration of the Xbox it would be expected that over five million units would be sold. That means five million Z-cams in people's living rooms. Currently research on time of flight cameras is not mainstream, but Natal will change this, and the commercial desire for such things as object recognition using Z-cam data will be immediate in games, HCI and advertising.
Exploitation: Intellectual Property Rights management and exploitation will be managed by the Research and Business Development Office (RBDO) at Oxford Brookes University, which has access to financial and other resources to enable Intellectual Property and its commercial exploitation to be effectively managed, whilst maximizing the widespread dissemination of the research results. This includes, as appropriate, finance for patenting and proof of concept funding; IP, technology and market assessment; and resources for defining and implementing a commercialization strategy through licensing, a start-up company or other routes. The Oxford Brookes Computer Vision group already has an established record of exploiting IP and of interactions with several companies. Agreements are in place with Sony and OMG.
Description We have worked on recognition and segmentation of objects in images.

This work has had large-scale coverage in the media,
e.g. google Semantic Paint,
in mainstream news, e.g. tabloids, and on the BBC.

Our other work has also had large-scale media coverage.

This is the best in the world, beating top tech companies.
Exploitation Route examples include

We are building augmented reality glasses to give partially sighted people enhanced vision.

We are working with the creative industries on film and gaming (segmentation of images and recognition).

We are collaborating with the 3D cinema company RealD on building the software for future generations of glasses for viewing 3D films.

We are also working with the search engine company Baidu on building software for wearable devices in retail scenarios and for autonomous cars.

We are spinning out a company this year to help partially sighted people.

Sectors Creative Economy,Education,Healthcare,Retail,Transport

Description I co-founded the new Oxford spin-out OxSight in 2016 with the aim of producing augmented reality glasses to help the partially sighted. The World Health Organization (WHO) estimates that 253 million people live with vision impairment: 36 million are blind and 217 million have moderate to severe vision impairment. The WHO defines moderate to severe vision impairment as a person being unable to clearly see how many fingers are being held up at a distance of 6m (19 feet) or less, even when they are wearing glasses or contact lenses. These figures are set to rise dramatically in the near future, more than doubling by 2050 as the population ages. There is a huge cost to the economy and society in care costs, loss of employment, and chronic depression that can be caused by the loss of independence that visual impairment often entails. However, even among people registered as legally blind, a majority (85%) have retained some light perception: the issue is that their disability renders them unable to make sense of the signal, which just becomes confusing light noise to them. OxSight have created an augmented reality display system that allows people to regain a sense of independence by making better use of the light they can perceive, allowing them to see again to some extent. The spectacles allow the wearer to make sense of their surroundings by simplifying the ambient light and translating it into shapes and shades that allow them to discern physical objects and perceive depth within their physical environment. Read the news article and watch the TED talk:
First Year Of Impact 2016
Sector Creative Economy,Digital/Communication/Information Technologies (including Software),Healthcare
Impact Types Societal,Economic

Description Towards Total Scene Understanding using Structured Models
Amount £2,200,000 (GBP)
Funding ID 321162 
Organisation European Commission 
Sector Public
Country European Union (EU)
Start 01/2014 
End 01/2019
Description Real time processing on Apical 
Organisation Apical
Country United Kingdom 
Sector Private 
PI Contribution tech
Collaborator Contribution 160K to fund a student
Impact just started
Start Year 2014
Description Segmentation with Technicolor
Organisation Technicolor
Country United Kingdom 
Sector Private 
PI Contribution segmentation tech
Collaborator Contribution tech
Impact tech
Start Year 2013
Title face tracking 
Description face tracker 
IP Reference  
Protection Copyrighted (e.g. software)
Year Protection Granted 2012
Licensed Yes
Impact Licensed to RealD
Title Some of the segmentation software was used in Huawei phones
Description Recent progress on salient object detection is substantial, benefiting mostly from the explosive development of Convolutional Neural Networks (CNNs). Semantic segmentation and salient object detection algorithms developed lately have been mostly based on Fully Convolutional Neural Networks (FCNs). There is still a large room for improvement over the generic FCN models that do not explicitly deal with the scale-space problem. Holistically-Nested Edge Detector (HED) provides a skip-layer structure with deep supervision for edge and boundary detection, but the performance gain of HED on saliency detection is not obvious. In this paper, we propose a new salient object detection method by introducing short connections to the skip-layer structures within the HED architecture. Our framework takes full advantage of multi-level and multi-scale features extracted from FCNs, providing more advanced representations at each layer, a property that is critically needed to perform segment detection. Our method produces state-of-the-art results on 5 widely tested salient object detection benchmarks, with advantages in terms of efficiency (0.08 seconds per image), effectiveness, and simplicity over the existing algorithms. Beyond that, we conduct an exhaustive analysis on the role of training data on performance. Our experimental results provide a more reasonable and powerful training set for future research and fair comparisons. 
Type Of Technology Software 
Year Produced 2018 
Impact This algorithm is used in flagship products such as the Huawei Mate 10 and Huawei Honor V10 to create the "AI Selfie: Brilliant Bokeh, perfect portraits" effects demonstrated at the Mate 10 launch show in Munich, Germany.
Company Name OxSight
Description OxSight is a University of Oxford venture that uses the latest smart glasses to improve sight for blind and partially sighted people. OxSight's aim is to develop sight enhancing technologies to improve the quality of life for blind and partially sighted people around the world. Our current commercial products can enhance vision for people affected by conditions like glaucoma, diabetes and retinitis pigmentosa as well as some other degenerative eye diseases. 
Year Established 2016 
Impact see
Description Many mentions in news media and on TV
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact Many mentions on the BBC, in newspapers, etc.

some of it listed here

google Semantic Paint

see also here
Year(s) Of Engagement Activity 2015