Scene Understanding using New Global Energy Models
Lead Research Organisation:
University of Oxford
Department Name: Engineering Science
Abstract
This proposal concerns scene understanding from video. Computer vision algorithms for individual tasks such as object recognition, detection and segmentation have now reached some level of maturity. The next challenge is to integrate all these algorithms and address the problem of scene understanding. Scene understanding involves explaining the whole image by recognizing all the objects of interest within it, together with their spatial extent or shape in 3D. The application to drive the research will be the automated understanding of cities from video using computer vision, inspired by the availability of massive new data sets such as those of Google's Street View (http://maps.google.com/help/maps/streetview/), Yotta (http://www.yotta.tv/index.php, who have agreed to supply Oxford Brookes with data) and Microsoft's Photosynth (http://labs.live.com/photosynth/). The scenario is as follows: a van drives around the roads of the UK; in the van are GPS equipment and multiple calibrated cameras, synchronized to capture and store an image every two metres, giving a massive data set. The task is to recognize objects of interest in the video, from road signs and other street furniture to particular buildings, to allow them to be located exactly on maps of the environment. A second scenario would be to perform scene understanding for indoor scenes such as the home or office, with video taken from a normal camera and a Z-cam.
Planned Impact
Impact Plan: The aim of this project is twofold: first, to engage in basic science; second, to produce a commercially useful set of outcomes that will improve UK competitiveness. The former will be a set of papers on object recognition combined with structure, models thereof, and combinatorial algorithms to bring the work to fruition. The latter will be greatly helped by interaction with the Oxford Metrics Group, both 2d3 and Yotta, who have undertaken to meet with the project members regularly to ensure the outcomes are commercially useful. Professor Torr's contacts with Sony and Microsoft will enable him to steer the project along lines that should provide maximum benefit to the UK economy. This project clearly relates to the Digital Economy, which is highlighted as a key area in the EPSRC Delivery Plan 2008-11, and in particular to Transport and the Creative Industries, which are highlighted as areas of particular importance within the Digital Economy; thus this proposal lies exactly in accord with the fundamental directions outlined in the EPSRC's Delivery Plan.

Transport Industry: Our starting target industry is the highways industry, who are interested in their 'asset inventories', e.g. the location of street furniture and the heights of bridges. The UK Government has now adopted a policy to implement Resource Accounting and Budgeting and Whole Government Accounting (WGA). The use of Asset Management Plans is essential to underpinning this policy. This is currently not obligatory, but is likely to be legislated within the next couple of years (i.e. to become a requirement in order to apply for road maintenance funding). At that point, we would expect every local authority in the UK to be required to provide an inventory for its entire road network, totalling about 400,000 km. The inventory would need to be updated annually. A typical rate for asset inventory is about 30/km. We expect that the USA will follow within the next 10 years, and the European market similarly.
Creative Industries: The second application to be considered is the identification of objects in indoor scenes. The scenes might be of rooms in the home, a public building or the workplace. Professor Torr works closely with Sony on the EyeToy (http://en.wikipedia.org/wiki/EyeToy); the EyeToy is typically placed in the living room, and being able to recognize objects within the living room would significantly help with the design of games. It is anticipated that the release of the time-of-flight camera as a peripheral for Microsoft's Project Natal will revolutionize not only the gaming industry but research in computer vision as well. Given the deep penetration of the Xbox, it is expected that over five million units will be sold: that means five million Z-cams in people's living rooms. Currently, research on time-of-flight cameras is not mainstream, but Natal will change this, and the commercial demand for capabilities such as object recognition from Z-cam data will be immediate in games, HCI and advertising.

Exploitation: Intellectual Property Rights management and exploitation will be managed by the Research and Business Development Office (RBDO) at Oxford Brookes University, which has access to financial and other resources to enable Intellectual Property and its commercial exploitation to be effectively managed, whilst maximizing the widespread dissemination of the research results. This includes, as appropriate, finance for patenting and proof-of-concept funding; IP, technology and market assessment; and resources for defining and implementing a commercialization strategy through licensing, a start-up company or other routes. The Oxford Brookes Computer Vision group already has an established record of exploiting IP and interactions with several companies. Agreements are in place with Sony and OMG.
People |
ORCID iD |
Philip Torr (Principal Investigator) |
Publications
Cheng M
(2019)
BING: Binarized normed gradients for objectness estimation at 300fps
in Computational Visual Media
Cheng M
(2013)
SalientShape: group saliency in image collections
in The Visual Computer
Cheng M
(2014)
ImageSpirit Verbal Guided Image Parsing
in ACM Transactions on Graphics
Cheng M
(2015)
DenseCut: Densely Connected CRFs for Realtime GrabCut
in Computer Graphics Forum
Dokania P
(2018)
FLIPDIAL: A Generative Model for Two-Way Visual Dialogue
Ghosh A
(2018)
Multi-agent Diverse Generative Adversarial Networks
Golodetz S
(2018)
Collaborative Large-Scale Dense 3D Reconstruction with Online Inter-Agent Pose Optimisation.
in IEEE transactions on visualization and computer graphics
Description | We have worked on recognition and segmentation of objects in images, with large-scale media coverage: e.g. Google SemanticPaint featured in mainstream news, including the tabloids and the BBC. Our other work has also received large-scale media coverage; see http://www.robots.ox.ac.uk/~szheng/CRFasRNN.html, which achieved state-of-the-art results, outperforming top technology companies. |
Exploitation Route | Examples include: we are building augmented reality glasses to give partially sighted people enhanced vision; we are working with the creative industries on film and gaming (segmentation of images and recognition); we are collaborating with the 3D cinema company RealD on building the software for the next generation of glasses for viewing 3D films; and we are working with the search engine company Baidu on building software for wearable devices in retail scenarios and for autonomous cars. We are spinning out a company to help the partially sighted this year; see http://www.va-st.com/smart-specs/ |
Sectors | Creative Economy, Education, Healthcare, Retail, Transport |
URL | http://www.robots.ox.ac.uk/~tvg/ |
Description | I co-founded the new Oxford spin-out OxSight in 2016 with the aim of producing augmented reality glasses to help the partially sighted. The World Health Organization (WHO) estimates that 253 million people live with vision impairment: 36 million are blind and 217 million have moderate to severe vision impairment. The WHO defines moderate to severe vision impairment as a person being unable to clearly see how many fingers are being held up at a distance of 6 m (19 feet) or less, even when they are wearing glasses or contact lenses. These figures are set to rise dramatically in the near future, more than doubling by 2050 as the population ages. There is a huge cost to the economy and society in care costs, loss of employment, and the chronic depression that can be caused by the loss of independence that visual impairment often entails. However, even among people registered as legally blind, a majority (85%) have retained some light perception: the issue is that their disability renders them unable to make sense of the signal, which becomes to them confusing light noise. OxSight have created an augmented reality display system that allows people to regain a sense of independence by making better use of the light they can perceive, allowing them to see again to some extent. The spectacles allow the wearer to make sense of their surroundings by simplifying the ambient light and translating it into shapes and shades that allow them to discern physical objects and perceive depth within their physical environment. Read the news article and watch the TED talk: http://innovation.ox.ac.uk/news/oxford-spinout-develops-smart-glasses-giving-legally-blind-ability-rea https://www.youtube.com/watch?v=goYIHXKvwLk |
First Year Of Impact | 2016 |
Sector | Creative Economy, Digital/Communication/Information Technologies (including Software), Healthcare |
Impact Types | Societal, Economic |
Description | Towards Total Scene Understanding using Structured Models |
Amount | £2,200,000 (GBP) |
Funding ID | 321162 |
Organisation | European Commission |
Sector | Public |
Country | European Union (EU) |
Start | 01/2014 |
End | 01/2019 |
Description | Real time processing on Apical |
Organisation | Apical |
Country | United Kingdom |
Sector | Private |
PI Contribution | tech |
Collaborator Contribution | 160K of funding for a student |
Impact | just started |
Start Year | 2014 |
Description | segmentation with technicolor |
Organisation | Technicolor |
Country | United Kingdom |
Sector | Private |
PI Contribution | segmentation tech |
Collaborator Contribution | tech |
Impact | tech |
Start Year | 2013 |
Title | face tracking |
Description | face tracker |
IP Reference | |
Protection | Copyrighted (e.g. software) |
Year Protection Granted | 2012 |
Licensed | Yes |
Impact | licensed to Real D |
Title | some of the segmentation s/w was used in the Huawei phone |
Description | Recent progress on salient object detection is substantial, benefiting mostly from the explosive development of Convolutional Neural Networks (CNNs). Semantic segmentation and salient object detection algorithms developed lately have been mostly based on Fully Convolutional Neural Networks (FCNs). There is still large room for improvement over the generic FCN models, which do not explicitly deal with the scale-space problem. The Holistically-Nested Edge Detector (HED) provides a skip-layer structure with deep supervision for edge and boundary detection, but the performance gain of HED on saliency detection is not obvious. In this paper, we propose a new salient object detection method by introducing short connections to the skip-layer structures within the HED architecture. Our framework takes full advantage of multi-level and multi-scale features extracted from FCNs, providing more advanced representations at each layer, a property that is critically needed to perform segment detection. Our method produces state-of-the-art results on 5 widely tested salient object detection benchmarks, with advantages in terms of efficiency (0.08 seconds per image), effectiveness, and simplicity over the existing algorithms. Beyond that, we conduct an exhaustive analysis of the role of training data on performance. Our experimental results provide a more reasonable and powerful training set for future research and fair comparisons. http://www.robots.ox.ac.uk/~tvg/publications/2017/DeepSal.pdf |
Type Of Technology | Software |
Year Produced | 2018 |
Impact | This algorithm is used in flagship products such as the Huawei Mate 10 and Huawei Honor V10 to create the "AI Selfie: Brilliant Bokeh, perfect portraits" effects demonstrated at the Mate 10 launch show in Munich, Germany. |
URL | http://mmcheng.net/dss/ |
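The "short connection" idea described above can be illustrated with a small sketch: side outputs from deeper (coarser but semantically stronger) layers are upsampled and fused into shallower (finer) side outputs, so every level sees multi-scale evidence. This is only a toy illustration of the fusion pattern, not the published DSS implementation: the 1-D "feature maps", the nearest-neighbour upsampling, and the element-wise averaging are all simplifying assumptions; the real method operates on 2-D convolutional feature maps with learned fusion weights inside a deep-learning framework.

```python
def upsample(signal, factor):
    """Nearest-neighbour upsampling of a toy 1-D saliency map."""
    return [v for v in signal for _ in range(factor)]

def fuse_with_short_connections(side_outputs):
    """side_outputs: list of 1-D maps, shallowest (finest) first.

    Each map at level i receives the upsampled maps of every deeper
    level (the 'short connections') and averages them element-wise,
    so fine levels gain the deep levels' semantic evidence.
    """
    n = len(side_outputs)
    fused = []
    for i, fine in enumerate(side_outputs):
        acc = list(fine)
        contributions = 1
        for j in range(i + 1, n):
            # Upsample the deeper, coarser map to this level's resolution.
            factor = len(fine) // len(side_outputs[j])
            up = upsample(side_outputs[j], factor)
            acc = [a + u for a, u in zip(acc, up)]
            contributions += 1
        fused.append([a / contributions for a in acc])
    return fused

# Toy example: three side outputs at resolutions 8, 4 and 2.
outputs = [
    [0.1] * 8,  # shallow: fine resolution, weak saliency response
    [0.5] * 4,  # middle level
    [0.9] * 2,  # deep: coarse but semantically confident
]
fused = fuse_with_short_connections(outputs)
```

After fusion, the shallow level's map averages in the deeper responses (here 0.5 at the finest level), showing how the short connections propagate high-level confidence down to fine spatial resolution.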
Company Name | OxSight |
Description | OxSight has developed SmartSpecs, a system of devices to help people with severe visual impairment navigate independently. The system uses cameras and computer vision algorithms to detect and highlight objects in real-time, creating an interactive overlay over the wearer's normal vision. |
Year Established | 2016 |
Impact | see http://smartspecs.co/ |
Website | http://www.oxsight.co.uk |
Description | Many mentions in News Media TV |
Form Of Engagement Activity | A press release, press conference or response to a media enquiry/interview |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Media (as a channel to the public) |
Results and Impact | Many mentions on the BBC, in newspapers, etc.; some are listed at http://www.robots.ox.ac.uk/~tvg/projects/SemanticPaint/index.php (or search for "SemanticPaint"); see also http://www.va-st.com/smart-specs/ |
Year(s) Of Engagement Activity | 2015 |
URL | http://www.va-st.com/smart-specs/ |