Scene Understanding using New Global Energy Models
Lead Research Organisation:
University of Oxford
Department Name: Engineering Science
Abstract
This proposal concerns scene understanding from video. Computer vision algorithms for individual tasks such as object recognition, detection and segmentation have now reached some level of maturity. The next challenge is to integrate all these algorithms and address the problem of scene understanding. Scene understanding involves explaining the whole image by recognizing all the objects of interest within it, together with their spatial extent or shape in 3D. The application to drive the research will be the automated understanding of cities from video using computer vision, inspired by the availability of massive new data sets such as those of Google's Street View (http://maps.google.com/help/maps/streetview/), Yotta (http://www.yotta.tv/index.php, who have agreed to supply Oxford Brookes with data) and Microsoft's Photosynth (http://labs.live.com/photosynth/). The scenario is as follows: a van drives around the roads of the UK; in the van are GPS equipment and multiple calibrated cameras, synchronized to capture and store an image every two metres, giving a massive data set. The task is to recognize objects of interest in the video, from road signs and other street furniture to particular buildings, to allow them to be located exactly on maps of the environment. A second scenario would be to perform scene understanding for indoor scenes such as the home or office, with video taken from a normal camera and a Z-cam.
Planned Impact
Impact Plan: The aim of this project is twofold: first, to engage in basic science; second, to produce a commercially useful set of outcomes that will improve UK competitiveness. The former will be a set of papers on object recognition combined with structure, models thereof, and combinatorial algorithms to bring the work to fruition. The latter will be greatly helped by interaction with the Oxford Metrics Group, both 2d3 and Yotta, who have undertaken to meet with the project members regularly to ensure the outcomes are commercially useful. Professor Torr's contacts with Sony and Microsoft will enable him to steer the project along lines that should provide maximum benefit to the UK economy. This project clearly relates to the Digital Economy, which is highlighted as a key area in the EPSRC Delivery Plan 2008-11, and in particular to Transport and the Creative Industries, which are highlighted as areas of particular importance within the Digital Economy; thus this proposal lies exactly in accord with the fundamental directions outlined in the EPSRC's Delivery Plan.

Transport Industry: Our starting target industry is the highways industry, who are interested in their 'asset inventories', e.g. the location of street furniture and the heights of bridges. The UK Government has now adopted a policy to implement Resource Accounting and Budgeting and Whole Government Accounting (WGA). The use of Asset Management Plans is essential to underpinning this policy. This is currently not obligatory, but is likely to be legislated within the next couple of years (i.e. to become a requirement in order to apply for road maintenance funding). At that point, we would expect every local authority in the UK to be required to provide an inventory for its entire road network, totalling about 400,000 km. The inventory would need to be updated annually. A typical rate for asset inventory is about 30/km. We expect that the USA will follow within the next 10 years, and the European market similarly.
Creative Industries: The second application to be considered is the identification of objects in indoor scenes. The scenes might be of rooms in the home, a public building or the workplace. Professor Torr works closely with Sony on the EyeToy (http://en.wikipedia.org/wiki/EyeToy); the EyeToy is typically placed in the living room, and being able to recognize objects within the living room would significantly help with the design of games. It is anticipated that the release of the time-of-flight camera as a peripheral for Microsoft's Project Natal will revolutionize not only the gaming industry but research in computer vision as well. Given the deep penetration of the Xbox, it is expected that over five million units will be sold: that means five million Z-cams in people's living rooms. Currently, research on time-of-flight cameras is not mainstream, but Natal will change this, and the commercial demand for capabilities such as object recognition from Z-cam data will be immediate in games, HCI and advertising.

Exploitation: Intellectual Property Rights management and exploitation will be managed by the Research and Business Development Office (RBDO) at Oxford Brookes University, which has access to financial and other resources to enable Intellectual Property and its commercial exploitation to be effectively managed, whilst maximizing the widespread dissemination of the research results. This includes, as appropriate, finance for patenting and proof-of-concept funding; IP, technology and market assessment; and resources for defining and implementing a commercialization strategy through licensing, a start-up company or other routes. The Oxford Brookes Computer Vision group already has an established record of exploiting IP and interactions with several companies. Agreements are in place with Sony and OMG.
People |
ORCID iD |
Philip Torr (Principal Investigator) |
Publications
Cheng M
(2019)
BING: Binarized normed gradients for objectness estimation at 300fps
in Computational Visual Media
Cheng M
(2013)
SalientShape: group saliency in image collections
in The Visual Computer
Cheng M
(2014)
ImageSpirit Verbal Guided Image Parsing
in ACM Transactions on Graphics
Cheng M
(2015)
DenseCut: Densely Connected CRFs for Realtime GrabCut
in Computer Graphics Forum
Dokania P
(2018)
FLIPDIAL: A Generative Model for Two-Way Visual Dialogue
Ghosh A
(2018)
Multi-agent Diverse Generative Adversarial Networks
Golodetz S
(2018)
Collaborative Large-Scale Dense 3D Reconstruction with Online Inter-Agent Pose Optimisation.
in IEEE transactions on visualization and computer graphics
Description | We have worked on recognition and segmentation of objects in images, with large-scale media coverage: e.g. Google SemanticPaint featured in mainstream news, including the tabloids and the BBC. Our other work has also received large-scale media coverage; see http://www.robots.ox.ac.uk/~szheng/CRFasRNN.html, which achieved state-of-the-art results, outperforming top technology companies. |
Exploitation Route | Examples include: we are building augmented reality glasses to give partially sighted people enhanced vision; we are working with the creative industries on film and gaming (segmentation of images and recognition); we are collaborating with the 3D cinema company RealD on building the software for the next generation of glasses for viewing 3D films; and we are working with the search engine company Baidu on building software for wearable devices in retail scenarios and for autonomous cars. We are spinning out a company to help the partially sighted this year; see http://www.va-st.com/smart-specs/ |
Sectors | Creative Economy, Education, Healthcare, Retail, Transport |
URL | http://www.robots.ox.ac.uk/~tvg/ |
Description | I co-founded the new Oxford spin-out OxSight in 2016 with the aim of producing augmented reality glasses to help the partially sighted. The World Health Organization (WHO) estimates that 253 million people live with vision impairment: 36 million are blind and 217 million have moderate to severe vision impairment. The WHO defines moderate to severe vision impairment as a person being unable to clearly see how many fingers are being held up at a distance of 6 m (19 feet) or less, even when they are wearing glasses or contact lenses. These figures are set to rise dramatically in the near future, more than doubling by 2050 as the population ages. There is a huge cost to the economy and society in care costs, loss of employment, and the chronic depression that can be caused by the loss of independence that visual impairment often entails. However, even among people registered as legally blind, a majority (85%) have retained some light perception: the issue is that their disability renders them unable to make sense of the signal, which becomes to them confusing light noise. OxSight have created an augmented reality display system that allows people to regain a sense of independence by making better use of the light they can perceive, allowing them to see again to some extent. The spectacles allow the wearer to make sense of their surroundings by simplifying the ambient light and translating it into shapes and shades that allow them to discern physical objects and perceive depth within their physical environment. Read the news article and watch the TED talk: http://innovation.ox.ac.uk/news/oxford-spinout-develops-smart-glasses-giving-legally-blind-ability-rea https://www.youtube.com/watch?v=goYIHXKvwLk |
First Year Of Impact | 2016 |
Sector | Creative Economy, Digital/Communication/Information Technologies (including Software), Healthcare |
Impact Types | Societal, Economic |
Description | Towards Total Scene Understanding using Structured Models |
Amount | £2,200,000 (GBP) |
Funding ID | 321162 |
Organisation | European Commission |
Sector | Public |
Country | European Union (EU) |
Start | 01/2014 |
End | 01/2019 |
Description | Real time processing on Apical |
Organisation | Apical |
Country | United Kingdom |
Sector | Private |
PI Contribution | tech |
Collaborator Contribution | 160K of funding for a student |
Impact | just started |
Start Year | 2014 |
Description | segmentation with technicolor |
Organisation | Technicolor |
Country | United Kingdom |
Sector | Private |
PI Contribution | segmentation tech |
Collaborator Contribution | tech |
Impact | tech |
Start Year | 2013 |
Title | face tracking |
Description | face tracker |
IP Reference | |
Protection | Copyrighted (e.g. software) |
Year Protection Granted | 2012 |
Licensed | Yes |
Impact | licensed to Real D |
Title | some of the segmentation s/w was used in the Huawei phone |
Description | Recent progress on salient object detection is substantial, benefiting mostly from the explosive development of Convolutional Neural Networks (CNNs). Semantic segmentation and salient object detection algorithms developed lately have been mostly based on Fully Convolutional Neural Networks (FCNs). There is still large room for improvement over the generic FCN models, which do not explicitly deal with the scale-space problem. The Holistically-Nested Edge Detector (HED) provides a skip-layer structure with deep supervision for edge and boundary detection, but the performance gain of HED on saliency detection is not obvious. In this paper, we propose a new salient object detection method by introducing short connections to the skip-layer structures within the HED architecture. Our framework takes full advantage of multi-level and multi-scale features extracted from FCNs, providing more advanced representations at each layer, a property that is critically needed to perform segment detection. Our method produces state-of-the-art results on 5 widely tested salient object detection benchmarks, with advantages in terms of efficiency (0.08 seconds per image), effectiveness, and simplicity over the existing algorithms. Beyond that, we conduct an exhaustive analysis of the role of training data on performance. Our experimental results provide a more reasonable and powerful training set for future research and fair comparisons. http://www.robots.ox.ac.uk/~tvg/publications/2017/DeepSal.pdf |
Type Of Technology | Software |
Year Produced | 2018 |
Impact | This algorithm is used in flagship products such as the Huawei Mate 10 and Huawei Honor V10 to create the "AI Selfie: Brilliant Bokeh, perfect portraits" effects demonstrated at the Mate 10 launch show in Munich, Germany. |
URL | http://mmcheng.net/dss/ |
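The "short connection" idea described above can be illustrated with a small sketch: side outputs from deeper (coarser but semantically stronger) layers are upsampled and fused into shallower (finer) side outputs, so every level sees multi-scale evidence. This is only a toy illustration of the fusion pattern, not the published DSS implementation: the 1-D "feature maps", the nearest-neighbour upsampling, and the element-wise averaging are all simplifying assumptions; the real method operates on 2-D convolutional feature maps with learned fusion weights inside a deep-learning framework.

```python
def upsample(signal, factor):
    """Nearest-neighbour upsampling of a toy 1-D saliency map."""
    return [v for v in signal for _ in range(factor)]

def fuse_with_short_connections(side_outputs):
    """side_outputs: list of 1-D maps, shallowest (finest) first.

    Each map at level i receives the upsampled maps of every deeper
    level (the 'short connections') and averages them element-wise,
    so fine levels gain the deep levels' semantic evidence.
    """
    n = len(side_outputs)
    fused = []
    for i, fine in enumerate(side_outputs):
        acc = list(fine)
        contributions = 1
        for j in range(i + 1, n):
            # Upsample the deeper, coarser map to this level's resolution.
            factor = len(fine) // len(side_outputs[j])
            up = upsample(side_outputs[j], factor)
            acc = [a + u for a, u in zip(acc, up)]
            contributions += 1
        fused.append([a / contributions for a in acc])
    return fused

# Toy example: three side outputs at resolutions 8, 4 and 2.
outputs = [
    [0.1] * 8,  # shallow: fine resolution, weak saliency response
    [0.5] * 4,  # middle level
    [0.9] * 2,  # deep: coarse but semantically confident
]
fused = fuse_with_short_connections(outputs)
```

After fusion, the shallow level's map averages in the deeper responses (here 0.5 at the finest level), showing how the short connections propagate high-level confidence down to fine spatial resolution.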
Company Name | OxSight |
Description | OxSight has developed SmartSpecs, a system of devices to help people with severe visual impairment navigate independently. The system uses cameras and computer vision algorithms to detect and highlight objects in real-time, creating an interactive overlay over the wearer's normal vision. |
Year Established | 2016 |
Impact | see http://smartspecs.co/ |
Website | http://www.oxsight.co.uk |
Description | Many mentions in News Media TV |
Form Of Engagement Activity | A press release, press conference or response to a media enquiry/interview |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Media (as a channel to the public) |
Results and Impact | Many mentions on the BBC, in newspapers, etc.; some are listed at http://www.robots.ox.ac.uk/~tvg/projects/SemanticPaint/index.php (or search for "SemanticPaint"); see also http://www.va-st.com/smart-specs/ |
Year(s) Of Engagement Activity | 2015 |
URL | http://www.va-st.com/smart-specs/ |