Human Vision: Relationship to Three-Dimensional Surface Statistics of Natural Scenes

Lead Research Organisation: University of Southampton

Department Name: Sch of Psychology

Abstract

The human visual system has been fine-tuned over generations of evolution to operate effectively in our particular environment, allowing us to form rich 3D representations of the objects around us. The scenes that we encounter on a daily basis produce 2D retinal images that are complex and ambiguous. From this input, how does the visual system achieve the immensely difficult goal of recovering our surroundings, in such an impressively fast and robust way?
To achieve this feat, humans must use two types of information about their environment. First, we must learn the probabilistic relationships between 3D natural scene properties and the 2D image cues these produce. Second, we must learn which scene structures (shapes, distances, orientations) are most common, or probable, in our 3D environment. This statistical knowledge about natural 3D scenes and their projected images allows us to maximize our perceptual performance. To better understand 3D perception, therefore, we must study the environment that we have evolved to process. A key goal of our research is to catalogue and evaluate the statistical structure of the environment that guides human depth perception. We will sample the range of scenes that humans frequently encounter (indoor and outdoor environments over different seasons and lighting conditions). For each scene, state-of-the-art ground-based Light Detection and Ranging (LiDAR) technology will be used to measure the physical distance to all objects (trees, ground, etc.) from a single location - a 3D map of the scene. We will also take High Dynamic Range (HDR) photographs of the same scene, from the same vantage point. By collating this paired 3D and 2D data across numerous scenes we will create a comprehensive database of our environment, and the 2D images that it produces. Making the database publicly available will facilitate not just our own work, but research by human and computer vision scientists around the world who are interested in a range of pure and applied visual processes.
There is great potential for computer vision to learn from the expert processor that is the human visual system: computer vision algorithms are easily out-performed by humans for a range of tasks, particularly when images correspond to more complex, realistic scenes. We are still far from understanding how the human visual system handles the kind of complex natural imagery that defeats computer vision algorithms. However, the robustness of the human visual system appears to hinge on: 1) exploiting the full range of available depth cues and 2) incorporating statistical 'priors': information about typical scene configurations. We will employ psychophysical experiments, guided by our analyses of natural scenes and their images, to develop valid and comprehensive computational models of human depth perception. We will concentrate our analysis and experimentation on key tasks in the process of recovering scene structure - estimating the location, orientation and curvature of surface segments across the environment. Our project addresses the need for more complex and ecologically valid models of human perception by studying how the brain implicitly encodes and interprets depth information to guide 3D perception.
Virtual 3D environments are now used in a range of settings, such as flight simulation and training systems, rehabilitation technologies, gaming, 3D movies and special effects. Perceptual biases are particularly influential when visual input is degraded, as it is in some of these simulated environments. To evaluate and improve these technologies we require a better understanding of 3D perception. In addition, the statistical models and inferential algorithms developed in the project will facilitate the development of computer vision algorithms for automatic estimation of depth structure in natural scenes. These algorithms have many applications, such as 2D to 3D film conversion, visual surveillance and biometrics.

Planned Impact

Our proposed work sits at the interface of human and computer vision. In essence, it asks how humans and computers infer 3D structure from 2D images in realistic, complex environments. Beyond these academic arenas, our work has clear implications for those working in applied computer vision and in the visual media industry. The latter two groups will exploit our work in technologies such as 2D to 3D conversion, special effects generation and creating virtual reality environments for applications such as gaming and training. The recent NextGen Review (2011) of the skill requirements (and current shortfall) for the UK's video games and visual effects industries highlighted the importance of those industries to the UK economy. In 2008, the global sales of video games created by UK companies reached £2 billion, contributing £1 billion in GDP, making the UK the third largest games developer in the world. The UK visual effects industry is also on the rise, contributing to blockbuster movies like Harry Potter, Inception and Batman. This sector grew by 17% between 2006 and 2008, with four of the world's largest visual effects companies based in London.
Our work has a clear part to play in maintaining the leading position that the UK currently holds in these growing industries. These world-leading industries could benefit substantially from the input of experts in vision science. For example, many of the industry applications described above involve the inference of 3D scene structure from 2D images, but often have to rely on human hand segmentation and depth labeling of images to complement current computational algorithms. In contrast, humans are remarkably adept and robust in reconstructing their 3D world. Our work will expand current understanding of the structure of natural scenes, and how this statistical structure is exploited by the human visual system to efficiently recover depth. This ecological, natural scenes approach is critical to bridging the gap between human performance and current efforts to replicate it in computer vision applications. To ensure that this impact is realized, vision scientists must engage with those involved in gaming and visual effects. Currently, there is a lack of communication between vision researchers and these industrial groups. Dr. Adams' discussions with attendees at the recent Conference on Visual Media Production (CVMP) in London made clear the potential for human vision to inform algorithms for visual media production that are efficient, and produce content that is realistic and enjoyable for the end user. For example, certain well-known strategies within human vision for recovering shape from shading are not exploited within technologies that capture and create 3D content. Discussions with our project partners (Hilton: applied computer vision and Grau: 3D media production, BBC) have identified particular areas where our work will inform current problems within applied computer vision and visual media production; the potential impact of our work is reflected in their Letters of Support.
The NextGen report identified an immediate need to change current practice in ICT training in schools and Universities to better reflect the skill needs of the gaming and visual media industries. This move is critical to ensure that these industries continue to lead the world market. Vision science relies heavily on the key skills highlighted in the report, including mathematics, physics, computer programming and design. By using visual illusions to explain key concepts in human vision, and demonstrating the mathematical and computational challenges in developing visual stimuli, we will design activities (for our website and for the science roadshow) that will engage young people and foster an interest in mathematics, perceptual psychology and its applications. By making use of the engaging aspects of visual perception we hope to inspire future generations of scientists and industry professionals.

Funded Value:

£505,830

Funded Period:

Jun 13 - Dec 16

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/K005952/1

Principal Investigator:

Wendy Adams

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Image & Vision Computing (50%)

Vision & Senses - ICT appl. (50%)

Organisations

People	ORCID iD
Wendy Adams (Principal Investigator)
Erich Graf (Co-Investigator)
Julian Leyland (Co-Investigator)

Publications

Adams W (2015) Perception of 3D structure and natural scene statistics: The Southampton-York Natural Scenes (SYNS) dataset. in Journal of Vision

Adams W (2017) Estimation of gloss and shape from vision and touch. in Journal of Vision

Adams W (2015) The Southampton York natural scenes (SYNS) dataset

Adams W (2016) Natural scene statistics and estimation of shape and reflectance. in Journal of Vision

Adams Wendy J. (2016) Interactions between illumination, shape and reflectance in PERCEPTION

Adams WJ (2016) The Southampton-York Natural Scenes (SYNS) dataset: Statistics of surface attitude. in Scientific reports

Adams WJ (2016) Touch influences perceived gloss. in Scientific reports

Adams WJ (2018) Naturally glossy: Gloss perception, illumination statistics, and tone mapping. in Journal of vision

Adams WJ (2014) Effects of specular highlights on perceived surface convexity. in PLoS computational biology

Adams WJ (2019) Disruptive coloration and binocular disparity: breaking camouflage. in Proceedings. Biological sciences

Key Findings
Impact Summary
Further Funding
Research Databases and Models
Collaboration
Engagement Activities


Description	We are interested in the structure of our natural environment, how this shapes human perception, and how the information can be exploited in computer vision (for example, estimating 3D structure from a 2D image). One key finding is our set of measurements of scenes, sampled from the environment within Hampshire, UK. This Southampton-York Natural Scenes Dataset (SYNS) is now a public dataset that researchers and industrial users (working in virtual reality, computer vision) are downloading and using for their research, or for product development. The measurements taken at each scene include the 3D structure of the scene, a high dynamic range spherical image of the scene, and a panorama of stereo image pairs. In addition, we have analysed the 3D structure of the scenes, to show how surface attitude (slant and tilt) varies across different types of scenes and elevations. We have also used the dataset to investigate a number of different ways in which human vision is tuned to the statistics of the natural environment. For example, we have shown how this knowledge affects the perception of gloss. We have also shown how natural scene statistics bias our judgements of slant and tilt. We are also using the dataset to show how edges in an image can be categorised, for example, to segment objects from their background.
Exploitation Route	Our work can be taken forward in two key ways: 1) Other groups can use the public dataset as a critical tool in testing computer vision algorithms, or understanding human perception. As noted elsewhere, our dataset already has around 450 users, and we expect many more. 2) Other groups can build on our research findings - how natural statistics shape perception - to further understanding of human sensory perception.
Sectors	Digital/Communication/Information Technologies (including Software),Education,Security and Diplomacy
URL	https://syns.soton.ac.uk


Description	As detailed in the 'engagement activities', we have used our work in a number of outreach and education activities. We have taken details of our research project, alongside more accessible information about human vision processing, to a number of schools, festivals and outreach events. We have engaged with tens of thousands of members of the public and school children. We have worked with the Winchester Science museum to create a suite of exhibits on the topic of human vision processing, with associated education materials for use in schools. These exhibitions have been hugely popular with visitors. In addition, our natural scenes dataset (SYNS) is now public, and has around 450 active users, who have made more than 15000 downloads from the dataset. These include academics from 36 countries across the world, but also users from industry and public health.
First Year Of Impact	2017
Sector	Digital/Communication/Information Technologies (including Software),Education,Healthcare,Security and Diplomacy
Impact Types	Cultural,Economic


Description	Horizon 2020 Marie Sklodowska-Curie Actions Innovative Training Networks
Amount	€ 2,830,000 (EUR)
Organisation	European Commission
Department	Horizon 2020
Sector	Public
Country	European Union (EU)
Start	10/2017
End	10/2021


Description	ROSSINI: Reconstructing 3D structure from single images: a perceptual reconstruction approach
Amount	£349,735 (GBP)
Funding ID	EP/S016368/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	01/2019
End	12/2021


Description	ViiHM collaboration grant
Amount	£500 (GBP)
Organisation	Visual image interpretation in humans and machines (ViiHM)
Sector	Academic/University
Country	United Kingdom
Start	09/2015
End	09/2015


Title	SYNS
Description	Creating a database of natural scenes is a significant milestone for this EPSRC project. For each natural scene, we provide three types of data: (i) 3D point cloud data from LiDAR, (ii) high dynamic range spherical images and (iii) stereoscopic, high resolution image pairs.
Type Of Material	Database/Collection of data
Provided To Others?	No
Impact	The data will be publicly available to all research groups within the next few months. I gave an invited talk at a recent conference (ViiHM), where I presented our work on this database, and some analyses of the point cloud data. There was a great deal of interest from other research groups (both human and computer vision scientists). We predict that the database will be widely used by other researchers who wish to understand human vision, or develop computer vision algorithms, for various problems such as image segmentation and depth estimation. Please note that the website at the URL provided is not yet fully functional.
URL	http://synsdata.soton.ac.uk


Description	Depth and scene gist
Organisation	York University Toronto
Country	Canada
Sector	Academic/University
PI Contribution	This is a collaborative research project. I am conducting the research using the SYNS dataset, which was created as a key outcome of the EPSRC grant.
Collaborator Contribution	Addition of expertise in stereo depth processing from Professor Laurie Wilcox
Impact	None yet
Start Year	2016


Description	Light fields and perceived gloss
Organisation	New York University
Country	United States
Sector	Academic/University
PI Contribution	I am working in collaboration with Professor Mike Landy and his graduate student, Gizem Kucukoglu, on two research projects.
Collaborator Contribution	The graduate student is conducting some of the research under my supervision
Impact	Two conference presentations (1 poster, 1 talk, both with published conference abstracts).
Start Year	2014


Description	BMVA workshop in London
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Industry/Business
Results and Impact	Around 60 people attended the workshop, whose theme was 3D reconstruction in both humans and machines. We had 2 international speakers.
Year(s) Of Engagement Activity	2020
URL	https://britishmachinevisionassociation.github.io/meetings/20-01-29-3D%20worlds%20from%202D%20images...


Description	Bestival
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Public/other audiences
Results and Impact	We had an interactive stand on the topic of human visual processing. 5600 people engaged with the activities from our research group. There were many questions and discussions.
Year(s) Of Engagement Activity	2015


Description	Cheltenham Science Festival
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Public/other audiences
Results and Impact	Our research group had a stand at the festival. Of approximately 45000 attendees, we engaged directly with 7500 people. Visitors engaged in visual illusion activities, talked to researchers, and took away handout activities.
Year(s) Of Engagement Activity	2015


Description	Glastonbury Festival
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Public/other audiences
Results and Impact	We had a display stand about human visual perception, which included activities, information, and take-away handouts. 5600 people engaged with the science display from our research group. There were many questions and discussions.
Year(s) Of Engagement Activity	2015


Description	Science And Engineering Day
Form Of Engagement Activity	Participation in an open day or visit at my research institution
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Public/other audiences
Results and Impact	4000 people attended the University of Southampton Science and Engineering day. Many questions and discussions about visual processing.
Year(s) Of Engagement Activity	2015,2016
URL	http://www.southampton.ac.uk/per/university/festival/science-and-engineering-day.page


Description	Thomas Hardye School visit, Dorchester
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Schools
Results and Impact	Visit to the school to speak to and do activities with GCSE and AS level students about visual processing. 500 students engaged with our display, participating in activities, taking away illusion-based handouts.
Year(s) Of Engagement Activity	2015
