Human Vision: Relationship to Three-Dimensional Surface Statistics of Natural Scenes

Lead Research Organisation: University of Southampton
Department Name: School of Psychology

Abstract

The human visual system has been fine-tuned over generations of evolution to operate effectively in our particular environment, allowing us to form rich 3D representations of the objects around us. The scenes that we encounter on a daily basis produce 2D retinal images that are complex and ambiguous. From this input, how does the visual system achieve the immensely difficult goal of recovering our surroundings, in such an impressively fast and robust way?
To achieve this feat, humans must use two types of information about their environment. First, we must learn the probabilistic relationships between 3D natural scene properties and the 2D image cues these produce. Second, we must learn which scene structures (shapes, distances, orientations) are most common, or probable, in our 3D environment. This statistical knowledge about natural 3D scenes and their projected images allows us to maximize our perceptual performance. To better understand 3D perception, therefore, we must study the environment that we have evolved to process. A key goal of our research is to catalogue and evaluate the statistical structure of the environment that guides human depth perception. We will sample the range of scenes that humans frequently encounter (indoor and outdoor environments over different seasons and lighting conditions). For each scene, state-of-the-art ground-based Light Detection and Ranging (LiDAR) technology will be used to measure the physical distance to all objects (trees, ground, etc.) from a single location - a 3D map of the scene. We will also take High Dynamic Range (HDR) photographs of the same scene, from the same vantage point. By collating this paired 3D and 2D data across numerous scenes we will create a comprehensive database of our environment, and the 2D images that it produces. Making the database publicly available will facilitate not just our own work, but research by human and computer vision scientists around the world who are interested in a range of pure and applied visual processes.
There is great potential for computer vision to learn from the expert processor that is the human visual system: computer vision algorithms are easily out-performed by humans for a range of tasks, particularly when images correspond to more complex, realistic scenes. We are still far from understanding how the human visual system handles the kind of complex natural imagery that defeats computer vision algorithms. However, the robustness of the human visual system appears to hinge on: 1) exploiting the full range of available depth cues and 2) incorporating statistical 'priors': information about typical scene configurations. We will employ psychophysical experiments, guided by our analyses of natural scenes and their images, to develop valid and comprehensive computational models of human depth perception. We will concentrate our analysis and experimentation on key tasks in the process of recovering scene structure - estimating the location, orientation and curvature of surface segments across the environment. Our project addresses the need for more complex and ecologically valid models of human perception by studying how the brain implicitly encodes and interprets depth information to guide 3D perception.
Virtual 3D environments are now used in a range of settings, such as flight simulation and training systems, rehabilitation technologies, gaming, 3D movies and special effects. Perceptual biases are particularly influential when visual input is degraded, as it is in some of these simulated environments. To evaluate and improve these technologies we require a better understanding of 3D perception. In addition, the statistical models and inferential algorithms developed in the project will facilitate the development of computer vision algorithms for automatic estimation of depth structure in natural scenes. These algorithms have many applications, such as 2D to 3D film conversion, visual surveillance and biometrics.

Planned Impact

Our proposed work sits at the interface of human and computer vision. In essence, it asks how humans and computers infer 3D structure from 2D images in realistic, complex environments. Beyond these academic arenas, our work has clear implications for those working in applied computer vision and in the visual media industry. The latter two groups will exploit our work in technologies such as 2D to 3D conversion, special effects generation and creating virtual reality environments for applications such as gaming and training. The recent NextGen Review (2011) of the skill requirements (and current shortfall) for the UK's video games and visual effects industries highlighted the importance of those industries to the UK economy. In 2008, the global sales of video games created by UK companies reached £2 billion, contributing £1 billion in GDP, making the UK the third largest games developer in the world. The UK visual effects industry is also on the rise, contributing to blockbuster movies like Harry Potter, Inception and Batman. This sector grew by 17% between 2006 and 2008, with four of the world's largest visual effects companies based in London.
Our work has a clear role in maintaining the leading position that the UK currently holds in these growing industries. These world-leading industries could benefit substantially from the input of experts in vision science. For example, many of the industry applications described above involve the inference of 3D scene structure from 2D images, but often have to rely on human hand segmentation and depth labeling of images to complement current computational algorithms. In contrast, humans are remarkably adept and robust in reconstructing their 3D world. Our work will expand current understanding of the structure of natural scenes, and how this statistical structure is exploited by the human visual system to efficiently recover depth. This ecological, natural scenes approach is critical to bridging the gap between human performance and current efforts to replicate it in computer vision applications. To ensure that this impact is realized, vision scientists must engage with those involved in gaming and visual effects. Currently, there is a lack of communication between vision researchers and these industrial groups. Dr. Adams' discussions with attendees at the recent Conference on Visual Media Production (CVMP) in London made clear the potential for human vision to inform algorithms for visual media production that are efficient, and produce content that is realistic and enjoyable for the end user. For example, certain well-known strategies within human vision for recovering shape from shading are not exploited within technologies that capture and create 3D content. Discussions with our project partners (Hilton: applied computer vision and Grau: 3D media production, BBC) have identified particular areas where our work will inform current problems within applied computer vision and visual media production; the potential impact of our work is reflected in their Letters of Support.
The NextGen report identified an immediate need to change current practice in ICT training in schools and Universities to better reflect the skill needs of the gaming and visual media industries. This move is critical to ensure that these industries continue to lead the world market. Vision science relies heavily on the key skills highlighted in the report, including mathematics, physics, computer programming and design. By using visual illusions to explain key concepts in human vision, and demonstrating the mathematical and computational challenges in developing visual stimuli, we will design activities (for our website and for the science roadshow) that will engage young people and foster an interest in mathematics, perceptual psychology and its applications. By making use of the engaging aspects of visual perception we hope to inspire future generations of scientists and industry professionals.

Publications

Description We are interested in the structure of our natural environment, how this shapes human perception, and how the information can be exploited in computer vision (for example, estimating 3D structure from a 2D image).
One key outcome is our set of measurements of scenes, sampled from the environment within Hampshire, UK. This Southampton-York Natural Scenes Dataset (SYNS) is now a public dataset that researchers and industrial users (working in virtual reality, computer vision) are downloading and using for their research, or for product development.

The measurements taken at each scene include measurements of the 3D structure of the scene, a high dynamic range spherical image of the scene, and a panorama of stereo image pairs.

In addition, we have analysed the 3D structure of the scenes, to show how surface attitude (slant and tilt) varies across different types of scenes and elevations.
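
To make the terms concrete, the following is a minimal sketch of how slant and tilt can be computed from a surface normal. It is an illustration only, not the project's analysis code. It assumes a viewer-centred coordinate frame (x to the right, y up, z along the line of sight towards the viewer), and the function name `slant_tilt` is hypothetical. Slant is taken as the angle between the surface normal and the line of sight; tilt is the direction of the normal projected into the image plane.

```python
import math

def slant_tilt(normal):
    """Return (slant, tilt) in degrees for a surface normal (nx, ny, nz).

    Assumed frame: x right, y up, z along the line of sight towards the viewer.
    Slant: angle between the normal and the line of sight (0 = fronto-parallel).
    Tilt: direction of the normal projected into the image plane, in [0, 360).
    """
    nx, ny, nz = normal
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    if length == 0.0:
        raise ValueError("normal must be non-zero")
    nx, ny, nz = nx / length, ny / length, nz / length

    # A visible surface faces the viewer, so flip normals that point away.
    if nz < 0.0:
        nx, ny, nz = -nx, -ny, -nz

    # Clamp to guard against rounding slightly outside acos's domain.
    slant = math.degrees(math.acos(max(-1.0, min(1.0, nz))))

    # Tilt is undefined for a fronto-parallel surface; return 0 by convention.
    if slant < 1e-9:
        return 0.0, 0.0
    tilt = math.degrees(math.atan2(ny, nx)) % 360.0
    return slant, tilt
```

Under these assumptions, a level ground plane viewed by an upright observer looking at the horizon has normal (0, 1, 0), which gives a slant of 90 degrees and a tilt of 90 degrees.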

We have also used the dataset to investigate a number of different ways in which human vision is tuned to the statistics of the natural environment. For example, we have shown how this knowledge affects the perception of gloss. We have also shown how natural scene statistics bias our judgements of slant and tilt.
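
One common way to think about such biases is as the combination of a noisy sensory estimate with a statistical prior. The sketch below is a textbook illustration under strong simplifying assumptions (a single cue, and Gaussian distributions for both the cue and the prior); it is not the model used in this project, and the function name and example values are hypothetical.

```python
import math

def combine_with_prior(cue_mean, cue_sd, prior_mean, prior_sd):
    """Posterior mean and SD for a Gaussian cue likelihood and a Gaussian prior.

    Each source is weighted by its precision (1 / variance), so a less
    reliable cue is pulled more strongly towards the prior.
    """
    cue_precision = 1.0 / (cue_sd ** 2)
    prior_precision = 1.0 / (prior_sd ** 2)
    total_precision = cue_precision + prior_precision
    post_mean = (cue_mean * cue_precision + prior_mean * prior_precision) / total_precision
    post_sd = math.sqrt(1.0 / total_precision)
    return post_mean, post_sd

# Hypothetical values: the image cue signals a slant of 40 degrees, and the
# prior is centred on 0 degrees with an SD of 10 degrees.
reliable = combine_with_prior(40.0, 10.0, 0.0, 10.0)   # cue SD 10 degrees
degraded = combine_with_prior(40.0, 20.0, 0.0, 10.0)   # cue SD 20 degrees
```

With these illustrative numbers, the estimate from the degraded cue lies closer to the prior mean than the estimate from the reliable cue, which is the qualitative pattern expected when priors bias perception most under poor viewing conditions. Slant is bounded and tilt is circular, so a Gaussian is only a rough approximation here.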

We are also using the dataset to show how edges in an image can be categorised, for example, to segment objects from their background.
Exploitation Route Our work can be taken forward in two key ways:
1) Other groups can use the public dataset as a critical tool in testing computer vision algorithms, or understanding human perception. As noted elsewhere, our dataset already has around 150 users, and we expect many more.
2) Other groups can build on our research findings - how natural statistics shape perception - to further understanding of human sensory perception.
Sectors Digital/Communication/Information Technologies (including Software),Education,Security and Diplomacy

URL https://syns.soton.ac.uk
 
Description As detailed in the 'engagement activities', we have used our work in a number of outreach and education activities. We have taken details of our research project, alongside more accessible information about human vision processing, to a number of schools, festivals and outreach events. We have engaged with tens of thousands of members of the public and school children. We have worked with the Winchester Science museum to create a suite of exhibits on the topic of human vision processing, with associated education materials for use in schools. These exhibitions have been hugely popular with visitors. In addition, our natural scenes dataset (SYNS) is now public, and has around 300 active users, who have made more than 3000 downloads from the dataset. These include academics from all over the world, but also users from industry and public health.
First Year Of Impact 2016
Sector Digital/Communication/Information Technologies (including Software),Education,Healthcare,Security and Diplomacy
Impact Types Cultural,Economic

 
Description Horizon 2020 Marie Sklodowska-Curie Actions Innovative Training Networks
Amount € 2,830,000 (EUR)
Organisation European Commission 
Department Horizon 2020
Sector Public
Country European Union (EU)
Start 10/2017 
End 10/2021
 
Description ROSSINI: Reconstructing 3D structure from single images: a perceptual reconstruction approach
Amount £349,735 (GBP)
Funding ID EP/S016368/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 01/2019 
End 12/2021
 
Description ViiHM collaboration grant
Amount £500 (GBP)
Organisation Visual image interpretation in humans and machines (ViiHM) 
Sector Academic/University
Country United Kingdom
Start 09/2015 
End 09/2015
 
Title SYNS 
Description Creating a database of natural scenes is a significant milestone for this EPSRC project. For each natural scene, we provide three types of data (i) 3D point cloud data from LiDAR, (ii) high dynamic range spherical images and (iii) stereoscopic, high resolution image pairs. 
Type Of Material Database/Collection of data 
Provided To Others? No  
Impact The data will be publicly available to all research groups within the next few months. I gave an invited talk at a recent conference (ViiHM), where I presented our work on this database, and some analyses of the point cloud data. There was a great deal of interest from other research groups (both human and computer vision scientists). We predict that the database will be widely used by other researchers who wish to understand human vision, or develop computer vision algorithms, for various problems such as image segmentation and depth estimation. Please note that the website at the URL provided is not yet fully functional. 
URL http://synsdata.soton.ac.uk
 
Description Depth and scene gist 
Organisation York University Toronto
Country Canada 
Sector Academic/University 
PI Contribution This is a collaborative research project. I am conducting the research using the SYNS dataset that was created as a key outcome of the EPSRC grant.
Collaborator Contribution Addition of expertise in stereo depth processing from Professor Laurie Wilcox
Impact None yet
Start Year 2016
 
Description Light fields and perceived gloss 
Organisation New York University
Country United States 
Sector Academic/University 
PI Contribution I am working in collaboration with Professor Mike Landy and his graduate student, Gizem Kucukoglu, on two research projects.
Collaborator Contribution The graduate student is conducting some of the research under my supervision
Impact Two conference presentations (1 poster, 1 talk, both with published conference abstracts).
Start Year 2014
 
Description BMVA workshop in London 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Around 60 people attended the workshop, whose theme was 3D reconstruction in both humans and machines. We had 2 international speakers.
Year(s) Of Engagement Activity 2020
URL https://britishmachinevisionassociation.github.io/meetings/20-01-29-3D%20worlds%20from%202D%20images...
 
Description Bestival 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact We had an interactive stand on the topic of human visual processing.
5600 people engaged with the activities from our research group.
There were many questions and discussions.
Year(s) Of Engagement Activity 2015
 
Description Cheltenham Science Festival 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact Our research group had a stand at the festival. Of approximately 45000 attendees, we engaged directly with 7500 people.
Visitors engaged in visual illusion activities, talked to researchers, and took away handout activities.
Year(s) Of Engagement Activity 2015
 
Description Glastonbury Festival 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact We had a display stand about human visual perception, which included activities, information, and take-away handouts.
5600 people engaged with the science display from our research group.
There were many questions and discussions.
Year(s) Of Engagement Activity 2015
 
Description Science And Engineering Day 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact 4000 people attended the University of Southampton Science and Engineering day.
Many questions and discussions about visual processing.
Year(s) Of Engagement Activity 2015,2016
URL http://www.southampton.ac.uk/per/university/festival/science-and-engineering-day.page
 
Description Thomas Hardye School visit, Dorchester 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Schools
Results and Impact Visit to the school to speak to and do activities with GCSE and AS level students about visual processing.
500 students engaged with our display, participating in activities, taking away illusion-based handouts.
Year(s) Of Engagement Activity 2015