Seeing the future

Lead Research Organisation: CARDIFF UNIVERSITY
Department Name: Computer Science

Abstract

Newspapers do not report stories about the sun rising or setting. This is because what is expected or unchanging is of little interest; what is newsworthy is the unexpected or changing. In 1961 Horace Barlow, a British vision scientist, hypothesised that the sensory systems of the brain work on the same principle: they devote little investigation to the expected and report the unusual. They do this by trying to predict what is going to happen, and when prediction fails they funnel resources at the "prediction errors". This makes intuitive sense: the things in a scene that merit attention are typically those that are unpredictable or changing. This idea, "predictive coding", has proven very fruitful and today it underpins much research on how the brain works.

The aim of this project is to translate the idea of predictive coding from the fields of psychology and biology to the design of a computer vision system. The system will try to predict the upcoming camera image. Prediction will be based on the previous images seen (over immediate, intermediate and longer time scales), information about movement of the system through space (from the inertial sensor), and knowledge of how objects and people move. Prediction errors will be flagged.

This is a feasibility study, to determine and demonstrate the benefits of such a system and form a basis for future work.

Our specific objectives are to:

i) Evaluate the technical feasibility of building a predictive vision system.
ii) Assess the projected efficiency and accuracy of a predictive vision system.
iii) Evaluate the effectiveness of a predictive vision system for detecting changes or incongruencies in the scene.
iv) Assess the computational completeness and sufficiency of the predictive coding approach (this result will be fed back to the biology/psychology community).
v) Identify potential partners and explore potential industrial, security or healthcare applications of the technology.
vi) Take appropriate steps to protect intellectual property and then release the data and code to the broad research community.
vii) On successful completion seek follow-on funding (research council and/or industrial) for further work.

The project brings together expertise in computer vision, human vision and computational biology across two partner (GW4) universities, Cardiff and Bristol, to pursue this innovative, cutting-edge research. The system is to be built from a stereo camera, an inertial sensor, and a CPU/GPU. This technology is available in some smartphones today. If the efficiency and accuracy benefits are realised then the technology could be deployed on a smartphone, which opens up a broad range of applications, including in healthcare, security and ubiquitous computing.

Planned Impact

This 18 month feasibility study will build a demonstrator predictive vision system, evaluate its strengths and weaknesses, and explore potential application areas and broader use. During planned activities (e.g. workshop with industrial, security and healthcare delegates) we will explore the future applications of a predictive vision system. Below we outline potential beneficiaries of the work.

Economic Impact: Building Industrial Niches in the UK
The worldwide machine vision industry is expected to grow from 8 billion USD to 12 billion USD by 2020. Within this industrial niche there are a large number of potential applications of our new technology. If deployed on a smartphone (see below) there is an abundance of applications within reach at very little cost. This opens up major industrial/business opportunities, some of which can be anticipated (examples below), but many unexpected applications will be envisioned by innovative smartphone developers.

Video Surveillance is a commonplace, wide-ranging and rapidly growing application of video cameras: entire "Smart Cities" are under increasing surveillance, and many public and work places are observed for security and safety monitoring. Police forces worldwide are currently being equipped with body-worn cameras, as are military personnel. Many private vehicles and Police/Security forces have in-car cameras (e.g. GoPro). An obvious application of our 'smart' video capture system would be to record and feed back anomalies to security agents, without the need for massive data storage and operator screening.

Next-Generation Video Compression, Change Detection and Saliency mapping: A successful demonstrator system could provide a foundation for new approaches to video compression, event detection and salience mapping in a variety of video analysis scenarios (in particular monitoring and surveillance).

Greener smart sensor technologies: The efficiency advantages of a predictive system should allow the development of lower-energy smart sensors.

Ubiquitous computing, smartphone apps: A key feature of the predictive system is that it detects "interesting" events (changes or incongruities) in the scene in real time. All components of our system already exist on a smartphone. In a few years' time, smartphone sensors will no doubt improve in accuracy and smartphone processing power will increase, allowing our methods to be effectively deployed on such devices and opening up many far-reaching applications and future technologies.

Societal Impact: Building a healthy, competitive and resilient society

Predictive sensing in care of the elderly or infirm: Medicine has many ready-made challenges that our technology could assist. For example, a device could be placed in a person's home, a care home or a hospital to monitor for abnormal movement patterns: at one extreme these could be falls; at the other, early signs of Parkinson's gait or lameness. Both would be flagged as prediction errors, i.e. deviations from the priors.

Predictive sensing as an aid for those with attentional or perceptual deficits: People who are visually impaired due to low vision, or who have attentional problems following brain injury, stroke, or general ageing, could be equipped with an assistive sensor (incorporated into a handheld device, or a device worn on the body, on the chest, or on the head in a Google Glass type arrangement) to scan the environment and alert them to important events in the scene.

Developing World Leading Scientists: A key role will be played by this project's PDRA, who will work in close collaboration with the investigator team. They will present at international conferences (see resources) where they can build their own reputation and establish their own scientific career (with multidisciplinary collaborators).
 
Description **This award has not quite completed: having received two extensions due to COVID disruption, the closing date is set at 31st March 2021.** As such, final papers and indeed final results are still pending; we will continue this research after the closing deadline to finish off papers and tie up other loose ends.

The estimated rate at which information enters the human eye is a staggering 72 GB/s. Using very sluggish neurons and a power budget of a few watts, the human brain processes this information, creates a percept of a stable world and detects important events occurring within the world, significantly outperforming conventional computer vision methods. How does the brain achieve such efficient and fast processing? Predictive coding (see [1,2]) is a term used by brain scientists to describe an idea that has its roots in the earliest days of psychology and had a later echo in cybernetics. The theory proposes that, in order to drastically reduce the inherent data processing requirements and thus achieve efficiency, the brain tries to predict incoming sensory information. It does this for two reasons. First, the processing demands for testing a prediction of what something is are considerably lower than for deducing what something is. Second, prediction failures indicate sensory input that merits extra processing resources: where something is changing or incongruent.
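The core idea can be illustrated with a toy sketch (illustrative only, not the project's code): predict each incoming sample from the previous one and flag samples where the prediction fails. The function name and threshold are assumed values.

```python
# Minimal sketch of the predictive-coding idea: predict the next sample
# from the previous one and flag large prediction errors for attention.
def flag_prediction_errors(signal, threshold=1.0):
    """Return indices where the naive 'no change' prediction fails."""
    flagged = []
    for t in range(1, len(signal)):
        prediction = signal[t - 1]      # expect the world to stay the same
        error = abs(signal[t] - prediction)
        if error > threshold:           # unexpected change: merits attention
            flagged.append(t)
    return flagged

# A mostly static signal with one sudden event at t = 5.
samples = [0.0, 0.1, 0.0, 0.1, 0.0, 5.0, 5.1, 5.0]
print(flag_prediction_errors(samples))  # prints [5]: only the jump is reported
```

Note how the expected portions of the signal generate no output at all: resources are spent only on the surprise, which is the efficiency argument in miniature.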

We have built a prototype which is the first hierarchical artificial predictive vision system.

The key finding is that:

Human-inspired models of predictive coding can be utilised for predictive computer vision tasks - specifically, we can use such predictive models to predict the next image in a video. The models developed show improved performance in comparison to state-of-the-art deep learning models such as PredNet (https://github.com/coxlab/prednet).

Other important findings made as the research progressed include:

* Predicting Time to Contact (TTC): TTC can be estimated via changes in distance, changes in optical size, and changes in binocular disparity. Estimation of TTC across the whole visual field is more complicated, as objects or features move within the visual field and need to be tracked in order to estimate their TTC.

* Predicting our head movements: When an observer walks, the head bounces (moves vertically), sways (moves laterally), bobs (changes speed along the forward/backward axis) and rolls (rotates about the forward/backward axis). These movements are typically ignored in computational and theoretical models of perception (and action), with the result that existing models are either incomplete or invalid. Head movements also complicate the estimation of time-to-contact, etc. How might the brain deal with bounce, sway, bob and roll? We collected head movement data from natural human walking and developed an LSTM-based time series prediction network to learn how the head moves.

* Perceptual stability through prediction: How does the brain know that changes in the retinal image resulted from translation of the eye, rather than changes within the world? And how during translations of the eye, does the brain maintain a stable and persisting representation of the world on which attentional and other processes can operate? If the brain could predict how self-movement will change the retinal image, this would provide a solution to both problems. Via computer simulation, we developed an image prediction mechanism that could be implemented in the human brain.

* A probabilistic generative model of perception and action under the Free Energy Principle (Predictive coding) has been developed. Specifically, a mixed discrete-continuous framework was utilised to enact simulations of visual search and belief updating in a perceptual categorization task. We established the construct validity of the model by comparison with behavioural data, collected from participants performing this same task. We showed that partial data sampling is an efficient machine vision approach that may better approximate human behaviour.

* We have shown experimentally that visual recognition in human brains is accompanied by sparse, selective feature selection (eye movements). These can be recapitulated in in silico systems.
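The optical-size route to TTC in the first finding above is the classic "tau" relation, TTC ≈ θ / (dθ/dt): the angular size of an approaching object divided by its rate of expansion. A minimal illustrative sketch (the function name and sample values are assumptions, not project code):

```python
# Tau approximation: TTC from two samples of an object's angular size.
def ttc_from_expansion(theta_prev, theta_now, dt):
    """Estimate time to contact as theta / (d theta / dt)."""
    theta_dot = (theta_now - theta_prev) / dt
    if theta_dot <= 0:
        return float("inf")          # not expanding, so not approaching
    return theta_now / theta_dot

# A 1 m wide object closing at 1 m/s, sampled 1 s apart at distances of
# 10 m and 9 m; under the small-angle approximation theta = width/distance.
print(ttc_from_expansion(1/10, 1/9, 1.0))
# prints about 10 s (the discrete backward difference slightly overestimates
# the true 9 s remaining at the second sample)
```

The appeal of tau is that it needs no explicit estimate of distance or speed, only the image-plane expansion rate, which is one reason it is attractive both as a model of biological vision and as a cheap machine vision cue.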
Exploitation Route We plan to complete some papers outstanding from the end of the project.
We plan to run an industry-focussed workshop; this is one objective not yet met. The workshop has been delayed due to COVID uncertainty and will now be run online sometime in April/May 2021. Following the workshop, further funding opportunities will be sought.

The Variense VMU931 Python Toolkit is being used by other VMU931 users.

We have brought predictive coding to the wider attention of the human and computer vision communities; hopefully this will inspire other researchers.
Sectors Digital/Communication/Information Technologies (including Software), Healthcare, Security and Diplomacy

URL https://git.cardiff.ac.uk/seeing-the-future
 
Title Seeing the Future Data Set Acquisition 
Description Some initial data has been collected as planned in the first phase of this project. A data capture protocol using inertial sensors, an RGBD camera and a portable computer has been developed - we are able to take a bike helmet, strap on a portable video camera that records stereo depth plus an inertial sensor that measures movement, and walk around Cardiff to generate a rich dataset of city scenes. Sensors have been calibrated. 
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? No  
Impact Data set is planned to be eventually released publicly. Data currently being processed, benchmarked and used to develop our prediction algorithms. 
 
Title Predictive Video Coding 
Description We are in the process of developing Predictive Video Coding algorithms. Progress is being made on Phase 1, Optical Flow based Prediction: at the lowest level, the system will take the current camera image (including pixel-level depth information) and an estimate of camera movement from an inertial sensor attached to the camera, and predict the next camera image. Progress is also being made on developing the predictive coding engine. 
Type Of Technology Software 
Year Produced 2018 
Impact Still in progress 
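The Phase 1 prediction step described above - predicting the next image from per-pixel depth and measured camera motion - can be sketched for a single pixel under a pinhole camera model. This is a hedged illustration, not the project's code: the function name, focal length and principal point are assumed values, and only pure forward translation is handled.

```python
# Sketch: predict where one pixel lands in the next frame, given its depth
# and the camera's forward motion (e.g. from an inertial sensor).
def predict_pixel(u, v, depth, forward_motion, focal=500.0, cx=320.0, cy=240.0):
    """Reproject one pixel after the camera moves forward_motion metres."""
    # back-project the pixel to a 3-D point in camera coordinates
    x = (u - cx) * depth / focal
    y = (v - cy) * depth / focal
    z = depth
    # the camera moves forward, so the point gets closer along z
    z_new = z - forward_motion
    # project the point back into the predicted image
    u_new = focal * x / z_new + cx
    v_new = focal * y / z_new + cy
    return u_new, v_new

# A point 2 m away and 100 px right of centre; after the camera advances
# 1 m the point shifts further outward - the looming/expansion that a
# predictive coder should anticipate rather than flag as an error.
print(predict_pixel(420.0, 240.0, 2.0, 1.0))  # about (520.0, 240.0)
```

Applying this reprojection to every pixel yields a predicted next frame; differences between the prediction and the actual frame are then candidate prediction errors, i.e. parts of the scene moving independently of the camera.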
 
Title RosalynMoran/Covid-19: Covid-19 
Description Initial code release for 'Estimating required 'lockdown' cycles before immunity to SARS-CoV-2: Model-based analyses of susceptible population sizes, 'S0', in seven European countries including the UK and Ireland'. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
URL https://zenodo.org/record/3764670
 
Title RosalynMoran/Covid-19: Covid-19 
Description Initial code release for 'Estimating required 'lockdown' cycles before immunity to SARS-CoV-2: Model-based analyses of susceptible population sizes, 'S0', in seven European countries including the UK and Ireland'. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
URL https://zenodo.org/record/3766243
 
Title RosalynMoran/Covid-19: Covid-19 
Description Initial code release for 'Estimating required 'lockdown' cycles before immunity to SARS-CoV-2: Model-based analyses of susceptible population sizes, 'S0', in seven European countries including the UK and Ireland'. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
URL https://zenodo.org/record/3764669
 
Description Industry Facing Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact The workshop was planned to take place on March 30th, 2020.
This was delayed due to COVID.

We now plan to run an online version sometime in April 2021.
Year(s) Of Engagement Activity 2020
 
Description Presentation at an Industry Facing AI Festival (Feb 24, 2021) "Seminar: Cardiff University - Visual computing & machine learning " 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Invited talk, overview of my research including the "Seeing the Future" project and the iCase PhD.
Year(s) Of Engagement Activity 2021
URL https://aiglobalfestival.com