Learning the Memorable Information from Images
Lead Research Organisation:
University of York
Department Name: Computer Science
Abstract
Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.
Organisations
People | ORCID iD |
---|---|
Adrian Bors (Primary Supervisor) | |
Cameron Kyle-Davidson (Student) | |
Publications

Cameron Kyle-Davidson (2019) Predicting Visual Memory Schemas with Variational Autoencoders
Studentship Projects
Project Reference | Relationship | Related To | Start | End | Student Name |
---|---|---|---|---|---|
EP/R513386/1 | | | 30/09/2018 | 31/12/2023 | |
2109163 | Studentship | EP/R513386/1 | 30/09/2018 | 31/12/2021 | Cameron Kyle-Davidson |
NE/W503071/1 | | | 31/03/2021 | 30/03/2022 | |
2109163 | Studentship | NE/W503071/1 | 30/09/2018 | 31/12/2021 | Cameron Kyle-Davidson |
Description | Image memorability prediction aims to develop computational algorithms that estimate how memorable a given image is to the average person. Understanding memorability has far-reaching applications across commercial, educational and medical settings. However, most research on image memorability treats this property as a single score assigned to an image. Recently, datasets built from human data have emerged that capture memorability as a two-dimensional property across the image rather than as a single score. In our work, we seek to understand two-dimensional image memorability via neural models, and to develop techniques for generating images whose memorability we can define and manipulate. These two-dimensional memorability maps are known as "Visual Memory Schema" (VMS) maps. Our contributions are as follows. 1) We significantly expand the availability of VMS-based datasets. Initially, only a dataset of 800 image/VMS map pairs existed, which made it difficult to determine whether an algorithm trained on it was limited by the algorithm itself or by the dataset. We extend this dataset by a further 800 images and VMS maps, following the original VMS experiment paradigm, and then enlarge it again via a crowdsourced repeat-recognition experiment, to a total of 4,261 image/VMS map pairs. 2) We employ these datasets to advance the automatic prediction of VMS maps for a wide array of scene images. These advances were made possible by applying and developing several different classes of deep neural network. We initially focused on retasking Variational Autoencoders for VMS map reconstruction, surpassing prior work in this field. Extending this, we develop and analyse a wide array of neural techniques for VMS map prediction, with results superior to our own prior work.
Our two-dimensional memorability models employ multi-scale information, depth information and self-attention, and we find that the non-local detection of features provided by self-attention modules improves memorability map prediction. Finally, we investigate a novel method for combining single-score datasets with two-dimensional datasets to reach even higher VMS map prediction accuracy. 3) We combine VMS maps with generative models and attempt to synthesise new scene images whose memorability we can control by modulating an input "target" VMS map. Using our predictor models together with state-of-the-art generative adversarial networks, we evaluate the memorability of generated images and force the network to generate scene images that employ specific visual memory schemas. We evaluate the generated images with human observers in a psychological experiment, and find that images generated to be more memorable appear to be rated as such by participants. We also note that a minimum level of image quality is necessary for all generated images: poor-quality memorability-modulated images do not produce the same human-perceived difference in memorability, because they lack clear semantic features. |
Exploitation Route | The most straightforward way for others to benefit from this outcome is to perform further analysis on the dataset that we have developed. We structure our results to invite easy comparison with other methods, and provide a wide array of metrics for those who wish to build on our work with memorability prediction models. Future work may also focus on the generative models; we were limited by available computational power and time, which places a hard limit on the resolution at which we can generate memorable scene images. Academically, others may be interested in increasing the output resolution of our models, widening the categories of images that the models can generate, and investigating different methods for applying a VMS-based constraint to a neural network. A refined variant of this technique has commercial applications; for example, some may be interested in generating memorable backdrops for advertisements or in other marketing-related uses. Medical researchers may be interested in clinical applications of VMS-based memorability: questions remain about how neurodegenerative conditions affect the two-dimensional memorability of scene images, and answering them could lead to methods for tracking disease-related or age-related cognitive decline. |
Sectors | Digital/Communication/Information Technologies (including Software), Healthcare, Other |
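
The description above mentions comparing predicted VMS maps against human-derived ones using a range of metrics, but this record does not name them. As an illustration only, the sketch below assumes Pearson correlation, a common way to compare two 2D maps, and computes it for a pair of toy maps. The function name `pearson_2d` and the example values are hypothetical and are not taken from the project.

```python
from math import sqrt


def pearson_2d(pred, target):
    """Pearson correlation between two 2D maps given as lists of rows.

    Assumption: both maps hold one real-valued score per cell and
    contain the same number of cells.
    """
    # Flatten both maps row by row so cells line up one-to-one.
    p = [v for row in pred for v in row]
    t = [v for row in target for v in row]
    if not p or len(p) != len(t):
        raise ValueError("maps must be non-empty and have the same number of cells")

    mean_p = sum(p) / len(p)
    mean_t = sum(t) / len(t)

    cov = sum((a - mean_p) * (b - mean_t) for a, b in zip(p, t))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_t = sum((b - mean_t) ** 2 for b in t)

    # A constant map has zero variance, so the correlation is undefined;
    # this sketch reports 0.0 in that case by convention.
    if var_p == 0 or var_t == 0:
        return 0.0
    return cov / sqrt(var_p * var_t)


# Toy 3x3 maps (hypothetical values): one central memorable region.
target = [[0.0, 0.2, 0.0],
          [0.2, 1.0, 0.2],
          [0.0, 0.2, 0.0]]
pred = [[0.1, 0.3, 0.1],
        [0.3, 0.9, 0.3],
        [0.1, 0.3, 0.1]]

score = pearson_2d(pred, target)
print(f"correlation between predicted and target map: {score:.3f}")
```

A score near 1 means the predicted map places memorable regions where the target map does, while a score near 0 means there is no linear relationship. In practice such maps would be full-resolution arrays, and other map-comparison metrics may suit the task better.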