Self-supervised autoencoder framework for salient sensory feature extraction
Lead Research Organisation:
Imperial College London
Department Name: Electrical and Electronic Engineering
Abstract
The natural world is full of noise, but the brain's capacity for information transmission is limited. Discarding irrelevant information from sensory inputs is therefore key. Studies suggest that the brain could achieve this in part by implementing information bottlenecks. Implementing such bottlenecks requires measuring information, which is often done via mutual information. However, this metric, and many existing approaches to its estimation, suffers from the curse of dimensionality. Recently, this challenge has been addressed by framing mutual information estimation as a min-max optimisation problem in an adversarial setting, an approach found to scale well in both dimension and sample size.
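To make the adversarial estimation idea concrete, the sketch below evaluates the Donsker-Varadhan lower bound on mutual information, the quantity such estimators maximise over a learned critic. The abstract does not specify the estimator used, so the critic here is replaced by fixed toy scores; only the bound itself is illustrated.

```python
import numpy as np

rng = np.random.default_rng(0)

def dv_lower_bound(t_joint, t_marginal):
    """Donsker-Varadhan lower bound on mutual information:
    I(X; Z) >= E_{p(x,z)}[T] - log E_{p(x)p(z)}[exp(T)],
    where T is a critic evaluated on joint samples and on
    shuffled (product-of-marginals) samples."""
    return t_joint.mean() - np.log(np.exp(t_marginal).mean())

# Toy critic scores standing in for a trained network: the critic
# assigns higher values to genuine (x, z) pairs than to shuffled ones.
t_joint = rng.normal(loc=1.0, scale=0.1, size=1000)    # scores on p(x, z)
t_marginal = rng.normal(loc=0.0, scale=0.1, size=1000) # scores on p(x)p(z)

estimate = dv_lower_bound(t_joint, t_marginal)
print(f"DV lower-bound estimate: {estimate:.3f}")
```

In an adversarial estimator the critic is a neural network trained to maximise this bound, which is what makes the approach scalable in dimension and sample size compared with histogram- or kNN-based estimators.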
Building on this, we propose an adversarial-inspired autoencoder framework for salient sensory feature extraction consisting of three neural networks: an encoder, a decoder, and a classifier. The objective is for the encoder to learn the salient features necessary for classification, but not for reconstruction. The auxiliary classification task conditions the encoder's latent space to capture these salient features. Preliminary results on MNIST and CIFAR-10 show that the framework discards irrelevant information from image data. Furthermore, it appears to perform figure-ground separation.
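The abstract does not give the exact training objective, so the following is a minimal sketch of one plausible formulation of the three-network setup, with the encoder, decoder, and classifier reduced to linear maps and the loss combination chosen for illustration: the encoder minimises classification loss while being penalised for reconstructable detail, yielding a min-max game against the decoder.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy dimensions and linear stand-ins for the three networks.
# All weights, names, and the loss combination are illustrative assumptions.
d_in, d_lat, n_cls = 8, 3, 2
W_enc = rng.normal(size=(d_lat, d_in)) * 0.1   # encoder
W_dec = rng.normal(size=(d_in, d_lat)) * 0.1   # decoder
W_cls = rng.normal(size=(n_cls, d_lat)) * 0.1  # classifier

x = rng.normal(size=(d_in,))   # one input sample
y = 1                          # its class label

z = W_enc @ x                  # latent code
x_hat = W_dec @ z              # attempted reconstruction
logits = W_cls @ z

recon_loss = np.mean((x - x_hat) ** 2)               # decoder objective
probs = np.exp(logits) / np.exp(logits).sum()
class_loss = -np.log(probs[y])                       # classifier objective

# Hypothetical encoder objective: keep class-relevant features while
# *maximising* reconstruction error, so the latent code sheds detail
# the classifier does not need (lam balances the two pressures).
lam = 0.5
encoder_loss = class_loss - lam * recon_loss
print(f"recon={recon_loss:.3f} class={class_loss:.3f} enc={encoder_loss:.3f}")
```

Under this formulation the decoder plays the adversary: anything it can still reconstruct from the latent code counts against the encoder, pushing the code towards only the classification-salient features.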
In this project, we aim to confirm these findings by training the framework on more complex datasets, to investigate whether it can reproduce other feature-selective mechanisms, and to extend its application to speech processing. For the latter, we will begin by training the network on a simple task, such as vowel classification. As our approach relies on representations obtained directly from input data, the framework may provide a more precise explanation of neural responses to stimuli. This may lead to new, testable hypotheses about the features underlying noise-robust speech processing, as well as contribute to improving speech recognition systems.
Studentship Projects
| Project Reference | Relationship | Related To | Start | End | Student Name |
|---|---|---|---|---|---|
| EP/W524323/1 | | | 30/09/2022 | 29/09/2028 | |
| 2894189 | Studentship | EP/W524323/1 | 29/09/2023 | 30/03/2027 | |