Metric-based mutual information estimators and their application to analysing neural recordings

Lead Research Organisation: University of Bristol
Department Name: Computer Science

Abstract

Since Shannon's introduction of Information Theory, it has been applied with great success in a wide range of fields, from data compression through linguistics to thermodynamics. It allows the quantification of the information contained in distributions of random variables, and of how these distributions relate to one another. These are fundamental, widely applicable ideas, which explains the theory's broad use and popularity.

One key measure in Information Theory is mutual information, which captures how much information learning one variable gives you about another. This measure has become popular in areas such as machine learning, and its popularity continues to grow. However, calculating or estimating it reliably and quickly can prove challenging: in practice, this often involves estimating the whole joint distribution and then evaluating logarithmic and other non-linear quantities, which can be hard to estimate, biased, and slow to converge. In (Houghton, 2015; Houghton, 2019), a new approach to estimating mutual information was introduced. This estimator relies on embedding the data in a metric space and exploiting the distance measure of that space. It is a Kozachenko-Leonenko estimator: it estimates the mutual information from the distances between data points, rather than from the points themselves. As a result, it works well on high-dimensional data, since only the distances enter the estimator, not the raw coordinates. Practically, making use of the distance metric means that this estimator can work with a much smaller data set than traditional or naive methods. The effects and uses of this method will be explored, focussing on analysing neural recordings and on machine learning.
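
The estimator of (Houghton, 2015; Houghton, 2019) works on any metric space and is not reproduced here. As a minimal, concrete illustration of the same family of ideas, the sketch below implements the well-known Kraskov-Stoegbauer-Grassberger (KSG) nearest-neighbour estimator for vector-valued data, which likewise estimates mutual information from distances between samples rather than from an explicit density estimate. The function name, the default k = 3, and the use of numpy and scipy are illustrative choices, not taken from the source.

    import numpy as np
    from scipy.spatial import cKDTree
    from scipy.special import digamma

    def ksg_mutual_information(x, y, k=3):
        """KSG nearest-neighbour estimate of I(X; Y), in nats.

        x, y: paired samples, shapes (n, d_x) and (n, d_y).
        Like all Kozachenko-Leonenko-style estimators, it uses only
        distances between samples, never an explicit density.
        """
        x = np.asarray(x, dtype=float).reshape(len(x), -1)
        y = np.asarray(y, dtype=float).reshape(len(y), -1)
        n = len(x)
        joint = np.hstack([x, y])

        # eps[i]: max-norm distance from point i to its k-th nearest
        # neighbour in the joint space (k + 1 because the query also
        # returns the point itself, at distance zero).
        eps = cKDTree(joint).query(joint, k=k + 1, p=np.inf)[0][:, -1]

        # Shrink the radius by one float step so the counts are of
        # points strictly inside eps, as the KSG formula requires.
        radius = np.nextafter(eps, 0)
        tx, ty = cKDTree(x), cKDTree(y)
        nx = [len(tx.query_ball_point(x[i], radius[i], p=np.inf)) - 1 for i in range(n)]
        ny = [len(ty.query_ball_point(y[i], radius[i], p=np.inf)) - 1 for i in range(n)]

        # KSG formula: psi(k) + psi(n) - <psi(nx + 1) + psi(ny + 1)>.
        return digamma(k) + digamma(n) - np.mean(digamma(np.add(nx, 1)) + digamma(np.add(ny, 1)))

    # Sanity check on correlated Gaussians, where the true value is
    # -0.5 * log(1 - rho**2) nats, about 0.14 nats for rho = 0.5:
    rng = np.random.default_rng(0)
    xy = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=5000)
    print(ksg_mutual_information(xy[:, :1], xy[:, 1:]))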

Mutual information, or information gain, appears throughout machine learning and AI, either explicitly or implicitly. Explicit examples include choosing splits when constructing decision trees, and measuring information flow in neural networks. Implicitly, because mutual information captures the amount of information shared by two random variables, machine learning and inference problems can often be framed in terms of Information Theory and mutual information.
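
As a concrete example of the explicit use, the sketch below computes the information gain of a candidate decision-tree split as the mutual information between the class labels and the split indicator; the helper names entropy and information_gain are hypothetical, chosen for illustration.

    import numpy as np

    def entropy(labels):
        """Shannon entropy, in bits, of a discrete label array."""
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def information_gain(labels, split_mask):
        """Mutual information between the labels and a binary split:
        I(label; split) = H(label) - H(label | split)."""
        left, right = labels[split_mask], labels[~split_mask]
        w = len(left) / len(labels)
        return entropy(labels) - (w * entropy(left) + (1 - w) * entropy(right))

    # A split that separates the classes perfectly recovers all of
    # H(label) = 1 bit; an uninformative split recovers 0 bits.
    y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
    print(information_gain(y, y == 0))                      # 1.0
    print(information_gain(y, np.arange(len(y)) % 2 == 0))  # 0.0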

Neuroscience is one of the areas in which Information Theory has been particularly useful. One of the primary functions of the brain is to process and store information, so applying Information Theoretic approaches is an obvious fit. As data acquisition and experimental techniques improve, experiments in neuroscience are producing large amounts of high-dimensional temporal data, often of several different types. The model-independent nature of Information Theory means it can capture a very wide range of phenomena and interactions, since it is not limited by the assumptions of a particular model. An integral part of this estimator is having a 'sensible' distance measure on the space of spike train recordings, a problem for which multiple measures have been proposed; these will be investigated for use in estimating mutual information (one example is sketched below). Because the estimator performs well on small samples, it is suited to settings where large datasets are costly or infeasible to collect, as is common in neuroscience. We aim to apply this mutual information estimator to biological data recorded in Matt Jones' lab. The aim is to use the estimator to work out the relationships between recorded neurons, and so infer something of the underlying network. Similar approaches have been used before, in more restricted settings, with good results, and we will explore the effect of this new approach.
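
One widely used spike train metric, chosen here purely as an illustration, is the van Rossum distance: each train is filtered with a causal exponential kernel, and the distance is the L2 norm of the difference between the filtered traces. The sketch below evaluates it via the closed form of that integral, a double sum of exp(-|t_i - t_j| / tau) over spike pairs; the function name and the default time constant are my choices, and normalisation conventions differ by a constant factor across the literature.

    import numpy as np

    def van_rossum_distance(u, v, tau=0.01):
        """Van Rossum distance between two spike trains.

        u, v: 1-D arrays of spike times (in seconds).
        tau: time constant of the causal kernel exp(-t / tau).
        """
        u = np.asarray(u, dtype=float)
        v = np.asarray(v, dtype=float)

        def k(a, b):
            # Sum of exp(-|a_i - b_j| / tau) over all spike pairs.
            if len(a) == 0 or len(b) == 0:
                return 0.0
            return np.sum(np.exp(-np.abs(a[:, None] - b[None, :]) / tau))

        # Closed form of the integrated squared difference of the
        # exponentially filtered traces (up to the convention factor).
        d2 = 0.5 * (k(u, u) + k(v, v) - 2.0 * k(u, v))
        return np.sqrt(max(d2, 0.0))

    # Identical trains are at distance zero; shifting one spike by much
    # more than tau moves the trains towards maximal separation.
    print(van_rossum_distance([0.1, 0.2, 0.3], [0.1, 0.2, 0.3]))  # 0.0
    print(van_rossum_distance([0.1, 0.2, 0.3], [0.1, 0.2, 0.9]))
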
Overall, this project will study and apply this estimator in the context of neuroscience, on recorded data, and in the context of machine learning and artificial neural networks, as well as developing the theory of mutual information estimators. This research falls within the EPSRC Artificial Intelligence Technologies research area.

References

Houghton, C. (2015). Calculating mutual information for spike trains and other data with distances but no coordinates.


Studentship Projects

Project Reference: EP/T517872/1. Start: 01/10/2020. End: 30/09/2025.
Project Reference: 2445980. Relationship: Studentship. Related To: EP/T517872/1. Start: 01/10/2020. End: 31/03/2024. Student Name: Jake Witter.