Online detection of changes in the latent structure of network models

Lead Research Organisation: Imperial College London
Department Name: Mathematics

Abstract

Network-valued data, that is, data that can be represented by a collection of connected nodes, arise
in many domains, such as the modeling of social networks, messaging services, or computer networks.
The use of rentable bikes in urban areas can be modeled in the same framework. By a node, we simply
mean an object of interest. In a social network, for example, a node may represent a user's profile
page; in the rentable bikes setting, a node would be a drop-off/collection point.
Often, there exists an underlying, unobserved, and dynamic structure within these networks. For
example, consider the task of detecting hacking using a network model of a company's network
traffic, where nodes represent physical computers, and edges represent connections between those
computers. An underlying, but crucially unobserved, group structure will exist between those nodes,
dictating how they interact. As this structure is unobserved, we cannot obtain direct data describing
it but must instead infer it from the interactions. As the network traffic evolves over time, we want
to be able to dynamically observe changes in the inferred latent group structure of the nodes, as
changes will reflect behavioral changes of a node or nodes, which could be used as a warning sign
of malicious behavior. We refer to such a change in the group structure as a change point.
Methodology exists for modeling network-valued data, and there is also work on finding change
points within these networks. However, there is no existing literature on detecting change points
in network data online, that is to say, in real time. In this project, we propose a novel
methodology for detecting these unobserved changes in network models on the fly in a statistically
robust and computationally efficient manner. We aim to understand the mathematical properties of
changes to the structure of network-valued data, and to adjust our algorithm in line with our
real-time confidence in such a detected change.
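
As a rough illustration of what detecting a change on the fly can look like, the following Python
sketch monitors a stream of network snapshots with a one-sided CUSUM rule. It is not the
methodology proposed in this project: the group labels are assumed to be known in advance (whereas
the project treats them as unobserved and inferred), and the function names `cross_group_fraction`
and `first_alarm`, the toy snapshots, and the `baseline`, `slack`, and `threshold` values are all
hypothetical choices made for the example.

```python
from typing import Iterable, Optional, Sequence, Set, Tuple

Edge = Tuple[int, int]


def cross_group_fraction(edges: Sequence[Edge], group_a: Set[int]) -> float:
    """Fraction of edges in one snapshot that link group_a to the remaining nodes."""
    if not edges:
        return 0.0
    crossing = sum(1 for u, v in edges if (u in group_a) != (v in group_a))
    return crossing / len(edges)


def first_alarm(
    stats: Iterable[float], baseline: float, slack: float, threshold: float
) -> Optional[int]:
    """One-sided CUSUM over a stream of per-snapshot statistics.

    Returns the index of the first snapshot at which the accumulated upward
    deviation from `baseline` (less `slack`) exceeds `threshold`, or None if
    no alarm is raised.
    """
    s = 0.0
    for t, x in enumerate(stats):
        # Accumulate evidence of an upward shift; reset to zero when it vanishes.
        s = max(0.0, s + (x - baseline - slack))
        if s > threshold:
            return t
    return None


# Assumed (hypothetical) group labels: nodes 0-2 form group A, nodes 3-5 group B.
group_a = {0, 1, 2}

# Toy snapshots: before the change, every edge stays within a group.
calm = [(0, 1), (1, 2), (3, 4), (4, 5)]
# After the change, node 2 starts connecting to group B instead.
changed = [(0, 1), (2, 3), (2, 4), (4, 5)]

# Five calm snapshots (indices 0-4) followed by five changed ones (indices 5-9).
snapshots = [calm] * 5 + [changed] * 5

# The generator is consumed lazily, so each snapshot is processed as it arrives.
stats = (cross_group_fraction(edges, group_a) for edges in snapshots)
alarm = first_alarm(stats, baseline=0.0, slack=0.1, threshold=1.0)
print(alarm)  # 7
```

In this toy stream the change occurs at snapshot 5 and the alarm is raised at snapshot 7; a lower
threshold shortens that delay at the cost of more false alarms.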

This project falls within the EPSRC Mathematical Sciences research area.

Planned Impact

The primary CDT impact will be training 75 PhD graduates as the next generation of leaders in statistics and statistical machine learning. These graduates will lead in industry, government, health care, and academic research. They will bridge the gap between academia and industry, resulting in significant knowledge transfer to both established and start-up companies. Because this cohort will also learn to mentor other researchers, the CDT will ultimately address a UK-wide skills gap. The students will also be crucial in keeping the UK at the forefront of methodological research in statistics and machine learning.
After graduating, students will act as multipliers, educating others in advanced methodology throughout their careers. There is a range of further impacts:
- The CDT has a large number of high-calibre external partners in government, health care, industry and science. These partnerships will catalyse immediate knowledge transfer, bringing cutting-edge methodology to a large number of areas. Knowledge transfer will also be achieved through internships/placements of our students with users of statistics and machine learning.
- Our Women in Mathematics and Statistics summer programme is aimed at students who could go on to apply for a PhD. This programme will inspire the next generation of statisticians and also provide excellent leadership training for the CDT students.
- The students will develop new methodology and theory in the domains of statistics and statistical machine learning. The research will be relevant, addressing the key questions behind real-world problems. It will be published in the best possible statistics journals and machine learning conferences and will be made available online. To maximize reproducibility and replicability, source code and replication files will be made available as open source software or, when relevant to an industrial collaboration, held as a patent or software copyright.

Publications


Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
EP/S023151/1      |              |              | 01/04/2019 | 30/09/2027 |
2748724           | Studentship  | EP/S023151/1 | 03/10/2022 | 30/09/2026 | Joshua Corneck-Willcox