Change and Anomaly Detection in Data Streams.

Lead Research Organisation: Lancaster University
Department Name: Mathematics and Statistics

Abstract

It's a changing world.

Pick a system. Any will do. You could choose something simple, like the population dynamics in Lancaster's duck pond, or something fantastically complex, such as the movements of the stock markets around the world or the flow of information between every human being on the planet. Ultimately, every action in every system is governed by change and reaction to it. Every problem is a changepoint problem.

And yet, despite the increased interest in the detection of change in recent years, our understanding of many aspects of the problem still lags behind where we would like it to be. The first major issue arises from the need to consider multiple variables simultaneously. In most real-life situations, it will be necessary to examine the evolution of more than one quantity (in our duck pond example, the population of ducks and of the students who feed them would both be pertinent considerations). Yet the study of what a change means in this context, and of how to detect it, is still very much in its infancy.

The second important strand of the project will involve speeding up existing changepoint detection methods. For a detection method to be of any real-world use, changes need to be detected as soon as possible, especially in situations where the nature of the change is subtle. Failure to do so can lead to the retention of policies which are actively detrimental to the system in the medium to long term.

At the same time, however, it is important to distinguish between true changes and mere anomalies in the data. Correctly identifying an anomalous occasion, and setting it apart from a true, persistent change, is vital for any decision maker. Doing this effectively and quickly, when multiple data sets (known as a data stream) are being observed simultaneously, is the central goal of this research.
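The distinction between an anomaly and a persistent change can be illustrated with a toy heuristic. The sketch below is not a method from this project: the function name `classify_departures`, its parameters and the example numbers are all assumptions made purely for illustration. It treats a short-lived departure from a baseline level as an anomaly, and a departure that persists for at least `min_run` observations as a change.

```python
from statistics import mean

def classify_departures(x, baseline_len=5, threshold=3.0, min_run=4):
    """Toy heuristic (an assumption, not the project's method).

    The baseline is the mean of the first `baseline_len` observations.
    A run of consecutive observations further than `threshold` from the
    baseline is an anomaly if it ends before reaching `min_run`
    observations, and a change if it reaches `min_run` observations.
    Returns (anomaly_indices, change_index or None).
    """
    baseline = mean(x[:baseline_len])
    anomalies, change_at = [], None
    run_start = None
    for i, value in enumerate(x):
        if abs(value - baseline) > threshold:
            if run_start is None:
                run_start = i
            if i - run_start + 1 >= min_run:
                # The departure has persisted: report where it began and stop.
                change_at = run_start
                break
        elif run_start is not None:
            # The series returned to baseline quickly: a short-lived anomaly.
            anomalies.extend(range(run_start, i))
            run_start = None
    # A short run still open at the end of the data is left unclassified.
    return anomalies, change_at

# Made-up stream: one spike at index 6, then a lasting shift from index 10.
stream = [10, 11, 10, 9, 10, 10, 25, 10, 11, 10, 16, 17, 16, 15, 16, 17]
anomalies, change_at = classify_departures(stream)
print(anomalies, change_at)
```

Note the trade-off this exposes: a larger `min_run` makes the heuristic less likely to mistake an anomaly for a change, but delays the declaration of a true change, which is exactly the tension between speed and accuracy described above.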

In partnership with BT.

Publications


Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/P510543/1                                   01/10/2016  30/09/2021
1817269            Studentship   EP/P510543/1  01/10/2015  30/09/2019  Samuel Tickle
 
Description A paper detailing a fast new method for finding changepoints (points in a time series at which the process generating the data changes in some way) was accepted for publication by the Journal of Computational and Graphical Statistics. Additionally, another paper, which looks at the changepoint problem when there are multiple data sequences to process at once (a multivariate dataset), was recently submitted. This paper focused on the changepoint problem for worldwide terrorism incidence since the beginning of the 1970s. These two papers form the bulk of my thesis, which was accepted for the award of Doctor of Philosophy subject to minor corrections in November 2019. In January 2020, these corrections were completed and the final version of the thesis was approved.
Exploitation Route Code is available with the paper accepted by the Journal of Computational and Graphical Statistics, which was published in that journal in 2020. Additionally, the paper discussing the changepoint problem for the terrorism incidence setting has now been released as a preprint following the first round of reviews from the Journal of the Royal Statistical Society, Series A. We sent in the corrected manuscript in November 2020 and are waiting for the second round of reviews; code was also submitted with this paper and will be made open source if and when the paper is accepted. This code will become part of a wider R package containing a multitude of multivariate changepoint detection methods pioneered at Lancaster University. I am currently working on developing the final chapter of my thesis as part of my new role as a research associate. An early version of this method has been used to analyse some complex data examples provided by British Telecommunications Ltd (BT). In addition, I have been able to use some of the methods developed during the PhD to help with, for example, government modelling of the response to the COVID-19 pandemic. This particular project involved a number of collaborators in my new post, and changepoint detection was an extremely helpful tool in assessing the effectiveness of various lockdown exit strategies during the summer of 2020.
Sectors Digital/Communication/Information Technologies (including Software); Government, Democracy and Justice

URL http://www.research.lancs.ac.uk/portal/en/people/sam-tickle(d3afff29-7b7a-4e7e-9076-5c628971de7c).html
 
Description The new method SUBSET, developed as part of this award, was used extensively during a project with the Cabinet Office in the summer months of 2020. This project concerned the relaxation of NPIs (Non-Pharmaceutical Interventions) and the effect this would have on the wider economy as well as on the number of cases of COVID-19. A significant part of the problem here involved "proxying" the economy using high-frequency open-source datasets. One example of such a dataset was the ENTSO-E energy platform, detailing energy usage per hour for a number of European countries. After de-seasonalising, we were able to use SUBSET to detect changes in the energy output of different European countries associated with the strengthening of NPIs at the beginning of the pandemic in February/March 2020. This allowed us, for example, to identify those countries which the UK most closely "mirrored" in terms of policy and economic impact, and then to forecast the impact of different exit strategies.
First Year Of Impact 2020
Sector Energy; Government, Democracy and Justice
Impact Types Policy & public services
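
The workflow described in this record (de-seasonalise an hourly series, then look for a change in level) can be sketched on synthetic data. The code below is only an illustration under stated assumptions: it does not implement SUBSET, it does not use ENTSO-E data, and the function names `deseasonalise` and `single_mean_change`, along with all of the numbers, are hypothetical. It removes a 24-hour cycle by subtracting hour-of-day averages, then finds the single split that most reduces squared error when separate means are fitted on either side.

```python
import math
from statistics import mean

def deseasonalise(x, period=24):
    """Subtract the average value at each position in the cycle
    (for hourly data with period=24, each hour of the day)."""
    profile = [mean(x[h::period]) for h in range(period)]
    return [value - profile[i % period] for i, value in enumerate(x)]

def single_mean_change(x):
    """Return (tau, gain): the split index tau that maximises the
    reduction in squared error from fitting one mean to x[:tau] and
    another to x[tau:], together with that reduction."""
    n = len(x)
    total = sum(x)
    best_tau, best_gain = None, 0.0
    left = 0.0
    for tau in range(1, n):
        left += x[tau - 1]
        right = total - left
        # Between-segment sum of squares for a split at tau.
        gain = tau * (n - tau) / n * (left / tau - right / (n - tau)) ** 2
        if gain > best_gain:
            best_tau, best_gain = tau, gain
    return best_tau, best_gain

# Synthetic hourly "usage": a daily cycle over 20 days, with the level
# dropping by 15 units from the start of day 12 (hour 288). Made-up values.
hours = 24 * 20
series = [
    100 + 20 * math.sin(2 * math.pi * i / 24) - (15 if i >= 288 else 0)
    for i in range(hours)
]

residual = deseasonalise(series)
tau, gain = single_mean_change(residual)
print("estimated change at hour", tau, "(day", tau // 24, ")")
```

In practice the gain would be compared against a penalty or threshold before a change is declared, and real data would contain noise, weekly cycles and possibly several changes; this sketch omits all of those for brevity.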

 
Description Co-Sponsor and Industrial Partner of the PhD Project 
Organisation BT Group
Country United Kingdom 
Sector Private 
PI Contribution Inference with regard to the provided datasets, both directly (via visits to Adastral Park) and indirectly (discussing findings via supervision, insofar as the use of the data assisted with the academic output of the project thus far).
Collaborator Contribution Provision of data for testing methods/techniques created during the PhD; regular meetings/supervisions with members of the BT Research Team to discuss progress and real life motivation for theory and methods developed during the project; proof-reading of academic output.
Impact Please see relevant section on visits to BT.
Start Year 2016
 
Description Changepoint Event 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact A number of industrial representatives met with the Lancaster changepoint group for a two-day event in Manchester in which our respective research problems and achievements were discussed and issues going forward were identified.
Year(s) Of Engagement Activity 2017
 
Description Presentation at the Royal Statistical Society Conference 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Spoke to attendees of the Royal Statistical Society Conference about my parallelisation work.
Year(s) Of Engagement Activity 2018
 
Description Presentation to the 2018 Research Students' Conference 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Presented the results of the first section of my PhD, on our novel methods and associated theory for parallelising a common changepoint detection technique. As a result of the presentation, I was delighted to win an Outstanding Speaker Award, and subsequently a place at the Royal Statistical Society conference a few weeks later.
Year(s) Of Engagement Activity 2018
 
Description School Visits - Series of ongoing interactive outreach sessions 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Schools
Results and Impact Four sessions (thus far), beginning in the autumn of 2016, with the aim of discussing my current research, as well as the process of conducting research more generally. Surveys conducted following the sessions showed that they were well received and provided a useful introduction to research practices.
Year(s) Of Engagement Activity 2016, 2017, 2018, 2019
 
Description Tommy Flowers Institute 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Networking group set up by BT in 2016 which holds meetings around three times per year, with the aim of promoting cooperation, dialogue and collaboration between early career researchers, leading industrial experts and university institutions. The format of these meetings is fairly variable: my own involvement has included a short talk at the inaugural event, a contribution to a panel on future organisation (particularly with respect to integrating research) and helping to set up the institute's social media infrastructure. The institute has been extremely useful in exposing a wider industrial audience to the questions surrounding my work; in particular, the events have provided a platform for discussing data-driven solutions with a broad audience within BT.
Year(s) Of Engagement Activity 2016, 2017, 2018