Change and Anomaly Detection in Data Streams.
Lead Research Organisation:
Lancaster University
Department Name: Mathematics and Statistics
Abstract
It's a changing world.
Pick a system. Any will do. You could choose something simple, like the population dynamics in Lancaster's duck pond, or something fantastically complex, such as the movements of the stock markets around the world or the flow of information between every human being on the planet. Ultimately, every action in every system is governed by change and reaction to it. Every problem is a changepoint problem.
And yet, despite the increased interest in studying the detection of change in recent years, understanding of many aspects of the problem still lags behind where we would like it to be. The first major issue arises from the necessary consideration of multiple variables simultaneously. In most real-life situations, it will be necessary to examine the evolution of more than one quantity (in our duck pond example, the population of ducks and of the students who feed them would both be pertinent considerations). Yet the work of unpacking what a change means in this context, and how to detect it, is still very much in its infancy.
The second important strand of the project will involve speeding up existing changepoint detection methods. For such a detection method to be of any real-world use, changes need to be detected as soon as possible, especially in situations where the nature of the change is subtle. Failure to do so can lead to the retention of policies which are actively detrimental to the system in the medium to long term.
At the same time, however, distinguishing between true changes and mere anomalies in the data is important. Correctly identifying an anomalous occasion, and setting it apart from a true, persistent change, is vital for any decision maker. Doing this effectively and quickly, in a setting where multiple data sets (known as a data stream) are observed simultaneously, is the central goal of this research.
In partnership with BT.
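To make the distinction between an anomaly and a persistent change concrete, here is a minimal sketch in Python. It is not the method developed in this project: the function names, the thresholds and the toy data are all assumptions chosen for illustration. A one-off spike is flagged by a robust (median-based) outlier rule, while a lasting shift in level is located by the single split that most reduces the squared error around the segment means.

```python
from statistics import mean, median

def point_anomalies(xs, threshold=5.0):
    """Indices lying more than `threshold` median absolute deviations from the median.

    Illustrative rule only; `threshold` is an assumed tuning value.
    """
    med = median(xs)
    mad = median([abs(v - med) for v in xs])
    scale = mad if mad > 0 else 1.0  # avoid a zero scale on near-constant data
    return [i for i, v in enumerate(xs) if abs(v - med) > threshold * scale]

def best_mean_split(xs, min_seg=3):
    """Return (k, gain): the split point k that most reduces squared error
    when xs[:k] and xs[k:] are each fitted by their own mean."""
    def sse(seg):
        m = mean(seg)
        return sum((v - m) ** 2 for v in seg)

    total = sse(xs)
    best_k, best_gain = None, 0.0
    for k in range(min_seg, len(xs) - min_seg + 1):
        gain = total - sse(xs[:k]) - sse(xs[k:])
        if gain > best_gain:
            best_k, best_gain = k, gain
    return best_k, best_gain

# Toy data (hypothetical): one series with a single spike, one with a lasting shift.
spike = [10, 10, 11, 9, 10, 30, 10, 11, 9, 10, 10, 11]
shift = [10, 11, 9, 10, 10, 11, 20, 21, 19, 20, 20, 21]

print("spike anomalies:", point_anomalies(spike))
print("shift anomalies:", point_anomalies(shift))
print("shift changepoint:", best_mean_split(shift)[0])
```

On the toy data the spike is reported as an anomaly and the shift is not, while the split search places the change at the first observation of the new level. In practice both thresholds would need calibrating against the noise level of the data.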
Studentship Projects
Project Reference | Relationship | Related To | Start | End | Student Name |
---|---|---|---|---|---|
EP/P510543/1 | | | 01/10/2016 | 30/09/2021 | |
1817269 | Studentship | EP/P510543/1 | 01/10/2015 | 30/09/2019 | Samuel Tickle |
Description | A paper detailing a fast new method for finding changepoints (points in a time series at which the process generating the data changes in some way) was accepted for publication by the Journal of Computational and Graphical Statistics. Additionally, another paper, looking at the changepoint problem when multiple data sequences must be processed at once (which we can refer to as a multivariate dataset), was recently submitted. This paper focused on the changepoint problem for worldwide terrorism incidence since the beginning of the 1970s. These two papers form the bulk of my thesis, which was accepted for the award of Doctor of Philosophy subject to minor corrections in November 2019. In January 2020, these corrections were completed and the final version of the thesis approved. |
Exploitation Route | Code is available with the paper accepted by the Journal of Computational and Graphical Statistics, which was published in that journal in 2020. Additionally, the paper discussing the changepoint problem for the terrorism incidence setting has now been released as a preprint following the first round of reviews from the Journal of the Royal Statistical Society, Series A. We sent in the corrected manuscript in November 2020 and are waiting for the second round of reviews; code was also submitted with this paper and will be made open source if and when the paper is accepted. That code will become part of a wider R package containing a multitude of multivariate changepoint detection methods pioneered at Lancaster University. I am currently working on developing the final chapter of my thesis as part of my new role as a research associate. An early version of this method has been used to analyse some complex data examples provided by British Telecommunications Ltd (BT). In addition, I have been able to use some of the methods developed during the PhD to help with, for example, government modelling of the response to the COVID-19 pandemic. This particular project involved a number of collaborators in my new post, and changepoint detection was an extremely helpful tool in assessing the effectiveness of various lockdown exit strategies during the summer of 2020. |
Sectors | Digital/Communication/Information Technologies (including Software),Government, Democracy and Justice |
URL | http://www.research.lancs.ac.uk/portal/en/people/sam-tickle(d3afff29-7b7a-4e7e-9076-5c628971de7c).html |
Description | The new method SUBSET, developed as part of this award, was used extensively during a project with the Cabinet Office in the summer months of 2020. This project concerned the relaxation of NPIs (Non-Pharmaceutical Interventions) and the effect this would have on the wider economy as well as the number of cases of COVID-19. A significant part of the problem here involved "proxying" the economy using high-frequency open source datasets. One example of such a dataset was the ENTSO-E energy platform, detailing energy usage per hour for a number of European countries. After de-seasonalising, we were able to use SUBSET to detect changes in the energy output of different European countries associated with the strengthening of NPIs at the beginning of the pandemic in February/March 2020. We were able to use this to, for example, identify those countries which the UK most closely "mirrored" in terms of policy and economic impact, and then forecast the impact of different exit strategies. |
First Year Of Impact | 2020 |
Sector | Energy,Government, Democracy and Justice |
Impact Types | Policy & public services |
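The two-step workflow described above (de-seasonalise, then look for a change) can be sketched as follows. This is a rough illustration only, not SUBSET and not the analysis carried out for the Cabinet Office: the series is synthetic rather than ENTSO-E data, the function names are hypothetical, and a basic CUSUM statistic stands in for the changepoint detector.

```python
from statistics import mean

def deseasonalise(xs, period):
    """Subtract the average value at each position within the period
    (for hourly data with period=24 this is the hour-of-day profile)."""
    profile = [mean(xs[p::period]) for p in range(period)]
    return [v - profile[i % period] for i, v in enumerate(xs)]

def cusum_changepoint(xs):
    """Estimate a single change in mean as the point where the cumulative
    sum of centred observations is largest in absolute value."""
    centre = mean(xs)
    running, best_k, best_abs = 0.0, None, 0.0
    for k, v in enumerate(xs[:-1], start=1):
        running += v - centre
        if abs(running) > best_abs:
            best_k, best_abs = k, abs(running)
    return best_k  # index of the first observation after the change

# Synthetic series (assumed): a repeating 4-step pattern on top of a level
# that drops from 10 to 6 halfway through.
pattern = [0, 2, 5, 1]
series = [(10 if i < 12 else 6) + pattern[i % 4] for i in range(24)]

residual = deseasonalise(series, period=4)
print("estimated changepoint:", cusum_changepoint(residual))
```

Removing the periodic profile first matters because the within-period swings in the raw series are larger than the drop in level, so a detector applied directly to the raw values could easily mistake the seasonal pattern for change.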
Description | Co-Sponsor and Industrial Partner of the PhD Project |
Organisation | BT Group |
Country | United Kingdom |
Sector | Private |
PI Contribution | Inference with regard to the provided datasets, both directly (via visits to Adastral Park) and indirectly (discussing findings via supervision, insofar as the use of the data assisted with the academic output of the project thus far). |
Collaborator Contribution | Provision of data for testing methods/techniques created during the PhD; regular meetings/supervisions with members of the BT Research Team to discuss progress and the real-life motivation for the theory and methods developed during the project; proof-reading of academic output. |
Impact | Please see relevant section on visits to BT. |
Start Year | 2016 |
Description | Changepoint Event |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Industry/Business |
Results and Impact | A number of industrial representatives met with the Lancaster changepoint group for a two-day event in Manchester in which our respective research problems and achievements were discussed and issues going forward were identified. |
Year(s) Of Engagement Activity | 2017 |
Description | Presentation at the Royal Statistical Society Conference |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Spoke to attendees of the Royal Statistical Society Conference about my parallelisation work. |
Year(s) Of Engagement Activity | 2018 |
Description | Presentation to the 2018 Research Students' Conference |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Presented the results of the first section of my PhD, on our novel methods and associated theory for parallelising a common changepoint detection technique. As a result of the presentation, I was delighted to win an Outstanding Speaker Award, and subsequently a place at the Royal Statistical Society conference a few weeks later. |
Year(s) Of Engagement Activity | 2018 |
Description | School Visits - Series of ongoing interactive outreach sessions |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Schools |
Results and Impact | Four sessions (thus far), beginning in the autumn of 2016, with the aim of discussing my current research as well as how research is conducted more generally. Surveys conducted following the sessions showed that they were well received and were a useful introduction to research practices. |
Year(s) Of Engagement Activity | 2016,2017,2018,2019 |
Description | Tommy Flowers Institute |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | Networking group set up by BT in 2016 which holds meetings around 3 times per year, with the aim of promoting cooperation, dialogue and collaboration between early career researchers, leading industrial experts and university institutions. The format of these meetings is fairly variable; my own involvement has included a short talk at the inaugural event, a contribution to a panel on future organisation (particularly with respect to integrating research) and helping to set up the institute's social media infrastructure. The institute has been extremely useful in exposing a wider industrial audience to the questions surrounding my work; in particular, the events have been helpful in providing a platform for discussing data-driven solutions with a wider audience within BT. |
Year(s) Of Engagement Activity | 2016,2017,2018 |