aeon: a toolkit for machine learning with time series

Lead Research Organisation: University of East Anglia
Department Name: Computing Sciences

Abstract

In recent years, machine learning frameworks such as scikit-learn have become essential infrastructure of modern data science. They have become the principal tool for practitioners and central components in scientific, commercial and industrial applications. But despite the ubiquity of time series data, until recently no such framework existed for machine learning with time series. In 2019, sktime was conceived to fill this gap and it has become an established toolkit and software component for time series analysis used worldwide by academics and industry alike.
It is an easy-to-use, flexible and modular framework for a wide range of time series machine learning tasks. Techniques for learning from time series have been developed in a range of disciplines, including statistics, machine learning, signal processing, econometrics and finance. sktime aims to link these communities by providing a unified interface for related time series tasks such as forecasting, classification, clustering, regression, annotation, anomaly detection and segmentation. It provides scikit-learn compatible algorithms and gives easy access to implementations of state-of-the-art algorithms not accessible in other packages. This project will allow sktime to sustain and grow its operations by providing dedicated maintenance resource, enhancing the functionality and increasing engagement with scientific and industrial stakeholders. We wish to broaden the functionality of sktime to include new areas of active machine learning research and to extend our user base to reach new communities of researchers. Our aim is to link theory and practice by making it easier and faster for state-of-the-art time series algorithms to be applied to real-world problems of genuine scientific interest.
To demonstrate this potential we will collaborate with domain experts on two applications. The first relates to predicting the early onset of dementia using electroencephalography (EEG). EEG recordings are time series that capture electrical activity in the brain using a series of electrodes placed on the scalp. The equipment is relatively cheap and portable. If we could use it to screen for early onset dementia, it could make a huge difference to the outcomes for many patients. However, the accuracy needed for clinical use is very hard to achieve. We will collaborate with experts in Cambridge who have clinical data and see whether state-of-the-art predictive models can outperform traditional approaches.
The second application involves analysing data generated from intensive care monitoring of children in Great Ormond Street Hospital (GOSH). Intensive care patients are continually monitored for vital body functions (heart rate, blood pressure, breathing rate, etc). Increasingly, this time series data is captured and can be mined to improve clinical practice. We will collaborate with a research team already working with GOSH to explore whether sktime can be used to decrease the time it takes to analyse this data.
This research may lead to insights that improve clinical practice by answering questions such as "when is the best time to remove the tube that is helping a patient breathe?". It will also help us reach our broader goal of speeding up the discovery and dissemination of best practice. Data sharing between hospitals is, quite sensibly, difficult and time consuming. We wish to develop a new user base of hospital data scientists willing to share their research findings and code rather than their data. So, for example, if we discover something interesting in the GOSH data, we would like to rapidly share this finding and the code that verifies it on our data. This code sharing via sktime will dramatically reduce the time taken to test hypotheses on different observational data sets and give greater confidence in findings that have been verified transparently on independent groups of patients by different researchers.

Related Projects

Project Reference  Relationship  Related To    Start       End         Award Value
EP/W030756/1                                   30/09/2022  30/07/2023  £534,661
EP/W030756/2       Transfer      EP/W030756/1  31/07/2023  30/05/2026  £403,617
 
Description The project involves developing the aeon toolkit to help improve scientific workflows, improve reproducibility in research and widen the user base. To these ends we have:
1) extended the functionality of aeon and, through international collaboration, conducted several comparative studies of novel algorithmic contributions. We have identified the state-of-the-art approaches to time series classification, clustering and regression and released data and code to reproduce these experiments.
2) joined NumFOCUS, a not-for-profit organisation that aims to promote open source toolkits, and are running internship projects through Google Summer of Code. We have increased the number of developers for aeon and are actively promoting it to practitioners and researchers. We are also developing teaching material based around aeon.
3) formed a team to work on EEG data and made initial progress towards useful contributions to this research field.
Exploitation Route aeon is a general purpose toolkit for time series machine learning. Our goal is to form a development team and user base for aeon that will last far beyond the grant. aeon could impact a large number of areas of science, and we are actively seeking experts in a range of domains to collaborate with.
Sectors Agriculture, Food and Drink, Education, Environment, Healthcare, Retail

URL http://aeon-toolkit.org
 
Description aeon is an open source tool that has been used in a wide range of non-academic settings. For example, two core developers work for Haleon, GCHQ has expressed interest, and we have been in contact with several companies that are using aeon.
First Year Of Impact 2022
Sector Aerospace, Defence and Marine, Chemicals, Education, Energy, Environment, Financial Services, and Management Consultancy, Healthcare, Retail
Impact Types Societal, Economic

 
Title The revised Multivariate Time Series Classification Archive 
Description In 2022 we worked with researchers at the University of California, Riverside, to revise the archive of multivariate problems used to assess multivariate time series classification algorithms.
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
Impact The database is currently being used by researchers, and is beginning to be referenced. 
 
Title Unsupervised Feature Based Algorithms for Time Series Extrinsic Regression 
Description Time Series Extrinsic Regression (TSER) involves using a set of training time series to form a predictive model of a continuous response variable that is not directly related to the regressor series. The TSER archive for comparing algorithms was released in 2022 with 19 problems. We increase the size of this archive to 63 problems and reproduce the previous comparison of baseline algorithms. We then extend the comparison to include a wider range of standard regressors and the latest versions of TSER models used in the previous study. We show that none of the previously evaluated regressors can outperform a regression adaptation of a standard classifier, rotation forest. We introduce two new TSER algorithms developed from related work in time series classification. FreshPRINCE is a pipeline estimator consisting of a transform into a wide range of summary features followed by a rotation forest regressor. DrCIF is a tree ensemble that creates features from summary statistics over random intervals. Our study demonstrates that both algorithms, along with InceptionTime, exhibit significantly better performance compared to the other 18 regressors tested. More importantly, the two proposed models (DrCIF and FreshPRINCE) are the only ones that significantly outperform the standard rotation forest regressor.
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
Impact We have extended the TSER repository to 63 datasets and conducted extensive comparative experiments.
URL https://arxiv.org/abs/2305.01429
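
The idea of building features from summary statistics over random intervals, mentioned for DrCIF in the description above, can be illustrated with a minimal sketch. This is a toy under stated assumptions, not the aeon implementation or the method from the paper: the function names, the choice of three statistics (mean, standard deviation, slope), the toy data and the nearest-neighbour regressor standing in for the tree ensemble are all hypothetical choices made for illustration.

```python
import random
import statistics


def sample_intervals(series_length, n_intervals, min_length, rng):
    """Draw random (start, end) intervals with end exclusive."""
    intervals = []
    for _ in range(n_intervals):
        length = rng.randint(min_length, series_length)
        start = rng.randint(0, series_length - length)
        intervals.append((start, start + length))
    return intervals


def slope(values):
    """Least-squares slope of the values against their index (needs >= 2 values)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = statistics.fmean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def interval_features(series, intervals):
    """Concatenate mean, standard deviation and slope for each interval."""
    feats = []
    for start, end in intervals:
        window = series[start:end]
        feats.extend(
            [statistics.fmean(window), statistics.pstdev(window), slope(window)]
        )
    return feats


def predict_1nn(train_feats, train_targets, query_feats):
    """Stand-in regressor: return the target of the nearest training case."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    best = min(
        range(len(train_feats)),
        key=lambda i: sq_dist(train_feats[i], query_feats),
    )
    return train_targets[best]


# Toy data (made up): two training series and an external target for each.
train_series = [
    [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.1, 1.0],
    [5.0, 5.1, 4.9, 5.2, 5.0, 4.8, 5.1, 5.0],
]
train_targets = [10.0, 50.0]
query_series = [4.9, 5.0, 5.1, 4.8, 5.2, 5.0, 4.9, 5.1]

rng = random.Random(0)
# The same intervals must be used for training and query series.
intervals = sample_intervals(series_length=8, n_intervals=4, min_length=3, rng=rng)

train_feats = [interval_features(s, intervals) for s in train_series]
query_feats = interval_features(query_series, intervals)
prediction = predict_1nn(train_feats, train_targets, query_feats)
print(prediction)
```

In an ensemble of this kind each member would typically draw its own intervals and fit its own tree on the resulting features; the single feature set and nearest-neighbour lookup here only show how a series becomes a fixed-length feature vector that a standard regressor can consume.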
 
Title Updated Time Series Classification Repository 
Description We have extended the tsc.com repository with new data and new formats.
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
Impact Over the years, the database has been downloaded thousands of times and been used in hundreds of papers. 
URL https://arxiv.org/abs/2304.13029
 
Description Invited talk at Huawei, Paris
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presented aeon and time series machine learning at a workshop run by Huawei in Paris.
Year(s) Of Engagement Activity 2023
 
Description Invited talk at the University of Cordoba 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Presented an invited talk on the aeon toolkit and time series machine learning.
Year(s) Of Engagement Activity 2024
 
Description Invited talk for GCHQ 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact I gave an invited online talk about aeon and machine learning to GCHQ.
Year(s) Of Engagement Activity 2024
 
Description Invited talk to Haleon 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact Gave a talk to researchers at Haleon.
Year(s) Of Engagement Activity 2024
 
Description Keynote talk at the KDIR conference in Turin 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Invited keynote talk at the KDIR conference describing algorithms and code developed through the grant.
Year(s) Of Engagement Activity 2023
URL https://ic3k.scitevents.org/KeynoteSpeakers.aspx?y=2023
 
Description Organised the 2023 AALTD workshop at ECML/PKDD, Turin 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact I helped organise this time series workshop which featured several contributions to the aeon toolkit.
Year(s) Of Engagement Activity 2023
URL https://ecml-aaltd.github.io/aaltd2023/
 
Description Presentation at PyData Amsterdam 2023 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Presented clustering with the aeon toolkit at this PyData event, attended by a wide range of practitioners.
Year(s) Of Engagement Activity 2023
URL https://amsterdam2023.pydata.org/cfp/talk/ENQV3F/