aeon: a toolkit for machine learning with time series
Lead Research Organisation:
University of East Anglia
Department Name: Computing Sciences
Abstract
In recent years, machine learning frameworks such as scikit-learn have become essential infrastructure for modern data science. They have become the principal tool for practitioners and central components in scientific, commercial and industrial applications. But despite the ubiquity of time series data, until recently no such framework existed for machine learning with time series. In 2019, sktime was conceived to fill this gap, and it has become an established toolkit and software component for time series analysis, used worldwide by academics and industry alike.
It is an easy-to-use, flexible and modular framework for a wide range of time series machine learning tasks. Techniques for learning from time series have been developed in a range of disciplines, including statistics, machine learning, signal processing, econometrics and finance. sktime aims to link these communities by providing a unified interface for related time series tasks such as forecasting, classification, clustering, regression, annotation, anomaly detection and segmentation. It provides scikit-learn compatible algorithms and gives easy access to implementations of state-of-the-art algorithms not accessible in other packages. This project will allow sktime to sustain and grow its operations by providing dedicated maintenance resource, enhancing the functionality and increasing engagement with scientific and industrial stakeholders. We wish to broaden the functionality of sktime to include new areas of active machine learning research and to widen our user base to reach new communities of researchers. Our aim is to link theory and practice by making it easier and faster for state-of-the-art time series algorithms to be applied to real-world problems of genuine scientific interest.
To demonstrate this potential we will collaborate with domain experts on two applications. The first relates to predicting the early onset of dementia using electroencephalography (EEG). EEG signals are time series that record electrical activity in the brain using a series of electrodes placed on the scalp. The equipment is relatively cheap and portable. If we could use it to screen for early onset dementia, it could make a huge difference to the outcomes for many patients. However, the accuracy needed for clinical use is very hard to achieve. We will collaborate with experts in Cambridge who have clinical data and see whether state-of-the-art predictive models can outperform traditional approaches.
The second application involves analysing data generated from intensive care monitoring of children in Great Ormond Street Hospital (GOSH). Intensive care patients are continually monitored for vital body functions (heart rate, blood pressure, breathing rate, etc). Increasingly, this time series data is captured and can be mined to improve clinical practice. We will collaborate with a research team already working with GOSH to explore whether sktime can be used to decrease the time it takes to analyse this data.
This research may lead to insights that improve clinical practice by answering questions such as "when is the best time to remove the tube that is helping a patient breathe?". It will also help us reach our broader goal of speeding up the discovery and dissemination of best practice. Data sharing between hospitals is, quite sensibly, difficult and time consuming. We wish to develop a new user base of hospital data scientists willing to share their research findings and code rather than their data. So, for example, if we discover something interesting in the GOSH data, we would like to rapidly share this finding and the code that verifies it on our data. This code sharing via sktime will dramatically reduce the time taken to test hypotheses on different observational data sets and give greater confidence in findings that have been verified transparently, on independent groups of patients, by different researchers.
Organisations
- University of East Anglia (Lead Research Organisation)
- University of Cambridge (Project Partner)
- University of California Riverside (Project Partner)
- The Alan Turing Institute (Project Partner)
- GSK (Project Partner)
- Mercedes-Benz AG (Project Partner)
- University College London (Project Partner)
- Monash University (Project Partner)
Publications
Ayllón-Gavilán R
(2025)
Convolutional- and Deep Learning-Based Techniques for Time Series Ordinal Classification
in IEEE Transactions on Cybernetics
Ayllón-Gavilán R
(2023)
Convolutional and Deep Learning based techniques for Time Series Ordinal Classification
David Guijo-Rubio
(2024)
Unsupervised feature based algorithms for time series extrinsic regression
Guijo-Rubio D
(2024)
Unsupervised feature based algorithms for time series extrinsic regression
in Data Mining and Knowledge Discovery
Guijo-Rubio D
(2023)
Unsupervised Feature Based Algorithms for Time Series Extrinsic Regression
Holder C
(2023)
A review and evaluation of elastic distance functions for time series clustering
in Knowledge and Information Systems
Related Projects
| Project Reference | Relationship | Related To | Start | End | Award Value |
|---|---|---|---|---|---|
| EP/W030756/1 | | | 30/09/2022 | 30/07/2023 | £534,661 |
| EP/W030756/2 | Transfer | EP/W030756/1 | 31/07/2023 | 30/05/2026 | £403,617 |
| Description | The project involves developing the aeon toolkit to improve scientific workflows, improve reproducibility in research and widen the user base. To these ends we have: 1) extended the functionality of aeon and, through international collaboration, conducted several comparative studies of novel algorithmic contributions. We have identified the state-of-the-art approaches to time series classification, clustering and regression and released data and code to reproduce these experiments. 2) joined NumFOCUS, a not-for-profit organisation that promotes open source toolkits, and are running internship projects through Google Summer of Code. We have increased the number of developers for aeon and are actively promoting it to practitioners and researchers. We are also developing teaching material based around aeon. 3) formed a team to work on EEG data and made initial progress towards useful contributions to this research field. |
| Exploitation Route | aeon is a general purpose toolkit for time series machine learning. Our goal is to form a development team and user base for aeon that will last far beyond the grant. aeon could impact a large number of areas of science, and we are actively seeking experts in a range of domains to collaborate with. |
| Sectors | Agriculture, Food and Drink; Education; Environment; Healthcare; Retail |
| URL | http://aeon-toolkit.org |
| Description | aeon is an open source tool that has been used in a wide range of non-academic settings. For example, two core developers work for Haleon, GCHQ has expressed interest, and we have been in contact with several companies that are using aeon. |
| First Year Of Impact | 2022 |
| Sector | Aerospace, Defence and Marine; Chemicals; Education; Energy; Environment; Financial Services, and Management Consultancy; Healthcare; Retail |
| Impact Types | Societal; Economic |
| Title | The revised Multivariate Time Series Classification Archive |
| Description | In 2022 we worked with researchers at the University of California, Riverside, to revise the archive of multivariate problems to assess multivariate time series classification algorithms |
| Type Of Material | Database/Collection of data |
| Year Produced | 2023 |
| Provided To Others? | Yes |
| Impact | The database is currently being used by researchers, and is beginning to be referenced. |
| Title | Unsupervised Feature Based Algorithms for Time Series Extrinsic Regression |
| Description | Time Series Extrinsic Regression (TSER) involves using a set of training time series to form a predictive model of a continuous response variable that is not directly related to the regressor series. The TSER archive for comparing algorithms was released in 2022 with 19 problems. We increase the size of this archive to 63 problems and reproduce the previous comparison of baseline algorithms. We then extend the comparison to include a wider range of standard regressors and the latest versions of TSER models used in the previous study. We show that none of the previously evaluated regressors can outperform a regression adaptation of a standard classifier, rotation forest. We introduce two new TSER algorithms developed from related work in time series classification. FreshPRINCE is a pipeline estimator consisting of a transform into a wide range of summary features followed by a rotation forest regressor. DrCIF is a tree ensemble that creates features from summary statistics over random intervals. Our study demonstrates that both algorithms, along with InceptionTime, exhibit significantly better performance compared to the other 18 regressors tested. More importantly, the two proposed models (DrCIF and FreshPRINCE) are the only ones that significantly outperform the standard rotation forest regressor. |
| Type Of Material | Database/Collection of data |
| Year Produced | 2023 |
| Provided To Others? | Yes |
| Impact | We have extended the TSER repo to 63 datasets and conducted extensive comparative experiments |
| URL | https://arxiv.org/abs/2305.01429 |
| Title | Updated Time Series Classification Repository |
| Description | We have extended the tsc.com repository with new data and new formats. |
| Type Of Material | Database/Collection of data |
| Year Produced | 2023 |
| Provided To Others? | Yes |
| Impact | Over the years, the database has been downloaded thousands of times and been used in hundreds of papers. |
| URL | https://arxiv.org/abs/2304.13029 |
| Description | Invited talk at Huawei, Paris |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Presented aeon and time series machine learning at a workshop run by Huawei in Paris. |
| Year(s) Of Engagement Activity | 2023 |
| Description | Invited talk at the University of Cordoba |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Postgraduate students |
| Results and Impact | Presented an invited talk on the aeon toolkit and time series machine learning |
| Year(s) Of Engagement Activity | 2024 |
| Description | Invited talk for GCHQ |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Professional Practitioners |
| Results and Impact | I gave an invited online talk about aeon and machine learning to GCHQ |
| Year(s) Of Engagement Activity | 2024 |
| Description | Invited talk to Haleon |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Industry/Business |
| Results and Impact | Gave a talk to researchers at Haleon |
| Year(s) Of Engagement Activity | 2024 |
| Description | Keynote talk at the KDIR conference in Turin |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Invited keynote talk at the KDIR conference describing algorithms and code developed through the grant |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://ic3k.scitevents.org/KeynoteSpeakers.aspx?y=2023 |
| Description | Organised the 2023 AALTD workshop at ECML/PKDD, Turin |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Postgraduate students |
| Results and Impact | I helped organise this time series workshop which featured several contributions to the aeon toolkit. |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://ecml-aaltd.github.io/aaltd2023/ |
| Description | Presentation at PyData Amsterdam 2023 |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Postgraduate students |
| Results and Impact | Presented clustering with the aeon toolkit at this PyData event, attended by a wide range of practitioners |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://amsterdam2023.pydata.org/cfp/talk/ENQV3F/ |
