The Collective of Transform Ensembles (COTE) for Time Series Classification
Lead Research Organisation:
University of East Anglia
Department Name: Computing Sciences
Abstract
Time series classification is the problem of predicting an outcome from a series of ordered data. For example, if we take a series of electronic readings from a sample of meat, the classification problem could be to determine whether that sample is pure beef or whether it has been adulterated with some other meat. Alternatively, if we have a series of electricity usage readings, the classification problem could be to determine which type of device generated them. Time series classification problems arise in all areas of science, and we have worked on problems involving ECG and EEG data, chemical concentration readings, astronomical measurements, otolith outlines, electricity usage, food spectrographs, hand and bone radiograph data and mutant worm motion. The algorithm we have developed for this task, The Collective of Transform Ensembles (COTE), is significantly more accurate than any other technique proposed in the literature when assessed on 80 data sets commonly used in the literature. This project aims to improve COTE further and to apply it to three problem domains of genuine importance to society. In collaboration with Imperial, we will look at classifying Caenorhabditis elegans via motion traces. C. elegans is a nematode worm commonly used as a model organism in the study of genetics. We will help develop an automated classifier for C. elegans mutant types based on their motion, with the objective of identifying genes that regulate appetite. This classifier will automate a task previously done manually at great cost and will uncover conserved regulators of appetite in a model organism in which functional dissection is possible at the level of behaviour, neural circuitry, and fat storage. In the long term, this may give insights into the genetic component of human obesity.
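The abstract defines the classification task but not how a classifier works. As a hedged illustration of the problem setting only (this is not COTE), the sketch below uses a common baseline: one-nearest-neighbour with dynamic time warping (DTW), whose window width is the subject of the Dau et al. (2018) publication listed under Publications. The function names, the Sakoe-Chiba band and the toy "peak"/"flat" data are illustrative assumptions.

```python
import math


def dtw_distance(a, b, window=None):
    """Dynamic time warping distance between sequences a and b.
    `window` is the half-width of a Sakoe-Chiba band; None means no
    constraint, and 0 reduces to Euclidean distance for equal lengths."""
    n, m = len(a), len(b)
    w = max(n, m) if window is None else max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells inside the band are filled; the rest stay infinite.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def predict_1nn(train_x, train_y, query, window=None):
    """Give `query` the label of its nearest training series under DTW."""
    nearest = min(range(len(train_x)),
                  key=lambda k: dtw_distance(train_x[k], query, window))
    return train_y[nearest]


# Toy data (hypothetical): a "peak" class and a "flat" class.
train_x = [[0, 0, 1, 3, 1, 0, 0], [0, 1, 3, 1, 0, 0, 0],
           [1, 1, 1, 1, 1, 1, 1], [0, 1, 1, 1, 1, 1, 0]]
train_y = ["peak", "peak", "flat", "flat"]
print(predict_1nn(train_x, train_y, [0, 0, 0, 1, 3, 1, 0], window=2))  # peak
```

Widening the band makes the distance more tolerant of shifts in timing, at the cost of more computation and a greater risk of pathological alignments.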
Working closely with the Institute of Food Research (IFR), we will attempt to solve two problems involving classifying food types by their molecular spectra (infrared (IR) and nuclear magnetic resonance (NMR)). The first problem involves classifying meat type. The horse meat scandal of 2012/3 has shown that there is an urgent need to strengthen current authenticity testing regimes for meat. IFR have been working closely with a company called Oxford Instruments to develop a new low-cost, bench-top spectrometer called the Pulsar for rapid screening of meat. We will collaborate with IFR to find the best algorithms for performing this classification. The second problem aims to find non-destructive ways of testing whether the content of intact spirits bottles is genuine or fake. Forged alcohol is commonplace, and in recent years there has been an increasing number of serious injuries and even deaths from the consumption of illegally produced spirits. The development of sensor technology to detect this type of fraud would thus have great societal value, and the collaboration with Oxford Instruments offers the potential for the development of portable scanners for product verification.
Our third case study involves classifying electric devices from smart meter data. Currently 25% of the United Kingdom's greenhouse gases are accounted for by domestic energy consumption, such as heating, lighting and appliance use. The government has committed to an 80% reduction of CO2 emissions by 2050, and to meet this target it is requiring the installation of smart energy meters in every household to promote energy saving. The primary output of this investment of billions of pounds in technology will be enormous quantities of data relating to electricity usage. Understanding and intelligently using this data will be crucial if we are to meet the emissions target. We will focus on one part of the analysis: determining whether we can automatically classify the nature of the device(s) consuming electricity at any point in time. This is a necessary first step in better understanding household practices, which is essential for reducing usage.
Planned Impact
We have chosen our case studies to demonstrate the breadth of domains in which time series classification arises, and we hope these will act as a catalyst for other biological, food and climate scientists to work with us and/or use our code. The investigators on this project have a strong track record of working with industry, and we aim to exploit our research to have a direct impact.
The work with the Institute of Food Research has perhaps the greatest potential for immediate impact on society and the economy. The horsemeat scandal shook public confidence in the sector, and the complexity of the international market for meat makes it hard to guard against further occurrences. Devices like Oxford Instruments' Pulsar offer a cost-effective mechanism for screening against contamination. If we can find a better algorithm for classification, there is a simple and direct path to implementation within Pulsar. Forged alcohol is commonplace, and cases vary from simple economic crimes through to fraud with serious health implications. In recent years there has been an increasing number of serious injuries and even deaths from the consumption of poor-quality, illegally produced spirits. The development of sensor technology to detect this type of fraud would thus have great societal value, and the collaboration with Oxford Instruments offers the potential for the development of commercial hardware to facilitate the use of the algorithms our research produces. Improving Pulsar and developing a new product will both have a positive economic and societal impact. Devices like Pulsar also help public engagement with science, as demonstrated by its appearance on the BBC One programme Rip Off Britain http://youtu.be/t8zWLat8NQ0.
The collaborative research with Imperial is part of the important drive to understand the genetic components of obesity. Model species are useful in this respect, as it is possible to directly connect behaviour to genetics in a reproducible way. Hence, if we can automatically detect worms that are exhibiting aberrant behaviour, we can then determine what mutations caused it. Conversely, we can cause mutations in the worm and then observe its behaviour. Both of these tasks require laborious, manual identification of mutants. This project will not perform the experiments; instead, we will help find the best ways of automating this time-consuming task.
Smart meters will soon be in all of our homes collecting detailed data on our electricity usage. This massive investment in technology must yield a significant reduction in our carbon footprint to justify the cost. The key to altering patterns of consumer behaviour is providing useful and relevant information. This in turn requires the ability to extract knowledge from the raw data. We will concentrate on the problem of identifying the nature of devices being used in a household. This offers the potential for constructing more complex models of behaviour based on combined device usage which in turn may lead to more informative advice on how to modify behaviour.
Organisations
- University of East Anglia (Lead Research Organisation)
- University of California, Riverside (Collaboration)
- University of Rennes 1 (Collaboration)
- Alan Turing Institute (Collaboration)
- Scotch Whisky Research Institute (Collaboration)
- Medical Research Council (MRC) (Collaboration)
- Medical Research Council (Project Partner)
- University of California, Riverside (Project Partner)
- Vermont Energy Investment Corporation (Project Partner)
- University of Bath (Project Partner)
- Loughborough University (Project Partner)
- Oxford Instrumental IAG (Project Partner)
- The Whisky Tasting Club (Project Partner)
Publications
Bagnall A
(2015)
Time-Series Classification with COTE: The Collective of Transformation-Based Ensembles
in IEEE Transactions on Knowledge and Data Engineering
Bagnall A
(2020)
Detecting Electric Devices in 3D Images of Bags
Bagnall A
(2017)
The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances
in Data Mining and Knowledge Discovery
Bostrom A
(2015)
Binary Shapelet Transform for Multiclass Time Series Classification
in Proceedings of the 17th International Conference on Big Data Analytics and Knowledge Discovery (DaWaK)
Dau H
(2018)
Optimizing dynamic time warping's window width for time series data mining applications
in Data Mining and Knowledge Discovery
Description | Time series classification problems are numerous and varied. For example, we may want to classify a heartbeat as normal or abnormal, or to determine the type of electrical device from its power usage. This grant focused on developing algorithms for time series classification that work well across widely different problem domains without expert knowledge. Our approach is based on combining classifiers built on different representations of the data to form a collective. Each member of the collective specialises in a different type of discriminatory feature. For example, dictionary-based methods classify based on the frequency of repeating patterns, whereas shapelet-based methods classify on whether a particular shape is present in the series. Our latest algorithm, the hierarchical vote collective of transformation-based ensembles (HIVE-COTE), is significantly more accurate than all other published approaches, including deep-learning-based convolutional neural networks, when compared on a diverse set of data sets commonly used for evaluation. Our work builds on others' existing research (it includes two algorithms developed by other research groups) and allows for better exploratory analysis of a particular problem through a deconstruction of the components' relative performance. However, this is a fast-moving field, and researchers around the world are working on developing better algorithms for this task. The work funded by this grant has provided a framework for rigorously testing new ideas in a reproducible way: we have experimentally verified the results of numerous researchers through collaborative work using a common code base, and detected flaws in the methodology of a few projects which led to biased results. The work we were able to complete on this grant has led to new collaborations with researchers on four continents and will be continued and developed for the foreseeable future.
Currently we are using the outputs of the grant to develop techniques to detect forged spirits non-intrusively, to classify insects based on the sounds they make and to identify electric devices in luggage from 3D x-ray images. Our goal now is to continue improving the algorithms whilst finding more application areas where we can have a genuine impact. |
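The description contrasts two kinds of discriminatory feature and a collective that combines them. The following is a minimal, hypothetical Python sketch of those three ideas under simplifying assumptions, not the project's implementation. The shapelet feature is a plain minimum squared Euclidean distance without z-normalisation. The dictionary feature is a SAX-based bag of patterns; the dictionary components in the COTE family (for example BOSS) use a different, Fourier-based symbolic transform. The vote weights each module by its estimated accuracy raised to a power `alpha`, set to 4 here as an assumption based on published descriptions of HIVE-COTE's weighting. All function names are illustrative and are not the API of the UEA code base or sktime.

```python
import statistics
from collections import Counter


def shapelet_distance(series, shapelet):
    """Shapelet feature (sketch): the smallest squared Euclidean distance
    between the shapelet and any equal-length window of the series. A small
    value means the shape is (approximately) present somewhere in the series.
    Windows are not z-normalised here, to keep the example short."""
    length = len(shapelet)
    return min(
        sum((series[start + k] - shapelet[k]) ** 2 for k in range(length))
        for start in range(len(series) - length + 1)
    )


def sax_word(window, segments=3, breakpoints=(-0.43, 0.43)):
    """Turn one window into a short word: z-normalise it, average it over
    `segments` equal pieces (any remainder points are dropped), then map each
    piece mean to 'a', 'b' or 'c' using roughly equiprobable Gaussian cuts."""
    mean = statistics.fmean(window)
    std = statistics.pstdev(window) or 1.0  # constant windows stay at zero
    z = [(v - mean) / std for v in window]
    size = len(z) // segments
    letters = []
    for s in range(segments):
        piece_mean = statistics.fmean(z[s * size:(s + 1) * size])
        letters.append("abc"[sum(piece_mean > b for b in breakpoints)])
    return "".join(letters)


def bag_of_patterns(series, window=6, segments=3):
    """Dictionary feature (sketch): a histogram of the words produced by every
    sliding window, so classification depends on how often patterns repeat,
    not on where in the series they occur."""
    return Counter(
        sax_word(series[i:i + window], segments)
        for i in range(len(series) - window + 1)
    )


def weighted_vote(module_probs, module_acc, alpha=4):
    """Combine per-module class probabilities, weighting each module by its
    estimated accuracy raised to the power `alpha`, then renormalise."""
    classes = set().union(*module_probs)
    scores = {
        c: sum((acc ** alpha) * probs.get(c, 0.0)
               for probs, acc in zip(module_probs, module_acc))
        for c in classes
    }
    total = sum(scores.values()) or 1.0
    return {c: s / total for c, s in scores.items()}


# Toy usage with made-up numbers (not results from the project).
peak = [0, 0, 1, 3, 1, 0, 0, 0, 1, 3, 1, 0]
print(shapelet_distance(peak, [1, 3, 1]))   # 0: the shape occurs exactly
print(bag_of_patterns([5] * 10))            # Counter({'bbb': 5})
print(weighted_vote([{"a": 0.9, "b": 0.1}, {"a": 0.2, "b": 0.8}], [0.9, 0.6]))
```

Raising accuracy to a power sharpens the weighting, so a module that is only slightly better on the training estimate gains noticeably more say in the final vote.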
Exploitation Route | Our findings have been widely disseminated. The academic paper describing our comparison of algorithms (the "bakeoff") has been downloaded 20,000 times in two years. Our code has been downloaded hundreds of times and been used in research and teaching (both undergrad and postgrad) across the world. Several research groups are working on algorithms with the goal of improving on our results. We are currently looking at revisions of the algorithms, but our priorities are looking for applications where our approach may enhance standard methods and for widening the scope for the type of problems we can address. We are working with researchers across the world on a range of problems beyond the scope of this grant. For example, one group are looking to use HIVE-COTE to automatically classify marine mammals based on the audio signal. This is essential for conservation purposes. We are also helping a group trying to develop techniques for predicting crop yield based on environmental data, and a group using the algorithms for problems related to forest preservation and air quality monitoring. |
Sectors | Aerospace Defence and Marine Agriculture Food and Drink Energy Healthcare Security and Diplomacy Other |
URL | https://github.com/alan-turing-institute/sktime |
Description | The output of the HIVE-COTE grant has triggered a large growth in research into time series classification algorithms. It has helped us forge new collaborations in the US, Australia, Brazil, Ireland, France, Spain and Germany. The HIVE-COTE algorithm has continued to develop and is still significantly more accurate than all other approaches on a range of standard test problems for both multivariate and univariate time series classification. The sktime/aeon toolkit implements the algorithms developed in this grant. It has proved popular in both industry and academia and is helping disseminate our ideas. The toolkit has recently received further EPSRC funding to support its use in scientific research. The code base and the HIVE-COTE algorithms have been used by practitioners from many fields, including food science, medical signal processing, manufacturing, chemometrics and econometrics. |
First Year Of Impact | 2017 |
Sector | Aerospace, Defence and Marine,Agriculture, Food and Drink,Chemicals,Digital/Communication/Information Technologies (including Software),Education,Energy,Financial Services, and Management Consultancy |
Impact Types | Economic |
Description | BBSRC iCASE Studentship |
Amount | £90,000 (GBP) |
Organisation | Scotch Whisky Research Institute |
Sector | Private |
Country | United Kingdom |
Start | 09/2016 |
End | 09/2021 |
Description | Norwich Research Park Bioscience Doctoral Training Partnership |
Amount | £70,000 (GBP) |
Organisation | Biotechnology and Biological Sciences Research Council (BBSRC) |
Sector | Public |
Country | United Kingdom |
Start | 09/2017 |
End | 09/2021 |
Description | Shapeseg: segmentation and classification of x-ray imagery |
Amount | £70,000 (GBP) |
Funding ID | ACC106973 |
Organisation | Defence Science & Technology Laboratory (DSTL) |
Sector | Public |
Country | United Kingdom |
Start | 04/2018 |
End | 12/2018 |
Description | aeon: a toolkit for machine learning with time series |
Amount | £534,660 (GBP) |
Funding ID | EP/W030756/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 09/2022 |
End | 09/2023 |
Title | The revised Multivariate Time Series Classification Archive |
Description | In 2022 we worked with researchers at the University of California, Riverside, to revise the archive of multivariate problems used to assess multivariate time series classification algorithms. |
Type Of Material | Database/Collection of data |
Year Produced | 2023 |
Provided To Others? | Yes |
Impact | The database is currently being used by researchers, and is beginning to be referenced. |
Title | Time Series Classification Repository |
Description | This is an extension of the UCR Time Series Classification and Clustering data repository that will be jointly maintained by UEA and UCR at the new website www.timeseriesclassification.com, a work in progress. |
Type Of Material | Database/Collection of data |
Year Produced | 2015 |
Provided To Others? | Yes |
Impact | work in progress |
URL | http://www.timeseriesclassification.com |
Title | Updated Time Series Classification Repository |
Description | We have extended the timeseriesclassification.com repository with new data sets and new formats. |
Type Of Material | Database/Collection of data |
Year Produced | 2023 |
Provided To Others? | Yes |
Impact | Over the years, the database has been downloaded thousands of times and been used in hundreds of papers. |
URL | https://arxiv.org/abs/2304.13029 |
Description | Classifying early onset Alzheimer's |
Organisation | Medical Research Council (MRC) |
Department | MRC Cognition and Brain Sciences Unit |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | We have begun to look at data provided by Richard Henson with the goal of seeing whether we can contribute to the task of classifying patients with early onset Alzheimer's. |
Collaborator Contribution | They have provided us with data from 50 patients |
Impact | No outputs yet; the collaboration has just begun. |
Start Year | 2017 |
Description | Classifying insects |
Organisation | University of California, Riverside |
Country | United States |
Sector | Academic/University |
PI Contribution | We have been working with Eamonn Keogh of UCR to apply our algorithms to the problem of insect classification from sound snippets |
Collaborator Contribution | UCR have provided us with over 100,000 sound recordings of insects |
Impact | This collaboration led directly to a successful bid for a DTP studentship. |
Start Year | 2016 |
Description | Dictionary based classifiers for time series classification |
Organisation | University of Rennes 1 |
Country | France |
Sector | Academic/University |
PI Contribution | Simon Malinowski and Romain Tavenard from Rennes invited me to talk at a workshop in Italy. Further discussion led to a collaboration on developing a specific type of algorithm. This led to a paper which is currently under review. |
Collaborator Contribution | They brought expertise and code in a specific form of algorithm used in computer vision, which we adapted for time series classification. |
Impact | Paper accepted for the Journal Intelligent Data Analysis, in print |
Start Year | 2016 |
Description | Scotch Whisky Research Institute |
Organisation | Scotch Whisky Research Institute |
Country | United Kingdom |
Sector | Private |
PI Contribution | SWRI have supported an extension of research related to one of the work packages in the form of financial support of £25,000 for a BBSRC iCASE studentship to start in September 2016. |
Collaborator Contribution | SWRI will provide advice and guidance for our attempt to develop a mechanism for non-intrusively detecting forged spirits. |
Impact | This collaboration has just begun, so there are as yet no outcomes |
Start Year | 2015 |
Description | sktime: a python tool kit for time series analysis |
Organisation | Alan Turing Institute |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | This project began as a result of an output of the grant: the bakeoff paper. The Alan Turing Institute contacted us wishing to develop a toolkit containing the algorithms we designed on the COTE project. It began in 2018 and is ongoing in March 2019, with grant members Tony Bagnall and Jason Lines seconded to the Turing for development |
Collaborator Contribution | Jason Lines is 100% seconded to the Turing and Tony Bagnall is 10% seconded for 7 months for development of the code base. We are currently reproducing the published results of the Java code base and will then extend the functionality to consider a wider set of use cases. |
Impact | This is a new collaboration that has yet to have any public outputs |
Start Year | 2018 |
Title | The UEA Time Series Classification Codebase (2018) |
Description | The code contains implementations of the latest developments in time series classification, from UEA and other researchers worldwide. We moved the codebase from Bitbucket to GitHub, and it has undergone an extensive redesign to improve usability and remove redundant features. It is the basis for all experiments we have published and facilitates complete reproducibility. |
Type Of Technology | Software |
Year Produced | 2018 |
Open Source License? | Yes |
Impact | Over the years, our code has been downloaded hundreds of times and has been used in many publications. It contains the code used in our bakeoff paper, that has been downloaded 20,000 times in two years. |
URL | http://www.timeseriesclassification.com |
Title | Time Series Classification WEKA Code Base |
Description | This freely available code base contains implementations of over 20 recently proposed time series classification algorithms and methods to recreate the experiments reported in our papers |
Type Of Technology | Software |
Year Produced | 2016 |
Open Source License? | Yes |
Impact | The code has been downloaded by over 200 researchers worldwide. |
URL | https://bitbucket.org/TonyBagnall/time-series-classification |
Description | 13th International Conference on Hybrid Artificial Intelligent Systems |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | I gave a keynote talk at the HAIS conference in Oviedo in 2018 describing the work we did on this grant. The talk was very well received, particularly by participants from industry, and it has led to two new potential collaborations. |
Year(s) Of Engagement Activity | 2018 |
URL | https://hais2018.uniovi.es/ |
Description | 2nd ECML/PKDD Workshop on Advanced analytics and Learning on Temporal Data |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | I gave an invited talk at this workshop in 2016 to describe the early work on this project. It led to several new contacts and collaborations. |
Year(s) Of Engagement Activity | 2016 |
URL | https://aaltd16.irisa.fr/invited-speakers/ |
Description | 3rd ECML/PKDD Workshop on Advanced Analytics and Learning on Temporal Data |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | We helped organise a workshop on advanced analytics and learning on temporal data, where we demoed our code and launched the new archive datasets. Over 100 people attended the full-day workshop, at which 20 papers were presented as posters or talks. We will repeat the exercise in 2019. |
Year(s) Of Engagement Activity | 2018 |
URL | https://project.inria.fr/aaldt18/ |
Description | BiDAS 3 - Third Bilbao Data Science Workshop |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | I gave an invited talk at this workshop in Bilbao, disseminating the work we did on the grant and making new contacts. The two-day workshop involved invited international speakers and postgraduate posters and presentations. It led to two potential new collaborations and helped disseminate our work to a wider audience from the statistics community. |
Year(s) Of Engagement Activity | 2018 |
URL | https://wp.bcamath.org/bidas3/ |
Description | CNRS/Mastadons Interdisciplinary Workshop on Time Series Analysis |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | This workshop in Paris consisted of invited talks from a range of disciplines, and was intended to stimulate new collaborations. |
Year(s) Of Engagement Activity | 2016 |
URL | https://indico.in2p3.fr/event/13186/ |
Description | Invited talk at the International Federation of Classification Societies, Porto, 2022 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | I was invited to give a talk at the IFCS conference, 2022, which included algorithms that were directly the result of this grant. There were 50-100 researchers in the audience from around the world. |
Year(s) Of Engagement Activity | 2022 |
URL | https://ifcs2022.fep.up.pt/invited-sessions/ |