The Collective of Transform Ensembles (COTE) for Time Series Classification

Lead Research Organisation: University of East Anglia
Department Name: Computing Sciences

Abstract

Time series classification is the problem of predicting an outcome from a series of ordered data. For example, given a series of electronic readings from a sample of meat, the classification problem could be to determine whether that sample is pure beef or whether it has been adulterated with some other meat. Alternatively, given a series of electricity usage readings, the classification problem could be to determine which type of device generated those readings. Time series classification problems arise in all areas of science, and we have worked on problems involving ECG and EEG data, chemical concentration readings, astronomical measurements, otolith outlines, electricity usage, food spectrographs, hand and bone radiograph data and mutant worm motion. The algorithm we have developed to do this, the Collective of Transform Ensembles (COTE), is significantly better than any other technique proposed in the literature (when assessed on 80 data sets used in the literature). This project looks to improve COTE further and to apply it to three problem domains of genuine importance to society.
In collaboration with Imperial, we will look at classifying Caenorhabditis elegans via motion traces. C. elegans is a nematode worm commonly used as a model organism in the study of genetics. We will help develop an automated classifier for C. elegans mutant types based on their motion, with the objective of identifying genes that regulate appetite. This classifier will automate a task previously done manually at great cost and will uncover conserved regulators of appetite in a model organism in which functional dissection is possible at the level of behaviour, neural circuitry, and fat storage. In the long term, this may give insights into the genetic component of human obesity.
Working closely with the Institute of Food Research (IFR), we will attempt to solve two problems involving classifying food types by their molecular spectra (infrared, IR, and nuclear magnetic resonance, NMR). The first problem involves classifying meat type. The horse meat scandal of 2012/3 has shown that there is an urgent need to increase current authenticity testing regimes for meat. IFR have been working closely with a company called Oxford Instruments to develop a new low-cost, bench-top spectrometer called the Pulsar for rapid screening of meat. We will collaborate with IFR to find the best algorithms for performing this classification. The second problem aims to find non-destructive ways for testing whether the content of intact spirits bottles is genuine or fake. Forged alcohol is commonplace, and in recent years there has been an increasing number of serious injuries and even deaths from the consumption of illegally produced spirits. The development of sensor technology to detect this type of fraud would thus have great societal value, and the collaboration with Oxford Instruments offers the potential for the development of portable scanners for product verification.
Our third case study involves classifying electric devices from smart meter data. Currently 25% of the United Kingdom's greenhouse gas emissions are accounted for by domestic energy consumption, such as heating, lighting and appliance use. The government has committed to an 80% reduction of CO2 emissions by 2050, and to meet this is requiring the installation of smart energy meters in every household to promote energy saving. The primary output of this investment of billions of pounds in technology will be enormous quantities of data relating to electricity usage. Understanding and intelligently using this data will be crucial if we are to meet the emissions target. We will focus on one part of the analysis, which is the problem of determining whether we can automatically classify the nature of the device(s) currently consuming electricity at any point in time. This is a necessary first step in better understanding household practices, which is essential for reducing usage.

Planned Impact

We have chosen our case studies to demonstrate the breadth of domains in which time series classification arises and we hope these will act as a catalyst for other biological, food and climate scientists to work with us and/or our code. The investigators on this project have a strong track record of working with industry, and we aim to exploit our research to have a direct impact.

The work with the Institute of Food Research has perhaps the greatest potential for immediate impact on society and the economy. The horsemeat scandal shook public confidence in the sector, and the complexity of the international market for meat makes it hard to guard against further occurrences. Devices like Oxford Instruments' Pulsar offer a cost-effective mechanism for screening against contamination. If we can find a better algorithm for classification, there is a simple and direct path to implementation within Pulsar. Forged alcohol is commonplace, and cases vary from simple economic crimes through to fraud with serious health implications. In recent years there has been an increasing number of serious injuries and even deaths from the consumption of poor-quality, illegally produced spirits. The development of sensor technology to detect this type of fraud would thus have great societal value, and the collaboration with Oxford Instruments offers the potential for the development of commercial hardware to facilitate the usage of the algorithms our research produces. Improving Pulsar and developing a new product will both have a positive economic and societal impact. Devices like Pulsar help with public engagement with science, as demonstrated by its appearance on the BBC1 programme Ripoff Britain http://youtu.be/t8zWLat8NQ0.

The collaborative research with Imperial is part of the important drive to understand the genetic components of obesity. Model species are useful in this respect as it is possible to directly connect behaviour to genetics in a reproducible way. Hence, if we can automatically detect worms that are exhibiting aberrant behaviour, we can then determine what mutations caused it. Conversely, we can cause mutations in the worm and then observe behaviour. Both of these tasks require laborious, manual identification of mutants. This project will not be involved with performing the experiments. We will instead help look at the best ways of automating this time-consuming task.

Smart meters will soon be in all of our homes collecting detailed data on our electricity usage. This massive investment in technology must yield a significant reduction in our carbon footprint to justify the cost. The key to altering patterns of consumer behaviour is providing useful and relevant information. This in turn requires the ability to extract knowledge from the raw data. We will concentrate on the problem of identifying the nature of devices being used in a household. This offers the potential for constructing more complex models of behaviour based on combined device usage which in turn may lead to more informative advice on how to modify behaviour.

Publications

 
Description Time series classification problems are numerous and varied. For example, we may want to classify a heart beat as normal or abnormal, or determine the type of electrical device from its power usage. This grant focused on developing algorithms for time series classification problems that work well across widely different problem domains without expert knowledge. Our approach is based on combining classifiers on different representations of the data to form a collective. Each member of the collective specialises in different types of discriminatory features. For example, dictionary-based methods classify based on the number of repeating patterns, whereas shapelet-based methods classify on whether a particular shape is in the series or not. Our latest algorithm, the hierarchical vote collective of transformation-based ensembles (HIVE-COTE), is significantly more accurate than all other published approaches, including deep learning based convolutional neural networks, when compared on a diverse set of test data sets that are commonly used for evaluation. Our work builds on others' existing research (it includes two algorithms developed by other research groups) and allows for a better understanding and exploratory analysis of a particular problem through a deconstruction of the components' relative performance.
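
As a rough illustration of the collective idea described above, the following Python sketch builds two toy representations of a series (a shapelet-style minimum distance to a shape, and a dictionary-style count of discretised two-symbol patterns), trains a simple classifier on each, and combines their probability estimates in a weighted vote. It is not the COTE or HIVE-COTE implementation: the function and class names, the nearest-centroid base classifier, the synthetic data and the choice to weight each member by its training accuracy are all assumptions made for this example.

```python
import numpy as np


def shapelet_features(X, shapelet):
    """Shapelet-style view: minimum distance from each series to one shape."""
    m = len(shapelet)
    # windows has shape (n_series, n_windows, m)
    windows = np.lib.stride_tricks.sliding_window_view(X, m, axis=1)
    dists = np.linalg.norm(windows - shapelet, axis=2)
    return dists.min(axis=1, keepdims=True)


def dictionary_features(X, edges=(-0.5, 0.5)):
    """Dictionary-style view: counts of two-symbol 'words' after discretisation."""
    n_symbols = len(edges) + 1
    symbols = np.digitize(X, edges)                       # values in 0..n_symbols-1
    words = symbols[:, :-1] * n_symbols + symbols[:, 1:]  # code for each bigram
    counts = [np.bincount(row, minlength=n_symbols ** 2) for row in words]
    return np.stack(counts).astype(float)


class NearestCentroid:
    """Toy base classifier (an assumption): softmax over negative centroid distance."""

    def fit(self, F, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([F[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict_proba(self, F):
        d = np.linalg.norm(F[:, None, :] - self.centroids_[None, :, :], axis=2)
        scores = np.exp(-(d - d.min(axis=1, keepdims=True)))  # shift avoids underflow
        return scores / scores.sum(axis=1, keepdims=True)


class WeightedCollective:
    """One classifier per representation, combined by an accuracy-weighted vote."""

    def __init__(self, transforms):
        self.transforms = transforms

    def fit(self, X, y):
        self.members_, self.weights_ = [], []
        for transform in self.transforms:
            F = transform(X)
            clf = NearestCentroid().fit(F, y)
            predicted = clf.classes_[clf.predict_proba(F).argmax(axis=1)]
            self.members_.append(clf)
            # Weight = training accuracy (an assumption made for this sketch).
            self.weights_.append(float(np.mean(predicted == y)))
        self.classes_ = self.members_[0].classes_
        return self

    def predict(self, X):
        total = sum(
            w * clf.predict_proba(t(X))
            for w, clf, t in zip(self.weights_, self.members_, self.transforms)
        )
        return self.classes_[total.argmax(axis=1)]


def make_data(n, length, bump, rng):
    """Synthetic problem: class 1 series contain the bump at a random position."""
    X = rng.normal(0.0, 0.1, size=(n, length))
    y = np.arange(n) % 2
    for i in np.flatnonzero(y == 1):
        start = rng.integers(0, length - len(bump))
        X[i, start:start + len(bump)] += bump
    return X, y


rng = np.random.default_rng(0)
bump = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
X_train, y_train = make_data(60, 50, bump, rng)
X_test, y_test = make_data(40, 50, bump, rng)

collective = WeightedCollective(
    [lambda X: shapelet_features(X, bump), dictionary_features]
).fit(X_train, y_train)
accuracy = float(np.mean(collective.predict(X_test) == y_test))
print("member weights:", collective.weights_)
print("test accuracy:", accuracy)
```

Inspecting `collective.weights_` gives a crude version of the deconstruction mentioned above: it shows which representation carries the discriminatory information for a given problem.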

However, this is a fast-moving field, and researchers around the world are working on developing better algorithms for this task. Our work funded by this grant has provided a framework for rigorously testing new ideas in a reproducible way: we have experimentally verified the results of numerous researchers through collaborative work using a common code base, and detected flaws in the methodology of a few projects which led to biased results. The work we were able to complete on this grant has led to new collaborations with researchers on four continents and will be continued and developed for the foreseeable future. Currently we are using the outputs of the grant to develop techniques to detect forged spirits non-intrusively, to classify insects based on the sounds they make and to identify electric devices in luggage based on a 3D x-ray. Our goal now is to continue improving the algorithms whilst finding more application areas where we can have a genuine impact.
Exploitation Route Our findings have been widely disseminated. The academic paper describing our comparison of algorithms (the "bakeoff") has been downloaded 20,000 times in two years. Our code has been downloaded hundreds of times and been used in research and teaching (both undergrad and postgrad) across the world. Several research groups are working on algorithms with the goal of improving on our results. We are currently looking at revisions of the algorithms, but our priorities are looking for applications where our approach may enhance standard methods and for widening the scope for the type of problems we can address. We are working with researchers across the world on a range of problems beyond the scope of this grant. For example, one group are looking to use HIVE-COTE to automatically classify marine mammals based on the audio signal. This is essential for conservation purposes. We are also helping a group trying to develop techniques for predicting crop yield based on environmental data, and a group using the algorithms for problems related to forest preservation and air quality monitoring.
Sectors Aerospace, Defence and Marine; Agriculture, Food and Drink; Energy; Healthcare; Security and Diplomacy; Other

URL https://github.com/alan-turing-institute/sktime
 
Description The output of the HIVE-COTE grant has triggered a huge growth in research into time series classification algorithms. It has helped us forge new collaborations in the US, Australia, Brazil, Ireland, France, Spain and Germany. The HIVE-COTE algorithm has continued to develop and is still significantly more accurate than all other approaches on a range of standard test problems for both multivariate and univariate time series classification. The sktime/aeon toolkit implements the algorithms developed in this grant. It has proved popular in both industry and academia and is helping disseminate our ideas. The toolkit has recently been further funded by EPSRC to support its use in scientific research. The code base and the HIVE-COTE algorithms have been used by practitioners from many fields, including food science, medical signal processing, manufacturing, chemometrics and econometrics.
First Year Of Impact 2017
Sector Aerospace, Defence and Marine; Agriculture, Food and Drink; Chemicals; Digital/Communication/Information Technologies (including Software); Education; Energy; Financial Services, and Management Consultancy
Impact Types Economic

 
Description BBSRC iCASE Studentship
Amount £90,000 (GBP)
Organisation Scotch Whisky Research Institute 
Sector Private
Country United Kingdom
Start 10/2016 
End 09/2021
 
Description Norwich Research Park Bioscience Doctoral Training Partnership
Amount £70,000 (GBP)
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 10/2017 
End 09/2021
 
Description Shapeseg: segmentation and classification of x-ray imagery
Amount £70,000 (GBP)
Funding ID ACC106973 
Organisation Defence Science & Technology Laboratory (DSTL) 
Sector Public
Country United Kingdom
Start 05/2018 
End 12/2018
 
Description sktime: a toolkit for machine learning with time series
Amount £534,660 (GBP)
Funding ID EP/W030756/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 09/2022 
End 09/2025
 
Title Revised 2018 Time Series Classification Repository 
Description In 2018 we worked with researchers at the University of California, Riverside, to enhance the repository so that it now contains 128 datasets 
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact Over the years, the database has been downloaded thousands of times and been used in hundreds of papers. 
URL https://arxiv.org/abs/1810.07758
 
Title The new Multivariate Time Series Classification Archive (2018) 
Description In 2018 we worked with researchers at the University of California, Riverside, to develop a new archive of multivariate problems to assess multivariate time series classification algorithms 
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact The database is currently being used by researchers, and is beginning to be referenced. 
URL https://arxiv.org/abs/1810.07758
 
Title Time Series Classification Repository 
Description This is an extension of the UCR Time Series Classification and Clustering data repository that will be jointly maintained by UEA and UCR at the new website www.timeseriesclassification.com, a work in progress. 
Type Of Material Database/Collection of data 
Year Produced 2015 
Provided To Others? Yes  
Impact work in progress 
URL http://www.timeseriesclassification.com
 
Description Classifying early-onset Alzheimer's
Organisation Medical Research Council (MRC)
Department MRC Cognition and Brain Sciences Unit
Country United Kingdom 
Sector Academic/University 
PI Contribution We have begun to look at data provided by Richard Henson with the goal of seeing if we can contribute to the task of classifying patients with early-onset Alzheimer's
Collaborator Contribution They have provided us with data from 50 patients
Impact no outputs yet, it has just begun
Start Year 2017
 
Description Classifying insects 
Organisation University of California, Riverside
Country United States 
Sector Academic/University 
PI Contribution We have been working with Eamonn Keogh of UCR to apply our algorithms to the problem of insect classification from sound snippets
Collaborator Contribution UCR have provided us with over 100,000 sound recordings of insects
Impact This collaboration led directly to a successful bid for a DTP studentship.
Start Year 2016
 
Description Dictionary based classifiers for time series classification 
Organisation University of Rennes 1
Country France 
Sector Academic/University 
PI Contribution Simon Malinowski and Romain Tavenard from Rennes invited me to talk at a workshop in Italy. Further discussion led to a collaboration on developing a specific type of algorithm. This led to a paper which is currently under review
Collaborator Contribution They brought expertise and code in a specific form of algorithm used in Computer Vision which we adapted for Time Series Classification
Impact Paper accepted for the Journal Intelligent Data Analysis, in print
Start Year 2016
 
Description Scotch Whisky Research Institute 
Organisation Scotch Whisky Research Institute
Country United Kingdom 
Sector Private 
PI Contribution SWRI have supported an extension of research related to one of the work packages in the form of financial support of £25,000 for a BBSRC iCASE studentship to start in September 2016.
Collaborator Contribution SWRI will provide advice and guidance for our attempt to develop a mechanism for non-intrusively detecting forged spirits.
Impact This collaboration has just begun, so there are as yet no outcomes
Start Year 2015
 
Description sktime: a Python toolkit for time series analysis
Organisation Alan Turing Institute
Country United Kingdom 
Sector Academic/University 
PI Contribution This project began as a result of an output of the grant: the bakeoff paper. The Alan Turing Institute contacted us wishing to develop a toolkit containing the algorithms we designed on the COTE project. It began in 2018 and is ongoing in March 2019, with grant members Tony Bagnall and Jason Lines seconded to the Turing for development
Collaborator Contribution Jason Lines is 100% seconded to the Turing and Tony Bagnall is 10% seconded for 7 months for development of the code base. Currently we are reproducing the published results using the Java code base and will then extend functionality to consider a wider set of use cases
Impact This is a new collaboration that has yet to have any public outputs
Start Year 2018
 
Title The UEA Time Series Classification Codebase (2018) 
Description The code contains implementations of the latest developments in time series classification, from UEA and other researchers worldwide. We moved the codebase from Bitbucket to GitHub and it has undergone an extensive redesign to improve usability and remove redundant features. It is the basis for all experiments we have published and facilitates complete reproducibility.
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact Over the years, our code has been downloaded hundreds of times and has been used in many publications. It contains the code used in our bakeoff paper, which has been downloaded 20,000 times in two years.
URL http://www.timeseriesclassification.com
 
Title Time Series Classification WEKA Code Base 
Description This freely available code base contains implementations of over 20 recently proposed time series classification algorithms and methods to recreate the experiments reported in our papers 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact The code has been downloaded by over 200 researchers worldwide. 
URL https://bitbucket.org/TonyBagnall/time-series-classification
 
Description 13th International Conference on Hybrid Artificial Intelligent Systems 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact I gave a keynote talk at the HAIS conference in Oviedo in 2018 describing the work we did on this grant. The talk was very well received, particularly by participants from industry, and it has led to two new potential collaborations
Year(s) Of Engagement Activity 2018
URL https://hais2018.uniovi.es/
 
Description 2nd ECML/PKDD Workshop on Advanced analytics and Learning on Temporal Data 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact I gave an invited talk at this workshop in 2016 to describe the early work on this project. It led to several new contacts and collaborations
Year(s) Of Engagement Activity 2016
URL https://aaltd16.irisa.fr/invited-speakers/
 
Description 3rd ECML/PKDD Workshop on Advanced Analytics and Learning on Temporal Data
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact We helped organise a workshop on advanced analytics and learning on temporal data, where we demoed our code and launched the new archive datasets. Over 100 people attended the full-day workshop, where 20 papers were presented as posters or presentations. We will repeat the exercise for 2019.
Year(s) Of Engagement Activity 2018
URL https://project.inria.fr/aaldt18/
 
Description BiDAS 3 - Third Bilbao Data Science Workshop 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact I gave an invited talk at this workshop in Bilbao, disseminating the work we did on the grant and making new contacts. The two-day workshop involved invited international speakers and postgraduate posters and presentations. It led to two potential new collaborations and helped disseminate our work to a wider audience from the statistics community
Year(s) Of Engagement Activity 2018
URL https://wp.bcamath.org/bidas3/
 
Description CNRS/Mastadons Interdisciplinary Workshop on Time Series Analysis 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This workshop in Paris consisted of invited talks from a range of disciplines, and was intended to stimulate new collaborations.
Year(s) Of Engagement Activity 2016
URL https://indico.in2p3.fr/event/13186/
 
Description Invited talk at the International Federation of Classification Societies, Porto, 2022 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact I was invited to give a talk at the IFCS conference, 2022, which included algorithms that were directly the result of this grant. There were 50-100 researchers in the audience from around the world.
Year(s) Of Engagement Activity 2022
URL https://ifcs2022.fep.up.pt/invited-sessions/