Machine learning algorithms for detecting forged alcohol

Lead Research Organisation: University of East Anglia
Department Name: Computing Sciences

Abstract

This project involves developing algorithms to detect forged spirits non-intrusively, i.e. without opening the bottle and testing the contents. Up to a quarter of licensed premises in some parts of the UK have been found to have counterfeit alcohol for sale. Forged spirits represent a health risk to the consumer, because illegally produced spirits may contain contaminants such as methanol, and an economic risk to the government, because of the avoidance of duty. We aim to develop a hand-held device that shines light through a bottle and applies machine learning and data mining algorithms to the resulting near-infrared spectra to address the following questions:

a) Can we classify whether the level of methanol is legal or not?
b) Can we classify whether the level of alcohol is correct or not?
c) Can we predict the actual alcohol concentration through regression techniques?

Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name
BB/M011216/1 | | | 01/10/2015 | 31/03/2024 |
1786465 | Studentship | BB/M011216/1 | 01/10/2016 | 30/06/2021 | James Large
 
Description The general aim of this project is to determine whether it is possible to detect the forgery or adulteration of spirits (especially whisky) non-invasively using vibrational spectroscopy, and to refine and improve the algorithms and methods for doing so. That is, without opening the bottle and thus rendering the sample unsalable, can we do so by measuring the properties of light shone through the bottle? If a sample were removed from the bottle, a number of advanced methods, chromatography for example, could be used to determine legitimacy with very high confidence, given reference samples. However, vibrational spectroscopy generally has lower discriminatory power, and in the intended use case noise from a range of sources affects the measured spectra: the shape, colour, thickness and overall size of the bottle can all affect how light is transmitted through to the receiver.

As summarised in 'Large, James, et al. "Detecting Forged Alcohol Non-invasively Through Vibrational Spectroscopy and Machine Learning." Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, Cham, 2018', we have found that it does appear possible to determine a key factor in the legitimacy of spirits, their alcohol content, using the setup described. In fact, classical chemometric methods such as Partial Least Squares (PLS) still work well on the spectra when they are standardised, seemingly resisting a large amount of the noise introduced by the experimental setup.
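As a rough illustration of the kind of pipeline described above, the sketch below standardises synthetic spectra and fits a single-response PLS model to predict alcohol concentration. Everything in it is an assumption made for illustration: the simulated spectra, the per-bottle gain and offset, the reading of "standardised" as per-spectrum zero mean and unit variance, the choice of three components, and the hand-written PLS1 routine. It is not the code or data used in the project; in practice a library implementation such as scikit-learn's PLSRegression would normally be used.

```python
import numpy as np


def standardise_rows(X):
    """Give each spectrum zero mean and unit variance (assumed reading of 'standardised')."""
    return (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)


def pls1_fit(X, y, n_components):
    """Fit single-response PLS (PLS1) by successive deflation of X and y."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    weights, loadings, y_loadings = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                 # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                    # score of every sample on this component
        tt = t @ t
        p = Xk.T @ t / tt             # X loading
        c = (yk @ t) / tt             # y loading
        Xk = Xk - np.outer(t, p)      # remove what this component explained
        yk = yk - c * t
        weights.append(w)
        loadings.append(p)
        y_loadings.append(c)
    W, P, q = np.array(weights).T, np.array(loadings).T, np.array(y_loadings)
    coef = W @ np.linalg.solve(P.T @ W, q)  # regression vector in spectrum space
    return coef, x_mean, y_mean


def pls1_predict(model, X):
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean


# Synthetic stand-in data (hypothetical): a sloped baseline plus one absorption
# band whose height scales with alcohol fraction, distorted per bottle.
rng = np.random.default_rng(0)
n_samples, n_wavelengths = 250, 100
grid = np.linspace(0.0, 1.0, n_wavelengths)
baseline = 1.0 + 0.5 * grid
band = np.exp(-0.5 * ((grid - 0.6) / 0.05) ** 2)
abv = rng.uniform(0.30, 0.50, n_samples)
gain = rng.uniform(0.5, 1.5, n_samples)[:, None]     # e.g. glass thickness or colour
offset = rng.uniform(-0.2, 0.2, n_samples)[:, None]
spectra = gain * (baseline + abv[:, None] * band) + offset
spectra += rng.normal(0.0, 0.01, spectra.shape)

X = standardise_rows(spectra)
model = pls1_fit(X[:200], abv[:200], n_components=3)
pred = pls1_predict(model, X[200:])

mae = float(np.mean(np.abs(pred - abv[200:])))
baseline_mae = float(np.mean(np.abs(abv[:200].mean() - abv[200:])))
print(f"PLS MAE: {mae:.4f}  mean-predictor MAE: {baseline_mae:.4f}")
```

In this toy setup, per-spectrum standardisation cancels the simulated gain and offset exactly, which is one plausible reason standardised spectra might tolerate bottle-to-bottle variation; real bottles will not behave this simply.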

Work that is not yet published at the time of writing has shown that ensembling multiple models found to be strong in those initial benchmarking experiments improved their probabilistic output, an important factor in the final use case of an official testing samples that may be seized. More meaningful probabilistic outputs from classifiers, effectively how confident they are in their predictions, allow the official to make more informed decisions based on their circumstances.
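One simple way to ensemble probabilistic outputs is sketched below: average the class probabilities of the member models and score the result by negative log-likelihood. The member outputs, the labels, the helper names and the 0.6 decision threshold are all hypothetical, and the source does not say which ensembling scheme or evaluation measure was actually used.

```python
import math


def average_probabilities(member_probs):
    """Average per-class probabilities over ensemble members for each sample."""
    n_members = len(member_probs)
    n_samples = len(member_probs[0])
    n_classes = len(member_probs[0][0])
    return [
        [sum(m[i][k] for m in member_probs) / n_members for k in range(n_classes)]
        for i in range(n_samples)
    ]


def negative_log_likelihood(probs, labels, eps=1e-12):
    """Mean negative log probability assigned to the true class (lower is better)."""
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(labels)


def flag_for_seizure(p_forged, threshold):
    """Hypothetical decision rule: act only when the forged probability is high enough."""
    return p_forged >= threshold


# Hypothetical outputs of three classifiers on four bottles.
# Class 0 = genuine, class 1 = forged.
labels = [0, 1, 1, 0]
member_a = [[0.90, 0.10], [0.40, 0.60], [0.20, 0.80], [0.55, 0.45]]
member_b = [[0.70, 0.30], [0.10, 0.90], [0.60, 0.40], [0.80, 0.20]]
member_c = [[0.60, 0.40], [0.30, 0.70], [0.30, 0.70], [0.30, 0.70]]
members = [member_a, member_b, member_c]

ensemble = average_probabilities(members)
member_nlls = [negative_log_likelihood(m, labels) for m in members]
ensemble_nll = negative_log_likelihood(ensemble, labels)
mean_member_nll = sum(member_nlls) / len(member_nlls)

print("member NLLs:", [round(v, 3) for v in member_nlls])
print("ensemble NLL:", round(ensemble_nll, 3))
print("flagged:", [flag_for_seizure(p[1], threshold=0.6) for p in ensemble])
```

Because the negative logarithm is convex, the negative log-likelihood of the averaged probabilities can never exceed the average of the members' individual scores. This guarantees only that the ensemble is no worse than the typical member, not that it beats the best one.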

Also in work not yet published, we have shown that applying typical smoothing methods to the spectra does not result in a significant increase in accuracy. This aspect of the problem also depends on the settings of the spectroscope when measuring the light it receives: a longer time spent collecting light, and averaging over multiple readings within a given time frame, can have their own smoothing effect, potentially negating the need for smoothing on the software side of the pipeline.
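The two routes to a smoother spectrum mentioned above can be compared on synthetic data, as in the sketch below. The moving-average filter stands in for "typical smoothing methods", which the source does not name, and the clean spectrum, noise level, window size and number of scans are all hypothetical.

```python
import numpy as np


def moving_average(spectrum, window=5):
    """Centred moving average (odd window); edges are padded by repeating end values."""
    half = window // 2
    padded = np.pad(spectrum, half, mode="edge")
    return np.convolve(padded, np.ones(window) / window, mode="valid")


def rmse(a, b):
    """Root-mean-square difference between two spectra."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Hypothetical noise-free spectrum: flat baseline plus one absorption band.
rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 200)
clean = 1.0 + 0.4 * np.exp(-0.5 * ((grid - 0.6) / 0.05) ** 2)

# Simulate repeated readings of the same bottle with independent sensor noise.
n_scans, noise_sd = 16, 0.05
scans = clean + rng.normal(0.0, noise_sd, (n_scans, grid.size))

single = scans[0]                                     # one raw reading
software_smoothed = moving_average(single, window=5)  # smoothing in software
hardware_averaged = scans.mean(axis=0)                # averaging at acquisition time

print("single scan RMSE:      ", round(rmse(single, clean), 4))
print("moving average RMSE:   ", round(rmse(software_smoothed, clean), 4))
print("averaged readings RMSE:", round(rmse(hardware_averaged, clean), 4))
```

Averaging independent readings reduces noise without blurring spectral features, whereas a moving average trades some peak sharpness for noise reduction. If the acquisition settings already average enough readings, further software smoothing has little noise left to remove, which is consistent with the observation above.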
Exploitation Route In this project, a prototype portable reading device has been produced. The main aim and interest of our research group, however, is the algorithmic and machine learning side. Future projects could look towards refining the prototype and packaging it in such a way that it could be used by, e.g., Food Standards Agency or Border Inspection officials.

Further, developing self-contained software that uses the machine learning advances made in this project, to be shipped with such a device, would then allow the production of a full system for detecting forged alcohol portably, quickly, cheaply, and with minimal expertise.
Sectors Agriculture, Food and Drink; Government, Democracy and Justice; Retail

 
Title Multivariate Time Series Classification Archive 
Description A collection of standardised, formatted multivariate time series classification datasets for use in evaluating models and advancing multivariate time series analysis. This project supplied the ethanol concentration dataset, although all datasets in the archive were procured and formatted by our research group.
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact Presented for the first time at AALTD'18 (3rd ECML/PKDD Workshop on Advanced Analytics and Learning on Temporal Data), September 14, 2018, by Tony Bagnall (project supervisor). It was well received by the highly relevant audience at the workshop and generated much discussion and feedback afterwards.
URL http://www.timeseriesclassification.com
 
Description Presenting to PhD Cohort, BIODTP 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact Presented to a group of fellow PhD students from UEA and the neighbouring Norwich Research Park, all working in Biology, Bioinformatics, Chemical Engineering, Chemometrics, etc., as part of a series of events in which all students eventually presented to each other and to various lecturers and research associates. This led to discussions with a number of people interested in applying machine learning to their own problems, and to advice from a pure chemistry/physics perspective that I could apply to my own work.
Year(s) Of Engagement Activity 2017