FoodML: Development of a food quality and safety risk management system, using cloud computing, big data and data science

Lead Research Organisation: CRANFIELD UNIVERSITY
Department Name: School of Water, Energy and Environment

Abstract

Currently, food quality and safety controls rely heavily on regulatory inspection and sampling regimes. Such approaches are often based on conventional chemical and microbiological analysis, making the ultimate goal of 100% real-time inspection technically, financially and logistically impossible.
Over the past decade, non-invasive techniques (e.g. vibrational spectroscopy, hyperspectral/multispectral imaging) have gained popularity as rapid and efficient methods for assessing food quality, safety and authentication, and as a sensible alternative to expensive and time-consuming conventional microbiological techniques.
Due to the multi-dimensional nature of the data generated from such analyses, the output needs to be coupled with a suitable statistical approach or machine learning algorithm before the results can be interpreted. Although these platforms have shown great potential to accurately and quantitatively assess freshness profiles (Panagou, Mohareb et al. 2011) (Mohareb, Iriondo et al. 2015), safety parameters and adulteration (Ropodi, Panagou et al. 2016), their dependence on advanced data mining and statistical algorithms has been the main challenge facing their practical implementation across the food production and supply chain.
To overcome these challenges, we have developed sorfML (http://elvis.misc.cranfield.ac.uk/SORF), a web platform prototype compatible with outputs from five instrumental platforms (see Figure). It provides interactive data visualisation, multivariate analysis (principal component analysis and hierarchical clustering), and the ability to use stored datasets to develop predictive models that estimate food quality. Currently, users can upload their experimental datasets to the server, thanks to the generic MongoDB NoSQL database backend, and develop classification and regression models to estimate quality parameters.
The aim of this PhD is to expand the existing sorfML platform into a cloud-enabled framework that supports real-time monitoring of food products throughout the production chain. To achieve this, a series of advanced portable sensory devices will be deployed to examine their suitability as "connected devices" for predicting quality and safety indices of various perishable food products. A series of machine learning and pattern recognition models will be developed and integrated within the cloud system. These include Ordinary Least Squares, Stepwise Linear classification and regression, Principal Component regression, Partial Least Squares discriminant analysis, support vector machines, random forests and k-Nearest Neighbours.
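As an illustration of the simplest model family listed above, the following is a minimal k-Nearest Neighbours regression sketch in plain Python. It is not taken from sorfML; the function name `knn_predict`, the two-feature sensor readings and the quality scores are all hypothetical and chosen only to show the idea of predicting a quality index from the closest known samples.

```python
import math

def knn_predict(train_X, train_y, query, k=3):
    """Predict a value as the mean target of the k nearest training samples."""
    # Euclidean distance from the query to every training sample
    distances = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    distances.sort(key=lambda pair: pair[0])
    nearest = distances[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Hypothetical data: two sensor readings per sample, target is a quality score
train_X = [(0.10, 0.20), (0.15, 0.25), (0.80, 0.90), (0.85, 0.95), (0.50, 0.50)]
train_y = [2.0, 2.2, 7.9, 8.1, 5.0]

# The two closest samples have scores 2.0 and 2.2, so the prediction is their mean
prediction = knn_predict(train_X, train_y, (0.12, 0.22), k=2)
```

In practice the choice of k and the scaling of the input features would need to be tuned per instrument and per food product.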

Publications


Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name
NE/R011265/1 | | | 02/10/2017 | 01/10/2022 |
1956111 | Studentship | NE/R011265/1 | 02/10/2017 | 01/04/2021 | Emma Sims
 
Title Auto-Generated Poem 
Description Built a script that uses a second-order Markov model to generate poems based on an e-zine's previous publications 
Type Of Art Creative Writing 
Year Produced 2020 
Impact Publication and Interview with Kanstellation E-Magazine 
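A second-order Markov text model of the kind described in this record can be sketched as follows. This is not the original script; the function names `build_model` and `generate` and the toy corpus are hypothetical. Each pair of consecutive words is mapped to the words seen after it, and new text is produced by repeatedly sampling a follower of the last two words.

```python
import random
from collections import defaultdict

def build_model(text):
    """Map each pair of consecutive words to the words observed after it."""
    words = text.split()
    model = defaultdict(list)
    for a, b, c in zip(words, words[1:], words[2:]):
        model[(a, b)].append(c)
    return model

def generate(model, seed_pair, max_words=20, rng=None):
    """Walk the chain from seed_pair until max_words or a dead end."""
    rng = rng or random.Random()
    output = list(seed_pair)
    while len(output) < max_words:
        followers = model.get(tuple(output[-2:]))
        if not followers:
            break  # no word was ever seen after this pair
        output.append(rng.choice(followers))
    return " ".join(output)

# Hypothetical toy corpus standing in for previously published poems
corpus = "the sea is wide and the sea is deep and the sky is wide"
model = build_model(corpus)
poem_line = generate(model, ("the", "sea"), max_words=8, rng=random.Random(0))
```

Passing a seeded `random.Random` makes the output reproducible; omitting it gives a different line on each run.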
 
Description Food quality can be modelled using machine learning methods across a variety of statistical models. This can aid the rapid detection of spoiled foods without requiring a microbial population assessment, which yields only retrospective results and can therefore lead to spoiled foods being sold to consumers.
Multi-spectral imaging is an effective monitoring method that provides three potential factors for modelling food quality: the average pixel intensity per spectrum, the standard deviation in pixel intensity, and the change in texture.
Exploitation Route It can be used either by the Food Standards Agency as a low-cost, rapid method to identify potentially spoiled samples, or by food industry leaders to monitor the change in quality of their food over time or to identify the areas of the food chain that contribute most to food spoilage.
Sectors Agriculture, Food and Drink; Digital/Communication/Information Technologies (including Software); Government, Democracy and Justice
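The first two multi-spectral imaging factors mentioned above (mean and standard deviation of pixel intensity) can be computed per band as in the sketch below. The nested-list image layout and the function name `band_features` are assumptions made for illustration, and the texture factor is deliberately left out because the record does not say how it was measured.

```python
import statistics

def band_features(image):
    """Return (mean, standard deviation) of pixel intensity for each band.

    `image` is a list of bands; each band is a list of rows of intensities.
    """
    features = []
    for band in image:
        pixels = [value for row in band for value in row]
        # Population standard deviation over all pixels in the band
        features.append((statistics.fmean(pixels), statistics.pstdev(pixels)))
    return features

# Hypothetical 2-band, 2x2-pixel image
image = [
    [[10, 20], [30, 40]],  # band 0
    [[5, 5], [5, 5]],      # band 1 (uniform, so zero spread)
]
features = band_features(image)
```

Each band contributes two numbers, so an image with n bands yields a feature vector of length 2n that could be passed to a regression or classification model.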

 
Title StompR: A statistical comparison package in R 
Description A package in R which automates the assessment of analytical modelling methods, utilising machine learning and Monte Carlo simulations to generate average RMSE and MAPE performance values for each model on various datasets. 
Type Of Material Improvements to research infrastructure 
Year Produced 2020 
Provided To Others? Yes  
Impact Easy meta-analysis of statistical datasets and their performance over various models. 
URL https://github.com/EmmaRSims/StompR
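The general idea of using Monte Carlo simulation to obtain average RMSE and MAPE values can be sketched as repeated random train/test splitting. The code below is not StompR and does not reflect its API; it is a plain Python illustration using a one-predictor least-squares line, and every function name and data value in it is hypothetical.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x on one predictor."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error; assumes no actual value is zero."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def monte_carlo_eval(xs, ys, runs=100, test_fraction=0.3, seed=0):
    """Average RMSE and MAPE over repeated random train/test splits."""
    rng = random.Random(seed)
    n_test = max(1, int(len(xs) * test_fraction))
    rmses, mapes = [], []
    for _ in range(runs):
        idx = list(range(len(xs)))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [a + b * xs[i] for i in test]
        actual = [ys[i] for i in test]
        rmses.append(rmse(actual, preds))
        mapes.append(mape(actual, preds))
    return sum(rmses) / runs, sum(mapes) / runs

# Hypothetical data: storage time (hours) vs. microbial count (log CFU/g)
hours = [0, 12, 24, 36, 48, 60, 72, 84, 96, 108]
counts = [3.1, 3.6, 4.0, 4.7, 5.1, 5.4, 6.2, 6.5, 7.1, 7.4]
mean_rmse, mean_mape = monte_carlo_eval(hours, counts)
```

Running the same procedure for several model types and datasets would give one averaged score per model and dataset pair, which is the kind of summary the package description refers to.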
 
Title A food modelling database 
Description A meta-analysis database of statistical model performances which assess the microbial flux of fresh foods 
Type Of Material Computer model/algorithm 
Year Produced 2017 
Provided To Others? Yes  
Impact None so far; two papers by colleagues who use the software to assess microbial flux 
 
Title Computer Vision for MSI in MATLAB 
Description Currently a set of MATLAB scripts that segment multispectral images and output the region of interest's mean pixel intensity, standard deviation of pixel intensity, and probability of noise per pixel. They do this using Euclidean normalisation in conjunction with K-means segmentation and Canny edge detection. 
Type Of Technology Software 
Year Produced 2018 
Impact Made it possible to model texture changes on the surface of fresh foods such as pineapples and minced beef 
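The combination of Euclidean normalisation and K-means segmentation named in this record can be illustrated in plain Python as below. This is not the MATLAB code; it assumes each pixel is a short vector of band intensities, uses hypothetical values, seeds the centroids arbitrarily, and omits the Canny edge detection step entirely.

```python
import math

def l2_normalise(vector):
    """Scale a pixel's spectral vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vector))
    return tuple(v / norm for v in vector) if norm else tuple(vector)

def kmeans(points, centroids, iterations=10):
    """Plain k-means (Lloyd's algorithm); returns one cluster label per point."""
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels

# Hypothetical pixels with two spectral bands each: sample vs. background
pixels = [(200, 50), (180, 40), (190, 60), (30, 120), (20, 100), (25, 110)]
normalised = [l2_normalise(p) for p in pixels]
# Seed centroids with the first and last pixel (an arbitrary choice)
labels = kmeans(normalised, [normalised[0], normalised[-1]])
```

Normalising first means pixels are grouped by the shape of their spectrum rather than by overall brightness, which is one plausible reason to pair the two steps.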
 
Title StompR 
Description A statistical comparison package in R which assesses the general performance of models against datasets. Multiple datasets can be used to predict the same factor, and a heatmap is generated to give an overview of model performance for each dataset, so different combinations of factors can be examined within a single call to the function 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact None yet, but a paper is in preparation 
 
Description DREAM Consortium 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact A 15-minute talk about my PhD and its theoretical impacts and progress, given to other students within my consortium and to some external industry partners. Also a poster presentation which highlighted the key features of the software that I'm building. This has occurred in each year of my PhD, with the next commencing at the end of March 2020.
Year(s) Of Engagement Activity 2018,2019,2020
URL https://twitter.com/DreamCDT/status/976771643572998144?s=20
 
Description ECCB 2018 Conference 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Attended and presented a poster at the ECCB 2018 conference in Athens, Greece. Also partook in a computational pathology workshop.
Year(s) Of Engagement Activity 2018
URL https://twitter.com/EmmaRSims/status/1039436263806447616?s=20
 
Description Presentation at Rothamsted Conference 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact I gave a 10-minute talk on the subjects my PhD is based on and on what big data actually means.
Year(s) Of Engagement Activity 2018
URL https://twitter.com/KeywanHP/status/1007597457210073088?s=20
 
Description Rothamsted Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Took part in two workshops on the prediction of protein shape and on using KnetMiner.
Year(s) Of Engagement Activity 2018
 
Description School Visit (Sheffield) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact I visited Fir Vale School to give a 1-hour talk to an audience of around 100 students about the skill sets required in this PhD, as a motivational speech to encourage more students into STEM and to raise awareness of the University and funding partners.
Year(s) Of Engagement Activity 2019