A Predictive Fault-tolerance Framework for IoT Systems

Lead Research Organisation: Lancaster University
Department Name: Computing & Communications

Abstract

1 Context
When the term Internet of Things (IoT) is used, its definition spans a wide range of applications that extend the Internet to interconnected objects, where a diverse array of sensor devices are connected with the intention of mining the data they generate. The purpose of IoT is to connect all of these devices to the Internet so that systems can identify, locate, track and monitor them in real time. IoT environments often suffer from failures such as device energy depletion and poor connectivity, which may result in data loss. If an IoT system is to remain resilient to ongoing challenges in its environment, such as changes in sensor location, failures in service provision, and device faults, we need a way to anticipate and mitigate problems before they occur. This PhD proposal seeks to investigate how real-time complex event processing and the long-term predictive capability of machine learning can be combined with fault-tolerance mechanisms to provide and maintain an acceptable level of service in the face of faults.

2 Aims and Objectives
The aim of the proposed research is to develop a framework with interchangeable features that exploit real-time and historic data to improve the resilience of an IoT environment. The objectives for this research are as follows:
Objective 1. To identify and classify fault events and fault patterns in IoT systems
In order to become resilient to faults, one must understand the faults that can occur within the system. The first objective is therefore to conduct a thorough review of the different faults that can occur within IoT systems and to classify them, so that patterns can be identified regarding which faults occur, their severity, their effect on the system, and what is likely to cause them.
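A classification of this kind could be held as a simple record per fault class. The sketch below is only an illustration: the fields mirror the attributes listed above (severity, effect, likely cause), while the three-level `Severity` scale, the `FaultClass` and `by_severity` names, and the example entries (taken from the failures named in the Context section) are hypothetical assumptions rather than an outcome of the review.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical three-level severity scale."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class FaultClass:
    name: str
    severity: Severity
    effect: str        # effect the fault has on the system
    likely_cause: str  # what is likely to cause the fault


# Hypothetical example entries based on the failures named in the Context section.
TAXONOMY = [
    FaultClass("energy_depletion", Severity.HIGH, "device stops reporting", "battery drain"),
    FaultClass("poor_connectivity", Severity.MEDIUM, "intermittent data loss", "weak signal"),
    FaultClass("sensor_relocation", Severity.LOW, "readings no longer match location", "device moved"),
]


def by_severity(taxonomy):
    """Group fault class names by severity, most severe first."""
    groups = defaultdict(list)
    for fault in taxonomy:
        groups[fault.severity].append(fault.name)
    return dict(sorted(groups.items(), key=lambda kv: kv[0], reverse=True))


groups = by_severity(TAXONOMY)
```

Grouping by severity is one way patterns might be surfaced; the same records could equally be grouped by cause or by effect.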
Objective 2. To develop a service-oriented fault anticipation framework
A service-oriented framework will be developed so that these known faults can be identified as they occur, and anticipated before they occur, within an IoT environment. It will use complex event processing to identify faults in real time, and machine learning to predict when faults are likely to occur, with the algorithm(s) trained on the known faults. The framework will generate predictive models and apply them to live data to infer whether a failure is probable, so that the system can react to the potential fault in good time.
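The split between reactive detection and proactive prediction can be sketched as follows, under simplifying assumptions. `MissedHeartbeatRule` is a hypothetical stand-in for a complex event processing rule (a sliding window over device heartbeats), and `predict_depletion` substitutes a least-squares linear trend for the machine learning model, extrapolating battery voltage to a cut-off. Neither is part of the proposed framework; the names, window size and voltage values are illustrative only.

```python
from collections import deque


class MissedHeartbeatRule:
    """Reactive, CEP-style rule: raise a fault when `limit` consecutive
    heartbeats from a device are missing within the sliding window."""

    def __init__(self, limit=3):
        self.window = deque(maxlen=limit)

    def on_event(self, received: bool) -> bool:
        self.window.append(received)
        return len(self.window) == self.window.maxlen and not any(self.window)


def predict_depletion(times, voltages, cutoff):
    """Proactive estimate: fit a least-squares line to voltage readings and
    return the time at which it reaches `cutoff`, or None if not falling."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(voltages) / n
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, voltages))
    var = sum((t - mean_t) ** 2 for t in times)
    if var == 0:
        return None
    slope = cov / var
    if slope >= 0:
        return None  # voltage is not falling, so no depletion is predicted
    intercept = mean_v - slope * mean_t
    return (cutoff - intercept) / slope


# Hypothetical usage: one heartbeat stream and one battery trace.
rule = MissedHeartbeatRule(limit=3)
alarms = [rule.on_event(r) for r in [True, False, False, False]]
eta = predict_depletion([0, 1, 2, 3], [3.3, 3.2, 3.1, 3.0], cutoff=2.8)
```

Here the rule fires only on the fourth event, once three heartbeats in a row are missing, whereas the trend model flags the battery well before it fails (an estimated crossing at t = 5), which is the kind of lead time the framework aims to exploit.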
Objective 3. To incorporate fault mitigation strategies and mechanisms
An extensive review of strategies and mechanisms to resolve system failures will be undertaken. The outcome of the review will inform the development and integration of strategies and mechanisms for mitigating the faults into the framework.
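One way mitigation strategies might be integrated is a registry that maps each identified fault class to a handler, with a conservative fallback for faults that have no known strategy. The sketch below assumes hypothetical fault names and handlers (`switch_to_redundant_sensor`, `buffer_locally`, `alert_operator`); the actual strategies would come from the review described above.

```python
from typing import Callable, Dict


def switch_to_redundant_sensor(device_id: str) -> str:
    return f"rerouting readings from {device_id} to a redundant sensor"


def buffer_locally(device_id: str) -> str:
    return f"buffering {device_id} data until connectivity returns"


def alert_operator(device_id: str) -> str:
    return f"alerting operator about {device_id}"


# Hypothetical mapping from fault class name to mitigation handler.
MITIGATIONS: Dict[str, Callable[[str], str]] = {
    "energy_depletion": switch_to_redundant_sensor,
    "poor_connectivity": buffer_locally,
}


def mitigate(fault_name: str, device_id: str) -> str:
    """Look up a mitigation for the fault; fall back to alerting a human."""
    action = MITIGATIONS.get(fault_name, alert_operator)
    return action(device_id)
```

Keeping the mapping as data rather than hard-coded branches would let strategies be swapped per deployment, in line with the aim of interchangeable features.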
Objective 4. To scale to more complex IoT scenarios
The prior objectives will initially be tested within a controlled, small-scale environment. As large-scale IoT systems (or ones that grow to a large scale) are to be expected in real-world environments, it is important to assess whether the framework can cope when scaled up. The framework will therefore be extended to a cloud-based architecture, informed by a close evaluation of existing cloud-based IoT solutions.

3 Applications
Recent applications of IoT include agriculture, where sensor devices are used to monitor farmland, supplying water when the land is dry and removing it when in excess, and assistive technology, where triangulation is used to help the visually impaired navigate around the home.

Publications


Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
EP/N509504/1      |              |              | 01/10/2016 | 30/09/2021 |
1806126           | Studentship  | EP/N509504/1 | 01/10/2016 | 31/03/2020 | Alexander Power
 
Description What I have discovered is that fault-tolerance solutions in the literature have been very inflexible (e.g. they make too many assumptions about system hardware, software and infrastructure) and target very specific IoT applications (e.g. home automation, manufacturing). Consequently, I have been designing a framework that can handle faults for any IoT application by inferring errors in data. My most recent publication, "A Microservices Architecture for Reactive and Proactive Fault Tolerance in IoT Systems", proposed an architecture for efficient and scalable fault-tolerance deployment that supports my research objectives.
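As an illustration of what inferring errors from data (rather than from assumptions about hardware or infrastructure) can look like, the sketch below labels a stream of sensor readings using two generic checks: an out-of-range test and a "stuck value" test. The function name, the checks and the thresholds are hypothetical assumptions chosen for illustration, not the inference method used by the framework.

```python
def infer_errors(readings, low, high, stuck_len=3):
    """Label each reading: 'out_of_range' if outside [low, high],
    'stuck' if the same value has now appeared stuck_len or more times
    in a row, otherwise 'ok'."""
    labels = []
    run = 1  # length of the current run of identical values
    for i, r in enumerate(readings):
        if i > 0 and r == readings[i - 1]:
            run += 1
        else:
            run = 1
        if not low <= r <= high:
            labels.append("out_of_range")
        elif run >= stuck_len:
            labels.append("stuck")
        else:
            labels.append("ok")
    return labels


# Hypothetical temperature readings with a valid range of 0 to 50.
labels = infer_errors([20.1, 20.1, 20.1, 55.0, 21.0], low=0, high=50)
```

Because both checks use only the data stream, they apply equally to a greenhouse thermometer or a factory sensor, which is the application-agnostic property described above.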

I have developed an indoor agriculture system as a testbed for my research, onto which my framework can be deployed. This was chosen because "smart agriculture" is an emerging area of IoT that is geared towards efficient processes (e.g. smart irrigation).
It is also an important domain due to the future concerns of increased urbanisation and a growing world population.
It provides a realistic IoT solution with plenty of fallible hardware and software to perform an effective analysis of how fault-tolerance support can handle system failures.

I have also created my own complex event processing system, BoboCEP, to provide resilient fault-tolerance support at the network edge. It actively replicates the state of partially completed complex events to enable distributed processing that can withstand hardware failure.
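The idea of actively replicating partially completed complex events can be sketched with an in-memory stand-in for the message broker. This is a hypothetical illustration of the general technique, not BoboCEP's actual API: `Instance`, `InMemoryBroker` and `process` are invented names, and a real deployment would publish through a networked broker rather than direct method calls.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Instance:
    name: str
    partial: Dict[str, List[str]] = field(default_factory=dict)  # match id -> events so far
    alive: bool = True


class InMemoryBroker:
    """Stand-in for a real message broker: fans state updates out to live peers."""

    def __init__(self):
        self.instances: List[Instance] = []

    def publish(self, sender: Instance, match_id: str, events: List[str]):
        for inst in self.instances:
            if inst is not sender and inst.alive:
                inst.partial[match_id] = list(events)  # copy, not shared state


def process(inst: Instance, broker: InMemoryBroker, match_id: str, event: str):
    """Extend a partial complex event locally, then replicate its state to peers."""
    events = inst.partial.setdefault(match_id, [])
    events.append(event)
    broker.publish(inst, match_id, events)


# Hypothetical usage: instance a fails mid-match and b completes it.
broker = InMemoryBroker()
a, b = Instance("a"), Instance("b")
broker.instances = [a, b]
process(a, broker, "m1", "soil_dry")
a.alive = False  # simulated hardware failure
process(b, broker, "m1", "pump_on")
```

Since b already holds the replicated partial match, it continues from "soil_dry" instead of restarting, which is what allows distributed processing to withstand the loss of a device.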

Since the publication of my conference paper "Providing Fault Tolerance via Complex Event Processing and Machine Learning for IoT Systems", I have been contacted by a senior software engineer from the vertical-farming company Intelligent Growth Solutions, who saw me present the paper in Bilbao, Spain. He has supplied me with a real-world vertical-farming dataset, which has helped to verify that my research can be applied to real-world problems.
Exploitation Route A long-term goal for my research is to develop software that will enable developers to directly implement my proposed fault-tolerance framework, so that they can easily deploy it in their own IoT environments. The software is designed to be pluggable, so that it can be deployed in any IoT system in the form of a "microservice" (i.e. a small, self-contained software package). This will enable future researchers to apply my framework and concepts for their own research.

My software, BoboCEP, is open source and fully documented, so it is available for other researchers to use for their own projects.
Sectors Agriculture, Food and Drink, Digital/Communication/Information Technologies (including Software), Electronics, Environment, Manufacturing, including Industrial Biotechnology, Security and Diplomacy

 
Title BoboCEP 
Description BoboCEP is a complex event processing (CEP) engine designed for edge computing in Internet of Things (IoT) systems to provide inferential reasoning and decision making using stream data. It provides fault tolerance (FT) via the active replication of partially-completed complex events across multiple instances of the software using a message broker. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact A short paper has been accepted that describes how and why the software was built. I am currently aiming to use this paper in collaborative efforts with other researchers. 
URL https://github.com/r3w0p/bobocep