Extreme learning to handle big data

Lead Research Organisation: Cranfield University
Department Name: Sch of Aerospace, Transport & Manufact

Abstract

As aerospace platforms progress through their service life, gradual performance degradation and unexpected system failures can occur. Certain physical information about such platform operations is known a priori. The main hypothesis to be tested in this research is that the performance of extreme learning can be significantly improved, and safe and reliable maintenance operations assured, by integrating this prior knowledge into the learning mechanism.
This integration should make it possible to guarantee certain properties of the learned functions while still leveraging the strengths of data-driven modelling. Most, if not all, traditional statistical methods are unsuitable for big data because of its characteristics: heterogeneity, statistical bias, noise accumulation, spurious correlation, and incidental endogeneity. Big data therefore demands new statistical thinking and methods. As data size increases, features and parameters become highly correlated and their relationships highly complex, so the hidden patterns of big data may not be captured by traditional modelling approaches.
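As a rough illustration of the "extreme learning" referred to here, the following sketch implements a minimal extreme learning machine in Python: hidden-layer weights are drawn at random and only the output weights are fitted, via a least-squares solve. The names, shapes, and toy data are assumptions for illustration, not code from the project.

import numpy as np

# Minimal extreme learning machine (ELM) sketch: a random hidden layer
# followed by a linear readout fitted by least squares. All shapes and
# the toy data below are illustrative assumptions.
rng = np.random.default_rng(0)

def elm_fit(X, y, n_hidden=100):
    """Fit an ELM regressor: random hidden layer + least-squares readout."""
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights (not trained)
    b = rng.normal(size=n_hidden)                 # random biases (not trained)
    H = np.tanh(X @ W + b)                        # hidden-layer activations
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # only the readout is fitted
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy usage: learn y = sin(x) from noisy samples.
X = np.linspace(0, 2 * np.pi, 200).reshape(-1, 1)
y = np.sin(X).ravel() + 0.05 * rng.normal(size=200)
W, b, beta = elm_fit(X, y)
print(np.mean((elm_predict(X, W, b, beta) - y) ** 2))  # training MSE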

Publications

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/N509450/1                                   01/10/2016  30/09/2021
1893464            Studentship   EP/N509450/1  01/06/2017  01/06/2021  Zhiyang Liu
 
Description My project is mainly about anomaly detection in flight data recorder data. The dataset comes from NASA and can be freely downloaded; it contains 186 parameters such as longitude, latitude, speed, and temperature.

I have used clustering to find anomalies in the flight data. Clustering is an unsupervised machine learning method that divides the data into groups of similar points. Very small clusters and isolated data points are then regarded as anomalies. I have tried several clustering methods, such as k-means, k-medoids, and DBSCAN; a sketch of this approach is given below.
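The sketch below illustrates this clustering-based detection in Python using scikit-learn's DBSCAN: points labelled as noise, or belonging to very small clusters, are flagged as anomalies. The feature matrix, the cluster-size threshold, and the DBSCAN parameters are illustrative assumptions rather than the project's actual settings.

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Placeholder feature matrix standing in for the pre-processed flight data.
X = np.random.default_rng(0).normal(size=(500, 5))
X = StandardScaler().fit_transform(X)  # scale features before clustering

labels = DBSCAN(eps=0.9, min_samples=5).fit_predict(X)

# DBSCAN labels isolated points as -1 (noise); additionally treat very
# small clusters (here, fewer than 10 points) as anomalous.
sizes = {lab: np.sum(labels == lab) for lab in set(labels) if lab != -1}
small = {lab for lab, n in sizes.items() if n < 10}
anomalous = np.isin(labels, list(small)) | (labels == -1)
print(f"{anomalous.sum()} of {len(X)} points flagged as anomalous")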

Before clustering, the data has to be pre-processed. The parameters are not all sampled at the same rate; sampling intervals vary from one second to over one minute, and clustering requires a unified rate. I have tried two pre-processing methods: aggregation, which summarizes each flight using aggregation functions, and down-sampling, which unifies the sampling rate by discarding some of the data. Both routes are sketched below.
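The following sketch illustrates the two pre-processing routes with pandas. The column names, the synthetic 1 Hz data, and the one-minute target rate are assumptions for illustration; the real recorder parameters differ.

import numpy as np
import pandas as pd

# Synthetic stand-in for one flight's time-indexed recorder data.
idx = pd.date_range("2017-06-01", periods=600, freq="s")
flight = pd.DataFrame({
    "speed": np.random.default_rng(0).normal(250, 5, 600),
    "temperature": np.random.default_rng(1).normal(15, 1, 600),
}, index=idx)

# Route 1: aggregation -- summarize the whole flight with aggregation
# functions, giving one compact feature vector per flight.
summary = flight.agg(["mean", "std", "min", "max"])

# Route 2: down-sampling -- unify the sampling rate by keeping one
# sample per minute and discarding the rest.
downsampled = flight.resample("1min").first()

print(summary)
print(downsampled.head())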

When aggregation is used, the data size is significantly reduced and clustering runs in just a few seconds. Its disadvantage is that it only identifies which flights are abnormal, not the exact time at which an anomaly occurs. Down-sampling also reduces the data size, but the result is much larger than with aggregation; clustering then takes around 10 minutes, but the exact time of each anomaly can be located.

Many abnormal behaviours have been found in the flights. For example, in one flight the temperature is abnormally high, which may be due to overheating or to a malfunctioning temperature sensor.

I am currently trying other anomaly detection methods, such as density-based and regression-based methods; a density-based sketch is given below.
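As one example of a density-based method, the sketch below applies scikit-learn's Local Outlier Factor, which scores each point by how much sparser its neighbourhood is than those of its neighbours. The data and parameters are illustrative assumptions, not the project's actual configuration.

import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Placeholder features standing in for the pre-processed flight data.
X = np.random.default_rng(0).normal(size=(500, 5))

# Expect roughly 2% outliers; fit_predict returns -1 for detected outliers.
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.02)
labels = lof.fit_predict(X)
print(f"{(labels == -1).sum()} points flagged as outliers")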
Exploitation Route My outcomes could be used to detect potential faults and abnormal behaviours in flight data recorder data early, so that the affected systems can be repaired.
Sectors Aerospace, Defence and Marine

 
Description Extreme learning to handle big data
Amount £43,626 (GBP)
Funding ID 1893464 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 06/2017 
End 12/2020