Extreme learning to handle big data

Lead Research Organisation: Cranfield University
Department Name: Sch of Aerospace, Transport & Manufact

Abstract

As aerospace platforms progress through their service life, gradual performance degradation and unexpected system failures can occur. Certain physical information about such platform operations is known a priori. The main hypothesis to be tested in this research is that the performance of extreme learning can be significantly improved, and safe and reliable maintenance operations assured, by integrating this prior knowledge into the learning mechanism.
This integration should make it possible to guarantee certain properties of the learned functions while still leveraging the strengths of data-driven modelling. Most, if not all, traditional statistical methods are unsuitable for big data because of its characteristics: heterogeneity, statistical bias, noise accumulation, spurious correlation, and incidental endogeneity. Big data therefore demands new statistical thinking and methods. As data size increases, features and parameters become highly correlated and their relationships highly complex, so the hidden patterns of big data may not be captured by traditional modelling approaches.
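As a rough illustration of the "extreme learning" referred to here, the following sketch implements a minimal extreme learning machine in Python: hidden-layer weights are drawn at random and only the output weights are fitted, via a least-squares solve. The names, shapes, and toy data are assumptions for illustration, not code from the project.

import numpy as np

# Minimal extreme learning machine (ELM) sketch: a random hidden layer
# followed by a linear readout fitted by least squares. All shapes and
# the toy data below are illustrative assumptions.
rng = np.random.default_rng(0)

def elm_fit(X, y, n_hidden=100):
    """Fit an ELM regressor: random hidden layer + least-squares readout."""
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights (not trained)
    b = rng.normal(size=n_hidden)                 # random biases (not trained)
    H = np.tanh(X @ W + b)                        # hidden-layer activations
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # only the readout is fitted
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy usage: learn y = sin(x) from noisy samples.
X = np.linspace(0, 2 * np.pi, 200).reshape(-1, 1)
y = np.sin(X).ravel() + 0.05 * rng.normal(size=200)
W, b, beta = elm_fit(X, y)
print(np.mean((elm_predict(X, W, b, beta) - y) ** 2))  # training MSE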

Publications

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/N509450/1                                   01/10/2016  30/09/2021
1893464            Studentship   EP/N509450/1  01/06/2017  01/06/2021  Zhiyang Liu
 
Description My project is mainly about anomaly detection in flight data recorder data. The dataset comes from NASA and can be freely downloaded; it contains 186 parameters such as longitude, latitude, speed, and temperature.

I have used clustering to find anomalies in the flight data. Clustering is an unsupervised machine learning method that divides the data into groups of similar points. Very small clusters and isolated data points are then regarded as anomalies. I have tried several clustering methods, such as k-means, k-medoids, and DBSCAN; a sketch of this approach is given below.
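The sketch below illustrates this clustering-based detection in Python using scikit-learn's DBSCAN: points labelled as noise, or belonging to very small clusters, are flagged as anomalies. The feature matrix, the cluster-size threshold, and the DBSCAN parameters are illustrative assumptions rather than the project's actual settings.

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Placeholder feature matrix standing in for the pre-processed flight data.
X = np.random.default_rng(0).normal(size=(500, 5))
X = StandardScaler().fit_transform(X)  # scale features before clustering

labels = DBSCAN(eps=0.9, min_samples=5).fit_predict(X)

# DBSCAN labels isolated points as -1 (noise); additionally treat very
# small clusters (here, fewer than 10 points) as anomalous.
sizes = {lab: np.sum(labels == lab) for lab in set(labels) if lab != -1}
small = {lab for lab, n in sizes.items() if n < 10}
anomalous = np.isin(labels, list(small)) | (labels == -1)
print(f"{anomalous.sum()} of {len(X)} points flagged as anomalous")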

Before clustering, the data has to be pre-processed. The parameters are not all sampled at the same rate; sampling intervals vary from one second to over one minute, and clustering requires a unified rate. I have tried two pre-processing methods: aggregation, which summarizes each flight using aggregation functions, and down-sampling, which unifies the sampling rate by discarding some of the data. Both routes are sketched below.
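The following sketch illustrates the two pre-processing routes with pandas. The column names, the synthetic 1 Hz data, and the one-minute target rate are assumptions for illustration; the real recorder parameters differ.

import numpy as np
import pandas as pd

# Synthetic stand-in for one flight's time-indexed recorder data.
idx = pd.date_range("2017-06-01", periods=600, freq="s")
flight = pd.DataFrame({
    "speed": np.random.default_rng(0).normal(250, 5, 600),
    "temperature": np.random.default_rng(1).normal(15, 1, 600),
}, index=idx)

# Route 1: aggregation -- summarize the whole flight with aggregation
# functions, giving one compact feature vector per flight.
summary = flight.agg(["mean", "std", "min", "max"])

# Route 2: down-sampling -- unify the sampling rate by keeping one
# sample per minute and discarding the rest.
downsampled = flight.resample("1min").first()

print(summary)
print(downsampled.head())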

When aggregation is used, the data size is significantly reduced and clustering runs in just a few seconds. Its disadvantage is that it only identifies which flights are abnormal, not the exact time at which an anomaly occurs. Down-sampling also reduces the data size, but the result is much larger than with aggregation; clustering then takes around 10 minutes, but the exact time of each anomaly can be located.

Many abnormal behaviours have been found in the flights. For example, in one flight the temperature is abnormally high, which may be due to overheating or to a malfunctioning temperature sensor.

I am currently trying other anomaly detection methods, such as density-based and regression-based methods; a density-based sketch is given below.
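As one example of a density-based method, the sketch below applies scikit-learn's Local Outlier Factor, which scores each point by how much sparser its neighbourhood is than those of its neighbours. The data and parameters are illustrative assumptions, not the project's actual configuration.

import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Placeholder features standing in for the pre-processed flight data.
X = np.random.default_rng(0).normal(size=(500, 5))

# Expect roughly 2% outliers; fit_predict returns -1 for detected outliers.
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.02)
labels = lof.fit_predict(X)
print(f"{(labels == -1).sum()} points flagged as outliers")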
Exploitation Route My outcomes could be used to detect potential faults and abnormal behaviours in flight data recorder data early, so that the affected systems can be repaired.
Sectors Aerospace, Defence and Marine

 
Description Extreme learning to handle big data
Amount £43,626 (GBP)
Funding ID 1893464 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 06/2017 
End 12/2020