Deep Learning Methods for the Analysis of Cyber Behaviour and Detection of Cyber Risk (EPSRC ICASE / AIRBUS)

Lead Research Organisation: Cardiff University
Department Name: Computer Science

Abstract

Automatic malware detection is necessary to handle the volume of new samples discovered daily. Malware authors attempt to evade automatic detection by manipulating code, but find it difficult to alter the behaviour of malware, as this is usually intrinsic to its core functionality. Behavioural malware detection using deep neural networks gives state-of-the-art detection accuracy but is costly in terms of time and computational resources, making it unsuitable for front-line detection on endpoints and better suited to non-time-critical analysis.

The project investigates how deep learning behavioural detection mechanisms could be used for endpoint protection as part of the front line of defence. The research will explore ways to reduce the time and computational overheads of detection. The work on reducing time will examine how much time needs to be spent collecting data from malware, as well as the time required for inference from the deep learning model. The work on reducing computational overhead will consider the architecture of the deep learning models, data collection and hardware components. Throughout the experiments it will be important to consider real damage prevention for the user and any vulnerabilities of the proposed solution; previous work has not combined practical user benefits with behavioural endpoint protection. Relevant research areas include digital signal processing, operational research, and artificial intelligence technologies.

Publications

Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name
EP/P510452/1 | | | 01/10/2016 | 30/09/2021 |
1852525 | Studentship | EP/P510452/1 | 01/10/2016 | 31/05/2020 | Matilda Rhode
 
Description Malicious software (malware) is growing in number as the potential rewards of malware attacks increase. Malware authors try to avoid their software being detected by antivirus engines by manipulating the code to appear slightly different from malware that has been seen before. We can largely circumvent this problem by looking at the behaviour of the malware (in a safe environment) when it is running on a machine, as the behaviour is closely linked to the goals of the malware and so cannot easily be disguised. For example, an attacker wanting to steal 3TB of data will have to move that data at some point, and we can measure this happening. The drawback of this approach is that it takes several minutes, and users will not wait that long for their files to be analysed. We have found that the time can be reduced to 5 seconds whilst maintaining a high degree of accuracy.

Further to this work, we have begun research into automatic agents that monitor the behaviour of systems to detect and kill malicious processes. Humans are usually unable to react quickly enough to alerts that a system is infected to prevent damage, especially as many attacks lie dormant until the middle of the night to reduce the chance of a human seeing the alerts in real time. Our recent work has found that the proportion of files encrypted or damaged by ransomware can be reduced by 90% using a live monitoring tool.
Exploitation Route We are currently benchmarking our findings against commercial products with our industrial partner, Airbus, to assess the robustness of our approach in a real-world setting. If the model gives comparable results, it could form the basis of an internal or commercial defence product.
Sectors Aerospace, Defence and Marine, Digital/Communication/Information Technologies (including Software), Security and Diplomacy

URL https://www.sciencedirect.com/science/article/pii/S0167404818305546
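
The live monitoring idea described above can be illustrated with a minimal sketch: score the behaviour seen so far once per second and raise an alert as soon as a threshold is reached, rather than waiting for a full-length trace. Everything named here (`first_alert_second`, `toy_score`, the 0.5 threshold and the CPU-based rule) is a hypothetical stand-in for illustration and is not the project's model or tool.

```python
from typing import Callable, Iterable, List, Optional


def first_alert_second(
    snapshots: Iterable[List[float]],
    score: Callable[[List[List[float]]], float],
    threshold: float = 0.5,
) -> Optional[int]:
    """Return the 1-based second at which the score first reaches the threshold.

    Returns None if the threshold is never reached.
    """
    seen: List[List[float]] = []
    for second, snapshot in enumerate(snapshots, start=1):
        seen.append(snapshot)
        if score(seen) >= threshold:
            # A real agent would terminate the monitored process at this point.
            return second
    return None


def toy_score(history: List[List[float]]) -> float:
    """Stand-in scorer (assumption): fraction of seconds with CPU above 80%."""
    busy = sum(1 for row in history if row[0] > 80.0)
    return busy / len(history)


# One feature per snapshot (CPU usage in percent), one snapshot per second.
trace = [[10.0], [20.0], [95.0], [97.0], [99.0]]
alert = first_alert_second(trace, toy_score, threshold=0.5)
```

In a real deployment the scorer would be a trained behavioural model, and the choice of threshold would trade earlier intervention against false alarms.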
 
Description Our findings thus far have been that malicious software can be caught using behavioural detection systems on a far shorter timescale than has previously been used in the literature. We are currently benchmarking these findings against commercial products used by our industry partner (Airbus) to see whether it would be beneficial for the organisation to incorporate our model into its cyber security defensive systems. This data has led to additional funding and collaboration between the industrial and academic partners.
First Year Of Impact 2020
Sector Aerospace, Defence and Marine, Digital/Communication/Information Technologies (including Software), Security and Diplomacy
Impact Types Economic

 
Title Early Stage Malware Prediction 
Description The dataset gives behavioural data for ~4000 software samples (2000 benign and 2000 malicious) running in a virtual machine. The samples are all Windows executables, and the data are machine activity metrics captured every second, including CPU usage, RAM usage and packets being sent and received.
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact First paper to look at early-stage prediction of whether or not a file is malicious, published (see publications section). Public dataset being used by other researchers, as malware datasets quickly go out of date and are often kept secret from industrial research groups due to security/competition concerns. 
URL http://research.cardiff.ac.uk/converis/portal/Dataset/50524986?auxfun=&lang=en_GB
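
As an illustration of how per-second machine activity data of this kind might be prepared for early-stage prediction, the sketch below keeps only the first few seconds of a trace as fixed-length rows of features. The feature names, the zero-padding rule and the `early_window` helper are assumptions made for this example; they are not taken from the dataset's actual schema.

```python
from typing import Dict, List

# Hypothetical column names. The dataset description mentions CPU usage,
# RAM usage and packets sent/received, but not these exact field names.
FEATURES = ["cpu_percent", "ram_percent", "packets_sent", "packets_received"]


def early_window(snapshots: List[Dict[str, float]], seconds: int = 5) -> List[List[float]]:
    """Return the first `seconds` per-second snapshots as rows of feature values.

    Traces shorter than `seconds` are padded with rows of zeros (an assumption)
    so that every sample has the same shape.
    """
    rows = [[float(s[name]) for name in FEATURES] for s in snapshots[:seconds]]
    while len(rows) < seconds:
        rows.append([0.0] * len(FEATURES))
    return rows


# Seven seconds of made-up activity for one sample.
trace = [
    {"cpu_percent": 5.0 + i, "ram_percent": 30.0, "packets_sent": i, "packets_received": 2 * i}
    for i in range(7)
]
window = early_window(trace, seconds=5)   # 5 rows, 4 features each
short = early_window(trace[:2], seconds=5)  # padded up to 5 rows
```

Fixing the window length in this way gives every sample the same shape, which is convenient for sequence models such as recurrent networks.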
 
Title Early Prediction model for malware detection 
Description The software predicts whether or not a file is malicious based on its early machine activity. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact The software is freely available on GitHub and has been used by other academic researchers. We do not know how else it has been used, but it has been "forked" on GitHub 14 times and "starred" 16 times, indicating that at least 28 individuals or groups have used all or parts of the software for further research. It is also possible to use the code without "forking" or "starring" the codebase, so the impact may be wider.
URL https://github.com/mprhode/malware-prediction-rnn
 
Description Presentation to industry group as part of IBM's PowerAI series 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Industry/Business
Results and Impact The PowerAI series is a lecture series organised by IBM to discuss artificial intelligence applications. I was invited to speak at one of their events in Bristol and spoke about my PhD research (the research funded by this project). The attendees asked a number of questions and were particularly interested in how the academic research would be developed into a real-world product.
Year(s) Of Engagement Activity 2018
URL https://www.meetup.com/ibmaisouthwest/events/254786063/