Deep Learning Methods for the Analysis of Cyber Behaviour and Detection of Cyber Risk (EPSRC ICASE / AIRBUS)

Lead Research Organisation: CARDIFF UNIVERSITY

Department Name: Computer Science

Abstract

Automatic malware detection is necessary to handle the volume of new samples discovered daily. Malware authors attempt to evade automatic detection by manipulating code, but find it difficult to alter the behaviour of malware, as this is usually intrinsic to its core functionality. Behavioural malware detection using deep neural networks gives state-of-the-art detection accuracy but is costly in terms of time and computational resources, making it unsuitable for front-line detection on endpoints and better suited to non-time-critical analysis.

The project investigates how deep learning behavioural detection mechanisms could be used for endpoint protection as part of the front line of defence. The research will explore ways to reduce the time and computational overheads of detection. The temporal reduction will examine how much time must be spent collecting data from malware, as well as the time required for inference from the deep learning model. The computational overhead reduction will consider the architecture of the deep learning models, data collection and hardware components. Throughout the experiments it will be important to consider real damage prevention for the user and any vulnerabilities of the proposed solution, because previous work has not combined practical user benefits with behavioural endpoint protection. Relevant research areas include digital signal processing, operational research, and artificial intelligence technologies.

Student:

Matilda Rhode

Period of Study:

Sep 16 - May 20

Funder:

EPSRC

Project Status:

Closed

Project Category:

Studentship

Project Reference:

1852525

Research Topic:

Unclassified

Organisations

People	ORCID iD
Pete Burnap (Primary Supervisor)
Matilda Rhode (Student)

Publications

Rhode M (2019) LAB to SOC: Robust Features for Dynamic Malware Detection

Rhode M (2018) Early-stage malware prediction using recurrent neural networks in Computers & Security

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/P510452/1			30/09/2016	29/09/2021
1852525	Studentship	EP/P510452/1	30/09/2016	30/05/2020	Matilda Rhode

Key Findings
Impact Summary
Research Databases and Models
Software and Technical Products
Engagement Activities


Description	Malicious software (malware) is growing in number as the potential rewards of malware attacks increase. Malware authors try to avoid their software being detected by antivirus engines by manipulating the code to appear slightly different from malware that has been seen before. We can largely circumvent this problem by looking at the behaviour of the malware (in a safe environment) when it is running on a machine, as the behaviour is closely linked to the goals of the malware and so cannot easily be disguised. For example, an attacker wanting to steal 3TB of data will have to move that data at some point, and we can measure this happening. The drawback of this approach is that it takes several minutes, and users will not wait that long for their files to be analysed. We have found that the time can be reduced to 5 seconds whilst maintaining a high degree of accuracy. Further to this work, we have begun research into automatic agents that monitor the behaviour of systems to detect and kill malicious processes. Humans usually cannot react to infection alerts quickly enough to prevent damage, especially as many attacks lie dormant until the middle of the night to reduce the chance of a human seeing the alerts in real time. Our recent work has found that the proportion of files encrypted or damaged by ransomware can be reduced by 90% using a live monitoring tool.
Exploitation Route	We are currently benchmarking our findings against commercial products with our industrial partner, Airbus, to assess the robustness of our approach in a real-world setting. If the model gives comparable results, it could form the basis of an internal or commercial defence product.
Sectors	Aerospace, Defence and Marine; Digital/Communication/Information Technologies (including Software); Security and Diplomacy
URL	https://www.sciencedirect.com/science/article/pii/S0167404818305546


Description	Our findings thus far are that malicious software can be caught using behavioural detection systems on a far shorter timescale than has previously been used in the literature. We are currently benchmarking these findings against commercial products used by our industry partner (Airbus) to see whether it would be beneficial for the organisation to incorporate our model into its cyber security defensive systems. These findings have led to additional funding and collaboration between the industrial and academic partners.
First Year Of Impact	2020
Sector	Aerospace, Defence and Marine; Digital/Communication/Information Technologies (including Software); Security and Diplomacy
Impact Types	Economic


Title	Early Stage Malware Prediction
Description	The dataset gives behavioural data for ~4000 software samples (2000 benign and 2000 malicious) running in a virtual machine. The samples are all Windows executables, and the data are machine activity metrics captured every second, including CPU usage, RAM usage and packets sent and received.
Type Of Material	Database/Collection of data
Year Produced	2018
Provided To Others?	Yes
Impact	First paper to look at early-stage prediction of whether or not a file is malicious, published (see publications section). Public dataset being used by other researchers, as malware datasets quickly go out of date and are often kept secret from industrial research groups due to security/competition concerns.
URL	http://research.cardiff.ac.uk/converis/portal/Dataset/50524986?auxfun=&lang=en_GB
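
As a rough illustration of how per-second machine activity data of this kind might be cut down to an early window before being passed to a sequence model, the following is a minimal sketch only. The feature names, the `early_window` helper, the zero padding and the example trace are all assumptions made for illustration; they are not taken from the dataset or from the project's code.

```python
from typing import Dict, List

# Hypothetical feature names; the dataset's actual column names may differ.
FEATURES = ["cpu_usage", "ram_usage", "packets_sent", "packets_received"]


def early_window(snapshots: List[Dict[str, float]], seconds: int = 5) -> List[List[float]]:
    """Return the first `seconds` per-second snapshots as rows of feature values.

    Traces shorter than `seconds` are padded with all-zero rows so that every
    sample has the same shape (an assumption, not the project's stated method).
    """
    rows = [[float(s[name]) for name in FEATURES] for s in snapshots[:seconds]]
    padding = [[0.0] * len(FEATURES) for _ in range(seconds - len(rows))]
    return rows + padding


# A made-up three-second trace, for illustration only.
trace = [
    {"cpu_usage": 12.0, "ram_usage": 310.0, "packets_sent": 4, "packets_received": 2},
    {"cpu_usage": 55.5, "ram_usage": 342.0, "packets_sent": 90, "packets_received": 7},
    {"cpu_usage": 61.0, "ram_usage": 350.0, "packets_sent": 120, "packets_received": 9},
]

window = early_window(trace, seconds=5)
print(len(window), len(window[0]))
```

Fixing the window length up front means every sample reaches the model with the same shape, whatever the length of the original recording.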


Title	Early Prediction model for malware detection
Description	The software predicts whether or not a file is malicious based on its early machine activity.
Type Of Technology	Software
Year Produced	2017
Open Source License?	Yes
Impact	The software is freely available on GitHub and has been used by other academic researchers. We do not know how else it has been used, but it has been "forked" on GitHub 14 times and "starred" 16 times, indicating that at least 28 individuals or groups have used all or parts of the software for further research. It is also possible to use the code without "forking" or "starring" the codebase, so the impact may be wider.
URL	https://github.com/mprhode/malware-prediction-rnn


Description	Presentation to industry group as part of IBM's PowerAI series
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Industry/Business
Results and Impact	The PowerAI series is a lecture series organised by IBM to discuss artificial intelligence applications. I was invited to speak at one of their events in Bristol and spoke about my PhD research (the research funded by this project). The attendees asked a number of questions and were particularly interested in how the academic research would be developed into a real-world product.
Year(s) Of Engagement Activity	2018
URL	https://www.meetup.com/ibmaisouthwest/events/254786063/
