Deep Learning Methods for the Analysis of Cyber Behaviour and Detection of Cyber Risk (EPSRC ICASE / AIRBUS)
Lead Research Organisation:
CARDIFF UNIVERSITY
Department Name: Computer Science
Abstract
Automatic malware detection is necessary to handle the volume of new samples discovered daily. Malware authors attempt to evade automatic detection by manipulating code, but find it difficult to alter the behaviour of malware, as this is usually intrinsic to its core functionality. Behavioural malware detection using deep neural networks gives state-of-the-art detection accuracy but is costly in terms of time and computational resources, making it unsuitable for front-line detection on endpoints and better suited to non-time-critical analysis.
The project investigates how deep learning behavioural detection mechanisms could be used for endpoint protection as part of the front line of defence. The research will explore ways to reduce the time and computational overheads of detection. The temporal reduction will examine how much time needs to be spent collecting data from malware, as well as the time required for inference from the deep learning model. The computational overhead reduction will consider the architecture of the deep learning models, data collection and hardware components. Throughout the experiments it will be important to consider real damage prevention for the user and any vulnerabilities of the proposed solution. Previous work has not combined practical user benefits with behavioural endpoint protection. Relevant research areas include digital signal processing, operational research, and artificial intelligence technologies.
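The temporal reduction described above can be illustrated with a minimal sketch. It assumes a behavioural trace is a list of per-second snapshots of machine activity and shows how a trace might be cut to its first few seconds before being summarised for a classifier. The snapshot layout and the function names `truncate_trace` and `summarise` are hypothetical and chosen for illustration only; this is not the project's model, which the publications describe as using recurrent neural networks.

```python
from statistics import mean
from typing import List, Tuple

# Hypothetical per-second snapshot layout (an assumption for this sketch):
# (cpu_percent, ram_mb, packets_sent, packets_received)
Snapshot = Tuple[float, float, int, int]


def truncate_trace(trace: List[Snapshot], seconds: int) -> List[Snapshot]:
    """Keep only the first `seconds` snapshots, one snapshot per second."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return trace[:seconds]


def summarise(trace: List[Snapshot]) -> List[float]:
    """Return the mean of each metric over the trace.

    A real detector would feed the sequence to a trained model; the
    column-wise mean is only a stand-in feature extractor.
    """
    if not trace:
        raise ValueError("trace must contain at least one snapshot")
    return [mean(column) for column in zip(*trace)]


if __name__ == "__main__":
    # Made-up example values, not taken from any real sample.
    example = [
        (10.0, 100.0, 0, 0),
        (30.0, 200.0, 4, 2),
        (90.0, 900.0, 100, 50),
    ]
    early = truncate_trace(example, 2)
    print(summarise(early))
```

Shortening the observation window reduces collection time, while any saving in inference time depends on the model that consumes the truncated trace.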
People | ORCID iD |
Pete Burnap (Primary Supervisor) | |
Matilda Rhode (Student) | |
Publications
Rhode M (2019) LAB to SOC: Robust Features for Dynamic Malware Detection
Rhode M (2018) Early-stage malware prediction using recurrent neural networks, in Computers & Security
Studentship Projects
Project Reference | Relationship | Related To | Start | End | Student Name |
---|---|---|---|---|---|
EP/P510452/1 | | | 30/09/2016 | 29/09/2021 | |
1852525 | Studentship | EP/P510452/1 | 30/09/2016 | 30/05/2020 | Matilda Rhode |
Description | Malicious software (malware) is growing in number as the potential rewards of malware attacks increase. Malware authors try to avoid detection by antivirus engines by manipulating the code so that it appears slightly different from malware that has been seen before. We can largely circumvent this problem by looking at the behaviour of the malware (in a safe environment) when it is running on a machine, as the behaviour is closely linked to the goals of the malware and so cannot easily be disguised. For example, an attacker wanting to steal 3TB of data will have to move that data at some point, and we can measure this happening. The drawback of this approach is that it takes several minutes, and users will not wait that long for their files to be analysed. We have found that the time can be reduced to 5 seconds whilst maintaining a high degree of accuracy. Further to this work, we have begun research into automatic agents that monitor the behaviour of systems to detect and kill malicious processes. Humans are usually unable to react quickly enough to alerts that a system is infected to prevent damage, especially as many attacks lie dormant until the middle of the night to reduce the chance of a human seeing the alerts in real time. Our recent work has found that the proportion of files encrypted or damaged by ransomware can be reduced by 90% using a live monitoring tool. |
Exploitation Route | We are currently benchmarking our findings against commercial products with our industrial partner, Airbus, to test the robustness of our approach in a real-world setting. If the model gives comparable results, it could form the basis of an internal or commercial defence product. |
Sectors | Aerospace, Defence and Marine, Digital/Communication/Information Technologies (including Software), Security and Diplomacy |
URL | https://www.sciencedirect.com/science/article/pii/S0167404818305546 |
Description | Our findings thus far have been that malicious software can be caught using behavioural detection systems on a far shorter timescale than has previously been used in the literature. We are currently benchmarking the benefits of these findings against commercial products used by our industry partner (Airbus) to see whether it would be beneficial for the organisation to incorporate our model into their cyber security defensive systems. These findings have led to additional funding and collaboration between the industrial and academic partners. |
First Year Of Impact | 2020 |
Sector | Aerospace, Defence and Marine, Digital/Communication/Information Technologies (including Software), Security and Diplomacy |
Impact Types | Economic |
Title | Early Stage Malware Prediction |
Description | The dataset gives behavioural data for ~4000 software samples, 2000 benign and 2000 malicious, running in a virtual machine. The samples are all Windows executables, and the data are machine activity metrics captured every second, including CPU usage, RAM usage and packets being sent and received. |
Type Of Material | Database/Collection of data |
Year Produced | 2018 |
Provided To Others? | Yes |
Impact | First paper to look at early-stage prediction of whether or not a file is malicious, published (see publications section). Public dataset being used by other researchers, as malware datasets quickly go out of date and are often kept secret from industrial research groups due to security/competition concerns. |
URL | http://research.cardiff.ac.uk/converis/portal/Dataset/50524986?auxfun=&lang=en_GB |
Title | Early Prediction model for malware detection |
Description | The software predicts whether or not a file is malicious based on its early machine activity. |
Type Of Technology | Software |
Year Produced | 2017 |
Open Source License? | Yes |
Impact | The software is freely available on GitHub and has been used by other academic researchers. We do not know how else it has been used but it has been "forked" on GitHub 14 times and "starred" 16 times, indicating that at least 28 individuals or groups have used all or parts of the software for further research. It is also possible to use the code without "forking" or "starring" the codebase, so the impact may be wider. |
URL | https://github.com/mprhode/malware-prediction-rnn |
Description | Presentation to industry group as part of IBM's PowerAI series |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Industry/Business |
Results and Impact | The PowerAI series is a lecture series organised by IBM to discuss artificial intelligence applications. I was invited to speak at one of their events in Bristol and spoke about my PhD research (the research funded by this project). The attendees asked a number of questions and were particularly interested in how the academic research would be developed into a real-world product. |
Year(s) Of Engagement Activity | 2018 |
URL | https://www.meetup.com/ibmaisouthwest/events/254786063/ |