Modelling computer interactions as language for anomaly detection

Lead Research Organisation: University of Edinburgh
Department Name: Sch of Informatics


The overall aim of this project is to enable machine-learning-based methods to capture the fine-grained structure and temporal interdependencies of computer interactions. Computer programs communicate with each other in highly structured, but not necessarily deterministic, ways, similar to human speech. Methods in natural language processing (NLP) have recently been very successful at creating accurate models of the structure observed in human speech. The key idea of this project is to transfer the success of NLP models to data captured from computer interactions, mainly network traffic and potentially process and system logs, and to build a computer language model that accurately captures the structure of benign interactions. Such a model can then be used to identify malicious exploits of common communication protocols as structural deviations.
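As a rough illustration of the key idea above, the following minimal sketch trains a bigram model on benign token sequences and scores new sequences by their average negative log-likelihood, so that structurally unusual sequences receive higher scores. It is not the project's actual model: the bigram model with add-one smoothing is a deliberately simple stand-in for a neural language model, and the token names (`SYN`, `GET`, `200`, ...) and the function names `train_bigram_model` and `anomaly_score` are hypothetical, chosen only for this example.

```python
import math
from collections import Counter

def train_bigram_model(sequences):
    """Count context and bigram frequencies over benign token sequences."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for seq in sequences:
        tokens = ["<s>"] + list(seq) + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            contexts[prev] += 1          # how often prev appears as a context
            bigrams[(prev, cur)] += 1    # how often cur follows prev
    return contexts, bigrams, vocab

def anomaly_score(model, seq):
    """Average negative log-likelihood per transition (add-one smoothing)."""
    contexts, bigrams, vocab = model
    v = len(vocab) + 1  # +1 reserves probability mass for unseen tokens
    tokens = ["<s>"] + list(seq) + ["</s>"]
    nll = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + v)
        nll -= math.log(p)
    return nll / (len(tokens) - 1)

# Hypothetical benign interactions, already abstracted into event tokens.
benign = [
    ["SYN", "SYN-ACK", "ACK", "GET", "200", "FIN"],
    ["SYN", "SYN-ACK", "ACK", "GET", "200", "GET", "200", "FIN"],
    ["SYN", "SYN-ACK", "ACK", "POST", "200", "FIN"],
]
model = train_bigram_model(benign)

normal_score = anomaly_score(model, ["SYN", "SYN-ACK", "ACK", "GET", "200", "FIN"])
odd_score = anomaly_score(model, ["SYN", "GET", "GET", "GET", "FIN", "ACK"])
print(normal_score, odd_score)  # the out-of-order sequence scores higher
```

In practice a threshold on the score would have to be calibrated on held-out benign traffic; how raw traffic is mapped to tokens is left open here and is one of the research questions below.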

Some research questions formulated for this project include:

1. How well-structured is the space of computer interactions observed in the traffic of a computer? How much does noise or input variation blur the observable contextual differences between clearly distinct actions?

2. To what degree can structure in computer interactions be captured in a model from a training dataset, and how can we achieve this? How can a model adapt to changes of normal contextual structures?

3. What is a meaningful representation of computer interaction structures? What requirements must a labelled traffic generation framework fulfill to provide realistic data?

4. What kind of attacks will a computer interaction model be able to prevent? How can an interaction model be evaded?

Applications of a computer interaction language model lie predominantly in intrusion detection. The particular contribution a language model would bring, in contrast to the established rule-based or signature-based detection methods, is that it is independent of an attack database and therefore capable of detecting previously unseen attacks. In comparison to other work in anomaly-based intrusion detection, this work focuses more on small-scale structures, which are better suited to detecting exploits directly rather than the effects of exploitation.

A computer interaction model is also well suited to the analysis of data streams. The projections into a vector space that a language model creates to condense the observed structural information are far more suitable for comparison metrics than the original input sequences, and can be used to analyse the heterogeneity of a dataset or potential vulnerabilities to privacy leakage.
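To make the point about comparison metrics concrete, the sketch below maps variable-length token sequences to fixed-length vectors and measures dataset heterogeneity as the mean pairwise cosine distance. This is an assumption-laden illustration: the token-count projection in `embed` is a simple stand-in for the learned projections of a language model, and the vocabulary, flows, and function names are hypothetical.

```python
import math
from collections import Counter

def embed(seq, vocab):
    """Stand-in projection: token-count vector over a fixed vocabulary."""
    counts = Counter(seq)
    return [float(counts[t]) for t in vocab]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def heterogeneity(vectors):
    """Mean pairwise cosine distance; close to 0 when all vectors align."""
    pairs = [(i, j) for i in range(len(vectors)) for j in range(i + 1, len(vectors))]
    if not pairs:
        return 0.0
    total = sum(1.0 - cosine_similarity(vectors[i], vectors[j]) for i, j in pairs)
    return total / len(pairs)

# Hypothetical vocabulary and flows of different lengths.
vocab = ["SYN", "ACK", "GET", "POST", "200", "FIN"]
flows = [
    ["SYN", "ACK", "GET", "200", "FIN"],
    ["SYN", "ACK", "GET", "200", "GET", "200", "FIN"],
    ["SYN", "ACK", "POST", "200", "FIN"],
]
vectors = [embed(f, vocab) for f in flows]
score = heterogeneity(vectors)
print(score)
```

The sequences differ in length and cannot be compared element by element, whereas their vector representations all share one dimension and admit standard distance measures.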



Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name
EP/N509644/1 | | | 01/10/2016 | 30/09/2021 |
2371853 | Studentship | EP/N509644/1 | 01/04/2018 | 31/03/2021 | Henry Clausen