Modelling computer interactions as language for anomaly detection

Lead Research Organisation: University of Edinburgh

Department Name: Sch of Informatics

Abstract

The overall aim of this project is to enable machine-learning-based methods to capture the fine-grained structure and temporal interdependencies of computer interactions. Computer programs communicate with each other in highly structured, but not necessarily deterministic, ways, similar to human speech. Methods in natural language processing (NLP) have recently been very successful at creating accurate models of the structure observed in human speech. The key idea of this project is to transfer the success of NLP models to data captured from computer interactions, mainly network traffic and potentially process and system logs, and to build a computer language model that accurately captures the structure of benign interactions. Such a model can then be used to identify malicious exploits of common communication protocols as structural deviations.
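As a rough illustration of the idea above (not the project's actual model), the following minimal sketch trains a bigram model with add-one smoothing on benign token sequences and scores a new sequence by its average negative log-likelihood. The token names (`SYN`, `GET`, and so on), the function names, and the choice of a bigram model are all assumptions made for illustration; the abstract does not specify a tokenisation or model architecture.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical start-of-sequence marker

def train_bigram(sequences):
    """Count token bigrams over benign sequences."""
    counts = defaultdict(Counter)
    vocab = {START}
    for seq in sequences:
        prev = START
        for tok in seq:
            counts[prev][tok] += 1
            vocab.add(tok)
            prev = tok
    return counts, vocab

def anomaly_score(seq, counts, vocab):
    """Average negative log-likelihood per token, with add-one smoothing."""
    if not seq:
        return 0.0
    v = len(vocab) + 1  # one extra slot reserved for unseen tokens
    prev = START
    nll = 0.0
    for tok in seq:
        ctx = counts.get(prev, Counter())
        p = (ctx[tok] + 1) / (sum(ctx.values()) + v)
        nll -= math.log(p)
        prev = tok
    return nll / len(seq)

# Hypothetical benign traffic, already reduced to event tokens.
benign = [
    ["SYN", "SYN-ACK", "ACK", "GET", "200", "FIN"],
    ["SYN", "SYN-ACK", "ACK", "GET", "404", "FIN"],
    ["SYN", "SYN-ACK", "ACK", "POST", "200", "FIN"],
]
counts, vocab = train_bigram(benign)

normal_score = anomaly_score(
    ["SYN", "SYN-ACK", "ACK", "GET", "200", "FIN"], counts, vocab)
odd_score = anomaly_score(
    ["SYN", "GET", "GET", "GET", "FIN", "ACK"], counts, vocab)
```

A sequence whose transitions were rarely or never seen in the benign data receives a higher score, which is the sense in which an attack would appear as a structural deviation. In practice a threshold on the score would have to be calibrated on held-out benign data.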

Some research questions formulated for this project include:

1. How well-structured is the space of computer interactions observed in the traffic of a computer? How much does noise or input variation blur the observable contextual differences between clearly distinct actions?

2. To what degree can structure in computer interactions be captured in a model from a training dataset, and how can we achieve this? How can a model adapt to changes of normal contextual structures?

3. What is a meaningful representation of computer interaction structures? What requirements must a labelled traffic generation framework fulfill to provide realistic data?

4. What kind of attacks will a computer interaction model be able to prevent? How can an interaction model be evaded?

Applications of a computer interaction language model lie predominantly in intrusion detection. The particular contribution of a language model, in contrast to established rule-based or signature-based detection methods, is that it is independent of an attack database and is therefore capable of detecting previously unseen attacks. In comparison to other work in anomaly-based intrusion detection, this work focuses more on small-scale structures, which are better suited to detecting exploits directly rather than the effects of exploitation.

A computer interaction model is also well suited to the analysis of data streams. The projections into a vector space that a language model creates to condense the observed structural information are far better suited to comparison metrics than the original input sequences, and can be used to analyse the heterogeneity of a dataset or potential vulnerabilities to privacy leakage.
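To illustrate why fixed vector representations are easier to compare than raw sequences, the sketch below maps each sequence to a sparse vector of bigram counts and compares vectors with cosine similarity. The bigram-count vector is only a simple stand-in for the learned projections a language model would produce, and the function names and example tokens are hypothetical.

```python
import math
from collections import Counter

def bigram_vector(seq):
    """Sparse vector of adjacent-token pair counts (stand-in for an embedding)."""
    return Counter(zip(seq, seq[1:]))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical tokenised interactions.
web_a = bigram_vector(["SYN", "SYN-ACK", "ACK", "GET", "200", "FIN"])
web_b = bigram_vector(["SYN", "SYN-ACK", "ACK", "GET", "404", "FIN"])
other = bigram_vector(["HELO", "MAIL", "RCPT", "DATA", "QUIT"])

sim_close = cosine(web_a, web_b)  # sequences sharing most transitions
sim_far = cosine(web_a, other)    # sequences sharing no transitions
```

Pairwise similarities of this kind could then feed a clustering or nearest-neighbour analysis to gauge how heterogeneous a dataset is.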

Student:

Henry Clausen

Period of Study:

Apr 18 - Mar 21

Funder:

EPSRC

Project Status:

Closed

Project Category:

Studentship

Project Reference:

2371853

Research Topic:

Unclassified

Organisations

People	ORCID iD
David Aspinall (Primary Supervisor)
Henry Clausen (Student)

Publications

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/N509644/1			01/10/2016	30/09/2021
2371853	Studentship	EP/N509644/1	01/04/2018	31/03/2021	Henry Clausen
