Reinforcement Learning for Physically-Aware Cyber Defence

Lead Research Organisation: University of Liverpool
Department Name: Electrical Engineering and Electronics

Abstract

This project is about combining reinforcement learning with models for the physical (i.e., real-world) impact of decisions pertinent to cyber defence. More specifically, in a cyber defence context, it is currently relatively common to make decisions (e.g., relating to firewall configurations) in such a way that we maximise the availability of services running on machines in a computer network. The implicit assumption is that making the services available will result in an ability for the network to be used in the context of achieving some real-world effect.

However, it is relatively uncommon for the decision-making in cyber defence to explicitly reason about the relationship between service availability and the real-world impact of decisions. It can therefore be the case that cyber defence systems fail to adequately respond to situations where current real-world impacts motivate decisions that might be quite different to those that would normally be undertaken; for example, if a radar detects an imminent physical threat, a user might want the cyber defence system to be less concerned with whether certain services will be available in the future and more concerned with ensuring that the services that are critical to responding to the physical threat are fully functional now.
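
To make the contrast concrete, the following is a minimal sketch (not the project's actual model) of how a reward signal might shift weight towards mission-critical services when a physical threat is imminent. The service names, the `critical_weight` value and both reward functions are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    name: str
    available: bool         # is the service up right now?
    mission_critical: bool  # needed to respond to the physical threat?


def availability_reward(services):
    """Conventional objective: fraction of services currently available."""
    return sum(s.available for s in services) / len(services)


def physically_aware_reward(services, threat_imminent, critical_weight=5.0):
    """Weighted availability; critical services count more during a threat.

    The weighting scheme is an assumption made for this sketch.
    """
    total = 0.0
    norm = 0.0
    for s in services:
        weight = critical_weight if (threat_imminent and s.mission_critical) else 1.0
        total += weight * float(s.available)
        norm += weight
    return total / norm


# Two hypothetical firewall configurations with the same raw availability.
config_a = [Service("radar_feed", False, True),
            Service("email", True, False),
            Service("web", True, False)]
config_b = [Service("radar_feed", True, True),
            Service("email", False, False),
            Service("web", True, False)]
```

Under plain availability the two configurations score the same, whereas with `threat_imminent=True` the configuration that keeps the radar feed running scores higher, which is the behaviour the paragraph above argues for.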

The CDT is working with Aleph Insights, a London-based analytical consultancy and data science company that helps organisations understand problems and develop the data and technologies required to solve them.

Previous collaborative work undertaken by the University of Liverpool and Aleph Insights in support of a UK Defence Science and Technology Laboratory (Dstl) project has developed a proof-of-concept that demonstrates that it is possible to optimise cyber defence decisions in such a way that the real-world impact of those decisions is considered. The example considered centred on a radar system being used by a human operator to protect friendly forces from attack by hostile air targets. While the modelling of the radar system was relatively high fidelity, albeit in the guise of a computer game, the proof-of-concept involved a relatively simplistic model for the network's operation and did not exploit reinforcement learning to ensure that the decision-making could reason about the long-term impact of short-term decisions. Ongoing work is developing an improved model for the network's operation and developing an agent that can mimic the human operator's actions such that fully automated simulations can be run.

While developing those aspects of the system is of some interest in the context of the PhD, the core focus for this project is to explore how the decision-making could be extended to consider reinforcement learning. Using reinforcement learning is important in this context since cyber threats will often deploy discreetly and then unleash their payloads when they perceive that doing so will be most effective; it is therefore important to take actions now that have an eye to protecting the network in the future. Note that the use of reinforcement learning will necessitate the use of large-scale distributed computational resources that can run many (fully automated) simulations in parallel to iteratively improve performance during training.
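
The value of reasoning about the future can be illustrated with the discounted return that reinforcement learning agents typically maximise. The sketch below is an assumption-laden toy, not the project's method: the reward sequences, the discount factor `gamma` and the two named strategies are all hypothetical.

```python
def discounted_return(rewards, gamma=0.95):
    """Return sum over t of gamma**t * rewards[t], computed backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical per-step rewards over six time steps.
# "Myopic": keep every service open now; a dormant payload fires at step 3.
myopic = [1.0, 1.0, 1.0, -10.0, -10.0, -10.0]
# "Far-sighted": accept reduced availability early to isolate the threat.
far_sighted = [0.6, 0.6, 0.6, 1.0, 1.0, 1.0]

if __name__ == "__main__":
    print("myopic:", discounted_return(myopic))
    print("far-sighted:", discounted_return(far_sighted))
```

In this toy setting the myopic strategy collects more reward over the first three steps, but the far-sighted strategy has the higher discounted return once the delayed payload is taken into account.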

Planned Impact

This CDT's focus on using "Future Computing Systems" to move "Towards a Data-driven Future" resonates strongly with two themes of non-academic organisation. In both themes, albeit for slightly different reasons, commodity data science is insufficient and there is a hunger both for the future leaders that this CDT will produce and the high-performance solutions that the students will develop.

The first theme is associated with defence and security. In this context, operational performance is of paramount importance. Government organisations (e.g., Dstl, GCHQ and the NCA) will benefit from our graduates' ability to configure many-core hardware to maximise the ability to extract value from the available data. The CDT's projects and graduates will achieve societal impact by enabling these government organisations to better protect the world's population from threats posed by, for example, international terrorism and organised crime.

There is then a supply chain of industrial organisations that deliver to government organisations (both in the UK and overseas). These industrial organisations (e.g., Cubica, Denbridge Marine, FeatureSpace, Leonardo, MBDA, Ordnance Survey, QinetiQ, RiskAware, Sintela, THALES (Aveillant) and Vision4ce) operate in a globally competitive marketplace where operational performance is a key driver. The skilled graduates that this CDT will provide (and the projects that will comprise the students' PhDs) are critical to these organisations' ability to develop and deliver high-performance products and services. We therefore anticipate economic impact to result from this CDT.

The second theme is associated with high-value and high-volume manufacturing. In these contexts, profit margins are very sensitive to operational costs. For example, a change to the configuration of a production line for an aerosol manufactured by Unilever might "only" cut costs by 1p for each aerosol, but when multiplied by half a billion aerosols each year, that amounts to a saving of roughly £5 million a year. In this context, industry (e.g., Renishaw, Rolls Royce, Schlumberger, ShopDirect and Unilever) is therefore motivated to optimise operational costs by learning from historic data. This CDT's graduates (and their projects) will help these organisations to perform such data-driven optimisation and thereby enable the CDT to achieve further economic impact.

Other organisations (e.g., IBM) provide hardware, software and advice to those operating in these themes. The CDT's graduates will ensure these organisations can be globally competitive.

The specific organisations mentioned above are the CDT's current partners. These organisations have all agreed to co-fund studentships. That commitment indicates that, in the short term, they are likely to be the focus for the CDT's impact. However, other organisations are likely to benefit in the future. While two (Lockheed Martin and Arup) have articulated their support in letters that are attached to this proposal, we anticipate impact via a larger portfolio of organisations (e.g., via studentships but also via those organisations recruiting the CDT's graduates either immediately after the CDT or later in the students' careers). Those organisations are likely to include those inhabiting the two themes described above, but also others. For example, an entrepreneurial CDT student might identify a niche in another market sector where Distributed Algorithms can deliver substantial commercial or societal gains. Predicting where such niches might be is challenging, though it seems likely that sectors that are yet to fully embrace Data Science while also involving significant turnover are those that will have the most to gain: we hypothesise that niches might be identified in health and actuarial science, for example.

As well as training the CDT students to be the leaders of tomorrow in Distributed Algorithms, we will also achieve impact by training the CDT's industrial supervisors.

Publications

Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name
EP/S023445/1 | | | 01/04/2019 | 30/09/2027 |
2748812 | Studentship | EP/S023445/1 | 01/10/2022 | 30/09/2026 | Adam Neal