Risk Measures for MDPs

Lead Research Organisation: Birkbeck, University of London
Department Name: Computer Science and Information Systems

Abstract

Risk is a very important concept in many areas of everyday life. Risk is the potential of gaining or losing something of value, such as physical health, social status, emotional well-being or financial wealth. Value can be gained or lost when taking a risk that results from a given action or inaction, foreseen or unforeseen. In real life, many decisions are made to avoid risk. For instance, most factories would choose to produce at a lower speed to avoid a high risk of flawed products. Likewise, many people would not choose high-stakes asset allocation portfolios with the possibility of large gains and losses. There are, however, other cases where risk-seeking actions are preferred, such as gambling. It is therefore of utmost importance to take risk into consideration when modelling a system.

Markov decision processes (MDPs) are a general mathematical framework for modelling systems. They are used in a wide range of disciplines, such as robotics, automated control, economics, and manufacturing. In many applications modelled by MDPs, it is crucial to incorporate some measure of risk to rule out, for instance, policies that achieve a high expected reward at the cost of risky and error-prone actions. As a result, risk-sensitive optimality criteria for MDPs have been put forward.
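As a minimal sketch of why expected reward alone can be misleading, consider a one-step decision with two actions, loosely following the factory example above. The action names, probabilities and rewards are hypothetical and chosen only for illustration; they are not taken from the project.

```python
# Hypothetical one-step decision: each action maps to (probability, reward) outcomes.
ACTIONS = {
    "fast": [(0.9, 10.0), (0.1, -50.0)],  # high output, occasional costly flaw
    "slow": [(1.0, 3.0)],                 # lower output, no flaws
}


def expected_reward(outcomes):
    """Probability-weighted average reward of an action."""
    return sum(p * r for p, r in outcomes)


def variance(outcomes):
    """Variance of the reward, one simple indicator of risk."""
    mean = expected_reward(outcomes)
    return sum(p * (r - mean) ** 2 for p, r in outcomes)


def loss_probability(outcomes):
    """Probability that the action ends with a negative reward."""
    return sum(p for p, r in outcomes if r < 0)


if __name__ == "__main__":
    for name, outcomes in ACTIONS.items():
        print(name, expected_reward(outcomes), variance(outcomes),
              loss_probability(outcomes))
```

In this toy setting "fast" has the higher expected reward, but it is the only action that can end in a loss; a risk-sensitive criterion may therefore prefer "slow".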

In the last decade, the notion of risk measures has become very popular. Intuitively, a risk measure is a function that maps a (random) cost or reward to a real value, and the aim is to minimise the risk measure. Despite the existing work on risk measures in MDPs, many questions in this area remain open: what the computational complexity is, how to develop efficient algorithms, and how to provide effective tool support, to name just a few. The proposed research will address these questions in depth.
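To make the idea of a risk measure concrete, the sketch below computes two widely used measures, value at risk (VaR) and conditional value at risk (CVaR), from a list of sampled costs. The choice of these two measures, the function names and the simple empirical estimator are assumptions made for illustration; the abstract does not say which risk measures the project studies.

```python
import math


def value_at_risk(costs, alpha):
    """Smallest sampled cost c such that at least a fraction alpha of samples are <= c."""
    ordered = sorted(costs)
    index = max(0, math.ceil(alpha * len(ordered)) - 1)
    return ordered[index]


def conditional_value_at_risk(costs, alpha):
    """Average of the sampled costs at or above the VaR (one simple empirical estimate)."""
    threshold = value_at_risk(costs, alpha)
    tail = [c for c in costs if c >= threshold]
    return sum(tail) / len(tail)


if __name__ == "__main__":
    sampled_costs = [1.0, 2.0, 3.0, 10.0]  # hypothetical cost samples
    print(sum(sampled_costs) / len(sampled_costs))         # plain average cost
    print(value_at_risk(sampled_costs, 0.75))              # cost level at the 75% quantile
    print(conditional_value_at_risk(sampled_costs, 0.75))  # average of the worst outcomes
```

Both functions map a whole distribution of costs to a single real number, so different strategies can be compared by that number and the one with the smallest value chosen.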

The results will shed some light on whether the risk measure minimisation problem can be solved efficiently at all. If so, how can it be solved efficiently? If not, is there an efficient algorithm that computes a sufficiently close solution? Beyond the algorithmic results, how can a ready-to-use tool for calculating minimal risks be developed? The ultimate goal is to provide a strategy that guides decision making so that risks are minimised. This could, for instance, help people distribute their asset portfolios, give advice on manufacturing processes, or steer a robot along a safe and cost-efficient path.

Planned Impact

The project aims to make a "full tour" of techniques for solving risk-sensitive Markov
decision processes (MDPs): from theoretical development to implementation of tools,
to real-life case studies.

An immediate impact would be the enhancement of existing probabilistic
model checkers, an improved robot navigation application, and financial risk measurement.
To achieve this impact, we will work jointly with collaborators.
To disseminate our results, we will make our software tool freely available, submit papers to
mainstream formal methods conferences, and organise a workshop at the end of the project.

We plan to demonstrate our results through two case studies with deep roots in industry.

1. Impact on AI planning. The first case study is a further step towards improving robot
navigation by designing more efficient algorithms for planning with risk-sensitive objectives. One
of the most important and challenging components of a mobile robot is the navigation system,
where a robot must optimise the use of its resources while avoiding risky zones (e.g.,
electric substations), often under very urgent deadlines, and reach inspection goals or
locations of interest. To navigate reliably in a real environment, autonomy and rationality are
always required.

The proposed algorithms aim to help robots make quicker decisions and
keep out of dangerous zones while saving fuel, battery power or time. We will compare our results
with the existing benchmarks, and if ours turn out to be more fuel and time efficient while
still staying safe, we will post our results on the Imperial Robotics Forum, which involves
21 laboratories from academia and industry. Recently, some self-driving delivery robots have been
heading to London to carry out delivery trials in Greenwich. This is a typical application of
our problem, where robots need to avoid obstacles, save battery power and deliver on time. We intend
to get in touch with the company Starship Technologies and seek opportunities to collaborate.

2. Impact on finance. The second case study aims to improve asset allocation
returns. The global financial crisis in 2008 caused investors to question what went wrong with
many of their portfolios, which were believed to be diversified. A growing body of literature on
portfolio construction approaches has focused on risk and diversification rather than on
estimating expected returns. The portfolio composition is a model of risk on top of an MDP. Our
work will first look into past real asset allocation data from the global financial data website
and build risk-sensitive MDPs based on different risk measures. The modelling part may require
expertise from the financial sector.

To this end, we have contacted Dr. Guoqiang George Yang,
the Vice President of the Counterparty Credit Risk Quant department
at Bank of America Merrill Lynch (see the attached letter of support). We plan to make visits to the bank's London branch.
The resulting investment strategies will be compared with the real
decisions. If the resulting strategies turn out to be superior in terms of returns in most cases,
our model will be reported to Bank of America Merrill Lynch. Further collaboration may
follow.

Publications

Description: The research mainly explored deep reinforcement learning with MDP models as a means of minimising risk, e.g., in biometrics, to enhance security levels by verifying people through their finger textures, and in autonomous driving, to increase the accuracy of the driving actions taken by the car.
Exploitation Route: The results in both the biometric and autonomous driving areas are very promising. The techniques may be investigated further and applied widely in other areas where decisions must be made automatically, or in risk-critical areas.
Sectors: Digital/Communication/Information Technologies (including Software), Electronics, Security and Diplomacy, Transport


Description: The work in this research went in two directions. The first direction was on fuzzy systems, and the second focused on applying deep learning algorithms to a few applications, including 1) using biometrics to identify, verify and recognise people; 2) autonomous driving; and 3) software engineering, to summarise source code, generate code comments and generate code itself. All of these research directions would directly or indirectly reduce risk in a range of highly critical scenarios. We summarise here how our work is delivering impact on a wider range of research. Within the context of this project, we have studied fuzzy systems, and this has had a considerable impact on related research. For instance, Jain et al. studied fuzzy modal logic, and Nguyen et al. considered various (approximate) bisimulations on fuzzy transition systems. All of these are either based on or inspired by our work. According to Google Scholar, as of 2013, our work in this line has recorded about 20 citations. The work on the applications is very recent, and its non-academic impact is yet to be seen. However, we do anticipate a wide range of economic and societal impacts.
First Year Of Impact: 2021
Impact Types: Societal, Economic