Quantitative Termination Approximation Analysis of One-Counter Markov Decision Processes

Lead Research Organisation: University of Edinburgh

Department Name: Sch of Informatics

Abstract

A Markov Decision Process(MDP) is a stochastic, controlled, discrete-time process, which is played over a transition graph. A player controls a specified subset of states, choosing their own strategy on how to pick transitions and where to move to next from those states. The rest of the states fall under nature/uncertainty's possession, with pre-defined probability distributions over outgoing transitions. Adding an unbounded counter, that changes by at most 1 at each step of the game, builds the class of one-counter(OC)-MDPs.

This project looks at the analysis of the so-called termination objective over this game, which is the probability of reaching counter value 0 at any state, provided a beginning configuration of given state and counter>0. Depending on whether it is a maximization or minimization OC-MDP, the goal of the player is to maximize or minimize, respectively, this probability.

It has previously been proven that the qualitative decision part of this objective, i.e. "is optimal probability 1 or 0?", is decidable in polynomial time. As for the quantitative part, i.e. "is termination value at least p?", it has been shown that the question of whether such problems are computable is open and that computing an approximation of the optimal game value and (epsilon-)optimal strategies can be done in time exponential in the size of the game. The focus of this research aims at improving the complexity from exponential to polynomial time for such games and the termination approximation problem.

Achieving this has its benefits, since one-counter games are applicable elsewhere. For example, the classes of discrete-time Quasi-Birth-Death processes and Solvency games are subsumed by OC-MDPs, meaning this type of process is widely used - in queueing theory and investment/gambling.

As a milestone, this problem is the focus for this first year. After that, there are many other game-theory-related and/or MDP-related problems that could be analysed.

Student:

Emanuel Martinov

Period of Study:

Oct 16 - Aug 20

Funder:

EPSRC

Project Status:

Closed

Project Category:

Studentship

Project Reference:

1789473

Research Topic:

Unclassified

Organisations

University of Edinburgh (Lead Research Organisation)

People	ORCID iD
Kousha Etessami (Primary Supervisor)
Emanuel Martinov (Student)

Publications

Author Name Title Publication Date Published

10 25 50

Etessami K (2019) Reachability for Branching Concurrent Stochastic Games

Etessami K (2020) Qualitative Multi-objective Reachability for Ordered Branching MDPs

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/N509644/1			01/10/2016	30/09/2021
1789473	Studentship	EP/N509644/1	01/10/2016	30/08/2020	Emanuel Martinov

Key Findings
Further Funding


Description	Initially, I studied my starting PhD question, namely searching for a polynomial time algorithm for the Quantitative Termination Approximation problem for One-Counter MDPs. It is a very interesting open question, with an already known exponential time algorithm for it. Some observations were made, further showing the difficulty of finding an answer to the question. Then I moved to Branching Processes (BPs) and to problems related to this model. In particular, polynomial time algorithms were given for deciding almost-sure and limit-sure reachability in Branching Concurrent Stochastic Games (BCSGs). These are a class of infinite-state imperfect information stochastic games that generalize both finite-state concurrent stochastic reachability games, as well as branching simple stochastic reachability games. The paper, containing the results, is a joint work with Kousha Etessami, Alistair Stewart and Mihalis Yannakakis and appeared in ICALP'19. Afterwards, I worked on the analysis of Ordered Branching Markov Decision Processes, which is a new model that I and my supervisor, Kousha Etessami, introduced, under the qualitative version of the multi-objective reachability problem, which is a natural extension to the previously studied qualitative reachability problem. This work was published in a paper in RP'20. In my third year of the PhD, I also immersed myself into another area of game theory, namely coalitional games. I obtained results (not yet published) in the sub-areas of coalition formation games and transferable-utility coalitional games. I have already submitted and defended my PhD thesis.
Exploitation Route	The findings related to the models of Branching Processes and Ordered Branching Processes are all work towards obtaining broader and generalised results for model checking for (O)BMDPs, (O)BSSGs, (O)BCSGs. So the findings can certainly be of use to people working in this area of interest. The findings related to coalitional games are another stepping stone towards analysing different notions of stability in such multi-player games, which has had increasing interest in the past 15-20 years. Such analysis can contribute to other industries utilizing such games, which is the reason why people are investigating it.
Sectors	Aerospace, Defence and Marine,Chemicals,Creative Economy,Digital/Communication/Information Technologies (including Software),Education,Electronics,Energy,Environment,Financial Services, and Management Consultancy,Healthcare,Manufacturing, including Industrial Biotechology,Pharmaceuticals and Medical Biotechnology


Description	SICSA Research Scholar
Amount	£400 (GBP)
Organisation	SICSA Scottish Informatics and Computer Science Alliance
Sector	Academic/University
Country	United Kingdom
Start	07/2019
End	07/2019

Abstract

Organisations

People

ORCID iD

Publications

Studentship Projects