Bandits and beyond: Index heuristics for dynamic resource allocation
Lead Research Organisation:
Lancaster University
Department Name: Mathematics and Statistics
Abstract
The research programme is concerned with the development of simple, yet effective methods for determining how some key resource should be distributed over time among a collection of entities which require it. To fix some ideas, consider the example described in the following imagined scenario, referred to in the literature as the 'stochastic multiproduct batch dispatch problem' :Different types of products arrive at a holding station according to some random process. They are kept there until dispatched onward (to a retailer, say) by one of a fleet of vehicles. Costs are incurred by the items kept in storage at the station. The nature of these holding costs may differ markedly between product types. Products may also differ with regard to their space requirements for transportation. At the beginning of each time period (the start of each day, say), a decision has to be made regarding how many vehicles should be dispatched and what the composition of their loads should be. Such decisions may be based in part, say, on the number of units of each product type at the station at the time. The goal is to take such decisions in a way that minimises the overall costs incurred in holding the products and in dispatching the vehicles. To use some mathematical jargon, the decision-maker is looking for a dynamic policy for the allocation of her key resource (ie, a rule for the deployment of the fleet of vehicles) among a set of entities (here the product types) in a situation which is both stochastic (evolves randomly) and complex. How to develop such a policy is extremely difficult, not least because decisions must be assessed not only in terms of their immediate impact on costs but also with regard to their influence on the costs which will be incurred in the future. The conventional approach to such problems, called stochastic dynamic programming, is unlikely to be able to cope well with problems of realistic size . What would be invaluable to the decision-maker would be some effective, yet reasonably simple and computationally tractable, way of calibrating the value to the respective product types of (differing amounts of) space on the vehicles. Such calibrations could then be used to inform decision-making. Simple so-called index-based solutions do indeed exist (and are very effective), but mainly for (bandit-type) problems which impose serious limitations on how the key resource may be distributed among the entities. The goal of the research project is to extend the scope of such simple index-based solution approaches to more general allocation problems which are freed of such restrictions . There is a rich variety of practical situations which share the broad features of the above example. The methodologies developed during the research programme will yield simple and effective index-based approaches to resource allocation in many such settings.
Organisations
People |
ORCID iD |
Kevin Glazebrook (Principal Investigator) |
Publications
Glazebrook K
(2009)
A Generalized Gittins Index for a Class of Multiarmed Bandits with General Resource Requirements
in Mathematics of Operations Research
Hodge D
(2011)
Dynamic resource allocation in a multi-product make-to-stock production system
in Queueing Systems
Glazebrook K
(2011)
General notions of indexability for queueing control and asset management
in The Annals of Applied Probability
Glazebrook K
(2009)
Index Policies for the Admission Control and Routing of Impatient Customers to Heterogeneous Service Stations
in Operations Research
John Gittins
(2011)
Multi-armed bandit allocation indices
Hodge D. J.
(2015)
ON THE ASYMPTOTIC OPTIMALITY OF GREEDY INDEX HEURISTICS FOR MULTI-ACTION RESTLESS BANDITS
in ADVANCES IN APPLIED PROBABILITY
Glazebrook K
(2013)
Stochastic scheduling: A short history of index policies and new approaches to index generation for dynamic resource allocation
in Journal of Scheduling
Description | We have discovered that the notion of indexing, namely the evaluation of decision options by giving each one a simple score, is powerful in a much wider range of scenarios that was thought. |
Exploitation Route | Development of the theory for application in, say, queueing control and asset management. |
Sectors | Other |