Citation

Glazebrook, K.D., Hodge, D.J., Kirkbride, C. and Minty, R.J. Stochastic scheduling: A short history of index policies and some key recent developments. In Proceedings of the 5th Multidisciplinary International Conference on Scheduling: Theory and Applications (MISTA 2011), 9-11 August 2011, Phoenix, Arizona, USA, pages 94-112, 2011.

Paper


Abstract

A multi-armed bandit problem classically concerns N >= 2 populations of rewards whose statistical properties are unknown (or at least only partly known). A decision-maker secures rewards by sampling sequentially from the populations, using past sampled values to make inferences about the populations and so guide the choice of which population to sample next. The goal is to make these choices in such a way as to maximise some measure of total reward secured. Such problems embody, in a particularly simple form, the dichotomy present in many decision problems between making decisions with a view to securing information which can improve future decision-making (exploration) and those which exploit the information already available (exploitation). In the 1970s John Gittins discovered that important classes of such multi-armed bandit problems have solutions of a particularly simple form: at each stage of the sampling compute an index (the Gittins index) for each of the N populations, namely a function of the rewards already sampled from the population concerned. Always sample next from the population with the largest current index. Moreover, the index concerned has a simple interpretation as an equivalent known reward for the population concerned. It emerges that many problems involving the sequential allocation of effort, some of quite different character to the above multi-armed bandit problems, have index solutions. Since the 1970s, Gittins' index result together with a range of developments and reformulations of it have constituted an influential stream of ideas and results contributing to research into the scheduling of stochastic objects. Application areas to which these ideas have contributed include approximate dynamic programming, control of queueing systems, fast fashion retail, machine maintenance, military logistics, optimal search, research planning, sensor management, communication channel usage and website morphing.
The talk will give an overview of some key ideas and some recent developments.
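The index-policy idea described in the abstract (compute an index for each population from its sampled rewards, then always sample the population with the largest current index) can be sketched in a few lines of Python. This is an illustrative sketch only, not code from the paper: exact Gittins indices are nontrivial to compute, so the sketch uses the UCB1 index (sample mean plus an exploration bonus) as a simpler stand-in index function, and the Bernoulli arm probabilities are made-up example values.

```python
import math
import random

def ucb1_index(total_reward, n_pulls, t):
    # Stand-in index (UCB1), not the Gittins index itself: the sample mean
    # plus an exploration bonus that shrinks as an arm is sampled more often.
    if n_pulls == 0:
        return float("inf")  # force each arm to be sampled at least once
    return total_reward / n_pulls + math.sqrt(2.0 * math.log(t) / n_pulls)

def index_policy(arm_probs, horizon, rng):
    # Bernoulli bandit: arm i pays 1 with probability arm_probs[i], else 0.
    n = len(arm_probs)
    pulls = [0] * n
    rewards = [0.0] * n
    for t in range(1, horizon + 1):
        # "Always sample next from the population with the largest current index."
        i = max(range(n), key=lambda a: ucb1_index(rewards[a], pulls[a], t))
        r = 1.0 if rng.random() < arm_probs[i] else 0.0
        pulls[i] += 1
        rewards[i] += r
    return pulls, rewards

rng = random.Random(0)  # seeded for reproducibility
pulls, rewards = index_policy([0.2, 0.5, 0.8], horizon=500, rng=rng)
```

The loop captures the exploration/exploitation dichotomy the abstract describes: the bonus term draws the policy toward under-sampled arms (exploration), while the mean term favours arms that have paid well so far (exploitation).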


pdf

You can download the pdf of this publication from here


doi

This publication does not have a doi, so we cannot provide a link to the original source

What is a doi?: A doi (Digital Object Identifier) is a unique identifier for scientific papers (and occasionally other material). It provides direct access to the location where the original article is published via the URL http://dx.doi.org/xxxx (replacing xxxx with the doi). See http://dx.doi.org/ for more information



URL

This publication does not have a URL associated with it.

The URL is only provided if there is additional information that might be useful. For example, where the entry is a book chapter, the URL might link to the book itself.


Bibtex

@INPROCEEDINGS{2011-094-112-P, author = {K.D. Glazebrook and D.J. Hodge and C. Kirkbride and R.J. Minty},
title = {Stochastic scheduling: A short history of index policies and some key recent developments},
booktitle = {Proceedings of the 5th Multidisciplinary International Conference on Scheduling: Theory and Applications (MISTA 2011), 9-11 August 2011, Phoenix, Arizona, USA},
year = {2011},
editor = {J. Fowler and G. Kendall and B. McCollum},
pages = {94--112},
note = {Paper},
abstract = {A multi-armed bandit problem classically concerns N >= 2 populations of rewards whose statistical properties are unknown (or at least only partly known). A decision-maker secures rewards by sampling sequentially from the populations, using past sampled values to make inferences about the populations and so guide the choice of which population to sample next. The goal is to make these choices in such a way as to maximise some measure of total reward secured. Such problems embody, in a particularly simple form, the dichotomy present in many decision problems between making decisions with a view to securing information which can improve future decision-making (exploration) and those which exploit the information already available (exploitation). In the 1970s John Gittins discovered that important classes of such multi-armed bandit problems have solutions of a particularly simple form: at each stage of the sampling compute an index (the Gittins index) for each of the N populations, namely a function of the rewards already sampled from the population concerned. Always sample next from the population with the largest current index. Moreover, the index concerned has a simple interpretation as an equivalent known reward for the population concerned. It emerges that many problems involving the sequential allocation of effort, some of quite different character to the above multi-armed bandit problems, have index solutions. Since the 1970s, Gittins' index result together with a range of developments and reformulations of it have constituted an influential stream of ideas and results contributing to research into the scheduling of stochastic objects. Application areas to which these ideas have contributed include approximate dynamic programming, control of queueing systems, fast fashion retail, machine maintenance, military logistics, optimal search, research planning, sensor management, communication channel usage and website morphing.
The talk will give an overview of some key ideas and some recent developments.},
owner = {gxk},
timestamp = {2011.08.15},
webpdf = {2011-094-112-P.pdf} }