On the optimality of the Gittins index rule in multi-armed bandits with multiple plays

Pandelis, Dimitrios G.; Teneketzis, Demosthenis

doi:10.1109/cdc.1995.480298

Cited by 18 publications

(20 citation statements)

References 21 publications

“…A natural extension to the Gittins index policy in this case is to play the machines with the highest Gittins indices (this will be referred to as the extended Gittins index policy below). This is not in general optimal for multi-armed bandits with multiple plays and an infinite horizon discounted reward criterion, see e.g., [18], [19]. However, it may be optimal in some cases, see e.g., [19] for conditions on the reward function, and [20] for an undiscounted case where the Gittins index is always achieved at time 1.…”

Section: Discussion (mentioning)

confidence: 99%
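The extended Gittins index policy described in the statement above amounts to a top-k selection over per-arm indices. A minimal sketch follows, assuming the Gittins indices have already been computed elsewhere and are passed in as plain numbers; the function name `extended_index_policy` and the tie-breaking rule (lower arm id first) are illustrative assumptions, not taken from the cited papers.

```python
def extended_index_policy(indices, k):
    """Return the ids of the k arms with the largest index values.

    `indices` is a list of precomputed index values, one per arm
    (assumed input; computing Gittins indices is not shown here).
    Ties are broken in favor of the lower arm id (an arbitrary choice).
    """
    if not 0 <= k <= len(indices):
        raise ValueError("k must be between 0 and the number of arms")
    # Sort arm ids by descending index value, then by ascending arm id.
    ranked = sorted(range(len(indices)), key=lambda i: (-indices[i], i))
    return ranked[:k]


# Example: four arms, two plays per time step.
chosen = extended_index_policy([0.3, 0.9, 0.5, 0.9], k=2)
```

As the quoted statement notes, this rule is not optimal in general for the discounted infinite-horizon problem with multiple plays; the sketch only shows how the rule selects arms.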

“…This is not in general optimal for multi-armed bandits with multiple plays and an infinite horizon discounted reward criterion, see e.g., [18], [19]. However, it may be optimal in some cases, see e.g., [19] for conditions on the reward function, and [20] for an undiscounted case where the Gittins index is always achieved at time 1. Even less is known when the bandits are restless, though asymptotic results for restless bandits with multiple plays were provided in [4] and [21].…”

Section: Discussion (mentioning)

confidence: 99%

Multi-channel opportunistic access: A case of restless bandits with multiple plays

Ahmad

Liu

2009

2009 47th Annual Allerton Conference on Communication, Control, and Computing (Allerton)

Abstract: This paper considers the following stochastic control problem that arises in opportunistic spectrum access: a system consists of n channels where the state ("good" or "bad") of each channel evolves as independent and identically distributed Markov processes. A user can select exactly k channels to sense and access (based on the sensing result) in each time slot. A reward is obtained whenever the user senses and accesses a "good" channel. The objective is to design a channel selection policy that maximizes the expected discounted total reward accrued over a finite or infinite horizon. In our previous work we established the optimality of a greedy policy for the special case of k = 1 (i.e., single channel access) under the condition that the channel state transitions are positively correlated over time. In this paper we show under the same condition the greedy policy is optimal for the general case of k ≥ 1; the methodology introduced here is thus more general. This problem may be viewed as a special case of the restless bandit problem, with multiple plays. We discuss connections between the current problem and existing literature on this class of problems.
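One way to picture the greedy policy in this abstract is sketched below. It assumes each channel is tracked by a belief (the probability that it is currently "good"), that the user senses the k channels with the highest beliefs, and that beliefs evolve through a two-state Markov chain with transition probabilities `p11` (good to good) and `p01` (bad to good). These names, the belief representation, and the update rule are assumptions made for illustration and are not quoted from the paper; positive correlation is taken to mean `p11 >= p01`.

```python
def greedy_select(beliefs, k):
    """Pick the k channels with the highest probability of being good."""
    return sorted(range(len(beliefs)), key=lambda i: (-beliefs[i], i))[:k]


def update_beliefs(beliefs, observations, p11, p01):
    """Advance every channel's belief by one time slot.

    `observations` maps each sensed channel id to True (good) or
    False (bad). Sensed channels restart from the observed state;
    unsensed channels are propagated through the Markov chain.
    """
    updated = []
    for i, w in enumerate(beliefs):
        if i in observations:
            updated.append(p11 if observations[i] else p01)
        else:
            updated.append(w * p11 + (1.0 - w) * p01)
    return updated


# Example: three channels, sense two per slot (all values hypothetical).
beliefs = [0.2, 0.7, 0.5]
sensed = greedy_select(beliefs, k=2)
next_beliefs = update_beliefs(beliefs, {1: True, 2: False}, p11=0.8, p01=0.3)
```

The sketch only shows the mechanics of one slot; it does not by itself demonstrate the optimality result the abstract claims.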

“…These include all the variants of the classical MAB problem. For instance, resources might have to be allocated among more than one project at a time (Pandelis and Teneketzis [21]), new projects might arrive (Whittle [28]), all projects may change state (Whittle [29]), or there might be constraints linking the bandits (Denardo, Feinberg and Rothblum [10]). Studying the results for these variants in a robust setting is an avenue for future work.…”

Section: Results (mentioning)

confidence: 99%

“…Since $V_m(s_{-1}, M) - M$ and $V_m^1(s_1, M) - M$ are constants with respect to the inner optimization problem in (21) and (22) respectively and both…”

Section: Proposition 2 The Robust Gittins Index Is Given By (mentioning)

confidence: 99%

Robust control of the multi-armed bandit problem

Caro

Gupta

2015

Ann Oper Res

We study a robust model of the multi-armed bandit (MAB) problem in which the transition probabilities are ambiguous and belong to subsets of the probability simplex. We first show that for each arm there exists a robust counterpart of the Gittins index that is the solution to a robust optimal stopping-time problem and can be computed effectively with an equivalent restart problem. We then characterize the optimal policy of the robust MAB as a project-by-project retirement policy but we show that arms become dependent so the policy based on the robust Gittins index is not optimal. For a project selection problem, we show that the robust Gittins index policy is near optimal but its implementation requires more computational effort than solving a non-robust MAB problem. Hence, we propose a Lagrangian index policy that requires the same computational effort as evaluating the indices of a non-robust MAB and is within 1% of the optimum in the robust project selection problem.

“…Casting the above problem into the bandit problem, the state is represented by the posterior distribution on match quality, and the allocation of the license to domestic firms is equivalent to choosing arms with uncertain payoffs. Since the government can allocate licences to more than one firm at a time, it is the multi-armed bandit problem with multiple plays (See Pandelis and Teneketzis (1995)). …”

Section: Industrial Policy (mentioning)

confidence: 99%

A survey on the bandit problem with switching costs

Jun

2004

De Economist

Summary: The paper surveys the literature on the bandit problem, focusing on its recent development in the presence of switching costs. Switching costs between arms not only make the Gittins index policy suboptimal, but also render the search for the optimal policy computationally infeasible. This survey will first discuss the decomposability properties of the arms that make the Gittins index policy optimal, and show how these properties break down upon the introduction of costs on switching arms. Having established the failure of the simple index policy, the survey focuses on the recent efforts to overcome the difficulty of finding the optimal policy in the bandit problem with switching costs: characterization of the optimal policy, exact derivation of the optimal policy in restricted environments, and lastly approximation of the optimal policy. The advantages and disadvantages of the above approaches are discussed.
