Proceedings of 1995 34th IEEE Conference on Decision and Control
DOI: 10.1109/cdc.1995.480298

On the optimality of the Gittins index rule in multi-armed bandits with multiple plays

Abstract: We investigate the general multi-armed bandit problem with multiple servers. We determine a condition on the reward processes sufficient to guarantee the optimality of the strategy that operates at each instant of time the projects with the highest Gittins indices. We call this strategy the Gittins index rule for multi-armed bandits with multiple plays, or briefly the Gittins index rule. We show by examples that: (i) the aforementioned sufficient condition is not necessary for the optimality of the Gittin…
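The rule described in the abstract — at each time step, operate the projects with the highest Gittins indices — can be sketched in a few lines. This is a minimal illustration, not the paper's own code: the project names, index values, and the helper `gittins_index_rule` are hypothetical, and computing the Gittins indices themselves is assumed to be done elsewhere.

```python
import heapq

def gittins_index_rule(indices, m):
    """Select the m projects with the highest Gittins indices.

    `indices` maps a project id to its current Gittins index
    (assumed precomputed; index computation is a separate problem).
    Returns the ids of the m projects to operate this time step.
    """
    # Pick the m largest indices; ties are broken arbitrarily.
    return heapq.nlargest(m, indices, key=indices.get)

# Hypothetical example: 4 projects, 2 servers.
chosen = gittins_index_rule({"a": 0.9, "b": 0.4, "c": 0.7, "d": 0.1}, m=2)
```

With two servers, the rule here picks the two projects with the largest indices ("a" and "c" in this toy example); the paper's contribution is a condition on the reward processes under which this greedy selection is actually optimal.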



Cited by 18 publications (20 citation statements)
References 21 publications
“…A natural extension to the Gittins index policy in this case is to play the machines with the highest Gittins indices (this will be referred to as the extended Gittins index policy below). This is not in general optimal for multi-armed bandits with multiple plays and an infinite horizon discounted reward criterion, see e.g., [18], [19]. However, it may be optimal in some cases, see e.g., [19] for conditions on the reward function, and [20] for an undiscounted case where the Gittins index is always achieved at time 1.…”
Section: Discussion
confidence: 99%
“…This is not in general optimal for multi-armed bandits with multiple plays and an infinite horizon discounted reward criterion, see e.g., [18], [19]. However, it may be optimal in some cases, see e.g., [19] for conditions on the reward function, and [20] for an undiscounted case where the Gittins index is always achieved at time 1. Even less is known when the bandits are restless, though asymptotic results for restless bandits with multiple plays were provided in [4] and [21].…”
Section: Discussion
confidence: 99%
“…These include all the variants of the classical MAB problem. For instance, resources might have to be allocated among more than one project at a time (Pandelis and Teneketzis [21]), new projects might arrive (Whittle [28]), all projects may change state (Whittle [29]), or there might be constraints linking the bandits (Denardo, Feinberg and Rothblum [10]). Studying the results for these variants in a robust setting is an avenue for future work.…”
Section: Results
confidence: 99%
“…Since V_m(s_{-1}, M) − M and V_m^1(s_1, M) − M are constants with respect to the inner optimization problem in (21) and (22) respectively and both…”
Section: Proposition 2: The Robust Gittins Index Is Given By
confidence: 99%
“…Casting the above problem into the bandit problem, the state is represented by the posterior distribution on match quality, and the allocation of the license to domestic firms is equivalent to choosing arms with uncertain payoffs. Since the government can allocate licences to more than one firm at a time, it is the multi-armed bandit problem with multiple plays (See Pandelis and Teneketzis (1995)). …”
Section: Industrial Policy
confidence: 99%