1980
DOI: 10.1111/j.2517-6161.1980.tb01114.x
A Generalized Bandit Problem

Abstract: A multi‐armed bandit problem is investigated in which rewards obtained from pulls of any arm depend on the states of the other arms, as well as on the state of the arm pulled. A Dynamic Allocation Index is defined for this class of problems, and it is shown that this leads to optimal policies.
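The abstract describes an index policy: at each step, pull the arm whose index is currently highest. Computing the Dynamic Allocation Index itself requires solving an optimal-stopping problem, so the sketch below uses the UCB1 index as a computable stand-in, not the paper's index; the arm probabilities, horizon, and seed are illustrative assumptions.

```python
import math
import random

def ucb_index(successes, pulls, total_pulls):
    """UCB1 index: a computable stand-in for a Gittins-style
    Dynamic Allocation Index (illustrative only)."""
    if pulls == 0:
        return float("inf")  # force each arm to be tried at least once
    mean = successes / pulls
    return mean + math.sqrt(2.0 * math.log(total_pulls) / pulls)

def run_bandit(arm_probs, horizon, seed=0):
    """At every step pull the arm with the highest index;
    return total reward and per-arm pull counts."""
    rng = random.Random(seed)
    n = len(arm_probs)
    successes = [0] * n
    pulls = [0] * n
    total_reward = 0
    for t in range(1, horizon + 1):
        indices = [ucb_index(successes[i], pulls[i], t) for i in range(n)]
        arm = max(range(n), key=lambda i: indices[i])
        reward = 1 if rng.random() < arm_probs[arm] else 0
        successes[arm] += reward
        pulls[arm] += 1
        total_reward += reward
    return total_reward, pulls

reward, pulls = run_bandit([0.2, 0.5, 0.8], horizon=2000)
```

Over a long horizon the policy concentrates its pulls on the best arm while still sampling the others; the generalized problem of the paper differs in that each arm's reward may also depend on the states of the arms not pulled.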

Cited by 27 publications (33 citation statements). References 4 publications.
“…Continuous and intermittent modes are equally effective if equality holds. This can be shown by examining the derivative on RQ of (12), and is exactly the condition (11) found for the deterministic model.…”
Section: Model 2a: Unlocated SAM/Reactive Arm (supporting)
confidence: 57%
“…Glazebrook, Gaver, and Jacobs [5] show that Red's problem of maximizing the expected number of Blues to be killed before he is himself eliminated may be modeled as a generalized bandit problem once the radar levels have been established. From Nash [11], an index policy is optimal. See Glazebrook, Gaver, and Jacobs [5] for details of the derivation of the index in (30) and (31).…”
Section: Comments (mentioning)
confidence: 99%
“…This reward is increased to R_i(G) when a retirement reward b_G is available as an alternative to action a_{iN_i}. Glazebrook and Fay (1987) were able to show, using work on generalized bandit problems due to Nash (1980), that the optimal strategy for MDP i augmented by retirement option b_G, when MDP i is still in the first phase of research, is determined by (23). The equivalent index for b_G is simply G. In general, then, this optimal strategy embodies preferences among tasks ij, 1 ≤ j ≤ N_i − 1, of a kind which depend upon G (see (23)), and Condition W will not usually be satisfied. However, it is clear from (23) that condition (24) removes this G-dependence.…”
Section: Scheduling Alternative Stochastic Tasks (mentioning)
confidence: 99%
“…All of the problems we discuss here have the feature that optimal strategies can be determined by collections of ranking indices. The theoretical foundation of the results presented is due to Gittins (1979), Nash (1980) and Whittle (1980) and the reader will gain an enhanced understanding of the material by consulting those publications. Gittins (1979) described a large class of Markov decision processes in which optimal strategies are determined by attaching to each admissible action a ranking index.…”
Section: Strategies Determined By Gittins Indices (mentioning)
confidence: 99%
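The ranking-index idea in the excerpt above has a simple deterministic analogue: Smith's rule for single-machine scheduling, under which serving jobs in decreasing order of the index weight/processing-time minimizes total weighted completion time. The sketch below illustrates index ordering in general, not the specific indices derived in the cited papers; the job instance is hypothetical.

```python
from itertools import permutations

def index_schedule(jobs):
    """Order (time, weight) jobs by the ranking index weight/time,
    largest first (Smith's rule)."""
    return sorted(jobs, key=lambda j: j[1] / j[0], reverse=True)

def weighted_completion_time(order):
    """Total weighted completion time of serving jobs in the given order."""
    t = 0
    total = 0
    for time, weight in order:
        t += time
        total += weight * t
    return total

# Hypothetical instance: (processing time, weight) pairs.
jobs = [(3, 1), (1, 4), (2, 2), (5, 3)]
cost = weighted_completion_time(index_schedule(jobs))

# Brute-force check against all orderings on this small instance.
best = min(weighted_completion_time(p) for p in permutations(jobs))
```

On this instance the index order is (1, 4), (2, 2), (5, 3), (3, 1), and its cost matches the brute-force optimum, the pattern the stochastic index theorems of Gittins (1979), Nash (1980) and Whittle (1980) extend to Markov decision processes.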
“…Most of the literature of stochastic scheduling—see, for example, Bruno and Hofri (1975), Gittins (1979), Glazebrook (1980), (1982), (1987), Nash (1980) and Whittle (1980)—discusses problems in which the aim is to process all of the currently available jobs and to do it as economically as possible. We wish to develop that theory to incorporate the notion of alternative tasks mentioned above.…”
Section: Introduction (mentioning)
confidence: 99%