2018
DOI: 10.1109/tac.2018.2799521

On the Whittle Index for Restless Multiarmed Hidden Markov Bandits

Abstract: We consider a restless multi-armed bandit in which each arm can be in one of two states. When an arm is sampled, the state of the arm is not available to the sampler. Instead, a binary signal with a known randomness that depends on the state of the arm is available. No signal is available if the arm is not sampled. An arm-dependent reward is accrued from each sampling. In each time step, each arm changes state according to known transition probabilities which in turn depend on whether the arm is sampled or not…

Cited by 40 publications (45 citation statements)
References 33 publications
“…In [29], the MAB problem was defined with satisfying objectives so that the player aims at obtaining a reward above a certain threshold. In [30], a two states RMAB problem was defined in a hidden nature. Specifically, when an arm is sampled, the state of the arm is not fully observable.…”
Section: Results (mentioning)
confidence: 99%
“…This additional information also makes computation of Whittle-index expressions easier. In recent work of [36], [37], hidden Markov restless multi-armed bandit has been studied and Whittle-index policy is used. This model assumes that arm state is never fully observable but only binary signals corresponding to each state transition are observed.…”
Section: Literature Overview and Contributions (mentioning)
confidence: 99%
“…3) When θ_a = 0 or θ_a = 1 for all a ∈ {0, 1}, our definitions of indexability and index are still valid. To claim indexability, we will require to show that π_th(w) and π_th(w) are non-increasing in w. Now, we use the following lemma from [10]. , then π_th(w) and π_th(w) are monotonically decreasing functions of w. Now, using Lemma 2 and Definition 2, we can show that single-armed restless bandit is indexable.…”
Section: Definition (mentioning)
confidence: 99%
“…RMAB assumes that the model of system state variations is known. The Whittle index based policies also studied for opportunistic communication systems in [9], [10], where authors studied partially observable model. The Whittle index policies are popular due to they are asymptotically optimal.…”
Section: Introduction (mentioning)
confidence: 99%