2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton) 2011
DOI: 10.1109/allerton.2011.6120191

On a restless multi-armed bandit problem with non-identical arms

Abstract: We consider the following learning problem motivated by opportunistic spectrum access in cognitive radio networks. There are N independent Gilbert-Elliott channels with possibly non-identical transition matrices. It is desired to have an online policy to maximize the long-term expected discounted reward from accessing one channel at each time dynamically. While there is a stream of recent results on this problem when the channels are identical, much less is known for the harder case of non-identical channels. …
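To make the setting in the abstract concrete, here is a minimal simulation sketch. It is not the policy studied in the paper: it uses a simple myopic rule (access the channel currently believed most likely to be good) purely as an illustrative baseline, and it assumes the transition matrices are known. The names `p01`, `p11`, `beta`, and `simulate_myopic` are assumptions introduced for this example; `p01[i]` and `p11[i]` denote the probability that channel `i` is good in the next slot given that it is currently bad or good, respectively, and `beta` is the discount factor.

```python
import random


def belief_update(omega, p01, p11):
    """Propagate the belief that an unobserved two-state channel is good by one slot."""
    return omega * p11 + (1.0 - omega) * p01


def simulate_myopic(p01, p11, beta=0.9, horizon=200, seed=0):
    """Discounted reward of a myopic policy on N non-identical Gilbert-Elliott channels.

    Illustrative baseline only (an assumption of this sketch, not the paper's policy).
    Reward is 1 when the accessed channel is in the good state, 0 otherwise.
    """
    rng = random.Random(seed)
    n = len(p01)
    # Stationary probability of the good state: p01 / (p01 + p10), with p10 = 1 - p11.
    stationary = [p01[i] / (p01[i] + 1.0 - p11[i]) for i in range(n)]
    state = [1 if rng.random() < stationary[i] else 0 for i in range(n)]
    belief = list(stationary)
    total = 0.0
    for t in range(horizon):
        # Access the single channel with the highest current belief.
        k = max(range(n), key=lambda i: belief[i])
        reward = state[k]
        total += (beta ** t) * reward
        # Beliefs for the next slot: exact for the observed channel,
        # one-step propagation for every unobserved channel.
        for i in range(n):
            if i == k:
                belief[i] = p11[i] if reward == 1 else p01[i]
            else:
                belief[i] = belief_update(belief[i], p01[i], p11[i])
        # Each channel evolves independently according to its own transition matrix.
        for i in range(n):
            p_good = p11[i] if state[i] == 1 else p01[i]
            state[i] = 1 if rng.random() < p_good else 0
    return total


if __name__ == "__main__":
    # Three channels with different (hypothetical) transition probabilities.
    print(simulate_myopic(p01=[0.2, 0.4, 0.1], p11=[0.8, 0.6, 0.9]))
```

A single run gives one sample of the discounted reward; averaging over many seeds would approximate the expected value that the abstract refers to.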

Cited by 6 publications (7 citation statements)
References 16 publications
“…Knowing that this problem has a 0 or 2 threshold structure reduces the problem of identifying optimal performance to finding the (only up to 2) threshold parameters. In settings where the underlying state transition matrices are unknown, this could be exploited by using a multiarmed bandit (MAB) formulation to find the best possible thresholds (similar to the ideas in the papers [9] and [10]). Also, we would like to investigate the case of non-identical channels, and derive useful results for more than 2 channels, possibly in the form of computing the Whittle index [17], if computing the optimal policy in general turns out to be intractable.…”
Section: Discussion (mentioning)
confidence: 99%
“…While similar in spirit to these two studies, our work addresses a more challenging setting involving two independent channels. A more related two-channel problem is studied in [10], which characterizes the optimal policy to opportunistically access two non-identical Gilbert-Elliott channels (generalizing the prior work on sensing policies for identical channels [6], [7]). While we address only identical channels in this work, the strategy space explored here is richer because in our formulation of power allocation, it is possible to use both channels simultaneously whilst in [6], [7], [10] only one channel is accessed in each time slot.…”
Section: Introduction (mentioning)
confidence: 99%
“…Further, we would like to find a closed form expression for the boundary of action region. Also, we would like to investigate the case of non-identical channels like [16], or derive useful results for more than 2 channels.…”
Section: Discussion (mentioning)
confidence: 99%
“…We specifically simulate links following the Gilbert-Elliot (GE) model [55], [56]. The GE model, consisting of two states (see Fig.…”
Section: Appendix C, Correlated Link Losses (mentioning)
confidence: 99%