2015 54th IEEE Conference on Decision and Control (CDC) 2015
DOI: 10.1109/cdc.2015.7403456

A restless bandit with no observable states for recommendation systems and communication link scheduling

Cited by 13 publications (8 citation statements)
References 15 publications
“…A DM perceives one of a finite number of messages. Assume that $O = \{1, 2, 3, \dots, K\}$ represents the set of messages. The message $k \in O$ is observed with known probability from state $j$ under action $a$ for system $i$; this probability is denoted by $q^{a}_{i,jk} = \Pr(k \mid s_{t,i} = j,\ a_{t,i} = a)$.…”
Section: Model Description (mentioning)
confidence: 99%
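The observation model in the statement above can be illustrated with a minimal Python sketch. It is not taken from the cited papers: the function name `sample_message`, the nested-dictionary layout of `q`, and all probability values are hypothetical, chosen only to show how a message $k$ is drawn with probability $q^{a}_{i,jk}$ for a single arm $i$ given its hidden state $j$ and action $a$.

```python
import random


def sample_message(q, state, action, rng):
    """Draw a message k in {1, ..., K} with probability q[action][state][k - 1].

    q is a hypothetical layout for one arm: q[a][j] is the list
    [q^a_{j1}, ..., q^a_{jK}] of message probabilities.
    """
    probs = q[action][state]
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("message probabilities must sum to 1")
    messages = range(1, len(probs) + 1)
    return rng.choices(messages, weights=probs, k=1)[0]


# Assumed numbers for one arm with 2 states, 2 actions, and K = 3 messages.
# Action 0 (passive) always yields the uninformative message 1;
# action 1 (active) yields message 2 or 3 with state-dependent probabilities.
q = {
    0: {0: [1.0, 0.0, 0.0], 1: [1.0, 0.0, 0.0]},
    1: {0: [0.0, 0.7, 0.3], 1: [0.0, 0.2, 0.8]},
}

rng = random.Random(0)
passive = [sample_message(q, state=1, action=0, rng=rng) for _ in range(5)]
active = [sample_message(q, state=1, action=1, rng=rng) for _ in range(5)]
print(passive)  # every entry is message 1
print(active)   # entries are drawn from messages 2 and 3
```

The decision maker never sees `state` directly; in this sketch it would only receive the returned message, which is what makes the states partially observable.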
“…Restless multi-armed bandits with partially observable states have recently found applications in online recommendation systems [1], opportunistic communication systems [2]-[4], machine maintenance [5], and age of information [6]. Restless multi-armed bandits (RMABs) are a class of sequential decision problems with multiple independent Markov processes that are coupled via the number of independent processes that are activated simultaneously [7].…”
Section: Introduction (mentioning)
confidence: 99%
“…RMABs have been used for various applications across domains. Some specific applications include recommendation systems [29], [30], sensor scheduling and target detection [31], multi-UAV routing for observing targets [32], and stochastic network optimization [24]. Most models assume instantaneous feedback, and their main interest is to study the Whittle index or myopic policy.…”
Section: Literature Overview and Contributions (mentioning)
confidence: 99%
“…Further, the author proposed a heuristic index-based policy, referred to as the Whittle index policy. In [19,22], we have considered a general system of a restless multi-armed bandit with unobservable states and action-dependent transitions. In [22], we show that such a system is approximately Whittle-indexable.…”
Section: Related Literature (mentioning)
confidence: 99%