2020
DOI: 10.48550/arxiv.2007.12933
Preprint

Simulation Based Algorithms for Markov Decision Processes and Multi-Action Restless Bandits

Abstract: We consider multi-dimensional Markov decision processes and formulate a long-term discounted reward optimization problem. Two simulation-based algorithms, the Monte Carlo rollout policy and the parallel rollout policy, are studied, and various properties of these policies are discussed. We next consider a restless multi-armed bandit (RMAB) with a multi-dimensional state space and a multi-action bandit model. A standard RMAB has two actions for each arm, whereas in a multi-action RMAB there are more than two actions…

Cited by 2 publications (2 citation statements)
References 46 publications
“…Moreover, for the Whittle index to be well defined, the problem must satisfy a so-called "indexability" condition, which may not be met and is hard to verify in practice. Another popular approach is simulation-based (Meshram and Kaza 2020, Nakhleh et al 2021). However, the simulation-based method from Meshram and Kaza (2020) and Nakhleh et al (2021) does not provide a theoretical guarantee on performance.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…Another popular approach is simulation-based (Meshram and Kaza 2020, Nakhleh et al 2021). However, the simulation-based method from Meshram and Kaza (2020) and Nakhleh et al (2021) does not provide a theoretical guarantee on performance.…”
Section: Introduction
Citation type: mentioning
confidence: 99%