2021
DOI: 10.48550/arxiv.2109.09855
Preprint

Reinforcement Learning for Finite-Horizon Restless Multi-Armed Multi-Action Bandits

Abstract: We study a finite-horizon restless multi-armed bandit problem with multiple actions, dubbed R(MA)^2B. The state of each arm evolves according to a controlled Markov decision process (MDP), and the reward of pulling an arm depends on both the current state of the corresponding MDP and the action taken. The goal is to sequentially choose actions for arms so as to maximize the expected value of the cumulative rewards collected. Since finding the optimal policy is typically intractable, we propose a computational…
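
The problem setting in the abstract can be illustrated with a minimal simulation sketch. This is not the authors' method: it only rolls out one finite-horizon episode in which every arm is a small controlled MDP and the reward depends on the arm's state and the chosen action. The names `simulate` and `always`, the toy transition and reward tables, and the fixed-action policy are all hypothetical, and the sketch assumes nothing about how actions are coupled across arms, since the truncated abstract does not say.

```python
import random

def simulate(transitions, rewards, policy, horizon, init_states, rng):
    """Roll out one episode and return the cumulative reward.

    transitions[n][a][s] is the list of next-state probabilities for arm n
    when action a is taken in state s; rewards[n][s][a] is the reward for
    taking action a in state s on arm n. (Hypothetical layout.)
    """
    states = list(init_states)
    total = 0.0
    for t in range(horizon):
        actions = policy(t, states)  # one action per arm
        for n, (s, a) in enumerate(zip(states, actions)):
            total += rewards[n][s][a]
            probs = transitions[n][a][s]
            # Every arm transitions at every step, whatever action it receives.
            states[n] = rng.choices(range(len(probs)), weights=probs)[0]
    return total

def always(action, num_arms):
    """Hypothetical baseline policy: apply the same action to every arm."""
    return lambda t, states: [action] * num_arms

# Toy instance (assumed values): 2 identical arms, 2 states, 3 actions.
arm_transitions = [
    [[0.9, 0.1], [0.4, 0.6]],  # action 0
    [[0.5, 0.5], [0.2, 0.8]],  # action 1
    [[0.1, 0.9], [0.1, 0.9]],  # action 2
]
arm_rewards = [[0.0, 0.5, 2.0], [0.0, 1.0, 2.0]]  # arm_rewards[s][a]
transitions = [arm_transitions, arm_transitions]
rewards = [arm_rewards, arm_rewards]

rng = random.Random(0)
total = simulate(transitions, rewards, always(1, 2), 5, [0, 0], rng)
```

Averaging `simulate` over many seeded rollouts gives a Monte Carlo estimate of the expected cumulative reward of a given policy. The joint state space grows exponentially with the number of arms, which is one reason exact optimization over all policies becomes impractical.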

Cited by 0 publications
References 37 publications