2017
DOI: 10.48550/arxiv.1707.00205
Preprint

An Asymptotically Optimal Index Policy for Finite-Horizon Restless Bandits

Abstract: We consider a restless multi-armed bandit (RMAB) problem with a finite horizon and multiple pulls per period. Leveraging the Lagrangian relaxation, we approximate the problem with a collection of single-arm problems. We then propose an index-based policy that uses the optimal solutions of the single-arm problems to index individual arms, and we prove that it is asymptotically optimal as the number of arms tends to infinity. We also use simulation to show that this index-based policy performs better than the state-of-the-art…
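To make the approach concrete, here is a minimal Python sketch of the index-policy idea the abstract describes: solve a Lagrangian-relaxed single-arm dynamic program for each arm, index each arm by how much pulling it beats staying passive, and pull the m highest-indexed arms each period. The advantage-based index rule and all names below are illustrative assumptions, not the paper's exact construction.

import numpy as np

def single_arm_q_values(P_active, P_passive, r_active, r_passive, T, lam):
    # Backward induction for one arm, charging a Lagrange multiplier `lam`
    # per pull. P_* are (S, S) transition matrices, r_* are (S,) rewards.
    # Returns Q[t, s, a] with a = 0 (passive) and a = 1 (active).
    S = r_active.shape[0]
    V = np.zeros(S)                                  # terminal value-to-go
    Q = np.zeros((T, S, 2))
    for t in range(T - 1, -1, -1):
        Q[t, :, 0] = r_passive + P_passive @ V       # stay passive
        Q[t, :, 1] = r_active - lam + P_active @ V   # pull and pay lam
        V = Q[t].max(axis=1)
    return Q

def choose_arms(states, t, Q_per_arm, m):
    # Pull the m arms whose active-vs-passive advantage at time t is largest.
    adv = np.array([Q[t, s, 1] - Q[t, s, 0] for Q, s in zip(Q_per_arm, states)])
    return np.argsort(-adv)[:m]

In the full relaxation, lam would be chosen so that the single-arm policies pull m arms per period in expectation; here it is taken as given, and ties are broken arbitrarily by the sort.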

Cited by 7 publications (25 citation statements). References 12 publications.
“…This work and its follow-ups such as (Weber and Weiss, 1990) focus on heuristics that are optimal under an asymptotic scaling where the number of pulls per period scales linearly with the total number of arms. More recently, a finite-horizon variant of the restless bandits problem was studied in (Hu and Frazier, 2017) under a similar scaling. For a survey of variations of the restless bandits problem, see (Gittins et al., 2011; Zayas-Caban et al., 2019; Brown and Smith, 2020).…”
Section: Related Literature (mentioning)
confidence: 99%
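For concreteness, the asymptotic scaling this statement refers to fixes the activation fraction while the number of arms grows. In LaTeX (the symbol \alpha for that fixed fraction is our notation):

m_N = \lceil \alpha N \rceil, \qquad \alpha \in (0, 1) \ \text{fixed}, \qquad N \to \infty,

so the per-period pull budget m_N scales linearly with the number of arms N.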
“…This is in contrast to most of the existing Whittle index-based policies, which are only well defined when the system is indexable, a condition that is hard to verify and may not hold in general. A line of work [18,19,40] has focused on designing index policies without the indexability requirement, and closest to our work is the parallel work on restless bandits [40] with known transition probabilities and reward functions. In particular, [40] explores index policies that are similar to ours, but under the assumption that the individual MDPs of the arms are homogeneous.…”
Section: The Occupancy-measured-reward Index Policy (mentioning)
confidence: 99%
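For reference, the indexability condition this statement refers to is Whittle's: as the subsidy \lambda paid for passivity increases, the set of states in which passivity is optimal must grow monotonically. In LaTeX (our notation):

\Pi(\lambda) = \{\, s : \text{passive is optimal in state } s \text{ under subsidy } \lambda \,\}, \qquad \lambda \le \lambda' \;\Rightarrow\; \Pi(\lambda) \subseteq \Pi(\lambda').

The Whittle index of a state is then the smallest \lambda that places it in \Pi(\lambda); it is this monotone-inclusion property that is hard to verify and may fail.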
“…Inspired by Whittle's work, many studies have focused on finding index policies for restless bandit problems, e.g., [17,18,19,20,21]. This line of work assumes that the system parameters are known to the decision-maker.…”
Section: Introduction (mentioning)
confidence: 99%
“…The proof techniques used by Brown and Smith (2020) and Zayas-Caban et al. (2019), however, rely heavily on the Central Limit Theorem (CLT), and do not offer a path toward showing a bound tighter than O(√N). Our work fills these two gaps: we propose a broad class of policies, called fluid-priority policies, which generalize the essential characteristics of the policies proposed by Brown and Smith (2020) and Hu and Frazier (2017). Addressing the inconsistency between simulation studies and past…”
Section: Introduction (mentioning)
confidence: 99%
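To unpack the bound discussed here (notation ours): if V^{\mathrm{opt}}_N and V^{\pi}_N denote the optimal and policy values with N arms, an O(√N) gap in total reward means the per-arm gap vanishes as N grows:

V^{\mathrm{opt}}_N - V^{\pi}_N \le C\sqrt{N} \quad \Longrightarrow \quad \frac{V^{\mathrm{opt}}_N - V^{\pi}_N}{N} = O\bigl(N^{-1/2}\bigr),

which is why tightening the bound below O(√N) would give a qualitatively stronger per-arm guarantee.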
“…As a result, there has been substantial interest (e.g., Whittle 1980, Weber and Weiss 1990, Zayas-Caban et al. 2019, Hu and Frazier 2017, Brown and Smith 2020) in developing approximate policies whose performance is provably close to optimal but that require computation that does not grow with N. However, despite substantial interest and effort focused on this regime, current understanding is limited in several important ways.…”
Section: Introduction (mentioning)
confidence: 99%