Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence 2021
DOI: 10.24963/ijcai.2021/556
Learn to Intervene: An Adaptive Learning Policy for Restless Bandits in Application to Preventive Healthcare

Abstract: In many public health settings, it is important for patients to adhere to health programs, such as taking medications and periodic health checks. Unfortunately, beneficiaries may gradually disengage from such programs, which is detrimental to their health. A concrete example of gradual disengagement has been observed by an organization that carries out a free automated call-based program for spreading preventive care information among pregnant women. Many women stop picking up calls after being enrolled for a …

Cited by 15 publications (19 citation statements)
References 6 publications
“…We depart from [10] by studying the discounted counterpart as motivated by [42] since the difference in the optimal value between the discounted and average settings is small as long as α is close to 1 [40], [41]. Recently, another line of work [43] leveraged Q-learning to approximate Whittle indices through a single-timescale SA where Q-function and Whittle indices were learned independently. [43] considered the finite-horizon MDP and cannot be directly applied to infinite-horizon discounted or average reward MDPs.…”
Section: B Q-whittle Learningmentioning
confidence: 99%
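For context on the quoted claim that the discounted and average-reward criteria nearly coincide as α approaches 1: a standard result for unichain MDPs (background, not stated in the excerpt) is that the average reward g is the limit of the normalized discounted value,

g = \lim_{\alpha \to 1^{-}} (1 - \alpha) V_{\alpha}(s) \quad \text{for every state } s,

equivalently V_{\alpha}(s) = \frac{g}{1 - \alpha} + h(s) + o(1), so for α close to 1 the two criteria rank stationary policies almost identically.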
“…Recently, another line of work [43] leveraged Q-learning to approximate Whittle indices through a single-timescale SA where Q-function and Whittle indices were learned independently. [43] considered the finite-horizon MDP and cannot be directly applied to infinite-horizon discounted or average reward MDPs. Finally, we are the first to provide a finite-time analysis of Whittle index based Q-learning, which further differentiates our work.…”
Section: B Q-whittle Learningmentioning
confidence: 99%
“…However, this is suboptimal for general RMABs since rewards are state- and action-dependent. Addressing this, Biswas et al. [5] give a Q-learning-based algorithm that acts on the arms with the largest difference between their active and passive Q values. Fu et al. [8] take a related approach that adjusts the Q values by some λ and uses it to estimate the Whittle index.…”
Section: Related Work
Mentioning, confidence: 99%
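The selection rule attributed to Biswas et al. [5] in this excerpt (act on the arms with the largest gap between active and passive Q values) can be sketched as below; the array shapes and top-k tie-breaking are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def select_arms(Q, states, k):
    # Q: (n_arms, n_states, 2) table of per-arm Q values;
    # states: current state of each arm; k: budget of arms to act on.
    idx = np.arange(len(states))
    gaps = Q[idx, states, 1] - Q[idx, states, 0]  # active minus passive
    return np.argsort(gaps)[-k:]                  # arms with the k largest gaps
```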
“…To address this shortcoming in previous work, this paper presents the first algorithms for the online setting for multi-action RMABs. Indeed, the online setting for even binary-action RMABs has received only limited attention, in the works of Fu et al. [8], Avrachenkov and Borkar [3], and Biswas et al. [5,6]. These papers adopt variants of the Q-learning update rule [29,30], a well-studied reinforcement learning algorithm, for estimating the effect of each action across the changing dynamics of the systems.…”
Section: Introduction
Mentioning, confidence: 99%
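For reference, the Q-learning update rule [29,30] that these papers adapt is, in its standard tabular form (step size and discount factor here are illustrative):

```python
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    # Standard Watkins Q-learning update: move Q[s, a] toward the
    # bootstrapped target r + gamma * max_a' Q[s', a'].
    # Q is an (n_states, n_actions) NumPy array.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```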