2022
DOI: 10.1109/tai.2021.3117743

Delayed Reward Bernoulli Bandits: Optimal Policy and Predictive Meta-Algorithm PARDI

Abstract: Bernoulli multi-armed bandits are a reinforcement learning model used to optimize sequences of decisions with binary outcomes. Well-known bandit algorithms, including the optimal policy, assume that the outcomes of all previous decisions are known before the next decision is made. This assumption is often not satisfied in real-life scenarios. As demonstrated in this article, if decision outcomes are affected by delays, the performance of existing algorithms can degrade severely. We present the first practically a…
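To make the delayed-feedback setting concrete, the sketch below runs standard Beta-Bernoulli Thompson sampling in which each outcome only becomes visible a fixed number of steps after the pull. It is an illustration of the problem the abstract describes, not the article's optimal policy or its PARDI meta-algorithm; the function name `thompson_delayed`, the fixed-delay model, and the uniform Beta(1, 1) priors are all assumptions made for this example.

```python
import random
from collections import deque


def thompson_delayed(probs, horizon, delay, seed=0):
    """Beta-Bernoulli Thompson sampling under a hypothetical fixed delay.

    The outcome of the pull made at step t becomes visible at the start of
    step t + 1 + delay, so delay=0 is the usual no-delay bandit setting.
    Returns the total (true) reward collected over `horizon` steps.
    """
    rng = random.Random(seed)
    k = len(probs)
    successes = [0] * k   # observed successes per arm
    failures = [0] * k    # observed failures per arm
    pending = deque()     # (reveal_step, arm, reward), ordered by reveal_step
    total_reward = 0

    for t in range(horizon):
        # Reveal every outcome whose delay has elapsed before deciding.
        while pending and pending[0][0] <= t:
            _, arm, reward = pending.popleft()
            if reward:
                successes[arm] += 1
            else:
                failures[arm] += 1

        # Draw from each arm's Beta(1 + s, 1 + f) posterior and play the argmax.
        samples = [rng.betavariate(1 + successes[a], 1 + failures[a])
                   for a in range(k)]
        arm = max(range(k), key=samples.__getitem__)

        # The reward is earned now but only observed later.
        reward = 1 if rng.random() < probs[arm] else 0
        total_reward += reward
        pending.append((t + 1 + delay, arm, reward))

    return total_reward
```

For example, comparing `thompson_delayed([0.3, 0.6], 1000, 0)` with `thompson_delayed([0.3, 0.6], 1000, 200)` shows the effect the abstract points to: while feedback is outstanding, the posterior cannot move, so the policy keeps exploring the worse arm for longer. In the extreme where the delay exceeds the horizon, no outcome is ever observed and the policy never improves on uniform random choice.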

Citations: cited by 4 publications (2 citation statements)
References: 27 publications (63 reference statements)
“…Williamson et al [64] identify the existing gap between the theory and clinical practice for Bayesian response-adaptive procedures by considering whether responses are intermediately available or delayed, respectively. However, these findings may not apply to the missing data problem in the context of response-adaptive designs, since missing values can be viewed as a very extreme form of delay where the outcomes would never be available [43]. In other words, the problem of missing data is distinct and has not received as much attention as the problem of delayed outcomes.…”
Section: Introduction (mentioning)
Confidence: 94%
“…Williamson et al [ 29 ] identify the existing gap between the theory and clinical practice for Bayesian response-adaptive procedures by considering whether responses are intermediately available or delayed, respectively. However, these findings may not apply to the missing data problem in the context of response-adaptive designs, since missing values can be viewed as a very extreme form of delay where the outcomes would never be available [ 30 ]. In other words, the problem of missing data is distinct and has not received as much attention as the problem of delayed outcomes.…”
Section: Introduction (mentioning)
Confidence: 99%