2020
DOI: 10.48550/arxiv.2012.07048
Preprint
Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback

Abstract: We study the multi-armed bandit (MAB) problem with composite and anonymous feedback. In this model, the reward of pulling an arm spreads over a period of time (we call this period the reward interval) and the player successively receives partial rewards of the action, convoluted with rewards from pulling other arms. Existing results on this model require prior knowledge about the reward interval size as an input to their algorithms. In this paper, we propose adaptive algorithms for both the stochastic and the …
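To make the feedback model in the abstract concrete, below is a minimal simulation sketch. It is not taken from the paper: the class name, the assumption that each pull's reward is spread uniformly over the reward interval, and the Bernoulli arm rewards are all illustrative choices. The point it shows is that the player only observes, at each round, the aggregate of whatever partial rewards arrive then, without knowing which past pulls produced them.

import numpy as np

# Illustrative sketch of composite and anonymous feedback (assumptions noted above).
class CompositeAnonymousBandit:
    def __init__(self, arm_means, interval_size, rng=None):
        self.arm_means = np.asarray(arm_means, dtype=float)  # mean reward of each arm
        self.d = interval_size                                # reward interval length d (assumed known here)
        self.rng = rng or np.random.default_rng(0)
        self.pending = np.zeros(self.d)                       # partial rewards not yet observed

    def pull(self, arm):
        # Draw the total reward of this pull and spread it uniformly over the next d rounds.
        reward = self.rng.binomial(1, self.arm_means[arm])
        self.pending += reward / self.d
        # The player observes only the aggregate arriving this round (anonymous feedback),
        # which mixes partial rewards from this pull and from earlier pulls.
        observed = self.pending[0]
        self.pending = np.roll(self.pending, -1)
        self.pending[-1] = 0.0
        return observed

For example, CompositeAnonymousBandit([0.3, 0.7], interval_size=5) returns, on each pull, a convolution of partial rewards from the last five pulls rather than the reward of the arm just chosen; the algorithms discussed in the paper must cope with exactly this mismatch, and additionally without knowing interval_size in advance.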

Cited by 1 publication (1 citation statement)
References 8 publications
“…Then, we present our Adaptive Round-Size EXP3 (ARS-EXP3) algorithm for the non-oblivious case and state its regret upper bound. Similarly, a proof sketch is provided, and the complete proofs are referred to the supplementary file (Wang, Wang, and Huang 2020).…”
Section: Non-oblivious Adversarial MAB with Composite and Anonymous R…
mentioning, confidence: 99%
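The citation statement above refers to the authors' Adaptive Round-Size EXP3 (ARS-EXP3) algorithm. As a point of reference only, the following is a minimal sketch of the standard EXP3 update that ARS-EXP3 builds on; the adaptive round-size mechanism itself is described in the cited paper and is not reproduced here, and the function and parameter names are illustrative.

import numpy as np

def exp3(n_arms, T, reward_fn, eta=0.1, rng=None):
    """Standard EXP3: exponential weights with importance-weighted reward estimates."""
    rng = rng or np.random.default_rng(0)
    weights = np.ones(n_arms)
    for t in range(T):
        probs = weights / weights.sum()
        arm = rng.choice(n_arms, p=probs)
        reward = reward_fn(arm)                 # observed reward in [0, 1]
        estimate = reward / probs[arm]          # importance-weighted estimate of the arm's reward
        weights[arm] *= np.exp(eta * estimate)
        weights /= weights.max()                # rescale to avoid numerical overflow
    return weights / weights.sum()

Plugging the composite-anonymous environment sketched earlier into this vanilla EXP3 (e.g. exp3(2, 1000, env.pull)) would treat each observed aggregate as if it were the instantaneous reward of the arm just pulled, which is precisely the mismatch that composite and anonymous feedback creates and that the round-based adaptive algorithms in the cited work are designed to handle.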