2021
DOI: 10.1609/aaai.v35i11.17224

Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback

Abstract: We study the multi-armed bandit (MAB) problem with composite and anonymous feedback. In this model, the reward of pulling an arm spreads over a period of time (we call this period the reward interval), and the player successively receives partial rewards of the action, convoluted with rewards from pulling other arms. Existing results on this model require prior knowledge of the reward interval size as an input to their algorithms. In this paper, we propose adaptive algorithms for both the stochastic and the …
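To make the feedback model concrete, below is a minimal simulation sketch, assuming a fixed reward-interval size d: the reward of each pull is split across the next d rounds, and at each round the player observes only the anonymous sum of all parts landing on that round. The class name CompositeAnonymousBandit, the uniform split of the reward, and the Bernoulli arm means are illustrative assumptions, not the paper's algorithm or notation.

```python
import random

class CompositeAnonymousBandit:
    """Toy simulator for MAB with composite and anonymous feedback.

    Assumption (not from the paper): each pull's reward is spread
    uniformly over the next d rounds; the player observes only the
    per-round sum, with no information about which pull produced it.
    """

    def __init__(self, arm_means, d, horizon):
        self.arm_means = arm_means            # expected reward of each arm
        self.d = d                            # reward-interval size
        self.pending = [0.0] * (horizon + d)  # future per-round payouts
        self.t = 0

    def pull(self, arm):
        # Draw the (hidden) reward of this pull.
        reward = 1.0 if random.random() < self.arm_means[arm] else 0.0
        # Spread it uniformly over the next d rounds (an assumption;
        # the model allows other splits within the reward interval).
        for k in range(self.d):
            self.pending[self.t + k] += reward / self.d
        # The player sees only the anonymous aggregate for round t:
        # contributions from this and earlier pulls are mixed together.
        observed = self.pending[self.t]
        self.t += 1
        return observed

# Example: the observed feedback at each round mixes several past pulls.
bandit = CompositeAnonymousBandit(arm_means=[0.2, 0.8], d=3, horizon=10)
for t in range(10):
    obs = bandit.pull(arm=t % 2)
    print(f"round {t}: observed {obs:.2f}")
```

The key difficulty the paper addresses is visible here: a learner who does not know d in advance cannot directly attribute the observed sums to individual pulls, which is why the proposed algorithms must adapt to the reward-interval size.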

Cited by 3 publications (2 citation statements) · References 15 publications
“…Cesa-Bianchi et al (2018) generalized this setting to a case where the reward generated by an action is not simply revealed to the agent at a single instant in the future, but rather spreads over multiple rounds. Recent work along this line is also found in Garg and Akash (2019), Zhang et al (2022), and Wang et al (2021).…”
Section: Related Work (mentioning)
Confidence: 68%
“…Cesa-Bianchi et al [2018] generalized this setting to a case where the reward generated by an action is not simply revealed to the agent at a single instant in the future, but rather spreads over multiple rounds. Recent work along this line is also found in Garg and Akash [2019], Zhang et al [2022], and Wang et al [2021]. In this paper, we consider a contextual setting, which is different from the above ones and poses new challenges since each arm no longer has a fixed reward distribution.…”
Section: Related Work (mentioning)
Confidence: 97%