2019
DOI: 10.48550/arxiv.1905.03125
Preprint

Batch-Size Independent Regret Bounds for the Combinatorial Multi-Armed Bandit Problem

Abstract: We consider the combinatorial multi-armed bandit (CMAB) problem, where the reward function is nonlinear. In this setting, the agent chooses a batch of arms on each round and receives feedback from each arm of the batch. The reward that the agent aims to maximize is a function of the selected arms and their expectations. In many applications, the reward function is highly nonlinear, and the performance of existing algorithms relies on a global Lipschitz constant to encapsulate the function's nonlinearity. This …
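As a rough illustration of this setting (a sketch, not the paper's algorithm), the snippet below plays one CMAB round: the agent picks a batch of arms, observes per-arm semi-bandit feedback, and the reward is a nonlinear function of the selected arms' means. The batch size, arm means, and coverage-style reward function are all hypothetical choices for concreteness.

```python
import numpy as np

# Illustrative sketch of one CMAB round with semi-bandit feedback.
# All concrete values below are hypothetical, not the paper's construction.

rng = np.random.default_rng(0)
K, m = 10, 3                       # K base arms, batches of size m
mu = rng.uniform(0.1, 0.9, K)      # unknown Bernoulli arm means

def reward(p):
    """Example nonlinear reward: probability that at least one
    selected arm fires (a coverage-style objective)."""
    return 1.0 - np.prod(1.0 - p)

batch = rng.choice(K, size=m, replace=False)  # agent picks a batch of arms
feedback = rng.random(m) < mu[batch]          # semi-bandit: one sample per chosen arm
print(feedback, reward(mu[batch]))            # expected reward depends on the means
```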

Cited by 3 publications (4 citation statements) | References 13 publications
“…Recently, there have been interesting results under Gini-weighted smoothness assumptions (Merlis & Mannor, 2019; 2020). Compared with the general Lipschitz smoothness considered in this work, this is a more refined smoothness assumption, which leads to near-optimal regret bounds with less dependence on the dimension K. Directly applying our algorithms to this setting would lead to an additional dependence on K. How to remove this additional price for privacy preservation, and how to prove the corresponding lower bounds, are interesting problems for future work.…”
Section: Discussion
confidence: 94%
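For intuition, and only as a paraphrase from memory rather than the exact definition in Merlis & Mannor (2019), Gini-weighted smoothness can be thought of as bounding a gradient norm weighted by the Bernoulli variances p_k(1−p_k), in place of a single global Lipschitz constant M:

\[
\|\nabla f(p)\|_\infty \le M \quad \text{(global Lipschitz)}
\qquad \text{vs.} \qquad
\sum_{k=1}^{K} p_k\,(1-p_k)\left(\frac{\partial f(p)}{\partial p_k}\right)^2 \le \gamma_g^2 \quad \text{(Gini-weighted, paraphrased)}
\]

Under this reading, arms whose means sit near 0 or 1 have small Bernoulli variance and contribute little to the weighted norm, which is what permits regret bounds with less dependence on K than a worst-case Lipschitz analysis.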
“…We will now review the combinatorial action space literature in multi-armed bandit (MAB) problems. Most of the work in this space deals with semi-bandit feedback (Chen et al., 2016; Combes et al., 2015; Kveton et al., 2015; Merlis and Mannor, 2019). This is also our feedback model, but we work in a contextual setting.…”
Section: Related Work
confidence: 99%
“…There is also work in the full-bandit feedback setting, where one gets to observe only one representative reward for the whole set of arms chosen. This body of literature can be divided into the adversarial setting (Merlis and Mannor, 2019; Cesa-Bianchi and Lugosi, 2012) and the stochastic setting (Agarwal and Aggarwal, 2018; Lin et al., 2014; Rejwan and Mansour, 2020).…”
Section: Related Work
confidence: 99%
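To make the feedback-model distinction concrete (a hypothetical sketch, not any cited paper's protocol): under semi-bandit feedback the learner observes one outcome per chosen arm, while under full-bandit feedback it observes only a single aggregate reward for the whole batch.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([0.2, 0.5, 0.8, 0.4])   # hypothetical Bernoulli arm means
batch = np.array([0, 2, 3])           # arms chosen this round

outcomes = rng.random(batch.size) < mu[batch]

semi_bandit_feedback = outcomes        # one observation per chosen arm
full_bandit_feedback = outcomes.max()  # only one aggregate reward (e.g., "did any arm fire")
```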
“…The combinatorial bandit is a classical problem in machine learning and has been extensively studied under the settings of stochastic semi-bandits (Chen et al., 2013; Combes et al., 2015; Kveton et al., 2015; Chen et al., 2016a; Merlis & Mannor, 2019), stochastic bandits (Agarwal & Aggarwal, 2018; Rejwan & Mansour, 2020; Kuroki et al., 2020), and adversarial linear bandits (Cesa-Bianchi & Lugosi, 2012; Bubeck et al., 2012; Audibert et al., 2014; Combes et al., 2015). In the above-mentioned works, either the reward link function g(·) is linear, or the model is stochastic (stationary).…”
Section: Other Related Work
confidence: 99%