Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence 2018
DOI: 10.24963/ijcai.2018/448

Cost-aware Cascading Bandits

Abstract: In this paper, we propose a cost-aware cascading bandits model, a new variant of multi-armed bandits with cascading feedback, by considering the random cost of pulling arms. In each step, the learning agent chooses an ordered list of items and examines them sequentially, until a certain stopping condition is satisfied. Our objective is then to maximize the expected net reward in each step, i.e., the reward obtained in each step minus the total cost incurred in examining the items, by deciding the ordered list of…
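The net-reward objective described in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the stopping condition or the cost distribution, so the following are assumptions made only for illustration: each item has a Bernoulli state, the agent stops at the first item whose state is 1 and then collects a reward of 1, and each examined item incurs a random cost drawn uniformly from [0, 2 × mean cost]. The names `simulate_step`, `expected_net_reward`, `success_prob`, and `mean_cost` are hypothetical and do not come from the paper.

```python
import random


def simulate_step(ordered_items, success_prob, mean_cost, rng):
    """Simulate one step and return the realized net reward.

    Assumed stopping rule: examine items in order and stop at the
    first item whose Bernoulli state is 1 (reward 1), or when the
    list is exhausted (reward 0).
    """
    total_cost = 0.0
    reward = 0.0
    for item in ordered_items:
        # Assumed cost model: uniform on [0, 2 * mean], so its mean is mean_cost[item].
        total_cost += rng.uniform(0.0, 2.0 * mean_cost[item])
        if rng.random() < success_prob[item]:
            reward = 1.0
            break
    return reward - total_cost


def expected_net_reward(ordered_items, success_prob, mean_cost):
    """Closed-form expected net reward under the same assumptions.

    Item i is examined only if every earlier item had state 0, so its
    contribution is P(reach i) * (success_prob[i] - mean_cost[i]).
    """
    reach = 1.0  # probability that the current item is examined
    total = 0.0
    for item in ordered_items:
        total += reach * (success_prob[item] - mean_cost[item])
        reach *= 1.0 - success_prob[item]
    return total


if __name__ == "__main__":
    probs = {0: 0.5, 1: 0.5, 2: 0.2}
    costs = {0: 0.1, 1: 0.1, 2: 0.3}
    rng = random.Random(0)
    order = [0, 1, 2]
    samples = [simulate_step(order, probs, costs, rng) for _ in range(20000)]
    print("empirical mean:", sum(samples) / len(samples))
    print("closed form:   ", expected_net_reward(order, probs, costs))
```

Under these assumptions, an item whose expected cost exceeds its success probability (item 2 above) lowers the expected net reward when appended to the list, and a step in which every examined item has state 0 yields a negative net reward.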

Cited by 10 publications (7 citation statements)
References 1 publication
“…Lastly, the configuration of eNCL is done online, via iterations with UE handovers. This naturally poses as a bandit problem, therefore we leverage the cost-aware cascading bandits [26] in the algorithm design. However, unlike the simple cascading bandits setting in [26], UDN based on small BSs (such as femto and pico), often encounter delay or missing UE measurements, based on which we have modeled the features of reward and cost on the proposed algorithm.…”
Section: Contributions and Structure (mentioning)
confidence: 99%
“…To achieve more accurate evaluation of BSs, we take both channel quality and cell capacity of BSs into consideration in NCL optimization. In order to explicitly enforce the constraint of delay in handover preparation phase, we have modified the cost-aware cascading bandits framework proposed in [26] and modelled the number of scanned BSs as a random cost.…”
Section: B Cost-aware Cascading Bandits (mentioning)
confidence: 99%
“…The proposed MAB with pre-observations in a single-player setting is a variant on cascading bandits (Kveton et al 2015a; 2015b; Zong et al 2016). The idea of pre-observations with costs is similar to the cost-aware cascading bandits proposed in (Zhou et al 2018) and contextual combinatorial cascading bandits introduced in (Li et al 2016). However, in (Zhou et al 2018), the reward collected by the player can be negative if all selected arms have zero reward in one round; in our model, the player will get zero reward if all selected arms are unavailable.…”
Section: Related Work (mentioning)
confidence: 99%
“…The idea of pre-observations with costs is similar to the cost-aware cascading bandits proposed in (Zhou et al 2018) and contextual combinatorial cascading bandits introduced in (Li et al 2016). However, in (Zhou et al 2018), the reward collected by the player can be negative if all selected arms have zero reward in one round; in our model, the player will get zero reward if all selected arms are unavailable. Moreover, most cascading bandit algorithms are applied to recommendation systems, where there is only a single player.…”
Section: Related Work (mentioning)
confidence: 99%