Proceedings of the Web Conference 2020
DOI: 10.1145/3366423.3380115

Hierarchical Adaptive Contextual Bandits for Resource Constraint based Recommendation

Abstract: Contextual multi-armed bandit (MAB) achieves cutting-edge performance on a variety of problems. When it comes to real-world scenarios such as recommendation systems and online advertising, however, it is essential to consider the resource consumption of exploration. In practice, there is typically a non-zero cost associated with executing a recommendation (arm) in the environment, and hence, the policy should be learned with a fixed exploration cost constraint. It is challenging to learn a global optimal policy d…
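To make the setting concrete, the following is a minimal sketch, not the HATCH algorithm proposed in the paper: a LinUCB-style contextual bandit in which every pull consumes part of a fixed exploration budget, and the optimistic exploration bonus is dropped once the remaining budget no longer covers an arm's cost. The class name, per-arm costs, feature dimension, and the alpha parameter are assumptions made purely for illustration.

```python
# Minimal sketch of contextual bandit exploration under a fixed cost budget.
# NOT the paper's HATCH algorithm; it only illustrates the problem setting:
# each pull of an arm consumes budget, and once the exploration budget is
# spent the policy must rely on what it has already learned.
import numpy as np

class BudgetedLinUCB:
    def __init__(self, n_arms, dim, alpha=1.0, budget=100.0):
        self.alpha = alpha                                # width of the exploration bonus
        self.budget = budget                              # total exploration budget (assumed)
        self.A = [np.eye(dim) for _ in range(n_arms)]     # per-arm Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]   # per-arm reward vectors

    def select(self, context, costs):
        """Pick an arm for the given context; costs[a] is the cost of pulling arm a."""
        scores = []
        for a in range(len(self.A)):
            A_inv = np.linalg.inv(self.A[a])
            theta = A_inv @ self.b[a]                     # ridge-regression estimate
            mean = theta @ context
            bonus = self.alpha * np.sqrt(context @ A_inv @ context)
            # Add the optimistic bonus only while the budget still covers this arm.
            scores.append(mean + bonus if self.budget > costs[a] else mean)
        return int(np.argmax(scores))

    def update(self, arm, context, reward, cost):
        self.budget -= cost                               # exploration consumes the budget
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context
```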

Cited by 12 publications (7 citation statements)
References 16 publications
“…They adopt a generative process based on a topic model to explicitly formulate arm dependencies as clusters on arms, where dependent arms are assumed to be generated from the same cluster. Yang et al. [197] consider situations where there are exploration overheads, i.e., non-zero costs associated with executing a recommendation (arm) in the environment, and hence, the policy should be learned with a fixed exploration cost constraint. They propose a hierarchical learning structure to address the problem.…”
Section: Recommendation via MAB-based Methods
Citation type: mentioning (confidence: 99%)
“…Linear UCB considering item features [92]; considering diversity of recommendation [137,103,40]; cascading bandits providing reliable negative samples [84,230]; combining offline data and online bandit signals [145]; considering pseudo-rewards for arms without feedback [30]; considering dependency among arms [180]; considering exploration overheads [197]…”
Section: MAB in IRSs
Citation type: mentioning (confidence: 99%)
“…The multi-armed bandit (MAB) problem is a typical sequential decision-making process that is also treated as an online decision-making problem [32]. A wide range of real-world applications can be modeled as MAB problems, such as online recommendation systems [16], online advertising [27], and information retrieval [15].…”
Section: Multi-armed Bandit Methods
Citation type: mentioning (confidence: 99%)
“…For example, ConUCB [40] introduces conversations between the agent and users, occasionally asking whether the user is interested in a certain topic. HATCH [39] considers the resource consumption of exploration and proposes a strategy to conduct bandit exploration under a budget limitation. S-MAB [6] considers two aspects: one is to maximize the cumulative reward, and the other is to decide how many arms to pull so as to reduce the exploration cost.…”
Section: Contextual Bandit Algorithms
Citation type: mentioning (confidence: 99%)
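As a rough illustration of the budget-limited exploration idea discussed in the statement above (a generic sketch with assumed names and parameters, not the actual allocation strategies of HATCH or S-MAB), exploration can be gated on whether the remaining budget, spread over the remaining rounds, still covers the per-pull cost:

```python
# Generic budget-aware exploration sketch (an assumption for illustration,
# not the allocation strategy of HATCH or S-MAB): explore only while the
# remaining budget per remaining round covers the per-pull cost.
import random

def choose_arm(estimates, budget, rounds_left, pull_cost=1.0, eps=0.1):
    """estimates: list of mean-reward estimates per arm; returns an arm index."""
    can_afford_exploration = budget / max(rounds_left, 1) >= pull_cost
    if can_afford_exploration and random.random() < eps:
        return random.randrange(len(estimates))                        # explore
    return max(range(len(estimates)), key=lambda a: estimates[a])      # exploit
```

The paper itself instead proposes a hierarchical learning structure to allocate the exploration budget adaptively, as the citation statements above note; the sketch only conveys the constraint being respected.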