2017
DOI: 10.1609/aaai.v31i1.10939

Efficient Ordered Combinatorial Semi-Bandits for Whole-Page Recommendation

Abstract: The Multi-Armed Bandit (MAB) framework has been successfully applied in many web applications. However, many complex real-world applications that involve multiple content recommendations cannot fit into the traditional MAB setting. To address this issue, we consider an ordered combinatorial semi-bandit problem where the learner recommends S actions from a base set of K actions and displays the results in S (out of M) different positions. The aim is to maximize the cumulative reward with respect to the best possib…
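
As a rough illustration of the setting described in the abstract, the sketch below simulates one round of an ordered combinatorial semi-bandit with Thompson Sampling: Beta posteriors over per-action click rates, a position-based discount, and a separate click observation for every displayed slot. The Bernoulli click model, the position weights, and the greedy top-S assignment are illustrative assumptions, not the paper's actual algorithm.

import numpy as np

rng = np.random.default_rng(0)

K, M, S = 20, 5, 5                      # base actions, positions, slots to fill (S <= M, S <= K)
pos_weight = np.linspace(1.0, 0.2, M)   # assumed position-based click discount

# Beta posteriors over per-action click probabilities (Bernoulli rewards assumed)
alpha = np.ones(K)
beta = np.ones(K)

def recommend():
    """Sample click probabilities and assign the S best actions to the S most prominent positions."""
    theta = rng.beta(alpha, beta)        # Thompson sample per action
    order = np.argsort(-theta)[:S]       # top-S sampled actions
    slots = np.argsort(-pos_weight)[:S]  # S most prominent positions
    return list(zip(order, slots))       # (action, position) pairs

def update(placements, clicks):
    """Semi-bandit feedback: one click observation per displayed action."""
    for (a, _), c in zip(placements, clicks):
        alpha[a] += c
        beta[a] += 1 - c

# one simulated round against unknown true click rates
true_p = rng.uniform(0.05, 0.6, K)
placements = recommend()
clicks = [rng.random() < true_p[a] * pos_weight[m] for a, m in placements]
update(placements, clicks)

Because the feedback is semi-bandit, every displayed action contributes its own observation to the posterior, rather than a single scalar reward for the whole page.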

Cited by 9 publications (6 citation statements)
References 13 publications

“…These algorithms maintain and iteratively update a distribution for the rewards from each arm, adjusting based on the observed outcomes in that arm. In some cases, especially when we know the exact distributions of rewards, TS-type algorithms tend to alleviate the influence of delayed feedback by randomizing over actions, and thus will have relatively better performance than other types of algorithms [13,49]. Therefore, we also consider TS-type algorithms in our study.…”
Section: Thompson Sampling Approach
confidence: 99%
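
To make the delayed-feedback point concrete, here is a minimal Beta-Bernoulli Thompson Sampling loop in which every reward arrives only after a fixed delay; the arm count, horizon, and delay are made-up values for illustration. Because each action is drawn from a posterior sample rather than a point estimate, the learner keeps spreading its pulls even while the posterior is stale.

import numpy as np

rng = np.random.default_rng(1)
K, T, delay = 10, 2000, 50           # arms, rounds, feedback delay (assumed values)
true_p = rng.uniform(0.1, 0.9, K)

alpha, beta = np.ones(K), np.ones(K)
pending = []                          # (arrival_round, arm, reward) not yet observed

for t in range(T):
    # Thompson Sampling randomizes over actions via posterior sampling,
    # so repeated pulls of one arm under delayed feedback are less likely.
    theta = rng.beta(alpha, beta)
    a = int(np.argmax(theta))
    r = int(rng.random() < true_p[a])
    pending.append((t + delay, a, r))

    # apply only the feedback whose delay has elapsed
    ready = [x for x in pending if x[0] <= t]
    pending = [x for x in pending if x[0] > t]
    for _, arm, reward in ready:
        alpha[arm] += reward
        beta[arm] += 1 - reward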
“…Instead of highly complex neuron-based NAS, we shed light on searching the self-attention space and limit the maximum search space by first obtaining the number of layers needed for a sample, and then moving forward to prune heads layer-by-layer to further reduce the network size. In order to determine the configurations of layers and heads for each sample in a real-time fashion, we integrate Upper Confidence Bound multi-arm contextual bandits (UCB) (Li et al., 2010) and Thompson Sampling semi-bandits (TSB) (Wang et al., 2017) into our DHT model, where the UCBs are responsible for determining the number of layers as well as the number of heads to keep in each layer, and the TSB determines the combination of heads to keep in each layer.…”
Section: Introduction
confidence: 99%
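
The quoted passage gives no implementation details, so the following is only a guessed-at sketch of the UCB side of such a scheme: a plain UCB1 rule choosing how many layers to keep, with a placeholder random reward standing in for whatever accuracy or latency signal the DHT model would actually use.

import numpy as np

rng = np.random.default_rng(3)

# UCB1 over the number of transformer layers to keep (illustrative arm set)
layer_choices = [2, 4, 6, 8, 10, 12]
counts = np.zeros(len(layer_choices))
totals = np.zeros(len(layer_choices))

def pick_layers(t):
    """UCB1 selection of a layer count; untried arms are taken first."""
    if np.any(counts == 0):
        return int(np.argmin(counts))
    ucb = totals / counts + np.sqrt(2 * np.log(t) / counts)
    return int(np.argmax(ucb))

for t in range(1, 200):
    i = pick_layers(t)
    # placeholder reward; in practice this would reflect the pruned model's quality
    reward = rng.random()
    counts[i] += 1
    totals[i] += reward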
“…The CCS problem includes the linear bandit (LB) problem (Abbasi-Yadkori, Pál, and Szepesvári 2011; Agrawal and Goyal 2013; Auer 2002; Chu et al. 2011; Dani, Hayes, and Kakade 2008) and the combinatorial semi-bandit (CS) problem (Chen et al. 2016a,b; Combes et al. 2015; Gai, Krishnamachari, and Jain 2012; Kveton et al. 2015; Wang et al. 2017; Wen, Kveton, and Ashkan 2015) as special cases. The difference from the LB problem is that, in the CCS problem, the learner chooses multiple arms at once.…”
Section: Introduction
confidence: 99%
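
To illustrate the relationship the quote draws between the LB and CS special cases, the sketch below runs one round of a linear combinatorial semi-bandit: rewards are linear in known arm features (the LB side), but S arms are chosen at once and each returns its own noisy reward (the CS side). The feature dimensions, the ridge/UCB estimator, and the top-S oracle are all assumptions made for the example.

import numpy as np

rng = np.random.default_rng(2)
d, K, S = 8, 30, 4                      # feature dim, arms, arms chosen per round (assumed)
X = rng.normal(size=(K, d))             # one feature vector per arm
theta_star = rng.normal(size=d)         # unknown parameter shared by all arms

# ridge-regression statistics, as in LinUCB-style estimators
A = np.eye(d)
b = np.zeros(d)

def choose_super_arm(width=1.0):
    """Pick S arms at once: a CS-style choice on top of an LB-style linear estimate."""
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b
    ucb = X @ theta_hat + width * np.sqrt(np.einsum('kd,de,ke->k', X, A_inv, X))
    return np.argsort(-ucb)[:S]

# one round: semi-bandit feedback gives a separate noisy reward for every chosen arm
chosen = choose_super_arm()
for k in chosen:
    r = X[k] @ theta_star + 0.1 * rng.normal()
    A += np.outer(X[k], X[k])
    b += r * X[k]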
“…These differences enable the CCS problem to model more realistic situations in applications such as routing networks (Kveton et al. 2014), shortest paths (Gai, Krishnamachari, and Jain 2012; Wen, Kveton, and Ashkan 2015), and recommender systems (Li et al. 2010; Qin, Chen, and Zhu 2014; Wang et al. 2017). For example, when a recommender system is modeled with the LB problem, it is assumed that once a recommendation result is obtained, the internal predictive model is updated before the next recommendation.…”
Section: Introduction
confidence: 99%