Proceedings of the 25th ACM International on Conference on Information and Knowledge Management 2016
DOI: 10.1145/2983323.2983659

Multi-Dueling Bandits and Their Application to Online Ranker Evaluation

Abstract: New ranking algorithms are continually being developed and refined, necessitating the development of efficient methods for evaluating these rankers. Online ranker evaluation focuses on the challenge of efficiently determining, from implicit user feedback, which ranker out of a finite set of rankers is the best. Online ranker evaluation can be modeled by dueling bandits, a mathematical model for online learning under limited feedback from pairwise comparisons. Comparisons of pairs of rankers are performed by inte…

Cited by 21 publications (34 citation statements)
References 18 publications
“…Dueling bandits with sets of actions. One line of dueling bandits extensions considers the case where the learner selects a subset of actions and observes the outcomes of all duels between all pairs of actions in the subset [2,22], or the winner of the subset [20,19]. As a consequence, these settings give the learner strictly more information than the dueling bandits setting.…”
Section: Related Work (mentioning)
confidence: 99%
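
The feedback model described in the statement above, where the learner picks a subset and sees every pairwise duel within it, can be illustrated with a minimal sketch. The preference matrix `PREF`, the function name `multi_duel`, and the chosen subset are hypothetical and only serve the illustration; this is not the algorithm from the cited paper.

```python
import itertools
import random


def multi_duel(pref, subset, rng):
    """Simulate one round of subset feedback.

    pref[i][j] is the assumed probability that action i beats action j.
    Returns a dict mapping each unordered pair (i, j), i < j, from the
    subset to the sampled winner of that duel.
    """
    outcomes = {}
    for i, j in itertools.combinations(sorted(subset), 2):
        outcomes[(i, j)] = i if rng.random() < pref[i][j] else j
    return outcomes


# Hypothetical preference matrix over 4 actions (action 0 is the strongest).
PREF = [
    [0.5, 0.6, 0.7, 0.8],
    [0.4, 0.5, 0.6, 0.7],
    [0.3, 0.4, 0.5, 0.6],
    [0.2, 0.3, 0.4, 0.5],
]

rng = random.Random(0)
feedback = multi_duel(PREF, {0, 2, 3}, rng)  # 3 actions -> 3 pairwise duels
single = multi_duel(PREF, {0, 1}, rng)       # ordinary duel -> 1 outcome
```

A subset of k actions yields k(k-1)/2 observed duels per round, compared with a single outcome in the standard dueling bandits setting, which is the sense in which the learner receives more information.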
“…We refine this idea by a binary search approach, decreasing the number of duels to $\log(k)$. We remark that Lemma C.2 in the appendix is a slightly stronger version of the above lemma, which allows us to partition $A$ and $B$ into two subsets each, $A = A^{(1)} \cup A^{(2)}$ and $B = B^{(1)} \cup B^{(2)}$. Under some circumstances, we can then guarantee that Uncover reveals the pairwise comparison between two players $a \succ b$, where $a$ is from $A^{(1)}$ and $b$ is from $B^{(1)}$.…”
Section: Deterministic Setting (mentioning)
confidence: 99%
“…Performance is often measured by some notion of regret, and while many definitions have been studied [Yue et al., 2012; Zoghi et al., 2014; …], they all intuitively ask that the learner identify the good actions, i.e., those that are typically favored amongst the others. Over the last two decades, several algorithms have been proposed for dueling bandit problems [Ailon et al., 2014; Zoghi et al., 2014; Komiyama et al., 2015; Wu and Liu, 2016] and generalizations to subset-wise preference feedback [Sui et al., 2017; Brost et al., 2016; Saha and Gopalan, 2019a; Ren et al., 2018; Saha and Gopalan, 2019b].…”
Section: Introduction (mentioning)
confidence: 99%