An Improved Multileaving Algorithm for Online Ranker Evaluation

Brost, Brian M.; Cox, Ingemar J.; Seldin, Yevgeny; Lioma, Christina

doi:10.1145/2911451.2914706

Cited by 12 publications

(18 citation statements)

References 4 publications


“…To address this gap, we propose a novel multileaved comparison method, Pairwise Preference Multileaving (PPM). PPM differs from existing multileaved comparison methods as its comparisons are based on inferred pairwise document preferences, whereas existing multileaved comparison methods either use some form of document assignment [27,28] or click credit functions [2,27]. We prove that PPM meets both the considerateness and the fidelity requirements, thus PPM guarantees correct winners in unambiguous cases while maintaining the user experience at all times.…”

Section: Introduction (mentioning)

confidence: 91%

“…Sample-Scored-Only Multileaving (SOSM) was introduced by Brost et al. [2] in an attempt to create a more scalable multileaved comparison method. It is the only existing multileaved comparison method that does not have an interleaved comparison counterpart.…”

Section: Sample Only Scored Multileaving (mentioning)

confidence: 99%

“…The metric by which multileaved comparison methods are compared is the binary error, $E_{bin}$ [2,27,28]. Let $P_{nm}$ be the preference inferred by a multileaved comparison method; then the error is:…”

Section: Ranker Selection and Comparisons (mentioning)

confidence: 99%

“…While experiments using real users are preferred [4,6,18,31], most researchers do not have access to search engines. As a result the most common way of comparing online evaluation methods is by using simulated user behaviour [2,12,13,27,28]. Such simulated experiments show the performance of multileaved comparison methods when user behaviour adheres to a few simple assumptions.…”

Section: Simulating User Behavior (mentioning)

confidence: 99%

“…For interleaved comparison, two experimental conditions ("control" and "treatment") are typical. Recently, multileaved comparisons have been introduced for the purpose of efficiently comparing large numbers of rankers [2,27]. These multileaved comparison methods were introduced as an extension to interleaving and the majority are directly derived from their interleaving counterparts [27,28].…”

Section: Introduction (mentioning)

confidence: 99%


Sensitive and Scalable Online Evaluation with Theoretical Guarantees

Oosterhuis, Rijke. 2017. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management


Multileaved comparison methods generalize interleaved comparison methods to provide a scalable approach for comparing ranking systems based on regular user interactions. Such methods enable the increasingly rapid research and development of search engines. However, existing multileaved comparison methods that provide reliable outcomes do so by degrading the user experience during evaluation. Conversely, current multileaved comparison methods that maintain the user experience cannot guarantee correctness. Our contribution is two-fold. First, we propose a theoretical framework for systematically comparing multileaved comparison methods using the notions of considerateness, which concerns maintaining the user experience, and fidelity, which concerns reliable correct outcomes. Second, we introduce a novel multileaved comparison method, Pairwise Preference Multileaving (PPM), that performs comparisons based on document-pair preferences, and prove that it is considerate and has fidelity. We show empirically that, compared to previous multileaved comparison methods, PPM is more sensitive to user preferences and scalable with the number of rankers being compared.

Theoretical Analysis on the Efficiency of Interleaved Comparisons

Iizuka, Hajime, Katô. 2023. Lecture Notes in Computer Science


Multi-Dueling Bandits and Their Application to Online Ranker Evaluation

Brost, Seldin, Cox, et al. 2016. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management

Self Cite


New ranking algorithms are continually being developed and refined, necessitating the development of efficient methods for evaluating these rankers. Online ranker evaluation focuses on the challenge of efficiently determining, from implicit user feedback, which ranker out of a finite set of rankers is the best. Online ranker evaluation can be modeled by dueling bandits, a mathematical model for online learning under limited feedback from pairwise comparisons. Comparisons of pairs of rankers are performed by interleaving their result sets and examining which documents users click on. The dueling bandits model addresses the key issue of which pair of rankers to compare at each iteration, thereby providing a solution to the exploration-exploitation trade-off. Recently, methods for simultaneously comparing more than two rankers have been developed. However, the question of which rankers to compare at each iteration was left open. We address this question by proposing a generalization of the dueling bandits model that uses simultaneous comparisons of an unrestricted number of rankers. We evaluate our algorithm on synthetic data and several standard large-scale online ranker evaluation datasets. Our experimental results show that the algorithm yields orders of magnitude improvement in performance compared to state-of-the-art dueling bandit algorithms.
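
The abstract above refers to comparing a pair of rankers by interleaving their result sets and examining clicks. As a rough illustration only, the sketch below uses team-draft interleaving, one common interleaving scheme; the abstract does not say which scheme the authors use, so this choice is an assumption. The function names `team_draft_interleave` and `infer_preference`, the labels "A" and "B", and the document IDs are all hypothetical.

```python
import random


def team_draft_interleave(ranking_a, ranking_b, length, rng=None):
    """Build one interleaved list and record which ranker contributed each slot.

    Assumption: this is team-draft interleaving, chosen for illustration;
    it is not taken from the papers listed on this page.
    """
    rng = rng or random.Random()
    rankings = {"A": ranking_a, "B": ranking_b}
    picks = {"A": 0, "B": 0}
    interleaved, teams = [], []
    while len(interleaved) < length:
        # Documents each ranker could still contribute, in its own order.
        remaining = {
            name: [doc for doc in docs if doc not in interleaved]
            for name, docs in rankings.items()
        }
        if not remaining["A"] and not remaining["B"]:
            break  # both rankings are exhausted
        # The ranker with fewer picks goes next; ties are broken by coin flip.
        if picks["A"] < picks["B"]:
            turn = "A"
        elif picks["B"] < picks["A"]:
            turn = "B"
        else:
            turn = rng.choice(["A", "B"])
        if not remaining[turn]:
            turn = "B" if turn == "A" else "A"
        interleaved.append(remaining[turn][0])
        teams.append(turn)
        picks[turn] += 1
    return interleaved, teams


def infer_preference(teams, clicked_positions):
    """Credit each click to the ranker that contributed the clicked slot."""
    credit = {"A": 0, "B": 0}
    for position in clicked_positions:
        credit[teams[position]] += 1
    if credit["A"] > credit["B"]:
        return "A"
    if credit["B"] > credit["A"]:
        return "B"
    return "tie"


if __name__ == "__main__":
    shown, teams = team_draft_interleave(
        ["d1", "d2", "d3", "d4"], ["d3", "d1", "d5", "d2"], 4, random.Random(0)
    )
    # Pretend the user clicked the first and third results.
    print(shown, teams, infer_preference(teams, [0, 2]))
```

Each call yields one pairwise outcome from one impression; a dueling bandit algorithm, as described in the abstract, would decide which pair of rankers to feed into such a comparison at each iteration.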
