Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval 2020
DOI: 10.1145/3409256.3409820

Taking the Counterfactual Online: Efficient and Unbiased Online Evaluation for Ranking

Abstract: Counterfactual evaluation can estimate Click-Through-Rate (CTR) differences between ranking systems based on historical interaction data, while mitigating the effect of position bias and item-selection bias. We introduce the novel Logging-Policy Optimization Algorithm (LogOpt), which optimizes the policy for logging data so that the counterfactual estimate has minimal variance. As minimizing variance leads to faster convergence, LogOpt increases the data-efficiency of counterfactual estimation. LogOpt turns th…
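The abstract describes counterfactual CTR estimation that corrects for position bias. As a hedged illustration of the kind of estimator LogOpt builds on (this is the standard inverse-propensity-scoring sketch, not LogOpt itself; the propensity values and click model below are toy assumptions):

```python
import numpy as np

def ips_ctr_estimate(clicks, ranks, propensities):
    """Inverse-propensity-scored CTR estimate.

    clicks[i]       : 1 if logged item i was clicked, else 0
    ranks[i]        : display position of logged item i
    propensities[k] : probability a user examines rank k (position bias)
    """
    weights = 1.0 / propensities[ranks]  # reweight to undo position bias
    return float(np.mean(clicks * weights))

# Toy log: examination probability decays with rank.
rng = np.random.default_rng(0)
propensities = 1.0 / (1.0 + np.arange(5))  # ranks 0..4
ranks = rng.integers(0, 5, size=10_000)
true_ctr = 0.2  # relevance-driven click rate, identical across items
clicks = (rng.random(10_000) < true_ctr * propensities[ranks]).astype(float)

print(ips_ctr_estimate(clicks, ranks, propensities))  # close to 0.2
print(clicks.mean())  # the naive average is biased low
```

The naive click average underestimates CTR because lower ranks are examined less often; the IPS weights cancel that bias exactly, and LogOpt's contribution (per the abstract) is choosing the logging policy so the reweighted estimate also has minimal variance.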

Cited by 17 publications (15 citation statements)
References 25 publications

“…They suggested that, from a theoretical standpoint, counterfactual LTR exploits position bias better, however they also indicated that empirical results have shown that OLTR (and in particular PDGD) is more reliable. On the other hand, recent works have been focusing on adapting the offline counterfactual learning to the online setting [6,64,34,35]. These works have suggested that OLTR algorithms can benefit from the counterfactual learning framework.…”
Section: Online Learning To Rank
confidence: 99%
“…While all of these methods can be useful in practice, none optimize ranking metrics directly in a computationally feasible manner. Interestingly, multiple lines of previous work have found PL ranking models to be very effective for various ranking tasks: for result randomization in interleaving [16], multileaving [27] and counterfactual evaluation [22]; for exploration in online LTR [21,23]; for fair distributions of attention exposure [11,28]; and for topic diversity in ranking [32,35]. In particular, Bruch et al [3] argue that the stochastic nature of PL models results in more robust ranking performance.…”
Section: Related Work
confidence: 99%
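One reason Plackett-Luce (PL) models are so convenient for the randomization and exploration tasks listed above is that they admit exact sampling via the Gumbel-max trick. A minimal sketch (the scores are illustrative):

```python
import numpy as np

def sample_pl_ranking(scores, rng):
    """Draw one ranking from a Plackett-Luce model via the Gumbel-max trick.

    Adding i.i.d. standard Gumbel noise to the logits and sorting in
    descending order yields an exact sample from the PL distribution.
    """
    gumbel = rng.gumbel(size=scores.shape)
    return np.argsort(-(scores + gumbel))

rng = np.random.default_rng(1)
scores = np.array([2.0, 1.0, 0.0])
ranking = sample_pl_ranking(scores, rng)
print(ranking)  # a permutation of [0, 1, 2]; item 0 is placed first most often
```

Sampling a full ranking costs one sort, which is what makes PL models cheap enough for online interleaving and exploration.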
“…ranking metrics but that it is generally infeasible to compute; existing work has thus approximated this gradient [11,22,23,28] (see Section 4). The computational costs are particularly relevant because the PL model is often used in online settings where optimization is performed repeatedly and frequently [22,23,28]. For instance, Oosterhuis and de Rijke [23] show that frequently optimizing the logging-policy model during the gathering of data greatly reduces the data requirements for online/counterfactual LTR.…”
Section: Related Work
confidence: 99%
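The gradient approximation this quote refers to is typically a sampling-based (log-derivative, REINFORCE-style) estimate; the exact schemes of [11,22,23,28] differ, so the following is only a generic sketch of the idea, with DCG as an assumed example metric:

```python
import numpy as np

def dcg(ranking, relevance):
    """Discounted cumulative gain of a ranking (0-based positions)."""
    discounts = 1.0 / np.log2(np.arange(2, len(ranking) + 2))
    return float(np.sum(relevance[ranking] * discounts))

def log_pl_grad(scores, ranking):
    """Gradient of log P(ranking) under a Plackett-Luce model."""
    grad = np.zeros_like(scores)
    remaining = list(range(len(scores)))
    for item in ranking:
        s = scores[remaining]
        p = np.exp(s - s.max())
        p /= p.sum()
        grad[remaining] -= p   # -softmax over still-unplaced items
        grad[item] += 1.0      # +1 for the item actually placed
        remaining.remove(item)
    return grad

def pl_dcg_gradient(scores, relevance, rng, n_samples=5000):
    """Monte-Carlo estimate of d E[DCG] / d scores via the log-derivative trick."""
    grad = np.zeros_like(scores)
    for _ in range(n_samples):
        ranking = np.argsort(-(scores + rng.gumbel(size=scores.shape)))
        grad += dcg(ranking, relevance) * log_pl_grad(scores, ranking)
    return grad / n_samples

rng = np.random.default_rng(2)
scores = np.zeros(3)
relevance = np.array([1.0, 0.0, 0.0])
g = pl_dcg_gradient(scores, relevance, rng)
print(g)  # positive for item 0 (the relevant item), negative for the others
```

Because the estimate requires many sampled rankings per update, repeating it frequently in an online setting is exactly the computational-cost concern the quoted passage raises.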