Unbiased Learning to Rank with Unbiased Propensity Estimation

Ai, Qingyao; Bi, Keping; Luo, Cheng; Guo, Jiafeng; Croft, W. Bruce

doi:10.1145/3209978.3209986

Cited by 154 publications

(245 citation statements)

References 40 publications

(101 reference statements)

Supporting

Mentioning

244

Contrasting

“…Thus, clicks on positions that are observed less often due to position bias will have greater weight to account for that difference. However, the position bias must be learned and estimated somewhat accurately [1]. On the other side of the spectrum are click models, which attempt to model user behavior completely [4].…”

Section: Learning To Rank From Historical Interactions

mentioning

confidence: 99%
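
The weighting described in the excerpt above can be illustrated with a minimal sketch of inverse propensity weighting. It assumes the per-rank examination propensities are already known or estimated; the function name `ipw_click_weights` and the propensity values are hypothetical and are not taken from the cited paper.

```python
from typing import List, Sequence


def ipw_click_weights(clicked_ranks: Sequence[int],
                      propensities: Sequence[float]) -> List[float]:
    """Return an inverse-propensity weight for each clicked rank (0-indexed)."""
    weights = []
    for rank in clicked_ranks:
        p = propensities[rank]
        if p <= 0.0:
            raise ValueError("propensity must be positive")
        # Clicks at rarely examined ranks receive a proportionally larger weight.
        weights.append(1.0 / p)
    return weights


# Hypothetical examination propensities for ranks 0..3 (assumed, not estimated).
propensities = [1.0, 0.5, 0.25, 0.125]
weights = ipw_click_weights([0, 3], propensities)
print(weights)
```

A click at the last rank counts eight times as much as a click at the top rank here, which is also why an inaccurate propensity estimate at low ranks can distort the learning signal.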

Optimizing Ranking Models in an Online Setting

Oosterhuis

Rijke

2019

Lecture Notes in Computer Science

Online Learning to Rank (OLTR) methods optimize ranking models by directly interacting with users, which allows them to be very efficient and responsive. All OLTR methods introduced during the past decade have extended on the original OLTR method: Dueling Bandit Gradient Descent (DBGD). Recently, a fundamentally different approach was introduced with the Pairwise Differentiable Gradient Descent (PDGD) algorithm. To date the only comparisons of the two approaches are limited to simulations with cascading click models and low levels of noise. The main outcome so far is that PDGD converges at higher levels of performance and learns considerably faster than DBGD-based methods. However, the PDGD algorithm assumes cascading user behavior, potentially giving it an unfair advantage. Furthermore, the robustness of both methods to high levels of noise has not been investigated. Therefore, it is unclear whether the reported advantages of PDGD over DBGD generalize to different experimental conditions. In this paper, we investigate whether the previous conclusions about the PDGD and DBGD comparison generalize from ideal to worst-case circumstances. We do so in two ways. First, we compare the theoretical properties of PDGD and DBGD, by taking a critical look at previously proven properties in the context of ranking. Second, we estimate an upper and lower bound on the performance of methods by simulating both ideal user behavior and extremely difficult behavior, i.e., almost-random non-cascading user models. Our findings show that the theoretical bounds of DBGD do not apply to any common ranking model and, furthermore, that the performance of DBGD is substantially worse than PDGD in both ideal and worst-case circumstances. These results reproduce previously published findings about the relative performance of PDGD vs. DBGD and generalize them to extremely noisy and non-cascading circumstances.

“…Furthermore, it is unclear Algorithm 1 Dueling Bandit Gradient Descent (DBGD). 1: Input: initial weights: θ1; unit: u; learning rate η.…”

Section: Online Learning To Rank

mentioning

confidence: 99%
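
The algorithm named in the excerpt above can be sketched roughly as follows. This is a minimal illustration of Dueling Bandit Gradient Descent as it is commonly described, not a reproduction of the citing paper's Algorithm 1. The exploration step size `delta`, the `prefers_candidate` oracle (a noiseless stand-in for an interleaved comparison based on user clicks), and the hidden `target` weights are all assumptions made for the example.

```python
import math
import random


def dbgd(theta, prefers_candidate, delta=1.0, eta=0.1, steps=200, seed=0):
    """Sketch of DBGD: probe a random direction, step toward it if it wins."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(steps):
        # Sample a uniformly random unit vector u.
        raw = [rng.gauss(0.0, 1.0) for _ in theta]
        norm = math.sqrt(sum(x * x for x in raw))
        u = [x / norm for x in raw]
        # Candidate ranker: the current weights perturbed by delta along u.
        candidate = [t + delta * d for t, d in zip(theta, u)]
        # If the candidate wins the duel, take a step of size eta along u.
        if prefers_candidate(theta, candidate):
            theta = [t + eta * d for t, d in zip(theta, u)]
    return theta


# Hypothetical hidden optimum, used only to simulate the outcome of each duel.
target = [3.0, 4.0]


def prefers_candidate(current, candidate):
    """Noiseless oracle: the candidate wins if it lies closer to the target."""
    return math.dist(candidate, target) < math.dist(current, target)


start = [0.0, 0.0]
learned = dbgd(start, prefers_candidate)
print(math.dist(learned, target))
```

In a real online setting the duel would be decided by user clicks on an interleaved ranking, so its outcome would be noisy rather than exact as it is in this sketch.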

Optimizing Ranking Models in an Online Setting

Oosterhuis

Rijke

2019

Lecture Notes in Computer Science

“…Counterfactual Learning to Rank (CLTR) [1,2,16] aims to learn a ranking model offline from historical interaction data. Employing an offline approach has many benefits compared to an online one.…”

Section: Counterfactual Learning To Rank

mentioning

confidence: 99%

“…Table 2 provides the click probabilities for three different click behavior models: Perfect click behavior has probabilities proportional to the relevance and never clicks on a non-relevant document, simulating an ideal user. Binarized click behavior acts on only two levels of relevance and is affected by position-bias; this simulated behavior has been used in previous work on CLTR [1,2,16]. And Near-Random behavior clicks very often, and only slightly more frequently on more relevant documents than on less relevant documents; this behavior simulates very high levels of click noise.…”

Section: Simulating User Behavior

mentioning

confidence: 99%
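
The three simulated behaviors described in the excerpt above could be sketched as follows. The probabilities in Table 2 of the citing paper are not reproduced on this page, so every number below (the top grade `MAX_REL`, the binarization threshold, the `1 / (rank + 1)` examination curve, and the near-random range) is a hypothetical placeholder chosen only to match the qualitative description.

```python
import random

MAX_REL = 4  # hypothetical top relevance grade


def click_probability(behavior, relevance, rank):
    """Click probability for one document; all constants are illustrative."""
    if behavior == "perfect":
        # Proportional to relevance; never clicks a non-relevant document.
        return relevance / MAX_REL
    if behavior == "binarized":
        # Two relevance levels, discounted by an assumed position bias.
        relevant = relevance > MAX_REL / 2
        examination = 1.0 / (rank + 1)
        return examination * (1.0 if relevant else 0.1)
    if behavior == "near_random":
        # Clicks often, only slightly more on more relevant documents.
        return 0.4 + 0.2 * relevance / MAX_REL
    raise ValueError(f"unknown behavior: {behavior}")


def simulate_clicks(behavior, relevances, rng):
    """Sample one click (True/False) per ranked document."""
    return [rng.random() < click_probability(behavior, rel, rank)
            for rank, rel in enumerate(relevances)]


clicks = simulate_clicks("perfect", [0, 4, 0], random.Random(0))
print(clicks)
```

Under the perfect behavior the outcome for grades 0 and `MAX_REL` is deterministic, while the other two behaviors depend on the random seed.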

To Model or to Intervene

Jagerman

Oosterhuis

Rijke

2019

Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval

Learning to Rank (LTR) from user interactions is challenging as user feedback often contains high levels of bias and noise. At the moment, two methodologies for dealing with bias prevail in the field of LTR: counterfactual methods that learn from historical data and model user behavior to deal with biases; and online methods that perform interventions to deal with bias but use no explicit user models. For practitioners the decision between either methodology is very important because of its direct impact on end users. Nevertheless, there has never been a direct comparison between these two approaches to unbiased LTR. In this study we provide the first benchmarking of both counterfactual and online LTR methods under different experimental conditions. Our results show that the choice between the methodologies is consequential and depends on the presence of selection bias, and the degree of position bias and interaction noise. In settings with little bias or noise counterfactual methods can obtain the highest ranking performance; however, in other circumstances their optimization can be detrimental to the user experience. Conversely, online methods are very robust to bias and noise but require control over the displayed rankings. Our findings confirm and contradict existing expectations on the impact of model-based and intervention-based methods in LTR, and allow practitioners to make an informed decision between the two methodologies. CCS CONCEPTS • Information systems → Learning to rank.

“…First, both sources are typically limited in availability and are often proprietary company resources. Second, click-stream data is typically biased towards the first few elements in the ranking presented to the user [2] and are noisy in general. Finally, such logs are only available after the fact, leading to a cold start problem.…”

mentioning

confidence: 99%

Learning More From Less

Haddad

Ghosh

2019

Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval

The limited availability of ground truth relevance labels has been a major impediment to the application of supervised methods to ad-hoc retrieval. As a result, unsupervised scoring methods, such as BM25, remain strong competitors to deep learning techniques which have brought on dramatic improvements in other domains, such as computer vision and natural language processing. Recent works have shown that it is possible to take advantage of the performance of these unsupervised methods to generate training data for learning-to-rank models. The key limitation to this line of work is the size of the training set required to surpass the performance of the original unsupervised method, which can be as large as 10^13 training examples. Building on these insights, we propose two methods to reduce the amount of training data required. The first method takes inspiration from crowdsourcing, and leverages multiple unsupervised rankers to generate soft, or noise-aware, training labels. The second identifies harmful, or mislabeled, training examples and removes them from the training set. We show that our methods allow us to surpass the performance of the unsupervised baseline with far fewer training examples than previous works. CCS CONCEPTS • Information systems → Retrieval models and ranking.
