To Model or to Intervene

Jagerman, Rolf; Oosterhuis, Harrie; Rijke, Maarten de

doi:10.1145/3331184.3331269

Cited by 58 publications

(2 citation statements)

References 38 publications

“…In order to evaluate these approaches effectively, we need a dataset containing logged feedback for context-action pairs, along with the logging propensity for the action performed. Related work has evaluated counterfactual learning methods on multi-class, multi-label or LTR tasks [13,17,47,48], synthetically generating bandit feedback samples for a certain logging policy and existing datasets. What makes the recommendation task fundamentally different from the aforementioned settings, is that access to the true labels (i.e.…”

Section: Results (mentioning)

confidence: 99%

Joint Policy-Value Learning for Recommendation

Jeunen, Rohde, Vasile, et al. 2020

Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

Conventional approaches to recommendation often do not explicitly take into account information on previously shown recommendations and their recorded responses. One reason is that, since we do not know the outcome of actions the system did not take, learning directly from such logs is not a straightforward task. Several methods for off-policy or counterfactual learning have been proposed in recent years, but their efficacy for the recommendation task remains understudied. Due to the limitations of offline datasets and the lack of access of most academic researchers to online experiments, this is a non-trivial task. Simulation environments can provide a reproducible solution to this problem. In this work, we conduct the first broad empirical study of counterfactual learning methods for recommendation, in a simulated environment. We consider various policy-based methods that make use of the Inverse Propensity Score (IPS) to perform Counterfactual Risk Minimisation (CRM), as well as value-based methods based on Maximum Likelihood Estimation (MLE). We highlight how existing off-policy learning methods fail due to stochastic and sparse rewards, and show how a logarithmic variant of the traditional IPS estimator can solve these issues, whilst convexifying the objective and thus facilitating its optimisation. Additionally, under certain assumptions the value- and policy-based methods have an identical parameterisation, allowing us to propose a new model that combines both the MLE and CRM objectives. Extensive experiments show that this “Dual Bandit” approach achieves state-of-the-art performance in a wide range of scenarios, for varying logging policies, action spaces and training sample sizes.
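As a rough illustration of the Inverse Propensity Score idea named in the abstract above, the sketch below computes the textbook IPS estimate of a target policy's value from logged bandit feedback. It is not the logarithmic variant or the Dual Bandit model from the paper, whose exact form is not given here. The function name `ips_estimate` and its argument names are hypothetical, and the sketch assumes every logged action has a strictly positive logging propensity.

```python
from typing import Sequence


def ips_estimate(
    rewards: Sequence[float],
    logging_propensities: Sequence[float],
    target_probs: Sequence[float],
) -> float:
    """Textbook IPS estimate of a target policy's expected reward.

    rewards[i]              : observed reward for the logged action i
    logging_propensities[i] : probability the logging policy gave that action
    target_probs[i]         : probability the target policy gives that action
    (All three names are illustrative assumptions, not from the paper.)
    """
    n = len(rewards)
    if n == 0:
        raise ValueError("at least one logged sample is required")
    if not (n == len(logging_propensities) == len(target_probs)):
        raise ValueError("all inputs must have the same length")

    total = 0.0
    for reward, p_log, p_target in zip(rewards, logging_propensities, target_probs):
        if p_log <= 0.0:
            # IPS is undefined when the logging policy could not have
            # taken the logged action.
            raise ValueError("logging propensities must be strictly positive")
        # Reweight each logged reward by how much more (or less) likely
        # the target policy is to take the logged action.
        total += reward * (p_target / p_log)
    return total / n
```

When the target policy equals the logging policy, every weight is 1 and the estimate reduces to the mean logged reward. Large weights (a small logging propensity paired with a large target probability) inflate the variance of the estimate, a known weakness of plain IPS.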

“…At the local scale, we exploit a multi-interest extractor module to learn representations of multiple interests in fine granularity from the corresponding subsequences discovered via intention prototype clustering. Encouragingly, the noisy behaviors (e.g., sales promotions, exposure bias [31], and position bias [8]) that are inconsistent with user's real interests will be filtered out when clustering. We further develop an interest aggregation module, which leverages the inherent preference to guide the multi-interests aggregation to generate the user's current interest.…”

Section: Introduction (mentioning)

confidence: 99%

Dual-Scale Interest Extraction Framework with Self-Supervision for Sequential Recommendation

Chen, Lin, et al. 2023

Frontiers in Artificial Intelligence and Applications

In the sequential recommendation task, the recommender generally learns multiple embeddings from a user’s historical behaviors, to catch the diverse interests of the user. Nevertheless, the existing approaches just extract each interest independently for the corresponding sub-sequence while ignoring the global correlation of the entire interaction sequence, which may fail to capture the user’s inherent preference for the potential interests generalization and unavoidably make the recommended items homogeneous with the historical behaviors. In this paper, we propose a novel Dual-Scale Interest Extraction framework (DSIE) to precisely estimate the user’s current interests. Specifically, DSIE explicitly models the user’s inherent preference with contrastive learning by attending over his/her entire interaction sequence at the global scale and catches the user’s diverse interests in a fine granularity at the local scale. Moreover, we develop a novel interest aggregation module to integrate the multi-interests according to the inherent preference to generate the user’s current interests for the next-item prediction. Experiments conducted on three real-world benchmark datasets demonstrate that DSIE outperforms the state-of-the-art models in terms of recommendation preciseness and novelty.

Counterfactual Online Learning to Rank

Zhuang, Zuccon 2020

Lecture Notes in Computer Science

Exploiting users' implicit feedback, such as clicks, to learn rankers is attractive as it does not require editorial labelling effort, and adapts to users' changing preferences, among other benefits. However, directly learning a ranker from implicit data is challenging, as users' implicit feedback usually contains bias (e.g., position bias, selection bias) and noise (e.g., clicking on irrelevant but attractive snippets, adversarial clicks). Two main methods have arisen for optimizing rankers based on implicit feedback: counterfactual learning to rank (CLTR), which learns a ranker from the historical click-through data collected from a deployed, logging ranker; and online learning to rank (OLTR), where a ranker is updated by recording user interaction with a result list produced by multiple rankers (usually via interleaving). In this paper, we propose a counterfactual online learning to rank algorithm (COLTR) that combines the key components of both CLTR and OLTR. It does so by replacing the online evaluation required by traditional OLTR methods with the counterfactual evaluation common in CLTR. Compared to traditional OLTR approaches based on interleaving, COLTR can evaluate a large number of candidate rankers in a more efficient manner. Our empirical results show that COLTR significantly outperforms traditional OLTR methods. Furthermore, COLTR can reach the same effectiveness of the current state-of-the-art, under noisy click settings, and has room for future extensions.
