Counterfactual Evaluation of Slate Recommendations with Sequential Reward Interactions

McInerney, James O.; Brost, Brian M.; Chandar, Praveen; Mehrotra, Rishabh; Carterette, Ben

doi:10.1145/3394486.3403229

Cited by 38 publications

(61 citation statements)

References 23 publications


“…For a fair comparison they are therefore ignored when answering RQ2. Time of The Day: We define 5 time windows, where numbers in brackets correspond to the hour range: night (0-5), morning (6-11), afternoon (12-17), evening (18-23), all (0-23). If a session spans across two hours, we round up and consider the whole session as either part of start or end hour.…”

Section: Experimental Settings (mentioning)

confidence: 99%
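The time windows in the statement above can be sketched as a simple hour lookup. This is a minimal illustration, not the citing authors' code: the names `TIME_WINDOWS` and `time_window` are hypothetical, and the rule for sessions that span two hours is left out because the quoted text does not specify it fully.

```python
# Hour ranges come from the quoted statement; all names are illustrative.
TIME_WINDOWS = {
    "night": range(0, 6),        # hours 0-5
    "morning": range(6, 12),     # hours 6-11
    "afternoon": range(12, 18),  # hours 12-17
    "evening": range(18, 24),    # hours 18-23
}


def time_window(hour: int) -> str:
    """Return the name of the time window containing the given hour (0-23)."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour must be in 0-23, got {hour}")
    for name, hours in TIME_WINDOWS.items():
        if hour in hours:
            return name
    raise AssertionError("unreachable: the windows cover every hour")
```

The fifth window, "all" (0-23), is the union of the other four, so every session belongs to it regardless of its hour.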

“…Modelling and understanding skipping behaviour in music listening sessions arguably plays a crucial role in understanding user behaviour in modern streaming services. For instance, the skipping signal has already been used as a measure in heuristic-based playlist generation systems [9,25], user satisfaction [16,28], relevance [17], and as counterfactual estimators in Recommender Systems [22].…”

Section: Introduction (mentioning)

confidence: 99%


On Skipping Behaviour Types in Music Streaming Sessions

Meggetto, Revie, Levine, et al. 2021

Proceedings of the 30th ACM International Conference on Information & Knowledge Management


The ability to skip songs is a core feature in modern online streaming services. Its introduction has led to a new music listening paradigm and has changed the way users interact with the underlying services. Thus, understanding their skipping activity during listening sessions has acquired considerable importance. This is because such an implicit feedback signal can be considered a measure of users' satisfaction (dissatisfaction or lack of interest), affecting their engagement with the platforms. Prior work has mainly focused on analysing the skipping activity at an individual song level. In this work, we investigate different behaviours during entire listening sessions with regard to the users' session-based skipping activity. To this end, we propose a data transformation and clustering-based approach to identify and categorise skipping types. Experimental results on the real-world music streaming dataset (Spotify) indicate four main types of session skipping behaviour. A subsequent analysis of short, medium, and long listening sessions demonstrates that these session skipping types are consistent across sessions of varying length. Furthermore, we discuss their distributional differences under various listening context information, i.e. day types (weekday and weekend), times of the day, and playlist types. CCS CONCEPTS • Information systems → Recommender systems.


“…One way to reduce the variance is introducing a reasonable assumption on user behavior to make the combinatorial item space tractable. However, unrealistically strong assumptions may cause serious bias in OPE [13]. Therefore, achieving a well-balanced bias-variance tradeoff by introducing an appropriate user behavior assumption is the key for enabling accurate OPE of ranking policies.…”

Section: Introduction (mentioning)

confidence: 99%

“…setting, however, IPS can suffer from large variance, as the item space is combinatorially large [12,13,23]. In contrast, Independent IPS (IIPS) is based on the independence assumption to address the variance issue [12].…”

Section: Introduction (mentioning)

confidence: 99%
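The contrast between IPS and IIPS drawn in the statement above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the cited papers' implementation: both the logging and target policies are assumed to factorize over slate positions, so each logged slate carries one propensity per position, and the slate reward is assumed to be the sum of per-position rewards. The function and argument names are hypothetical.

```python
from math import prod


def ips_estimate(rewards, pi_target, pi_logging):
    """Slate-level IPS: one weight per slate, the product of position-wise ratios.

    Each argument is a list of slates; each slate is a list with one value per
    position (a reward or a propensity). The product of ratios is why the
    variance grows quickly with slate length.
    """
    total = 0.0
    for r, pt, pl in zip(rewards, pi_target, pi_logging):
        weight = prod(t / l for t, l in zip(pt, pl))
        total += weight * sum(r)
    return total / len(rewards)


def iips_estimate(rewards, pi_target, pi_logging):
    """Independent IPS: each position's reward is reweighted by its own ratio only.

    This relies on the independence assumption, i.e. the reward at a position
    depends only on the item shown at that position.
    """
    total = 0.0
    for r, pt, pl in zip(rewards, pi_target, pi_logging):
        total += sum((t / l) * r_k for r_k, t, l in zip(r, pt, pl))
    return total / len(rewards)
```

When the target policy equals the logging policy, every ratio is 1 and both estimators reduce to the average logged slate reward. They differ once the policies diverge: IIPS has lower variance, but it is biased if rewards at one position depend on items at other positions.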


Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model

Kiyohara, Saito, Matsuhiro, et al. 2022

Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining


In real-world recommender systems and search engines, optimizing ranking decisions to present a ranked list of relevant items is critical. Off-policy evaluation (OPE) for ranking policies is thus gaining a growing interest because it enables performance estimation of new ranking policies using only logged data. Although OPE in contextual bandits has been studied extensively, its naive application to the ranking setting faces a critical variance issue due to the huge item space. To tackle this problem, previous studies introduce some assumptions on user behavior to make the combinatorial item space tractable. However, an unrealistic assumption may, in turn, cause serious bias. Therefore, appropriately controlling the bias-variance tradeoff by imposing a reasonable assumption is the key for success in OPE of ranking policies. To achieve a well-balanced bias-variance tradeoff, we propose the Cascade Doubly Robust estimator building on the cascade assumption, which assumes that a user interacts with items sequentially from the top position in a ranking. We show that the proposed estimator is unbiased in more cases compared to existing estimators that make stronger assumptions on user behavior. Furthermore, compared to a previous estimator based on the same cascade assumption, the proposed estimator reduces the variance by leveraging a control variate. Comprehensive experiments on both synthetic and real-world e-commerce data demonstrate that our estimator leads to more accurate OPE than existing estimators in a variety of settings. CCS CONCEPTS • Information systems → Retrieval models and ranking; Evaluation of retrieval results.
