2021
DOI: 10.48550/arxiv.2106.03546
Preprint

On Learning to Rank Long Sequences with Contextual Bandits

Abstract: Motivated by problems of learning to rank long item sequences, we introduce a variant of the cascading bandit model that considers flexible length sequences with varying rewards and losses. We formulate two generative models for this problem within the generalized linear setting, and design and analyze upper confidence algorithms for it. Our analysis delivers tight regret bounds which, when specialized to vanilla cascading bandits, result in sharper guarantees than previously available in the literature. We e…
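The paper's own algorithms and their generalized-linear, flexible-length formulation are not reproduced here. As a rough illustration of the vanilla cascading bandit setting the abstract builds on, below is a minimal sketch of a CascadeUCB1-style loop: recommend the K items with the highest upper confidence indices, then observe items top-down until the first click. The environment (p_true), horizon T, list length K, and item count L are illustrative assumptions, not values from the paper.

# Minimal CascadeUCB1-style sketch of a vanilla cascading bandit loop.
# This is an illustration of the setting only, not the paper's algorithm;
# p_true, T, K, and L are made-up values for the example.
import numpy as np

rng = np.random.default_rng(0)
L, K, T = 20, 4, 5000                 # items, ranked-list length, rounds
p_true = rng.uniform(0.05, 0.5, L)    # unknown attraction probabilities

counts = np.ones(L)                    # observations per item (start at 1 to avoid /0)
means = np.zeros(L)                    # empirical attraction estimates

for t in range(1, T + 1):
    # UCB index per item; recommend the K items with the largest indices.
    ucb = means + np.sqrt(1.5 * np.log(t) / counts)
    ranked = np.argsort(-ucb)[:K]

    # Cascade feedback: the user scans top-down and clicks the first
    # attractive item; items below the click are not observed.
    for item in ranked:
        clicked = rng.random() < p_true[item]
        counts[item] += 1
        means[item] += (clicked - means[item]) / counts[item]
        if clicked:
            break

print("estimated top items:", np.argsort(-means)[:K])
print("true top items:     ", np.argsort(-p_true)[:K])

After enough rounds the estimated top-K items typically match the true top-K; the paper's contribution is to extend this kind of analysis to flexible-length sequences with varying rewards and losses in the generalized linear setting.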

Cited by 0 publications.
References 14 publications.