2023
DOI: 10.48550/arxiv.2302.12120
|View full text |Cite
Preprint
|
Sign up to set email alerts
|

Sequential Counterfactual Risk Minimization

Abstract: Counterfactual Risk Minimization (CRM) is a framework for dealing with the logged bandit feedback problem, where the goal is to improve a logging policy using offline data. In this paper, we explore the case where it is possible to deploy learned policies multiple times and acquire new data. We extend the CRM principle and its theory to this scenario, which we call "Sequential Counterfactual Risk Minimization (SCRM)." We introduce a novel counterfactual estimator and identify conditions that can improve the pe… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

0
0
0

Publication Types

Select...

Relationship

0
0

Authors

Journals

citations
Cited by 0 publications
references
References 27 publications
(76 reference statements)
0
0
0
Order By: Relevance

No citations

Set email alert for when this publication receives citations?