2017
DOI: 10.48550/arxiv.1706.03880
Preprint

MNL-Bandit: A Dynamic Learning Approach to Assortment Selection

Abstract: We consider a dynamic assortment selection problem, where in every round the retailer offers a subset (assortment) of N substitutable products to a consumer, who selects one of these products according to a multinomial logit (MNL) choice model. The retailer observes this choice and the objective is to dynamically learn the model parameters, while optimizing cumulative revenues over a selling horizon of length T. We refer to this exploration-exploitation formulation as the MNL-Bandit problem. Existing methods …
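
The abstract names the MNL choice model without spelling out its choice probabilities. As a minimal sketch of the standard MNL form, the snippet below assumes each product i has a positive attraction parameter v[i] and that a no-purchase option with attraction normalized to 1 is always available. The names `mnl_choice_probs`, `expected_revenue`, `sample_choice`, `v`, and `r` are hypothetical and are not taken from the paper.

```python
import random


def mnl_choice_probs(assortment, v):
    """Choice probabilities under a standard MNL model.

    Assumption (not from the abstract): a no-purchase option, keyed as None,
    is always available and has attraction normalized to 1.
    """
    denom = 1.0 + sum(v[i] for i in assortment)
    probs = {i: v[i] / denom for i in assortment}
    probs[None] = 1.0 / denom  # probability that nothing is bought
    return probs


def expected_revenue(assortment, v, r):
    """Expected one-round revenue of an assortment, given per-product revenues r."""
    probs = mnl_choice_probs(assortment, v)
    return sum(r[i] * probs[i] for i in assortment)


def sample_choice(assortment, v, rng):
    """Simulate one consumer choice; returns a product or None (no purchase)."""
    probs = mnl_choice_probs(assortment, v)
    items = list(probs)
    weights = [probs[i] for i in items]
    return rng.choices(items, weights=weights, k=1)[0]


if __name__ == "__main__":
    v = {0: 1.0, 1: 2.0, 2: 0.5}   # hypothetical attraction parameters
    r = {0: 4.0, 1: 2.0, 2: 6.0}   # hypothetical per-product revenues
    offered = [0, 1]
    print(mnl_choice_probs(offered, v))
    print(expected_revenue(offered, v, r))
    print(sample_choice(offered, v, random.Random(0)))
```

In the bandit setting the abstract describes, the parameters `v` are unknown; a learning policy would observe only the outcomes of calls like `sample_choice` and would have to trade off estimating `v` against offering the assortment that currently looks most profitable.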

Cited by 8 publications (23 citation statements)
References 14 publications
“…We make several remarks comparing the above results from [AAGZ17a] with our tail bounds. First, the case of µ ≥ 1 and δ > 1 is missing in the above tail bounds, while our Theorem 1 and Proposition 1 cover all cases of µ, δ > 0.…”
Section: Comparison With [AAGZ17a] (mentioning)
confidence: 75%
“…In this section we give several approximations of H(λ, µ) with simpler forms. We also compare our result with existing tail bounds for sums of geometric random variables, mostly from [Jan18] and [AAGZ17a], showing our bound is tighter in several cases and easier to use overall.…”
Section: Approximations of H(λ, µ) and Comparisons (mentioning)
confidence: 84%
“…Dynamic assortment under a MNL choice model has been recently considered (see for instance Rusmevichientong et al (2010), Sauré and Zeevi (2013) and Wang et al (2018)). In particular, UCB and Thompson sampling-based policies are proposed in Agrawal et al (2017a) and Agrawal et al (2017b). These policies achieve optimal regrets of O(√(NT)) in the non-feature based setting.…”
Section: Introduction (mentioning)
confidence: 99%