The MNL-Bandit Problem

Agrawal, Shipra; Avadhanula, Vashist; Goyal, Vineet; Zeevi, Assaf

doi:10.1007/978-3-031-01926-5_9

Cited by 3 publications

(8 citation statements)

References 19 publications

“…Within the management science and operations research community, the problem of dynamic assortment planning has recently received much attention (see, e.g., Rusmevichientong et al, 2010; Farias et al, 2013; Sauré & Zeevi, 2013; Agrawal et al, 2017; Cheung & Simchi-Levi, 2017; Chen & Wang, 2018; Ou et al, 2018; Agrawal et al, 2019; Kallus & Udell, 2020; Chen et al, 2021). All these papers study assortment optimization under the MNL model in a sequential decision framework.…”

Section: Literature Review (mentioning)

confidence: 99%

Stochastic approximation for uncapacitated assortment optimization under the multinomial logit model

Peeters

Boer

2022

Naval Research Logistics

We consider dynamic assortment optimization with incomplete information under the uncapacitated multinomial logit choice model. We propose an anytime stochastic approximation policy and prove that the regret (the cumulative expected revenue loss caused by offering suboptimal assortments) after T time periods is bounded by √T times a constant that is independent of the number of products. In addition, we prove a matching lower bound on the regret for any policy that is valid for arbitrary model parameters, slightly generalizing a recent regret lower bound derived for specific revenue parameters. Numerical illustrations suggest that our policy outperforms alternatives by a significant margin when T and the number of products N are not too small.
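The regret notion in the abstract above can be made concrete with a small sketch. It assumes the standard MNL parameterization, with a preference weight v[i] and a revenue r[i] per product and the no-purchase option normalized to weight 1. The function names and the brute-force search are illustrative assumptions only; this is not the stochastic approximation policy proposed in the paper.

```python
from itertools import combinations


def choice_probs(assortment, v):
    """MNL purchase probabilities for the offered products.

    The no-purchase option is assumed to have weight 1.
    """
    denom = 1.0 + sum(v[i] for i in assortment)
    return {i: v[i] / denom for i in assortment}


def expected_revenue(assortment, v, r):
    """Expected revenue from one customer who is shown `assortment`."""
    probs = choice_probs(assortment, v)
    return sum(r[i] * p for i, p in probs.items())


def best_assortment(v, r):
    """Brute-force search over all non-empty subsets (small N only)."""
    best, best_rev = (), 0.0
    for k in range(1, len(v) + 1):
        for subset in combinations(range(len(v)), k):
            rev = expected_revenue(subset, v, r)
            if rev > best_rev:
                best, best_rev = subset, rev
    return best, best_rev


def regret(offered, v, r):
    """Cumulative expected revenue loss of a sequence of assortments."""
    _, opt_rev = best_assortment(v, r)
    return sum(opt_rev - expected_revenue(s, v, r) for s in offered)


# Hypothetical parameters, chosen only for illustration.
v = [1.0, 2.0, 0.5]
r = [1.0, 0.8, 0.3]
print(best_assortment(v, r))
print(regret([(0,), (0, 1), (0, 1)], v, r))
```

In a learning setting the weights v are unknown, so a policy must choose the offered assortments from observed purchases alone; the sketch only evaluates the regret of a given sequence when v is known.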

“…Bertsimas and Mišić (2019) studied a two-step problem with separate demand estimation and assortment planning, where the first step estimates a generic ranking-based choice model and the second step solves a mixed-integer optimization for assortment planning. Rusmevichientong et al (2010), Saure and Zeevi (2013), Agrawal et al (2017, 2019), Wang et al (2018) and incorporated choice models of MNL into dynamic assortment planning, formulating the problem into an online regret minimization problem. However, the extension of the plain MNL model to nested logit models is highly nontrivial and requires several technical innovations.…”

Section: Related Work (mentioning)

confidence: 99%

“…In existing dynamic assortment literature, the underlying choice model is usually assumed to be an MNL model (Agrawal et al 2017, Rusmevichientong et al 2010, Saure and Zeevi 2013). (The work of (Saure and Zeevi 2013) also considered other forms of choice models, in addition to the MNL model.)…”

Section: Introduction (mentioning)

confidence: 99%

“…In many scenarios, customers' choice behavior (e.g., mean utilities of products) is not given as a priori and cannot be easily estimated due to the insufficiency of historical data (e.g., fast fashion sale or online advertising). To address this challenge, dynamic assortment planning that simultaneously learns choice behavior and makes decisions about the assortment has received a lot of attention (Agrawal et al 2017, Caro and Gallien 2007, Rusmevichientong et al 2010, Saure and Zeevi 2013). More specifically, in a dynamic assortment planning problem, the seller offers an assortment (or a set of assortments for different nests in a nested logit model) to each arriving customer in a finite time horizon T, observes the purchase behavior of the customer, and then updates the learned information about the underlying demand function.…”

Section: Introduction (mentioning)

confidence: 99%

“…To address this challenge, dynamic assortment planning that simultaneously learns choice behavior and makes decisions about the assortment has received a lot of attention (Agrawal et al. 2017, 2019, Caro and Gallien 2007, Chen and Wang 2018, Rusmevichientong et al. 2010, Saure and Zeevi 2013, Wang et al.…”

Section: Introduction (mentioning)

confidence: 99%

Dynamic Assortment Planning Under Nested Logit Models

Chen

Shi

Wang

et al. 2021

Production and Operations Management

We study a stylized dynamic assortment planning problem during a selling season of finite length T. At each time period, the seller offers an arriving customer an assortment of substitutable products and the customer makes the purchase among offered products according to a discrete choice model. The goal of the seller is to maximize the expected revenue, or equivalently, to minimize the worst‐case expected regret. One key challenge is that utilities of products are unknown to the seller and need to be learned. Although the dynamic assortment planning problem has received increasing attention in revenue management, most existing work is based on the multinomial logit choice models (MNL). In this paper, we study the problem of dynamic assortment planning under a more general choice model, the nested logit model, which models hierarchical choice behavior and is “the most widely used member of the GEV (generalized extreme value) family” (Train 2009). By leveraging the revenue‐ordered structure of the optimal assortment within each nest, we develop a novel upper confidence bound (UCB) policy with an aggregated estimation scheme. Our policy simultaneously learns customers’ choice behavior and makes dynamic decisions on assortments based on the current knowledge. It achieves the accumulated regret at the order of Õ(√(MNT)), where M is the number of nests and N is the number of products in each nest. We further provide a lower bound result of Ω(√(MT)), which shows the near optimality of the upper bound when T is much larger than M and N. When the number of items per nest N is large, we further provide a discretization heuristic for better performance of our algorithm. Numerical results are presented to demonstrate the empirical performance of our proposed algorithms.

Multinomial Thompson sampling for rating scales and prior considerations for calibrating uncertainty

Deliu

2023

Stat Methods Appl

Bandit algorithms such as Thompson sampling (TS) have been put forth for decades as useful tools for conducting adaptively-randomised experiments. By skewing the allocation toward superior arms, they can substantially improve particular outcomes of interest for both participants and investigators. For example, they may use participants’ ratings for continuously optimising their experience with a program. However, most of the bandit and TS variants are based on either binary or continuous outcome models, leading to suboptimal performances in rating scale data. Guided by behavioural experiments we conducted online, we address this problem by introducing Multinomial-TS for rating scales. After assessing its improved empirical performance in unique optimal arm scenarios, we explore potential considerations (including prior’s role) for calibrating uncertainty and balancing arm allocation in scenarios with no unique optimal arms.
