Improving multi-armed bandit algorithms in online pricing settings (2018)
DOI: 10.1016/j.ijar.2018.04.006

Cited by 17 publications (16 citation statements); References 16 publications.
“…Contrarily, in this paper we consider the renewal price adjustment problem as a sequential decision process. This is not the first time that pricing problems are modeled as sequential decision making (Cesa-Bianchi et al., 2006; Blum & Hartline, 2005; Trovò et al., 2018). For instance, Cesa-Bianchi et al. (2006) address a problem similar to the one considered in this paper.…”
Section: Related Work
confidence: 90%
“…Cohen et al (2020) developed an online contextual bandit approach for pricing online fashion products with each product defined by a set of features. Trovò et al (2018) applied multi-armed bandit algorithms to online pricing of non-perishable goods in both stationary and non-stationary environments. Several other papers have extended dynamic pricing to the full reinforcement learning problem where an agent must consider the long-term consequences of its actions.…”
Section: Related Work
confidence: 99%
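The statement above describes the paper's core setting: each candidate price is treated as a bandit arm, and the seller learns which price maximizes expected revenue from binary buy/no-buy feedback. As a hedged illustration only (not the paper's actual algorithm), a minimal Thompson-sampling pricing loop over a hypothetical discrete price grid might look like this; the prices, purchase probabilities, and simulated buyer are all assumptions for the sketch:

```python
import random

def thompson_pricing(prices, buy_prob, rounds=10_000, seed=0):
    """Thompson sampling over a discrete set of candidate prices.

    Each price is a bandit arm; a sale at price p yields revenue p.
    A Beta posterior per arm tracks the unknown purchase probability.
    `buy_prob` simulates the (unknown) buyer response per price.
    """
    rng = random.Random(seed)
    successes = [1] * len(prices)  # Beta(1, 1) priors
    failures = [1] * len(prices)
    revenue = 0.0
    for _ in range(rounds):
        # Sample a purchase-probability estimate per arm and post
        # the price with the highest sampled expected revenue.
        sampled = [p * rng.betavariate(s, f)
                   for p, s, f in zip(prices, successes, failures)]
        arm = max(range(len(prices)), key=sampled.__getitem__)
        sold = rng.random() < buy_prob[arm]  # simulated buyer response
        if sold:
            successes[arm] += 1
            revenue += prices[arm]
        else:
            failures[arm] += 1
    return revenue, successes, failures
```

With expected revenues of 4.5, 6.0, and 2.0 for prices 5, 10, and 20 (given purchase probabilities 0.9, 0.6, 0.1), the loop concentrates its posts on the middle price.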
“…The authors propose algorithms that minimize the per-round pseudo-regret over an infinite time horizon. We also mention the work by Trovò, Paladino, Restelli, and Gatti (2018), who provide bandit algorithms for dynamic pricing in non-stationary settings. Finally, the problem of non-stationarity with bounded per-round variation is tackled using contextual bandit techniques by Slivkins (2011), who designs the Contextual Zooming algorithm, and by Luo, Wei, Agarwal, and Langford (2018), who use a variant of the classic EXP4 algorithm.…”
Section: Related Work
confidence: 99%
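A standard way to handle the non-stationary settings mentioned above is to discard stale feedback, e.g. via Sliding-Window UCB (Garivier & Moulines), which estimates each arm's mean from only the most recent pulls. This is a generic sketch of that idea, not the specific algorithms of the cited papers; the reward function and window size are assumptions:

```python
import math
from collections import deque

def sw_ucb(rewards_fn, n_arms, horizon, window=500, c=1.0):
    """Sliding-Window UCB: statistics use only the last `window`
    pulls, so the policy can track rewards that change over time."""
    history = deque()  # (arm, reward) pairs inside the window
    total = 0.0
    for t in range(horizon):
        counts = [0] * n_arms
        sums = [0.0] * n_arms
        for arm, r in history:
            counts[arm] += 1
            sums[arm] += r
        def ucb(a):
            if counts[a] == 0:
                return float("inf")  # force a pull of unseen arms
            bonus = c * math.sqrt(math.log(min(t, window) + 1) / counts[a])
            return sums[a] / counts[a] + bonus
        arm = max(range(n_arms), key=ucb)
        r = rewards_fn(t, arm)  # environment may be non-stationary
        total += r
        history.append((arm, r))
        if len(history) > window:
            history.popleft()  # old observations fall out of the window
    return total
```

Because old pulls fall out of the window, the optimistic bonus of a formerly bad arm grows again after a change point, so the policy re-explores and recovers when the best price shifts, unlike a plain UCB whose estimates average over the whole history.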