Online Regularization towards Always-Valid High-Dimensional Dynamic Pricing

Wang, Chi-Hua; Wang, Zhanyu; Sun, Will Wei; Cheng, Guang

doi:10.48550/arxiv.2007.02470

Cited by 2 publications

(3 citation statements)

References 6 publications

“…A common and natural choice is to model the market value of the product at time $t$ as a linear function of its features $x_t$ plus some market noise $z_t$, i.e. $v_t = \theta^\top x_t + z_t$, where $\theta$ is some unknown parameter (Qiang and Bayati, 2016; Javanmard, 2017; Miao et al., 2019; Javanmard and Nazerzadeh, 2019; Ban and Keskin, 2020; Wang et al., 2020; Tang et al., 2020; Golrezaei et al., 2020). Under this setting, for 'truthful' buyers whose decision is based on comparing $v_t$ and offered price $p_t$, the demand curve can be expressed as a generalized linear model given feature covariates $x_t$, where the link function is closely related to the distribution of the market noise $z_t$ (see (2.3) for a detailed reasoning).…”

Section: Dynamic Pricing (mentioning)

confidence: 99%
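The link between the noise distribution and the demand curve in the statement above can be made concrete. A truthful buyer purchases when $v_t \ge p_t$, so the sale probability is $P(z_t \ge p_t - \theta^\top x_t) = 1 - F(p_t - \theta^\top x_t)$, where $F$ is the c.d.f. of $z_t$. The sketch below is only an illustration under stated assumptions: standard normal noise, and made-up values for `theta`, `x`, and the price. None of these come from the cited papers.

```python
import math
import random

def normal_cdf(z):
    """Standard normal c.d.f., an assumed choice for the noise distribution F."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def sale_probability(theta, x, price, noise_cdf=normal_cdf):
    """P(v_t >= p_t) = 1 - F(p_t - theta . x_t) when v_t = theta . x_t + z_t."""
    mean_value = sum(a * b for a, b in zip(theta, x))
    return 1.0 - noise_cdf(price - mean_value)

def simulate_sale(theta, x, price, rng):
    """Draw one binary response: 1 if the buyer's value meets the price."""
    value = sum(a * b for a, b in zip(theta, x)) + rng.gauss(0.0, 1.0)
    return 1 if value >= price else 0

# Hypothetical parameter and features, chosen so that theta . x = 3.0.
theta = [1.0, 0.5]
x = [2.0, 2.0]

rng = random.Random(0)
n = 20000
empirical = sum(simulate_sale(theta, x, 3.5, rng) for _ in range(n)) / n

# The closed-form probability and the simulated sale frequency should be close.
print(sale_probability(theta, x, 3.5), empirical)
```

Only the binary outcome of `simulate_sale` would be visible to a seller, which is why the demand curve takes the form of a generalized linear model whose link is determined by $F$.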

“…They prove that the greedy iterative least squares (GILS) algorithm achieves a regret upper bound of $O_d(\log T)$, where $O_d$ is the order that hides logarithmic terms and the dimensionality of feature $d$, and provide a matching lower bound under their setting. Miao et al. (2019) and Ban and Keskin (2020) consider a generalized linear model with known link, while Javanmard and Nazerzadeh (2019) and Wang et al. (2020) study the same problem with high dimensional sparse parameters. The algorithms are usually a combination of statistical estimation procedures and online learning techniques.…”

Section: Dynamic Pricing (mentioning)

confidence: 99%

Policy Optimization Using Semi-parametric Models for Dynamic Pricing

Fan, Guo, Yu. 2021. Preprint.

In this paper, we study the contextual dynamic pricing problem where the market value of a product is linear in its observed features plus some market noise. Products are sold one at a time, and only a binary response indicating success or failure of a sale is observed. Our model setting is similar to Javanmard and Nazerzadeh (2019) except that we expand the demand curve to a semiparametric model and need to learn dynamically both parametric and nonparametric components. We propose a dynamic statistical learning and decision making policy that combines semiparametric estimation from a generalized linear model with an unknown link and online decision making to minimize regret (maximize revenue). Under mild conditions, we show that for a market noise c.d.f. $F(\cdot)$ with $m$-th order derivative ($m \ge 2$), our policy achieves a regret upper bound of $O_d(T^{\frac{2m+1}{4m-1}})$, where $T$ is time horizon and $O_d$ is the order that hides logarithmic terms and the dimensionality of feature $d$. The upper bound is further reduced to $O_d(\sqrt{T})$ if $F$ is super smooth whose Fourier transform decays exponentially. In terms of dependence on the horizon $T$, these upper bounds are close to $\Omega(\sqrt{T})$, the lower bound where $F$ belongs to a parametric class. We further generalize these results to the case with dynamically dependent product features under the strong mixing condition.
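To see how the smoothness order $m$ drives the rate in the abstract above, the exponent $(2m+1)/(4m-1)$ can be tabulated directly. This is simple arithmetic on the stated bound, not code from the paper; the function name `regret_exponent` is a hypothetical helper.

```python
from fractions import Fraction

def regret_exponent(m):
    """Exponent of T in the stated bound O_d(T^((2m+1)/(4m-1))), for m >= 2."""
    if m < 2:
        raise ValueError("the stated bound assumes m >= 2")
    return Fraction(2 * m + 1, 4 * m - 1)

# The exponent equals 1/2 + 3 / (2 * (4m - 1)), so it decreases toward 1/2
# as the noise c.d.f. becomes smoother.
for m in (2, 3, 5, 10, 100):
    e = regret_exponent(m)
    print(m, e, round(float(e), 4))
```

For $m = 2$ the exponent is $5/7$, and it stays strictly above $1/2$ for every finite $m$, which is consistent with the $\sqrt{T}$ rate being reached only in the super smooth case.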

“…Bandit algorithms (Bubeck and Cesa-Bianchi, 2012; Lattimore and Szepesvári, 2020) and reinforcement learning (Sutton and Barto, 2018) are modern strategies to solve sequential decision making problems. They have received recent attentions in statistics community for business and scientific applications including dynamic pricing (Wang et al., 2020; Chen, Simchi-Levi and Wang, 2021; Chen, Miao and Wang, 2021; Wang et al., 2021), online decision making (Shi et al., 2020; Chen, Lu and Song, 2021; Chen et al., 2022), dynamic treatment regimes (Qi and Liu, 2018; Luckett et al., 2019; Qi et al., 2020; Qi, Miao and Zhang, 2021), and online causal effect in two-sided market (Shi et al., 2022b).…”

mentioning

confidence: 99%

Rate-Optimal Contextual Online Matching Bandit

Li, Wang, Cheng et al. 2022. Preprint. Self Cite.

Two-sided online matching platforms have been employed in various markets. However, agents' preferences in the present market are usually implicit and unknown, and thus must be learned from data. With the growing availability of side information involved in the decision process, modern online matching methodology demands the capability to track preference dynamics for agents based on the contextual information. This motivates us to consider a novel Contextual Online Matching Bandit prOblem (COMBO), which allows dynamic preferences in matching decisions. Existing works focus on multi-armed bandit with static preference, but this is insufficient: the two-sided preference changes as long as one-side's contextual information updates, resulting in non-static matching. In this paper, we propose a Centralized Contextual-Explore Then Commit (CC-ETC) algorithm to adapt to the COMBO. CC-ETC solves online matching with dynamic preference. In theory, we show that CC-ETC achieves a sublinear regret upper bound $O(\log T)$ and is a rate-optimal algorithm by proving a matching lower bound. In the experiments, we demonstrate that CC-ETC is robust to variant preference schemes, dimensions of contexts, reward noise levels, and context variation levels.
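For readers unfamiliar with the explore-then-commit idea that CC-ETC builds on, the following is a minimal sketch of plain explore-then-commit on a context-free multi-armed bandit with Bernoulli rewards. It is not the CC-ETC algorithm: there is no matching, no context, and no centralized platform here, and every name (`explore_then_commit`, `arm_means`, `explore_rounds`) is an assumption made for illustration.

```python
import random

def explore_then_commit(arm_means, horizon, explore_rounds, seed=0):
    """Pull each arm `explore_rounds` times in round-robin order, then
    commit to the arm with the highest observed total reward."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    totals = [0.0] * n_arms
    pulls = []

    # Exploration phase: every arm receives the same number of pulls,
    # so comparing totals is equivalent to comparing sample means.
    n_explore = min(horizon, explore_rounds * n_arms)
    for t in range(n_explore):
        arm = t % n_arms
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        totals[arm] += reward
        pulls.append(arm)

    # Commit phase: play the empirically best arm for the remaining rounds.
    best = max(range(n_arms), key=lambda a: totals[a])
    pulls.extend([best] * (horizon - n_explore))
    return best, pulls

best, pulls = explore_then_commit([0.2, 0.8], horizon=1000, explore_rounds=50)
print(best, pulls.count(best))
```

The length of the exploration phase is the main design choice: too short and the committed arm may be wrong, too long and the exploration itself accumulates regret.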
