2018
DOI: 10.1287/opre.2017.1713

On Incomplete Learning and Certainty-Equivalence Control

Abstract: We consider a dynamic learning problem where a decision maker sequentially selects a control and observes a response variable that depends on the chosen control and an unknown sensitivity parameter. After every observation, the decision maker updates their estimate of the unknown parameter and uses a certainty-equivalence decision rule to determine subsequent controls based on this estimate. We show that under this certainty-equivalence learning policy the parameter estimates converge with positive probability t…
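
To make the estimate-then-act loop in the abstract concrete, here is a minimal simulation sketch. Everything in it is illustrative rather than taken from the paper: it assumes a hypothetical linear demand model D_t = alpha + beta * p_t + noise, a myopic revenue-maximising certainty-equivalence rule, and arbitrary parameter values and initial prices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear response model (illustrative, not the paper's notation):
# demand D_t = alpha + beta * p_t + noise, with (alpha, beta) unknown to the seller.
alpha, beta, sigma = 10.0, -2.0, 0.5

def ce_control(a_hat, b_hat):
    # Certainty-equivalence rule: treat the current estimate as the truth.
    # For expected revenue p * (a + b * p) with b < 0, the myopic optimal
    # price is -a / (2 * b).
    return -a_hat / (2.0 * b_hat)

prices = [1.0, 4.0]  # two distinct initial prices so least squares is defined
demands = [alpha + beta * p + sigma * rng.standard_normal() for p in prices]

for t in range(1000):
    # Update step: ordinary least squares on all observations so far.
    X = np.column_stack([np.ones(len(prices)), prices])
    a_hat, b_hat = np.linalg.lstsq(X, np.asarray(demands), rcond=None)[0]
    # Control step: act greedily on the estimate, with no forced exploration.
    p = ce_control(a_hat, b_hat)
    prices.append(p)
    demands.append(alpha + beta * p + sigma * rng.standard_normal())

print(f"final price {prices[-1]:.3f} vs true optimum {-alpha / (2 * beta):.3f}")
```

Because the loop never forces exploration, sample paths of such greedy schemes can settle at controls where new observations no longer separate the two unknown parameters, so the estimates stop improving; characterising when this incomplete learning occurs is the subject of the paper.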

Cited by 40 publications (7 citation statements)
References: 39 publications (47 reference statements)

“…This means that it suffices to analyse the second term, which is the (expected) regret analysed in [5,16] under a self-exploration property of greedy policies (see Remark 3.1). However, the following examples show that these greedy policies in general do not guarantee exploration, and consequently do not guarantee convergence to the optimal solution, a failure often referred to as incomplete learning in the literature (see, e.g., [21]).…”
Section: Phased-based Learning Algorithm and Our Contributions
Mentioning confidence: 99%

“…They prove that if optimal controls of the true model automatically explore the parameter space, then a greedy least-squares algorithm with suitable initialisation admits a non-asymptotic logarithmic expected regret for LQ models [5], and an O(√N) expected regret for LC models [16]. Unfortunately, as shown in Example 1.1 and in [21], such a self-exploration property may not hold in general, even for LQ models. Furthermore, the learning algorithm studied here works for an arbitrary initialisation.…”
Section: Related Work
Mentioning confidence: 99%
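
For reference, the two rates contrasted in this statement are bounds on cumulative expected regret over N periods. A standard definition, in generic notation not taken from [5], [16], or the paper under review, is:

```latex
R(N) \;=\; \mathbb{E}\!\left[\sum_{t=1}^{N} \bigl( r(u^{*}) - r(u_{t}) \bigr)\right]
```

where r is the one-period reward, u* is the optimal control under the true parameter, and u_t is the control the policy selects in period t. The quoted results give R(N) = O(log N) for LQ models and R(N) = O(√N) for LC models, in both cases only under the self-exploration property.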
“…More recently, some have studied dynamic pricing with changing demand covariates (Cohen et al. 2016, Qiang and Bayati 2016, Javanmard and Nazerzadeh 2016, Ban and Keskin 2017) or a changing demand function (den Boer 2015, Keskin and Zeevi 2015). These changes in the demand environment can help the greedy algorithm explore naturally and achieve asymptotically optimal performance.…”
Section: Related Literature
Mentioning confidence: 99%

“…In our numerical analysis (see Section 7) we also include a lookahead-type policy. Some recent work has highlighted the effectiveness of simple policies in the context of dynamic learning, such as greedy algorithms (e.g., Bastani et al., 2020) and certainty equivalence (e.g., Keskin and Zeevi, 2018). The greedy algorithm behaves well when exploration is expensive, while in our case it is free.…”
Section: Related Literature
Mentioning confidence: 87%