2015
DOI: 10.1109/tac.2015.2418672
Information Relaxation and Dual Formulation of Controlled Markov Diffusions

Abstract: Information relaxation and duality in Markov decision processes have been studied recently by several researchers with the goal of deriving dual bounds on the value function. In this paper we extend this dual formulation to controlled Markov diffusions: in a similar way, we relax the constraint that the decision should be made based on the current information and impose a penalty to punish access to the information in advance. We establish the weak duality, strong duality and complementary slackness results …

Cited by 14 publications (10 citation statements)
References 44 publications
“…These papers develop methods for bounding the performance of heuristic policies relative to the optimal for non-worst-case stochastic control problems. (See Haugh and Lim 2012 and Ye and Zhou 2015 for related ideas.) In this regard, one contribution of this paper is that it develops systematic performance evaluation methods for worst-case robust control problems, which we apply to performance evaluation of the robust Gittins index.…”
Section: Relevant Literature
confidence: 99%
“…The lower bound V_L(x) is obtained in §5.3 by adapting the idea of information relaxation (Brown et al. 2010, Haugh and Lim 2012, Rogers 2007, Ye and Zhou 2015), which relaxes the requirement that nature's actions depend only on the information available at the time of the decision. This can be viewed as a "strengthening of nature," because it gives nature the ability to make use of future state information.…”
Section: Performance Evaluation
confidence: 99%
“…Here we present a simpler version of their approach so that its connection with our proposed approach is clearer. Ye and Zhou (2015) studies the form of the optimal dual penalty in the setting of controlled Markov diffusions (CMDs) and shows that it is a stochastic integral. This inspires the authors to propose an approximation scheme for the optimal dual penalty of a discrete-time DP with state dynamics (3.14) as follows:…”
Section: American Option Pricing
confidence: 99%
“…Our framework is more universal and powerful, because it reveals the structure of the optimal dual penalty regardless of the underlying probability measure (i.e., not restricted to the Brownian measure in Belomestny et al. (2009) or the Poisson random measure in Zhu et al. (2015)). We will also show that the approximation scheme in Ye and Zhou (2015) can be viewed as a special case of the proposed framework. To summarize, the contributions are as follows:…”
Section: Introduction
confidence: 98%
“…Another constraint is the "information constraint," or non-anticipativity of the control policy: the decision should depend only on the information available up to the time it is made. These relaxations may lead to a simpler dynamic optimization problem. The first constraint, which exists universally in mathematical programs, can be tackled by the commonly known Lagrangian relaxation (see, e.g., [5]), which results in an unconstrained stochastic dynamic program that may be easier to solve; the second constraint can be approached by a recently developed technique, "information relaxation" (see, e.g., [6], [7], [8], [9], [10]), which relaxes the non-anticipativity constraint on the controls but imposes a penalty for such a violation. Since this approach allows decisions to be made based on future outcomes, it involves scenario-based dynamic programs, which are deterministic optimization problems and may be less complicated than the original stochastic dynamic program.…”
confidence: 99%
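The zero-penalty ("perfect information") relaxation described in that last statement can be illustrated on a toy problem. The sketch below is illustrative only and is not from the paper: it assumes a symmetric ±1 random walk X_t with stopping payoff max(X_t, 0) over a finite horizon, in place of the paper's controlled-diffusion setting. The non-anticipative value is computed by backward induction, while a clairvoyant who sees the entire path stops at its running maximum; by weak duality, the clairvoyant's expected payoff upper-bounds the true value.

```python
import random

# Toy optimal stopping problem (illustrative; not the paper's CMD setting):
# X_t is a symmetric +/-1 random walk started at 0, horizon T,
# and stopping at time t pays max(X_t, 0).
T = 10

def dp_value():
    """Non-anticipative value by backward induction on the binomial lattice:
    V_T(x) = max(x, 0);
    V_t(x) = max( max(x, 0), 0.5*V_{t+1}(x+1) + 0.5*V_{t+1}(x-1) )."""
    V = {x: max(x, 0.0) for x in range(-T, T + 1)}
    for t in range(T - 1, -1, -1):
        V = {x: max(max(x, 0.0), 0.5 * (V[x + 1] + V[x - 1]))
             for x in range(-t, t + 1)}
    return V[0]

def dual_upper_bound(n_paths=200_000, seed=0):
    """Information relaxation with zero penalty: the clairvoyant sees the
    whole path and stops at its pathwise maximum, so
    E[max_t max(X_t, 0)] >= V_0 (weak duality)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        x, best = 0, 0.0
        for _ in range(T):
            x += rng.choice((-1, 1))
            best = max(best, x)
        total += best
    return total / n_paths

lo, hi = dp_value(), dual_upper_bound()
assert lo <= hi  # weak duality: clairvoyant bound dominates the true value
```

With a nonzero penalty charging the clairvoyant for using future information (e.g., a martingale increment built from an approximate value function, as in the penalty constructions the cited works develop), the gap between the two quantities can be tightened, and it closes entirely under the optimal penalty by the strong duality result the abstract refers to.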