2012 IEEE 51st IEEE Conference on Decision and Control (CDC) 2012
DOI: 10.1109/cdc.2012.6426037

Parameterized penalties in the dual representation of Markov decision processes

Abstract: Duality in Markov decision processes (MDPs) has recently been studied by several researchers with the goal of deriving dual bounds on the value function. In this paper we propose using parameterized penalty functions in the dual representation of MDPs, which allows us to integrate different types of penalty functions and guarantees a tighter dual bound as more penalties are used. To complement and diversify the existing linear penalties developed in the literature, we also introduce a new class of nonl…

Cited by 10 publications (6 citation statements)
References 16 publications

“…Furthermore, from the information relaxation point of view (see Brown et al (2010)), we can gain an intuitive understanding towards the structure of the optimal penalty function. It inspires us to construct good penalty functions over the space of "feasible penalty functions" for general dynamic programming problems, which is still an open area to explore (see Ye and Zhou (2012) for some initial exploration).…”
Section: Discussion (mentioning)
confidence: 99%
“…In particular, the dual martingales constructed by Haugh and Kogan (2004), Anderson and Broadie (2004) can be interpreted as perfect information relaxation, which means the option holder has access to all the future prices of the underlying assets. Ye and Zhou (2012) consider a parameterized path-wise optimization technique in constructing the penalties for general dynamic programming problems. Ye and Zhou (2013a) also develop the duality theory for general dynamic programming problems under a continuous-time setting.…”
Section: Introduction (mentioning)
confidence: 99%
“…The naive approach is to replace the optimal value functions with approximate ones, and use nested simulation to estimate the conditional expectations; however, this approach often requires substantial computational effort and might cause the resulted approximation to lose the dual feasibility. Various methods have been proposed to improve the accuracy and efficiency of the approximation, including the non-nested simulation approach by Belomestny et al (2009) and Zhu et al (2015) in American-style option pricing, and the pathwise optimization techniques by Desai et al (2011) and Ye and Zhou (2012). The advantage of the pathwise optimization method is that it explores a subspace of feasible dual penalties by considering the best linear combination of the existing dual penalties.…”
Section: Introduction (mentioning)
confidence: 99%
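
The statement above describes pathwise optimization as searching for the best linear combination of existing dual penalties. A minimal sketch of that idea follows, on a toy two-step optimal stopping problem rather than the general MDP setting of the paper. Everything in it is an assumption made for illustration: the coin-flip random walk, the reward max(1 - S_t, 0), the two martingale penalties S_t and S_t^2 - t, and the coarse grid search, which stands in for a proper convex solver. It is not the authors' algorithm.

```python
from itertools import product

HORIZON = 2
# All equally likely paths of +/-1 coin flips (hypothetical toy model).
PATHS = list(product((-1, 1), repeat=HORIZON))


def walk(path):
    """Partial sums S_0, ..., S_T of a path of +/-1 steps."""
    s = [0]
    for step in path:
        s.append(s[-1] + step)
    return s


def reward(s):
    """Hypothetical stopping reward g(S_t) = max(1 - S_t, 0)."""
    return max(1 - s, 0)


def penalties(t, s):
    """Two martingales started at zero: S_t and S_t^2 - t."""
    return (s, s * s - t)


def dual_bound(theta):
    """E[max_t (g_t - theta . M_t)], averaged exactly over all paths."""
    total = 0.0
    for path in PATHS:
        s = walk(path)
        total += max(
            reward(s[t]) - sum(th * m for th, m in zip(theta, penalties(t, s[t])))
            for t in range(HORIZON + 1)
        )
    return total / len(PATHS)


def primal_value(t=0, s=0):
    """Exact optimal stopping value by backward induction."""
    if t == HORIZON:
        return reward(s)
    cont = 0.5 * (primal_value(t + 1, s - 1) + primal_value(t + 1, s + 1))
    return max(reward(s), cont)


# Coarse grid of weights; it contains 0.0, so each search nests the previous one.
GRID = [k / 20 for k in range(-40, 41)]

bound_none = dual_bound((0.0, 0.0))                             # no penalty
bound_one = min(dual_bound((a, 0.0)) for a in GRID)             # one penalty
bound_two = min(dual_bound((a, b)) for a in GRID for b in GRID) # two penalties

print(primal_value(), bound_two, bound_one, bound_none)
```

Because every weight vector gives a valid upper bound, and setting a weight to zero recovers the smaller penalty family, the optimized bound cannot get worse when a penalty is added. This mirrors, in a toy setting, the abstract's claim of a tighter dual bound with more penalties used.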
“…In addition, the optimal penalty is not unique: for general problems we have the value function-based penalty developed by [7] and [8]; for problems with convex structure there is an alternative optimal penalty, that is, the gradient-based penalty, as pointed out by [11]. On the other hand, in order to derive tight dual bounds, various approximation schemes based on different optimal penalties have been proposed including [8], [11], [12], [13]. We notice that this dual approach has found increasing applications in different fields, such as [14], [11], [15], [16], [17].…”
Section: Introduction (mentioning)
confidence: 99%