2018
DOI: 10.2139/ssrn.3102707
The QLBS Q-Learner Goes NuQLear: Fitted Q Iteration, Inverse RL, and Option Portfolios

Abstract: The QLBS model is a discrete-time option hedging and pricing model that is based on Dynamic Programming (DP) and Reinforcement Learning (RL). It combines the famous Q-Learning method for RL with the Black-Scholes (-Merton) model's idea of reducing the problem of option pricing and hedging to the problem of optimal rebalancing of a dynamic replicating portfolio for the option, which is made of a stock and cash. Here we expand on several NuQLear (Numerical Q-Learning) topics with the QLBS model. First, we invest…
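The Black-Scholes(-Merton) idea the abstract refers to can be restated as a sketch: the option is replicated by a portfolio holding a_t shares of stock plus cash, rebalanced each period. A minimal one-step illustration (the function name, rate convention, and numbers below are our own assumptions for exposition, not the paper's code):

```python
import numpy as np

# Illustrative sketch (not the paper's exact formulation): one step of
# rebalancing a replicating portfolio made of a stock position and cash.
# a_t is the hedge (number of shares held), r is the risk-free rate per step.

def rebalance_step(portfolio_value, stock_price_t, stock_price_t1, a_t,
                   r=0.0, dt=1.0):
    """Roll the replicating portfolio forward one period.

    Cash is the portfolio value minus the stock position; cash accrues
    at rate r while the stock position is marked to the new price.
    """
    cash = portfolio_value - a_t * stock_price_t
    return a_t * stock_price_t1 + cash * np.exp(r * dt)

# Example: hedge 0.5 shares, stock moves 100 -> 105, zero rates
v = rebalance_step(100.0, 100.0, 105.0, a_t=0.5)  # 0.5*105 + 50 = 102.5
```

In the QLBS framing, choosing the sequence of hedges a_t that makes this portfolio track the option payoff at minimal risk is exactly the control problem the Q-learner solves.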

Cited by 5 publications (6 citation statements) | References 13 publications
“…Quadratic risk-adjusted objective functions were considered in an apparently different problem of optimal option pricing and hedging using a model-free, data-driven approach in work by one of the authors [26,27]. The approach used in this work assumes off-line, batch-mode learning, which enables the use of data-efficient batch RL methods such as Fitted Q Iteration [15,37].…”
Section: Reinforcement Learning
confidence: 99%
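The quote names Fitted Q Iteration as the data-efficient batch RL method. A toy sketch of FQI on a generic batch of transitions, assuming a small discrete action grid and a linear-least-squares Q-model per action (the regressor, dynamics, and reward below are illustrative assumptions, not those of [15,37] or the QLBS paper):

```python
import numpy as np

# Toy Fitted Q Iteration on a batch of transitions (s, a, r, s').
# Assumptions for illustration: discrete actions, quadratic features,
# one least-squares Q-model per action.

rng = np.random.default_rng(0)
actions = np.array([-1.0, 0.0, 1.0])           # discrete action grid
n = 200
s = rng.uniform(-1, 1, n)                      # scalar states
a_idx = rng.integers(0, len(actions), n)
s_next = s + 0.1 * actions[a_idx] + 0.01 * rng.standard_normal(n)
r = -(s_next ** 2)                             # reward: stay near zero

def features(x):
    return np.column_stack([np.ones_like(x), x, x ** 2])

gamma = 0.9
W = np.zeros((len(actions), 3))                # one linear model per action

for _ in range(30):                            # FQI sweeps
    # Bellman targets: r + gamma * max_a' Q(s', a')
    q_next = np.stack([features(s_next) @ W[k] for k in range(len(actions))])
    y = r + gamma * q_next.max(axis=0)
    for k in range(len(actions)):              # refit each action's model
        mask = a_idx == k
        W[k], *_ = np.linalg.lstsq(features(s[mask]), y[mask], rcond=None)

# Greedy action at s = 0.5 should push the state toward zero
q_vals = features(np.array([0.5])) @ W.T
best = actions[int(np.argmax(q_vals))]
```

The key batch-RL property the quote highlights: each sweep refits Q from the same fixed dataset, so no new environment interaction is needed between iterations.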
“…Note that the multi-period portfolio optimization problem (24) assumes that an optimal policy that determines actions $a_t$ is a deterministic policy that can also be described as a delta-like probability distribution

$$\pi(a_t \mid y_t) = \delta\!\left(a_t - a_t^{\star}(y_t)\right) \qquad (27)$$

where the optimal deterministic action $a_t^{\star}(y_t)$ is obtained by maximization of the objective (24) with respect to controls $a_t$.…”
Section: Stochastic Policy
confidence: 99%
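The delta-distribution form in the quote says that a deterministic policy is the degenerate limit of a stochastic one. On a discrete action grid this is simply a one-hot distribution concentrated on the greedy action argmax_a Q(y, a); a toy sketch with a made-up Q-function:

```python
import numpy as np

# Sketch of the quoted point: pi(a|y) = delta(a - a*(y)) becomes a
# one-hot distribution on a discrete action grid, concentrated on the
# greedy action. Q below is a toy stand-in, not the paper's Q-function.

actions = np.linspace(-1.0, 1.0, 5)

def Q(y, a):
    return -(a - 0.5 * y) ** 2                 # toy Q; maximized at a = y/2

def deterministic_policy(y):
    """One-hot weights: all probability mass on argmax_a Q(y, a)."""
    probs = np.zeros_like(actions)
    probs[np.argmax(Q(y, actions))] = 1.0
    return probs

p = deterministic_policy(1.0)                  # greedy action here is 0.5
```

A genuinely stochastic policy would instead spread mass over several actions (e.g. a softmax of Q-values); the delta policy is the zero-temperature limit of that family.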
“…They have been used in Liu et al [2019b], Liu et al [2019a], Ackerer et al [2019] and Bayer et al [2019] for calibration of stochastic volatility and rough stochastic volatility models. Another strand of the literature concerns applications of deep reinforcement learning, specifically data-driven pricing and hedging of portfolios, which notably includes the work of Buehler et al [2019] and Halperin [2019]. Applications of neural networks to model-based pricing of early-exercise options include the policy iteration approach of Becker et al [2019] and value function iteration in Haugh and Kogan [2004], Kohler et al [2010] and Lapeyre and Lelong [2019].…”
Section: Introduction
confidence: 99%