2019
DOI: 10.48550/arxiv.1904.10554
Preprint
Deep Q-Learning for Nash Equilibria: Nash-DQN

Cited by 8 publications (14 citation statements) | References 0 publications
“…Pioneered by [12,35,36,45,46,47], various reinforcement learning algorithms have been implemented and perform extremely successfully in portfolio optimization problems with transaction costs. Reinforcement learning can even solve for Nash equilibria; see [16] for details. The key idea is to directly parametrize the optimal trading rate and optimize the discretized version of preference (2.6).…”
Section: Deep Learning-based Numerical Algorithms | mentioning
confidence: 99%
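The direct-parametrization idea in the statement above can be made concrete with a minimal sketch: a small network maps the state to a trading rate, and a discretized running objective is maximized by stochastic gradient ascent. The toy price dynamics, the cost and risk coefficients, and the terminal inventory penalty below are illustrative assumptions, not the construction used in the cited papers.

```python
# Minimal sketch (assumed setup): a neural network maps (time, inventory,
# price deviation) to a trading rate; a discretized reward with quadratic
# transaction costs and an inventory penalty is maximized by gradient ascent.
import torch
import torch.nn as nn

torch.manual_seed(0)
T, dt, batch = 50, 0.02, 256              # illustrative horizon and step size
temp_impact, risk_aversion = 0.01, 0.1    # hypothetical cost/risk parameters

policy = nn.Sequential(nn.Linear(3, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for step in range(200):
    q = torch.ones(batch, 1)              # initial inventory
    s = torch.zeros(batch, 1)             # price deviation from initial level
    reward = torch.zeros(batch, 1)
    for k in range(T):
        t = torch.full((batch, 1), k * dt)
        nu = policy(torch.cat([t, q, s], dim=1))        # trading rate nu(t, q, s)
        q = q + nu * dt                                  # inventory update
        s = s + 0.3 * dt**0.5 * torch.randn(batch, 1)    # toy price noise
        # running reward: trading gain minus impact cost and inventory penalty
        reward = reward + (nu * s - temp_impact * nu**2 - risk_aversion * q**2) * dt
    loss = -(reward - 0.5 * q**2).mean()                 # penalize terminal inventory
    opt.zero_grad(); loss.backward(); opt.step()
```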
“…In the meantime, with the development of modern model-free techniques, reinforcement learning algorithms are also widely used in single-agent optimization problems. Indeed, as shown in the groundbreaking papers [12,14,15,16,35,36,45,46,47], we treat the utility functions as targets and directly parametrize and learn the optimal trading policy. Moreover, reinforcement learning frameworks are introduced and analyzed rigorously in [53,54].…”
Section: Introduction | mentioning
confidence: 99%
“…LeCun et al., 2015; Silver et al., 2016; Goodfellow et al., 2016), especially in financial mathematics (see e.g. Al-Aradi et al., 2018; Hu, 2019; Casgrain et al., 2019; Horvath et al., 2021; Campbell et al., 2021; Carmona and Laurière, 2021). The use of compositions of simple functions (usually referred to as propagation and activation functions) through several layers does a good job in modeling complicated functions.…”
Section: Actor-critic Algorithm | mentioning
confidence: 99%
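As a purely illustrative instance of the layered composition mentioned above, the sketch below alternates affine maps with a tanh nonlinearity; the layer sizes and random weights are arbitrary assumptions.

```python
# Illustrative only: a two-hidden-layer network as a composition of affine
# ("propagation") maps and tanh activations, mapping a scalar to a scalar.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((16, 1)), np.zeros(16)
W2, b2 = rng.standard_normal((16, 16)), np.zeros(16)
W3, b3 = rng.standard_normal((1, 16)), np.zeros(1)

def f(x):
    h1 = np.tanh(W1 @ x + b1)   # layer 1: affine map, then activation
    h2 = np.tanh(W2 @ h1 + b2)  # layer 2: same pattern, composed
    return W3 @ h2 + b3         # linear read-out layer

print(f(np.array([0.5])))       # evaluate the composed function at x = 0.5
```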
“…While it has worked well in practice, IQL struggles on more difficult multi-agent coordination and control tasks due to the instability caused by non-stationarity. Strategies such as Nash-Q learning [2], [11], [13], [36], [37], Minimax [11], [19], [36], [37], and Friend-or-Foe Q-Learning [11], [18], [36], [37] have been proposed to solve stochastic games by finding a Nash equilibrium policy. While these methods work in stochastic game settings, the complexity of the task at hand becomes a significant bottleneck, as non-stationarity causes unstable learning.…”
Section: Introduction | mentioning
confidence: 99%
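To make the equilibrium-based value update behind these methods concrete, here is a minimal sketch in the spirit of minimax-Q for a two-player zero-sum stochastic game: the greedy max over actions in standard Q-learning is replaced by the maximin value of the stage game at the next state, computed by a small linear program. The state space, rewards, transitions, and learning rate are hypothetical placeholders; general-sum Nash-Q would replace the linear program with a stage-game Nash solver.

```python
# Minimal minimax-Q sketch (zero-sum, two actions per player): the target uses
# the maximin value of the 2x2 stage game Q[s_next], obtained by an LP.
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(M):
    """Maximin value of the zero-sum matrix game M (row player maximizes)."""
    n_rows, n_cols = M.shape
    # variables x = (v, p_1, ..., p_n_rows); maximize v s.t. (M^T p)_j >= v, sum(p) = 1
    c = np.zeros(n_rows + 1); c[0] = -1.0                 # minimize -v
    A_ub = np.hstack([np.ones((n_cols, 1)), -M.T])        # v - (M^T p)_j <= 0
    b_ub = np.zeros(n_cols)
    A_eq = np.hstack([[[0.0]], np.ones((1, n_rows))])     # probabilities sum to 1
    b_eq = np.array([1.0])
    bounds = [(None, None)] + [(0.0, 1.0)] * n_rows
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[0]

n_states, gamma, alpha = 2, 0.9, 0.1
Q = np.zeros((n_states, 2, 2))                            # Q[s, a1, a2] for player 1
rng = np.random.default_rng(0)
for _ in range(500):
    s = rng.integers(n_states)
    a1, a2 = rng.integers(2), rng.integers(2)
    r = rng.standard_normal()                             # placeholder reward
    s_next = rng.integers(n_states)                       # placeholder transition
    target = r + gamma * matrix_game_value(Q[s_next])     # equilibrium value of next stage game
    Q[s, a1, a2] += alpha * (target - Q[s, a1, a2])
```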