2012
DOI: 10.1007/978-3-642-29946-9_31

Compound Reinforcement Learning: Theory and an Application to Finance

Abstract: This paper describes compound reinforcement learning (RL), an extension of RL based on the compound return. Compound RL maximizes the logarithm of the expected double-exponentially discounted compound return in return-based Markov decision processes (MDPs). The contributions of this paper are (1) a theoretical description of compound RL as an extended RL framework for maximizing the compound return in a return-based MDP, and (2) experimental results on an illustrative example and an application to fi…
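The objective described in the abstract can be sketched as follows. This is one reading of the phrase "logarithm of expected double-exponentially discounted compound return"; the symbols ($r_t$ for the per-step rate of return, $\gamma \in (0,1)$ for the discount rate) and the exact placement of the discount are assumptions, not notation taken from the paper:

```latex
G = \prod_{t=0}^{\infty} (1 + r_t)^{\gamma^t},
\qquad
\log G = \sum_{t=0}^{\infty} \gamma^t \log(1 + r_t),
```

so that maximizing $\log \mathbb{E}[G]$ discounts each step's log-growth $\log(1+r_t)$ geometrically, with the discount appearing as an exponent on the per-step growth factor (hence "double-exponentially discounted").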

Cited by 9 publications (8 citation statements). References 13 publications.
“…The policy gradient algorithm uses function approximation directly to build a policy network, selects actions through the policy network to obtain reward values, and optimizes the network's parameters along the gradient direction to find the policy that maximizes reward. [8] In deep RL (DRL), value-function algorithms must sample over actions and can therefore handle only discrete action spaces, whereas policy gradient algorithms search for actions directly with the policy network and so can handle continuous actions. In recent years, the actor-critic structure, which combines a value-function algorithm with a policy gradient algorithm, has also attracted extensive attention.…”
Section: Representative DRL Algorithms
confidence: 99%
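The policy-gradient idea quoted above can be sketched with a minimal REINFORCE-style update on a two-armed bandit. This is an illustrative sketch, not code from the cited work; the arm means, learning rates, and baseline are all hypothetical choices:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def reinforce_bandit(true_means=(0.2, 0.8), steps=2000, lr=0.1, seed=0):
    """Minimal REINFORCE: sample an action from a softmax policy,
    observe its reward, and move the policy parameters along the
    gradient of the action's log-probability, scaled by the reward
    minus a running-mean baseline (to reduce variance)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(len(true_means))  # policy parameters (action preferences)
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(theta)
        a = rng.choice(len(theta), p=probs)
        r = true_means[a]                       # reward for the sampled action
        baseline += 0.1 * (r - baseline)        # running-mean baseline
        grad = -probs                           # d/d theta of log softmax:
        grad[a] += 1.0                          #   one-hot(a) - probs
        theta += lr * (r - baseline) * grad
    return softmax(theta)

probs = reinforce_bandit()
```

Because the policy itself produces the action distribution, the same update applies unchanged if the softmax is replaced by a continuous distribution (e.g. a Gaussian over actions), which is the advantage over value-based methods noted in the quote.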
“…The method of updating the target Q network every N steps is adopted to avoid instability in the target Q network caused by changes to the current Q network during training. [8] The network structure of the DQN algorithm is shown below.…”
Section: Application Of DRL Algorithm In Atari 2600 Games
confidence: 99%
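The periodic target-network sync described in the quote can be sketched with a tabular stand-in for the Q network. The environment here (random transitions, a reward in the last state) and all hyperparameters are hypothetical; only the sync mechanism mirrors the quoted technique:

```python
import numpy as np

def q_update_with_target(num_states=4, num_actions=2, sync_every=50,
                         steps=500, gamma=0.9, lr=0.5, seed=0):
    """TD targets are computed from a frozen copy `q_target`, which is
    overwritten with the online table `q` only every `sync_every` steps,
    so the regression target stays fixed between syncs."""
    rng = np.random.default_rng(seed)
    q = np.zeros((num_states, num_actions))  # online "network" (here: a table)
    q_target = q.copy()                      # frozen target copy
    syncs = 0
    for t in range(steps):
        s = rng.integers(num_states)
        a = rng.integers(num_actions)
        r = float(s == num_states - 1)       # toy reward: +1 in the last state
        s2 = rng.integers(num_states)
        # The TD target uses the *frozen* table, not the one being updated.
        td_target = r + gamma * q_target[s2].max()
        q[s, a] += lr * (td_target - q[s, a])
        if (t + 1) % sync_every == 0:        # periodic hard sync
            q_target = q.copy()
            syncs += 1
    return q, syncs

q, syncs = q_update_with_target()
```

If `q_target` were replaced by `q` in the TD target, each update would shift its own regression target, which is exactly the instability the quoted N-step update scheme avoids.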
“…Reinforcement learning ([8], also known as neuro-dynamic programming [9]) is an area of optimal control theory at the intersection of approximate dynamic programming and machine learning. It has been used successfully in many applications, in fields such as engineering [10,11], sociology [12,13], and economics [14,15].…”
Section: Introduction
confidence: 99%