Proceedings of the 23rd International Conference on Machine Learning - ICML '06, 2006
DOI: 10.1145/1143844.1143929

Reinforcement learning for optimized trade execution

Abstract: We present the first large-scale empirical application of reinforcement learning to the important problem of optimized trade execution in modern financial markets. Our experiments are based on 1.5 years of millisecond time-scale limit order data from NASDAQ, and demonstrate the promise of reinforcement learning methods for market microstructure problems. Our learning algorithm introduces and exploits a natural "low-impact" factorization of the state space.
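The abstract only names the approach, so here is a minimal sketch of the kind of tabular formulation it describes: Q-learning over a discretized private state of (time remaining, inventory remaining), with actions that reprice a limit order relative to the current quote. All names, discretization sizes, and the epsilon-greedy and learning-rate choices below are illustrative assumptions, not the authors' code.

```python
import random
from collections import defaultdict

# Illustrative tabular setup (assumed, not from the paper's source):
# private state = (time remaining, inventory remaining); action = limit
# price offset in ticks from the current quote; Q estimates cost-to-go.
T_STEPS = 8              # decision points in the execution horizon
I_LEVELS = 8             # discretized inventory-remaining levels
ACTIONS = range(-4, 5)   # repricing offsets in ticks

Q = defaultdict(float)   # Q[(t, i, a)] -> estimated execution cost

def greedy_action(t, i, eps=0.1):
    """Epsilon-greedy over repricing actions; we minimize cost."""
    if random.random() < eps:
        return random.choice(list(ACTIONS))
    return min(ACTIONS, key=lambda a: Q[(t, i, a)])

def update(t, i, a, cost, t_next, i_next, alpha=0.05):
    """One-step Q-learning backup: immediate cost plus best next cost."""
    best_next = 0.0 if t_next == 0 else min(Q[(t_next, i_next, b)] for b in ACTIONS)
    Q[(t, i, a)] += alpha * (cost + best_next - Q[(t, i, a)])
```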

Cited by 204 publications (182 citation statements). References 9 publications.
“…Developing tractable models that account for such data remains a challenge. One initiative to incorporate limit order book data into the decision process is presented by Nevmyvaka et al (2006).…”
Section: Results (mentioning)
confidence: 99%
“…A model that fits the problem of manipulation under the representation of Fig. 1 is that of a Markov Decision Process (MDP) [18], [22]. In general, an MDP is defined by the tuple {S, A, T, R}, where S and A are sets of states and actions, respectively (s ∈ S and a ∈ A), R is the set of rewards (r ∈ R), and T is a set of transition probabilities ({P(s′|s, a)} ∈ T, where P(s′|s, a) represents the probability of transitioning to state s′ from s after action a).…”
Section: A. Spoofing as a Markov Decision Process (mentioning)
confidence: 99%
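The tuple above maps directly to a small data structure. Below is a minimal sketch of {S, A, T, R} on a toy two-state chain; the states, actions, probabilities, and rewards are illustrative assumptions, not drawn from the citing paper.

```python
import random

# Toy MDP: states S, actions A, transitions T with P(s'|s, a), rewards R.
S = ["low", "high"]
A = ["hold", "trade"]

# T[(s, a)] maps each next state s' to P(s'|s, a); each row sums to 1.
T = {
    ("low", "hold"):   {"low": 0.9, "high": 0.1},
    ("low", "trade"):  {"low": 0.4, "high": 0.6},
    ("high", "hold"):  {"low": 0.2, "high": 0.8},
    ("high", "trade"): {"low": 0.7, "high": 0.3},
}

# R[(s, a)]: immediate reward for taking action a in state s.
R = {
    ("low", "hold"): 0.0, ("low", "trade"): -1.0,
    ("high", "hold"): 0.0, ("high", "trade"): 2.0,
}

def step(s, a, rng=random):
    """Sample s' ~ P(.|s, a) and return (s', r)."""
    probs = T[(s, a)]
    s_next = rng.choices(list(probs), weights=list(probs.values()))[0]
    return s_next, R[(s, a)]
```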
“…that can, for example, vary their aggressiveness, are superior to static strategies. Nevmyvaka [30] proposed a dynamic price adjustment strategy, in which a limit order's price is revised every 30 seconds to adapt to the changing market state. Wang [35] proposed a dynamic focus strategy, which adjusts volume according to real-time updates of state variables such as inventory and order book imbalance, and showed that a dynamic focus strategy can outperform a static limit order strategy.…”
Section: Types (mentioning)
confidence: 99%
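To make the 30-second repricing idea concrete, here is a sketch of such a loop. The `venue` object with `get_quote`, `place`, and `cancel` is a hypothetical interface invented for illustration; it is not an API from [30] or [35].

```python
import time

REPRICE_INTERVAL_S = 30  # revise the limit price every 30 seconds

def reprice_loop(venue, side, qty, offset_ticks, tick_size, deadline):
    """Cancel and re-peg a limit order to the current quote each interval."""
    order_id = None
    while qty > 0 and time.time() < deadline:
        bid, ask = venue.get_quote()
        ref = bid if side == "buy" else ask           # peg to our side's quote
        sign = 1 if side == "buy" else -1
        price = ref + sign * offset_ticks * tick_size
        if order_id is not None:
            qty -= venue.cancel(order_id)             # assume cancel returns filled shares
        order_id = venue.place(side, qty, price)
        time.sleep(REPRICE_INTERVAL_S)
```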
“…The objective function used here is the ratio of the difference between the VWAP of the 30 orders and the VWAP of all orders executed in the ASM simulation to the VWAP of all executed orders, i.e. VWAP Ratio = (VWAP_30 - VWAP_global) / VWAP_global. For both buy and sell orders, the smaller the VWAP Ratio, the better the strategy.…”
Section: GA Strategies (mentioning)
confidence: 99%
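A minimal computation of the ratio as defined above, assuming fills are given as (price, size) pairs; the function and variable names are illustrative, not from the cited work.

```python
def vwap(fills):
    """Volume-weighted average price over (price, size) fills."""
    value = sum(price * size for price, size in fills)
    volume = sum(size for _, size in fills)
    return value / volume

def vwap_ratio(order_fills, all_fills):
    """(VWAP_30 - VWAP_global) / VWAP_global, per the statement above."""
    return (vwap(order_fills) - vwap(all_fills)) / vwap(all_fills)

# Example: the strategy's orders filled slightly above the global VWAP.
print(vwap_ratio([(10.02, 100), (10.05, 200)],
                 [(10.00, 5000), (10.04, 7000)]))
```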