2006
DOI: 10.1007/11871842_74

Scaling Model-Based Average-Reward Reinforcement Learning for Product Delivery

Abstract: Reinforcement learning in real-world domains suffers from three curses of dimensionality: explosions in state and action spaces, and high stochasticity. We present approaches that mitigate each of these curses. To handle the state-space explosion, we introduce "tabular linear functions" that generalize tile-coding and linear value functions. Action space complexity is reduced by replacing complete joint action space search with a form of hill climbing. To deal with high stochasticity, we introduce a new algori…
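The abstract's "tabular linear functions" can be read as a hybrid of table lookup and linear value functions: each weight lives in a table indexed by discrete features and is multiplied by a real-valued feature, so a pure table and a pure linear function are both special cases. The sketch below illustrates that reading only; the class name, feature extractors, and TD-style update are assumptions for illustration, not the paper's implementation.

```python
from collections import defaultdict

class TabularLinearValue:
    """Sketch of a "tabular linear function":

        V(s) = sum_i  w_i[key_i(s)] * x_i(s)

    Each weight w_i is looked up in a table indexed by a discrete feature
    key_i(s) and multiplied by a real-valued feature x_i(s). With every
    x_i(s) == 1 this reduces to an ordinary table; with a single constant
    key it reduces to a plain linear value function.
    """

    def __init__(self, key_fns, feat_fns):
        self.key_fns = key_fns                      # discrete feature extractors
        self.feat_fns = feat_fns                    # real-valued feature extractors
        self.tables = [defaultdict(float) for _ in key_fns]

    def value(self, s):
        return sum(tab[k(s)] * f(s)
                   for tab, k, f in zip(self.tables, self.key_fns, self.feat_fns))

    def update(self, s, target, lr=0.1):
        """Move V(s) toward a scalar target (an assumed TD-style rule)."""
        err = target - self.value(s)
        for tab, k, f in zip(self.tables, self.key_fns, self.feat_fns):
            tab[k(s)] += lr * err * f(s)


# Hypothetical delivery-style state: (truck_location, shop_inventory_level)
V = TabularLinearValue(
    key_fns=[lambda s: s[0], lambda s: s[0]],       # weights indexed by location
    feat_fns=[lambda s: 1.0, lambda s: s[1]],       # bias term and inventory level
)
V.update(("shop_3", 4.0), target=2.5)
print(V.value(("shop_3", 4.0)))                     # current estimate for this state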

Cited by 22 publications (12 citation statements) | References 8 publications
“…The criterion of choosing a better joint action helps to reduce the computational cost of searching the joint actions, which grows exponentially with the number of agents, and provides better performance. We present in this section the hill climbing search (HCS) algorithm proposed in [10]. Then, we propose an enhancement of the hill climbing search algorithm for optimal joint action selection (eHCS) that speeds up the action search.…”
Section: E. Coordinated Multi-agent RL
confidence: 99%
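The hill climbing search (HCS) over joint actions that this citation refers to amounts to coordinate ascent: rather than scoring all |A|^n joint actions for n agents, one repeatedly improves a single agent's action while holding the others fixed until no single-agent change helps. The following sketch is an illustrative reconstruction under that reading; the evaluate callback, the random starting point, and the sweep limit are assumptions, not the authors' code.

```python
import random

def hill_climb_joint_action(agent_actions, evaluate, start=None, max_sweeps=100):
    """Coordinate-ascent ("hill climbing") search over a joint action.

    agent_actions: list of per-agent action lists, e.g. [[0, 1], ["N", "S"], ...]
    evaluate:      callable mapping a joint action (tuple) to a scalar score,
                   e.g. an estimated action value under the current model
    Returns (joint_action, score) such that no single-agent change improves it
    (or the best found within max_sweeps).
    """
    joint = list(start) if start is not None else [random.choice(a) for a in agent_actions]
    best = evaluate(tuple(joint))

    for _ in range(max_sweeps):
        improved = False
        for i, actions in enumerate(agent_actions):   # improve one agent at a time
            for a in actions:
                if a == joint[i]:
                    continue
                candidate = joint[:i] + [a] + joint[i + 1:]
                score = evaluate(tuple(candidate))
                if score > best:
                    joint, best = candidate, score
                    improved = True
        if not improved:                              # local optimum reached
            break
    return tuple(joint), best
```

Each sweep costs on the order of n·|A| evaluations instead of |A|^n, which is the cost reduction the citing papers attribute to HCS; the trade-off is that the returned joint action may only be a local optimum.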
“…Unlike the distributed approach, which consists of forwarding all information between all agents and may be time consuming, plenty of centralized joint action selection algorithms exist in the literature, such as hill climbing search [10] and Stackelberg Q-Learning [11]. In our model we propose a modified version of the hill climbing search algorithm which fits our drone action selection problem.…”
Section: E. Coordinated Multi-agent RL
confidence: 99%
“…The numerous successful applications of reinforcement learning include (in no particular order) learning in games (e.g., Backgammon (Tesauro, 1994) and Go (Silver et al, 2007)), applications in networking (e.g., packet routing (Boyan and Littman, 1994), channel allocation (Singh and Bertsekas, 1997)), applications to operations research problems (e.g., targeted marketing (Abe et al, 2004), maintenance problems (Gosavi, 2004), job-shop scheduling (Zhang and Dietterich, 1995), elevator control (Crites and Barto, 1996), pricing (Rusmevichientong et al, 2006), vehicle routing (Proper and Tadepalli, 2006), inventory control (Chang et al, 2007), fleet management (Simão et al, 2009)), learning in robotics (e.g., controlling quadruped robots (Kohl and Stone, 2004), humanoid robots (Peters et al, 2003), or helicopters (Abbeel et al, 2007)), and applications to finance (e.g., option pricing (Tsitsiklis and Van Roy, 1999b, 2001; Yu and Bertsekas, 2007; Li et al, 2009) …”
Section: Applications
confidence: 99%
“…ARL has been previously used in the context of Q-learning [20] to achieve a compact representation of the value function; see for example [21]. More recently, it has also been used in model-based RL [22], in relational RL [23] and in learning general games [24].…”
Section: Introduction
confidence: 99%