2007
DOI: 10.1007/s10957-007-9331-9
Q-Learning Algorithms with Random Truncation Bounds and Applications to Effective Parallel Computing

Abstract: Motivated by an important problem of load balancing in parallel computing, this paper examines a modified algorithm to enhance Q-learning methods, especially in asynchronous recursive procedures for self-adaptive load distribution at runtime. Unlike the existing projection method, which uses a fixed region, our algorithm employs a sequence of growing truncation bounds to ensure the boundedness of the iterates. Convergence and rates of convergence of the proposed algorithm are established. This class of algorithms …
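For intuition, here is a minimal Python sketch, not the authors' exact algorithm, of an asynchronous Q-learning loop whose iterates are kept bounded by a growing sequence of truncation bounds M_0 < M_1 < … rather than by projection onto a fixed region. The environment interface (env.reset, env.step), the step-size rule, and the bound-growth schedule are all illustrative assumptions.

```python
import numpy as np

def q_learning_growing_truncation(env, n_states, n_actions, gamma=0.95,
                                  n_steps=100_000, init_bound=10.0,
                                  growth=2.0, seed=0):
    """Asynchronous Q-learning with growing truncation bounds (sketch).

    `env` is a hypothetical interface: env.reset() -> state,
    env.step(state, action) -> (next_state, cost).
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    visits = np.zeros((n_states, n_actions))
    bound = init_bound                        # current truncation bound M_k
    s = env.reset()
    for _ in range(n_steps):
        a = int(rng.integers(n_actions))      # exploratory (uniform) policy
        s_next, cost = env.step(s, a)         # one transition of the chain
        visits[s, a] += 1
        alpha = 1.0 / visits[s, a]            # decreasing per-component step size
        target = cost + gamma * Q[s_next].min()  # cost-minimizing Bellman target
        Q[s, a] += alpha * (target - Q[s, a])
        # Growing-truncation step: if the iterate escapes the current bound,
        # pull it back inside and enlarge the bound for future iterates.
        if np.abs(Q).max() > bound:
            Q = np.clip(Q, -bound, bound)
            bound *= growth
        s = s_next
    return Q
```

Because the bounds grow without limit, truncation can occur only finitely often along a convergent trajectory, which is the intuition behind replacing a fixed projection region; the paper's actual randomly truncated scheme and its convergence-rate analysis differ in the details.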

Cited by 3 publications (4 citation statements) · References 10 publications · Citing publications span 2012–2022
“…The objective of RL is to maximize the expected cumulative reward by optimizing the policy. Q-learning [44] is a typical type of RL that maximizes the value function Q^{π_θ}(s, a). The value function estimates the expected cumulative reward of state s with action a under policy π_θ.…”
Section: RL-Based Resource Scaling
confidence: 99%
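For reference, the value function named in the quoted statement is the standard discounted Q-function; the display below is the textbook definition, not something stated on this page:

$$Q^{\pi_\theta}(s, a) \;=\; \mathbb{E}_{\pi_\theta}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t} r_t \;\middle|\; s_0 = s,\ a_0 = a \right], \qquad \gamma \in (0, 1),$$

where r_t is the reward received at step t and γ is the discount factor.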
“…Q-learning is essentially an adaptive optimisation method for a control system governed by a controlled Markov chain, or a Markov decision process. It stems from machine learning and artificial intelligence [21,22,23]. Denote by p_{ij}(u) the transition probabilities for a controlled finite-state Markov chain, where u denotes the control, taking values in a finite set U(i) for each i.…”
Section: Appendix 1: Q-Learning Basics
confidence: 99%
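As a concrete illustration of this notation, the Python sketch below simulates a controlled finite-state Markov chain with transition probabilities p_ij(u) and admissible control sets U(i); the numerical values and helper names are invented for illustration.

```python
import numpy as np

n = 3                                     # number of states
U = {0: [0, 1], 1: [0], 2: [0, 1]}        # admissible control sets U(i)
# p[u][i, j] = p_ij(u): each row is a probability distribution over next states.
p = {
    0: np.array([[0.9, 0.1, 0.0],
                 [0.2, 0.5, 0.3],
                 [0.0, 0.4, 0.6]]),
    1: np.array([[0.1, 0.6, 0.3],
                 [0.2, 0.5, 0.3],         # row unused: control 1 not admissible in state 1
                 [0.5, 0.5, 0.0]]),
}

def step(i, u, rng):
    """Sample the next state j with probability p_ij(u)."""
    assert u in U[i], "control must be admissible in state i"
    return int(rng.choice(n, p=p[u][i]))

rng = np.random.default_rng(0)
i = 0
for _ in range(5):
    u = int(rng.choice(U[i]))             # pick any admissible control (exploration)
    i = step(i, u, rng)
    print(f"control {u} -> state {i}")
```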
“…It is a technique of learning from delayed "costs," or a method for adaptive optimization of a controlled discrete-time Markov chain with a finite or countable state space. A comprehensive discussion of the method and research progress in this area towards a general framework can be found in our recent work in [36]. A survey of reinforcement learning technology from a computer science perspective can be seen in [14].…”
Section: Related Work
confidence: 99%
“…Examples include job and parallel task scheduling [36,3], server allocation in a server farm [28,30], power management [29], and a self-optimizing memory controller [13]. Designing an RL-based controller to automate the configuration process of VMs and appliances poses unique challenges, as we discussed in Section 2.…”
Section: Related Work
confidence: 99%