2007
DOI: 10.1007/s10957-007-9331-9
Q-Learning Algorithms with Random Truncation Bounds and Applications to Effective Parallel Computing

Abstract: Motivated by an important problem of load balancing in parallel computing, this paper examines a modified algorithm to enhance Q-learning methods, especially in asynchronous recursive procedures for self-adaptive load distribution at runtime. Unlike the existing projection method, which uses a fixed region, our algorithm employs a sequence of growing truncation bounds to ensure the boundedness of the iterates. Convergence and rates of convergence of the proposed algorithm are established. This class of algorithms …
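For intuition, here is a minimal Python sketch, not the authors' exact algorithm, of an asynchronous Q-learning loop whose iterates are kept bounded by a growing sequence of truncation bounds M_0 < M_1 < … rather than by projection onto a fixed region. The environment interface (env.reset, env.step), the step-size rule, and the bound-growth schedule are all illustrative assumptions.

```python
import numpy as np

def q_learning_growing_truncation(env, n_states, n_actions, gamma=0.95,
                                  n_steps=100_000, init_bound=10.0,
                                  growth=2.0, seed=0):
    """Asynchronous Q-learning with growing truncation bounds (sketch).

    `env` is a hypothetical interface: env.reset() -> state,
    env.step(state, action) -> (next_state, cost).
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    visits = np.zeros((n_states, n_actions))
    bound = init_bound                        # current truncation bound M_k
    s = env.reset()
    for _ in range(n_steps):
        a = int(rng.integers(n_actions))      # exploratory (uniform) policy
        s_next, cost = env.step(s, a)         # one transition of the chain
        visits[s, a] += 1
        alpha = 1.0 / visits[s, a]            # decreasing per-component step size
        target = cost + gamma * Q[s_next].min()  # cost-minimizing Bellman target
        Q[s, a] += alpha * (target - Q[s, a])
        # Growing-truncation step: if the iterate escapes the current bound,
        # pull it back inside and enlarge the bound for future iterates.
        if np.abs(Q).max() > bound:
            Q = np.clip(Q, -bound, bound)
            bound *= growth
        s = s_next
    return Q
```

Because the bounds grow without limit, truncation can occur only finitely often along a convergent trajectory, which is the intuition behind replacing a fixed projection region; the paper's actual randomly truncated scheme and its convergence-rate analysis differ in the details.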

Cited by 3 publications (4 citation statements) · References 10 publications · Citing publications span 2012–2022
“…The objective of RL is to maximize the expected cumulative reward by optimizing the policy. Q-learning [44] is a typical type of RL that maximizes the value function Q^{π_θ}(s, a). The value function estimates the expected cumulative reward of state s with action a under policy π_θ.…”
Section: RL-Based Resource Scaling
confidence: 99%
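For reference, the value function named in the quoted statement is the standard discounted Q-function; the display below is the textbook definition, not something stated on this page:

$$Q^{\pi_\theta}(s, a) \;=\; \mathbb{E}_{\pi_\theta}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t} r_t \;\middle|\; s_0 = s,\ a_0 = a \right], \qquad \gamma \in (0, 1),$$

where r_t is the reward received at step t and γ is the discount factor.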
“…Q-learning is essentially an adaptive optimisation method for a control system governed by a controlled Markov chain, or a Markov decision process. It stems from machine learning and artificial intelligence [21,22,23]. Denote by p_{ij}(u) the transition probabilities for a controlled finite-state Markov chain, where u denotes the control, taking values in a finite set U(i) for each i.…”
Section: Appendix 1: Q-Learning Basics
confidence: 99%
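As a concrete illustration of this notation, the Python sketch below simulates a controlled finite-state Markov chain with transition probabilities p_ij(u) and admissible control sets U(i); the numerical values and helper names are invented for illustration.

```python
import numpy as np

n = 3                                     # number of states
U = {0: [0, 1], 1: [0], 2: [0, 1]}        # admissible control sets U(i)
# p[u][i, j] = p_ij(u): each row is a probability distribution over next states.
p = {
    0: np.array([[0.9, 0.1, 0.0],
                 [0.2, 0.5, 0.3],
                 [0.0, 0.4, 0.6]]),
    1: np.array([[0.1, 0.6, 0.3],
                 [0.2, 0.5, 0.3],         # row unused: control 1 not admissible in state 1
                 [0.5, 0.5, 0.0]]),
}

def step(i, u, rng):
    """Sample the next state j with probability p_ij(u)."""
    assert u in U[i], "control must be admissible in state i"
    return int(rng.choice(n, p=p[u][i]))

rng = np.random.default_rng(0)
i = 0
for _ in range(5):
    u = int(rng.choice(U[i]))             # pick any admissible control (exploration)
    i = step(i, u, rng)
    print(f"control {u} -> state {i}")
```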
“…It is a technique of learning from delayed "costs," or a method for adaptive optimization of a controlled discrete-time Markov chain with a finite or countable state space. A comprehensive discussion of the method and research progress in this area towards a general framework can be found in our recent work in [36]. A survey of reinforcement learning technology from a computer science perspective can be seen in [14].…”
Section: Related Work
confidence: 99%
“…Examples include job and parallel task scheduling [36,3], server allocation in a server farm [28,30], power management [29], and a self-optimizing memory controller [13]. Designing an RL-based controller to automate the configuration process of VMs and appliances poses unique challenges, as we discussed in Section 2.…”
Section: Related Work
confidence: 99%