2021
DOI: 10.1145/3459991
A Survey of Reinforcement Learning Algorithms for Dynamically Varying Environments

Abstract: Reinforcement learning (RL) algorithms find applications in inventory control, recommender systems, vehicular traffic management, cloud computing, and robotics. The real-world complications arising in these domains make them difficult to solve with the basic assumptions underlying classical RL algorithms. RL agents in these applications often need to react and adapt to changing operating conditions. A significant part of research on single-agent RL techniques focuses on developing algorithms when the underlying…
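To make the adaptation problem concrete, the following is a minimal sketch of tabular Q-learning in a toy environment whose reward structure changes mid-run. The `DriftingChain` environment and all constants are invented for illustration and are not taken from the surveyed paper.

```python
# Minimal sketch: tabular Q-learning tracking a non-stationary environment.
# `DriftingChain` is a hypothetical toy MDP invented for illustration.
import random

class DriftingChain:
    """Two-state, two-action MDP whose reward structure flips mid-run."""
    def __init__(self):
        self.state = 0
        self.flipped = False  # set True to change the operating conditions

    def step(self, action):
        # Before the flip, action 0 is rewarding; after the flip, action 1 is.
        good = 1 if self.flipped else 0
        reward = 1.0 if action == good else 0.0
        self.state = (self.state + action) % 2
        return self.state, reward

def q_learning(env, steps=2000, alpha=0.1, gamma=0.9, eps=0.1):
    q = [[0.0, 0.0], [0.0, 0.0]]  # q[state][action]
    s = env.state
    for t in range(steps):
        if t == steps // 2:
            env.flipped = True  # the environment changes underneath the agent
        a = random.randrange(2) if random.random() < eps else max((0, 1), key=lambda a: q[s][a])
        s2, r = env.step(a)
        # A constant learning rate lets the agent keep adapting after the change;
        # a decaying rate (standard in stationary RL) would freeze the old policy.
        q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
        s = s2
    return q

print(q_learning(DriftingChain()))
```

Keeping the learning rate constant is the simplest possible response to non-stationarity; the survey covers far more principled approaches to detecting and adapting to such changes.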

Cited by 97 publications (31 citation statements). References 63 publications.
“…In the domain of optimal control, the agent is identified with the controller, the environment is the controlled system (or plant), and the action is the control signal [2]. Among the various existing DRL algorithms [3], Deep Policy Gradient methods (which use gradient descent to optimize a decision-making function, denoted the policy, with respect to the expected return) are deemed the most suitable for handling robotic domains for the following reasons:…”
Section: B. Learning-based Adaptive Control
confidence: 99%
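To ground the policy-gradient description in the excerpt above, here is a minimal REINFORCE-style sketch, assuming a hypothetical two-armed bandit task rather than the robotic plants the citing paper targets; the reward means and learning rate are invented for illustration.

```python
# Minimal REINFORCE-style sketch: a softmax policy is adjusted by gradient
# ascent on the sampled return. Bandit setup is hypothetical, for illustration.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.8])   # assumed reward means of the two arms
theta = np.zeros(2)                  # policy parameters (logits)
lr = 0.05

for _ in range(2000):
    pi = np.exp(theta) / np.exp(theta).sum()   # softmax policy
    a = rng.choice(2, p=pi)                    # sample an action from the policy
    r = rng.normal(true_means[a], 0.1)         # sample a return
    grad_log_pi = -pi                          # d log pi(a) / d theta for softmax
    grad_log_pi[a] += 1.0
    theta += lr * r * grad_log_pi              # ascend the expected return

print(pi)  # probability mass should concentrate on the better arm
```

In practice a baseline (e.g., a running average of returns) is subtracted from `r` to reduce the variance of this gradient estimate; the sketch omits it for brevity.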
“…Training an agent for 150 epochs takes about 3 minutes on a single CPU core (Intel i7-4870HQ). In contrast, previous approaches using active inference [Ueltzhöffer, 2018, Tschantz et al., 2019, 2020] and policy gradient methods (e.g., [Liu et al., 2017]) use (offline) policy replay and typically need hours of GPU-accelerated compute while achieving similar convergence. To our knowledge, this is the first model-based RL method to learn online using neural network representations.…”
Section: Experiments on the Mountain Car Problem
confidence: 99%
“…The field of Reinforcement Learning (RL) has achieved great success in designing artificial agents that can learn to navigate and solve unknown environments, and has had significant applications in robotics [Kober et al., 2013, Polydoros and Nalpantidis, 2017], game playing [Mnih et al., 2015, Silver et al., 2017, Shao et al., 2019], and many other dynamically varying environments with nontrivial solutions [Padakandla, 2020]. However, environments with sparse reward signals are still an open challenge in RL, because optimizing policies over Heaviside or deceptive reward functions, such as that in the mountain car problem, requires substantial exploration to experience enough reward to learn.…”
Section: Introduction
confidence: 99%
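To illustrate the sparse-reward difficulty the excerpt raises, here is a sketch using the textbook mountain car dynamics with a Heaviside-style reward that pays out only at the goal. The constants follow the common formulation, but this is an illustrative reimplementation, not the cited papers' exact setup.

```python
# Sketch of a "Heaviside" reward on the classic mountain car dynamics.
import math, random

def step(pos, vel, action):  # action in {-1, 0, +1}
    vel += 0.001 * action - 0.0025 * math.cos(3 * pos)
    vel = max(-0.07, min(0.07, vel))
    pos = max(-1.2, min(0.6, pos + vel))
    reward = 1.0 if pos >= 0.5 else 0.0  # Heaviside: signal only at the goal
    return pos, vel, reward

# Random exploration almost never sees a nonzero reward, so a learner
# gets no gradient signal to follow -- the open challenge noted above.
pos, vel, total = -0.5, 0.0, 0.0
for _ in range(10_000):
    pos, vel, r = step(pos, vel, random.choice((-1, 0, 1)))
    total += r
print("reward collected by random policy:", total)
```

Under this reward, value estimates and policy gradients receive no learning signal until the goal is first reached by chance, which is why the excerpt singles out exploration as the bottleneck.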
“…[Fig. 2: taxonomy of concerns across the RL lifecycle. Development: MDP formulation [30], metrics design [31], algorithm design [32], training methodologies [33], explainability [34], digital twins [35], [36], Sim2Real [37], [38], hyperparameter optimisation [39]. Operations: performance evaluation [40], A/B deployment [41], model decay [42], interoperability [43], deployment sites [44]. Safety/Security: constrained MDP [45], DevSecOps [46], adversarial agents [47], [48], attack detection [49].]…”
Section: Design
confidence: 99%