2021
DOI: 10.1145/3459991
A Survey of Reinforcement Learning Algorithms for Dynamically Varying Environments

Abstract: Reinforcement learning (RL) algorithms find applications in inventory control, recommender systems, vehicular traffic management, cloud computing, and robotics. The real-world complications arising in these domains make them difficult to solve with the basic assumptions underlying classical RL algorithms. RL agents in these applications often need to react and adapt to changing operating conditions. A significant part of research on single-agent RL techniques focuses on developing algorithms when the underlying…
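To make the adaptation problem concrete, the following is a minimal sketch of tabular Q-learning in a toy environment whose reward structure changes mid-run. The `DriftingChain` environment and all constants are invented for illustration and are not taken from the surveyed paper.

```python
# Minimal sketch: tabular Q-learning tracking a non-stationary environment.
# `DriftingChain` is a hypothetical toy MDP invented for illustration.
import random

class DriftingChain:
    """Two-state, two-action MDP whose reward structure flips mid-run."""
    def __init__(self):
        self.state = 0
        self.flipped = False  # set True to change the operating conditions

    def step(self, action):
        # Before the flip, action 0 is rewarding; after the flip, action 1 is.
        good = 1 if self.flipped else 0
        reward = 1.0 if action == good else 0.0
        self.state = (self.state + action) % 2
        return self.state, reward

def q_learning(env, steps=2000, alpha=0.1, gamma=0.9, eps=0.1):
    q = [[0.0, 0.0], [0.0, 0.0]]  # q[state][action]
    s = env.state
    for t in range(steps):
        if t == steps // 2:
            env.flipped = True  # the environment changes underneath the agent
        a = random.randrange(2) if random.random() < eps else max((0, 1), key=lambda a: q[s][a])
        s2, r = env.step(a)
        # A constant learning rate lets the agent keep adapting after the change;
        # a decaying rate (standard in stationary RL) would freeze the old policy.
        q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
        s = s2
    return q

print(q_learning(DriftingChain()))
```

Keeping the learning rate constant is the simplest possible response to non-stationarity; the survey covers far more principled approaches to detecting and adapting to such changes.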

Cited by 97 publications (31 citation statements). References 63 publications.
“…In the domain of optimal control, the agent is identified with the controller, the environment is the controlled system (or plant), and the action is the control signal [2]. Among the various existing DRL algorithms [3], Deep Policy Gradient methods (which use gradient descent to optimize a decision-making function, denoted the policy, with respect to the expected return) are deemed the most suitable for handling robotic domains for the following reasons:…”
Section: B. Learning-based Adaptive Control
confidence: 99%
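To ground the policy-gradient description in the excerpt above, here is a minimal REINFORCE-style sketch, assuming a hypothetical two-armed bandit task rather than the robotic plants the citing paper targets; the reward means and learning rate are invented for illustration.

```python
# Minimal REINFORCE-style sketch: a softmax policy is adjusted by gradient
# ascent on the sampled return. Bandit setup is hypothetical, for illustration.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.8])   # assumed reward means of the two arms
theta = np.zeros(2)                  # policy parameters (logits)
lr = 0.05

for _ in range(2000):
    pi = np.exp(theta) / np.exp(theta).sum()   # softmax policy
    a = rng.choice(2, p=pi)                    # sample an action from the policy
    r = rng.normal(true_means[a], 0.1)         # sample a return
    grad_log_pi = -pi                          # d log pi(a) / d theta for softmax
    grad_log_pi[a] += 1.0
    theta += lr * r * grad_log_pi              # ascend the expected return

print(pi)  # probability mass should concentrate on the better arm
```

In practice a baseline (e.g., a running average of returns) is subtracted from `r` to reduce the variance of this gradient estimate; the sketch omits it for brevity.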
“…Training an agent for 150 epochs takes about 3 minutes on a single CPU core (Intel i7-4870HQ). In contrast, previous approaches using active inference [Ueltzhöffer, 2018, Tschantz et al., 2019, 2020] and policy gradient methods (e.g., [Liu et al., 2017]) use (offline) policy replay and typically need hours of GPU-accelerated compute while achieving similar convergence. To our knowledge, this is the first model-based RL method to learn online using neural network representations.…”
Section: Experiments on the Mountain Car Problem
confidence: 99%
“…The field of Reinforcement Learning (RL) has achieved great success in designing artificial agents that can learn to navigate and solve unknown environments, and has had significant applications in robotics [Kober et al., 2013, Polydoros and Nalpantidis, 2017], game playing [Mnih et al., 2015, Silver et al., 2017, Shao et al., 2019], and many other dynamically varying environments with nontrivial solutions [Padakandla, 2020]. However, environments with sparse reward signals are still an open challenge in RL, because optimizing policies over Heaviside or deceptive reward functions, such as that in the mountain car problem, requires substantial exploration to experience enough reward to learn.…”
Section: Introduction
confidence: 99%
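To illustrate the sparse-reward difficulty the excerpt raises, here is a sketch using the textbook mountain car dynamics with a Heaviside-style reward that pays out only at the goal. The constants follow the common formulation, but this is an illustrative reimplementation, not the cited papers' exact setup.

```python
# Sketch of a "Heaviside" reward on the classic mountain car dynamics.
import math, random

def step(pos, vel, action):  # action in {-1, 0, +1}
    vel += 0.001 * action - 0.0025 * math.cos(3 * pos)
    vel = max(-0.07, min(0.07, vel))
    pos = max(-1.2, min(0.6, pos + vel))
    reward = 1.0 if pos >= 0.5 else 0.0  # Heaviside: signal only at the goal
    return pos, vel, reward

# Random exploration almost never sees a nonzero reward, so a learner
# gets no gradient signal to follow -- the open challenge noted above.
pos, vel, total = -0.5, 0.0, 0.0
for _ in range(10_000):
    pos, vel, r = step(pos, vel, random.choice((-1, 0, 1)))
    total += r
print("reward collected by random policy:", total)
```

Under this reward, value estimates and policy gradients receive no learning signal until the goal is first reached by chance, which is why the excerpt singles out exploration as the bottleneck.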
“…[Fig. 2: taxonomy of concerns across the RL lifecycle. Development: MDP formulation [30], metrics design [31], algorithm design [32], training methodologies [33], explainability [34], digital twins [35], [36], Sim2Real [37], [38], hyperparameter optimisation [39]. Operations: performance evaluation [40], A/B deployment [41], model decay [42], interoperability [43], deployment sites [44]. Safety/Security: constrained MDP [45], DevSecOps [46], adversarial agents [47], [48], attack detection [49].]…”
Section: Design
confidence: 99%