Motivated by the recent empirical success of policy-based reinforcement learning (RL), there has been growing research interest in studying the performance of policy-based RL methods on standard control benchmark problems. In this paper, we examine the effectiveness of policy-based RL methods on an important robust control problem, namely µ synthesis. We build a connection between robust adversarial RL and µ synthesis, and develop a model-free version of the well-known DK-iteration for solving state-feedback µ synthesis with static D-scaling. In the proposed algorithm, the K step mimics the classical central path algorithm by incorporating a recently developed double-loop adversarial RL method as a subroutine, and the D step is based on model-free finite-difference approximation. An extensive numerical study is also presented to demonstrate the utility of our proposed model-free algorithm. Our study sheds new light on the connections between adversarial RL and robust control.
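As a rough illustration only, the sketch below shows one way the alternating structure described above could be organized. The interfaces `cost_fn` (a black-box estimate of the scaled closed-loop cost) and `k_step` (the double-loop adversarial RL subroutine) are hypothetical placeholders, not the paper's actual API; only the finite-difference D step is spelled out.

```python
import numpy as np

def finite_diff_d_step(d, cost_fn, eps=1e-3, lr=1e-2, iters=50):
    """Model-free D step: update the static diagonal D-scaling parameters by
    finite-difference gradient descent on a black-box closed-loop cost.
    `cost_fn(d)` is an assumed oracle returning a scalar cost estimate."""
    for _ in range(iters):
        grad = np.zeros_like(d)
        for i in range(len(d)):
            d_plus, d_minus = d.copy(), d.copy()
            d_plus[i] += eps
            d_minus[i] -= eps
            # Central finite-difference estimate of the cost gradient.
            grad[i] = (cost_fn(d_plus) - cost_fn(d_minus)) / (2 * eps)
        d = d - lr * grad
    return d

def model_free_dk_iteration(init_d, k_step, cost_fn, outer_iters=10):
    """Alternate a K step (adversarial RL subroutine, placeholder here) with
    the finite-difference D step above."""
    d = init_d
    controller = None
    for _ in range(outer_iters):
        controller = k_step(d)  # K step: policy optimization against the worst-case adversary
        d = finite_diff_d_step(d, lambda dd: cost_fn(controller, dd))  # D step
    return controller, d
```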
The growing prospect of deep reinforcement learning (DRL) being used in cyber-physical systems has raised concerns about the safety and robustness of autonomous agents. Recent work on generating adversarial attacks has shown that it is computationally feasible for a bad actor to fool a DRL policy into behaving suboptimally. Although certain adversarial attacks with specific attack models have been addressed, most studies focus only on off-line optimization in the data space (e.g., example fitting, distillation). This paper introduces a Meta-Learned Advantage Hierarchy (MLAH) framework that is attack-model-agnostic and better suited to reinforcement learning, handling attacks in the decision space (as opposed to the data space) and directly mitigating the learned bias introduced by the adversary. In MLAH, we learn separate sub-policies (nominal and adversarial) in an online manner, guided by a supervisory master agent that detects the presence of the adversary by leveraging the advantage function of the sub-policies. We demonstrate that the proposed algorithm enables policy learning with significantly lower bias than state-of-the-art policy learning approaches, even in the presence of heavy state-information attacks. We present algorithm analysis and simulation results using popular OpenAI Gym environments.
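A minimal sketch of how an advantage-based supervisor of this kind might switch between sub-policies is given below. The class name, detection threshold, and windowing scheme are assumptions for illustration and are not taken from the paper; only the idea of using the advantage signal to detect an adversary comes from the abstract.

```python
import numpy as np

class MasterAgent:
    """Illustrative MLAH-style supervisor: monitors the advantage signal of
    the nominal sub-policy and hands control to the adversarial sub-policy
    when that signal stays persistently low (assumed detection rule)."""

    def __init__(self, threshold=-1.0, window=20):
        self.threshold = threshold  # assumed detection threshold
        self.window = window        # number of recent steps to average over
        self.advantages = []

    def update(self, reward, value, next_value, gamma=0.99):
        # One-step advantage estimate: A = r + gamma * V(s') - V(s).
        self.advantages.append(reward + gamma * next_value - value)
        self.advantages = self.advantages[-self.window:]

    def select(self, nominal_policy, adversarial_policy):
        # Switch to the adversarial sub-policy when the running advantage
        # of the nominal sub-policy drops below the threshold.
        if np.mean(self.advantages or [0.0]) < self.threshold:
            return adversarial_policy
        return nominal_policy
```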
Many existing region-of-attraction (ROA) analysis tools have difficulty addressing feedback systems with large-scale neural network (NN) policies and/or high-dimensional sensing modalities such as cameras. In this letter, we tailor the projected gradient descent (PGD) attack method as a general-purpose ROA analysis tool for high-dimensional nonlinear systems and end-to-end perception-based control. We show that ROA analysis can be approximated as a constrained maximization problem, so that PGD-based iterative methods can be applied directly. In the model-based setting, we show that the PGD updates can be performed efficiently using back-propagation. In the model-free setting, we propose a finite-difference PGD estimate that is general and requires only a black-box simulator generating trajectories of the closed-loop system from any initial state. Finally, we demonstrate the scalability and generality of our analysis tool on several numerical examples with large state dimensions or complex image observations.
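The following sketch illustrates what a finite-difference PGD search of this flavor could look like in the model-free setting. The `simulate` interface (a black-box rollout returning a scalar instability measure for a given initial state), the ball-shaped candidate region, and all step sizes are assumptions made for the example, not the paper's exact formulation.

```python
import numpy as np

def finite_diff_pgd_roa(simulate, x0, radius, eps=1e-4, step=1e-2, iters=100):
    """Search for a destabilizing initial state inside a candidate ball
    around x0 by maximizing a black-box instability measure with PGD,
    where the gradient is estimated via finite differences."""
    x = x0.copy()
    for _ in range(iters):
        # Finite-difference gradient estimate of the rollout objective.
        grad = np.zeros_like(x)
        for i in range(len(x)):
            e = np.zeros_like(x)
            e[i] = eps
            grad[i] = (simulate(x + e) - simulate(x - e)) / (2 * eps)
        # Gradient ascent step to maximize the instability measure,
        # then project back onto the candidate region ||x - x0|| <= radius.
        x = x + step * grad
        d = x - x0
        n = np.linalg.norm(d)
        if n > radius:
            x = x0 + radius * d / n
    return x, simulate(x)
```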