We formulate notions of opacity for cyberphysical systems modeled as discrete-time linear timeinvariant systems. A set of secret states is k-ISO with respect to a set of nonsecret states if, starting from these sets at time 0, the outputs at time k are indistinguishable to an adversarial observer. Necessary and sufficient conditions to ensure that a secret specification is k-ISO are established in terms of sets of reachable states. We also show how to adapt techniques for computing underapproximations and over-approximations of the set of reachable states of dynamical systems in order to soundly approximate k-ISO. Further, we provide a condition for output controllability, if k-ISO holds, and show that the converse holds under an additional assumption.We extend the theory of opacity for single-adversary systems to the case of multiple adversaries and develop several notions of decentralized opacity. We study the following scenarios: i) the presence or lack of a centralized coordinator, and ii) the presence or absence of collusion among adversaries. In the case of colluding adversaries, we derive a condition for nonopacity that depends on the structure of the directed graph representing the communication between adversaries.Finally, we relax the condition that the outputs be indistinguishable and define a notion of -opacity, and also provide an extension to the case of nonlinear systems.
This paper studies the synthesis of control policies for an agent that has to satisfy a temporal logic specification in a partially observable environment, in the presence of an adversary. The interaction of the agent (defender) with the adversary is modeled as a partially observable stochastic game. The search for policies is limited to over the space of finite state controllers, which leads to a tractable approach to determine policies. The goal is to generate a defender policy to maximize satisfaction of a given temporal logic specification under any adversary policy. We relate the satisfaction of the specification in terms of reaching (a subset of) recurrent states of a Markov chain. We then present a procedure to determine a set of defender and adversary finite state controllers of given sizes that will satisfy the temporal logic specification. We illustrate our approach with an example.
This paper augments the reward received by a reinforcement learning agent with potential functions in order to help the agent learn (possibly stochastic) optimal policies. We show that a potential-based reward shaping scheme is able to preserve optimality of stochastic policies, and demonstrate that the ability of an agent to learn an optimal policy is not affected when this scheme is augmented to soft Q-learning. We propose a method to impart potential-based advice schemes to policy gradient algorithms. An algorithm that considers an advantage actor-critic architecture augmented with this scheme is proposed, and we give guarantees on its convergence. Finally, we evaluate our approach on a puddle-jump grid world with indistinguishable states, and the continuous state and action mountain car environment from classical control. Our results indicate that these schemes allow the agent to learn a stochastic optimal policy faster and obtain a higher average reward.
This paper studies the satisfaction of a class of temporal properties for cyber-physical systems (CPSs) over a finite-time horizon in the presence of an adversary, in an environment described by discrete-time dynamics. The temporal logic specification is given in sa f e − LTL F , a fragment of linear temporal logic over traces of finite length. The interaction of the CPS with the adversary is modeled as a two-player zero-sum discretetime dynamic stochastic game with the CPS as defender. We formulate a dynamic programming based approach to determine a stationary defender policy that maximizes the probability of satisfaction of a sa f e − LTL F formula over a finite time-horizon under any stationary adversary policy. We introduce secure control barrier certificates (S-CBCs), a generalization of barrier certificates and control barrier certificates that accounts for the presence of an adversary, and use S-CBCs to provide a lower bound on the above satisfaction probability. When the dynamics of the evolution of the system state has a specific underlying structure, we present a way to determine an S-CBC as a polynomial in the state variables using sum-of-squares optimization. An illustrative example demonstrates our approach.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.