On-the-fly informed search of non-blocking directed controllers

Ciolek, Daniel; Duran, Matias; Zanollo, Florencia; Pazos, Nicolas; Braier, Julián; Braberman, Víctor; D’Ippolito, Nicolás; Uchitel, Sebastián

doi:10.1016/j.automatica.2022.110731

Cited by 1 publication (4 citation statements)

References 18 publications


“…While this quickly becomes intractable, there are problems for which the state explosion can be delayed significantly by exploring a small subset of the plant that is enough to determine a control strategy (or to conclude that there is none). The OTF-DCS algorithm (Ciolek et al 2023) is briefly summarized in Algorithm 1. It performs a best-first search of the composed plant, adding one transition at a time from the exploration frontier to a partial exploration structure (A).…”

Section: On-the-fly Modular Directed Control (mentioning)

confidence: 99%

“…Experiments were run on an Intel i7-7700 CPU with 16GB of RAM and no GPU. We compare the results with an exploration policy that always chooses a random transition in the frontier (RANDOM) and with the Ready Abstraction (RA), the overall best performing heuristic of Ciolek et al (2023).…”

Section: Experimental Evaluation (mentioning)

confidence: 99%

“…Approaches that first build the full plant and compute a director can fail within a time and memory budget even when there is a director that keeps the system in a very small proportion of the full plant state space. On-the-fly Directed Controller Synthesis (OTF-DCS) (Ciolek et al 2023) attempts to avoid state explosion by exploring the composed plant incrementally, checking for the existence of directors after each new transition is added. If guided by good heuristics, this process allows finding controllers by building only the parts of the plant that the controllers themselves enable reaching.…”

Section: Introduction (mentioning)

confidence: 99%

“…Our results show first that with this technique it is possible to learn competitive heuristics on the training instances; and second, that these policies are effective when used in larger instances. Our agents are evaluated both in terms of expanded transitions and in terms of solved instances within a time budget, and overall they outperform the best heuristic from Ciolek et al (2023), pushing the boundaries of instances solved in various of the benchmark problems.…”

Section: Introduction (mentioning)

confidence: 99%
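
The citation statements above describe OTF-DCS as a best-first search that adds one transition at a time from the exploration frontier to a partial exploration structure. The following Python sketch illustrates only that exploration loop over a lazily composed plant; it is not the algorithm from Ciolek et al. The component encoding, the names `compose_successors` and `explore`, and the `heuristic` and `done` callbacks are all assumptions made for illustration, and `done` is a placeholder for the real check for the existence of a director, which is not reproduced here.

```python
import heapq
from itertools import count


def compose_successors(components, state):
    """Lazily yield (label, next_state) pairs of the parallel composition.

    Assumed encoding (hypothetical): each component is a dict with an
    "alphabet" set and a "delta" dict mapping (local_state, label) to the
    next local state. A label fires only if every component that has the
    label in its alphabet can take it from its current local state.
    """
    labels = set().union(*(c["alphabet"] for c in components))
    for label in sorted(labels):
        nxt = []
        for comp, local in zip(components, state):
            if label not in comp["alphabet"]:
                nxt.append(local)  # component does not participate
            elif (local, label) in comp["delta"]:
                nxt.append(comp["delta"][(local, label)])
            else:
                break  # a participating component blocks this label
        else:
            yield label, tuple(nxt)


def explore(components, initial, heuristic, done):
    """Best-first exploration, adding one frontier transition per step.

    `heuristic(src, label, dst)` returns a priority (lower is expanded first).
    `done(explored)` is a stand-in for the synthesis check; it is an
    assumption of this sketch, not the director check of OTF-DCS.
    Returns the partial exploration structure and the number of
    transitions expanded.
    """
    explored = {}    # state -> list of (label, next_state) added so far
    frontier = []    # priority queue of unexpanded transitions
    tie = count()    # tie-breaker so the heap never compares states

    def push(state):
        explored[state] = []
        for label, nxt in compose_successors(components, state):
            priority = heuristic(state, label, nxt)
            heapq.heappush(frontier, (priority, next(tie), state, label, nxt))

    push(initial)
    expanded = 0
    while frontier and not done(explored):
        _, _, src, label, dst = heapq.heappop(frontier)
        explored[src].append((label, dst))
        expanded += 1
        if dst not in explored:
            push(dst)  # newly discovered state: its transitions join the frontier
    return explored, expanded
```

With a heuristic that ranks promising transitions first, the loop can stop after building only part of the composed plant, which is the effect the statements attribute to good heuristics; with a constant heuristic and a `done` that never succeeds, it degenerates into full exploration.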


Exploration Policies for On-the-Fly Controller Synthesis: A Reinforcement Learning Approach

Delgado, Sánchez Sorondo, Braberman et al. 2023

ICAPS


Controller synthesis is in essence a case of model-based planning for non-deterministic environments in which plans (actually “strategies”) are meant to preserve system goals indefinitely. In the case of supervisory control, environments are specified as the parallel composition of state machines, and valid strategies are required to be “non-blocking” (i.e., always enabling the environment to reach certain marked states) in addition to safe (i.e., keeping the system within a safe zone). Recently, On-the-fly Directed Controller Synthesis techniques were proposed to avoid the exploration of the entire (and exponentially large) environment space, at the cost of non-maximal permissiveness, to either find a strategy or conclude that there is none. The incremental exploration of the plant is currently guided by a domain-independent human-designed heuristic. In this work, we propose a new method for obtaining heuristics based on Reinforcement Learning (RL). The synthesis algorithm is thus framed as an RL task with an unbounded action space and a modified version of DQN is used. With a simple and general set of features that abstracts both states and actions, we show that it is possible to learn heuristics on small versions of a problem that generalize to the larger instances, effectively doing zero-shot policy transfer. Our agents learn from scratch in a highly partially observable RL task and outperform the existing heuristic overall, in instances unseen during training.
