$$L^*$$-Based Learning of Markov Decision Processes

Tappler, Martin; Aichernig, Bernhard K.; Bacci, Giovanni; Eichlseder, Maria; Larsen, Kim Guldstrand

doi:10.1007/978-3-030-30942-8_38

Cited by 25 publications

(22 citation statements)

References 45 publications


“…To the best of our knowledge, $$L^*_{\text{MDP}}$$ is the first $$L^*$$-based algorithm for MDPs that can be implemented via testing. Experimental results and the implementation can be found in the evaluation material for $$L^*_{\text{MDP}}$$ [Tap20].…”

Section: Discussion (mentioning)

confidence: 99%


L∗-based learning of Markov decision processes (extended version)

et al. 2021

Self Cite


Automata learning techniques automatically generate system models from test observations. Typically, these techniques fall into two categories: passive and active. On the one hand, passive learning assumes no interaction with the system under learning and uses a predetermined training set, e.g., system logs. On the other hand, active learning techniques collect training data by actively querying the system under learning, allowing one to steer the discovery of meaningful information about the system under learning leading to effective learning strategies. A notable example of active learning technique for regular languages is Angluin’s $$L^*$$-algorithm. The $$L^*$$-algorithm describes the strategy of a student who learns the minimal deterministic finite automaton of an unknown regular language $$L$$ by asking a succinct number of queries to a teacher who knows $$L$$. In this work, we study $$L^*$$-based learning of deterministic Markov decision processes, a class of Markov decision processes where an observation following an action uniquely determines a successor state. For this purpose, we first assume an ideal setting with a teacher who provides perfect information to the student. Then, we relax this assumption and present a novel learning algorithm that collects information by sampling execution traces of the system via testing. Experiments performed on an implementation of our sampling-based algorithm suggest that our method achieves better accuracy than state-of-the-art passive learning techniques using the same amount of test observations. In contrast to existing learning algorithms which assume a predefined number of states, our algorithm learns the complete model structure including the state space.
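The determinism condition in the abstract (an observation following an action uniquely determines a successor state) can be illustrated with a minimal sketch. The encoding below is an assumption made for illustration and is not taken from the paper: the function name `is_deterministic`, the dictionary layout, and the example models `coin` and `ambiguous` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical encoding (not from the paper): `transitions` maps
# (state, action) to a list of (probability, observation, successor) triples.
def is_deterministic(transitions):
    """Return True if each (state, action, observation) leads to one successor."""
    successors = defaultdict(set)
    for (state, action), outcomes in transitions.items():
        for _prob, observation, nxt in outcomes:
            successors[(state, action, observation)].add(nxt)
    return all(len(targets) == 1 for targets in successors.values())

# Observing "heads" or "tails" after "flip" identifies the successor state.
coin = {
    ("s0", "flip"): [(0.5, "heads", "s1"), (0.5, "tails", "s2")],
    ("s1", "flip"): [(1.0, "heads", "s1")],
    ("s2", "flip"): [(1.0, "tails", "s2")],
}

# The same observation can lead to two different states: not deterministic.
ambiguous = {
    ("s0", "flip"): [(0.5, "heads", "s1"), (0.5, "heads", "s2")],
}

print(is_deterministic(coin))
print(is_deterministic(ambiguous))
```

Under this encoding, a learner that sees only actions and observations can always track the current state of `coin`, but not of `ambiguous`.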



“…All tables include mean values along with the corresponding standard deviations separated by ±. Experimental results, the examined models, and the implementation of $$L^*_{\text{MDP}}$$ can be found in the evaluation material [Tap20].…”

Section: Methods (mentioning)

confidence: 99%

L∗-based learning of Markov decision processes (extended version)

et al. 2021

Self Cite


“…One easily adapts the example from Section 5 to show that learning probabilistic automata has a similar termination issue. On the positive side, Tappler et al. [26] have shown that deterministic MDPs can be learned using an $$L^*$$-based algorithm. The deterministic MDPs in loc. cit.…”

Section: Discussion (mentioning)

confidence: 99%

Learning Weighted Automata over Principal Ideal Domains

Heerdt

2021

Preprint


“…Predictive monitoring has been combined with deep learning [17] and Bayesian inference [22], where the key problem is that the computation of an imminent failure is too expensive to be done exactly. More generally, learning automata models has been motivated with runtime assurance [1,55]. Testing approaches statistically evaluate whether traces are likely to be produced by a given model [25].…”

Section: Related Work (mentioning)

confidence: 99%

Runtime Monitors for Markov Decision Processes

Junges, Torfah, Seshia

2021

Computer Aided Verification


We investigate the problem of monitoring partially observable systems with nondeterministic and probabilistic dynamics. In such systems, every state may be associated with a risk, e.g., the probability of an imminent crash. During runtime, we obtain partial information about the system state in the form of observations. The monitor uses this information to estimate the risk of the (unobservable) current system state. Our results are threefold. First, we show that extensions of state estimation approaches do not scale due to the combination of nondeterminism and probabilities. While exploiting a geometric interpretation of the state estimates improves the practical runtime, this cannot prevent an exponential memory blowup. Second, we present a tractable algorithm based on model checking conditional reachability probabilities. Third, we provide prototypical implementations and manifest the applicability of our algorithms to a range of benchmarks. The results highlight the possibilities and boundaries of our novel algorithms.

