The problem of learning automata from example traces (but no equivalence or membership queries) is fundamental in automata learning theory and practice. In this paper we study this problem for finite state machines with inputs and outputs, and in particular for Moore machines. We develop three algorithms for solving this problem: (1) the PTAP algorithm, which transforms a set of input-output traces into an incomplete Moore machine and then completes the machine with self-loops; (2) the PRPNI algorithm, which uses the well-known RPNI algorithm for automata learning to learn a product of automata encoding a Moore machine; and (3) the MooreMI algorithm, which directly learns a Moore machine using PTAP extended with state merging. We prove that MooreMI has the fundamental identification in the limit property. We also compare the algorithms experimentally in terms of the size of the learned machine and several notions of accuracy, introduced in this paper. Finally, we compare with OSTIA, an algorithm that learns a more general class of transducers, and find that OSTIA generally does not learn a Moore machine, even when fed with a characteristic sample.
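The PTAP construction named above (build an incomplete Moore machine from the traces, then complete it with self-loops) can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that each trace pairs an input word of length n with an output word of length n+1, whose first symbol is the output of the initial state. The names `build_prefix_tree`, `complete_with_self_loops` and `run` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed trace format: (inputs, outputs) with len(outputs) == len(inputs) + 1.
Trace = Tuple[Sequence[str], Sequence[str]]
Delta = Dict[Tuple[int, str], int]   # (state, input symbol) -> next state
Out = Dict[int, str]                 # state -> output symbol


def build_prefix_tree(traces: Sequence[Trace]) -> Tuple[Delta, Out]:
    """Build an incomplete, tree-shaped Moore machine; state 0 is initial."""
    delta: Delta = {}
    out: Out = {}
    num_states = 1  # state 0 always exists

    def label(state: int, output: str) -> None:
        # Two traces must agree on the output of every shared prefix.
        if out.setdefault(state, output) != output:
            raise ValueError("traces are mutually inconsistent")

    for inputs, outputs in traces:
        if len(outputs) != len(inputs) + 1:
            raise ValueError("expected one more output than inputs")
        state = 0
        label(state, outputs[0])
        for symbol, output in zip(inputs, outputs[1:]):
            if (state, symbol) not in delta:
                delta[(state, symbol)] = num_states  # fresh tree node
                num_states += 1
            state = delta[(state, symbol)]
            label(state, output)
    return delta, out


def complete_with_self_loops(delta: Delta, out: Out,
                             alphabet: Sequence[str]) -> Delta:
    """Return a complete transition map: undefined moves become self-loops."""
    completed = dict(delta)
    for state in out:
        for symbol in alphabet:
            completed.setdefault((state, symbol), state)
    return completed


def run(delta: Delta, out: Out, inputs: Sequence[str]) -> List[str]:
    """Output word produced by the machine on the given input word."""
    state = 0
    produced = [out[state]]
    for symbol in inputs:
        state = delta[(state, symbol)]
        produced.append(out[state])
    return produced


traces = [("ab", "010"), ("aa", "011")]
tree, outputs = build_prefix_tree(traces)
machine = complete_with_self_loops(tree, outputs, "ab")
```

By construction the completed machine reproduces every given trace, so it is consistent with the sample. It performs no generalization beyond the self-loops, and its number of states grows with the number of distinct input prefixes in the sample.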
arXiv:1605.07805v2 [cs.FL] 2 Sep 2016

research on grammatical inference [15] which has studied similar, but not exactly the same problems, such as learning deterministic finite automata (DFA), which are special cases of Moore machines with a binary output, or subsequential transducers, which are more general than Moore machines.

Our contributions are the following:

1. We define formally the LMoMIO problem (learning Moore machines from input-output traces). Apart from the correctness criterion of consistency (that the learned machine be consistent with the given traces) we also introduce several performance criteria including size and accuracy of the learned machine, and computational complexity of the learning algorithm.

2. We adapt the notion of characteristic sample, which is known for DFA [15], to the case of Moore machines. Intuitively, a characteristic sample of a machine M is a set of traces which contains enough information to "reconstruct" M. The characteristic sample requirement (CSR) states that, when given as input a characteristic sample, the learning algorithm must produce a machine equivalent to the one that produced the sample. CSR is important, as it ensures identification in the limit: this is a key concept in automata learning theory which ensures that the learning algorithm will eventually learn the right machine when provided with a sufficiently large set of examples [18].

3. We develop three algorithms to solve the LMoMIO problem, and analyze them in terms of computational complexity and other properties. We show that although all three algorithms guarantee consistency, only the most advanced among them, called MooreMI, satisfies the characteristic sample requirement. We also show that MooreMI achieves identification in the limit.

4. We report on a prototype implementation of all three algorithms and experimental results. The experiments show that Moor...