Inferring a decision tree from a given dataset is a classic problem in machine learning. This problem consists of building, from a labelled dataset, a tree where each node corresponds to a class and a path between the tree root and a leaf corresponds to a conjunction of features to be satisfied in this class. Following the principle of parsimony, we want to infer a minimal tree consistent with the dataset. Unfortunately, inferring an optimal decision tree is NP-complete for several definitions of optimality. For this reason, the majority of existing approaches rely on heuristics, and the few existing exact approaches do not work on large datasets. In this paper, we propose a novel approach for inferring an optimal decision tree with a minimum depth based on the incremental generation of Boolean formulas. The experimental results indicate that it scales sufficiently well and the time it takes to run grows slowly with the size of datasets.
Part 2: Test Derivation MethodsInternational audienceThe paper focuses on the problems of passive and active FSM inference as well as checking sequence generation. We consider the setting where an FSM cannot be reset so that its inference is constrained to a single trace either given a priori in passive inference scenario or to be constructed in active inference scenario or aiming at obtaining checking sequence for a given FSM. In each of the last two cases, the expected result is a trace representing a checking sequence for an inferred machine, if it was not given. We demonstrate that this can be achieved by a repetitive use of a procedure that infers an FSM from a given trace (identifying a minimal machine consistent with a trace) avoiding equivalent conjectures. We thus show that FSM inference and checking sequence construction can be seen as two sides of the same coin. Following an existing approach of constructing conjectures by SAT solving, we elaborate first such a procedure and then based on it the methods for obtaining checking sequence for a given FSM and inferring a machine from a black box. The novelty of our approach is that it does not use any state identification facilities. We only assume that we know initially the input set and a bound on the number of states of the machine. Experiments with a prototype implementation of the developed approach using as a backend an existing SAT solver indicate that it scales for FSMs with up to a dozen of states and requires relatively short sequences to identify the machine
The k-undercover Boolean matrix factorization problem aims to approximate a m×n Boolean matrix X as the Boolean product of an m×k and a k×n matrices A◦B such that X is a cover of A◦B, i.e., no representation error is allowed on the 0’s entries of the matrix X. To infer an optimal and “block-optimal” k-undercover, we propose two exact methods based on MaxSAT encodings. From a theoretical standpoint, we prove that our method of inferring “block-optimal” k-undercover is a (1 - 1/e) ≈ 0.632 approximation for the optimal k-undercover problem. From a practical standpoint, experimental results indicate that our “block-optimal” k-undercover algorithm outperforms the state-of-the-art even when compared with algorithms for the more general k-undercover Boolean Matrix Factorization problem for which only minimizing reconstruction error is required.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.