An established method to detect concept drift in data streams is to perform statistical hypothesis testing on the multivariate data in the stream. Statistical decision theory offers rank-based statistics for this task. However, these statistics depend on a fixed set of characteristics of the underlying distribution. Thus, they work well whenever the change in the underlying distribution affects the properties measured by the statistic, but they perform poorly if the drift affects those characteristics only to a small degree. To address this problem, we present three novel drift detection tests whose test statistics are dynamically adapted to the actual data at hand. The first is based on a rank statistic on density estimates for a binary representation of the data, the second compares the average margins of a linear classifier induced by the 1-norm support vector machine (SVM), and the third is based on the average zero-one or sigmoid error rate of an SVM classifier. Experiments show that the margin- and error-based tests outperform the multivariate Wald-Wolfowitz test for concept drift detection. We also show that the tests work even if the drift is gradual and that the new methods are faster than the Wald-Wolfowitz test.
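To make the baseline concrete, here is a minimal sketch of the classic *univariate* Wald-Wolfowitz runs test applied to two windows of a stream (a reference window and a recent window). This is an illustration under stated assumptions: it is the one-dimensional test with the usual normal approximation, not the multivariate variant used as the baseline above, and not one of the three proposed tests. The function name `runs_test_z` is hypothetical.

```python
import math
from typing import Sequence


def runs_test_z(window_a: Sequence[float], window_b: Sequence[float]) -> float:
    """Univariate Wald-Wolfowitz runs test between two samples (illustrative).

    Returns the z-score of the observed number of runs under the normal
    approximation. Strongly negative values (too few runs) suggest that the
    two windows come from different distributions. Ties across the two
    samples are broken by sort order, which is a simplification.
    """
    n1, n2 = len(window_a), len(window_b)
    if n1 == 0 or n2 == 0:
        raise ValueError("both windows must be non-empty")
    # Pool the samples, tagging each value with the window it came from.
    pooled = sorted([(x, 0) for x in window_a] + [(x, 1) for x in window_b])
    labels = [label for _, label in pooled]
    # A run is a maximal block of consecutive values from the same window.
    runs = 1 + sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)
    n = n1 + n2
    mean = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1))
    if var <= 0:
        raise ValueError("windows too small for the normal approximation")
    return (runs - mean) / math.sqrt(var)
```

A drift detector built on this could slide the two windows along the stream and flag drift when the z-score falls below a threshold, e.g. -1.645 for a one-sided test at the 5% level. Because the statistic only looks at how the sorted values interleave, a drift that barely changes that ordering is hard to detect, which is the limitation the abstract's data-adaptive tests target.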
We introduce MiningZinc, a declarative framework for constraint-based data mining. MiningZinc consists of two key components: a language component and an execution mechanism. First, the MiningZinc language allows for high-level and natural modeling of mining problems, so that MiningZinc models are similar to the mathematical definitions used in the literature. It is inspired by the Zinc family of languages and systems and supports user-defined constraints and functions. Second, the MiningZinc execution mechanism specifies how to compute solutions for the models. It is solver independent and supports both standard constraint solvers and specialized data mining systems. The high-level problem specification is first translated into a normalized constraint language (FlatZinc). Rewrite rules are then used to add redundant constraints or to solve subproblems using specialized data mining algorithms or generic constraint programming solvers. Given a model, different execution strategies are automatically extracted that correspond to different sequences of algorithms to run. Optimized data mining algorithms, specialized processing routines and generic solvers can all be automatically combined. Thus, the MiningZinc language allows one to model constraint-based itemset mining problems in a solver-independent way, and its execution mechanism can automatically chain different algorithms and solvers. This leads to a unique combination of declarative modeling with high-performance solving.
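The separation between *what* to mine (a model made of constraints) and *how* to compute it (a solver) can be illustrated with a toy Python analogue. This is not MiningZinc syntax and not its execution mechanism; the names `cover`, `min_support`, `max_size` and `solve` are hypothetical, and the "solver" is naive exhaustive enumeration, which is exponential in the number of items. A real system would instead hand such a model to a specialized miner or a constraint solver.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Sequence

Itemset = FrozenSet[str]
Constraint = Callable[[Itemset], bool]


def cover(itemset: Itemset, transactions: Sequence[Itemset]) -> List[int]:
    """Indices of the transactions that contain every item of the itemset."""
    return [t for t, trans in enumerate(transactions) if itemset <= trans]


def min_support(transactions: Sequence[Itemset], threshold: int) -> Constraint:
    """Constraint: the itemset occurs in at least `threshold` transactions."""
    return lambda s: len(cover(s, transactions)) >= threshold


def max_size(k: int) -> Constraint:
    """Constraint: the itemset contains at most k items."""
    return lambda s: len(s) <= k


def solve(items: Sequence[str], constraints: Sequence[Constraint]) -> List[Itemset]:
    """Naive generic 'solver': enumerate every non-empty itemset and keep
    those that satisfy all constraints of the model."""
    solutions = []
    for size in range(1, len(items) + 1):
        for combo in combinations(sorted(items), size):
            candidate = frozenset(combo)
            if all(c(candidate) for c in constraints):
                solutions.append(candidate)
    return solutions
```

For example, with the transactions `{a,b,c}`, `{a,b}`, `{a,c}` and `{b}`, the model `[min_support(transactions, 2)]` yields `{a}`, `{b}`, `{c}`, `{a,b}` and `{a,c}`. Adding `max_size(1)` changes only the model, not the solver, which is the kind of decoupling the abstract describes.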
Finding small sets of interesting patterns is an important challenge in pattern mining. In this paper, we argue that several well-known approaches that address this challenge are based on performing pairwise comparisons between patterns. Examples include finding closed patterns, free patterns, relevant subgroups and skyline patterns. Although progress has been made on each of these individual problems, a generic approach for solving them (and more) is still lacking. This paper tackles this challenge. It proposes a generic approach for handling pattern mining problems that involve pairwise comparisons between patterns. Our key contributions are the following. First, we propose a novel algebra for programming pattern mining problems. This algebra extends relational algebra towards pattern mining: it allows constraints on individual patterns to be combined generically with dominance relations between patterns. Second, we introduce a modified generic constraint satisfaction system to evaluate these algebraic expressions. Experiments show that this generic approach can effectively identify the patterns expressed in the algebra.
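Closed patterns are a simple instance of the pairwise-comparison view: an itemset is closed if no strict superset has the same support, i.e. if no other pattern *dominates* it. The sketch below filters a set of frequent itemsets by a pluggable dominance relation. It is only an illustration of the idea, not the paper's algebra or its constraint satisfaction system; the helper names `support`, `dominates` and `undominated` are hypothetical. Comparing only among frequent itemsets suffices for closedness, because a superset with equal support is itself frequent.

```python
from typing import Dict, FrozenSet, List, Sequence

Itemset = FrozenSet[str]


def support(itemset: Itemset, transactions: Sequence[Itemset]) -> int:
    """Number of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def dominates(q: Itemset, p: Itemset, supp: Dict[Itemset, int]) -> bool:
    """Dominance for closedness: q is a strict superset of p with equal support."""
    return p < q and supp[p] == supp[q]


def undominated(patterns: Sequence[Itemset], supp: Dict[Itemset, int]) -> List[Itemset]:
    """Keep the patterns that no other pattern dominates (pairwise comparison)."""
    return [p for p in patterns if not any(dominates(q, p, supp) for q in patterns)]
```

With the transactions `{a,b,c}`, `{a,b}`, `{a,c}` and `{b}` and the frequent itemsets `{a}`, `{b}`, `{c}`, `{a,b}`, `{a,c}` (minimum support 2), `{c}` is dominated by `{a,c}` (both have support 2), so the closed itemsets are `{a}`, `{b}`, `{a,b}` and `{a,c}`. Swapping in a different `dominates` function, for example a skyline relation over several measures, reuses the same filtering step, which is the kind of genericity the abstract argues for.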
With more and more large networks becoming available, mining and querying such networks are increasingly important tasks that are not supported by current database models and query languages. This paper aims to alleviate this situation by proposing a data model and a query language that facilitate the analysis of networks. Key features include support for executing external tools on the networks, flexible contexts on the network (each resulting in a different graph), primitives for querying subgraphs (including paths), and primitives for transforming graphs. The data model provides a closure property: the output of every query can be stored in the database and used for further querying.
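The closure property means that every query maps graphs to graphs, so results can be stored and queried again. The following toy sketch illustrates only that property; it is not the paper's data model or query language, and the `Graph` type and the query functions `select_edges` and `reachable_from` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple

Edge = Tuple[str, str]  # directed edge (source, target)


@dataclass(frozen=True)
class Graph:
    nodes: FrozenSet[str]
    edges: FrozenSet[Edge]


def select_edges(g: Graph, keep: Callable[[Edge], bool]) -> Graph:
    """Query: the subgraph formed by the edges satisfying `keep`.
    Nodes not touched by any kept edge are dropped. Returns a Graph."""
    edges = frozenset(e for e in g.edges if keep(e))
    nodes = frozenset(n for e in edges for n in e)
    return Graph(nodes, edges)


def reachable_from(g: Graph, source: str) -> Graph:
    """Query: the subgraph of nodes reachable from `source` along directed
    paths, together with the edges between them. Returns a Graph."""
    seen = {source}
    frontier = [source]
    while frontier:
        u = frontier.pop()
        for a, b in g.edges:
            if a == u and b not in seen:
                seen.add(b)
                frontier.append(b)
    edges = frozenset((a, b) for a, b in g.edges if a in seen and b in seen)
    return Graph(frozenset(seen), edges)
```

Because both queries return a `Graph`, a result can be stored (for instance in a plain dictionary acting as the "database") and passed to the next query: removing the edge `(c, d)` from the path `a -> b -> c -> d` and then asking what is reachable from `a` yields the nodes `a`, `b`, `c` and the edges `(a, b)` and `(b, c)`.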
We present ProbLog2, the state-of-the-art implementation of the probabilistic programming language ProbLog. The ProbLog language allows the user to intuitively build programs that encode not only complex interactions between large sets of heterogeneous components but also the inherent uncertainties present in real-life situations. The system provides efficient algorithms for querying such models as well as for learning their parameters from data. It is available as an online tool on the web and for download. The offline version offers both command-line access to inference and learning and a Python library for building statistical relational learning applications from the system's components.
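To show what "querying such models" computes, consider the small ProbLog program `0.1::burglary. 0.2::earthquake. alarm :- burglary. alarm :- earthquake.` Each probabilistic fact is independently true with its stated probability, and the probability of a query is the total probability of the possible worlds in which it holds. The sketch below computes this by brute-force enumeration of worlds. It is an illustration of the semantics only: it does not use ProbLog's syntax or its Python library, the rules are hand-encoded as a Python predicate, and `query_probability` is a hypothetical name. Enumeration is exponential in the number of facts, which is why an efficient system needs smarter inference.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet

World = FrozenSet[str]  # the set of probabilistic facts that are true


def query_probability(prob_facts: Dict[str, float],
                      query: Callable[[World], bool]) -> float:
    """Sum the probabilities of all possible worlds in which `query` holds.

    Each probabilistic fact is independently true with its probability;
    `query` stands in for the program's rules (hand-encoded here).
    """
    names = sorted(prob_facts)
    total = 0.0
    for choice in product([True, False], repeat=len(names)):
        world = frozenset(n for n, on in zip(names, choice) if on)
        weight = 1.0
        for n, on in zip(names, choice):
            weight *= prob_facts[n] if on else 1.0 - prob_facts[n]
        if query(world):
            total += weight
    return total
```

For the program above, `query_probability({"burglary": 0.1, "earthquake": 0.2}, lambda w: "burglary" in w or "earthquake" in w)` gives 0.28 (up to floating-point rounding), matching 1 - 0.9 * 0.8.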