Identifying the extremal (minimal and maximal) sets from a collection of sets is an important subproblem in the areas of data-mining and satisfiability checking. For example, extremal set finding algorithms are used in the context of mining maximal frequent itemsets, and for simplifying large propositional satisfiability instances derived from real world tasks such as circuit routing and verification. In this paper, we describe two new algorithms for the task and detail their performance on real and synthetic data. Each algorithm leverages an entirely different principleone primarily exploits set cardinality constraints, the other lexicographic constraints. Despite the inherent difficulty of this problem (the best known worst-case bounds are nearly quadratic), we show that both these algorithms provide excellent performance in practice, and can identify all extremal sets from multi-gigabyte itemset data using only a single processor core. Both algorithms are concise and can be implemented in no more than a few hundred lines of code. Our reference C++ implementations are open source and available for download. 1
We propose a demonstration of Cayuga, a complex event monitoring system for high speed data streams. Our demonstration will show Cayuga applied to monitoring Web feeds; the demo will illustrate the expressiveness of the Cayuga query language, the scalability of its query processing engine to high stream rates, and a visualization of the internals of the query processing engine.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.