Item Sets That Compress

Siebes, Arno; Vreeken, Jilles; Leeuwen, Matthijs van

doi:10.1137/1.9781611972764.35

Cited by 125 publications

(163 citation statements)

References 13 publications

Supporting

Mentioning

161

Contrasting

“…In [15] we defined the optimal set of (frequent) item sets as that one whose associated code table minimises the total compressed size:…”

Section: J∈ct F Req(j) (mentioning)

confidence: 99%

“…Rather, in this paper we study the problem for one specific class of models, viz., the code tables induced by our Krimp algorithm [15]. Given all frequent item sets on a table, Krimp selects a small subset of these frequent item sets.…”

Section: Introduction (mentioning)

confidence: 99%

“…For the convenience of the reader we provide a brief introduction to Krimp in this section, it was originally introduced in [15] (although not by that name) and the reader is referred to that paper for more details.…”

Section: Introducing Krimp (mentioning)

confidence: 99%

Mining Databases to Mine Queries Faster

Siebes

Puspitaningrum

2009

Machine Learning and Knowledge Discovery in Databases

Self Cite

“…In Siebes et al (2006) we introduced the krimp algorithm. This MDL-based algorithm picks a few descriptive frequent item sets that compress the data well.…”

mentioning

confidence: 99%

Identifying the components

Leeuwen

Vreeken

Siebes

2009

Data Min Knowl Disc

Self Cite

Most, if not all, databases are mixtures of samples from different distributions. Transactional data is no exception. For the prototypical example, supermarket basket analysis, one also expects a mixture of different buying patterns. Households of retired people buy different collections of items than households with young children. Models that take such underlying distributions into account are in general superior to those that do not. In this paper we introduce two MDL-based algorithms that follow orthogonal approaches to identify the components in a transaction database. The first follows a model-based approach, while the second is data-driven. Both are parameter-free: the number of components and the components themselves are chosen such that the combined complexity of data and models is minimised. Further, neither prior knowledge on the distributions nor a distance metric on the data is required. Experiments with both methods show that highly characteristic components are identified.

“…Keogh et al [17] developed a simple and effective scheme for mining time-series data through compression. Actually, compression or Minimum Description [Length] (MDL) have become the workhorse of many parameter-free algorithms: frequent itemsets [24], biclustering [4,23], time-evolving graph clustering [25], and spatial-clustering [20].…”

Section: Related Work (mentioning)

confidence: 99%

Hierarchical, Parameter-Free Community Discovery

Papadimitriou

Sun

Faloutsos

et al.

Machine Learning and Knowledge Discovery in Databases

Abstract. Given a large bipartite graph (like a document-term or user-product graph), how can we find meaningful communities, quickly, and automatically? We propose to look for community hierarchies, with communities-within-communities. Our proposed method, the Context-specific Cluster Tree (CCT), finds such communities at multiple levels, with no user intervention, based on information-theoretic principles (MDL). More specifically, it partitions the graph into progressively more refined subgraphs, allowing users to quickly navigate from the global, coarse structure of a graph to more focused and local patterns. As a fringe benefit, and also as an additional indication of its quality, it also achieves better compression than typical, non-hierarchical methods. We demonstrate its scalability and effectiveness on real, large graphs.
