Proceedings of the 2012 SIAM International Conference on Data Mining 2012
DOI: 10.1137/1.9781611972825.28

Mining Compressing Sequential Patterns

Abstract: Pattern mining based on data compression has been successfully applied in many data mining tasks. For itemset data, the Krimp algorithm based on the minimum description length (MDL) principle was shown to be very effective in solving the redundancy issue in descriptive pattern mining. However, for sequence data, the redundancy issue of the set of frequent sequential patterns is not fully addressed in the literature. In this article, we study MDL-based algorithms for mining nonredundant sets of sequential patte…

Cited by 40 publications (76 citation statements)
References 12 publications

“…In an attempt to tackle this problem, modern approaches to sequence mining have used the minimum description length (MDL) principle to find the set of sequences that best summarize the data. The GoKrimp algorithm [12] directly mines sequences that best compress a database using a MDL-based approach. The goal of GoKrimp is essentially to cover the database with as few sequences as possible, because the dictionary-based description length that is used by GoKrimp favours encoding schemes that cover more long and frequent subsequences in the database.…”
Section: Related Work (mentioning)
confidence: 99%
“…A natural family of approaches for sequential pattern mining is to mine frequent subsequences [2] or closed frequent subsequences [26], but these suffer from the well-known problem of pattern explosion, that is, the list of frequent subsequences is typically long, highly redundant, and difficult to understand. Recently, researchers have introduced methods to prevent the problem of pattern explosion based on the minimum description length (MDL) principle [12,25]. These methods define an encoding scheme which describes an algorithm for compressing a sequence database based on a library of subsequence patterns, and then search for a set of patterns that lead to the best compression of the database.…”
Section: Introduction (mentioning)
confidence: 99%
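
The idea in the statement above (define an encoding scheme over a library of patterns, then search for the pattern set that compresses the database best) can be illustrated with a minimal sketch. This is not the GoKrimp or SQS encoding: it assumes contiguous patterns only (no gaps), a uniform code-word cost, and a naive greedy search, and every function name here is hypothetical.

```python
import math
from collections import Counter

def encode_length(db, patterns):
    """Number of code words needed to cover db left to right,
    trying longer patterns first and falling back to single symbols."""
    ordered = sorted(patterns, key=len, reverse=True)
    words = 0
    for seq in db:
        i = 0
        while i < len(seq):
            for p in ordered:
                if tuple(seq[i:i + len(p)]) == p:
                    i += len(p)
                    break
            else:
                i += 1  # no pattern matched here: emit one raw symbol
            words += 1
    return words

def description_length(db, patterns):
    """Toy two-part cost in bits: dictionary cost plus data cost.
    Assumption: uniform codes, unlike real MDL encodings."""
    alphabet = {s for seq in db for s in seq}
    symbol_bits = math.log2(len(alphabet))
    word_bits = math.log2(len(alphabet) + len(patterns))
    dict_bits = sum(len(p) for p in patterns) * symbol_bits
    data_bits = encode_length(db, patterns) * word_bits
    return dict_bits + data_bits

def candidates(db, max_len=4):
    """Contiguous subsequences of length 2..max_len seen at least twice."""
    counts = Counter()
    for seq in db:
        for n in range(2, max_len + 1):
            for i in range(len(seq) - n + 1):
                counts[tuple(seq[i:i + n])] += 1
    return [p for p, c in counts.items() if c >= 2]

def greedy_mdl(db, max_len=4):
    """Repeatedly add the candidate that lowers the total cost most;
    stop as soon as no candidate improves it."""
    chosen = []
    best = description_length(db, chosen)
    pool = candidates(db, max_len)
    while pool:
        dl, p = min((description_length(db, chosen + [p]), p) for p in pool)
        if dl >= best:
            break
        chosen.append(p)
        pool.remove(p)
        best = dl
    return chosen, best

db = [list("abcabcabc"), list("abcxabcy"), list("zabcabc")]
patterns, bits = greedy_mdl(db)
print(patterns, round(bits, 2), round(description_length(db, []), 2))
```

In this toy database the repeated block `a b c` pays for its dictionary entry many times over, which is the same trade-off that lets compression-based methods return a short pattern list instead of every frequent subsequence.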
“…Webb and Vreeken () follow a similar evaluation protocol, embedding 15 patterns and report that their method does not return any spurious itemsets but that it does return subsets of embedded itemsets, as well as subsets of those itemsets' unions. Lam et al () used artificial data with patterns generated by five independent parallel processes without noise to compare two pattern set mining techniques, SQS and GoKrimp, and report that while precision@10 is good for both of them, SQS has worse recall. The extensive evaluation in (Zimmermann, ) varied different parameters of a data generator embedding episodes into noise, such as alphabet size, maximal and distributions for temporal gaps, length and number of episodes etc., and evaluated different episode mining techniques' ability to recover the patterns.…”
Section: Matching Results To Reality (mentioning)
confidence: 99%
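
For readers unfamiliar with the two measures named in the statement above, precision@10 and recall against a set of embedded patterns can be computed roughly as follows. This is a minimal sketch that assumes a returned pattern counts as correct only on an exact match with an embedded one; the cited evaluations may use a looser matching rule, and the function names are hypothetical.

```python
def precision_at_k(ranked, truth, k=10):
    """Fraction of the top-k returned patterns that are embedded patterns.
    Assumption: if fewer than k patterns are returned, divide by that count."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for p in top if p in truth) / len(top)

def recall(ranked, truth):
    """Fraction of embedded patterns that appear anywhere in the output."""
    if not truth:
        return 0.0
    return len(set(ranked) & truth) / len(truth)

# Hypothetical ground truth and ranked miner output.
truth = {("a", "b"), ("c", "d"), ("e", "f")}
ranked = [("a", "b"), ("x", "y"), ("c", "d")]

print(precision_at_k(ranked, truth, k=2))  # 0.5
print(recall(ranked, truth))               # 2 of 3 embedded patterns found
```

A method can score well on precision@10 while missing many embedded patterns, which is why the two numbers are reported together.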
“…One way to address this is by mining frequent closed sequences, i.e., those that have no subsequences with the same frequency, such as via the BIDE algorithm [39]. More recently, there has been work on sequence mining that directly addresses the pattern explosion issue, such as SQS-search [36] and GoKrimp algorithm [21]. Our proposed approach falls into this class of probabilistic sequential pattern mining algorithms, and returns patterns that are of a quality that is comparable to, if not better than, both SQS and GoKrimp (see [15] for details).…”
Section: Related Work (mentioning)
confidence: 99%