Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2016
DOI: 10.1145/2939672.2939787

A Subsequence Interleaving Model for Sequential Pattern Mining

Abstract: Recent sequential pattern mining methods have used the minimum description length (MDL) principle to define an encoding scheme which describes an algorithm for mining the most compressing patterns in a database. We present a novel subsequence interleaving model based on a probabilistic model of the sequence database, which allows us to search for the most compressing set of patterns without designing a specific encoding scheme. Our proposed algorithm is able to efficiently mine the most relevant sequential pat…

Citations: cited by 41 publications (52 citation statements)
References: 35 publications
“…Happily, this problem can be approximately solved using a simple greedy algorithm (cf. Algorithm 1 in [15]), and we find that the greedy algorithm works well in practice. The greedy algorithm repeatedly chooses an API pattern S that maximizes the improvement in log probability divided by the number of methods in S that have not yet been explained.…”
Section: Inference (mentioning)
confidence: 75%
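
The selection rule quoted above can be sketched in a few lines. This is a minimal illustration, not the cited algorithm itself: the function name `greedy_cover`, the `gains` dictionary, and the example scores are all hypothetical, and the sketch treats the sequence as a set of items, ignoring ordering and repeated items. In the cited work the improvement in log probability would come from the model, whereas here it is supplied as a fixed number per pattern.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Pattern = Tuple[str, ...]


def greedy_cover(sequence: Sequence[str],
                 gains: Dict[Pattern, float]) -> List[Pattern]:
    """Greedily pick patterns until every item of `sequence` is explained.

    `gains` maps each candidate pattern to an assumed improvement in log
    probability (hypothetical values supplied by the caller).
    """
    unexplained = set(sequence)      # items not yet covered by a chosen pattern
    candidates = dict(gains)         # copy so the caller's dict is untouched
    chosen: List[Pattern] = []

    while unexplained and candidates:
        best: Optional[Pattern] = None
        best_score = float("-inf")
        for pattern, gain in candidates.items():
            newly = len(unexplained.intersection(pattern))
            if newly == 0:
                continue             # pattern explains nothing new
            score = gain / newly     # gain per newly explained item
            if score > best_score:
                best, best_score = pattern, score
        if best is None:
            break                    # no remaining pattern helps
        chosen.append(best)
        unexplained -= set(best)
        del candidates[best]
    return chosen


if __name__ == "__main__":
    calls = ["open", "read", "close", "log"]
    scores = {
        ("open", "read", "close"): -1.0,
        ("read", "close"): -1.2,
        ("open",): -0.9,
        ("log",): -0.5,
    }
    print(greedy_cover(calls, scores))
```

With log-probability scores (which are negative), dividing by the number of newly explained items favours patterns that explain more of the sequence at once, which is the usual behaviour of a greedy weighted set cover.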
“…One of the defects in frequent pattern mining is that there are abundant redundant patterns in the very large number of output patterns [22]. As a result, how to effectively reduce redundancy of the output becomes an essential problem of current research [23,13,11,18,16,10,6,8]. Frequent episode mining [14] (FEM for short), as one of the sub-topics of frequent pattern mining, which aims at discovering frequently appeared ordered sets of events from a single symbol (event) sequence, is facing the similar problem as well.…”
Section: Introduction (mentioning)
confidence: 99%
“…10 and 50 patterns of 5 events long 10 times each over an otherwise independent sequence, with a 10% probability of having a gap between consecutive events. To evaluate the ability of SQUISH to discover interleaved and nested patterns, we consider the Parallel database [4]. Each event in this database is generated by five independent parallel processes chosen at random.…”
Section: Methods (mentioning)
confidence: 99%
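
To make the description of the Parallel database concrete, the following sketch generates a sequence in which each event comes from one of several independent processes chosen at random. The statement above gives only that much, so everything else is an assumption made for illustration: the function name `generate_parallel`, the idea that each process cycles through its own fixed list of events, and the event labels.

```python
import random
from typing import List, Sequence


def generate_parallel(processes: Sequence[Sequence[str]],
                      length: int,
                      seed: int = 0) -> List[str]:
    """Interleave events from independent processes chosen uniformly at random.

    Assumption: each process emits its own events in a fixed cyclic order.
    """
    rng = random.Random(seed)            # local RNG for reproducibility
    positions = [0] * len(processes)     # next event index for each process
    events: List[str] = []
    for _ in range(length):
        k = rng.randrange(len(processes))        # pick a process at random
        proc = processes[k]
        events.append(proc[positions[k] % len(proc)])
        positions[k] += 1                        # advance only that process
    return events


if __name__ == "__main__":
    # Five hypothetical processes with five events each, disjoint alphabets.
    procs = [[f"p{i}e{j}" for j in range(5)] for i in range(5)]
    print(generate_parallel(procs, 20, seed=1))
```

Because only the chosen process advances at each step, the events of any one process appear in order but with events from the other processes interleaved between them, which is the kind of structure an interleaving-aware miner is meant to recover.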
“…Recently, Fowkes and Sutton proposed the ISM algorithm [4]. ISM is based on a generative probabilistic model of the sequence database, and uses EM to search for that set of patterns that is most likely to generate the database.…”
Section: Related Work (mentioning)
confidence: 99%