A Subsequence Interleaving Model for Sequential Pattern Mining

Fowkes, Jaroslav; Sutton, Charles

doi:10.1145/2939672.2939787

Cited by 41 publications (52 citation statements)

References 35 publications

“…Happily, this problem can be approximately solved using a simple greedy algorithm (cf. Algorithm 1 in [15]), and we find that the greedy algorithm works well in practice. The greedy algorithm repeatedly chooses an API pattern S that maximizes the improvement in log probability divided by the number of methods in S that have not yet been explained.…”

Section: Inference (mentioning)

confidence: 75%

Parameter-free probabilistic API mining across GitHub

Fowkes, Sutton, 2016

Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering

Self Cite


Existing API mining algorithms can be difficult to use as they require expensive parameter tuning and the returned set of API calls can be large, highly redundant and difficult to understand. To address this, we present PAM (Probabilistic API Miner), a near parameter-free probabilistic algorithm for mining the most interesting API call patterns. We show that PAM significantly outperforms both MAPO and UPMiner, achieving 69% test-set precision, at retrieving relevant API call sequences from GitHub. Moreover, we focus on libraries for which the developers have explicitly provided code examples, yielding over 300,000 LOC of hand-written API example code from the 967 client projects in the data set. This evaluation suggests that the hand-written examples actually have limited coverage of real API usages.
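
The greedy selection described in the citation statement above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `greedy_explain` and the `gain` callback (standing in for the improvement in log probability) are hypothetical, and patterns are treated as unordered sets of method names for brevity, whereas real API patterns are ordered call sequences.

```python
from typing import Callable, FrozenSet, Iterable, List, Set

Pattern = FrozenSet[str]  # simplification: a pattern is an unordered set of method names


def greedy_explain(
    methods: Set[str],
    candidates: Iterable[Pattern],
    gain: Callable[[Pattern], float],
) -> List[Pattern]:
    """Greedily pick patterns until no candidate explains a new method.

    `gain(p)` is an assumed stand-in for the improvement in log
    probability obtained by adding pattern `p`.
    """
    unexplained = set(methods)
    # Only patterns whose methods all appear in the sequence can explain it.
    remaining = [p for p in candidates if p <= methods]
    chosen: List[Pattern] = []
    while unexplained:
        best, best_score = None, float("-inf")
        for p in remaining:
            newly_explained = len(p & unexplained)
            if newly_explained == 0:
                continue  # pattern adds nothing new
            # Score as stated in the quote: gain per not-yet-explained method.
            score = gain(p) / newly_explained
            if score > best_score:
                best, best_score = p, score
        if best is None:
            break  # nothing left that explains a new method
        chosen.append(best)
        unexplained -= best
        remaining.remove(best)
    return chosen


# Toy usage with made-up gains.
gains = {
    frozenset({"open", "read", "close"}): 3.0,
    frozenset({"open", "close"}): 2.4,
    frozenset({"log"}): 0.5,
    frozenset({"read"}): 0.9,
}
selected = greedy_explain({"open", "read", "close", "log"}, gains, gains.__getitem__)
```

The loop stops either when every method is explained or when no remaining candidate covers an unexplained method, so it always terminates.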


“…One of the defects in frequent pattern mining is that there are abundant redundant patterns in the very large number of output patterns [22]. As a result, how to effectively reduce redundancy of the output becomes an essential problem of current research [23,13,11,18,16,10,6,8]. Frequent episode mining [14] (FEM for short), as one of the sub-topics of frequent pattern mining, which aims at discovering frequently appeared ordered sets of events from a single symbol (event) sequence, is facing the similar problem as well.…”

Section: Introduction (mentioning)

confidence: 99%

Free-Rider Episode Screening via Dual Partition Model

Liu, Huang, et al., 2018

Database Systems for Advanced Applications


One of the drawbacks of frequent episode mining is that overwhelmingly many of the discovered patterns are redundant. Free-rider episode, as a typical example, consists of a real pattern doped with some additional noise events. Because of the possible high support of the inside noise events, such free-rider episodes may have abnormally high support that they cannot be filtered by frequency based framework. An effective technique for filtering free-rider episodes is using a partition model to divide an episode into two consecutive subepisodes and comparing the observed support of such episode with its expected support under the assumption that these two subepisodes occur independently. In this paper, we take more complex subepisodes into consideration and develop a novel partition model named EDP for free-rider episode filtering from a given set of episodes. It combines (1) a dual partition strategy which divides an episode to an underlying real pattern and potential noises; (2) a novel definition of the expected support of a free-rider episode based on the proposed partition strategy. We can deem the episode interesting if the observed support is substantially higher than the expected support estimated by our model. The experiments on synthetic and real-world datasets demonstrate EDP can effectively filter free-rider episodes compared with existing state-of-the-arts.
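
The simpler two-part partition check that the abstract above mentions as the existing technique (not the proposed EDP model) might be sketched as follows. Everything concrete here is an assumption made for illustration: support is taken to be the fraction of fixed-width sliding windows that contain the episode as a subsequence, and the names `occurs`, `support`, and `lift_per_split` are hypothetical.

```python
from typing import List, Sequence, Tuple


def occurs(episode: Sequence[str], window: Sequence[str]) -> bool:
    """True if `episode` appears in `window` in order, gaps allowed."""
    it = iter(window)
    return all(event in it for event in episode)


def support(episode: Sequence[str], events: Sequence[str], width: int) -> float:
    """Assumed definition: fraction of sliding windows containing the episode."""
    windows = [events[i:i + width] for i in range(len(events) - width + 1)]
    if not windows:
        return 0.0
    return sum(occurs(episode, w) for w in windows) / len(windows)


def lift_per_split(
    episode: Sequence[str], events: Sequence[str], width: int
) -> List[Tuple[int, float]]:
    """For each split into two consecutive subepisodes, compare observed
    support with the support expected if the two parts were independent."""
    observed = support(episode, events, width)
    result = []
    for k in range(1, len(episode)):
        expected = support(episode[:k], events, width) * support(episode[k:], events, width)
        if expected == 0:
            continue  # one part never occurs, so the ratio is undefined
        result.append((k, observed / expected))
    return result


# Toy usage on a made-up event sequence.
events = list("abxabyab")
ratios = lift_per_split(("a", "b"), events, width=2)
```

Under these assumptions, a ratio well above 1 for every split would suggest the episode occurs more often than its parts would by chance, while a ratio near or below 1 for some split hints at a free-rider.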


“…10 and 50 patterns of 5 events long 10 times each over an otherwise independent sequence, with a 10% probability of having a gap between consecutive events. To evaluate the ability of SQUISH to discover interleaved and nested patterns, we consider the Parallel database [4]. Each event in this database is generated by five independent parallel processes chosen at random.…”

Section: Methods (mentioning)

confidence: 99%

“…Recently, Fowkes and Sutton proposed the ISM algorithm [4]. ISM is based on a generative probabilistic model of the sequence database, and uses EM to search for that set of patterns that is most likely to generate the database.…”

Section: Related Work (mentioning)

confidence: 99%

Efficiently Summarising Event Sequences with Rich Interleaving Patterns

Bhattacharyya, Vreeken, 2017

Proceedings of the 2017 SIAM International Conference on Data Mining


Discovering the key structure of a database is one of the main goals of data mining. In pattern set mining we do so by discovering a small set of patterns that together describe the data well. The richer the class of patterns we consider, and the more powerful our description language, the better we will be able to summarise the data. In this paper we propose SQUISH, a novel greedy MDL-based method for summarising sequential data using rich patterns that are allowed to interleave. Experiments show SQUISH is orders of magnitude faster than the state of the art, results in better models, as well as discovers meaningful semantics in the form patterns that identify multiple choices of values.

