Finding recent frequent itemsets adaptively over online data streams

Chang, Joong Hyuk; Lee, Won Suk

doi:10.1145/956750.956807

Cited by 228 publications

(90 citation statements)

References 15 publications

“…There has been much work to find frequent itemsets (and their variations) in the off-line setting, often starting from the Apriori [1] and FP-Tree algorithms [21]. These concepts have been adapted to work over streams of data, generating algorithms such as FUP [8] and FP-stream [20]. A limitation of finding frequent itemsets is that the number of possibly frequent itemsets can become very large, meaning that the algorithm either has to track information about many candidates, or else aggressively prune the retained data and risk missing some frequent itemsets.…”

Section: Related Work (mentioning)

confidence: 99%

Conditional heavy hitters: detecting interesting correlations in data streams

et al. 2015

The notion of heavy hitters (items that make up a large fraction of the population) has been successfully used in a variety of applications across sensor and RFID monitoring, network data analysis, event mining, and more. Yet this notion often fails to capture the semantics we desire when we observe data in the form of correlated pairs. Here, we are interested in items that are conditionally frequent: when a particular item is frequent within the context of its parent item. In this work, we introduce and formalize the notion of Conditional Heavy Hitters to identify such items, with applications in network monitoring and Markov chain modeling. We explore the relationship between Conditional Heavy Hitters and other related notions in the literature, and show analytically and experimentally the usefulness of our approach. We introduce several algorithm variations that allow us to efficiently find conditional heavy hitters for input data with very different characteristics, and provide analytical results for their performance. Finally, we perform experimental evaluations with several synthetic and real datasets to demonstrate the efficacy of our methods, and to study the behavior of the proposed algorithms for different types of data.
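As a rough illustration of the "conditionally frequent" idea in the abstract above, the following sketch counts (parent, child) pairs exactly and keeps those whose share of the parent's occurrences reaches a threshold. It is an exact, memory-unbounded baseline written for illustration only, not one of the streaming algorithms the paper proposes; the function name `conditional_heavy_hitters` and the threshold parameter `phi` are assumptions introduced here.

```python
from collections import Counter


def conditional_heavy_hitters(pairs, phi):
    """Return {(parent, child): conditional frequency} for pairs at or above phi.

    Hypothetical exact baseline: it stores a counter for every pair seen,
    which a real streaming algorithm would avoid.
    """
    pair_counts = Counter()
    parent_counts = Counter()
    for parent, child in pairs:
        pair_counts[(parent, child)] += 1
        parent_counts[parent] += 1

    result = {}
    for (parent, child), n in pair_counts.items():
        # Fraction of the parent's occurrences that are followed by this child.
        conditional = n / parent_counts[parent]
        if conditional >= phi:
            result[(parent, child)] = conditional
    return result


if __name__ == "__main__":
    stream = [("x", "a"), ("x", "a"), ("x", "b"), ("y", "c")]
    print(conditional_heavy_hitters(stream, phi=0.6))
```

Note that a pair such as ("y", "c") qualifies even though it occurs only once, because the child accounts for all occurrences of its parent; this is the sense in which the notion differs from plain heavy hitters.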
“…We now compare all the algorithms on the truly sparse synthetic data, for a stream of length 10^8. This data has a much smaller number of conditional heavy hitters compared to the number of parent items.…”

Section: Performance On Sparse Data (mentioning)

confidence: 99%

Conditional heavy hitters: detecting interesting correlations in data streams

et al. 2015

“…Most of the achievements related to frequent itemset mining in stream data [21][22][23][24][25][26][27][28][29][30][31] focus on this issue. In 2002, Datar proposed Ref.…”

Section: General Frequent Itemset Mining (mentioning)

confidence: 99%

Frequent itemset mining over stream data: overview

Niu, Deng, et al. 2013

IET International Conference on Information and Communications Technologies (IETICT 2013)

During the past decade, stream data mining has attracted widespread attention from experts and researchers all over the world, and a large number of interesting research results have been achieved. Among them, frequent itemset mining is one of the main research branches of stream data mining, with a fundamental and significant position. To further advance research on frequent itemset mining, this paper summarizes its main challenges and the corresponding algorithm features. Based on these, current results are divided into two categories: data-based algorithms and task-based algorithms. Following this taxonomy, the methods belonging to the different categories and sub-categories are introduced comprehensively for better understanding. Finally, a brief conclusion is given.

“…It is likely that the embedded knowledge in a data stream will change quickly as time goes by. In order to catch the recent trend of data, the estDec algorithm [2] decayed the old occurrences of each itemset to diminish the effect of old transactions on the mining result of frequent itemsets in the data stream. However, in particular applications, only the frequent patterns mined from the recently arriving data within a fixed time period are of interest.…”

Section: Introduction (mentioning)

confidence: 99%
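The decay idea described in the quotation above can be sketched as follows: before each new transaction is counted, every stored count is multiplied by a factor below 1, so older occurrences contribute less and less. This is a naive illustration of decayed counting under stated assumptions, not the estDec algorithm itself; the function name `decayed_counts`, the `decay` factor, and the `max_size` cap on itemset length are all hypothetical choices made here.

```python
from collections import defaultdict
from itertools import combinations


def decayed_counts(transactions, decay=0.9, max_size=2):
    """Return (decayed itemset counts, decayed transaction total).

    Naive sketch: every stored count is aged on every transaction, which is
    simple to read but costly; it is not how a production miner would do it.
    """
    counts = defaultdict(float)
    total = 0.0
    for transaction in transactions:
        # Age all existing counts so that old occurrences weigh less.
        for itemset in list(counts):
            counts[itemset] *= decay
        total = total * decay + 1.0

        # Count every itemset of size 1..max_size in the new transaction.
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1.0
    return dict(counts), total


if __name__ == "__main__":
    counts, total = decayed_counts([["a", "b"], ["a"], ["a", "b"]], decay=0.5)
    for itemset, count in sorted(counts.items()):
        # Decayed support: the itemset's weight relative to the decayed total.
        print(itemset, count / total)
```

Dividing a decayed count by the decayed total gives a support value that favors recent transactions, which is the effect the quotation attributes to decaying old occurrences.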

“…Although the problem of mining frequent itemsets over data streams has been investigated in the above literature [2][3][8][9][11], the temporal relations among data items were not considered in these studies. Accordingly, it is essential to provide a data structure for maintaining sequential information of items within a sliding window to discover recently repeating patterns in the window.…”

Section: Introduction (mentioning)

confidence: 99%

Incrementally Mining Recently Repeating Patterns over Data Streams

Koh, Chou 2009

New Frontiers in Applied Data Mining

Abstract. Repeating patterns represent temporal relations among data items, which could be used for data summarization and data prediction. More and more data of various applications is generated as a data stream. Where time sensitivity matters, mining repeating patterns from the whole history of a data stream does not extract the current trend of patterns in the stream. Therefore, the traditional strategies for mining repeating patterns on static databases are not applicable to data streams. For this reason, an algorithm, named the appearing-bit-sequence-based incremental mining algorithm, for efficiently discovering recently repeating patterns from a data stream is proposed in this paper. The appearing bit sequences are used to monitor the occurrences of patterns within a sliding window. Two versions of the algorithm are proposed by maintaining the appearing bit sequences of maximum repeating patterns and closed repeating patterns, respectively. Accordingly, the cost of re-mining repeating patterns over a sliding window is reduced to that of monitoring frequency changes of the maintained patterns. The experimental results show that the incremental mining methods perform much better than the re-mining approach.
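One possible reading of "appearing bit sequences within a sliding window" is a fixed-width bit string per item, where bit i records whether the item appeared i positions ago. The sketch below tracks single items only and is an assumption-laden illustration, not the authors' algorithm; the class name `BitWindow` and its methods are hypothetical.

```python
class BitWindow:
    """Track item occurrences over the last `window_size` stream positions.

    Each item maps to an integer used as a bit string: the lowest bit is the
    most recent position. Hypothetical sketch for single items only.
    """

    def __init__(self, window_size):
        self.window_size = window_size
        self.mask = (1 << window_size) - 1  # keeps only the last window_size bits
        self.bits = {}

    def push(self, items):
        """Advance the window by one position containing `items`."""
        present = set(items)
        # Shift every sequence; bits older than the window fall off the top.
        for item in list(self.bits):
            shifted = (self.bits[item] << 1) & self.mask
            if shifted == 0:
                del self.bits[item]  # no occurrence left inside the window
            else:
                self.bits[item] = shifted
        # Mark the current position for the items that just arrived.
        for item in present:
            self.bits[item] = self.bits.get(item, 0) | 1

    def count(self, item):
        """Number of window positions in which `item` appeared."""
        return bin(self.bits.get(item, 0)).count("1")


if __name__ == "__main__":
    window = BitWindow(window_size=3)
    for position in (["a", "b"], ["a"], ["b"], ["a"]):
        window.push(position)
    print(window.count("a"), window.count("b"))
```

Because sliding the window is a shift and a mask, the frequency of a maintained item is updated without rescanning the window, which matches the abstract's point about monitoring frequency changes instead of re-mining.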
