Fast vertical mining using diffsets

Zaki, Mohammed J.; Gouda, Karam

doi:10.1145/956750.956788

Cited by 354 publications

(119 citation statements)

References 25 publications

Mentioning: 118

“…Zaki and Gouda [27] proposed a method that replaces tidsets by diffsets. Diffsets only keep track of the differences in the tids of a candidate pattern from its generated frequent patterns.…”

Section: Improvements For Storing Tidsets Of Itemsets (mentioning)

confidence: 99%

“…Methods that use a hybrid approach: These methods use a vertical data format to compress the database and mine frequent itemsets using a divide-and-conquer strategy. Eclat (Zaki,[26]), dEclat (Zaki & Gouda, [27]), Index-BitTableFI (Song, Yang, & Xu, [15]), DBV-FI (Vo, Hong, & Le, [22]) and MBiS (Nguyen et al, [13]) are some examples. First, a horizontal database is scanned to convert it to a vertical database by creating the tidset (set of transaction IDs) of all items in the database and removing items that do not satisfy the minimum support threshold.…”

Section: Introduction (mentioning)

confidence: 99%
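
The conversion step described in the statement above can be sketched as follows. This is a minimal illustration, not code from any of the cited papers; the function name `to_vertical` and the toy database are assumptions made for the example.

```python
from collections import defaultdict


def to_vertical(transactions, min_support):
    """Build an item -> tidset map from a horizontal database.

    Items whose tidset is smaller than min_support are dropped.
    """
    tidsets = defaultdict(set)
    # One scan of the horizontal database; tids start at 1.
    for tid, items in enumerate(transactions, start=1):
        for item in items:
            tidsets[item].add(tid)
    # The support of a single item is the size of its tidset.
    return {item: tids for item, tids in tidsets.items()
            if len(tids) >= min_support}


# Hypothetical toy database: each entry is one transaction.
transactions = [
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "D"},
    {"B", "C"},
]
vertical = to_vertical(transactions, min_support=2)
print(vertical)
```

With a minimum support of 2, item D occurs in only one transaction and is removed, so only A, B, and C keep their tidsets.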

“…Some studies have improved Eclat. Zaki and Gouda [27] proposed the dEclat algorithm, which uses the diffset concept to replace tidsets. Diffsets only keep track of differences in the tids (transaction IDs) of a candidate pattern from its generated frequent patterns.…”

Section: Introduction (mentioning)

confidence: 99%
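
As a small illustration of the diffset idea quoted above, the sketch below compares a tidset intersection with the corresponding diffset for a 2-itemset. The tidsets and the helper name `diffset` are assumptions chosen for the example.

```python
def diffset(prefix_tids, item_tids):
    """Tids of transactions that contain the prefix but not the extending item."""
    return prefix_tids - item_tids


# Hypothetical tidsets for two items A and B.
t_A = {1, 2, 3, 5}
t_B = {1, 3, 4}

# Tidset approach: store the intersection itself.
t_AB = t_A & t_B

# Diffset approach: store only what A loses when extended with B.
d_AB = diffset(t_A, t_B)

# The support is recovered without materialising the intersection.
support_AB = len(t_A) - len(d_AB)
assert support_AB == len(t_AB)
```

When the items are strongly correlated, the diffset tends to be much smaller than the intersection, which is where the memory saving comes from.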


An efficient algorithm for mining frequent weighted itemsets using interval word segments

Nguyen, Nguyen, et al. (2016). Appl Intell

Mining frequent weighted itemsets (FWIs) from weighted-item transaction databases has recently received research interest. In real-world applications, sparse weighted-item transaction databases (SWITDs) are common. For example, supermarkets have many items, but each transaction has a small number of items. In this paper, we propose an interval word segment (IWS) structure to store and process tidsets for enhancing the effectiveness of mining FWIs from SWITDs. The IWS structure allows the intersection of tidsets between two itemsets to be performed very fast. A map array is proposed for storing a 1-bit index for words. From the map array, 1-bits are mapped to create

“…The dEclat algorithm (Zaki and Gouda, 2003) makes use of the vertical database representation where each item maintains a set of transaction ids where this item is contained. They store the difference of ids, called the diffset, between the candidate itemset and its prefix frequent itemsets, instead of the ids intersection set.…”

Section: Fig 1: An Fp-tree Registers Compressed Frequent Pattern In (mentioning)

confidence: 99%

Using Unique-Prime-Factorization Theorem to Mine Frequent Patterns without Generating Tree

Tohidi (2011). American Journal of Economics and Business Administration

Problem statement: Frequent patterns are patterns that appear frequently in a data set. Finding such frequent patterns plays an essential role in mining associations, correlations and many other interesting relationships among data. Approach: Most previous studies adopt an Apriori-like approach, which may need to generate a huge number of candidate sets for a huge database. An interesting solution is to design an approach that can mine frequent patterns without generating candidates. Results: One such method is frequent-pattern growth, or simply FP-growth, which adopts a divide-and-conquer strategy. However, for a large database, constructing a large tree in memory is time consuming and increases the execution time. In this study we introduce an algorithm that generates frequent patterns without building a tree and therefore improves both time complexity and memory complexity. Our algorithm is based on prime factorization and is called Prime Factor Miner (PFM). Conclusion/Recommendations: This algorithm is able to achieve low memory order at O(1), which is significantly better than FP-growth.

“…The support of the itemset is computed by subtracting the cardinality of the DIFFset from the support of the k -1 frequent itemset's prefix. The performance of the dEclat algorithm has experimentally been shown to be better than that of Eclat [12].…”

Section: Introduction (mentioning)

confidence: 99%
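
The support rule in the statement above can be sketched as a small recursive miner. This is a simplified illustration written for this page, not the implementation from the cited paper: the names `declat` and `mine`, the toy tidsets, and the choice to take root-level diffsets relative to the full set of transaction ids are all assumptions.

```python
def declat(prefix, siblings, min_support, out):
    """Extend `prefix` depth-first.

    siblings: list of (item, diffset, support) tuples that share `prefix`.
    """
    for i, (x, d_x, sup_x) in enumerate(siblings):
        itemset = prefix + (x,)
        out[itemset] = sup_x
        children = []
        for y, d_y, _ in siblings[i + 1:]:
            d_xy = d_y - d_x              # d(PXY) = d(PY) \ d(PX)
            sup_xy = sup_x - len(d_xy)    # sup(PXY) = sup(PX) - |d(PXY)|
            if sup_xy >= min_support:
                children.append((y, d_xy, sup_xy))
        if children:
            declat(itemset, children, min_support, out)


def mine(tidsets, n_transactions, min_support):
    """Mine frequent itemsets from an item -> tidset map using diffsets."""
    all_tids = set(range(1, n_transactions + 1))
    # Root level: the diffset of an item is taken against all transaction ids.
    roots = [(item, all_tids - tids, len(tids))
             for item, tids in sorted(tidsets.items())
             if len(tids) >= min_support]
    out = {}
    declat((), roots, min_support, out)
    return out


# Hypothetical vertical database over 4 transactions.
tidsets = {"A": {1, 2, 3}, "B": {1, 4}, "C": {1, 2, 4}}
frequent = mine(tidsets, n_transactions=4, min_support=2)
print(frequent)
```

Each candidate's support comes from its parent's support minus the size of one set difference, so no tidset intersection is computed below the root level.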

Smart frequent itemsets mining algorithm based on FP-tree and DIFFset data structures

Gatuha, Jiang (2017). Turk J Elec Eng & Comp Sci

Association rule data mining is an important technique for finding important relationships in large datasets. Several frequent itemsets mining techniques have been proposed using a prefix-tree structure, FP-tree, a compressed data structure for database representation. The DIFFset data structure has also been shown to significantly reduce the run time and memory utilization of some data mining algorithms. Experimental results have demonstrated the efficiency of the two data structures in frequent itemsets mining. This work proposes FDM, a new algorithm based on FP-tree and DIFFset data structures for efficiently discovering frequent patterns in data. FDM can adapt its characteristics to efficiently mine long and short patterns from both dense and sparse datasets. Several optimization techniques are also outlined to increase the efficiency of FDM. An evaluation of FDM against three frequent itemset data mining algorithms, dEclat, FP-growth, and FDM* (FDM without optimization), was performed using datasets having both long and short frequent patterns. The experimental results show significant improvement in performance compared to the FP-growth, dEclat, and FDM* algorithms.
