Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2003
DOI: 10.1145/956750.956788

Fast vertical mining using diffsets

Abstract: A number of vertical mining algorithms have been proposed recently for association mining; they have been shown to be very effective and usually outperform horizontal approaches. The main advantage of the vertical format is support for fast frequency counting via intersection operations on transaction ids (tids) and automatic pruning of irrelevant data. The main problem with these approaches arises when intermediate results of vertical tid lists become too large for memory, which limits the scalability of the algorithm. In …

Cited by 354 publications (119 citation statements)
References 25 publications

“…Zaki and Gouda [27] proposed a method that replaces tidsets by diffsets. Diffsets only keep track of the differences in the tids of a candidate pattern from its generated frequent patterns.…”
Section: Improvements For Storing Tidsets Of Itemsets (mentioning)
confidence: 99%
“…Methods that use a hybrid approach: These methods use a vertical data format to compress the database and mine frequent itemsets using a divide-and-conquer strategy. Eclat (Zaki, [26]), dEclat (Zaki & Gouda, [27]), Index-BitTableFI (Song, Yang, & Xu, [15]), DBV-FI (Vo, Hong, & Le, [22]) and MBiS (Nguyen et al., [13]) are some examples. First, a horizontal database is scanned to convert it to a vertical database by creating the tidset (set of transaction IDs) of all items in the database and removing items that do not satisfy the minimum support threshold.…”
Section: Introduction (mentioning)
confidence: 99%
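
The horizontal-to-vertical conversion described in this statement can be sketched as follows. This is a minimal illustration, not code from any of the cited papers: the function name `to_vertical`, the toy `transactions` data, and the use of an absolute count for `min_support` are all assumptions made for the example.

```python
from collections import defaultdict


def to_vertical(transactions, min_support):
    """Convert {tid: items} into {item: tidset}, keeping only frequent items.

    min_support is treated here as an absolute transaction count (assumption).
    """
    tidsets = defaultdict(set)
    # One scan of the horizontal database: record each tid under every item it contains.
    for tid, items in transactions.items():
        for item in items:
            tidsets[item].add(tid)
    # Drop items whose tidset is smaller than the minimum support threshold.
    return {item: tids for item, tids in tidsets.items() if len(tids) >= min_support}


# Hypothetical toy database: transaction id -> set of items.
transactions = {
    1: {"A", "C", "D"},
    2: {"A", "B", "C"},
    3: {"B", "C"},
    4: {"A", "C", "E"},
}

vertical = to_vertical(transactions, min_support=2)
```

In this toy run, items D and E each occur in a single transaction and are removed, so only A, B, and C keep their tidsets.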
“…The dEclat algorithm (Zaki and Gouda, 2003) makes use of the vertical database representation where each item maintains a set of transaction ids where this item is contained. They store the difference of ids, called the diffset, between the candidate itemset and its prefix frequent itemsets, instead of the ids intersection set.…”
Section: Fig 1: An Fp-tree Registers Compressed Frequent Pattern In (mentioning)
confidence: 99%
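
To make the contrast between the two representations concrete, here is a small sketch comparing the tidset intersection with the diffset for a single 2-itemset. The helper `diffset` and the tid values are hypothetical, chosen only so that the two items share most of their transactions.

```python
def diffset(prefix_tids, extension_tids):
    """Tids that contain the prefix but do NOT contain the extension item."""
    return prefix_tids - extension_tids


# Hypothetical tidsets of two frequent items A and C.
t_A = {1, 2, 4, 5, 6}
t_C = {1, 2, 4, 5, 7}

# Tidset approach: store the intersection, i.e. every tid containing both A and C.
t_AC = t_A & t_C

# Diffset approach: store only the tids of A that are lost when C is added.
d_AC = diffset(t_A, t_C)

# The tidset of AC is still recoverable from the prefix tidset and the diffset.
recovered = t_A - d_AC
```

Here the intersection holds four tids while the diffset holds one. The saving depends on how much the items overlap; when they rarely co-occur, the diffset can be the larger of the two.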
“…The support of the itemset is computed by subtracting the cardinality of the DIFFset from the support of the (k-1) frequent itemset's prefix. The performance of the dEclat algorithm has experimentally been shown to be better than that of Eclat [12].…”
Section: Introduction (mentioning)
confidence: 99%
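
The support rule in this statement can be checked on a toy example. The sketch below assumes the usual dEclat recurrence, in which two sibling itemsets PX and PY under a common prefix P yield d(PXY) = d(PY) - d(PX); that recurrence is recalled from the dEclat formulation and is not stated in the excerpt itself. The function name `extend` and all tid values are hypothetical.

```python
def extend(sup_px, d_px, d_py):
    """Return (support, diffset) of PXY from siblings PX and PY sharing prefix P.

    Assumed recurrence: d(PXY) = d(PY) - d(PX); sup(PXY) = sup(PX) - |d(PXY)|.
    """
    d_pxy = d_py - d_px
    return sup_px - len(d_pxy), d_pxy


# Hypothetical tidsets of three frequent items.
t_a = {1, 2, 3, 4, 5, 6}
t_b = {1, 2, 3, 4, 7}
t_c = {1, 2, 3, 5, 8}

# Level 2: diffsets are taken against the tidset of the prefix item A.
d_ab = t_a - t_b                  # tids with A but without B
d_ac = t_a - t_c                  # tids with A but without C
sup_ab = len(t_a) - len(d_ab)
sup_ac = len(t_a) - len(d_ac)

# Level 3: only diffsets and supports are needed, no tidsets.
sup_abc, d_abc = extend(sup_ab, d_ab, d_ac)

# Cross-check against the plain tidset intersection.
expected = len(t_a & t_b & t_c)
```

Subtracting the diffset size from the support of the (k-1)-itemset gives the same count as intersecting the three tidsets directly, which is the property the statement relies on.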