2015
DOI: 10.5120/ijca2015906632

Performance Analysis of Apriori Algorithm with Different Data Structures on Hadoop Cluster

Abstract: Mining frequent itemsets from massive datasets has always been one of the most important problems in data mining. Apriori is the most popular and simplest algorithm for frequent itemset mining. To enhance the efficiency and scalability of Apriori, a number of algorithms have been proposed that address the design of efficient data structures, the minimization of database scans, and parallel and distributed processing. MapReduce is an emerging parallel and distributed technology for processing big datasets on a Hadoop cluster. To mine big …
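For readers unfamiliar with Apriori, here is a minimal sequential sketch of its level-wise idea in Python. It is an illustration only, not the implementation evaluated in the paper; the function name `apriori` and the use of an absolute support count for `min_support` are assumptions. The join step simply unions pairs of frequent (k-1)-itemsets rather than using the classic shared-prefix join. After the subset-pruning step, both approaches yield the same candidates.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every itemset whose
    support count is at least min_support (an absolute count, assumed)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items with one scan of the database.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = list(frequent)
        candidates = set()
        # Join step (simplified): union two frequent (k-1)-itemsets into a k-itemset.
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) != k:
                    continue
                # Prune step: every (k-1)-subset of a candidate must be frequent.
                if all(frozenset(sub) in frequent
                       for sub in combinations(union, k - 1)):
                    candidates.add(union)
        # One database scan per level to count the support of each candidate.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

For example, with the five transactions `{a,b,c}`, `{a,b}`, `{a,c}`, `{b,c}`, `{a,b,c}` and `min_support=3`, every single item and every pair is frequent, but `{a,b,c}` (support 2) is not. The repeated full scans in the counting loop are the cost that data structures and MapReduce parallelism aim to reduce.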

Cited by 18 publications (16 citation statements); references 17 publications.
“…Distributed computing using Hadoop and Spark frameworks became popular because of their parallel processing 42 . Several research works adopted the Hadoop with MapReduce programming engine for frequent itemset mining on big data 43‐45 . Finally, on a practical level, the deployment of our solution in a real setting within the CRM service, would allow us to see the contribution of “optimized” ARs compared to other rules and to show the advantage of using our framework in the decision‐making process.…”
Section: Discussion (mentioning)
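To show how one Apriori counting pass fits the MapReduce model mentioned above, here is an in-process Python simulation. It does not use Hadoop or any Hadoop API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `count_pass`) are hypothetical. Each input split stands in for the data one mapper would read, and the candidate set is assumed to be shared with every mapper.

```python
from collections import defaultdict
from itertools import chain

def map_phase(split, candidates):
    """Mapper (simulated): for each transaction in this input split,
    emit (candidate, 1) for every candidate itemset it contains."""
    for transaction in split:
        t = frozenset(transaction)
        for c in candidates:
            if c <= t:
                yield c, 1

def shuffle(pairs):
    """Group intermediate values by key, as the framework's shuffle/sort would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, min_support):
    """Reducer (simulated): sum counts per candidate and keep frequent ones."""
    frequent = {}
    for key, values in groups.items():
        total = sum(values)
        if total >= min_support:
            frequent[key] = total
    return frequent

def count_pass(splits, candidates, min_support):
    # On a cluster, each split would be handled by a separate mapper task.
    intermediate = chain.from_iterable(map_phase(s, candidates) for s in splits)
    return reduce_phase(shuffle(intermediate), min_support)
```

With splits `[{a,b}, {a,c}]` and `[{a,b,c}, {b,c}, {a,b}]`, candidates `{a,b}`, `{a,c}`, `{b,c}`, and `min_support=3`, only `{a,b}` (count 3) survives. In a real Hadoop job, a combiner would typically pre-sum counts on each mapper to shrink the shuffle.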
“…The nodes of the Trie are simple which may be cached in, and the linear search is faster in cache memory. Singh et al [19] have investigated the performance of the same three data structures in the context of MapReduce-based Apriori on the Hadoop cluster. The authors have shown in their experimental results that Hash Table Trie performed much better than Trie on some datasets while Hash Tree was the worst one.…”
Section: Performance Analysis (mentioning)
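The statement above attributes the Trie's speed to simple nodes whose children are found by linear search. Below is a minimal Python sketch of such a candidate trie used for support counting. It assumes that all candidates in a pass share the same length k and that items are sortable. The class names are hypothetical, and Python objects will not reproduce the cache behaviour described in the quote; the sketch reproduces only the structure.

```python
class TrieNode:
    """Trie node whose children are kept in a plain list and located by
    linear search (no hashing)."""
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = []   # list of (item, TrieNode) pairs
        self.count = None    # support counter; set only where a candidate ends

    def child(self, item):
        for key, node in self.children:   # linear search over children
            if key == item:
                return node
        return None

class CandidateTrie:
    """Stores candidate itemsets as sorted item paths from the root."""

    def __init__(self, candidates):
        self.root = TrieNode()
        for cand in candidates:
            self.insert(tuple(sorted(cand)))

    def insert(self, itemset):
        node = self.root
        for item in itemset:
            nxt = node.child(item)
            if nxt is None:
                nxt = TrieNode()
                node.children.append((item, nxt))
            node = nxt
        node.count = 0

    def count_transaction(self, transaction):
        """Increment the counter of every candidate contained in the transaction."""
        self._walk(self.root, sorted(set(transaction)), 0)

    def _walk(self, node, items, start):
        if node.count is not None:
            node.count += 1
        # Only items after position `start` can extend the current sorted prefix.
        for i in range(start, len(items)):
            nxt = node.child(items[i])
            if nxt is not None:
                self._walk(nxt, items, i + 1)

    def supports(self):
        """Return {itemset_tuple: count} for every stored candidate."""
        out, stack = {}, [(self.root, ())]
        while stack:
            node, prefix = stack.pop()
            if node.count is not None:
                out[prefix] = node.count
            for item, child in node.children:
                stack.append((child, prefix + (item,)))
        return out
```

Because the transaction is sorted and each path follows strictly increasing positions, every contained candidate is reached, and counted, exactly once per transaction.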
“…The data structure perspective to the performance of Spark-based Apriori has not been explored well. Singh et al [19] have evaluated the performance of the Apriori algorithm on the different data structures, but on the Hadoop MapReduce and not on the Spark.…”
Section: Introduction (mentioning)
“…Some researchers use various data structures to improve the efficiency of association rule mining algorithms. Singh [35] tries to use a hash table, hash trie and hash table trie for candidate storage in Apriori MapReduce-based implementation. They find that hash table trie is most efficient than others in MapReduce context while it is not much efficient in a sequential approach.…”
Section: Related Work (mentioning)
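For contrast with a list-based trie, here is a minimal Python sketch of a hash table trie for candidate storage. Each node keeps its children in a hash table (a Python `dict`), so locating a child is a hash lookup rather than a linear scan. The function and class names are hypothetical, and this is not the implementation measured in the cited work. It again assumes that all candidates in a pass have the same length and that items are sortable.

```python
class HashTableTrieNode:
    """Trie node whose children live in a hash table (a dict)."""
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}   # item -> HashTableTrieNode
        self.count = None    # support counter; set only where a candidate ends

def build_hash_table_trie(candidates):
    root = HashTableTrieNode()
    for cand in candidates:
        node = root
        for item in sorted(cand):
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = HashTableTrieNode()
            node = child
        node.count = 0
    return root

def count_transaction(root, transaction):
    """Increment the counter of every stored candidate contained in the transaction."""
    items = sorted(set(transaction))

    def walk(node, start):
        if node.count is not None:
            node.count += 1
        for i in range(start, len(items)):
            child = node.children.get(items[i])   # hash lookup, not a linear scan
            if child is not None:
                walk(child, i + 1)

    walk(root, 0)

def supports(root):
    """Return {itemset_tuple: count} for every stored candidate."""
    out, stack = {}, [(root, ())]
    while stack:
        node, prefix = stack.pop()
        if node.count is not None:
            out[prefix] = node.count
        for item, child in node.children.items():
            stack.append((child, prefix + (item,)))
    return out
```

Only the child lookup differs from a list-based trie. The benefit grows with the number of children per node, while the hash tables add memory overhead per node.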