2020
DOI: 10.1007/s10489-020-01664-w

A general-purpose distributed pattern mining system

Abstract: This paper explores five pattern mining problems and proposes a new distributed framework called DT-DPM: Decomposition Transaction for Distributed Pattern Mining. DT-DPM addresses the limitations of the existing pattern mining problems by reducing the enumeration search space. Thus, it derives the relevant patterns by studying the different correlation among the transactions. It first decomposes the set of transactions into several clusters of different sizes, and then explores heterogeneous architectures, inc…

citations
Cited by 22 publications
(15 citation statements)
references
References 75 publications
“…However, this is not a realistic scenario, and the objective is to decrease the number of items shared by the separated groups. Existing work [5] identified that k-means [41] and DBSCAN [42] can obtain a good performance of transaction decomposition, and k-means showed better results than that of the DBSCAN. Thus, a k-means model is used in the designed framework for transaction decomposition that can group highly relevant transactions in the same group.…”
Section: Proposition (mentioning)
confidence: 99%
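The decomposition idea in the statement above can be illustrated with a minimal sketch. It is not the DT-DPM implementation or the citing paper's code: it assumes transactions are encoded as 0/1 item vectors, uses a plain Lloyd's k-means written with the standard library only, and takes fixed initial centroids (`init_idx`) purely to keep the toy run deterministic. All function names and the toy data are hypothetical.

```python
from itertools import combinations

def encode(transactions, items):
    """Encode each transaction as a 0/1 vector over the item universe."""
    return [[1.0 if it in t else 0.0 for it in items] for t in transactions]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, init_idx, iters=20):
    """Plain Lloyd's k-means; init_idx picks the initial centroids (an assumption)."""
    centroids = [list(vectors[i]) for i in init_idx]
    labels = None
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def shared_items(transactions, labels, k):
    """Items that appear in more than one cluster (the quantity to keep small)."""
    per_cluster = [set() for _ in range(k)]
    for t, l in zip(transactions, labels):
        per_cluster[l] |= set(t)
    shared = set()
    for a, b in combinations(per_cluster, 2):
        shared |= a & b
    return shared

# Hypothetical toy database of six transactions.
transactions = [{"a", "b"}, {"a", "b", "c"}, {"b", "c"},
                {"x", "y"}, {"x", "y", "z"}, {"y", "z", "c"}]
items = sorted(set().union(*transactions))
labels = kmeans(encode(transactions, items), k=2, init_idx=(0, 3))
print(labels)
print(shared_items(transactions, labels, k=2))
```

On this toy data the two clusters overlap only on the item `c`, which is the kind of shared item a decomposition step tries to minimise before each cluster is mined separately.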
“…As there is rapid growth of information technologies regarding machine learning models, Internet of Things (IoT) [1], and edge and cloud computing [2,3], data-driven mining has become an important topic that can be used to extract the meaningful information from the collections of those techniques. Several pattern mining models [4][5][6][7][8][9] have been extensively studied, and the most fundamental knowledge of pattern mining in knowledge discovery in databases (KDD) is called ARM or association rule mining, which is deployed through varied applications and specific domains. Among them, Apriori was presented for finding the association rules set in transactional databases iteratively.…”
Section: Introduction (mentioning)
confidence: 99%
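The iterative, level-wise search that the statement above attributes to Apriori can be sketched as follows. This is a textbook-style illustration of the frequent-itemset stage only (association rules are derived from these itemsets in a later step), not code from any of the cited papers; the function name, the absolute `min_support` threshold, and the toy transactions are assumptions made for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: combine frequent (k-1)-itemsets into size-k candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        # Count step: one pass over the database per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result

# Hypothetical toy database; min_support is an absolute count.
db = [{"bread", "milk"}, {"bread", "butter"},
      {"bread", "milk", "butter"}, {"milk"}]
for itemset, support in sorted(apriori(db, 2).items(),
                               key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

Each level requires a full pass over the transactions, which is one reason decomposition and distribution are attractive for large databases.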
“…It integrates "Density-Based Spatial Clustering of Applications and distributed computing represented, CPU multi-cores and Single CPU for solving pattern mining problems". Performance of any algorithm [14] is also effected by the number of nodes in the distributed data system. With the increase of the transactions or nodes, the performance improves.…”
Section: Related Work (mentioning)
confidence: 99%
“…In the distributed setup, data is generated or created at different locations. The number of transactions at various sites differ [14] from a few hundred to millions of transactions. In this setup, resources at each site is also limited and are distributed. Data mining requires great amounts of resources [27] so technique for flexible distribution of work load amongst sites needs to be developed.…”
Section: Sizebased Assignment (Sba) Technique For Polling Site Assignment (mentioning)
confidence: 99%
“…Although these examples are perfectly capable of fulfilling the sequential pattern mining task, traditional algorithms suffer greatly with runtime and accuracy when dealing with massive datasets [11]. Another drawback of the frequent pattern mining solutions is that their output data are proved to be challenging to interpret and handle, especially when the number of the mined sequences is high, often introducing a new problem to solve [12].…”
Section: Introduction (mentioning)
confidence: 99%