Yaling Xun scite author profile

FiDoop: Parallel Mining of Frequent Itemsets Using MapReduce

Existing parallel mining algorithms for frequent itemsets lack a mechanism that enables automatic parallelization, load balancing, data distribution, and fault tolerance on large clusters. As a solution to this problem, we design a parallel frequent itemsets mining algorithm called FiDoop using the MapReduce programming model. To achieve compressed storage and avoid building conditional pattern bases, FiDoop incorporates the frequent items ultrametric tree (FIU-tree) rather than conventional FP-trees. In FiDoop, three MapReduce jobs are implemented to complete the mining task. In the crucial third MapReduce job, the mappers independently decompose itemsets. The reducers then perform combination operations by constructing small ultrametric trees and mine these trees separately. We implement FiDoop on our in-house Hadoop cluster. We show that FiDoop on the cluster is sensitive to data distribution and dimensions, because itemsets with different lengths have different decomposition and construction costs. To improve FiDoop's performance, we develop a workload balance metric to measure load balance across the cluster's computing nodes. We also develop FiDoop-HD, an extension of FiDoop, to speed up mining for high-dimensional data analysis. Extensive experiments using real-world celestial spectral data demonstrate that our proposed solution is efficient and scalable.

Index Terms: Frequent itemsets, frequent items ultrametric tree (FIU-tree), Hadoop cluster, load balance, MapReduce.
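
The map, shuffle, and reduce flow of a job like the third one can be illustrated with a small in-memory Python sketch. This is not FiDoop. It assumes that mappers decompose each transaction (already filtered to frequent items) into its k-item subsets and that reducers simply sum counts, whereas FiDoop's reducers build and mine FIU-trees. The function names (`map_decompose`, `shuffle`, `reduce_count`, `mine_k_itemsets`) and the absolute `min_support` count are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def map_decompose(transaction, frequent_items, k):
    """Mapper (hypothetical): keep frequent items, emit every k-item subset with a count of 1."""
    items = sorted(set(transaction) & frequent_items)
    for subset in combinations(items, k):
        yield subset, 1


def shuffle(pairs):
    """Group mapper output by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_count(values, min_support):
    """Reducer (hypothetical): sum partial counts; return None if below min_support."""
    total = sum(values)
    return total if total >= min_support else None


def mine_k_itemsets(transactions, min_support, k):
    # Stand-in for an earlier job: count single items to find the frequent ones.
    item_counts = defaultdict(int)
    for t in transactions:
        for item in set(t):
            item_counts[item] += 1
    frequent_items = {i for i, c in item_counts.items() if c >= min_support}

    # Map every transaction, shuffle by itemset, then reduce each group.
    mapped = (pair for t in transactions for pair in map_decompose(t, frequent_items, k))
    results = {}
    for itemset, values in shuffle(mapped).items():
        total = reduce_count(values, min_support)
        if total is not None:
            results[itemset] = total
    return results


transactions = [["a", "b", "c"], ["a", "c"], ["a", "b", "c", "d"], ["b", "d"]]
print(mine_k_itemsets(transactions, min_support=2, k=2))
# prints: {('a', 'b'): 2, ('a', 'c'): 3, ('b', 'c'): 2, ('b', 'd'): 2}
```

In this sketch a transaction with m frequent items yields C(m, k) mapper outputs, which gives one concrete view of why itemsets of different lengths carry different decomposition costs.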


FiDoop-DP: Data Partitioning in Frequent Itemset Mining on Hadoop Clusters

Xun, Zhang, Qin, et al. (2017). IEEE Trans. Parallel Distrib. Syst.


Incremental frequent itemsets mining based on frequent pattern tree and multi-scale

Xun, Cui, Zhang, et al. (2021). Expert Systems with Applications.


Incremental Frequent Itemsets Mining With FCFP Tree

et al. (2019).


Frequent itemset mining (FIM), like other mining techniques, is challenged by large-scale and rapidly expanding datasets. To address this issue, we propose a solution for incremental frequent itemset mining, called FCFPIM, built on a Full Compression Frequent Pattern Tree (FCFP-Tree) and related algorithms. Unlike the FP-tree, the FCFP-Tree maintains complete information about all frequent and infrequent items in the original dataset. This allows FCFPIM to avoid rescanning and recomputing the previously processed original dataset when new data are added and supports change, which saves considerable processing time. Importantly, FCFPIM adopts an effective tree structure adjustment strategy for when the support of some items changes due to the arrival of new data. FCFPIM thus speeds up incremental FIM. Although a tree structure that retains lossless item information consumes space, a compression strategy is used to reduce it. We conducted experiments to evaluate our solution, and the results show that the extra space consumption is worthwhile for the gain in execution efficiency, especially when the support threshold is low.

Index Terms: Frequent itemsets mining, incremental mining, FP-tree, FCFP-tree, association rule.
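
The core idea of keeping infrequent items, so that a new batch never forces a rescan of old data, can be sketched with a plain prefix tree. This is an assumption-laden illustration, not the FCFP-Tree: it inserts items in a fixed lexicographic order (so paths never need restructuring when supports change), uses a relative support threshold, and omits the paper's adjustment and compression strategies. All class and method names are hypothetical.

```python
from collections import defaultdict


class Node:
    """Prefix-tree node: an item, the number of transactions through it, and its children."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


class FullPrefixTree:
    """Hypothetical lossless prefix tree that keeps frequent AND infrequent items."""

    def __init__(self):
        self.root = Node()
        self.item_counts = defaultdict(int)  # count of every item ever seen
        self.num_transactions = 0

    def insert_batch(self, transactions):
        """Add a new batch; previously inserted transactions are never rescanned."""
        for t in transactions:
            node = self.root
            # Fixed lexicographic order (an assumption): paths stay valid when supports change.
            for item in sorted(set(t)):
                self.item_counts[item] += 1
                if item not in node.children:
                    node.children[item] = Node(item)
                node = node.children[item]
                node.count += 1
            self.num_transactions += 1

    def frequent_items(self, min_support_ratio):
        """Recompute frequent items from stored counts under the current threshold."""
        threshold = min_support_ratio * self.num_transactions
        return {i for i, c in self.item_counts.items() if c >= threshold}

    def support(self, itemset):
        """Count transactions containing every item in itemset by walking the tree."""
        targets = sorted(set(itemset))
        if not targets:
            return self.num_transactions

        def walk(node, idx):
            if idx == len(targets):
                return node.count
            total = 0
            for child in node.children.values():
                if child.item == targets[idx]:
                    total += walk(child, idx + 1)
                elif child.item < targets[idx]:
                    # Items increase along a path, so the target may still appear deeper.
                    total += walk(child, idx)
            return total

        return walk(self.root, 0)


tree = FullPrefixTree()
tree.insert_batch([["a", "b"], ["b", "c"], ["a", "b"]])
print(sorted(tree.frequent_items(0.5)))  # prints: ['a', 'b']  (c is infrequent but kept)
tree.insert_batch([["c"], ["a", "c"]])
print(sorted(tree.frequent_items(0.5)))  # prints: ['a', 'b', 'c']
print(tree.support(["a", "b"]))          # prints: 2
```

Item `c` is infrequent after the first batch but is retained, so it becomes frequent after the second batch without the first batch being scanned again. The price of this sketch's lexicographic order is that it generally shares fewer prefixes than a frequency-descending order, so the tree is less compact; that is a property of the sketch, not a claim about the FCFP-Tree.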


A relevant subspace based contextual outlier mining algorithm

Zhang et al. (2016). Knowledge-Based Systems.


For high-dimensional and massive data sets, a relevant subspace based contextual outlier detection algorithm is proposed. First, the relevant subspace, which can effectively describe the local distribution of various data sets, is redefined using the local sparseness of attribute dimensions. Second, a local outlier factor calculation formula in the relevant subspace is defined from the probability density of the local data set; this formula effectively reflects the outlier degree of a data object that does not follow the distribution of the local data set in the relevant subspace. Third, the attribute dimensions constituting the relevant subspace, together with the local outlier factor, are defined as the contextual information, which improves the interpretability and comprehensibility of the outliers. Fourth, the N data objects with the largest local outlier factor values are selected as contextual outliers. Finally, experiments on UCI data sets validate the effectiveness of the algorithm.
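
The four steps can be illustrated with a small Python sketch. The abstract does not give the paper's formulas, so both rules below are stand-ins. A dimension joins an object's relevant subspace when the spread of its k nearest neighbours in that dimension is below `sparsity_ratio` times the global spread. The outlier factor is the sum of squared Gaussian z-scores of the object against its neighbourhood within that subspace, so a larger value means lower local density. `knn`, `contextual_outliers`, `k`, `n`, and `sparsity_ratio` are hypothetical names.

```python
import math
from statistics import mean, pstdev


def knn(data, idx, k):
    """Indices of the k nearest neighbours of data[idx] by Euclidean distance (brute force)."""
    dists = [(math.dist(data[idx], p), j) for j, p in enumerate(data) if j != idx]
    return [j for _, j in sorted(dists)[:k]]


def contextual_outliers(data, k, n, sparsity_ratio=0.5):
    """Return the top-n (score, index, subspace) triples; the rules are stand-ins, not the paper's."""
    dims = range(len(data[0]))
    global_std = [pstdev([p[a] for p in data]) for a in dims]
    scored = []
    for i, point in enumerate(data):
        neigh = [data[j] for j in knn(data, i, k)]
        local_std = [pstdev([q[a] for q in neigh]) for a in dims]
        # Stand-in relevant subspace: dimensions where the neighbourhood is tighter
        # than sparsity_ratio times the global spread.
        subspace = [a for a in dims if local_std[a] < sparsity_ratio * global_std[a]]
        if not subspace:
            continue
        # Stand-in outlier factor: squared z-scores against the neighbourhood, summed over the subspace.
        score = sum(
            ((point[a] - mean([q[a] for q in neigh])) / (local_std[a] + 1e-9)) ** 2
            for a in subspace
        )
        scored.append((score, i, subspace))
    # Largest factor first; the subspace is kept as the outlier's contextual information.
    scored.sort(reverse=True)
    return scored[:n]


data = [(1.0, 0.0), (1.1, 3.0), (0.9, 6.0), (1.0, 9.0), (1.05, 12.0), (5.0, 7.0)]
for score, idx, subspace in contextual_outliers(data, k=2, n=1):
    print(idx, subspace)  # prints: 5 [0, 1]
```

Each result carries its subspace dimensions alongside the score, mirroring the idea of reporting contextual information with each outlier. The brute-force neighbour search is quadratic in the number of objects and only suitable for small examples.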




