Distributed Data Mining (DDM) has become one of the promising areas of Data Mining (DM). DDM evolved from DM out of the need to mine data held at distributed sites. Centralized DM incurs increased computational cost and raises privacy concerns, whereas DDM lowers computational cost and improves data privacy by spreading resources across distributed sites. Mining techniques designed for DM cannot be applied directly to DDM, because DDM follows a different mining strategy. DDM includes classifier-based, agent-based and privacy-preserving approaches. In this paper, DDM approaches and techniques are studied in detail.

Keywords - Distributed Data Mining, distributed sites, computation cost, classifier approach, agent based, privacy-preserving

I. INTRODUCTION

Data Mining (DM) is the process of extracting useful information from datasets using DM techniques such as pattern matching, clustering, rule association and regression. The progressive growth of information technology has paved the way for further exploration of Distributed/Collective Data Mining, Spatial and Geographic Data Mining, Temporal Data Mining, Spatio-Temporal Data Mining, Multimedia Data Mining and Phenomenal Data Mining. DM today performs computation on a database or warehouse at a single geographical location, which increases computation cost and raises questions about data privacy. The future scope of DM is mining data located at different geographical locations; this is termed DDM or CDM (Collective Data Mining) [1]. The main factors that led to the evolution of DDM are privacy of sensitive data, transmission cost, computation cost and memory cost. The objective of DDM is to extract useful information from data located at heterogeneous sites. Distributed computing comprises distributed sites that host computing units at individual heterogeneous points.
DDM follows a decentralized mining strategy, in contrast to the centralized strategy, and makes the entire system scalable by distributing the workload across heterogeneous sites [1]. Alfredo Cuzzocrea [2] stated that framing a methodology for DDM is challenging not only because of the distributed environment, but also because of its requirements for efficient resource sharing and minimized computational complexity. DDM mainly comprises two variations: data distributed and computation distributed. In the former, data is distributed among heterogeneous sites at the local level and computation is hosted at the global level. In the latter, computation is distributed among heterogeneous sites at the local level and data is hosted at the global level. Figure 1 shows the DDM working architecture. The databases at the heterogeneous sites hold useful, previously unknown information. DDM algorithms are applied to the data at each heterogeneous site to produce local models, and the results are then aggregated to form a global model [1]. Kargupta et al. [3] and Zaki et al. [4] discussed that several researchers analyzed the complexity involved in framing methodology for DDM in two ways: analyzing on effective and...
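The local-model and global-model flow described above can be illustrated with a minimal Python sketch. It is not the method of the paper or of the cited works: it assumes a simple itemset-counting task, and the names `local_model`, `global_model`, `site_a` and `site_b`, along with the toy transactions, are hypothetical. Each site counts item pairs in its own data, and only those counts are merged at the global level.

```python
from collections import Counter
from itertools import combinations


def local_model(transactions, size=2):
    """Count itemsets of the given size at one site (hypothetical local model).

    Only the counts and the number of transactions leave the site,
    not the raw transactions themselves.
    """
    counts = Counter()
    for transaction in transactions:
        for itemset in combinations(sorted(set(transaction)), size):
            counts[itemset] += 1
    return counts, len(transactions)


def global_model(local_results, min_support=0.5):
    """Merge per-site counts and keep itemsets that meet the global support."""
    total_counts = Counter()
    total_transactions = 0
    for counts, n in local_results:
        total_counts.update(counts)  # Counter.update adds counts together
        total_transactions += n
    return {
        itemset: count / total_transactions
        for itemset, count in total_counts.items()
        if count / total_transactions >= min_support
    }


# Toy data held at two separate sites (illustrative values only).
site_a = [["bread", "milk"], ["bread", "butter", "milk"], ["butter", "jam"]]
site_b = [["bread", "milk", "jam"], ["milk", "butter"], ["bread", "milk"]]

frequent = global_model(
    [local_model(site_a), local_model(site_b)], min_support=0.5
)
print(frequent)
```

Because every site sends complete counts without pruning locally, the merged supports in this sketch equal what a centralized count over the combined data would give. Practical DDM schemes typically reduce what is transmitted, which this sketch does not attempt.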