Scalable Parallel Clustering for Data Mining on Multicomputers

Foti, D.; Lipari, D A; Pizzuti, Clara; Talia, Domenico

doi:10.1007/3-540-45591-4_51

Cited by 31 publications

(19 citation statements)

References 6 publications


“…Surprisingly our results show that by using the concept of clustering and parallelism, search becomes more cost effective, time effective and the quality of the search becomes accurate. Our results show that this strategy is able to cause efficient performance both in large scale and small scale search engines [16].…”

Section: Input (mentioning)

confidence: 76%


Clustered Distributed Index for Efficient Text Retrieval Using Threads

Basavaraju, Prabhakar

2010

IJGCA


“…developed scalable parallel clustering models for data mining on multi-computers in their research paper in [16]. They designed & implemented on MIMD parallel machines of PAutoClass, a parallel version of the AutoClass system based upon the Bayesian method for determining optimal classes in large datasets.…”

Section: Introduction (mentioning)

confidence: 99%

Clustered Distributed Index for Efficient Text Retrieval Using Threads

Basavaraju, Prabhakar

2010

IJGCA


“…An example of parallel implementation of a clustering algorithm is P-CLUSTER [10]. Other parallel clustering algorithms are discussed in [5], [12], and [7]. In particular, in [7] an SPMD implementation of the AutoClass algorithm, named P-AutoClass is described.…”

Section: Parallel Cluster Analysis (mentioning)

confidence: 99%

“…Other parallel clustering algorithms are discussed in [5], [12], and [7]. In particular, in [7] an SPMD implementation of the AutoClass algorithm, named P-AutoClass is described. The paper shows interesting performance results on distributed memory MIMD machines.…”

Section: Parallel Cluster Analysis (mentioning)

confidence: 99%

Parallelism in Knowledge Discovery Techniques

Talia

2002

Lecture Notes in Computer Science


Abstract. Knowledge discovery in databases or data mining is the semiautomated analysis of large volumes of data, looking for the relationships and knowledge that are implicit in large volumes of data and are 'interesting' in the sense of impacting an organization's practice. Data mining and knowledge discovery on large amounts of data can benefit of the use of parallel computers both to improve performance and quality of data selection. This paper presents and discusses different forms of parallelism that can be exploited in data mining techniques and algorithms. For the main data mining techniques, such as rule induction, clustering algorithms, decision trees, genetic algorithms, and neural networks, the possible ways to exploit parallelism are presented and discussed in detail. Finally, some promising research directions in the parallel data mining research area are outlined.


“…Nevertheless, it is easy to see how distributed formulation of related hill-climbing algorithms such as k-means and k-median clustering [5,7,6] can be adapted to solve distributed FLP. We note, however, that all previous work on distributed clustering assumes tight cooperation and synchronization between the processors containing the data, and a central processor that collects the sufficient statistics needed in each step of the hill-climbing heuristic.…”

Section: Introduction (mentioning)

confidence: 99%

A Local Facility Location Algorithm for Sensor Networks

Krivitski, Schuster, Wolff

2005

Distributed Computing in Sensor Systems


Abstract. In this paper we address a well-known facility location problem (FLP) in a sensor network environment. The problem deals with finding the optimal way to provide service to a (possibly) very large number of clients. We show that a variation of the problem can be solved using a local algorithm. Local algorithms are extremely useful in a sensor network scenario. This is because they allow the communication range of the sensor to be restricted to the minimum, they can operate in routerless networks, and they allow complex problems to be solved on the basis of very little information, gathered from nearby sensors. The local facility location algorithm we describe is entirely asynchronous, seamlessly supports failures and changes in the data during calculation, poses modest memory and computational requirements, and can provide an anytime solution which is guaranteed to converge to the exact same one that would be computed by a centralized algorithm given the entire data.
