2004
DOI: 10.1109/tkde.2004.25

Efficient disk-based K-means clustering for relational databases

Cited by 79 publications (45 citation statements)
References 29 publications
“…How to reduce the number of times the whole dataset is scanned so as to save the computation cost is one of the most important things in all the frequent pattern studies. The similar situation also exists in data clustering and classification studies because the design concept of earlier algorithms, such as mining the patterns on-the-fly [46], mining partial patterns at different stages [47], and reducing the number of times the whole dataset is scanned [32], are therefore presented to enhance the performance of these mining algorithms. Since some of the data mining problems are NP-hard [48] or the solution space is very large, several recent studies [23,49] have attempted to use metaheuristic algorithm as the mining algorithm to get the approximate solution within a reasonable time.…”
Section: Discussion (mentioning)
confidence: 90%
“…Since N is usually much larger than both K and d, the complexity becomes near linear to the number of samples in the data sets. K-means algorithm is effective in clustering large-scale data sets, and efforts have been made in order to overcome its disadvantages [142], [218].…”
Section: ) (mentioning)
confidence: 99%
“…To ensure efficient computation of the contrast measure, we use the one-pass k-means clustering strategy introduced in [23] with k = Q. We obtain Q clusters summarizing the data.…”
Section: Efficient Contrast Computation (mentioning)
confidence: 99%