2014
DOI: 10.4236/jsea.2014.78059

D-IMPACT: A Data Preprocessing Algorithm to Improve the Performance of Clustering

Abstract: In this study, we propose a data preprocessing algorithm called D-IMPACT inspired by the IMPACT clustering algorithm. D-IMPACT iteratively moves data points based on attraction and density to detect and remove noise and outliers, and separate clusters. Our experimental results on two-dimensional datasets and practical datasets show that this algorithm can produce new datasets such that the performance of the clustering algorithm is improved.
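The abstract states the idea only at a high level: points are moved iteratively according to attraction and density, and sparse points are treated as noise. The sketch below illustrates that general idea; it is not the published D-IMPACT update rule. The neighbor-count density, the centroid-based attraction, and the names `preprocess`, `radius`, `min_neighbors`, `step`, and `iterations` are all assumptions made for illustration.

```python
import math

def neighbors(points, i, radius):
    """Indices of points other than i that lie within `radius` of points[i]."""
    return [j for j, q in enumerate(points)
            if j != i and math.dist(points[i], q) <= radius]

def preprocess(points, radius=1.0, min_neighbors=2, step=0.5, iterations=3):
    """Illustrative only: drop sparse points, then pull the rest together.

    Assumed definitions (not taken from the paper):
    - density of a point = number of other points within `radius`
    - attraction target  = centroid of the point's current neighbors
    """
    # Flag low-density points as noise and remove them before shifting.
    density = [len(neighbors(points, i, radius)) for i in range(len(points))]
    noise = [i for i, d in enumerate(density) if d < min_neighbors]
    kept = [tuple(points[i]) for i, d in enumerate(density) if d >= min_neighbors]

    for _ in range(iterations):
        moved = []
        for i, p in enumerate(kept):
            nbrs = neighbors(kept, i, radius)
            if not nbrs:
                moved.append(p)  # nothing nearby attracts this point
                continue
            centroid = [sum(kept[j][k] for j in nbrs) / len(nbrs)
                        for k in range(len(p))]
            # Move a fraction `step` of the way toward the neighbor centroid.
            moved.append(tuple(a + step * (c - a) for a, c in zip(p, centroid)))
        kept = moved  # synchronous update: all points move at once
    return kept, noise

# Two tight groups plus one isolated point.
points = [(0, 0), (0.5, 0), (0, 0.5),
          (10, 10), (10.5, 10), (10, 10.5),
          (5, 5)]
kept, noise = preprocess(points)
```

In this toy input the isolated point is flagged as noise, and each group contracts toward its own center, which widens the relative gap between groups before a clustering algorithm is run on `kept`.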

Cited by 3 publications (2 citation statements)
References 13 publications
“…The main purpose of implementing the big data processing approach in the Online Roadshow is to reduce the data inaccuracy, inconsistency, and noise to provide a quality low-latency clustering outcome for a better user roadshow experience. This is in line with similar works like the D-Impact, a pre-processing data algorithm that achieves higher clustering quality by removing noise and outliers [25]. In leveraging the real-time big data processing technique on the clustering procedure, low-latency personalization solutions, such as real-time web personalization, are made possible.…”
Section: Online Roadshow
Citation type: supporting
confidence: 74%
“…Chameleon clustering results over this data set are the poorest over the five data sets proposed in [71] 4 and this test case was excluded from the final paper [70] reporting Chameleon. Nevertheless, this data set remains an interesting test case for clustering algorithms and has been extensively used for clustering validation [72,66,73,74]. DyClee results for this test case are given in Figure 22, showing the correct detection of six natural clusters, which correspond to the genuine clusters, but including some outlier samples in the natural clusters.…”
Section: Clustering Chameleon Data Sets
Citation type: mentioning
confidence: 99%