2002
DOI: 10.1007/3-540-47887-6_54

Cluster-Based Algorithms for Dealing with Missing Values

Abstract: We first survey existing methods for dealing with missing values and report the results of an experimental comparative evaluation in terms of their processing cost and the quality of their imputed values. We then propose three cluster-based mean-and-mode algorithms to impute missing values. Experimental results show that these algorithms, which have linear complexity, can achieve quality comparable to that of sophisticated algorithms and are therefore applicable to large datasets.
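
To make the idea of cluster-based mean-and-mode imputation concrete, here is a minimal sketch. It is not one of the paper's three algorithms: it assumes the cluster of each row is already known (here from a hypothetical `cluster_key` column), and the function name and row layout (a list of dicts with `None` for missing cells) are illustrative choices. Numeric gaps are filled with the cluster mean and categorical gaps with the cluster mode, using two passes over the data so that the cost stays linear in the number of cells.

```python
from collections import Counter, defaultdict


def cluster_mean_mode_impute(rows, cluster_key, numeric_cols, categorical_cols):
    """Return copies of rows with None cells filled per cluster.

    Numeric columns get the cluster mean, categorical columns the cluster mode.
    Cells stay None when the cluster has no observed value for that column.
    """
    sums = defaultdict(float)     # (cluster, col) -> sum of observed values
    counts = defaultdict(int)     # (cluster, col) -> number of observed values
    freqs = defaultdict(Counter)  # (cluster, col) -> frequency of each value

    # Pass 1: collect per-cluster statistics from the observed cells only.
    for row in rows:
        c = row[cluster_key]
        for col in numeric_cols:
            if row[col] is not None:
                sums[c, col] += row[col]
                counts[c, col] += 1
        for col in categorical_cols:
            if row[col] is not None:
                freqs[c, col][row[col]] += 1

    # Pass 2: fill the missing cells from the statistics of the row's cluster.
    filled = []
    for row in rows:
        c = row[cluster_key]
        new_row = dict(row)  # leave the caller's rows untouched
        for col in numeric_cols:
            if new_row[col] is None and counts[c, col] > 0:
                new_row[col] = sums[c, col] / counts[c, col]
        for col in categorical_cols:
            if new_row[col] is None and freqs[c, col]:
                new_row[col] = freqs[c, col].most_common(1)[0][0]
        filled.append(new_row)
    return filled
```

In practice the cluster assignment could come from a class attribute or from a clustering step run beforehand; that choice is outside this sketch.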

Cited by 29 publications (9 citation statements)
References 3 publications
“…$Faturamento \le 61000 \in Fat_1$; $61000 < Faturamento \le 123000 \in Fat_2$; $123000 < Faturamento \le 377000 \in Fat_3$; $Faturamento > 377000 \in Fat_4$ (2)…”
Section: Attribute Transformation
mentioning
confidence: 99%
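
The interval rule in the statement quoted above maps the numeric attribute Faturamento onto four categories. A minimal sketch of that mapping follows; the thresholds are the ones in the quotation, while the function name `revenue_category` and the string labels are hypothetical. `bisect_left` is used because each interval is closed on its upper bound.

```python
from bisect import bisect_left

# Upper bounds of the first three intervals, taken from the quoted rule.
THRESHOLDS = [61000, 123000, 377000]
LABELS = ["Fat_1", "Fat_2", "Fat_3", "Fat_4"]


def revenue_category(faturamento):
    """Map a Faturamento value to its category label.

    bisect_left counts the thresholds strictly below the value, so a value
    equal to a threshold falls in the lower interval (upper bounds inclusive).
    """
    return LABELS[bisect_left(THRESHOLDS, faturamento)]
```

For example, `revenue_category(123000)` falls in the second interval, and any value above 377000 falls in the fourth.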
“…Techniques such as the nearest-neighbor classifier, Bayesian classifiers, and various statistical techniques cannot handle datasets with missing values, which makes them unusable for certain databases [4]. On the other hand, conventional techniques that can handle databases containing a small number of missing values, such as decision trees, can be used in an attempt to extract knowledge from these databases; however, experiments show that they do not give satisfactory results when the number of missing values is very large.…”
unclassified
“…The soft computing paradigm is to exploit the tolerance for imprecision, uncertainty and partial truth to achieve tractability, robustness and low solution cost [16]. The use of soft computing techniques in missing data imputation presents the major difference of our approach from that presented in [14].…”
Section: Introduction
mentioning
confidence: 98%
“…These common characteristics are derived from auxiliary variables, e.g., age, gender, race, or education degree, whose values are available from the cases to be imputed. Generally, there are two steps in hot deck imputation [14]. First, data are partitioned into several clusters based on certain similarity metric, and each instance with missing data is associated with one of the clusters.…”
Section: Introduction
mentioning
confidence: 99%
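
The quoted statement gives only the first step of hot deck imputation (partitioning the data into clusters and assigning each incomplete instance to one). The sketch below takes the cluster assignment as given and, as an assumption not stated in the quotation, fills each missing value by copying the value of a randomly chosen complete "donor" from the same cluster. The function name, the `cluster_key` column, and the row layout are all hypothetical.

```python
import random
from collections import defaultdict


def hot_deck_impute(rows, cluster_key, col, rng=None):
    """Fill None cells in `col` with a value drawn from a same-cluster donor.

    Assumed second step: a donor is picked uniformly at random among the
    observed values in the cluster. Cells stay None if no donor exists.
    """
    rng = rng or random.Random(0)  # fixed seed for repeatable illustration

    # Collect the observed values of each cluster as the donor pool.
    donors = defaultdict(list)
    for row in rows:
        if row[col] is not None:
            donors[row[cluster_key]].append(row[col])

    filled = []
    for row in rows:
        new_row = dict(row)  # leave the caller's rows untouched
        pool = donors.get(row[cluster_key])
        if new_row[col] is None and pool:
            new_row[col] = rng.choice(pool)
        filled.append(new_row)
    return filled
```

Unlike mean imputation, copying a donor value keeps the imputed values within the set of values actually observed in the cluster.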
“…A clustering based approach for missing data imputation was considered as a local alternative to global estimation [3]. The premise was that instances could be grouped such that all the imputations in identified groups are independent from other groups.…”
Section: Introduction
mentioning
confidence: 99%