2010
DOI: 10.1016/j.patcog.2009.12.003

Selection–fusion approach for classification of datasets with missing values

Abstract: This paper proposes a new approach based on missing value pattern discovery for classifying incomplete data. This approach is particularly designed for classification of datasets with a small number of samples and a high percentage of missing values where available missing value treatment approaches do not usually work well. Based on the pattern of the missing values, the proposed approach finds subsets of samples for which most of the features are available and trains a classifier for each subset. Then, it co…
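The pattern-discovery idea in the abstract can be illustrated with a minimal sketch. It assumes missing entries are encoded as NaN and groups samples that share exactly the same missing-value pattern; the function name `group_by_missing_pattern` and the toy matrix are hypothetical, and the paper's actual subset-selection procedure (which looks for subsets where most features are available) is not reproduced here.

```python
import numpy as np

def group_by_missing_pattern(X):
    """Map each missing-value pattern to the indices of the rows that share it.

    A pattern is a tuple of booleans, True where the feature is missing.
    """
    mask = np.isnan(X)
    groups = {}
    for i, row in enumerate(mask):
        key = tuple(bool(v) for v in row)
        groups.setdefault(key, []).append(i)
    return groups

# Toy data (illustrative only): 4 samples, 3 features, NaN marks a missing value.
X = np.array([
    [1.0, 2.0, np.nan],
    [0.5, 1.5, np.nan],
    [np.nan, 3.0, 4.0],
    [2.0, 1.0, 0.0],
])
patterns = group_by_missing_pattern(X)
for pattern, rows in patterns.items():
    print(pattern, rows)
```

Each group could then serve as the training set of one classifier restricted to the features observed in that group.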

Cited by 33 publications (20 citation statements)
References 24 publications
“…Despite the innovation regarding the use of genetic algorithm to perform data imputation itself, this approach shows some weaknesses considering scalability and accuracy, since the individuals are codified into a m size vector, instead of using subsets/partitions [2,12]. A modified version of these approaches was proposed for multivariate data [8] and it is important to highlight that this method falls into complete-case analysis, since the statistic information used in fitness function are extracted from examples without missing values, thus, important information is lost.…”
Section: Evolutionary Approaches (citation type: mentioning)
confidence: 99%
“…However, due to the missing values in the feature matrix, feature and sample selections can not be performed directly. We thus partition the feature matrix, together with the target output matrix, into submatrices with only complete data (Ghannad-Rezaie et al, 2010), so that a 2-step multi-task learning algorithm (Obozinski et al, 2006; Zhang and Shen, 2012) can be applied to these sub-matrices to obtain a set of discriminative features and samples. The selected features and samples then form a shrunk, but still incomplete, matrix which is more “friendly” to imputation algorithms, as redundant/noisy features and samples have been removed and there are now a smaller number of missing values that need to be imputed.…”
Section: Introduction (citation type: mentioning)
confidence: 99%
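The partitioning step mentioned in this statement can be sketched roughly as follows. This is one plausible reading, not the cited authors' algorithm: for every observed-feature pattern present in the data, it collects all rows in which those features are fully observed, so each resulting block contains no missing values. The name `complete_submatrices`, the NaN encoding of missing entries, and the toy data are assumptions made for illustration.

```python
import numpy as np

def complete_submatrices(X, Y):
    """Split (X, Y) into blocks without missing values.

    Returns a list of (rows, cols, X_block, Y_block) tuples, one per distinct
    observed-feature pattern, ordered by pattern. Blocks may share rows.
    """
    observed = ~np.isnan(X)
    patterns = sorted({tuple(r) for r in observed.tolist()})
    blocks = []
    for pattern in patterns:
        cols = np.flatnonzero(pattern)
        if cols.size == 0:
            continue  # a row with no observed feature yields no usable block
        # Keep every row that has all of these columns observed.
        rows = np.flatnonzero(observed[:, cols].all(axis=1))
        blocks.append((rows, cols, X[np.ix_(rows, cols)], Y[rows]))
    return blocks

# Toy data (illustrative only).
X = np.array([
    [1.0, 2.0, np.nan],
    [0.5, 1.5, np.nan],
    [np.nan, 3.0, 4.0],
    [2.0, 1.0, 0.0],
])
Y = np.array([0, 1, 0, 1])
blocks = complete_submatrices(X, Y)
for rows, cols, X_block, Y_block in blocks:
    print(rows, cols, X_block.shape)
```

Allowing blocks to overlap keeps as many samples as possible per feature subset; a strict partition would instead assign each row to exactly one block.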
“…The underlying idea of ensemble learning for classification problems is to build a number of base classifiers and then combine their outputs using a fusion rule. It has been shown that classifier ensembles outperform single classifiers for a wide range of classification problems [27][28][29]. The reason is that a combination of multiple classifiers reduces risks associated with choosing an insufficient single classifier.…”
Section: Classifier Ensemble (citation type: mentioning)
confidence: 99%
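As a small illustration of combining base-classifier outputs with a fusion rule, the sketch below uses plain majority voting. Majority voting is chosen here only as a common example of a fusion rule; the statement does not say which rule the citing paper uses, and the function name `majority_vote` and the sample predictions are hypothetical.

```python
import numpy as np

def majority_vote(predictions):
    """Fuse label predictions of several base classifiers by majority vote.

    predictions: array-like of shape (n_classifiers, n_samples) holding
    non-negative integer class labels. Ties go to the smallest label.
    """
    predictions = np.asarray(predictions)
    n_classes = int(predictions.max()) + 1
    # counts[c, j] = number of classifiers that voted class c for sample j
    counts = np.stack([(predictions == c).sum(axis=0) for c in range(n_classes)])
    return counts.argmax(axis=0)

# Three hypothetical base classifiers, three samples.
preds = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 1, 1],
]
fused = majority_vote(preds)
print(fused)  # [0 1 1]
```

Other fusion rules, such as weighted voting or averaging class probabilities, fit the same interface with a different aggregation step.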