Combining classification improvements by ensemble processing

Ishii, Naohiro; Tsuchiya, Eisuke; Bao, Yongguang; Yamaguchi, Nobuhiko

doi:10.1109/sera.2005.30

Cited by 10 publications

(7 citation statements)

References 7 publications

“…According to the discussion about the desirable subsets, the number of the features in each subset should be large enough to get a reliable determination of the class label. A lower bound can be obtained using a simple feature reduction technique; for details see [9-10]. …”

Section: Proposed Methods (mentioning)

confidence: 99%

“…Generally speaking, there are five classes of well-established strategies to deal with the missing values: 1) discard the incomplete samples (e.g., pairwise deletion [2]); 2) avoid the missing features by dynamic decisions (e.g., decision trees such as CART [7]); 3) recover unknown values from the similar samples (e.g., Expectation Maximization (EM) [8]); 4) insert possible values for the missing features, classify after each insertion and combine the classification results (e.g., Multiple Imputations (MI) [9]); and 5) design multiple classifiers on the subsets of the data and combine the classification results (e.g., ensemble classifiers [17]).…”

Section: Related Work (mentioning)

confidence: 99%

“…Multiple imputations (MI) method [1,9] is an alternative solution that uses Monte Carlo simulation to generate more than one imputation of the missing values. However, the MI usually implies several assumptions on the data distribution such as joint normality [13] and regression relationships [14].…”

Section: Related Work (mentioning)

confidence: 99%
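
The multiple-imputation strategy described in the statements above (draw several plausible values for the missing features, classify each completed copy, then combine the results) can be illustrated with a minimal sketch. Everything named here is hypothetical: `multiple_imputation_predict`, the `classify` callback, and the choice to draw each missing feature independently from a normal distribution fitted to the observed training values. That independence assumption is a simplification of the distributional assumptions the quoted passage mentions, not the method of any cited paper.

```python
import numpy as np
from collections import Counter


def multiple_imputation_predict(x, X_train, classify, n_imputations=20, seed=0):
    """Classify one sample with missing values (NaN) by Monte Carlo imputation.

    Assumption: each missing feature is drawn independently from a normal
    distribution whose mean and standard deviation are estimated from the
    observed (non-NaN) training values of that feature.
    """
    rng = np.random.default_rng(seed)
    mean = np.nanmean(X_train, axis=0)
    std = np.nanstd(X_train, axis=0)
    missing = np.isnan(x)

    votes = []
    for _ in range(n_imputations):
        completed = x.copy()  # leave the caller's sample untouched
        completed[missing] = rng.normal(mean[missing], std[missing])
        votes.append(classify(completed))

    # Combine the per-imputation decisions by majority vote.
    return Counter(votes).most_common(1)[0][0]


# Usage with a toy classifier that thresholds the first feature.
X_train = np.array([[0.1, 0.2], [0.9, np.nan], [0.8, 0.7], [0.2, 0.1]])
sample = np.array([0.9, np.nan])
label = multiple_imputation_predict(sample, X_train, lambda v: int(v[0] > 0.5))
```

Majority voting is only one way to combine the results; averaging class probabilities across imputations is a common alternative when the classifier provides them.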

Selection–fusion approach for classification of datasets with missing values

Ghannad-Rezaie, Soltanian-Zadeh, Ying, et al. 2010

Pattern Recognition

This paper proposes a new approach based on missing value pattern discovery for classifying incomplete data. This approach is particularly designed for classification of datasets with a small number of samples and a high percentage of missing values where available missing value treatment approaches do not usually work well. Based on the pattern of the missing values, the proposed approach finds subsets of samples for which most of the features are available and trains a classifier for each subset. Then, it combines the outputs of the classifiers. Subset selection is translated into a clustering problem, allowing derivation of a mathematical framework for it. A trade off is established between the computational complexity (number of subsets) and the accuracy of the overall classifier. To deal with this trade off, a numerical criterion is proposed for the prediction of the overall performance. The proposed method is applied to seven datasets from the popular University of California, Irvine data mining archive and an epilepsy dataset from Henry Ford Hospital, Detroit, Michigan (total of eight datasets). Experimental results show that classification accuracy of the proposed method is superior to those of the widely used multiple imputations method and four other methods. They also show that the level of superiority depends on the pattern and percentage of missing values.
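
The core idea in this abstract (train one classifier per subset of samples whose features are available, then combine the outputs) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' method: the feature subsets are supplied by hand rather than discovered by clustering the missing-value patterns, the base classifier is a nearest-centroid rule, and the outputs are fused by majority vote. The names `fit_subset_classifiers` and `predict_by_fusion` are hypothetical.

```python
import numpy as np
from collections import Counter


def fit_subset_classifiers(X, y, feature_subsets):
    """Train one nearest-centroid model per feature subset.

    Each model uses only the training rows in which every feature of its
    subset is observed (no NaN). Assumption: the subsets are given.
    """
    models = []
    for features in feature_subsets:
        cols = list(features)
        rows = ~np.isnan(X[:, cols]).any(axis=1)
        if not rows.any():
            continue  # no complete rows for this subset
        centroids = {
            label: X[rows][y[rows] == label][:, cols].mean(axis=0)
            for label in np.unique(y[rows])
        }
        models.append((cols, centroids))
    return models


def predict_by_fusion(models, x):
    """Majority vote over the models whose features are all observed in x."""
    votes = []
    for cols, centroids in models:
        if np.isnan(x[cols]).any():
            continue  # this model cannot be applied to the sample
        nearest = min(centroids, key=lambda c: np.linalg.norm(x[cols] - centroids[c]))
        votes.append(nearest)
    if not votes:
        return None  # no applicable model
    return Counter(votes).most_common(1)[0][0]


# Toy data: every sample is missing exactly one of three features.
nan = np.nan
X = np.array([[0.0, 0.1, nan], [0.2, nan, 0.0], [nan, 0.0, 0.1],
              [1.0, 0.9, nan], [0.8, nan, 1.0], [nan, 1.0, 0.9]])
y = np.array([0, 0, 0, 1, 1, 1])
models = fit_subset_classifiers(X, y, [(0, 1), (0, 2), (1, 2)])
```

The number of subsets controls the trade-off the abstract mentions: more subsets mean more classifiers to train but more samples that at least one classifier can handle.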

“…In this way, a robust decoding strategy is required to obtain accurate results. Several techniques for the binary decoding step have been proposed in the literature (Windeatt and Ghaderi, 2003) (Ishii et al, 2005) (Passerini et al, 2004) (Dekel and Singer, 2002), though the most common ones are the Hamming and the Euclidean approaches (Windeatt and Ghaderi, 2003). In the work of (Pujol et al, 2006), authors showed that usually the Euclidean distance was more suitable than the traditional Hamming distance in both the binary and the ternary cases.…”

Section: Decoding Designs (mentioning)

confidence: 99%
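
The two decoding approaches named in the statement above, Hamming and Euclidean, can be compared in a small sketch for the binary case. The 4-class, 5-bit `CODE_MATRIX` and the function names are made up for illustration and are not taken from any of the cited papers.

```python
import numpy as np

# Hypothetical binary code matrix: one row (codeword) per class,
# one column per binary classifier.
CODE_MATRIX = np.array([
    [1, 1, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [1, 0, 0, 1, 1],
    [0, 1, 0, 0, 1],
])


def hamming_decode(outputs, code_matrix=CODE_MATRIX, threshold=0.5):
    """Threshold the classifier outputs to bits, then pick the class whose
    codeword differs from them in the fewest positions."""
    bits = (np.asarray(outputs) >= threshold).astype(int)
    distances = (code_matrix != bits).sum(axis=1)
    return int(np.argmin(distances))


def euclidean_decode(outputs, code_matrix=CODE_MATRIX):
    """Pick the class whose codeword is closest to the raw (unthresholded)
    classifier outputs in Euclidean distance."""
    diffs = code_matrix - np.asarray(outputs, dtype=float)
    distances = np.sqrt((diffs ** 2).sum(axis=1))
    return int(np.argmin(distances))


# Soft outputs of the five binary classifiers for one test sample.
outputs = [0.9, 0.1, 0.2, 0.8, 0.7]
hamming_class = hamming_decode(outputs)
euclidean_class = euclidean_decode(outputs)
```

Hamming decoding discards the confidence of each binary classifier when it thresholds, whereas Euclidean decoding keeps it, which is one plausible reason the two can disagree on borderline samples.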

“…In (Windeatt and Ghaderi, 2003), Inverse Hamming Distance (IHD) and Centroid distance (CEN) for binary problems are introduced. Other decoding strategies for nominal, discrete and heterogeneous attributes have been proposed in (Ishii et al, 2005). With the introduction of the zero symbol, Allwein et al (Allwein et al, 2002) show the advantage of using a loss based function of the margin of the base classifier on the ternary ECOC.…”

Section: Introduction (mentioning)

confidence: 99%

Loss-Weighted Decoding for Error-Correcting Output Coding

2008

Proceedings of the Third International Conference on Computer Vision Theory and Applications

The multi-class classification is a challenging problem for several applications in Computer Vision. Error Correcting Output Codes technique (ECOC) represents a general framework capable to extend any binary classification process to the multi-class case. In this work, we present a novel decoding strategy that takes advantage of the ECOC coding to outperform the up to now existing decoding strategies. The novel decoding strategy is applied to the state-of-the-art coding designs, extensively tested on the UCI Machine Learning repository database and in two real vision applications: tissue characterization in medical images and traffic sign categorization. The results show that the presented methodology considerably increases the performance of the traditional ECOC strategies and the state-of-the-art multi-classifiers.