In this paper, a new version of the Supervised t-Stochastic Neighbor Embedding (S-tSNE) algorithm is proposed that uses a dissimilarity measure based on class information. The proposed S-tSNE can be applied to any high-dimensional dataset, either for visualization or for feature extraction in classification problems. In this study, S-tSNE is applied to three datasets: MNIST, Chest X-ray, and SEER Breast Cancer. The two-dimensional data generated by S-tSNE gave better visualizations and higher classification accuracy than the original t-Stochastic Neighbor Embedding (t-SNE) method. A k-nearest neighbors (k-NN) classifier trained on the low-dimensional space generated by S-tSNE achieved, on average, more than 20% higher accuracy than with t-SNE across all three datasets. In addition, the classification accuracy obtained with S-tSNE as a feature extractor was even higher than that obtained from the original high-dimensional data.
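The abstract does not define the class-aware dissimilarity measure. As a minimal sketch only, the code below assumes one form from the supervised manifold learning literature, the dissimilarity used by supervised Isomap (S-Isomap): within-class distances are squashed into [0, 1) and between-class distances are inflated. The function names, the default alpha = 0.5, the `max_ratio` overflow cap, and the choice of beta as the mean squared pairwise distance (S-Isomap instead uses the mean distance) are all illustrative assumptions, not necessarily what the S-tSNE paper uses.

```python
import numpy as np


def pairwise_euclidean(X):
    """Return the (n, n) matrix of Euclidean distances between the rows of X."""
    X = np.asarray(X, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)  # clip tiny negatives caused by round-off
    D = np.sqrt(d2)
    np.fill_diagonal(D, 0.0)
    return D


def supervised_dissimilarity(X, y, alpha=0.5, beta=None, max_ratio=50.0):
    """Class-aware dissimilarity in the S-Isomap style (assumed for illustration).

    same class:      sqrt(1 - exp(-d^2 / beta))      -> lies in [0, 1)
    different class: sqrt(exp(d^2 / beta)) - alpha   -> at least 1 - alpha
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1] to keep dissimilarities non-negative")
    y = np.asarray(y)
    D = pairwise_euclidean(X)
    n = D.shape[0]
    if beta is None:
        # Hypothetical default: mean squared off-diagonal distance, so d^2 / beta is O(1).
        beta = (D ** 2).sum() / (n * (n - 1))
    # Hypothetical safeguard: cap the exponent so exp() cannot overflow for far-apart points.
    ratio = np.minimum(D ** 2 / beta, max_ratio)
    within = np.sqrt(1.0 - np.exp(-ratio))
    between = np.exp(ratio / 2.0) - alpha  # sqrt(exp(r)) == exp(r / 2)
    same = y[:, None] == y[None, :]
    out = np.where(same, within, between)
    np.fill_diagonal(out, 0.0)
    return out
```

The resulting matrix could be passed to any t-SNE implementation that accepts precomputed distances, for example scikit-learn's `TSNE(metric="precomputed", init="random")`. Because the measure needs labels, it is defined only between labelled points; the abstract does not say how unlabelled test points are embedded, so that step is left open here.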
In recent years, a variety of supervised manifold learning techniques have been proposed with the aim of outperforming their unsupervised counterparts in classification accuracy and in capturing data structure. Several of these techniques use dissimilarity measures to guide the dimensionality reduction process. Their good performance has been demonstrated empirically; however, a theoretical analysis of it is still missing. This paper contributes a theoretical analysis of a) how dissimilarity measures affect the preservation of manifold neighbourhood structure and b) how supervised manifold learning techniques can contribute to reducing classification error. This paper also cross-compares supervised and unsupervised manifold learning approaches in terms of structure capture, using Kendall's tau coefficients and co-ranking matrices. Four metrics (three dissimilarity measures and the Euclidean distance) are considered together with manifold learning methods such as Isomap, t-Stochastic Neighbour Embedding (t-SNE), and Laplacian Eigenmaps (LE), on two datasets: Breast Cancer and Swiss Roll. This paper concludes that although the dissimilarity measures used in manifold learning techniques can reduce classification error, they neither learn nor preserve the structure of the hidden manifold in the high-dimensional space well; instead, they destroy the structure of the data. Based on these findings, supervised manifold learning techniques are advisable as a pre-processing step for classification. However, they are not advisable for visualization, since their two-dimensional representations do not improve the preservation of data structure.
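The structure-preservation comparison can be sketched with two quantities. The code below assumes the standard co-ranking matrix, where Q[k, l] counts point pairs whose neighbour rank is k in the input space and l in the embedding, summarised by Q_NX(K), the average fraction of K-nearest neighbours that survive the embedding; and Kendall's tau computed between all pairwise distances in the two spaces. The helper names are hypothetical, the tau variant is tau-a without tie correction, and the brute-force approach uses memory quadratic in the number of point pairs, so it suits only small samples. The paper's exact formulation may differ; `scipy.stats.kendalltau` (tau-b) is the usual choice for larger inputs.

```python
import numpy as np


def _pairwise(X):
    """Full (n, n) Euclidean distance matrix (brute force, small n only)."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def _ranks(D):
    """R[i, j] = rank of j among i's neighbours (1 = nearest); self gets rank 0."""
    n = D.shape[0]
    D = D.copy()
    np.fill_diagonal(D, -np.inf)  # force each point to sort first in its own row
    order = np.argsort(D, axis=1, kind="stable")
    R = np.empty_like(order)
    R[np.arange(n)[:, None], order] = np.arange(n)[None, :]
    return R


def coranking_matrix(X_high, X_low):
    """Q[k-1, l-1] = number of ordered pairs (i, j) with rank k in input, l in embedding."""
    Rh = _ranks(_pairwise(X_high))
    Rl = _ranks(_pairwise(X_low))
    n = Rh.shape[0]
    off = ~np.eye(n, dtype=bool)
    Q = np.zeros((n - 1, n - 1), dtype=np.int64)
    np.add.at(Q, (Rh[off] - 1, Rl[off] - 1), 1)
    return Q


def q_nx(Q, K):
    """Average fraction of K-nearest neighbours preserved, in [0, 1]."""
    n = Q.shape[0] + 1
    return Q[:K, :K].sum() / (K * n)


def kendall_tau_a(X_high, X_low):
    """Kendall's tau-a between the pairwise distances of the two spaces."""
    iu = np.triu_indices(len(X_high), k=1)
    a = _pairwise(X_high)[iu]
    b = _pairwise(X_low)[iu]
    sa = np.sign(a[:, None] - a[None, :])
    sb = np.sign(b[:, None] - b[None, :])
    m = len(a)
    # The full sign matrix counts each unordered pair of distances twice.
    return (sa * sb).sum() / (m * (m - 1))
```

For an embedding that is just a rescaled copy of the input, both scores equal 1; lower values indicate that neighbourhoods or the ordering of distances have been disturbed.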