Dimensionality reduction (DR) methods are applied to extract relevant features from inherently high-dimensional and noisy single-cell RNA sequencing (scRNA-seq) data. The choice of DR method can influence the performance of clustering algorithms and the outcomes of subsequent analyses. We performed a benchmarking study of seven popular DR methods and four clustering algorithms widely used for scRNA-seq data, using three publicly available real scRNA-seq datasets. Performance was evaluated with two clustering metrics, the adjusted Rand index (ARI) and normalized mutual information (NMI). We also compared our results with a similar study published by Xiang and colleagues. Overall, we observed higher ARI and NMI scores for the DR methods than those reported in Xiang's study. We also noted several differences between our findings and Xiang's. Notably, three methods performed consistently well across the three datasets: Independent Component Analysis (ICA), t-Distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP). The linear method ICA was the best performer on the Segerstolpe dataset, while the nonlinear methods UMAP and t-SNE performed best on the Deng and Chu datasets, respectively. The neural network-based methods Variational Autoencoder (VAE) and Deep Count Autoencoder (DCA) performed poorly, probably owing to their sensitivity to hyperparameters and to overfitting. Among the clustering methods, Gaussian Mixture Models (GMMs) performed consistently well across datasets, possibly because GMMs are universal approximators of posterior probability densities. We conclude that the performance of DR methods is largely dataset-dependent: different algorithms are better suited to different scRNA-seq datasets, and there is no one-size-fits-all method.
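Both evaluation metrics compare the cluster labels produced by a DR-plus-clustering pipeline against reference cell-type annotations. As a hedged illustration (not the authors' code), the sketch below computes ARI in its Hubert-Arabie form and NMI from scratch using only the Python standard library. The helper names (`contingency`, `adjusted_rand_index`, `normalized_mutual_information`) are hypothetical, and normalizing NMI by the arithmetic mean of the two label entropies is an assumption, since the abstract does not say which normalization was used. In practice, scikit-learn's `sklearn.metrics.adjusted_rand_score` and `sklearn.metrics.normalized_mutual_info_score` compute the same quantities.

```python
from collections import Counter
from math import comb, log


def contingency(labels_true, labels_pred):
    """Count n_ij: cells with reference label i assigned to cluster j."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    return Counter(zip(labels_true, labels_pred))


def adjusted_rand_index(labels_true, labels_pred):
    """Hubert-Arabie ARI: pair-counting agreement corrected for chance."""
    n = len(labels_true)
    pairs = contingency(labels_true, labels_pred)
    if n < 2:
        return 1.0  # no pairs to compare; treat as perfect agreement
    index = sum(comb(c, 2) for c in pairs.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / comb(n, 2)  # expected index under random labeling
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate partitions, e.g. both all-singletons
        return 1.0
    return (index - expected) / (max_index - expected)


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(labels_true, labels_pred):
    """Mutual information between two labelings, from the contingency table."""
    n = len(labels_true)
    a = Counter(labels_true)
    b = Counter(labels_pred)
    mi = 0.0
    for (i, j), n_ij in contingency(labels_true, labels_pred).items():
        mi += (n_ij / n) * log(n * n_ij / (a[i] * b[j]))
    return mi


def normalized_mutual_information(labels_true, labels_pred):
    """NMI normalized by the arithmetic mean of the two entropies (assumed)."""
    h_true = entropy(labels_true)
    h_pred = entropy(labels_pred)
    if h_true == 0.0 and h_pred == 0.0:
        return 1.0  # both labelings put every cell in one cluster
    mi = mutual_information(labels_true, labels_pred)
    return mi / ((h_true + h_pred) / 2)


if __name__ == "__main__":
    reference = [0, 0, 1, 1]  # e.g. annotated cell types
    clusters = [0, 0, 1, 2]   # e.g. GMM labels computed on a DR embedding
    print(adjusted_rand_index(reference, clusters))            # ~0.5714 (= 4/7)
    print(normalized_mutual_information(reference, clusters))  # ~0.8
```

Both scores are invariant to how cluster IDs are named, so a clustering that recovers the reference groups under different label numbers still scores 1.0. ARI can also be negative when agreement is worse than chance, whereas NMI stays between 0 and 1.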