2021
DOI: 10.1038/s41467-021-26085-2

DUBStepR is a scalable correlation-based feature selection method for accurately clustering single-cell data

Abstract: Feature selection (marker gene selection) is widely believed to improve clustering accuracy, and is thus a key component of single cell clustering pipelines. Existing feature selection methods perform inconsistently across datasets, occasionally even resulting in poorer clustering accuracy than without feature selection. Moreover, existing methods ignore information contained in gene-gene correlations. Here, we introduce DUBStepR (Determining the Underlying Basis using Stepwise Regression), a feature selection…

Cited by 47 publications (34 citation statements)
References 38 publications
“…1 ). We performed de novo clustering on CRC-SG1 epithelial cells using DUBStepR 10 for feature selection, and then re-clustered the cells using differentially expressed genes (DEGs) between the initial clusters (Supplementary Fig. 1 and Methods ).…”
Section: Results
Type: mentioning
confidence: 99%
“…To cluster the 15,920 high-quality epithelial cells described above, we first used DUBStepR, a correlation-based feature selection algorithm that outperforms existing methods across diverse clustering benchmarks, to identify an informative set of genes 10 (Supplementary Fig. 1 ).…”
Section: Methods
Type: mentioning
confidence: 99%
“…While filters are the most common options for pre-processing and feature selection from single-cell transcriptomics data, the application of wrapper methods is gaining much attention with a range of approaches built and extends on classic methods with the primary goal of facilitating downstream analyses such as cell type classification. Some examples include the application of classic methods such as greedy-based optimisation of entropy [ 66 ], nature-inspired optimisation such as using GA [ 67 , 68 ], and their hybrid with filters [ 69 – 71 ] or embedded methods [ 72 ]. More advanced methods include active learning-based feature selection using SVM as a wrapper [ 73 ] and optimisation based on data projection [ 74 ].…”
Section: Feature Selection In the Single-cell Era
Type: mentioning
confidence: 99%
“…Muto et al [ 88 ] performed filter-based differential analysis on both chromatin and gene levels based on Cicero estimated gene activity scores [ 89 ]. Finally, DUBStepR [ 71 ], a hybrid approach that combines a correlation-based filter and a regression-based wrapper for gene selection from scRNA-seq data, can also be applied to scATAC-seq data. Collectively, these methods and tools demonstrate the utility and impact of feature selection on scATAC data for cell-type identification, motif analysis, regulatory element and gene interaction detection among other applications.…”
Section: Feature Selection In the Single-cell Era
Type: mentioning
confidence: 99%