Jinyu Park scite author profile

Feature selection (marker gene selection) is widely believed to improve clustering accuracy, and is thus a key component of single-cell clustering pipelines. Existing feature selection methods perform inconsistently across datasets, occasionally even resulting in poorer clustering accuracy than without feature selection. Moreover, existing methods ignore information contained in gene-gene correlations. Here, we introduce DUBStepR (Determining the Underlying Basis using Stepwise Regression), a feature selection algorithm that leverages gene-gene correlations with a novel measure of inhomogeneity in feature space, termed the Density Index (DI). Despite selecting a relatively small number of genes, DUBStepR substantially outperformed existing single-cell feature selection methods across diverse clustering benchmarks. Additionally, DUBStepR was the only method to robustly deconvolve T and NK heterogeneity by identifying disease-associated common and rare cell types and subtypes in PBMCs from rheumatoid arthritis patients. DUBStepR is scalable to over a million cells, and can be straightforwardly applied to other data types such as single-cell ATAC-seq. We propose DUBStepR as a general-purpose feature selection solution for accurately clustering single-cell data.

scConsensus: combining supervised and unsupervised clustering for cell type identification in single-cell RNA sequencing data

Ranjan

Schmidt

Sun

et al. 2021

BMC Bioinformatics

Background: Clustering is a crucial step in the analysis of single-cell data. Clusters identified in an unsupervised manner are typically annotated to cell types based on differentially expressed genes. In contrast, supervised methods use a reference panel of labelled transcriptomes to guide both clustering and cell type identification. Supervised and unsupervised clustering approaches have their distinct advantages and limitations. Therefore, they can lead to different but often complementary clustering results. Hence, a consensus approach leveraging the merits of both clustering paradigms could result in a more accurate clustering and a more precise cell type annotation. Results: We present scConsensus, an R framework for generating a consensus clustering by (1) integrating results from both unsupervised and supervised approaches and (2) refining the consensus clusters using differentially expressed genes. The value of our approach is demonstrated on several existing single-cell RNA sequencing datasets, including data from sorted PBMC sub-populations. Conclusions: scConsensus combines the merits of unsupervised and supervised approaches to partition cells with better cluster separation and homogeneity, thereby increasing our confidence in detecting distinct cell types. scConsensus is implemented in R and is freely available on GitHub at https://github.com/prabhakarlab/scConsensus.

scConsensus: combining supervised and unsupervised clustering for cell type identification in single-cell RNA sequencing data

Ranjan

Schmidt

Sun

et al. 2020

Preprint

Clustering is a crucial step in the analysis of single-cell data. Clusters identified using unsupervised clustering are typically annotated to cell types based on differentially expressed genes. In contrast, supervised methods use a reference panel of labelled transcriptomes to guide both clustering and cell type identification. Supervised and unsupervised clustering strategies have their distinct advantages and limitations. Therefore, they can lead to different but often complementary clustering results. Hence, a consensus approach leveraging the merits of both clustering paradigms could result in a more accurate clustering and a more precise cell type annotation. We present scConsensus, an R framework for generating a consensus clustering by (i) integrating the results from both unsupervised and supervised approaches and (ii) refining the consensus clusters using differentially expressed (DE) genes. The value of our approach is demonstrated on several existing single-cell RNA sequencing datasets, including data from sorted PBMC sub-populations. scConsensus is freely available on GitHub at https://github.com/prabhakarlab/scConsensus.

DUBStepR: correlation-based feature selection for clustering single-cell RNA sequencing data

Ranjan

Sun

Park

et al. 2020

Preprint

Feature selection (marker gene selection) is widely believed to improve clustering accuracy, and is thus a key component of single-cell clustering pipelines. However, we found that the performance of existing feature selection methods was inconsistent across benchmark datasets, and occasionally even worse than without feature selection. Moreover, existing methods ignored information contained in gene-gene correlations. We therefore developed DUBStepR (Determining the Underlying Basis using Stepwise Regression), a feature selection algorithm that leverages gene-gene correlations with a novel measure of inhomogeneity in feature space, termed the Density Index (DI). Despite selecting a relatively small number of genes, DUBStepR substantially outperformed existing single-cell feature selection methods across diverse clustering benchmarks. In a published scRNA-seq dataset from sorted monocytes, DUBStepR sensitively detected a rare and previously invisible population of contaminating basophils. DUBStepR is scalable to large datasets, and can be straightforwardly applied to other data types such as single-cell ATAC-seq. We propose DUBStepR as a general-purpose feature selection solution for accurately clustering single-cell data.

A case of isodicentric chromosome 15 presented with epilepsy and developmental delay

et al. 2012

We report a case of isodicentric chromosome 15 (idic(15) chromosome), the presence of which resulted in uncontrolled seizures, including epileptic spasms, tonic seizures, and global developmental delay. A 10-month-old female infant was referred to our pediatric neurology clinic because of uncontrolled seizures and global developmental delay. She had generalized tonic-clonic seizures since 7 months of age. At referral, she could not control her head and presented with generalized hypotonia. Her brain magnetic resonance imaging scans and metabolic evaluation results were normal. Routine karyotyping indicated the presence of a supernumerary marker chromosome of unknown origin (47, XX +mar). An array-comparative genomic hybridization (CGH) analysis revealed amplification from 15q11.1 to 15q13.1. Subsequent fluorescence in situ hybridization analysis confirmed an idic(15) chromosome. Array-CGH analysis has the advantage of determining the unknown origin of a supernumerary marker chromosome, and could be a useful method for the genetic diagnosis of epilepsy syndromes associated with various chromosomal aberrations.

DUBStepR is a scalable correlation-based feature selection method for accurately clustering single-cell data
