2019
DOI: 10.1016/j.cels.2018.11.005

Scrublet: Computational Identification of Cell Doublets in Single-Cell Transcriptomic Data

Abstract: Highlights
- We define two multiplet errors in single-cell RNA-seq data: "embedded" and "neotypic"
- Neotypic errors can lead to misidentification of cell types or transitional states
- Scrublet code identifies neotypic doublets and predicts the overall doublet rate
- The algorithm is tested against several experimental methods for labeling multiplets

Citations: Cited by 1,659 publications (973 citation statements)
References: References 30 publications
“…We selected nuclei with at least 1,000 sequenced fragments that displayed high enrichment (>10) in the annotated transcriptional start sites (TSS; Extended Data Figure 2b). We also removed the snATAC-seq profiles likely resulting from potential barcode collision or doublets using a procedure modified from Scrublet 33 (Extended Data Figure 2c, see Methods).…”
Section: Results (mentioning)
confidence: 99%
“…Briefly: total counts were normalized to the median total counts for each cell and highly variable genes selected using the SPRING gene filtering function ("filter_genes") using parameters (90, 3, 3). Putative doublet cells were removed using Scrublet (8% of cells removed) 75. Prior to two-dimensional visualization, the dimensionality of the data was reduced to 40 using principal components analysis (PCA).…”
Section: Mouse Scseq Datasets (mentioning)
confidence: 99%
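
The normalization and dimensionality-reduction steps named in the statement above can be sketched as follows. This is a minimal illustration using NumPy only, not the citing authors' pipeline: the function names `normalize_to_median_total` and `pca_reduce` are hypothetical, the SPRING "filter_genes" step and the Scrublet step are omitted, and PCA is computed with a plain SVD.

```python
import numpy as np

def normalize_to_median_total(counts):
    """Scale each cell (row) so its total equals the median total across cells."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    target = np.median(totals)
    # Guard against division by zero for empty cells (assumption: such cells stay all-zero).
    return counts / np.maximum(totals, 1.0) * target

def pca_reduce(x, n_components=40):
    """Project rows of x onto the top principal components via SVD."""
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Hypothetical toy data: 100 cells x 60 genes of Poisson counts.
rng = np.random.default_rng(0)
toy_counts = rng.poisson(3.0, size=(100, 60))
normalized = normalize_to_median_total(toy_counts)
embedding = pca_reduce(normalized, n_components=40)
```

After normalization every cell has the same total count, so later steps compare relative expression rather than sequencing depth.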
“…Single-cell RNA-seq data from HBC-derived cells from Fletcher et al. and Gadye et al. 37,49, labeled via Krt5-CreER driver mice, were downloaded from GEO at accession GSE99251 using the file "GSE95601_oeHBCdiff_Cufflinks_eSet_counts_table.txt.gz". Processing was performed as described above, including total counts normalization and filtering for highly variable genes using the SPRING gene filtering function "filter_genes" with parameters (75, 20, 10). The resulting data were visualized in SPRING and a subset of cells were removed for quality control, including a cluster of cells with low total counts and another with predominantly reads from ERCC spike-in controls.…”
Section: Mouse Scseq Datasets (mentioning)
confidence: 99%
“…(6) We removed the cells that had total RNA counts lower than 2% quantile or higher than 98% quantile. (7) We removed potential doublets using Scrublet 70. Briefly, principal component analysis (PCA) was used to train a k-nearest neighbor (kNN) classifier to predict a doublet score for each cell.…”
Section: Cell Clustering Analysis Of Merfish Data (mentioning)
confidence: 99%
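
The PCA-plus-kNN doublet score described in the statement above can be illustrated with a simplified sketch. It is not the Scrublet implementation or the citing authors' code: the function name `doublet_scores`, its parameters and defaults, the library-size normalization, and the brute-force neighbor search are all assumptions made for illustration. The sketch simulates doublets by summing random pairs of observed cells, embeds observed and simulated profiles in a shared PCA space, and scores each observed cell by the fraction of its nearest neighbors that are simulated doublets.

```python
import numpy as np

def doublet_scores(counts, n_sim=None, n_pcs=10, k=15, seed=0):
    """Toy kNN doublet score; counts is a cells x genes array of raw counts."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts)
    n_cells = counts.shape[0]
    n_sim = n_sim or 2 * n_cells  # assumed default: twice as many simulated doublets as cells
    # Simulate doublets by summing the counts of random pairs of observed cells.
    pairs = rng.integers(0, n_cells, size=(n_sim, 2))
    sim = counts[pairs[:, 0]] + counts[pairs[:, 1]]
    both = np.vstack([counts, sim]).astype(float)
    # Normalize each profile to the same total, then log-transform.
    totals = both.sum(axis=1, keepdims=True)
    both = np.log1p(both / np.maximum(totals, 1.0) * 1e4)
    # PCA via SVD, with the principal axes fit on observed cells only.
    mean = both[:n_cells].mean(axis=0)
    _, _, vt = np.linalg.svd(both[:n_cells] - mean, full_matrices=False)
    pcs = (both - mean) @ vt[:n_pcs].T
    # Brute-force squared Euclidean distances from observed cells to all profiles.
    sq = (pcs ** 2).sum(axis=1)
    dist = sq[:n_cells, None] + sq[None, :] - 2.0 * pcs[:n_cells] @ pcs.T
    dist[np.arange(n_cells), np.arange(n_cells)] = np.inf  # exclude each cell itself
    nn = np.argpartition(dist, k, axis=1)[:, :k]
    # Score = fraction of the k nearest neighbors that are simulated doublets.
    return (nn >= n_cells).mean(axis=1)

# Hypothetical toy data: 60 cells x 30 genes of Poisson counts.
toy = np.random.default_rng(1).poisson(2.0, size=(60, 30))
scores = doublet_scores(toy, n_pcs=5, k=10)
```

Cells whose neighborhoods are dominated by simulated doublets receive scores near 1. Choosing a score threshold for calling doublets is a separate decision that this sketch does not address.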