2021
DOI: 10.1038/s41587-021-00870-2

Modular, efficient and constant-memory single-cell RNA-seq preprocessing

Abstract: Analysis of single-cell RNA-seq data begins with the pre-processing of reads to generate count matrices. We investigate algorithm choices for the challenges of pre-processing, and describe a workflow that balances efficiency and accuracy. Our workflow is based on the kallisto and bustools programs, and is near-optimal in speed and memory. The workflow is modular, and we demonstrate its flexibility by showing how it can be used for RNA velocity analyses.
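The final step of the workflow the abstract describes turns pseudoaligned reads into a cell-by-gene count matrix. The sketch below shows one way that collapse could work, under stated assumptions. It assumes each read has already been reduced to a BUS-like record of (cell barcode, UMI, equivalence-class id), and that a hypothetical `ec_to_genes` table maps each equivalence class to its set of genes. Records are grouped by (barcode, UMI), and the gene sets of a group are intersected. A UMI is kept only if exactly one gene remains. The `count_matrix` function and this discard-if-ambiguous rule are illustrative assumptions, not the bustools implementation, and barcode correction is omitted. Grouping works in a single pass because the input is sorted by barcode and then UMI. Here `sorted()` runs in memory as a stand-in for an input file that is already sorted.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def count_matrix(records, ec_to_genes):
    """Collapse BUS-like (barcode, umi, ec) records into {barcode: {gene: UMI count}}.

    Illustrative sketch only: the record layout and counting rule are assumptions.
    """
    counts = defaultdict(lambda: defaultdict(int))
    # In-memory sort stands in for an input that is already sorted by (barcode, UMI).
    ordered = sorted(records, key=itemgetter(0, 1))
    for (barcode, _umi), group in groupby(ordered, key=itemgetter(0, 1)):
        genes = None
        for _, _, ec in group:
            ec_genes = set(ec_to_genes[ec])
            # Intersect the gene sets of all records sharing this barcode and UMI.
            genes = ec_genes if genes is None else genes & ec_genes
        # Assumed policy: count the molecule only if it maps to exactly one gene.
        if genes is not None and len(genes) == 1:
            counts[barcode][genes.pop()] += 1
    return {bc: dict(per_gene) for bc, per_gene in counts.items()}


# Hypothetical toy data: two genes, one ambiguous equivalence class (ec 2).
ec_to_genes = {0: {"GeneA"}, 1: {"GeneB"}, 2: {"GeneA", "GeneB"}}
records = [
    ("AAAC", "TTGA", 0),
    ("AAAC", "TTGA", 0),  # second read of the same molecule
    ("AAAC", "CCAT", 2),  # ambiguous between GeneA and GeneB
    ("AAAC", "GGTC", 2),
    ("AAAC", "GGTC", 1),  # intersection with ec 2 leaves only GeneB
    ("TTTG", "TTGA", 1),
]
print(count_matrix(records, ec_to_genes))
```

In this toy input, the two reads of UMI `TTGA` in cell `AAAC` count as one `GeneA` molecule. UMI `CCAT` is dropped because it stays ambiguous. UMI `GGTC` resolves to `GeneB` after intersection. Discarding multi-gene UMIs is only one possible policy; the citation statements below note that Alevin and kallisto also offer an expectation-maximization option for such reads.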


Cited by 332 publications (321 citation statements)
References 50 publications
“…We compared STARsolo performance on simulated and real data with several existing tools: CellRanger [14], Alevin/Alevin-fry [7,10,21] and Kallisto/Bustools [8,9,22]. CellRanger is a de facto standard for analyzing 10X Genomics scRNA-seq data, while Kallisto and Alevin use light-weight alignment-to-transcriptome algorithms which are profoundly different from STAR's alignment to the full genome.…”
Section: Results (mentioning, confidence 99%)
“…To include the multi-gene reads, the expectation-maximization algorithm was introduced in Alevin, and its impact on gene detection and quantification was investigated [7,20]. A similar option was also implemented in Kallisto [8].…”
Section: Simulation With Multi-gene Reads (mentioning, confidence 99%)
“…Coupled with memory efficient methods, like Kallisto Bustools [55], for generating cell-gene count matrices, Scarf represents an end-to-end solution for the analysis of single-cell RNA-Seq datasets. During the preparation of this manuscript an R-based memory efficient tool, ArchR [56], was published for analysis of scATAC-Seq data.…”
Section: Discussion (mentioning, confidence 99%)
“…After demultiplexing the sequencing reads, we generated UMI count matrices for the libraries using the kallisto indexing and tag extraction (kite) workflow with the hg38 reference genome (Ensembl v96), followed by the kallisto | bustools scRNA-seq pipeline (74). We then analyzed the UMI count matrices in R v.4.0.2 with Seurat v.3.0.0 (41) and custom scripts for differential gene expression testing within the SCEPTRE framework (26).…”
Section: Single Cell Data Processing (mentioning, confidence 99%)