Preprint, 2023
DOI: 10.1101/2023.09.14.543267

Universal preprocessing of single-cell genomics data

A. Sina Booeshaghi, Delaney K. Sullivan, Lior Pachter

Abstract: We describe a workflow for preprocessing a wide variety of single-cell genomics data types. The approach parses machine-readable seqspec assay specifications to customize inputs for kb-python, which uses kallisto and bustools to catalog reads, error-correct barcodes, and count reads. The universal preprocessing method is implemented in the Python package cellatlas, which is available for download at https://github.com/cellatlas/cellatlas/.
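The core step described in the abstract, turning a machine-readable read layout into the inputs that kb-python passes to kallisto, can be illustrated with a minimal sketch. It is not the cellatlas or seqspec code: the `Region` class and `to_technology_string` function are hypothetical, the layout is a simplified stand-in for a seqspec region list, and the output assumes kallisto's `bc:umi:seq` technology-string format of `read,start,stop` triplets, where a stop of 0 means "to the end of the read".

```python
from dataclasses import dataclass
from typing import List

# Hypothetical, simplified stand-in for one seqspec region (not the seqspec schema).
@dataclass
class Region:
    read: int    # 0-based index of the read file (e.g. 0 = R1, 1 = R2)
    kind: str    # "barcode", "umi", "cdna", or "linker"
    length: int  # length in bases; 0 means "runs to the end of the read"

def to_technology_string(regions: List[Region]) -> str:
    """Convert an ordered read layout into a bc:umi:seq technology string.

    Each extracted region becomes a read,start,stop triplet; triplets of the
    same kind are joined with commas, and the three kinds with colons.
    Linker regions are not extracted; they only advance the position in their read.
    """
    position = {}  # next free base position within each read
    sections = {"barcode": [], "umi": [], "cdna": []}
    for region in regions:
        start = position.get(region.read, 0)
        stop = 0 if region.length == 0 else start + region.length
        if region.kind in sections:
            sections[region.kind].append(f"{region.read},{start},{stop}")
        elif region.kind != "linker":
            raise ValueError(f"unknown region kind: {region.kind}")
        position[region.read] = start + region.length
    for kind, triplets in sections.items():
        if not triplets:
            # This sketch does not handle assays lacking a barcode, UMI, or cDNA region.
            raise ValueError(f"layout has no {kind} region")
    return ":".join(",".join(sections[k]) for k in ("barcode", "umi", "cdna"))

# A 10x Chromium v3-like layout: R1 = 16 bp barcode + 12 bp UMI, R2 = cDNA.
layout = [Region(0, "barcode", 16), Region(0, "umi", 12), Region(1, "cdna", 0)]
print(to_technology_string(layout))  # 0,0,16:0,16,28:1,0,0
```

Walking the regions in order is what lets a linker, which is never extracted, still shift the coordinates of the barcode or UMI that follows it; that bookkeeping is the kind of detail a machine-readable specification spares the user from computing by hand.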

Cited by 3 publications (5 citation statements)
References: 32 publications
Algorithms for a Commons Cell Atlas

Booeshaghi, Galvez-Merchán, Pachter (2024). Preprint; self-citation.
“…To facilitate the generation of reference maps from single-cell genomics data, we developed a collection of tools, mx and ec , that operate on cell by feature (gene/isoform/protein/peak) matrices and marker equivalence class files respectively and work together with uniform preprocessing tools kallisto, bustools, kb-python , and cellatlas (Booeshaghi, Sullivan, and Pachter 2023; Melsted et al 2021; Melsted, Ntranos, and Pachter 2019; Bray et al 2016) (Methods). These tools solve key algorithmic and infrastructure problems in single-cell RNAseq preprocessing, namely automated cell type assignment, marker gene selection, and iterative data reprocessing.…”
Section: Results; citation type: mentioning; confidence: 99%
“…First, data are often preprocessed with different tools introducing unnecessary computational variability (Davis et al 2018; Z. Zhang et al 2021; Booeshaghi, Sullivan, and Pachter 2023). Second, quantifications are often limited to the gene-level and do not distinguish between spliced and unspliced molecules (Hjörleifsson et al 2022).…”
Section: Introduction; citation type: mentioning; confidence: 99%

Algorithms for a Commons Cell Atlas

Booeshaghi, Galvez-Merchán, Pachter (2024). Preprint; self-citation.
“…one can specify -x 10xv3). ○ Option 2: One can use seqspec 40,41 which contains machine-readable specifications for a wide range of sequencing assays. ○ Option 3: One can format their own custom technology string specifying the read locations of the barcodes, unique molecular identifiers (UMIs), and the biological sequence that is to be mapped (Box 3).…”
Section: Mapping and Quantification; citation type: mentioning; confidence: 99%
“…one can specify -x 10xv3). ○ Option 2: One can use seqspec 40,41 which contains machine-readable specifications for a wide range of sequencing assays.…”
Section: Mapping and Quantification; citation type: mentioning; confidence: 99%
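Option 3 in the excerpt above describes a custom technology string that records where the barcodes, UMIs, and biological sequence sit in each read. As a minimal sketch, assuming kallisto's `-x` layout of three colon-separated sections (barcode, UMI, sequence), each a comma-separated list of `read,start,stop` triplets with a stop of 0 meaning "to the end of the read", the hypothetical helpers `parse_technology_string` and `extract` below decode such a string and apply it to a pair of toy reads. The string `0,0,16:0,16,28:1,0,0` is, as we understand it, the layout that the `-x 10xv3` shorthand stands for.

```python
from typing import Dict, List, NamedTuple, Sequence

class Segment(NamedTuple):
    read: int   # 0-based index of the read file
    start: int  # 0-based start position within the read
    stop: int   # exclusive end position; 0 means "to the end of the read"

SECTION_NAMES = ("barcode", "umi", "sequence")

def parse_technology_string(tech: str) -> Dict[str, List[Segment]]:
    """Split a bc:umi:seq string into lists of read,start,stop segments."""
    sections = tech.split(":")
    if len(sections) != 3:
        raise ValueError("expected three colon-separated sections: barcode:umi:sequence")
    parsed = {}
    for name, section in zip(SECTION_NAMES, sections):
        fields = [int(x) for x in section.split(",")]
        if len(fields) % 3 != 0:
            raise ValueError(f"{name} section must consist of read,start,stop triplets")
        parsed[name] = [Segment(*fields[i:i + 3]) for i in range(0, len(fields), 3)]
    return parsed

def extract(reads: Sequence[str], segments: List[Segment]) -> str:
    """Concatenate the bases each segment covers; a stop of 0 slices to the end."""
    return "".join(reads[s.read][s.start:(s.stop or None)] for s in segments)

layout = parse_technology_string("0,0,16:0,16,28:1,0,0")
r1 = "ACGTACGTACGTACGT" + "TTTCCCGGGAAA"  # toy R1: 16 bp barcode + 12 bp UMI
r2 = "GATTACAGATTACA"                     # toy R2: biological (cDNA) sequence
print(extract((r1, r2), layout["barcode"]))   # ACGTACGTACGTACGT
print(extract((r1, r2), layout["umi"]))       # TTTCCCGGGAAA
print(extract((r1, r2), layout["sequence"]))  # GATTACAGATTACA
```

Because each section is a list of segments rather than a single range, the same decoder also covers assays whose barcode is split across several positions, which is one reason a general string format can describe many technologies.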