2020
DOI: 10.1101/2020.12.16.419036
Preprint

Sfaira accelerates data and model reuse in single cell genomics

Abstract: Exploratory analysis of single-cell RNA-seq data sets is currently based on statistical and machine learning models that are adapted to each new data set from scratch. A typical analysis workflow includes a choice of dimensionality reduction, selection of clustering parameters, and mapping of prior annotation. These steps typically require several iterations and can take up significant time in many single-cell RNA-seq projects. Here, we introduce sfaira, which is a single-cell data and model zoo which houses d…

Cited by 10 publications (10 citation statements)
References 63 publications
“…The HLCA core is publicly available as a data portal and a model repository to explore, download, and use as a reference for new datasets. As the atlassing community has multiple outlets for newly generated data, we made the atlas available in Sfaira [70], Zenodo [71], Azimuth [14], CellTypist [72], and FASTGenomics [73].…”
Section: Discussion (mentioning)
confidence: 99%
“…For simplicity, normalized expression data are simply referred to as TPM throughout the manuscript. In addition, SimBu allows selecting annotated scRNA-seq data available through the Sfaira database (Fischer et al., 2021). Starting from the input scRNA-seq data, an object from the SummarizedExperiment (Morgan et al., 2021; Huber et al., 2015) package is built.…”
Section: Methods (mentioning)
confidence: 99%
“…1a). ScRNA-seq datasets can be obtained through the Sfaira (Fischer et al., 2021) database, which provides access to annotated scRNA-seq datasets from various organisms, tissues, and cell types. SimBu samples single cells from a scRNA-seq expression matrix and aggregates their transcriptomes to build pseudo-bulk RNA-seq expression profiles, summarized as gene counts, counts per million (CPM), or transcripts per millions (TPM), depending on the input data (Fig.…”
Section: Introduction (mentioning)
confidence: 99%
“…A recent review and repository of single-cell perturbation data for machine learning lists 22 datasets, but supplied cleaned and format-unified data for only 6 (Ji et al, 2021). An existing unified framework for single cell data, called ‘sfaira’, is ideal for model building and memory efficient data loading, but the public ‘data zoo’ does not currently supply perturbation datasets or standardized perturbation annotations (Fischer et al, 2021).…”
Section: Introduction (mentioning)
confidence: 99%