2022
DOI: 10.1101/2022.08.20.504663
Preprint

scPerturb: Harmonized Single-Cell Perturbation Data

Abstract: Recent biotechnological advances led to growing numbers of single-cell studies, which reveal molecular and phenotypic responses to large numbers of perturbations. However, analysis across diverse datasets is typically hampered by differences in format, naming conventions, data filtering and normalization. In order to facilitate development and benchmarking of computational methods in systems biology, we collect a set of 44 publicly available single-cell perturbation-response datasets with molecular readouts, i…


Cited by 23 publications (30 citation statements)
References 91 publications
“…We selected distance metrics based on common usage in genomics analysis and machine learning, including those used previously for model evaluation, as loss functions in model training, or for data analysis. We tested classically used distances (Euclidean, cosine distance, mean absolute error (MAE), mean squared error (MSE), linear maximum mean discrepancy (linear MMD) [16], Wasserstein [17,18]), in addition to some statistics commonly used in biological analysis (t-statistic, Pearson correlation, Spearman correlation, Kendall tau distance, coefficient of determination (R²), energy distance (E-distance) [19]), and previously unexplored ones (Kolmogorov-Smirnov test, symmetric Kullback-Leibler divergence, linear classification probability). After scaling of the metric, linear MMD, MSE, and Euclidean are mathematically identical, but we included all three implementations for completeness.…”
Section: Methods, mentioning
confidence: 99%
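The statement that linear MMD, MSE, and Euclidean distance coincide after scaling can be checked numerically. The sketch below is a minimal illustration under stated assumptions, not the citing study's implementation: it assumes each metric compares two groups of cells (rows = cells, columns = genes) through their per-gene mean profiles, computes linear MMD from its kernel form with k(a, b) = a·b, and adds the textbook energy distance (Euclidean base distance) for contrast. All function names and the simulated data are hypothetical.

```python
import numpy as np


def mean_difference(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Hypothetical helper: per-gene mean of group x minus per-gene mean of group y."""
    return x.mean(axis=0) - y.mean(axis=0)


def euclidean(x: np.ndarray, y: np.ndarray) -> float:
    # Euclidean distance between the two mean expression profiles
    return float(np.linalg.norm(mean_difference(x, y)))


def mse(x: np.ndarray, y: np.ndarray) -> float:
    # Mean squared error between the two mean expression profiles
    return float(np.mean(mean_difference(x, y) ** 2))


def linear_mmd(x: np.ndarray, y: np.ndarray) -> float:
    # Biased (V-statistic) MMD^2 with the linear kernel k(a, b) = a . b,
    # computed from kernel averages rather than from the means directly
    kxx = (x @ x.T).mean()
    kyy = (y @ y.T).mean()
    kxy = (x @ y.T).mean()
    return float(kxx + kyy - 2.0 * kxy)


def energy_distance(x: np.ndarray, y: np.ndarray) -> float:
    # E = 2 E||X - Y|| - E||X - X'|| - E||Y - Y'|| over all cell pairs
    def mean_pairwise(a: np.ndarray, b: np.ndarray) -> float:
        diffs = a[:, None, :] - b[None, :, :]
        return float(np.linalg.norm(diffs, axis=-1).mean())

    return 2.0 * mean_pairwise(x, y) - mean_pairwise(x, x) - mean_pairwise(y, y)


if __name__ == "__main__":
    # Simulated data (hypothetical): 200 control and 150 perturbed cells, 50 genes
    rng = np.random.default_rng(0)
    ctrl = rng.normal(size=(200, 50))
    pert = rng.normal(loc=0.3, size=(150, 50))
    d = ctrl.shape[1]

    # Linear MMD equals the squared Euclidean distance between group means
    print(np.isclose(linear_mmd(pert, ctrl), euclidean(pert, ctrl) ** 2))
    # MSE equals linear MMD divided by the number of genes
    print(np.isclose(mse(pert, ctrl), linear_mmd(pert, ctrl) / d))
    print(energy_distance(pert, ctrl))
```

Under these assumptions, linear MMD is the squared Euclidean distance between group means and MSE is linear MMD divided by the number of genes, so the three differ only by a square root and a constant factor and rank perturbations identically. The energy distance instead averages over all cell-to-cell distances, so it also responds to differences in distribution shape, not only in the means.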
“…Reproducibility code and notebooks can be found at https://github.com/theislab/perturbation-metrics. All datasets used in this study are available from scPerturb [19].…”
Section: Data and Code Availability, mentioning
confidence: 99%
“…To enhance the interpretability of high-content perturbation resources, the dataset analyzing module provides visualization for five major analytic results that cover the primary requirements, including the quality control, denoise, identification of DEGs, perturbation function analysis, and the correlation between perturbations. Compared to scPerturb, PerturBase offers modules for querying, analyzing and visualizing a more comprehensive scPerturbation data through an interactive interface (28).…”
Section: Conclusion and Future Development, mentioning
confidence: 99%
“…A processed version of the Norman et al. data can also be found at http://projects.sanderlab.org/scperturb [31].…”
Section: Data Availability, mentioning
confidence: 99%