2020
DOI: 10.1101/2020.09.25.313882
Preprint

Exponential-family embedding with application to cell developmental trajectories for single-cell RNA-seq data

Abstract: Scientists often embed cells into a lower-dimensional space when studying single-cell RNA-seq data for improved downstream analyses such as developmental trajectory analyses, but the statistical properties of such non-linear embedding methods are often not well understood. In this article, we develop the eSVD (exponential-family SVD), a non-linear embedding method for both cells and genes jointly with respect to a random dot product model using exponential-family distributions. Our estimator uses alternating m…

Cited by 2 publications (9 citation statements)
References 71 publications

“…To see relevance of this variability, we also applied SVD (with k = 5 latent dimension) and UMAP (using the umap R package with k = 2 latent dimensions) followed by the same k-means clustering procedure (5 clusters) to the same data. Consistent with Lin, Lei, and Roeder (2021), SVD and UMAP were run using normalized and log2-transformed read counts, whereas eSVD was run using normalized data without log2-transformation. It was observed that the variability in eSVD's performance due to gene filtering can be even bigger than the variability across different dimension reduction methods (Figure 1(c)).…”
Section: Impact Of Data Preprocessing and Feature Selection (mentioning)
confidence: 99%
“…Thanks to the rapidly evolving technology, the data size also grows fast. Lin, Lei, and Roeder (2021) has demonstrated eSVD using data with n = 10^2 − 10^3 cells and p = 10^2 − 10^3 genes. However, today's cell atlas projects increasingly generate much larger datasets, such as scRNA-seq data consisting of n = 10^5 − 10^6 cells and p = 10^3 − 10^4 genes (Regev et al 2017; Su et al 2020) or single-cell chromatin accessibility data consisting of n = 10^4 − 10^5 cells and p = 10^4 − 10^6 genomic regulatory elements (Cusanovich et al 2018).…”
Section: Scalability and Computational Efficiency (mentioning)
confidence: 99%
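
To put the size ranges in the statement above into perspective, the following back-of-envelope sketch computes how many entries a cells-by-features matrix has at the upper end of each quoted range, and how much memory it would need if stored densely. The 8-byte (float64) entry size, the `dense_gib` helper, and the scenario labels are assumptions made for illustration; they do not come from the cited papers.

```python
# Back-of-envelope memory footprint of a dense n-by-p data matrix.
# Assumption (not from the source): each entry is an 8-byte float64.
BYTES_PER_ENTRY = 8


def dense_gib(n_cells: int, n_features: int,
              bytes_per_entry: int = BYTES_PER_ENTRY) -> float:
    """Return the size in GiB of a dense n_cells x n_features matrix."""
    return n_cells * n_features * bytes_per_entry / 2**30


# Upper ends of the size ranges quoted in the citation statement.
# The labels are illustrative only.
scenarios = {
    "eSVD demonstration (n=10^3, p=10^3)": (10**3, 10**3),
    "scRNA-seq atlas (n=10^6, p=10^4)": (10**6, 10**4),
    "chromatin accessibility (n=10^5, p=10^6)": (10**5, 10**6),
}

for label, (n, p) in scenarios.items():
    print(f"{label}: {n * p:.1e} entries, {dense_gib(n, p):.3g} GiB if dense")
```

Under these assumptions the entry count grows from 10^6 to 10^10 or 10^11, that is, by four to five orders of magnitude. Single-cell matrices are typically stored in sparse form, so actual memory use would likely be lower than the dense figures.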