2021
DOI: 10.1038/s41467-021-26017-0
VEGA is an interpretable generative model for inferring biological network activity in single-cell transcriptomics

Abstract: Deep learning architectures such as variational autoencoders have revolutionized the analysis of transcriptomics data. However, the latent space of these variational autoencoders offers little to no interpretability. To provide further biological insights, we introduce a novel sparse Variational Autoencoder architecture, VEGA (VAE Enhanced by Gene Annotations), whose decoder wiring mirrors user-provided gene modules, providing direct interpretability to the latent variables. We demonstrate the performance of V…
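The abstract describes a decoder whose wiring mirrors user-provided gene modules. A minimal sketch of that idea (not the authors' implementation; the mask, module names, and sizes here are illustrative) is a linear decoder whose weights are element-wise masked by a gene-module annotation matrix:

```python
import numpy as np

# Sketch of a VEGA-style masked linear decoder: each latent variable
# corresponds to one gene module, and its weights are masked so it can
# only reconstruct genes annotated to that module.

rng = np.random.default_rng(0)

n_modules, n_genes = 3, 6
# Hypothetical annotation mask: mask[m, g] = 1 if gene g belongs to
# module m (in practice this would be parsed from e.g. a .gmt file).
mask = np.array([
    [1, 1, 0, 0, 0, 0],   # module 0 covers genes 0-1
    [0, 0, 1, 1, 0, 0],   # module 1 covers genes 2-3
    [0, 0, 0, 0, 1, 1],   # module 2 covers genes 4-5
], dtype=float)

W = rng.normal(size=(n_modules, n_genes))  # dense decoder weights

def decode(z, W, mask):
    """Reconstruct expression from module activities z.

    The element-wise mask zeroes every weight connecting a latent
    module to a gene outside its annotation, which is what makes each
    latent variable directly interpretable as that module's activity.
    """
    return z @ (W * mask)

z = rng.normal(size=(4, n_modules))        # activities for 4 cells
x_hat = decode(z, W, mask)
print(x_hat.shape)                          # (4, 6)
```

Because the mask is applied at every forward pass, a gene's reconstruction can only depend on the modules it is annotated to, regardless of how the dense weights evolve during training.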

Cited by 64 publications (66 citation statements)
References 40 publications
“…We first proposed a regularized linear decoder to include domain knowledge into autoencoders for single-cell data at a conference [86], with scalable and expressive embeddings when compared to existing factor models, such as f-scLVM [87]. Recent approaches such as VEGA [88], scTEM [89] and pmVAE [90] also feature VAE-based architectures with linear decoders, or training separate VAEs for each GP yet connected via a global loss in the case of pmVAE. In contrast, expiMap aims toward interpretable reference mapping, allowing the fusion of reference atlases with GPs and enabling the query of genes or GPs.…”
Section: Discussion
confidence: 99%
“…Over the past years, deep learning (DL) has become an essential tool for the analysis (Lopez et al, 2020) and interpretation (Rybakov et al, 2020) of scRNA-seq data. Representation learning, in particular, has been useful not only for identifying cellular heterogeneity and integration, or mapping query to reference datasets (Lotfollahi et al, 2022), but also in the context of modelling single-cell perturbation responses (Rampášek et al, 2019; Seninge et al, 2021; Lotfollahi et al, 2019; …).…”
Section: Related Work
confidence: 99%
“…While this approach does not require labeled data, it can only be applied to VAE models, and cannot be applied to standard, deterministic autoencoders. Another fully unsupervised approach to ranking important pathways examines the L2 norm of the weights connecting each latent pathway to the reconstruction output; however, this metric is obviously limited to models with linear decoders [21]. Unlike all of these approaches, our proposed pathway attribution (see Methods) is both fully unsupervised, meaning it requires no labeled data, and model agnostic, meaning it can be applied to any model regardless of architecture or implementation details.…”
Section: Additional Covariates
confidence: 99%
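The excerpt above mentions ranking latent pathways by the L2 norm of the weights connecting each pathway to the reconstruction output. For a linear decoder that heuristic reduces to a row-norm ranking; a minimal sketch (pathway names and weights here are illustrative, not from any specific package):

```python
import numpy as np

# Unsupervised pathway ranking for a linear decoder: with one weight
# row per latent pathway, score each pathway by the L2 norm of its row
# and sort descending. Larger norms mean the pathway contributes more
# to the reconstruction overall.

rng = np.random.default_rng(1)
pathway_names = ["TNFA_SIGNALING", "HYPOXIA", "P53_PATHWAY"]
W = rng.normal(size=(3, 10))               # 3 pathways x 10 genes

scores = np.linalg.norm(W, axis=1)         # L2 norm per pathway row
ranking = [pathway_names[i] for i in np.argsort(-scores)]
print(ranking)
```

As the excerpt notes, this only makes sense when the decoder is linear: with a nonlinear decoder, a weight row no longer has a one-to-one correspondence with a pathway's effect on the output.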
“…These biologically-constrained networks have been used in a supervised setting to improve the prediction of survival or treatment resistance from cancer gene expression or mutational status [17,18], and to improve Genome Wide Association studies by aggregating the effects of single nucleotide polymorphisms (SNPs) into SNP sets [19]. In particular, a variety of recent works have proposed using biologically-constrained autoencoders to model gene expression data [20-22].…”
Section: Introduction
confidence: 99%