Rigorously comparing gene expression and chromatin accessibility inthe same single cells could illuminate the logic of how coupling or decoupling of these mechanisms regulates fate commitment. Here, we present MIRA: Probabilistic Multimodal Models for Integrated Regulatory Analysis, a comprehensive methodology that systematically contrasts transcription and accessibility to infer the regulatory circuitry driving cells along developmental trajectories. MIRA leverages joint topic modeling of cell states and regulatory potential modeling of individual gene loci. MIRA thereby represents cell states in an efficient and interpretable latent space, infers high fidelity lineage trees, determines key regulators of fate decisions at branch points, and exposes the variable influence of local accessibility on transcription at distinct loci. Applied to epidermal maintenance differentiation and embryonic brain development from two different multimodal platforms, MIRA revealed that early developmental genes were tightly regulated by local chromatin landscape whereas terminal fate genes were titrated without requiring extensive chromatin remodeling. MainProfiling both expression and chromatin accessibility in the same single cells 1-5 opens an unprecedented opportunity to understand the interaction of transcriptional and epigenetic mechanisms driving cells along developmental continuums. While many computational methods analyze expression and accessibility separately, several recent algorithms have adopted joint analysis where the cells are projected onto a shared latent space based on both data modalities, which better captures the biological structure of the data 6-11 . However, the field lacks tools that go beyond visualization and clustering to rigorously contrast transcription and .
The recent advances in single-cell epigenomic techniques have created a growing demand for scATAC-seq analysis. One key task is to determine cell types based on epigenetic profiling. We introduce scATAnno, a workflow designed to automatically annotate scATAC-seq data using large-scale scATAC-seq reference atlases. This workflow can generate scATAC-seq reference atlases from publicly available datasets, and enable accurate cell type annotation by integrating query data with reference atlases, without the aid of scRNA-seq profiling. To enhance annotation accuracy, we have incorporated KNN-based and weighted distance-based uncertainty scores to effectively detect unknown cell populations within the query data. We showcase the utility of scATAnno across multiple datasets, including peripheral blood mononuclear cell (PBMC), Triple Negative Breast Cancer (TNBC), and basal cell carcinoma (BCC), and demonstrate that scATAnno accurately annotates cell types across conditions. Overall, scATAnno is a useful tool for cell type annotation in scATAC-seq data and can aid in the interpretation of new scATAC-seq datasets in complex biological systems.
Cell state atlases constructed through single-cell RNA-seq and ATAC-seq analysis are powerful tools for analyzing the effects of genetic and drug treatment-induced perturbations on complex cell systems. Comparative analysis of such atlases can yield new insights into cell state and trajectory alterations. Perturbation experiments often require that single-cell assays be carried out in multiple batches, which can introduce technical distortions that confound the comparison of biological quantities between different batches. Here we propose CODAL, a variational autoencoder-based statistical model which uses a mutual information regularization technique to explicitly disentangle factors related to technical and biological effects. We demonstrate CODAL’s capacity for batch-confounded cell type discovery when applied to simulated datasets and embryonic development atlases with gene knockouts. CODAL improves the representation of RNA-seq and ATAC-seq modalities, yields interpretable modules of biological variation, and enables the generalization of other count-based generative models to multi-batched data.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.