Single-cell epigenomics provides new opportunities to decipher genomic regulatory programs from heterogeneous samples and dynamic processes. We present a probabilistic framework called cisTopic to simultaneously discover "cis-regulatory topics" and stable cell states from sparse single-cell epigenomics data. After benchmarking cisTopic on single-cell ATAC-seq data, single-cell DNA methylation data, and semi-simulated single-cell ChIP-seq data, we use cisTopic to predict regulatory programs in the human brain and validate these by aligning them with co-expression networks derived from single-cell RNA-seq data. Next, we performed a time-series single-cell ATAC-seq experiment after SOX10 perturbations in melanoma cultures, where cisTopic revealed dynamic regulatory topics driven by SOX10 and AP-1. Finally, machine learning and enhancer modelling approaches allowed us to predict cell-type-specific SOX10 and SOX9 binding sites based on topic-specific co-regulatory motifs. cisTopic is available as an R/Bioconductor package at http://github.com/aertslab/cistopic.

Here, we develop cisTopic, an unsupervised Bayesian framework based on topic modelling that allows simultaneous grouping of co-accessible regions into regulatory topics and clustering of cells based on their regulatory topic contributions. These "cis-regulatory topics" can be directly exploited for motif discovery to predict combinations of transcription factors, but also to explore dynamic changes in chromatin state. We benchmarked cisTopic using simulated data and concluded that this approach outperforms previously published methods in terms of accuracy, robustness and interpretability.
We validated cisTopic by applying it to a previously published data set of 30,000 cells from the human brain (Lake et al., 2017), finding subpopulations in an unsupervised manner and in agreement with gene regulatory programs derived from single-cell transcriptomics data. In addition, we generate new scATAC-seq data and reveal dynamic changes in chromatin accessibility during melanoma phenotype switching in vitro, driven by the loss of SOX10. Finally, by comparing the SOX10 topics in melanoma with SOX9 and SOX10 topics in the brain, we propose a cooperative pioneering model for the SOXE (i.e. SOX8, SOX9 and SOX10) family members.
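
The idea of grouping co-accessible regions into topics while describing each cell by its topic contributions can be illustrated with a toy sketch. The code below is not the cisTopic implementation: it assumes a standard topic model (Latent Dirichlet Allocation) fitted with a collapsed Gibbs sampler, treats each cell as a "document" and each accessible region as a "word", and uses hypothetical function names (`binarize`, `fit_topics`), made-up counts, and illustrative hyperparameters (`alpha`, `beta`, `n_topics`).

```python
import random


def binarize(counts):
    """Turn a regions x cells count matrix into a binary accessibility matrix."""
    return [[1 if value > 0 else 0 for value in row] for row in counts]


def fit_topics(binary, n_topics=2, n_iter=200, alpha=0.5, beta=0.1, seed=0):
    """Toy collapsed Gibbs sampler for LDA (assumed here, for illustration only).

    Returns (topic_given_cell, region_given_topic), both as lists of
    probability vectors.
    """
    rng = random.Random(seed)
    n_regions, n_cells = len(binary), len(binary[0])

    # Every accessible (region, cell) entry is one token with a topic label.
    tokens = [(r, c) for r in range(n_regions)
              for c in range(n_cells) if binary[r][c]]
    assign = [rng.randrange(n_topics) for _ in tokens]

    # Count tables used by the sampler.
    cell_topic = [[0] * n_topics for _ in range(n_cells)]
    topic_region = [[0] * n_regions for _ in range(n_topics)]
    topic_total = [0] * n_topics
    for (r, c), k in zip(tokens, assign):
        cell_topic[c][k] += 1
        topic_region[k][r] += 1
        topic_total[k] += 1

    for _ in range(n_iter):
        for i, (r, c) in enumerate(tokens):
            # Remove the token's current assignment from the counts.
            k = assign[i]
            cell_topic[c][k] -= 1
            topic_region[k][r] -= 1
            topic_total[k] -= 1

            # Unnormalised conditional probability of each topic.
            weights = [
                (cell_topic[c][t] + alpha)
                * (topic_region[t][r] + beta)
                / (topic_total[t] + beta * n_regions)
                for t in range(n_topics)
            ]
            k = rng.choices(range(n_topics), weights=weights)[0]

            # Record the newly sampled assignment.
            assign[i] = k
            cell_topic[c][k] += 1
            topic_region[k][r] += 1
            topic_total[k] += 1

    # Smoothed, normalised distributions.
    topic_given_cell = [
        [(cell_topic[c][t] + alpha) / (sum(cell_topic[c]) + alpha * n_topics)
         for t in range(n_topics)]
        for c in range(n_cells)
    ]
    region_given_topic = [
        [(topic_region[t][r] + beta) / (topic_total[t] + beta * n_regions)
         for r in range(n_regions)]
        for t in range(n_topics)
    ]
    return topic_given_cell, region_given_topic


# Made-up counts: 6 regions (rows) x 6 cells (columns).
counts = [
    [3, 1, 2, 0, 0, 0],
    [1, 2, 0, 0, 0, 0],
    [2, 0, 1, 0, 0, 1],
    [0, 0, 0, 1, 2, 1],
    [0, 0, 0, 2, 1, 0],
    [0, 1, 0, 1, 0, 3],
]
binary = binarize(counts)
topic_given_cell, region_given_topic = fit_topics(binary)
```

In this sketch, `topic_given_cell` holds one topic distribution per cell, which could be used to cluster cells, and `region_given_topic` holds one region distribution per topic, which could be used to rank regions within a topic for downstream analyses such as motif discovery.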
Results
Probabilistic topic modelling identifies cell states and reveals regulatory programs at single-cell resolution
We have developed cisTopic, a new method for the analysis of single-cell epigenomics data that allows the simultaneous identification of cell states and co-regulatory regions in an unsupervised manner (Fig. 1). The input for cisTopic is a binary accessibility matrix (or, for single-cell methylation data, a matrix of binary methylation scores), with cells (i.e. objects) as columns and regulatory regions (i.e. features) as rows (Fig. 1a). Since this matrix is very sparse, we reasoned that Latent Dirichlet Allocation (LDA) (Blei et al.,...