Single-cell ChIP-seq analysis is challenging due to data sparsity. We present SIMPA (https://github.com/salbrec/SIMPA), a single-cell ChIP-seq data imputation method that leverages predictive information within bulk ENCODE data to impute missing protein-DNA interacting regions of target histone marks or transcription factors. Machine learning models trained for each single cell, each target, and each genomic region drastically improve cell-type clustering and gene identification.

The discovery of protein-DNA interactions of histone marks and transcription factors is highly important in biomedical studies because these interactions regulate core cellular processes such as chromatin structure organization and gene expression. These interactions are measured by chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq). Public data from the ENCODE portal, which provides a large collection of experimental bulk ChIP-seq data, have been used for comprehensive investigations revealing insights into epigenomic processes that affect 3D chromatin structure, open chromatin state, and gene expression, to name just a few (ENCODE Project Consortium, 2012). Recently developed protocols for single-cell ChIP-seq (scChIP-seq) are