Hi-C is commonly used to study three-dimensional genome organization. However, due to high sequencing cost and technical constraints, the resolution of most Hi-C datasets is coarse, resulting in a loss of information and biological interpretability. Here we develop DeepHiC, a generative adversarial network, to predict high-resolution Hi-C contact maps from low-coverage sequencing data. We demonstrate that DeepHiC can reproduce high-resolution Hi-C data from as few as 1% downsampled reads. Empowered by adversarial training, our method restores fine-grained details similar to those in high-resolution Hi-C matrices, improving accuracy in chromatin loop identification and TAD detection, and it outperforms state-of-the-art methods in prediction accuracy. Finally, applying DeepHiC to Hi-C data from mouse embryonic development facilitates chromatin loop detection with higher accuracy. We also provide a web-based tool (DeepHiC, http://sysomics.com/deephic) that allows researchers to enhance their own Hi-C data with just a few clicks.
Author summary
We developed a novel method, DeepHiC, for enhancing Hi-C data resolution from low-coverage sequencing data using a generative adversarial network. DeepHiC can reproduce high-resolution (10-kb) Hi-C data with high quality even from 1/100 downsampled reads. Our method outperforms previous methods in Hi-C data resolution enhancement, improving accuracy in chromatin loop identification and TAD detection. Application of DeepHiC to mouse embryonic development data shows that the enhancement it affords facilitates chromatin loop identification in these data with higher accuracy. We also developed a user-friendly web server (http://sysomics.com/deephic) that allows researchers to enhance their own low-resolution Hi-C data (40 kb to 1 Mb) with just a few clicks.

The high-throughput chromosome conformation capture (Hi-C) technique [1] is a genome-wide technique used to investigate three-dimensional (3D) chromatin conformation inside the nucleus. It has facilitated the identification and characterization of multiple structural elements, such as the A/B compartment [1], topologically associating domains (TADs) [2, 3], enhancer-promoter loops [4] and stripes [5] over recent decades. In practice, Hi-C data is conventionally stored as a pairwise read count matrix $M_{n \times n}$, where $M_{ij}$ is the number of observed interactions (read-pair count) between genomic regions $i$ and $j$, and the genome is partitioned into $n$ fixed-size bins (e.g., 25 kb). Bin size (i.e., resolution) is a crucial parameter for Hi-C data analysis, as it directly affects the results of downstream analysis, such as predictions of enhancer-promoter interactions [6-11] or identification of TAD boundaries [6, 12-16]. Depending on sequencing depth, the size of commonly used bins ranges from 1 kb to 1 Mb. Because of the high cost of sequencing, most available Hi-C datasets have relatively low resolution [17], which limits their application in studies of genomic regulatory elements.
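To make the matrix representation concrete, the following is a minimal sketch of how intra-chromosomal read pairs could be binned into a symmetric contact matrix at two bin sizes. The function name `build_contact_matrix`, its parameters, and the toy coordinates are hypothetical and chosen for illustration only; this is not the preprocessing pipeline used by DeepHiC.

```python
import numpy as np


def build_contact_matrix(pos_a, pos_b, chrom_length, bin_size):
    """Bin read pairs from one chromosome into a symmetric n x n count matrix.

    pos_a, pos_b: genomic coordinates (bp) of the two ends of each read pair.
    Hypothetical helper for illustration; not part of DeepHiC.
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    bins_a = np.asarray(pos_a) // bin_size
    bins_b = np.asarray(pos_b) // bin_size

    matrix = np.zeros((n_bins, n_bins), dtype=np.int64)
    # np.add.at accumulates correctly when the same bin pair occurs repeatedly.
    np.add.at(matrix, (bins_a, bins_b), 1)
    # Mirror off-diagonal contacts so that matrix[i, j] == matrix[j, i].
    off_diag = bins_a != bins_b
    np.add.at(matrix, (bins_b[off_diag], bins_a[off_diag]), 1)
    return matrix


# Toy read pairs on a hypothetical 100-kb chromosome.
pos_a = [1_000, 30_000, 55_000, 12_000]
pos_b = [26_000, 80_000, 60_000, 20_000]

matrix_25kb = build_contact_matrix(pos_a, pos_b, chrom_length=100_000, bin_size=25_000)
matrix_50kb = build_contact_matrix(pos_a, pos_b, chrom_length=100_000, bin_size=50_000)
print(matrix_25kb.shape, matrix_50kb.shape)
```

With the same reads, the 25-kb matrix has four bins per axis and the 50-kb matrix only two, so each coarse entry pools several fine-grained contacts. This illustrates why a larger bin size loses detail that downstream analyses depend on.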
Sequencing high-res...