Motivation: Hi-C experiments have been used extensively to study mammalian genome structure. In the last few years, spatiotemporal Hi-C has contributed greatly to the investigation of dynamic genome reorganization. However, computational modeling and forecasting of spatiotemporal Hi-C data have not yet been reported in the literature.
Results: We present HiC4D, a method for forecasting spatiotemporal Hi-C data. We designed and benchmarked a novel network that combines a residual network with convolutional long short-term memory (ConvLSTM), and named it residual ConvLSTM (ResConvLSTM). We compared ResConvLSTM with four other methods: three outstanding video-prediction methods from the literature, namely ConvLSTM, spatiotemporal LSTM (ST-LSTM), and simple video prediction (SimVP), and one self-designed naive network (NaiveNet) that serves as a baseline. We used four spatiotemporal Hi-C datasets for blind testing: two from mouse embryogenesis, one from somatic cell nuclear transfer (SCNT) embryos, and one from human embryogenesis. Our evaluation indicates that ResConvLSTM almost always outperforms the other four methods on the four blind-test datasets in accurately reproducing spatiotemporal Hi-C contact matrices at future time steps. Our benchmarks also indicate that all five methods can successfully recover the boundaries of topologically associating domains (TADs) called on the experimental Hi-C contact matrices. Taken together, these benchmarks suggest that HiC4D is an effective and useful tool for predicting spatiotemporal Hi-C data.
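Illustration: The idea of combining a residual connection with a ConvLSTM can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the HiC4D implementation: the abstract does not specify where the skip connection sits, the kernel size, the number of channels, or the input size, so the class name `ResConvLSTMCell`, the helper `conv2d_same`, the 3x3 kernel, the single channel, and the 8x8 sub-matrices are all hypothetical choices made for illustration. Weights are random and untrained, so the output only demonstrates the data flow.

```python
import numpy as np

def conv2d_same(x, w):
    """Zero-padded 'same' cross-correlation.
    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k."""
    c_out, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            # Mix input channels for one kernel offset, then accumulate.
            out += np.einsum("oc,chw->ohw", w[:, :, i, j],
                             xp[:, i:i + h, j:j + wd])
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ResConvLSTMCell:
    """Hypothetical ConvLSTM cell with an additive skip connection."""

    def __init__(self, channels, kernel=3, seed=0):
        rng = np.random.default_rng(seed)
        # One convolution yields all four gates from the stacked [x, h].
        self.w = rng.normal(0.0, 0.1,
                            (4 * channels, 2 * channels, kernel, kernel))
        self.b = np.zeros((4 * channels, 1, 1))

    def step(self, x, h, c):
        gates = conv2d_same(np.concatenate([x, h]), self.w) + self.b
        i, f, o, g = np.split(gates, 4, axis=0)
        c_next = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h_next = sigmoid(o) * np.tanh(c_next)
        # Assumed residual form: output = input frame + hidden state.
        return x + h_next, h_next, c_next

# Three observed time points of a one-channel 8x8 contact sub-matrix
# (random placeholder values, not real Hi-C data).
frames = np.random.default_rng(1).random((3, 1, 8, 8))
cell = ResConvLSTMCell(channels=1)
h = np.zeros((1, 8, 8))
c = np.zeros_like(h)
for x in frames:
    y, h, c = cell.step(x, h, c)
print(y.shape)  # (1, 8, 8)
```

In this sketch the output after the last observed frame plays the role of the forecast for the next time step; with the assumed skip connection, the recurrent part only has to model the change relative to the most recent contact matrix.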
Availability: HiC4D is publicly available at http://dna.cs.miami.edu/HiC4D/.
Contact: zheng.wang@miami.edu