Recently, deep learning-based image compression has shown significant improvements in coding efficiency and subjective quality. However, there has been relatively little work on video compression based on deep neural networks. In this paper, we propose an end-to-end deep predictive video compression network, called DeepPVCnet, that uses mode-selective uni- and bi-directional prediction based on a multi-frame hypothesis, together with a multi-scale structure and a temporal-context-adaptive entropy model. Our DeepPVCnet jointly compresses motion information and residual data generated from the multi-scale structure via feature transformation layers. Recent deep learning-based video compression methods operate in a limited compression setting that uses only P-frames or B-frames. Learning from the lessons of conventional video codecs, we are the first to incorporate a mode-selective framework with uni- and bi-directional predictive modes into DeepPVCnet in a rate-distortion minimization sense. We also propose a temporal-context-adaptive entropy model that exploits the temporal context information of the reference frames when coding the current frame. Autoregressive entropy models for CNN-based image and video compression are difficult to compute with parallel processing. In contrast, our temporal-context-adaptive entropy model uses temporally coherent context from the reference frames, so the context information can be computed in parallel, which is computationally and architecturally advantageous. Extensive experiments show that our DeepPVCnet outperforms AVC/H.264, HEVC/H.265, and state-of-the-art methods in terms of MS-SSIM.
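
To make the mode-selective framework concrete, the following is a minimal sketch of per-frame mode selection between uni- and bi-directional prediction by minimizing a Lagrangian rate-distortion cost. The callables `encode_uni` and `encode_bi` and the multiplier `lam` are hypothetical placeholders, not the authors' implementation.

```python
# Minimal sketch (hypothetical helpers, not the paper's code): choose the
# prediction mode whose Lagrangian cost D + lam * R is smallest.

def select_mode(frame, ref_past, ref_future, encode_uni, encode_bi, lam=0.01):
    """Return the name of the prediction mode with the lower RD cost."""
    candidates = {
        # each encoder is assumed to return (distortion, rate_in_bits)
        "uni": encode_uni(frame, ref_past),
        "bi": encode_bi(frame, ref_past, ref_future),
    }
    costs = {mode: d + lam * r for mode, (d, r) in candidates.items()}
    return min(costs, key=costs.get)
```

In this sketch, a larger `lam` biases the selection toward lower-rate modes, mirroring how conventional codecs trade distortion against bitrate when choosing between P- and B-type prediction.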