Visible-infrared person re-identification (VI-ReID) is an emerging but challenging problem that aims to match pedestrians captured by visible and infrared cameras. Existing studies in this field mainly focus on learning sharable feature representations from the last layer of deep convolutional neural networks (CNNs) to handle the cross-modality discrepancies. However, due to the large differences between visible and infrared images, the last layer's feature representations alone are insufficiently discriminative for VI-ReID. To remedy this, we propose a novel deep supervision learning network, namely the Dual-path Deep Supervision Network (DDSN), for VI-ReID. Built on a backbone network, DDSN consists of two key modules: (1) a dual-path deep supervision learning (DDSL) module that is plugged into multiple network layers, and (2) a self-attention module that is developed on top of the backbone network. The backbone network extracts multi-level features at lower and middle layers, and several DDSL modules utilize these features to generate more discriminative descriptors. Furthermore, we apply the self-attention module for context modeling to capture useful contextual cues as a supplement. By fusing these descriptors, DDSN exploits both multi-level information and potential contextual information. Despite its simplicity, our method outperforms several state-of-the-art methods on two large-scale datasets: RegDB and SYSU-MM01.
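To make the two ingredients concrete, the sketch below illustrates, in plain Python, the general ideas the abstract names: dot-product self-attention for context modeling over a set of feature vectors, and fusion of multi-level descriptors by concatenation. This is a minimal, non-parametric sketch under our own simplifying assumptions (no learned projections, plain Python lists instead of CNN feature maps); the function names are hypothetical and it is not the paper's actual DDSN implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(feats):
    """Toy self-attention: each feature vector attends to all others.

    Attention weights are the softmax of dot-product similarities;
    each output is the attention-weighted sum of all input vectors.
    (A learned module would project queries/keys/values first.)
    """
    out = []
    for q in feats:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in feats]
        w = softmax(scores)
        out.append([sum(wj * v[d] for wj, v in zip(w, feats))
                    for d in range(len(q))])
    return out

def fuse_descriptors(descriptors):
    """Fuse multi-level descriptors by simple concatenation."""
    fused = []
    for d in descriptors:
        fused.extend(d)
    return fused
```

For example, attending over two identical vectors leaves them unchanged (uniform weights), while fusing per-layer descriptors yields one longer vector for matching.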