Facial expression recognition (FER) under active near-infrared (NIR) illumination has the advantages of illumination invariance. In this paper, we propose a three-stream 3D convolutional neural network, named as NIRExpNet for NIR FER. The 3D structure of NIRExpNet makes it possible to extract automatically, not just spatial features, but also, temporal features. The design of multiple streams of the NIRExpNet enables it to fuse local and global facial expression features. To avoid over-fitting, the NIRExpNet has a moderate size to suit the Oulu-CASIA NIR facial expression database that is a medium-size database. Experimental results show that the proposed NIRExpNet outperforms some previous state-of-art methods, such as Histogram of Oriented Gradient to 3D (HOG 3D), Local binary patterns from three orthogonal planes (LBP-TOP), deep temporal appearance-geometry network (DTAGN), and adapt 3D Convolutional Neural Networks (3D CNN DAP).