Image-based head pose estimation has attracted intensive attention because of its many applications, such as face analysis and attention modeling. Existing methods often convert head pose estimation into a pose classification problem but ignore the non-stationary appearance change introduced by equally distributed bins. This paper addresses head pose estimation with deep neural decision trees, in which the non-linear property of the representative appearance is learned jointly with the bin classification probability. First, we use a Convolutional Neural Network (CNN) to extract pose-related features. Second, we apply a Fully Connected (FC) layer to the learned features to extract the branch weights for each Euler angle and the representative value of each bin. Third, we apply a neural decision tree to the branch weights to obtain the bin classification probabilities. To explicitly characterize the relationship between adjacent pose intervals, we embed the continuity of head angles into the tree architecture by constructing a bridge-tree. The final estimate is obtained as a weighted sum of the representative bin values, weighted by the estimated bin probabilities. We evaluate our method on public datasets including Pointing'04, Chinese Academy of Sciences-Pose, Expression, Accessory, and Lighting (CAS-PEAL), and Biwi Kinect Head Pose, and find that the proposed method outperforms state-of-the-art methods. In addition, we use template marching based alignment for data preprocessing and demonstrate its superiority over traditional alignment methods on the task of head pose estimation.

INDEX TERMS Head pose estimation, convolutional neural network, neural decision tree, continuity, template marching based alignment.
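
The last two steps of the pipeline (branch weights to bin probabilities, then bin probabilities to an angle) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a complete binary tree with sigmoid routing at each internal node, branch weights stored in breadth-first order, and hypothetical function names (`leaf_probabilities`, `estimate_angle`) and bin values chosen only for illustration. The bridge-tree connections between adjacent intervals are not modeled here, and in the proposed method both the branch weights and the representative values would come from the FC layer rather than being fixed by hand.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def leaf_probabilities(branch_weights, depth):
    """Soft routing through a complete binary tree (an assumed structure).

    branch_weights: 2**depth - 1 real values, one per internal node,
                    listed in breadth-first order.
    Returns 2**depth leaf (bin) probabilities that sum to 1.
    """
    if len(branch_weights) != 2 ** depth - 1:
        raise ValueError("expected 2**depth - 1 branch weights")
    probs = [1.0]  # probability of reaching each node on the current level
    node = 0       # index of the next internal node in breadth-first order
    for _ in range(depth):
        next_probs = []
        for p in probs:
            go_left = sigmoid(branch_weights[node])
            next_probs.append(p * go_left)          # left child
            next_probs.append(p * (1.0 - go_left))  # right child
            node += 1
        probs = next_probs
    return probs


def estimate_angle(bin_probs, bin_values):
    """Weighted sum of representative bin values by bin probabilities."""
    if len(bin_probs) != len(bin_values):
        raise ValueError("bin_probs and bin_values must have equal length")
    return sum(p * v for p, v in zip(bin_probs, bin_values))


if __name__ == "__main__":
    # Hypothetical representative yaw values (degrees) for four bins.
    values = [-45.0, -15.0, 15.0, 45.0]
    probs = leaf_probabilities([0.0, 0.0, 0.0], depth=2)
    print(probs, estimate_angle(probs, values))
```

With all branch weights at zero, every routing decision is an even split, so the four bins are equally likely and the estimate is the mean of the representative values. Because the representative values are treated as inputs rather than fixed bin centers, the same weighted sum applies when they are learned, which is what allows the bins to deviate from an equally spaced grid.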