As one of the supervised tensor learning methods, the support tensor machine (STM) for classifying tensorial data is receiving increasing attention in machine learning and related applications, including remote sensing imaging, video processing, and fault diagnosis. Existing STM approaches, however, lack a mechanism for reducing the number of support tensors. To address this deficiency, we build a novel sparse STM model that controls the number of support tensors in the binary classification of tensorial data. Sparsity is imposed on the dual variables in the feature space, which facilitates nonlinear classification via kernel tricks such as the widely used Gaussian RBF kernel. To alleviate the local risk caused by the constant width of the tensor Gaussian RBF kernel, we propose a two-stage classification approach: in the second stage, we advocate a data-dependent scaling strategy for the kernel function that uses the support tensors obtained from the first stage. The core optimization models in both stages are of the same type, namely non-convex and discontinuous owing to the sparsity constraint. To overcome this computational challenge, a subspace Newton method is tailored to the sparsity-constrained optimization, providing efficient computation with local convergence. Numerical experiments on real datasets demonstrate the effectiveness of the proposed two-stage sparse STM approach in terms of classification accuracy, compared with state-of-the-art binary classification approaches.
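For illustration only, a minimal sketch of the two kernel ingredients described above, under the assumption that the tensor Gaussian RBF kernel is defined through the Frobenius norm and that the second-stage rescaling takes a standard conformal, data-dependent form; the width $\sigma$, the scaling factor $c(\cdot)$, and its construction from the first-stage support tensors are illustrative placeholders rather than the paper's exact definitions:

\[
K_\sigma(\mathcal{X},\mathcal{Y}) \;=\; \exp\!\left(-\frac{\|\mathcal{X}-\mathcal{Y}\|_F^2}{2\sigma^2}\right),
\qquad
\tilde{K}(\mathcal{X},\mathcal{Y}) \;=\; c(\mathcal{X})\,c(\mathcal{Y})\,K_\sigma(\mathcal{X},\mathcal{Y}),
\]

where $\mathcal{X},\mathcal{Y}$ are input tensors, $\|\cdot\|_F$ is the Frobenius norm, and $c(\cdot)>0$ would be constructed from the support tensors identified in the first stage so that the effective kernel width adapts locally instead of remaining constant.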