In real industrial processes, fault diagnosis methods must learn from limited fault samples, because processes run mostly under normal conditions and faults rarely occur. Although attention mechanisms have become popular in fault diagnosis, existing attention-based methods remain unsatisfactory for such practical applications. First, pure attention-based architectures such as transformers need a large number of fault samples to offset their lack of inductive biases, and therefore perform poorly when fault samples are limited. Moreover, this poor fault classification performance in turn prevents existing attention-based methods from identifying root causes. To address these issues, we propose a supervised contrastive convolutional attention mechanism (SCCAM) with ante-hoc interpretability, which solves the root cause analysis problem under limited fault samples for the first time. First, accurate classification results are obtained under limited fault samples. Specifically, we integrate a convolutional neural network (CNN) with attention mechanisms to provide strong intrinsic inductive biases of locality and spatial invariance, thereby strengthening representational power under limited fault samples. In addition, we further enhance the classification capability of SCCAM under limited fault samples by employing the supervised contrastive learning (SCL) loss. Second, a novel ante-hoc interpretable attention-based architecture is designed to obtain root causes directly, without expert knowledge. The convolutional block attention module (CBAM) is used to provide the feature contributions behind each prediction, thus yielding feature-level explanations. The proposed SCCAM method is tested on a continuous stirred tank heater and the Tennessee Eastman industrial process benchmark.
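To make the role of the SCL loss concrete, the following is a minimal NumPy sketch of a supervised contrastive loss in its commonly used form: embeddings that share a fault label are pulled together, and all others are pushed apart. The abstract does not state the exact formulation used in SCCAM, so the function name `supervised_contrastive_loss`, the L2 normalisation, and the default `temperature` are assumptions made purely for illustration.

```python
import numpy as np


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Illustrative supervised contrastive loss (hypothetical helper, not the paper's code).

    features: (n, d) array of embeddings with non-zero rows (assumed).
    labels:   (n,) array of class labels.
    Returns the mean loss over anchors that have at least one positive.
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)
    n = features.shape[0]

    self_mask = np.eye(n, dtype=bool)
    # Positives: samples with the same label as the anchor, excluding the anchor itself.
    positive_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    n_pos = positive_mask.sum(axis=1)
    valid = n_pos > 0
    if not valid.any():
        return 0.0  # no anchor has a positive pair in this batch

    # L2-normalise so that dot products are cosine similarities.
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    logits = z @ z.T / temperature
    # Exclude each anchor from its own softmax denominator.
    logits = np.where(self_mask, -np.inf, logits)

    # Numerically stable log-softmax over all other samples in the batch.
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Average log-probability of the positives for each valid anchor.
    pos_log_prob = np.where(positive_mask, log_prob, 0.0).sum(axis=1)
    loss_per_anchor = -pos_log_prob[valid] / n_pos[valid]
    return float(loss_per_anchor.mean())


if __name__ == "__main__":
    # Toy batch: two tight clusters in 2-D, one per (hypothetical) fault class.
    feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
    print(supervised_contrastive_loss(feats, np.array([0, 0, 1, 1])))
```

In a limited-sample setting, a term of this kind would typically be combined with a standard classification loss, since every labelled fault sample contributes several positive and negative pairs; how the two are weighted in SCCAM is not specified in this passage.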
Three common fault diagnosis scenarios are covered: a balanced scenario for additional verification and two scenarios with limited fault samples (i.e., an imbalanced scenario and a long-tail scenario). The comprehensive results demonstrate that the proposed SCCAM method achieves better performance than state-of-the-art methods on fault classification