This study addresses motor fault diagnosis under long-tailed data distributions, in which the healthy state is represented by abundant data while numerous fault types each have only limited samples. This skewed distribution causes traditional diagnostic models to fail to identify less frequent faults. To this end, we introduce a novel fault diagnosis model, named TransGRU, to improve diagnosis accuracy on long-tailed data. TransGRU has two main modules: a feature extraction module and a correction module. The former is based on the Informer encoder with ProbSparse self-attention and extracts features from long-range multi-sensor data. The latter employs a GRU network that addresses the long-tail effect by adjusting the diagnosis results via its gate mechanism. In addition, we design an adaptive-conditional loss (ACL) function for long-tailed fault diagnosis that integrates the properties of focal loss, class-tailored weights, and confusion weights. ACL concentrates on hard-to-classify samples while balancing the representation and significance of the various fault modes. Validation on experimental motor data confirms the capability of TransGRU to identify a wide range of fault types from limited fault data, in comparison with the Transformer and state-of-the-art methods.
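To make the loss design more concrete, the following is a minimal sketch of two of the ingredients named above, a focal term combined with class-dependent weights. It is not the ACL formulation itself: the confusion weights are omitted because they are not defined here, and the inverse-frequency weighting rule, the function names, and the default `gamma=2.0` are illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def inverse_frequency_weights(class_counts):
    """Assumed weighting rule: n_total / (n_classes * n_c).

    Rare classes (small n_c) receive larger weights than the
    data-rich healthy class.
    """
    total = sum(class_counts)
    k = len(class_counts)
    return [total / (k * c) for c in class_counts]


def weighted_focal_loss(logits, target, class_weights, gamma=2.0):
    """Class-weighted focal loss for a single sample.

    loss = -w[target] * (1 - p_t)**gamma * log(p_t),
    where p_t is the predicted probability of the true class.
    With gamma = 0 this reduces to class-weighted cross-entropy.
    """
    p_t = softmax(logits)[target]
    return -class_weights[target] * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical class counts: index 0 = healthy, indices 1 and 2 = rare faults.
weights = inverse_frequency_weights([900, 50, 50])
easy = weighted_focal_loss([4.0, 0.0, 0.0], 1 - 1, weights)  # confident, correct
hard = weighted_focal_loss([2.0, 1.5, 0.0], 1, weights)      # rare fault, misranked
```

In this sketch the focal factor `(1 - p_t)**gamma` shrinks the contribution of confidently classified samples, while the class weight scales up the contribution of rare fault classes, so `hard` is much larger than `easy`.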