Vehicle on-board equipment (VOBE) is a key component of the control system of high-speed railway trains. Fault diagnosis of VOBE currently depends mainly on maintenance experience, which is inefficient. The fault data of on-board equipment are recorded in natural language, and their unstructured form, high dimensionality and imbalanced fault class distribution make fault diagnosis challenging. In this paper, a bilevel topic labeled latent Dirichlet allocation model is proposed for extracting features from fault text data. First, label information is set according to prior knowledge of the railway field. Then, local topics and global topics are defined for the two levels of VOBE fault types. The fault feature space generated by Gibbs sampling from the local and global topics contains fault features at both levels, which benefits fault text classification. Finally, to account for the imbalanced distribution of fault classes, a fault text classification method based on a cost-sensitive support vector machine is proposed. Using actual on-board equipment fault data from China Railway Corporation, the proposed method is compared with other fault diagnosis methods in terms of accuracy, recall, precision and F1-score. The results show that the accuracy of the proposed method is 90.3%, about 2% higher than that of the second-best method, and that its average recall, precision and F1-score over the fault classes are 77.9%, 91.8% and 83.4%, respectively, outperforming the other fault diagnosis methods.
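As an illustration of how the reported indicators relate to one another under class imbalance, the following is a minimal sketch of accuracy and class-averaged recall, precision and F1-score. It assumes that "average" means the unweighted (macro) mean over fault classes, which the abstract does not state explicitly. The function name `macro_metrics` and the labels `"A"` and `"B"` are hypothetical and do not come from the fault data.

```python
from collections import Counter


def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged recall, precision and F1-score.

    Assumption: each class contributes equally to the averages,
    regardless of how many samples it has.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # class p was predicted wrongly
            fn[t] += 1          # class t was missed

    labels = sorted(set(y_true) | set(y_pred))
    recalls, precisions, f1s = [], [], []
    for c in labels:
        r = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        p = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        recalls.append(r)
        precisions.append(p)
        f1s.append(f)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "recall": sum(recalls) / n,
        "precision": sum(precisions) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical imbalanced example: 8 samples of class "A", 2 of class "B",
# with one minority-class sample misclassified as "A".
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 8 + ["A", "B"]
result = macro_metrics(y_true, y_pred)
print(result)
```

In this toy example a single error on the minority class leaves accuracy high while pulling the class-averaged recall down noticeably, which is why accuracy alone is a weak indicator when fault classes are imbalanced.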