Cracks that form in walnuts during processing and storage degrade their quality and cause economic losses. To identify cracked walnuts efficiently, this study proposes a walnut crack identification method based on acoustic vibration and feature fusion. First, the sound signals of intact and cracked walnuts were collected with an acoustic signal acquisition system, and 44 time‐domain features, 13 frequency‐domain features, and 768 Mel spectrogram features (the pixel counts at each of the 256 gray levels in the R, G, and B channels) were extracted from each signal. Then, support vector machine (SVM), least squares support vector machine (LSSVM), and extreme learning machine (ELM) classification models were built on each single feature group and on fused combinations of the feature groups. The LSSVM model built on the fusion of all three feature groups performed best, with an accuracy of 85% on the testing set. Next, three feature selection methods were applied to reduce the dimensionality of this best fused feature set, and LSSVM classification models were built on the selected features. Finally, the arithmetic optimization algorithm (AOA), particle swarm optimization (PSO), and gray wolf optimization (GWO) were used to optimize the parameters of the classification model, including c. The best classification model was VISSA‐IRIV‐GWO‐LSSVM, with 95% accuracy on the testing set. This study provides theoretical support for the research and development of online crack detection equipment for walnuts in Yangbi, Yunnan.

Practical applications

Cracks that form in walnuts during harvesting, transportation, and peeling may lead to economic losses and food safety problems.
To address the difficulty and low accuracy of crack identification in Yunnan walnuts, this paper proposes a crack detection method based on acoustic vibration combined with feature fusion. Time‐domain, frequency‐domain, and Mel spectrogram features were extracted from the effective sound signal and then fused. The effects of three feature selection methods, three classification models, and three optimization algorithms on walnut sound signal recognition were analyzed and compared. The best classification model was VISSA‐IRIV‐GWO‐LSSVM, and the proposed method provides theoretical support for the research and development of online crack detection equipment for walnuts in Yangbi, Yunnan.
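
As an illustration of the feature construction described above, the following minimal sketch shows one way the 768 Mel spectrogram features and the fused feature vector could be assembled. It assumes that the Mel spectrogram has already been rendered as an 8‐bit RGB image and that fusion means simple concatenation of the three feature groups; neither detail is confirmed here. The function names `rgb_histogram_features` and `fuse_features` and the toy pixel values are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each value in 0..255


def rgb_histogram_features(pixels: Iterable[Pixel]) -> List[int]:
    """Count the pixels at each of the 256 levels of each channel.

    Returns 3 * 256 = 768 counts laid out as [R levels, G levels, B levels].
    """
    hist = [0] * (3 * 256)
    for r, g, b in pixels:
        hist[r] += 1          # R channel occupies indices 0..255
        hist[256 + g] += 1    # G channel occupies indices 256..511
        hist[512 + b] += 1    # B channel occupies indices 512..767
    return hist


def fuse_features(
    time_feats: Sequence[float],
    freq_feats: Sequence[float],
    mel_feats: Sequence[float],
) -> List[float]:
    """Fuse the three feature groups by concatenation (an assumption)."""
    return [float(v) for v in (*time_feats, *freq_feats, *mel_feats)]


# Toy stand-in for a Mel spectrogram image: three RGB pixels.
toy_image = [(0, 0, 0), (255, 0, 10), (255, 128, 10)]
mel = rgb_histogram_features(toy_image)

# Placeholder zeros stand in for the 44 time-domain and 13 frequency-domain features.
fused = fuse_features([0.0] * 44, [0.0] * 13, mel)
print(len(mel), len(fused))
```

With 44 + 13 + 768 features, the fused vector has 825 entries per walnut, which is the kind of dimensionality that motivates the feature selection step before LSSVM training.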