Aiming at the difficulty in extracting the features of time–frequency images for the recognition of car engine sounds, we propose a method to recognize them based on a deformable feature map residual network. A deformable feature map residual block includes offset and convolutional layers. The offset layers shift the pixels of the input feature map. The shifted feature map is superimposed on the feature map extracted by the convolutional layers through shortcut connections to concentrate the network to the sampling in the region of interest, and to transmit the information of the offset feature map to the lower network. Then, a deformable convolution residual network is designed, and the features extracted through this network are fused with the Mel frequency cepstral coefficients of car engine sounds. After recalibration by the squeeze and excitation block, the fused results are fed into the fully connected layer for classification. Experiments on a car engine sound dataset show that the accuracy of the proposed method is 84.28%. Compared with the existing state-of-the-art methods, in terms of the accuracy of recognizing car engine sounds under various operating conditions, the proposed method represents an improvement over the method based on dictionary learning and a convolutional neural network.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.