Periodic inspection of railroad tracks is essential for detecting the structural and geometrical defects that lead to railway accidents. In Pakistan, rail tracks are currently inspected with a manual acoustic-based system that relies on a railway engineer, acting as a domain expert, to distinguish between different track faults; this process is cumbersome, laborious, and error-prone. This study proposes combining the traditional acoustic-based approach with deep learning models to improve fault-detection performance and help reduce train accidents. Two convolutional neural network (CNN) models, a 1D convolutional network and a 2D convolutional network, and one recurrent neural network (RNN) model, a long short-term memory (LSTM) network, are used for this purpose. Initially, three classes of track condition are considered: superelevation, wheel burnt, and normal track. In contrast to traditional acoustic-based systems, where the spectrogram dataset is generated before model training, the proposed approach performs on-the-fly feature extraction by generating spectrograms in a layer of the deep learning model. Audio samples of different lengths are used to analyze how segment length affects each model. Each 17 s audio sample is split into segments of 1.7, 3.4, or 8.5 s, giving three variants of the dataset, and all three deep learning models are trained and tested on each segment length. Various combinations of audio data augmentation are also analyzed extensively to investigate their effect on model performance. The results suggest that the LSTM with a segment length of 8.5 s performs best, with an accuracy of 99.7%, a precision of 99.5%, a recall of 99.5%, and an F1 score of 99.5%.
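As a rough illustration of the segmentation step described above, the following sketch splits one 17 s recording into segments of 1.7, 3.4, and 8.5 s. It is a minimal example under stated assumptions, not the study's implementation: the sampling rate `SAMPLE_RATE`, the helper name `split_clip`, and the choice of non-overlapping segments are all hypothetical, since the text does not specify them.

```python
import numpy as np

# Assumed sampling rate; the source text does not state one.
SAMPLE_RATE = 16_000


def split_clip(clip, segment_seconds, sample_rate=SAMPLE_RATE):
    """Split a 1-D audio array into equal, non-overlapping segments.

    Non-overlapping segmentation is an assumption made for this sketch.
    Any trailing samples that do not fill a whole segment are dropped.
    """
    seg_len = int(round(segment_seconds * sample_rate))
    n_segments = len(clip) // seg_len
    return clip[: n_segments * seg_len].reshape(n_segments, seg_len)


# Placeholder 17 s clip of silence standing in for a real recording.
clip = np.zeros(17 * SAMPLE_RATE, dtype=np.float32)

for seconds in (1.7, 3.4, 8.5):
    segments = split_clip(clip, seconds)
    print(f"{seconds} s -> {segments.shape[0]} segments of {segments.shape[1]} samples")
```

Under these assumptions, a 17 s clip yields 10, 5, and 2 segments for the 1.7, 3.4, and 8.5 s settings respectively, so shorter segments produce more training examples, each carrying less acoustic context.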