Source recording device identification poses a significant challenge in the field of Audio Sustainable Security (ASS). Most existing studies on digital audio source identification follow a two-step process: extracting device-specific features and feeding them into machine learning or deep learning models for decision-making. However, these approaches often rely on empirically set hyperparameters, which limits their generalization capability. To address this limitation, this paper leverages the self-learning ability of deep neural networks and the temporal characteristics of audio data. We propose a novel approach that uses the Sinc function for audio preprocessing and combines it with a Deep Neural Network (DNN) to establish a complete end-to-end identification model for digital audio sources. By allowing the parameters of the preprocessing and feature extraction stages to be learned through gradient-based optimization, we enhance the model's generalization ability. To overcome practical challenges such as limited timeliness, small sample sizes, and incremental expansion of the device set, this paper further explores the effectiveness of an end-to-end transfer learning model. Experimental verification demonstrates that the proposed end-to-end transfer learning model achieves timely and accurate results even with small sample sizes, and it avoids retraining the model on a large number of samples when the device set expands incrementally. Our experiments demonstrate the superiority of the proposed method, which achieves 97.7% accuracy when identifying 141 devices, outperforming four state-of-the-art methods with an absolute accuracy improvement of 4.1%. This research contributes to the field of ASS and provides valuable insights for future studies on audio source identification and related applications in information security, digital forensics, and copyright protection.
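
The abstract does not give implementation details of the Sinc-based preprocessing. As a rough, hypothetical illustration only, the sketch below shows what a learnable sinc-filter front end of this general kind could look like in PyTorch; the class name `SincFrontEnd`, the channel count, kernel size, and sample rate are illustrative assumptions, not values taken from the paper. The key idea it demonstrates is that the preprocessing layer exposes only band-pass cutoff frequencies as trainable parameters, so they are optimized by gradient descent together with the rest of the DNN instead of being set empirically.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SincFrontEnd(nn.Module):
    """Learnable sinc band-pass filterbank applied directly to the raw waveform.

    Only the low cutoff and bandwidth of each filter are trainable, so the
    preprocessing stage stays interpretable while remaining fully
    differentiable and trainable end to end with the rest of the network.
    (Illustrative sketch; hyperparameters are assumptions, not the paper's.)
    """

    def __init__(self, out_channels=40, kernel_size=251, sample_rate=16000):
        super().__init__()
        self.kernel_size = kernel_size
        self.sample_rate = sample_rate

        # Initialise cutoff frequencies roughly evenly between 30 Hz and Nyquist.
        edges = torch.linspace(30.0, sample_rate / 2 - 100.0, out_channels + 1)
        self.low_hz = nn.Parameter(edges[:-1].unsqueeze(1))                 # (C, 1)
        self.band_hz = nn.Parameter((edges[1:] - edges[:-1]).unsqueeze(1))  # (C, 1)

        # Symmetric time axis (in seconds) and a Hamming window for the filters.
        n = torch.arange(-(kernel_size // 2), kernel_size // 2 + 1, dtype=torch.float32)
        self.register_buffer("t", n / sample_rate)                          # (K,)
        self.register_buffer("window", torch.hamming_window(kernel_size, periodic=False))

    def forward(self, x):                                                   # x: (B, 1, T)
        low = torch.clamp(torch.abs(self.low_hz), 30.0, self.sample_rate / 2 - 1.0)
        high = torch.clamp(low + torch.abs(self.band_hz), 30.0, self.sample_rate / 2)

        # Ideal band-pass impulse response: difference of two sinc low-pass filters.
        # torch.sinc(x) = sin(pi*x)/(pi*x), so 2f*sinc(2f*t) is a low-pass with cutoff f.
        band_pass = (2 * high * torch.sinc(2 * high * self.t)
                     - 2 * low * torch.sinc(2 * low * self.t))              # (C, K)
        band_pass = band_pass * self.window                                 # smooth truncation
        band_pass = band_pass / (2 * (high - low) + 1e-8)                   # normalise gain

        filters = band_pass.unsqueeze(1)                                    # (C, 1, K)
        return F.conv1d(x, filters, padding=self.kernel_size // 2)          # (B, C, T)


# Example: a batch of eight one-second 16 kHz recordings -> 40 learned band-pass
# channels, which downstream layers could then classify by recording device.
frontend = SincFrontEnd()
features = frontend(torch.randn(8, 1, 16000))   # shape: (8, 40, 16000)
```

Because only the cutoffs are learned, such a front end has far fewer preprocessing parameters than a free-form convolutional layer, which is one way a sinc-based design can support training from small sample sizes and transfer to newly added devices.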