Deep learning focuses on the representation of the input data and the generalization of the model. It is well known that data augmentation can combat overfitting and improve the generalization ability of deep neural networks. In this paper, we summarize and compare multiple data augmentation methods for audio classification. These strategies include traditional methods applied to the raw audio signal, as well as the currently popular augmentations based on linear interpolation and nonlinear mixing of spectrograms. For each data augmentation method, we examine how new samples are generated, how labels are transformed, and how samples and labels are combined. Finally, inspired by SpecAugment and Mixup, we propose an effective and easy-to-implement data augmentation method, which we call Mixed Frequency Masking. This method uses a nonlinear combination to construct new samples and a linear combination to construct their labels. All methods are evaluated on the Freesound Dataset Kaggle2018 dataset, with ResNet adopted as the classifier. The baseline system uses the log-mel spectrogram as the input feature. We use mean Average Precision @3 (mAP@3) as the evaluation metric for all data augmentation methods.
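As a rough illustration of the proposed idea, the sketch below assumes that a randomly chosen frequency band of one log-mel spectrogram is replaced by the corresponding band of a second sample (a nonlinear sample combination), while the labels are mixed linearly in proportion to the band width, as in Mixup. Neither the band-replacement rule nor the mixing weight is confirmed by the abstract; the function name and parameters are hypothetical.

```python
# Hypothetical sketch of a SpecAugment/Mixup-style "mixed frequency masking".
# Assumption: the masked mel band of sample A is filled with the same band of
# sample B, and labels are mixed linearly by the fraction of bins kept from A.
import numpy as np

def mixed_frequency_masking(spec_a, spec_b, label_a, label_b,
                            max_band_width=16, rng=None):
    """spec_*: log-mel spectrograms of shape (n_mels, n_frames);
    label_*: one-hot label vectors of shape (n_classes,)."""
    if rng is None:
        rng = np.random.default_rng()
    n_mels = spec_a.shape[0]
    width = rng.integers(1, max_band_width + 1)     # band width in mel bins
    start = rng.integers(0, n_mels - width + 1)     # band start index

    mixed = spec_a.copy()
    mixed[start:start + width, :] = spec_b[start:start + width, :]

    lam = 1.0 - width / n_mels                      # fraction kept from sample A
    mixed_label = lam * label_a + (1.0 - lam) * label_b
    return mixed, mixed_label
```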
With the rapid development of modern communication systems, the amount of data has exploded, system structures have become increasingly complex, and existing communication theories and technologies face huge challenges. The successful application of deep learning in images, speech, natural language processing, and games offers a possible route for communication-system theory and technology to move beyond traditional ideas and performance limits. This article summarizes application cases of deep learning methods in channel estimation, signal detection, and modulation recognition, and shows their outstanding performance compared with traditional communication theory and technology. Finally, we analyze the opportunities and challenges faced by deep-learning-based communication technologies.
Acoustic scene classification is an intricate problem for a machine. As an emerging field of research, deep Convolutional Neural Networks (CNNs) have achieved convincing results. In this paper, we explore the use of a multi-scale Densely Connected Convolutional Network (DenseNet) for the classification task, with the goal of improving classification performance, since multi-scale features can be extracted from the time-frequency representation of the audio signal. On the other hand, most previous CNN-based audio scene classification approaches aim to improve classification accuracy by employing regularization techniques, such as dropout of hidden units and data augmentation, to reduce overfitting. It is widely known that outliers in the training set strongly degrade the trained model and that culling them may improve classification performance, yet this has often been under-explored in previous studies. In this paper, inspired by silence removal in speech signal processing, a novel sample dropout approach is proposed that aims to remove outliers from the training dataset. Using the DCASE 2017 acoustic scene classification dataset, the experimental results demonstrate that the proposed multi-scale DenseNet provides superior performance over the traditional single-scale DenseNet, while the sample dropout method further improves the classification robustness of the multi-scale DenseNet.
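The abstract does not specify how outliers are detected. The sketch below assumes a simple distance-to-class-centroid criterion over per-clip feature statistics, so the function, threshold, and feature choice are illustrative assumptions rather than the paper's actual sample dropout rule.

```python
# Illustrative outlier-culling ("sample dropout") step under an assumed
# criterion: drop clips whose time-averaged feature vector lies unusually far
# from its class centroid.
import numpy as np

def sample_dropout(features, labels, z_threshold=3.0):
    """features: array (n_samples, n_dims) of per-clip statistics
    (e.g., time-averaged log-mel energies); labels: integer class ids."""
    keep = np.ones(len(features), dtype=bool)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        class_feats = features[idx]
        centroid = class_feats.mean(axis=0)
        dists = np.linalg.norm(class_feats - centroid, axis=1)
        z = (dists - dists.mean()) / (dists.std() + 1e-8)
        keep[idx[z > z_threshold]] = False   # cull samples far from their class
    return features[keep], labels[keep]
```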
The Wi-Fi wireless communication system, the most widely used OFDM-based system, is currently developing rapidly. However, because OFDM is sensitive to carrier frequency offset, the receiver must accurately estimate the carrier frequency offset between the transmitter and the receiver. The conventional algorithm typically estimates the carrier frequency offset from the autocorrelation of training symbols. Although this method is simple and low in complexity, its estimation performance is poor at low signal-to-noise ratios, which significantly degrades the performance of the wireless communication system. Meanwhile, deep-learning-based (DL-based) design of the communication physical layer is receiving growing attention but is rarely applied to carrier frequency offset estimation. In this paper, we propose a DL-based carrier frequency offset (CFO) estimation model architecture for 802.11n-standard OFDM systems. For multipath channel models with varying degrees of multipath fading, the estimation error of the proposed model is on average 70.54% lower than that of the conventional method under the 802.11n standard channel models, and the DL-based method offers a wider estimation range than conventional methods. In addition, models trained in one channel environment and tested in another were cross-evaluated to determine which could be deployed in the real world. The cross-evaluation demonstrates that the DL-based model performs well over a large class of channels without extra training when trained under the worst-case (most severe) multipath channel model.
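For reference, the conventional autocorrelation-based estimator that the DL model is compared against can be sketched as follows (in the Moose / Schmidl-Cox style for a repeated training field). The function name, normalization, and parameters are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch of the conventional autocorrelation-based CFO estimator.
# Assumption: the training field repeats itself with a lag of `delay` samples,
# so the phase of the lag-`delay` autocorrelation encodes the frequency offset.
import numpy as np

def cfo_autocorrelation_estimate(rx, delay, fft_size):
    """rx: received complex baseband samples of the training field;
    delay: repetition lag in samples; fft_size: OFDM FFT length.
    Returns the CFO estimate in units of the subcarrier spacing."""
    # Correlate each sample with its repetition `delay` samples later.
    corr = np.sum(np.conj(rx[:-delay]) * rx[delay:])
    # The accumulated phase over `delay` samples is 2*pi*eps*delay/fft_size.
    eps_hat = np.angle(corr) * fft_size / (2.0 * np.pi * delay)
    return eps_hat   # unambiguous only for |eps| < fft_size / (2 * delay)
```

The limited unambiguous range in the last comment reflects the range limitation of conventional methods that the abstract says the DL-based approach can exceed.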