Acoustic Scene Classification (ASC) systems have great potential to transform existing embedded technologies. However, ASC research has placed little emphasis on the challenges of deploying ASC systems on embedded devices. In this paper, we focus on one of the problems associated with smaller ASC models: the generation of smaller yet highly informative training datasets. To this end, we propose employing the multitaper-reassignment technique to generate high-resolution spectrograms from audio signals. These sharp time-frequency (TF) representations are used as inputs to a splitting method based on TF-related entropy metrics. We show via simulations that the datasets created through the proposed segmentation can successfully be used to train small convolutional neural networks (CNNs), which could be employed in embedded ASC applications.