This paper presents a sound source distance estimation (SSDE) method using a convolutional recurrent neural network (CRNN). We approach the sound source distance estimation task as an image classification problem, and we aim to classify a given audio signal into one of three predefined distance classes—one meter, two meters, and three meters—irrespective of its orientation angle. For the purpose of training, we create a dataset by recording audio signals at the three different distances and three angles in different rooms. The CRNN is trained using time-frequency representations of the audio signals. Specifically, we transform the audio signals into log-scaled mel spectrograms, allowing the convolutional layers to extract the appropriate features required for the classification. When trained and tested with combined datasets from all rooms, the proposed model exhibits high classification accuracies; however, training and testing the model in separate rooms results in lower accuracies, indicating that further study is required to improve the method’s generalization ability. Our experimental results demonstrate that it is possible to estimate sound source distances in known environments by classification using the log-scaled mel spectrogram.
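The log-scaled mel spectrogram input described above can be sketched with NumPy alone. This is a minimal illustration, not the authors' pipeline: the sampling rate, FFT size, hop length, number of mel bands, and the function names `mel_filterbank` and `log_mel_spectrogram` are all assumptions, since the abstract does not state them.

```python
import numpy as np

def hz_to_mel(f):
    # Common HTK-style mel formula (an assumption; the abstract names no variant).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters whose centers are equally spaced on the mel scale."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, center, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(signal, sr=16000, n_fft=512, hop=256, n_mels=40):
    """Return a (n_mels, n_frames) log-power mel spectrogram in dB.

    All default parameter values are illustrative assumptions.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[t * hop : t * hop + n_fft] * window
                       for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2       # (n_frames, n_fft//2 + 1)
    mel_power = mel_filterbank(sr, n_fft, n_mels) @ power.T  # (n_mels, n_frames)
    return 10.0 * np.log10(mel_power + 1e-10)              # small offset avoids log(0)

# Usage: a one-second 1 kHz test tone stands in for a recorded signal.
sr = 16000
t = np.arange(sr) / sr
spec = log_mel_spectrogram(np.sin(2 * np.pi * 1000.0 * t), sr=sr)
```

The resulting 2-D array can be treated like a single-channel image, which is what allows the distance estimation task to be framed as image classification for the convolutional layers.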
This paper proposes a method for 3-D sound source localization (SSL) using region selection and time difference of arrival (TDOA). 3-D SSL involves estimating an azimuth angle and an elevation angle. To reduce computation time, we compare signal energies to select one of three regions. In the selected region, we compute only one TDOA value for the azimuth angle estimation. To estimate the elevation angle, we choose the higher-energy signal from the selected region and pair it with the elevated microphone's signal for TDOA computation. Our experimental results show that the proposed method achieves average errors of 0.778° in azimuth and 1.296° in elevation, which is similar to other methods. Because the method uses only one energy comparison and two TDOA computations, the total processing time is reduced.
Keywords: azimuth angle, elevation, signal energy, TDOA
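The two building blocks named in this abstract, energy comparison and TDOA estimation, can be sketched as follows. This is a hedged illustration only: the abstract does not give the array geometry or the region-selection rule, so the pairing rule in `select_region`, the far-field model in `tdoa_to_angle_deg`, the microphone spacing, and all function names are assumptions rather than the authors' method.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def signal_energy(x):
    """Sum of squared samples."""
    return float(np.sum(np.asarray(x, dtype=float) ** 2))

def select_region(mics):
    """Pick one of three regions by comparing energies.

    Hypothetical rule: region k is spanned by microphones k and (k + 1) % 3,
    and the region whose pair has the largest combined energy is selected.
    """
    energies = [signal_energy(m) for m in mics]
    pair_energy = [energies[k] + energies[(k + 1) % 3] for k in range(3)]
    return int(np.argmax(pair_energy))

def estimate_tdoa(x, y, sr):
    """Delay of y relative to x in seconds (positive when y lags x),
    taken from the peak of the plain cross-correlation."""
    corr = np.correlate(y, x, mode="full")
    lag = int(np.argmax(corr)) - (len(x) - 1)
    return lag / sr

def tdoa_to_angle_deg(tdoa, mic_distance):
    """Far-field angle for a two-microphone pair: sin(theta) = c * tau / d.
    The far-field assumption is ours, not stated in the abstract."""
    ratio = np.clip(SPEED_OF_SOUND * tdoa / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(ratio)))

# Usage with synthetic data: y is x delayed by 5 samples.
sr = 48000
rng = np.random.default_rng(0)
x = rng.standard_normal(2048)
y = np.concatenate([np.zeros(5), x[:-5]])

region = select_region([x, 0.5 * x, 0.1 * x])
tdoa = estimate_tdoa(x, y, sr)
angle = tdoa_to_angle_deg(tdoa, mic_distance=0.2)  # 0.2 m spacing is assumed
```

Restricting the search to one region means only one microphone pair needs a cross-correlation for azimuth, plus one more pair for elevation, which is where the reduction in processing time comes from.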