Although sound source localization is desirable in many communication systems and intelligence applications, the distortion caused by diffuse noise or reverberation makes time delay estimation (TDE) between signals acquired by a pair of microphones a complicated and challenging problem. In this paper, we describe a method that can efficiently achieve sound source localization in noisy and reverberant environments. The method is based on the generalized cross-correlation (GCC) function with phase transform (PHAT) weights (GCC-PHAT), which provides robustness against reverberation. In addition, to make the time delay estimate robust to diffuse components and to further improve the robustness of GCC-PHAT against reverberation, time-frequency (t-f) components of the observations emitted directly by a point source are selected by "inversed" diffuseness. The diffuseness, which can be estimated from the coherent-to-diffuse power ratio (CDR) based on the spatial coherence between two microphones, represents the contribution of diffuse components on a scale of zero to one, with direct sound from a source modeled as fully coherent. In particular, the "inversed" diffuseness is binarized with a strict threshold to select highly reliable components for accurate TDE even in noisy and reverberant environments. Experimental results on both simulated and real-recorded data consistently demonstrate the robustness of the presented method against diffuse noise and reverberation.
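The GCC-PHAT core of the method above can be sketched in a few lines of NumPy. This is a minimal, generic implementation of the standard GCC-PHAT estimator, not the authors' full pipeline (it omits the CDR-based t-f selection); the function name and signature are our own.

```python
import numpy as np

def gcc_phat(x, y, fs, max_tau=None):
    """Estimate the time delay of x relative to y via GCC-PHAT.

    The cross-power spectrum is normalized by its magnitude (the PHAT
    weighting), which whitens the spectrum and improves robustness to
    reverberation. Returns the delay in seconds.
    """
    n = len(x) + len(y)                       # zero-pad to avoid circular wrap
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12            # PHAT weighting
    cc = np.fft.irfft(cross, n=n)             # generalized cross-correlation
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # reorder so index 0 corresponds to lag -max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs
```

With two white-noise signals offset by an integer number of samples, the peak of the PHAT-weighted correlation recovers that offset exactly.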
Violence recognition is challenging because recognition must be performed on videos acquired by numerous surveillance cameras at any time and place. A system should make reliable detections in real time and promptly inform surveillance personnel when violent crimes take place. We therefore focus on efficient violence recognition for real-time, on-device operation, allowing easy expansion into a surveillance system with numerous cameras. In this paper, we propose a novel violence detection pipeline that can be combined with conventional 2-dimensional Convolutional Neural Networks (2D CNNs). In particular, frame grouping is proposed to give 2D CNNs the ability to learn spatio-temporal representations in videos. It is a simple processing method that averages the channels of the input frames and groups three consecutive channel-averaged frames as one input to the 2D CNN. Furthermore, we present spatial and temporal attention modules that are lightweight yet consistently improve violence recognition performance. The spatial attention module, named Motion Saliency Map (MSM), captures salient regions of feature maps derived from motion boundaries computed as the difference between consecutive frames. The temporal attention module, called the Temporal Squeeze-and-Excitation (T-SE) block, inherently highlights the time periods correlated with a target event. Our proposed pipeline brings significant performance improvements over 2D CNNs followed by Long Short-Term Memory (LSTM) and has much lower computational complexity than existing 3D-CNN-based methods. In particular, MobileNetV3 and EfficientNet-B0 with our proposed modules achieved state-of-the-art performance on six different violence datasets. Our code is available at https://github.com/ahstarwab/Violence_Detection.
INDEX TERMS Real-time violence detection, efficient spatio-temporal attention, efficient convolution method for spatio-temporal modeling
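The frame-grouping step described above can be sketched as a small NumPy preprocessing function. This is our reading of the abstract, not the authors' code: we assume the clip is given as a (T, H, W, 3) array with T a multiple of 3, and the function name and layout are assumptions.

```python
import numpy as np

def frame_grouping(frames):
    """Average each RGB frame's channels to a single gray map, then stack
    every three consecutive gray maps as one pseudo-RGB input for a 2D CNN.

    frames: float array of shape (T, H, W, 3), T divisible by 3.
    Returns an array of shape (T // 3, H, W, 3) whose channel axis now
    carries three consecutive time steps instead of color.
    """
    gray = frames.mean(axis=-1)               # (T, H, W): channel-averaged
    t, h, w = gray.shape
    assert t % 3 == 0, "number of frames must be a multiple of 3"
    # pack triplets of time steps into the channel dimension
    return gray.reshape(t // 3, 3, h, w).transpose(0, 2, 3, 1)
```

Each grouped sample thus looks like an ordinary 3-channel image to the 2D CNN, but its channels encode short-range motion.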
Various patterns of neural activity are observed in dynamic cortical imaging data. Such patterns may reflect how neurons communicate through the underlying circuitry to perform appropriate functions; thus, it is crucial to investigate the spatiotemporal characteristics of the observed neural activity patterns. In general, however, neural activities are highly nonlinear and complex, making it demanding to analyze them quantitatively or to classify the patterns of observed activities in various types of imaging data. Here, we present our implementation of a novel method that successfully addresses these issues for precise comparison and classification of neural activity patterns. Based on two-dimensional representations of the geometric structure and temporal evolution of activity patterns, our method successfully classified a number of computer-generated sample patterns created from combinations of various spatial and temporal patterns. In addition, we validated our method on voltage-sensitive dye imaging data from Alzheimer's disease (AD) model mice. Our analysis algorithm distinguished the activity data of AD mice from that of wild-type mice with significantly higher performance than previously suggested methods. Our results provide a pragmatic solution for precise analysis of spatiotemporal patterns in neural imaging data.