Speech segregation is an important task of auditory scene analysis (ASA), in which the speech of a certain speaker is separated from other interfering signals. Wang and Brown proposed a multistage neural model for speech segregation, the core of which is a two-layer oscillator network. In this paper, we extend their model by adding further processes based on psychoacoustic evidence to improve the performance. These processes include pitch tracking and grouping based on amplitude modulation (AM). Our model is systematically evaluated and compared with the Wang-Brown model, and it yields significantly better performance.
Abstract-A typical auditory scene in a natural environment contains multiple sources. Auditory scene analysis (ASA) is the process in which the auditory system segregates a scene into streams corresponding to different sources. Segmentation is a major stage of ASA by which an auditory scene is decomposed into segments, each containing signal mainly from one source. We propose a system for auditory segmentation by analyzing onsets and offsets of auditory events. The proposed system first detects onsets and offsets, and then generates segments by matching corresponding onset and offset fronts. This is achieved through a multiscale approach. A quantitative measure is suggested for segmentation evaluation. Systematic evaluation shows that most of target speech, including unvoiced speech, is correctly segmented, and target speech and interference are well separated into different segments.Index Terms-Auditory segmentation, event detection, multiscale analysis, onset and offset.
Monaural speech segregation has proven to be extremely challenging. While efforts in computational auditory scene analysis have led to considerable progress in voiced speech segregation, little attention has been given to unvoiced speech, which lacks harmonic structure and has weaker energy, hence more susceptible to interference. This study proposes a new approach to the problem of segregating unvoiced speech from nonspeech interference. The study first addresses the question of how much speech is unvoiced. The segregation process occurs in two stages: Segmentation and grouping. In segmentation, the proposed model decomposes an input mixture into contiguous time-frequency segments by a multiscale analysis of event onsets and offsets. Grouping of unvoiced segments is based on Bayesian classification of acoustic-phonetic features. The proposed model for unvoiced speech segregation joins an existing model for voiced speech segregation to produce an overall system that can deal with both voiced and unvoiced speech. Systematic evaluation shows that the proposed system extracts a majority of unvoiced speech without including much interference, and it performs substantially better than spectral subtraction.
Speech segregation is an important task of auditory scene analysis (ASA), in which the speech of a certain speaker is separated from other interfering signals. Wang and Brown proposed a multistage neural model for speech segregation, the core of which is a two-layer oscillator network. In this paper, we extend their model by adding further processes based on psychoacoustic evidence to improve the performance. These processes include pitch tracking and grouping based on amplitude modulation (AM). Our model is systematically evaluated and compared with the Wang-Brown model, and it yields significantly better performance.
The threat of malware on mobile devices is gaining attention recently. It is important to provide security solutions to these devices before these threats cause widespread damage. However, mobile devices have severe resource constraints in terms of memory and power. Hence, even though there are well developed techniques for malware detection on the PC domain, it requires considerable effort to adapt these techniques for mobile devices. In this paper, we outline the considerations for malware detection on mobile devices and propose a signature based malware detection method. Specifically, we detail a signature matching algorithm that is well suited for use in mobile device scanning due to its low memory requirements. Additionally, the matching algorithm is shown to have high scanning speed which makes it unobtrusive to users. Our evaluation and comparison study with the well known Clam-AV scanner shows that our solution consumes less than 50% of the memory used by Clam-AV while maintaining a fast scanning rate.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.