This study introduces a model for solving three different auditory tasks in a multi-talker setting: target localization, target identification, and word recognition. The model was used to simulate psychoacoustic data from a call-sign-based listening test involving multiple spatially separated talkers [Brungart and Simpson (2007). Percept. Psychophys. 69(1), 79–91]. The main characteristics of the model are (i) the extraction of salient auditory features (“glimpses”) from the multi-talker signal and (ii) the use of a classification method that finds the best target hypothesis by comparing feature templates from clean target signals to the glimpses derived from the multi-talker mixture. The four features used were periodicity, periodic energy, and periodicity-based interaural time and level differences. Model performance was well above chance level for all subtasks and conditions and generally agreed closely with the subject data. This indicates that, despite their sparsity, glimpses provide sufficient information about a complex auditory scene. This also suggests that complex source superposition models may not be needed for auditory scene analysis. Instead, simple models of clean speech may be sufficient to decode even complex multi-talker scenes.
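The template-comparison idea in (ii) can be illustrated with a minimal sketch. It is not the classifier used in the study: the squared-distance measure, the dictionary layout keyed by (frame, channel), and the names `word_a` and `word_b` are all assumptions made for illustration. Each clean-speech template is scored only at the sparse time-frequency units where a glimpse was extracted, and the hypothesis with the smallest distance is selected.

```python
def glimpse_distance(template, glimpses):
    """Mean squared distance between a clean template and the mixture,
    evaluated only at the glimpsed (frame, channel) units."""
    if not glimpses:
        return float("inf")
    total = 0.0
    for unit, value in glimpses.items():
        total += (template[unit] - value) ** 2
    return total / len(glimpses)


def best_hypothesis(templates, glimpses):
    """Return the name of the template closest to the glimpses."""
    return min(templates, key=lambda name: glimpse_distance(templates[name], glimpses))


# Hypothetical feature templates (e.g. a periodicity value per unit).
templates = {
    "word_a": {(0, 0): 100.0, (0, 1): 120.0, (1, 0): 110.0, (1, 1): 130.0},
    "word_b": {(0, 0): 200.0, (0, 1): 180.0, (1, 0): 190.0, (1, 1): 170.0},
}
# Sparse glimpses: only two of the four units survived in the mixture.
glimpses = {(0, 1): 122.0, (1, 0): 108.0}

winner = best_hypothesis(templates, glimpses)
```

Units without a glimpse never enter the score, which is why a model of clean speech alone can be compared against a mixture in this sketch.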
Human listeners robustly decode speech information from a talker of interest that is embedded in a mixture of spatially distributed interferers. A relevant question is which time-frequency segments of the speech are predominantly used by a listener to solve such a complex auditory scene analysis task. A recent psychoacoustic study investigated the relevance of low signal-to-noise ratio (SNR) components of a target signal on speech intelligibility in a spatial multi-talker situation. For this, a three-talker stimulus was manipulated in the spectro-temporal domain such that target speech time-frequency units below a variable SNR threshold were discarded while keeping the interferers unchanged. The psychoacoustic data indicate that only target components at and above a local SNR of about 0 dB contribute to intelligibility. This study applies an auditory scene analysis "glimpsing" model to the same manipulated stimuli. Model data are found to be similar to the human data, supporting the notion of "glimpsing," that is, that salient speech-related information is predominantly used by the auditory system to decode speech embedded in a mixture of sounds, at least for the tested conditions of three overlapping speech signals. This implies that perceptually relevant auditory information is sparse and may be processed with low computational effort, which is relevant for neurophysiological research of scene analysis and novelty processing in the auditory system.
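The stimulus manipulation described above can be sketched as follows, assuming the target and the summed interferers are available as separate power values per time-frequency unit. The function names, the small `eps` guard, and the toy power values are assumptions, not details taken from the study.

```python
import math


def local_snr_db(target_power, interferer_power, eps=1e-12):
    """Local SNR of one time-frequency unit in dB (eps avoids log of zero)."""
    return 10.0 * math.log10((target_power + eps) / (interferer_power + eps))


def discard_low_snr_target_units(target, interferers, threshold_db):
    """Zero every target unit whose local SNR is below threshold_db.
    The interferers are not modified."""
    return [
        [t if local_snr_db(t, i) >= threshold_db else 0.0 for t, i in zip(t_row, i_row)]
        for t_row, i_row in zip(target, interferers)
    ]


# Toy power values: rows are time frames, columns are frequency channels.
target = [[4.0, 0.5], [1.0, 0.01]]
interferers = [[1.0, 1.0], [1.0, 1.0]]

manipulated = discard_low_snr_target_units(target, interferers, threshold_db=0.0)
mixture = [[t + i for t, i in zip(t_row, i_row)]
           for t_row, i_row in zip(manipulated, interferers)]
```

With a 0 dB threshold, the units at about -3 dB and -20 dB are removed from the target, while the units at 0 dB and about +6 dB are kept.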
This study investigated the influence of high-frequency cue bands on the detection and discrimination of low-frequency target bands presented in a 3000-Hz low-pass noise masker. Target and cue bands were complex tones with 80-Hz spacing. The cue band consisted of 60 components starting at 4000 Hz; targets consisted of four components starting at different frequencies (500, 700, 1000, 1200, and 1500 Hz). Targets were presented with different durations within the 500-ms masker; target and cue bands had a common on- and offset. Presentation of the high-frequency complex tone significantly enhanced both the discrimination and detection thresholds by 2–3 dB.
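As a worked example of the stimulus arithmetic, the sketch below lists the component frequencies implied by the 80-Hz spacing and synthesizes a short complex tone. The sampling rate, the sine starting phase, the equal component amplitudes, and the 10-ms duration are assumptions; the abstract does not state them.

```python
import math


def component_frequencies(start_hz, n_components, spacing_hz=80.0):
    """Frequencies of a complex tone with equally spaced components."""
    return [start_hz + k * spacing_hz for k in range(n_components)]


def complex_tone(freqs, duration_s, fs=44100):
    """Sum of equal-amplitude sinusoids in sine phase (assumed, not from the source)."""
    n_samples = int(round(duration_s * fs))
    return [
        sum(math.sin(2.0 * math.pi * f * n / fs) for f in freqs)
        for n in range(n_samples)
    ]


cue_freqs = component_frequencies(4000.0, 60)    # 4000 Hz up to 8720 Hz
target_freqs = component_frequencies(500.0, 4)   # 500, 580, 660, 740 Hz
tone = complex_tone(target_freqs, duration_s=0.01)
```

Under these assumptions the cue band spans 4000 to 8720 Hz, well above the 3000-Hz cutoff of the masker, while every target band lies below it.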
A recent study showed that human listeners are able to localize a short speech target simultaneously masked by four speech tokens in reverberation [Kopčo, Best, and Carlile (2010). J. Acoust. Soc. Am. 127, 1450–1457]. Here, an auditory model for solving this task is introduced. The model has three processing stages: (1) extraction of the instantaneous interaural time difference (ITD) information, (2) selection of target-related ITD information ("glimpses") using a template-matching procedure based on periodicity, spectral energy, or both, and (3) target location estimation. The model performance was compared to the human data, and to the performance of a modified model using an ideal binary mask (IBM) at stage (2). The IBM-based model performed similarly to the subjects, indicating that the binaural model is able to accurately estimate source locations. Template matching using spectral energy and using a combination of spectral energy and periodicity achieved good results, while using periodicity alone led to poor results. In particular, the glimpses extracted from the initial portion of the signal were critical for good performance. Simulation data show that the auditory features investigated here are sufficient to explain human performance in this challenging listening condition and thus may be used in models of auditory scene analysis.
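For readers unfamiliar with ITD extraction, the sketch below estimates a single ITD as the lag that maximizes the cross-correlation between the two ear signals. This is a generic textbook technique swapped in for illustration; the model described above extracts instantaneous ITD information, and its exact procedure is not given here. The sampling rate, the noise test signal, and the 3-sample delay are assumptions.

```python
import random


def estimate_itd_samples(left, right, max_lag):
    """Lag (in samples) that maximizes the cross-correlation of left and right.
    A positive lag means the right channel is delayed relative to the left."""
    n = len(left)
    best_lag, best_value = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        value = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                value += left[i] * right[j]
        if value > best_value:
            best_lag, best_value = lag, value
    return best_lag


fs = 44100                      # assumed sampling rate in Hz
rng = random.Random(0)          # fixed seed so the example is repeatable
left = [rng.gauss(0.0, 1.0) for _ in range(400)]
true_delay = 3                  # right ear receives the signal 3 samples later
right = [0.0] * true_delay + left[:-true_delay]

lag = estimate_itd_samples(left, right, max_lag=20)
itd_seconds = lag / fs
```

In a glimpsing scheme of the kind described above, such an estimate would be computed per time-frequency unit and only the units selected at stage (2) would be passed on to the location estimate at stage (3).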
A temporally acute binaural system can help to resolve inherent fluctuations in binaural information that are often present in complex auditory scenes. Using a broadband noise stimulus that rapidly alternates between two different values of interaural time difference (ITD), the ability of the binaural system to hear the lateral position resulting from one of the ITD values was investigated. Results show that listeners are able to accurately lateralize brief noise tokens of only 3–7 ms in duration. In two subsequent experiments, the role of an amplitude modulation (AM) imposed on the ITD-switching stimulus used in the first experiment was tested. For wideband stimuli, the temporal position of the ITD target relative to the phase of the AM did not influence absolute lateralization or detection performance. When the stimuli were narrowband, however, detection of the ITD target was best when temporally positioned in the rising portion of the AM. These experiments illustrate that the auditory system is capable of making accurate lateral estimates of very brief moments of ITD information. Furthermore, for these instantaneous changes in ITD information, the stimulus bandwidth can influence the role of envelope cues for the readout of binaural information.