Are musicians better able to understand speech in noise than non-musicians? Recent studies have produced contradictory results. Here we addressed this question by asking musicians and non-musicians to understand target sentences masked by other sentences presented from different spatial locations, the classical ‘cocktail party problem’ in speech science. We found that musicians obtained a substantial benefit in this situation, with thresholds ~6 dB better than those of non-musicians. Large individual differences in performance were noted, particularly within the non-musically trained group. Furthermore, across conditions we manipulated the spatial location and intelligibility of the masking sentences, thus changing the amount of ‘informational masking’ (IM) while keeping the amount of ‘energetic masking’ (EM) relatively constant. When the maskers were unintelligible and spatially separated from the target (i.e., low in IM), musicians and non-musicians performed comparably. These results suggest that the characteristics of speech maskers and the amount of IM can influence the magnitude of the differences found between musicians and non-musicians in multiple-talker “cocktail party” environments. Moreover, considering the task in terms of the EM–IM distinction provides a conceptual framework for future behavioral and neuroscientific studies that explore the underlying sensory and cognitive mechanisms contributing to enhanced “speech-in-noise” perception by musicians.
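The abstract does not specify the tracking procedure used to estimate these thresholds, but masked-sentence thresholds of this kind are commonly measured with an adaptive staircase. The following Python sketch is purely illustrative: the `run_trial` callable is a hypothetical stand-in for one masked-sentence trial, and it shows how a simple 1-down/1-up track converges on the ~50%-correct target-to-masker ratio (TMR).

```python
import random

def staircase_srt(run_trial, start_tmr_db=0.0, step_db=2.0, n_reversals=8):
    """Minimal 1-down/1-up adaptive staircase converging on the
    ~50%-correct target-to-masker ratio (TMR), in dB.

    run_trial(tmr_db) -> bool: True if the sentence was reported
    correctly at that TMR (a stand-in for one real trial).
    """
    tmr, direction, reversals = start_tmr_db, 0, []
    while len(reversals) < n_reversals:
        correct = run_trial(tmr)
        new_dir = -1 if correct else +1   # harder after a hit, easier after a miss
        if direction and new_dir != direction:
            reversals.append(tmr)         # record the level at each reversal
        direction = new_dir
        tmr += new_dir * step_db
    # Threshold estimate: mean TMR over the later reversals.
    return sum(reversals[2:]) / len(reversals[2:])

# Toy simulated listener whose true threshold is -4 dB TMR (illustrative only).
simulate = lambda tmr: random.random() < 1 / (1 + 10 ** (-(tmr + 4) / 4))
print(f"Estimated SRT: {staircase_srt(simulate):.1f} dB TMR")
```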
An approach to hearing aid design is described, and preliminary acoustical and perceptual measurements are reported, in which an acoustic beamforming microphone array is coupled to an eyeglasses-mounted eye tracker. This visually guided hearing aid (VGHA), currently a laboratory-based prototype, senses the direction of gaze using the eye tracker, and an interface converts those values into control signals that steer the acoustic beam accordingly. Preliminary speech intelligibility measurements with noise and speech maskers revealed near-normal or better-than-normal spatial release from masking with the VGHA. Although the device is not yet a wearable prosthesis, these findings support the principle underlying it.
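The abstract does not detail the array processing itself, but the core idea of steering an acoustic beam toward the point of gaze can be illustrated with a generic delay-and-sum beamformer. The Python sketch below is an assumption-laden illustration, not the VGHA's actual design: the uniform linear array geometry, microphone spacing, and input signals are all placeholders.

```python
import numpy as np

def delay_and_sum(mic_signals, fs, gaze_azimuth_deg, mic_spacing_m=0.014, c=343.0):
    """Steer a uniform linear array toward the gaze direction via
    delay-and-sum beamforming (frequency-domain fractional delays).

    mic_signals: (n_mics, n_samples) array of simultaneously sampled channels.
    A generic sketch, not the VGHA's actual processing chain.
    """
    n_mics, n = mic_signals.shape
    theta = np.deg2rad(gaze_azimuth_deg)
    # Relative arrival delay of a plane wave from the gaze angle at each mic.
    positions = (np.arange(n_mics) - (n_mics - 1) / 2) * mic_spacing_m
    delays = positions * np.sin(theta) / c              # seconds
    freqs = np.fft.rfftfreq(n, 1 / fs)
    spectra = np.fft.rfft(mic_signals, axis=1)
    # Delay each channel so the gaze-direction wavefront sums coherently.
    phase = np.exp(-2j * np.pi * freqs[None, :] * delays[:, None])
    return np.fft.irfft(np.mean(spectra * phase, axis=0), n)

# Example: steer a 4-mic array toward 20 degrees right of center.
fs = 16000
mics = np.random.randn(4, fs)                           # placeholder signals
out = delay_and_sum(mics, fs, gaze_azimuth_deg=20.0)
```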
The aim of this study was to evaluate the performance of a visually guided hearing aid (VGHA) under conditions designed to capture some aspects of “real-world” communication settings. The VGHA uses eye gaze to steer the acoustic look direction of a highly directional beamforming microphone array. Although the VGHA has been shown to enhance speech intelligibility for fixed-location, frontal targets, it is currently not known whether these benefits persist in the face of the frequent changes in target-talker location that are typical of conversational turn-taking. Participants were 14 young adults, 7 with normal hearing and 7 with bilateral sensorineural hearing impairment. Target stimuli were sequences of 12 question–answer pairs that were embedded in a mixture of competing conversations. The participant’s task was to respond via a key press after each answer, indicating whether it was correct or not. Spatialization of the stimuli and microphone array processing were done offline using recorded impulse responses, before presentation over headphones. The look direction of the array was steered according to the eye movements of the participant as they followed a visual cue presented on a widescreen monitor. Performance was compared between a “dynamic” condition, in which the target stimulus moved among three locations, and a “fixed” condition with a single target location. The benefits of the VGHA over natural binaural listening observed in the fixed condition were reduced in the dynamic condition, largely because visual fixation was less accurate.
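A minimal Python sketch of the offline spatialization step described above, convolving a mono stimulus with recorded impulse responses before headphone presentation. The impulse responses here are crude placeholders, not the study's recorded ones, and the segment-wise approach to a moving target is an assumption about how such stimuli could be assembled.

```python
import numpy as np
from scipy.signal import fftconvolve

def spatialize(mono, ir_left, ir_right):
    """Render a mono source at a fixed location by convolving it with
    the left/right impulse responses recorded for that location.
    Returns a (2, n_samples) array for headphone presentation.
    """
    left = fftconvolve(mono, ir_left)
    right = fftconvolve(mono, ir_right)
    return np.stack([left, right])

# A moving target can be approximated by rendering each segment
# (e.g., each question-answer pair) with the impulse responses for
# its location and concatenating the rendered segments.
fs = 44100
mono = np.random.randn(fs)              # placeholder 1-s stimulus
il = np.zeros(256); il[0] = 1.0         # placeholder impulse responses:
ir = np.zeros(256); ir[10] = 0.8        # crude interaural delay and level
binaural = spatialize(mono, il, ir)
```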
While conversing in a crowded social setting, a listener is often required to follow a target speech signal amid multiple competing speech signals (the so-called "cocktail party" problem). In such situations, separating the target speech signal in azimuth from the interfering masker signals can improve target intelligibility, an effect known as spatial release from masking (SRM). This study assessed the contributions to SRM of two stimulus properties that vary with the separation of sound sources, binaural envelope (ENV) and temporal fine structure (TFS), in normal-hearing (NH) human listeners. Target speech was presented from the front, and speech maskers were either colocated with or symmetrically separated from the target in azimuth. The target and maskers were presented either as natural speech or as "noise-vocoded" speech in which intelligibility was conveyed only by the speech ENVs from several frequency bands; the speech TFS within each band was replaced with noise carriers. The experiments were designed to preserve the spatial cues in the speech ENVs while either retaining or eliminating them from the TFS, which was achieved by using either the same or different noise carriers in the two ears. A phenomenological auditory-nerve model was used to verify that the interaural correlations in TFS differed across conditions, whereas the ENVs retained a high degree of correlation, as intended. Overall, the results revealed that binaural TFS cues, especially in frequency regions below 1500 Hz, are critical for achieving SRM in NH listeners. Potential implications for studying SRM in hearing-impaired listeners are discussed.
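The band-by-band envelope/carrier substitution described here is standard noise vocoding. The following Python sketch illustrates the carrier manipulation under stated assumptions: the band edges, filter order, and toy input are illustrative choices rather than the study's parameters, and the binaural spatialization of the speech that would precede vocoding in the actual experiments is omitted for brevity.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def vocode(speech, fs, edges, carrier):
    """Noise-vocode one ear's signal: in each band, keep the speech
    envelope (ENV) but replace the TFS with the given noise carrier.
    A simplified sketch, not the study's exact processing.
    """
    out = np.zeros_like(speech)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfiltfilt(sos, speech)
        env = np.abs(hilbert(band))              # speech ENV of this band
        out += env * sosfiltfilt(sos, carrier)   # noise supplies the TFS
    return out

fs = 16000
t = np.arange(fs) / fs
speech = np.sin(2 * np.pi * 300 * t) * (1 + np.sin(2 * np.pi * 4 * t))  # toy signal
edges = np.geomspace(100, 6000, 9)               # 8 analysis bands

carrier_l = np.random.randn(len(speech))
# Same carrier in both ears -> interaurally correlated TFS is retained;
# a fresh, independent carrier -> binaural TFS cues are eliminated.
carrier_r_same = carrier_l
carrier_r_diff = np.random.randn(len(speech))

left = vocode(speech, fs, edges, carrier_l)
right_tfs_kept = vocode(speech, fs, edges, carrier_r_same)
right_tfs_gone = vocode(speech, fs, edges, carrier_r_diff)
```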
This study investigated the benefit of a priori cues in a masked nonspeech pattern identification experiment. Targets were narrowband sequences of tone bursts forming six easily identifiable frequency patterns, selected randomly on each trial. The frequency band containing the target was randomized. Maskers were also narrowband sequences of tone bursts chosen randomly on every trial. Targets and maskers were presented monaurally in mutually exclusive frequency bands, producing large amounts of informational masking. Cuing the masker produced a significant improvement in performance, whereas holding the target frequency band constant provided no benefit. The cue providing the greatest benefit was a copy of the masker presented ipsilaterally before the target-plus-masker; a masker cue presented contralaterally and a notched-noise cue produced smaller benefits. One possible mechanism underlying these findings is auditory "enhancement," in which the neural response to the target is increased relative to the masker by differential prior stimulation of the target and masker frequency regions. A second possible mechanism improves performance by comparing the spectrotemporal correspondence of the cue and the target-plus-masker, and is effective for either ipsilateral or contralateral cue presentation. These effects improve identification performance by emphasizing spectral contrasts in sequences or streams of sounds.
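A minimal Python sketch of the kind of stimuli described: a gated tone-burst sequence forming a frequency pattern in a randomly chosen target band, mixed with a random burst sequence in a mutually exclusive masker band. All durations, ramps, band limits, and the "rising" pattern are illustrative assumptions, not the study's parameters.

```python
import numpy as np

def burst_sequence(freqs_hz, fs=24000, burst_ms=60, gap_ms=20):
    """Concatenate short gated tone bursts at the given frequencies,
    forming one identifiable frequency pattern. Durations and ramps
    are illustrative, not the study's exact values.
    """
    n_burst = int(fs * burst_ms / 1000)
    n_gap = int(fs * gap_ms / 1000)
    t = np.arange(n_burst) / fs
    ramp = np.hanning(2 * (n_burst // 10))       # raised-cosine on/off ramps
    gate = np.ones(n_burst)
    gate[: len(ramp) // 2] = ramp[: len(ramp) // 2]
    gate[-(len(ramp) // 2):] = ramp[-(len(ramp) // 2):]
    bursts = [np.sin(2 * np.pi * f * t) * gate for f in freqs_hz]
    gap = np.zeros(n_gap)
    return np.concatenate([seg for b in bursts for seg in (b, gap)])

rng = np.random.default_rng()
center = rng.uniform(1000, 2000)                 # randomized target band
target = burst_sequence(center * np.array([0.9, 0.95, 1.0, 1.05, 1.1]))  # "rising" pattern
masker = burst_sequence(rng.uniform(3000, 4000, size=5))  # random bursts, disjoint band
mixture = target + masker
```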