When speech is in competition with interfering sources in rooms, monaural indicators of intelligibility fail to take account of the listener's ability to separate target speech from interfering sounds using the binaural system. In order to incorporate these segregation abilities and their susceptibility to reverberation, Lavandier and Culling [J. Acoust. Soc. Am. 127, 387-399 (2010)] proposed a model which combines the effects of better-ear listening and binaural unmasking. A computationally efficient version of this model is evaluated here under more realistic conditions that include head shadow, multiple stationary noise sources, and real-room acoustics. Three experiments are presented in which speech reception thresholds were measured in the presence of one to three interferers using real-room listening over headphones, simulated by convolving anechoic stimuli with binaural room impulse responses measured with dummy-head transducers in five rooms. Without fitting any parameter of the model, there was close correspondence between measured and predicted differences in threshold across all tested conditions. The model's components of better-ear listening and binaural unmasking were validated both in isolation and in combination. The computational efficiency of this prediction method allows the generation of complex "intelligibility maps" from room designs.
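The model's two components can be caricatured in a few lines of code: within each frequency band, take the signal-to-noise ratio at the better ear and add the binaural masking-level difference (BMLD) for that band. The sketch below is an illustration of that combination only; the band values, the BMLDs, and the equal band weighting are made-up assumptions, not the published parameterization.

```python
import numpy as np

def predicted_snr_benefit(snr_left_db, snr_right_db, bmld_db):
    """Toy per-band combination of better-ear listening and binaural
    unmasking: take the better ear's SNR in each band, then add the
    binaural masking-level difference (BMLD) for that band.
    All inputs are per-band values in dB."""
    better_ear = np.maximum(snr_left_db, snr_right_db)  # better-ear listening
    return better_ear + bmld_db                         # binaural unmasking

# Illustrative three-band example (all values invented):
snr_l = np.array([-6.0, -3.0, 0.0])
snr_r = np.array([-2.0, -5.0, 1.0])
bmld  = np.array([ 3.0,  1.5, 0.5])   # unmasking is largest at low frequencies
per_band = predicted_snr_benefit(snr_l, snr_r, bmld)
print(per_band)                # per-band effective SNR in dB
overall = per_band.mean()      # equal weights stand in for band-importance weights
print(round(overall, 2))       # 0.33
```

In the published model the per-band values are weighted by speech-importance functions before integration; a plain mean is used here only to keep the sketch self-contained.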
Three experiments measured constancy in speech perception, using natural-speech messages or noise-band vocoder versions of them. The eight vocoder bands had equally log-spaced center frequencies and the shapes of the corresponding "auditory" filters. Consequently, the bands had the temporal envelopes that arise in these auditory filters when the speech is played. The "sir" or "stir" test words were distinguished by degrees of amplitude modulation, and played in the context: "next you'll get _ to click on." Listeners identified test words appropriately, even in the vocoder conditions where the speech had a "noise-like" quality. Constancy was assessed by comparing the identification of test words with low or high levels of room reflections across conditions where the context had either a low or a high level of reflections. Constancy was obtained with both the natural and the vocoded speech, indicating that the effect arises through temporal-envelope processing. Two further experiments assessed perceptual weighting of the different bands, both in the test word and in the context. The resulting weighting functions both increase monotonically with frequency, following the spectral characteristics of the test word's [s]. It is suggested that these two weighting functions are similar because they both come about through the perceptual grouping of the test word's bands.
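"Equally log-spaced center frequencies" means successive band centers differ by a constant ratio. A minimal sketch of that computation follows; the 100-5000 Hz span and the eight-band count applied to it are illustrative assumptions, not necessarily the study's actual analysis range.

```python
import numpy as np

def log_spaced_centres(f_lo, f_hi, n_bands):
    """Centre frequencies equally spaced on a log-frequency axis,
    i.e. successive centres share a constant frequency ratio."""
    return np.geomspace(f_lo, f_hi, n_bands)

# Illustrative span; the study's actual frequency range may differ.
centres = log_spaced_centres(100.0, 5000.0, 8)
ratios = centres[1:] / centres[:-1]
print(np.round(centres, 1))
print(np.allclose(ratios, ratios[0]))  # True: constant ratio between bands
```

In a full vocoder, each band's temporal envelope would then be extracted with an auditory-filter-shaped bandpass at each of these centres and used to modulate a matching noise band.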
Room reverberation usually degrades speech reception, such as when listeners identify test words from a 'sir'-to-'stir' continuum. Here, substantial reverberation introduces a 'tail' from the [s], which tends to fill the gap that cues the [t], and a degradation effect arises as listeners report correspondingly fewer 'stir' sounds. This effect is particularly clear when test words are preceded by a precursor phrase (e.g. 'next you'll get…') that contains much less reverberation than the test word. When the precursor's reverberation is increased to be the same as in the test word, the degradation diminishes as more 'stir' sounds are heard once again. This last effect has been attributed to a perceptual compensation mechanism that is informed by the precursor's reverberation level. However, a recent claim is that the degradation is caused by 'modulation masking' from precursors with a low level of reverberation. Such masking is likely to diminish when the precursor's reverberation level is raised, because reverberation acts as a low-pass modulation filter. Support for this hypothesis comes from results in conditions where degradation effects seem to be entirely absent, despite substantial reverberation. In these conditions, test words were played in isolation, with no precursor, and reverberation was kept at the same level in the test words of every trial. The experiments reported here have conditions that are similar, except that reverberation in test words is varied unpredictably from trial to trial, so that substantial-level trials are interspersed with trials that have a much lower level of reverberation. The result is that under these conditions, the degradation effect is entirely restored, allowing rejection of the modulation-masking hypothesis. An alternative is that some perceptual compensation comes from reverberation information within test words, and its effects accumulate over sequences of trials as long as the test word's reverberation level stays the same from trial to trial.
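The claim that "reverberation acts as a low-pass modulation filter" can be demonstrated numerically: convolving an amplitude-modulated envelope with an exponential-decay reverberant tail attenuates the modulation. The sketch below uses an idealized textbook tail model and invented parameter values (8-Hz modulation, 0.4-s reverberation time); it illustrates the principle only.

```python
import numpy as np

fs = 1000.0                              # envelope sample rate, Hz
t = np.arange(0, 1.0, 1 / fs)
env = 1.0 + np.sin(2 * np.pi * 8 * t)    # 8-Hz amplitude modulation

def reverb_tail(rt60, fs):
    """Unit-area smoothing kernel: exponential decay of an idealized
    reverberant tail, falling by 60 dB over the reverberation time rt60."""
    n = int(rt60 * fs)
    tail = np.exp(-6.9 * np.arange(n) / n)   # -60 dB at rt60
    return tail / tail.sum()

def mod_depth(x):
    """Modulation depth: (max - min) / (max + min)."""
    return (x.max() - x.min()) / (x.max() + x.min())

smeared = np.convolve(env, reverb_tail(0.4, fs), mode="same")
core = slice(200, 800)                   # avoid convolution edge effects
print(round(mod_depth(env[core]), 2))    # 1.0
print(mod_depth(smeared[core]) < mod_depth(env[core]))  # True
```

Raising the reverberation level (a longer or stronger tail) smooths the envelope further, which is why modulation masking from a precursor would be expected to diminish as its reverberation is increased.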
Two experiments measured thresholds for the detection of increments and decrements in the intensity of a quasi-continuous broadband noise (experiment 1) or increments in a 477-Hz pure-tone pedestal (experiment 2). A variety of onset and offset ramps for the intensity change were tested, from instantaneous onsets or offsets to ramps lasting several tens of milliseconds. For increments and decrements of equal duration, the characteristics of the ramps had little effect on performance. Abrupt rise times, which are associated with strong transient responses in auditory neurons, did not facilitate detection in comparison to much slower rise times. The temporal window model of temporal resolution provided a good account of the data when the decision statistic was the maximum magnitude of the change in the output of the window produced by the increment or decrement, but provided a poor account of the data when the decision statistic was the maximum rate of change in the output of the window over time. Overall, the results suggest that, in the absence of cues in the audio-frequency domain, rapid changes in envelope contribute little to near-threshold increment or decrement detection.
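The two decision statistics contrasted in this abstract can be sketched directly: smooth the stimulus intensity with a "temporal window", then read off either the maximum change in window output (the statistic that fit the data) or the maximum rate of change of that output (the statistic that fit poorly). The window shape, its 8-ms time constant, and the 3-dB, 100-ms increment below are illustrative assumptions, not the model's fitted parameters.

```python
import numpy as np

def window_output(intensity, fs, tau=0.008):
    """Smooth an intensity envelope with a simple symmetric exponential
    'temporal window' (time constant tau; shape and value are
    illustrative stand-ins for the model's fitted window)."""
    n = int(5 * tau * fs)
    k = np.arange(-n, n + 1)
    w = np.exp(-np.abs(k) / (tau * fs))
    w /= w.sum()                          # unit area, so a steady input passes unchanged
    return np.convolve(intensity, w, mode="same")

fs = 10000.0
t = np.arange(0, 0.5, 1 / fs)
pedestal = np.ones_like(t)                # steady pedestal, intensity 1
increment = pedestal.copy()
on, off = int(0.2 * fs), int(0.3 * fs)
increment[on:off] *= 2.0                  # 3-dB, 100-ms intensity increment

out_ped = window_output(pedestal, fs)
out_inc = window_output(increment, fs)

# Statistic that gave a good account: maximum magnitude of the change
# in window output produced by the increment.
stat_magnitude = np.max(np.abs(out_inc - out_ped))
# Statistic that gave a poor account: maximum rate of change of the output.
stat_rate = np.max(np.abs(np.diff(out_inc))) * fs
print(round(stat_magnitude, 2))           # 1.0 for this long increment
```

For an increment much longer than the window, the magnitude statistic saturates at the full intensity step regardless of ramp shape, whereas the rate statistic grows with ramp abruptness; the data's insensitivity to ramps is what favours the former.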
The experiment asks whether constancy in hearing precedes or follows grouping. Listeners heard speech-like sounds comprising 8 auditory-filter-shaped noise bands that had temporal envelopes corresponding to those arising in these filters when a speech message is played. The "context" words in the message were "next you'll get _ to click on", into which a "sir" or "stir" test word was inserted. These test words were from an 11-step continuum that was formed by amplitude modulation. Listeners identified the test words appropriately and quite consistently, even though they had the "robotic" quality typical of this type of 8-band speech. The speech-like effects of these sounds appear to be a consequence of auditory grouping. Constancy was assessed by comparing the influence of room reflections on the test word across conditions where the context had either the same level of reflections, or where it had a much lower level. Constancy effects were obtained with these 8-band sounds, but only in "matched" conditions, where the room reflections were in the same bands in both the context and the test word. This was not the case in a comparison "mismatched" condition, and here, no constancy effects were found. It would appear that this type of constancy in hearing precedes the across-channel grouping whose effects are so apparent in these sounds. This result is discussed in terms of the ubiquity of grouping across different levels of representation.