This study examines the ability of listeners to judge speaker height and weight from speech samples. Although previous investigations indicate that listeners are consistent in estimating body characteristics, it is not known which speech signal parameters are being used by the listeners for such estimates. Therefore, a series of listening tests was carried out in which male and female listeners judged the height and weight of male and female speakers reading isolated words and two text paragraphs. Both speaker sex and listener sex turned out to be important factors: Significant correlations between estimated height/weight and actual height/weight were found only for male speakers. The majority of these estimates came from the male listeners. Neither male nor female listeners, however, were able to estimate female speaker height or weight. Regression analysis involving F0, formant frequencies, energy below 1 kHz, and speech rate showed no significant correlations between these parameters and actually measured speaker height and weight, the only exception being a significant correlation between male speaker weight and speech rate. Furthermore, regression data suggested that the listeners (correctly) used speech rate information in judging male speaker weight, whereas low F0 and formant frequency values (wrongly) were taken to indicate large speaker body dimensions.
This paper investigates the realization of word-final /t/ in conversational standard Dutch. First, based on a large number of word tokens (6747) annotated with broad phonetic transcription by an automatic transcription tool, we show that morphological properties of the words and their position in the utterance's syntactic structure play a role in the presence versus absence of their final /t/. We also replicate earlier findings on the role of predictability (word frequency and bigram frequency with the following word) and provide a detailed analysis of the role of segmental context. Second, we analyze the detailed acoustic properties of word-final /t/ on the basis of a smaller number of tokens (486) which were annotated manually. Our data show that word and bigram frequency as well as segmental context also predict the presence of sub-phonemic properties. The investigations presented in this paper extend research on the realization of /t/ in spontaneous speech and have potential consequences for psycholinguistic models of speech production and perception as well as for automatic speech recognition systems.
Four speaker identification tests were conducted using five female speakers known to the listeners. Starting from acoustic recordings of reiterant "ma" syllables, the perceptual importance of the following three factors was investigated: F0 height, F0 contour, and speech rhythm. For speakers with typically low or high voices F0 height turned out to be a highly relevant cue in speaker identification. For all speakers F0 contour was of secondary importance, whereas speech rhythm had a small but consistent influence on recognition rates. It could be inferred that remaining factors alone (mainly global spectral information) would yield recognition scores of approximately 50%. Consistent with previous investigations, the relevance of perceptual cues in the recognition of familiar voices was shown to be not hierarchically fixed, but to depend on speaker-specific voice characteristics.
Using a Fourcin laryngograph, Lx recordings of three male speakers were made. After manipulation, the Lx signals were presented to a group of eight listeners, who performed both an AX discrimination and a speaker identification test. The results show that the listeners made use of the three parameters varied in the listening tests, viz. speech rhythm, F0 contour and F0 height. Furthermore, the data suggest that the relevance of these different parameters for speaker recognition is speaker-dependent rather than absolute.
Processing speech in a non-native language requires listeners to cope with influences from their first language and to overcome the effects of limited exposure and experience. These factors may be particularly important when listening in adverse conditions. However, native listeners also suffer in noise, and the intelligibility of speech in noise clearly depends on factors which are independent of a listener's first language. The current study explored the issue of language-independence by comparing the responses of eight listener groups differing in native language when confronted with the task of identifying English intervocalic consonants in three masker backgrounds, viz. stationary speech-shaped noise, temporally-modulated speech-shaped noise and competing English speech. The study analysed the effects of (i) noise type, (ii) speaker, (iii) vowel context, (iv) consonant, (v) phonetic feature classes, (vi) stress position, (vii) gender and (viii) stimulus onset relative to noise onset. A significant degree of similarity in the response to many of these factors was evident across all eight language groups, suggesting that acoustic and auditory considerations play a large rôle in determining intelligibility. Language-specific influences were observed in the rankings of individual consonants and in the masking effect of competing speech relative to speech-modulated noise.