Native American English and non-native (Dutch) listeners identified either the consonant or the vowel in all possible American English CV and VC syllables. The syllables were embedded in multispeaker babble at three signal-to-noise ratios (0, 8, and 16 dB). The phoneme identification performance of the non-native listeners was less accurate than that of the native listeners. All listeners were adversely affected by noise. With these isolated syllables, initial segments were harder to identify than final segments. Crucially, the effects of language background and noise did not interact; the performance asymmetry between the native and non-native groups was not significantly different across signal-to-noise ratios. It is concluded that the frequently reported disproportionate difficulty of non-native listening under disadvantageous conditions is not due to a disproportionate increase in phoneme misidentifications.
An evaluation of vowel normalization procedures for the purpose of studying language variation is presented. The procedures were compared on how effectively they (a) preserve phonemic information, (b) preserve information about the talker's regional background (or sociolinguistic information), and (c) minimize anatomical/physiological variation in acoustic representations of vowels. Recordings were made for 80 female talkers and 80 male talkers of Dutch. These talkers were stratified according to their gender and regional background. The normalization procedures were applied to measurements of the fundamental frequency and the first three formant frequencies for a large set of vowel tokens. The normalization procedures were evaluated through statistical pattern analysis. The results show that normalization procedures that use information across multiple vowels ("vowel-extrinsic" information) to normalize a single vowel token performed better than those that include only information contained in the vowel token itself ("vowel-intrinsic" information). Furthermore, the results show that normalization procedures that operate on individual formants performed better than those that use information across multiple formants (e.g., "formant-extrinsic" F2-F1).
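One widely used vowel-extrinsic procedure of the kind described is Lobanov's z-score normalization, which rescales each talker's formants using statistics computed across all of that talker's vowel tokens. A minimal sketch (the function name and data layout are our own, not from the study):

```python
import numpy as np

def lobanov_normalize(formants_by_talker):
    """Vowel-extrinsic normalization: z-score each talker's formants.

    formants_by_talker maps a talker ID to an array of shape
    (n_tokens, n_formants). Because the mean and standard deviation
    are computed over all of a talker's vowel tokens, the procedure
    uses vowel-extrinsic information.
    """
    normalized = {}
    for talker, F in formants_by_talker.items():
        F = np.asarray(F, dtype=float)
        # subtract the talker's mean and divide by the talker's spread,
        # per formant, removing talker-specific scale and offset
        normalized[talker] = (F - F.mean(axis=0)) / F.std(axis=0)
    return normalized
```

Because z-scoring is invariant under affine changes of scale, two talkers whose vowel systems differ only by a uniform shift and stretch of the formant space (a crude stand-in for anatomical differences) receive identical normalized values, while the relative positions of the vowel categories are preserved.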
A database is presented of measurements of the fundamental frequency, the frequencies of the first three formants, and the duration of the 15 vowels of Standard Dutch as spoken in the Netherlands (Northern Standard Dutch) and in Belgium (Southern Standard Dutch). The speech material consisted of read monosyllabic utterances in a neutral consonantal context (i.e., /sVs/). Recordings were made for 20 female talkers and 20 male talkers, who were stratified for the factors age, gender, and region. Of the 40 talkers, 20 spoke Northern Standard Dutch and 20 spoke Southern Standard Dutch. The results indicated that the nine monophthongal Dutch vowels /a ɑ ɛ i ɪ ɔ u y ʏ/ can be separated fairly well given their steady-state characteristics, while the long mid vowels /e o ø/ and three diphthongal vowels /ɛi ɔu œy/ also require information about their dynamic characteristics. The analysis of the formant values indicated that Northern Standard Dutch and Southern Standard Dutch differ little in the formant frequencies at steady-state for the nine monophthongal vowels. Larger differences between these two language varieties were found for the dynamic specifications of the three long mid vowels, and, to a lesser extent, of the three diphthongal vowels.
A new method for determining the instants of significant excitation in speech signals is proposed. Here, significant excitation refers primarily to the instant of glottal closure within a pitch period in voiced speech. The method is based on the global phase characteristics of minimum phase signals. The average slope of the unwrapped phase of the short-time Fourier transform of the linear prediction residual is calculated as a function of time. Instants where the phase slope function makes a positive zero-crossing are identified as significant excitations. The method is discussed in a source-filter context of speech production. The method is not sensitive to the characteristics of the filter. The influence of the type, length, and position of the analysis window is discussed. The method works well for all types of voiced speech, in male as well as female voices, but, in all cases, under noise-free conditions only.
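The core of the procedure described above can be sketched in a few lines: slide a window along the linear prediction residual, compute the average slope of the unwrapped short-time phase referenced to the window center, and mark positive-going zero-crossings of that slope function as excitation instants. This is an illustrative simplification (window type, window length, and sample-by-sample hop are our own choices, and an idealized impulse train stands in for the residual), not the authors' exact implementation:

```python
import numpy as np

def phase_slope_epochs(residual, win_len=64):
    """Sketch of the phase-slope method for locating excitation instants.

    Assumes the linear prediction residual has already been computed.
    Returns estimated epoch positions in samples.
    """
    c = win_len // 2
    window = np.hanning(win_len)
    bins = np.arange(win_len // 2)
    slopes = []
    for s in range(len(residual) - win_len):
        frame = residual[s:s + win_len] * window
        # reference the phase to the window center, so an excitation
        # exactly at the center yields zero average phase slope
        spectrum = np.fft.fft(np.roll(frame, -c))[:win_len // 2]
        phase = np.unwrap(np.angle(spectrum))
        # least-squares fit gives the average slope of the unwrapped phase
        slopes.append(np.polyfit(bins, phase, 1)[0])
    slopes = np.array(slopes)
    # positive-going zero-crossings of the phase slope function
    idx = np.where((slopes[:-1] < 0) & (slopes[1:] >= 0))[0] + 1
    return idx + c  # convert frame-start indices to sample positions
```

On a clean impulse train the slope is negative while an impulse lies ahead of the window center and positive once it has passed, so each impulse produces exactly one positive zero-crossing, consistent with the abstract's caveat that the method is demonstrated under noise-free conditions.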
This article is concerned with the question of how listeners recognize coarticulated phonemes. The problem is approached from a pattern classification perspective. First, the potential acoustical effects of coarticulation are defined in terms of the patterns that form the input to a classifier. Next, a categorization model called HICAT is introduced that incorporates hierarchical dependencies to optimally deal with this input. The model allows the position, orientation, and steepness of one phoneme boundary to depend on the perceived value of a neighboring phoneme. It is argued that, if listeners do behave like statistical pattern recognizers, they may use the categorization strategies incorporated in the model. The HICAT model is compared with existing categorization models, among which are the fuzzy-logical model of perception and Nearey's diphone-biased secondary-cue model. Finally, a method is presented by which categorization strategies that are likely to be used by listeners can be predicted from distributions of acoustical cues as they occur in natural speech.
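The central idea that one phoneme boundary can shift with the perceived value of a neighboring phoneme can be illustrated with a toy two-stage classifier. This is a sketch of the general hierarchical strategy only, not the HICAT model itself; all category labels, parameter names, and values below are invented for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hierarchical_classify(cue_v, cue_c, params):
    """Toy hierarchical categorization of a CV syllable.

    The vowel is categorized first; the location of the consonant
    boundary then depends on which vowel category was perceived,
    mimicking a context-dependent boundary.
    """
    # stage 1: categorize the vowel from its cue
    vowel = "i" if cue_v > params["vowel_boundary"] else "a"
    # stage 2: the consonant boundary position depends on the vowel percept
    boundary = params["cons_boundary_given_" + vowel]
    p_voiced = sigmoid(params["slope"] * (cue_c - boundary))
    consonant = "d" if p_voiced > 0.5 else "t"
    return consonant + vowel
```

With a boundary of 0.3 in /i/ context but 0.7 in /a/ context, the same consonant cue value of 0.5 is categorized as /d/ before /i/ but as /t/ before /a/, which is the kind of context-dependent boundary shift the model formalizes.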