This study constitutes a large-scale comparative analysis of acoustic cues for classification of place of articulation in fricatives. To date, no single metric has been found to classify fricative place of articulation with a high degree of accuracy. This study presents spectral, amplitudinal, and temporal measurements that involve both static properties (spectral peak location, spectral moments, noise duration, normalized amplitude, and F2 onset frequency) and dynamic properties (relative amplitude and locus equations). While all cues (except locus equations) consistently serve to distinguish sibilant from nonsibilant fricatives, the present results indicate that spectral peak location, spectral moments, and both normalized and relative amplitude serve to distinguish all four places of fricative articulation. These findings suggest that these static and dynamic acoustic properties can provide robust and unique information about all four places of articulation, despite variation in speaker, vowel context, and voicing.
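Among the static cues listed above, the spectral moments are computed by treating the power spectrum of the frication noise as a probability distribution over frequency and taking its mean, variance, skewness, and kurtosis. The sketch below illustrates the computation under generic assumptions (an arbitrary sample rate and FFT size; the study's actual analysis windows and preprocessing are not reproduced here):

```python
import numpy as np

def spectral_moments(signal, sr, n_fft=1024):
    """Compute the four spectral moments of a signal's power spectrum,
    treating the spectrum as a probability distribution over frequency.
    Returns (mean, variance, skewness, excess kurtosis)."""
    spectrum = np.abs(np.fft.rfft(signal, n=n_fft)) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    p = spectrum / spectrum.sum()              # normalize to sum to 1
    mean = np.sum(freqs * p)                   # M1: spectral centroid (Hz)
    var = np.sum((freqs - mean) ** 2 * p)      # M2: spread around centroid
    sd = np.sqrt(var)
    skew = np.sum((freqs - mean) ** 3 * p) / sd ** 3       # M3: tilt
    kurt = np.sum((freqs - mean) ** 4 * p) / sd ** 4 - 3   # M4: peakedness
    return mean, var, skew, kurt
```

For white noise the spectrum is flat, so the centroid falls near half the Nyquist frequency and skewness is near zero; a sibilant like /s/, with energy concentrated at high frequencies, yields a higher centroid and negative skew.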
Most theories of categorization emphasize how continuous perceptual information is mapped to categories. However, equally important are a model's informational assumptions: the type of information subserving this mapping. This is crucial in speech perception, where the signal is variable and context-dependent. This study assessed the informational assumptions of several models of speech categorization, in particular, the number of cues that are the basis of categorization and whether these cues represent the input veridically or have undergone compensation. We collected a corpus of 2880 fricative productions (Jongman, Wayland & Wong, 2000) spanning many talker and vowel contexts and measured 24 cues for each. A subset was also presented to listeners in an 8AFC phoneme categorization task. We then trained a common classification model based on logistic regression to categorize the fricative from the cue values, and manipulated the information in the training set to contrast 1) models based on a small number of invariant cues; 2) models using all cues without compensation; and 3) models in which cues underwent compensation for contextual factors. Compensation was modeled by Computing Cues Relative to Expectations (C-CuRE), a new approach to compensation that preserves fine-grained detail in the signal. Only the compensation model achieved accuracy similar to that of listeners, and showed the same effects of context. Thus, even simple categorization metrics can overcome the variability in speech when sufficient information is available and compensation schemes like C-CuRE are employed.
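The core move in C-CuRE is to recode each cue as the difference between its observed value and the value expected given contextual factors such as talker or vowel. The toy sketch below illustrates the idea on invented data: a single hypothetical cue, a per-talker mean standing in for the expectation (the actual work derives expectations by regression and uses logistic regression over 24 cues), and a one-dimensional threshold classifier in place of the full model:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: one cue (e.g., a spectral centroid in Hz) for two fricative
# categories, produced by four talkers whose vocal tracts shift all
# values up or down. All numbers here are invented for illustration.
n = 200
category = rng.integers(0, 2, n)                 # 0 or 1 (two fricatives)
talker = rng.integers(0, 4, n)
cat_mean = np.array([6000.0, 4000.0])            # hypothetical cue targets
talker_offset = np.array([-1500.0, -500.0, 500.0, 1500.0])
cue = cat_mean[category] + talker_offset[talker] + rng.normal(0, 200, n)

# C-CuRE-style compensation: recode each observation as the difference
# between its value and the value expected for that talker (here, the
# talker's overall mean serves as the expectation).
expected = np.array([cue[talker == t].mean() for t in range(4)])
residual = cue - expected[talker]

# A deliberately simple classifier: threshold at the midpoint of the
# two category means, predicting category 1 for lower values.
def accuracy(x):
    threshold = (x[category == 0].mean() + x[category == 1].mean()) / 2
    pred = (x < threshold).astype(int)
    return (pred == category).mean()

print(f"raw cue accuracy:         {accuracy(cue):.2f}")
print(f"compensated cue accuracy: {accuracy(residual):.2f}")
```

Because the talker offsets here are large relative to the category separation, the raw cue misclassifies productions from extreme talkers, while the compensated (residual) cue recovers near-perfect separation, which is the abstract's point that compensation lets even simple metrics overcome contextual variability.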
Auditory training has been shown to be effective in the identification of non-native segmental distinctions. In this study, it was investigated whether such training is applicable to the acquisition of non-native suprasegmental contrasts, i.e., Mandarin tones. Using the high-variability paradigm, eight American learners of Mandarin were trained in eight sessions during the course of two weeks to identify the four tones in natural words produced by native Mandarin talkers. The trainees' identification accuracy revealed an average 21% increase from the pretest to the post-test, and the improvement gained in training was generalized to new stimuli (18% increase) and to new talkers and stimuli (25% increase). Moreover, the six-month retention test showed that the improvement was retained long after training by an average 21% increase from the pretest. The results are discussed in terms of non-native suprasegmental perceptual modification, and the analogies between L2 acquisition processes at the segmental and suprasegmental levels.
Training American listeners to perceive Mandarin tones has been shown to be effective, with trainees' identification improving by 21%. Improvement also generalized to new stimuli and new talkers, and was retained when tested six months after training [Y. Wang et al., J. Acoust. Soc. Am. 106, 3649-3658 (1999)]. The present study investigates whether the tone contrasts gained perceptually transferred to production. Before their perception pretest and after their post-test, the trainees were recorded producing a list of Mandarin words. Their productions were first judged by native Mandarin listeners in an identification task. Identification of trainees' post-test tone productions improved by 18% relative to their pretest productions, indicating significant tone production improvement after perceptual training. Acoustic analyses of the pre- and post-training productions further reveal the nature of the improvement, showing that post-training tone contours approximate native norms to a greater degree than pretraining tone contours. Furthermore, pitch height and pitch contour are not mastered in parallel, with the former being more resistant to improvement than the latter. These results are discussed in terms of the relationship between non-native tone perception and production as well as learning at the suprasegmental level.
Words which are expected to contain the same surface string of segments may, under identical prosodic circumstances, sometimes be realized with slight differences in duration. Some researchers have attributed such effects to differences in the words' underlying forms (incomplete neutralization), while others have suggested orthographic influence and extremely careful speech as the cause. In this paper, we demonstrate such sub-phonemic durational differences in Dutch, a language which some past research has found not to have such effects. Past literature has also shown that listeners can often make use of incomplete neutralization to distinguish apparent homophones. We extend perceptual investigations of this topic, and show that listeners can perceive even durational differences which are not consistently observed in production. We further show that a difference which is primarily orthographic rather than underlying can also create such durational differences. We conclude that a wide variety of factors, in addition to underlying form, can induce speakers to produce slight durational differences which listeners can also use in perception.