Previous research revealing universal biases in infant vowel perception forms the basis of the Natural Referent Vowel (NRV) framework (Polka & Bohn, 2011). To explore the feasibility of extending this framework to consonant manner perception, we investigated perception of the stop vs. fricative consonant contrast /b/-/v/ to test the hypothesis that young infants will display a perceptual bias grounded in the acoustic-phonetic properties of these sounds. We examined perception of stop-initial /bas/ and fricative-initial /vas/ syllables in English-learning and French-learning 5- to 6-month-olds. The /b/ and /v/ sounds distinguish words in English and French but have different distributional patterns; in spoken English /b/ occurs more frequently than /v/ whereas in spoken French /v/ occurs more frequently than /b/. A perceptual bias favoring /b/ over /v/ emerged in two experiments. In Experiment 1, a directional asymmetry was observed in discrimination; infants noticed when /vas/ changed to /bas/ but not when /bas/ changed to /vas/. In Experiment 2, a robust listening preference favoring stop-initial /bas/ was evident in responses from the same infants. This is the first study to show a perceptual bias related to consonant manner and to directly measure a consonant perception bias within the same infants. These data encourage further efforts to extend the NRV principles to perception of consonant manner. These findings indicate that we need to reform our view of infant speech perception to accommodate the fact that both discrimination abilities and biases shape speech perception during infancy.
Convolutional neural networks (CNNs) are a state-of-the-art technique for speech emotion recognition. However, CNNs have mostly been applied to noise-free emotional speech data, and limited evidence is available for their applicability in emotional speech denoising. In this study, a cascaded denoising CNN (DnCNN)–CNN architecture is proposed to classify emotions from Korean and German speech in noisy conditions. The proposed architecture consists of two stages. In the first stage, the DnCNN exploits the concept of residual learning to perform denoising; in the second stage, the CNN performs the classification. The classification results for real datasets show that the DnCNN–CNN outperforms the baseline CNN in overall accuracy for both languages. For Korean speech, the DnCNN–CNN achieves an accuracy of 95.8%, whereas the accuracy of the CNN is marginally lower (93.6%). For German speech, the DnCNN–CNN has an overall accuracy of 59.3–76.6%, whereas the CNN has an overall accuracy of 39.4–58.1%. These results demonstrate the feasibility of applying the DnCNN with residual learning to speech denoising and the effectiveness of the CNN-based approach in speech emotion recognition. Our findings provide new insights into speech emotion recognition in adverse conditions and have implications for language-universal speech emotion recognition.
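The two-stage cascade described in this abstract can be sketched roughly as follows. This is a minimal PyTorch illustration of the general idea (a residual-learning denoiser feeding a CNN classifier), not the authors' implementation; the layer counts, channel widths, spectrogram input shape, and number of emotion classes are assumptions made for the example.

```python
import torch
import torch.nn as nn

class DnCNN(nn.Module):
    """Denoising stage (sketch): residual learning, i.e. the network predicts
    the noise component and subtracts it from the noisy input."""
    def __init__(self, channels=1, depth=7, width=64):
        super().__init__()
        layers = [nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(width, width, 3, padding=1),
                       nn.BatchNorm2d(width), nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(width, channels, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, noisy):
        noise = self.body(noisy)   # estimate the noise (the residual)
        return noisy - noise       # denoised output

class EmotionCNN(nn.Module):
    """Classification stage (sketch): a small CNN over the denoised spectrogram."""
    def __init__(self, num_classes, channels=1):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

class DnCNN_CNN(nn.Module):
    """Cascade: denoise first, then classify the cleaned representation."""
    def __init__(self, num_classes):
        super().__init__()
        self.denoiser = DnCNN()
        self.classifier = EmotionCNN(num_classes)

    def forward(self, noisy_spectrogram):
        return self.classifier(self.denoiser(noisy_spectrogram))

# Usage example: a batch of 8 noisy one-channel spectrograms
# (64 frequency bins x 128 frames and 4 emotion classes are assumed values).
model = DnCNN_CNN(num_classes=4)
logits = model(torch.randn(8, 1, 64, 128))
```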
Although language experience has a profound impact on phonetic perception, there is increasing evidence that phonetic perception is also shaped by universal biases that can be revealed as asymmetries in discrimination performance. In the present study, we explore potential perceptual asymmetries in adult Korean perception of four English affricate-fricative contrasts. Korean adults completed a native-language assimilation task and a category-based AX discrimination task with the phonemic contrast /tʃa-sa/ and the non-phonemic contrasts /tʃa-ʃa/, /dʒa-za/, and /dʒa-ʒa/. Both voiceless contrasts—/tʃa-sa/ and /tʃa-ʃa/—were assimilated to distinct Korean affricate and fricative categories and were discriminated very well (>90%); performance revealed no perceptual asymmetries. Both voiced contrasts—/dʒa-za/ and /dʒa-ʒa/—were assimilated to the same Korean affricate category (/tʃa/) and were poorly discriminated (63-65%); performance was asymmetric on different pairs for both contrasts (fricative-affricate pairs > affricate-fricative pairs) and on same pairs for /dʒa-ʒa/ (fricative-fricative pairs > affricate-affricate pairs). These findings, and prior research, show that asymmetrical performance on different pairs is highly uniform and predicted by phone type, pointing to a possible universal bias favoring sharp amplitude onsets. However, asymmetries in same-pair performance are predicted by language-specific categorization and thus appear to be shaped by language experience.
Phoneme inventories show a bias favoring stop over fricative consonants. A similar bias is evident in acquisition. For example, an asymmetrical pattern was observed when infant word learning was assessed using the switch task with stop-initial and fricative-initial minimal-pair CVC nonsense syllables (Altvater-Mackensen & Fikkert, 2010). In this task, Dutch-learning fourteen-month-olds noticed a fricative-to-stop change but failed to detect a stop-to-fricative change. These findings were interpreted in terms of phonological representations emerging in early lexical development. In this study, we tested English and French infants aged 4-5 months to determine whether they show a perceptual bias favoring stop manner. We presented the CVC nonsense syllables /bas/ and /vas/ in a preference task using the look-to-listen procedure. The /b-v/ contrast is phonemic in English and French. Infants listened significantly longer to /bas/ than to /vas/ trials (p = .004). This perceptual preference cannot be explained in terms of phonological representations in young infants, who are not yet producing stops or fricatives and have almost no receptive vocabulary. We will discuss this phonetic bias in light of adult data showing similar perceptual asymmetries and consider the implications for the development of infant speech processing and early word learning.