This study examines how visual speech information affects native listeners' judgments of the intelligibility of speech sounds produced by non-native (L2) speakers. Native Canadian English listeners served as judges of three English phonemic contrasts (/b-v, θ-s, l-ɹ/) produced by native Japanese speakers and, as controls, by native Canadian English speakers. The stimuli were presented under audio-visual (AV, with the speaker's voice and face), audio-only (AO), and visual-only (VO) conditions. Across conditions, the overall intelligibility of the Japanese speakers' productions of Japanese-like phonemes (/b, s, l/) was significantly higher than that of non-Japanese phonemes (/v, θ, ɹ/). With respect to visual effects, the more visually salient non-Japanese phonemes /v, θ/ were perceived as significantly more intelligible in the AV than in the AO condition, indicating that intelligibility is enhanced when visual speech information is available. However, the non-Japanese phoneme /ɹ/ was perceived as less intelligible in the AV than in the AO condition. Further analysis revealed that, unlike the native English speakers, the Japanese speakers produced /ɹ/ without visible lip rounding, suggesting that non-native speakers' incorrect articulatory configurations may reduce intelligibility. These results suggest that visual speech information may either enhance or reduce L2 speech intelligibility.
Compared with stress-timed English, mora-timed Japanese is characterized by a simpler syllable structure and an absence of vowel reduction. These differences may explain some of the problems Japanese talkers have in producing English speech rhythm, i.e., an L1 influence on L2 rhythm production. The present study tested whether this L1 influence could be moderated by increased L2 experience. We examined English sentences spoken by Japanese talkers ('experienced' and 'inexperienced' English learners) and by native Australian English talkers. The mean duration and variability of consonant and vowel intervals were calculated using rhythm metrics. Mean phoneme interval durations were longer in L2 speech than in L1 speech, particularly for the inexperienced L2 talkers. Furthermore, the inexperienced L2 talkers exhibited the least vowel durational variability and the English talkers the most, with the experienced L2 talkers intermediate. Differences among the talker groups were well described by the coefficients of variation of vowel and consonant durations; specifically, durational variability increased as phoneme duration decreased. Overall, the results demonstrated that the L1 influence on L2 speech rhythm production decreases as L2 experience increases.
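The duration and variability measures mentioned above can be sketched as follows. This is a minimal illustration, not the study's code: it assumes the sentences have already been segmented into vowel and consonant intervals measured in milliseconds, and it assumes the commonly used rhythm-metric conventions in which the raw variability is the standard deviation of interval durations (often written ΔV or ΔC) and the coefficient of variation is reported as a percentage (Varco = 100 × SD / mean). The function name `rhythm_metrics` and the example durations are hypothetical.

```python
from statistics import mean, stdev


def rhythm_metrics(durations_ms):
    """Summarize a list of vowel (or consonant) interval durations in ms.

    Returns the mean duration, the raw standard deviation (Delta), and the
    coefficient of variation as a percentage (Varco = 100 * SD / mean).
    """
    if len(durations_ms) < 2:
        raise ValueError("need at least two intervals to estimate variability")
    m = mean(durations_ms)
    sd = stdev(durations_ms)  # sample standard deviation
    varco = 100.0 * sd / m    # rate-normalized variability
    return {"mean": m, "delta": sd, "varco": varco}


# Hypothetical vowel interval durations (ms) for one sentence; not data from the study.
vowels = [60.0, 120.0, 90.0, 150.0, 80.0]

# The same sentence spoken uniformly 1.5 times slower.
slower = [d * 1.5 for d in vowels]

print(rhythm_metrics(vowels))
print(rhythm_metrics(slower))
```

Because Varco divides the standard deviation by the mean, uniformly slowing a sentence leaves it unchanged while the raw standard deviation grows in proportion. A rate-normalized measure of this kind is therefore useful when, as here, L2 talkers produce longer intervals overall.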
This study investigated the effects of L2 learning experience in relation to L1 background on hemispheric processing of Japanese pitch accent. Native Mandarin Chinese (tonal L1) and English (non-tonal L1) learners of Japanese were tested using dichotic listening. These groups were compared with the listener groups recruited by Wu, Tu & Wang (2012), which included native Mandarin and English listeners without Japanese experience and native Japanese listeners. Results revealed an overall right-hemisphere preference across groups, suggesting acoustically oriented processing. Individual pitch accent patterns also showed pattern-specific laterality differences, further reflecting acoustic-level processing. However, differences among listener groups indicated L1 effects, with the Chinese but not the English listeners approximating the Japanese patterns. Furthermore, English learners, but not naïve listeners, exhibited a shift toward the native Japanese pattern, revealing effects of L2 learning. These findings imply integrated effects of acoustic and linguistic aspects on Japanese pitch accent processing as a function of L1 and L2 experience.
This study investigated the durational rhythm characteristics of L2 English produced by L1 Japanese talkers, focusing on the interaction between L1 and L2 speech rhythm and on the role of L2 experience. Specifically, for English sentences (N = 40) spoken by native Japanese (N = 10) and native Australian English (N = 10) talkers, we examined (1) mean consonant and vowel durations and durational variability and (2) vowel and consonant timing patterns within consonant clusters. Half of the Japanese talkers were more experienced in L2 English. L2 productions had longer consonant and vowel durations and less variability than native English productions. The degree of L2 experience also played a role: inexperienced L2 talkers produced less variable vowel durations than experienced ones. Analyses of speech timing patterns in consonant clusters showed that inexperienced Japanese talkers produced significantly shorter second consonants in clusters than native English talkers, possibly to compensate for the difficulty of producing non-native consonant clusters. Additional analyses of consonant cluster production by experienced L2 talkers will also be reported.
This study examined whether visual speech provides speech-rhythm information that perceivers can use in speech perception. This was tested using speech that varied naturally in the familiarity of its rhythm. Thirty Australian English L1 listeners performed a speech-perception-in-noise task with English sentences produced by three speakers: an English L1 speaker (familiar rhythm); an experienced English L2 speaker with a weak foreign accent (familiar rhythm); and an inexperienced English L2 speaker with a strong foreign accent (unfamiliar rhythm). The spoken sentences were presented in three conditions: Audio-Only (AO), Audio-Visual with the mouth covered (AVm), and Audio-Visual (AV). Speech was recognized best in the AV condition regardless of the degree of foreign accent. However, recognition in the AVm condition was better than in the AO condition for speech with no foreign accent or a weak accent, but not for speech with a strong accent. A follow-up experiment used only the speech with a strong foreign accent, under more audible conditions. Again, there was no difference between the AVm and AO conditions, indicating that the null effect was not due to a floor effect. We propose that speech rhythm is conveyed by the opening and closing motion of the jaw, and that perceivers use this information to better perceive speech in noise.