IntroductionSpeech communication is multi-sensory in nature. Seeing a speaker’s head and face movements may significantly influence the listeners’ speech processing, especially when the auditory information is not clear enough. However, research on the visual-auditory integration speech processing has left prosodic perception less well investigated than segmental perception. Furthermore, while native Japanese speakers tend to use less visual cues in segmental perception than in other western languages, to what extent the visual cues are used in Japanese focus perception by the native and non-native listeners remains unknown. To fill in these gaps, we test focus perception in Japanese among native Japanese speakers and Cantonese speakers who learn Japanese, using auditory-only and auditory-visual sentences as stimuli.MethodologyThirty native Tokyo Japanese speakers and thirty Cantonese-speaking Japanese learners who had passed the Japanese-Language Proficiency Test with level N2 or N3 were asked to judge the naturalness of 28 question-answer pairs made up of broad focus eliciting questions and three-word answers carrying broad focus, or contrastive or non-contrastive narrow focus on the middle object words. Question-answer pairs were presented in two sensory modalities, auditory-only and visual-auditory modalities in two separate experimental sessions.ResultsBoth the Japanese and Cantonese groups showed weak integration of visual cues in the judgement of naturalness. Visual-auditory modality only significantly influenced Japanese participants’ perception when the questions and answers were mismatched, but when the answers carried non-contrastive narrow focus, the visual cues impeded rather than facilitated their judgement. Also, the influences of specific visual cues like the displacement of eyebrows or head movements of both Japanese and Cantonese participants’ responses were only significant when the questions and answers were mismatched. While Japanese participants consistently relied on the left eyebrow for focus perception, the Cantonese participants referred to head movements more often.DiscussionThe lack of visual-auditory integration in Japanese speaking population found in segmental perception also exist in prosodic perception of focus. Not much foreign language effects has been found among the Cantonese-speaking learners either, suggesting a limited use of facial expressions in focus marking by native and non-native Japanese speakers. Overall, the present findings indicate that the integration of visual cues in perception of focus may be specific to languages rather than universal, adding to our understanding of multisensory speech perception.