In this study, we investigate the syntactic and prosodic features of a speaker's speech at points where turn-taking and backchannels occur, based on an analysis of Japanese spontaneous dialogs. Specifically, we focus on part of speech, duration, F0 contour pattern, relative height of the peak F0, energy trajectory pattern, and relative height of the peak energy in the final portion of speech segments. First, we examine the relationship between turn-taking/backchannels and each feature independently. We show that all the features examined are related to turn-taking or backchannels and that the ways in which they correlate are largely consistent with previous studies. Next, we explore the interrelationships among the features with respect to turn-taking and backchannels. We show that for both turn-taking and backchannels, (1) certain instances of syntactic features make extremely strong contributions, and (2) in general, syntax contributes more strongly than any individual prosodic feature, although prosody as a whole contributes as strongly as syntax, or even more strongly. Finally, we discuss some implications of our results by comparing them with previous models that address the roles of syntax and prosody in turn-taking and backchannels.
The Balanced Corpus of Contemporary Written Japanese (BCCWJ) is Japan's first balanced corpus of 100 million words. It consists of three subcorpora (the publication subcorpus, the library subcorpus, and the special-purpose subcorpus) and covers a wide range of text registers, including general books, magazines, newspapers, governmental white papers, best-selling books, an internet bulletin board, a blog, school textbooks, minutes of the National Diet, publicity newsletters of local governments, laws, and poetry verses. Random sampling is used whenever possible in order to maximize the representativeness of the corpus. The corpus is annotated with dual POS analysis, document structure, and bibliographical information. The BCCWJ is currently accessible in three different ways, including Chunagon, a web-based interface to the dual POS analysis data. Finally, we report the results of a pilot evaluation of the corpus's textual diversity. The analyses cover POS distribution, word-class distribution, entropy of orthography, sentence length, and variation in adjective predicates. High textual diversity is observed in all of these analyses.