Interspeech 2017
DOI: 10.21437/interspeech.2017-837

End-of-Utterance Prediction by Prosodic Features and Phrase-Dependency Structure in Spontaneous Japanese Speech

Abstract: This study is aimed at uncovering a way that participants in conversation predict end-of-utterance for spontaneous Japanese speech. In spontaneous everyday conversation, the participants must predict the ends of utterances of a speaker to perform smooth turn-taking without too much gap. We consider that they utilize not only syntactic factors but also prosodic factors for the end-of-utterance prediction because of the difficulty of prediction of a syntactic completion point in spontaneous Japanese. In previous…

Cited by 14 publications (8 citation statements)
References 7 publications
“…There are some works that investigated other features in speech such as N-gram model [13], dependency structures [14], and the previous turn-taking behaviors [15]. There are also other works that investigated non-verbal features such as respiratory features [16], head pose features [17], and eye-gaze features [18].…”
Section: Turn-taking Prediction (mentioning)
confidence: 99%
“…One line of work looked at the use of acoustic and lexical features for modeling turn-taking behavior [4,5,9,10]. Liu et al [5], Masumura et al [9], and Ishimoto et al [10] looked at the problem in Japanese conversations while Maier et al [4] looked at the problem in German conversations. Masumura et al proposed using stacked time-asynchronous sequential networks for detecting end-of-turns given sequences of asynchronous features (e.g., MFCCs and words) [9].…”
Section: Related Work (mentioning)
confidence: 99%
“…The majority of investigated feature sets are based on prosodic features such as fundamental frequency (F0) and power [9,10,11,12,13]. Linguistic features were also investigated such as syntactic structure, turn-ending markers, and language model [14,15]. Moreover, multi-modal features were also considered such as eye-gaze [16,17,18,19], respiration [20,21,22], and head-direction [16,23].…”
Section: Introduction (mentioning)
confidence: 99%