End-of-Utterance Prediction by Prosodic Features and Phrase-Dependency Structure in Spontaneous Japanese Speech

Ishimoto, Yuichi; Teraoka, Takehiro; Enomoto, Mika

doi:10.21437/interspeech.2017-837

Cited by 14 publications

(8 citation statements)

References 7 publications

“…There are some works that investigated other features in speech such as N-gram model [13], dependency structures [14], and the previous turn-taking behaviors [15]. There are also other works that investigated non-verbal features such as respiratory features [16], head pose features [17], and eye-gaze features [18].…”

Section: Turn-taking Prediction (mentioning)

confidence: 99%

Prediction of Turn-taking Using Multitask Learning with Prediction of Backchannels and Fillers

et al. 2018

We address prediction of turn-taking considering related behaviors such as backchannels and fillers. Backchannels are used by the listeners to acknowledge that the current speaker can hold the turn. On the other hand, fillers are used by the prospective speakers to indicate a will to take a turn. We propose a turn-taking model based on multitask learning in conjunction with prediction of backchannels and fillers. The multitask learning of LSTM neural networks shared by these tasks allows for efficient and generalized learning, and thus improves prediction accuracy. Evaluations with two kinds of dialogue corpora of human-robot interaction demonstrate that the proposed multitask learning scheme outperforms the conventional single-task learning.

“…One line of work looked at the use of acoustic and lexical features for modeling turn-taking behavior [4,5,9,10]. Liu et al [5], Masumura et al [9], and Ishimoto et al [10] looked at the problem in Japanese conversations while Maier et al [4] looked at the problem in German conversations. Masumura et al proposed using stacked time-asynchronous sequential networks for detecting end-of-turns given sequences of asynchronous features (e.g., MFCCs and words) [9].…”

Section: Related Work (mentioning)

confidence: 99%

“…Masumura et al proposed using stacked time-asynchronous sequential networks for detecting end-of-turns given sequences of asynchronous features (e.g., MFCCs and words) [9]. Ishimoto et al investigated the dependency between syntactic and prosodic features and showed that combining the two features is useful for predicting end-of-turns [10]. Liu et al built a Recurrent Neural Network (RNN) to classify a given utterance into four classes that relate to turn-taking behavior using joint acoustic and lexical embeddings [5].…”

Section: Related Work (mentioning)

confidence: 99%

Improving End-of-Turn Detection in Spoken Dialogues by Detecting Speaker Intentions as a Secondary Task

Aldeneh, Dimitriadis, Provost 2018

2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

This work focuses on the use of acoustic cues for modeling turn-taking in dyadic spoken dialogues. Previous work has shown that speaker intentions (e.g., asking a question, uttering a backchannel, etc.) can influence turn-taking behavior and are good predictors of turn-transitions in spoken dialogues. However, speaker intentions are not readily available for use by automated systems at run-time, making it difficult to use this information to anticipate a turn-transition. To this end, we propose a multi-task neural approach for predicting turn-transitions and speaker intentions simultaneously. Our results show that adding the auxiliary task of speaker intention prediction improves the performance of turn-transition prediction in spoken dialogues, without relying on additional input features during run-time.

“…The majority of investigated feature sets are based on prosodic features such as fundamental frequency (F0) and power [9,10,11,12,13]. Linguistic features were also investigated such as syntactic structure, turn-ending markers, and language model [14,15]. Moreover, multi-modal features were also considered such as eye-gaze [16,17,18,19], respiration [20,21,22], and head-direction [16,23].…”

Section: Introduction (mentioning)

confidence: 99%

Turn-Taking Prediction Based on Detection of Transition Relevance Place

Hara, Inoue, Takanashi et al. 2019

Interspeech 2019

We address turn-taking prediction in which spoken dialogue systems predict when to take the conversational floor. In natural conversations, many turn-taking decisions are arbitrary and subjective. In this study, we propose taking into account the concept of the transition relevance place (TRP) for turn-taking prediction. TRP is defined as a timing when the current speaking turn can be completed and other participants are able to take the turn. We conducted annotation of TRP on a human-robot dialogue corpus, ensuring the objectivity of this annotation among annotators. The proposed turn-taking prediction model adopts a two-step approach that detects TRP at first and then predicts a turn-taking event if TRP is detected. Experimental evaluations demonstrate that the proposed model improves the accuracy of turn-taking prediction by incorporating TRP detection.
