A Spoken Dialogue System for Enabling Information Behavior of Various Intention Levels

Takatsu, Hirokatsu; Fukuoka, Ishin; Fujie, Shinya; Hayashi, Yoshikatsu; Kobayashi, Tetsuo

doi:10.1527/tjsai.dsh-c

Cited by 5 publications

(4 citation statements)

References 19 publications

“…In addition, we defined Wait Request to make the system wait to speak so that simultaneous utterances by the user and system would not occur. We employed seven annotators to annotate these intention labels to users' utterance data collected by our spoken dialogue system [3]. Among all the collected user utterances, we extracted short user utterances of less than 1.5 seconds using a voice activity detection (VAD) program.…”

Section: Dataset (mentioning)

confidence: 99%

“…By applying such a series of processes to a human-system conversation, Fujie and Kobayashi developed a smooth and convenient conversation system [1,2]. By providing or eliminating information depending on the user's needs [3], we develop a spoken dialogue system that efficiently delivers a massive amount of information. In the proposed system, any given written documents, such as news articles, can be translated into an utterance plan consisting of a primary plan for delivering main content and the associated subsidiary plans for supplementing the main content.…”

Section: Introduction (mentioning)

confidence: 99%

Recognition of Intentions of Users’ Short Responses for Conversational News Delivery System

et al. 2019

Self Cite

In human-human conversations, listeners often convey intentions to speakers through feedback consisting of reflexive short responses. Speakers then recognize these intentions and dynamically change their conversational plans to transmit information more efficiently. For spoken dialogue systems that deliver a massive amount of information, such as news, it is essential to accurately capture users' intentions from reflexive short responses so that the information to be transmitted can be selected or eliminated according to the user's needs. However, such responses are normally too short for their actual intentions to be recognized from their prosodic and linguistic features alone. In this paper, we propose a short-response intention-recognition model that accounts for the system's previous utterances as the conversational context, in addition to the prosodic and linguistic features of the user's utterances. To achieve this, we define types of short-response intentions in terms of effective information transmission and create a new dataset by annotating interaction data collected using our spoken dialogue system. Our experimental results demonstrate that classification accuracy can be improved by using the linguistic features of the system's previous utterances, encoded by Bidirectional Encoder Representations from Transformers (BERT), as the conversational context.

“…As a realistic application of the proposed personalized summarization method for a spoken dialogue system, we consider a news delivery task (Takatsu et al, 2018). This news dialogue system proceeds the dialogue according to a primary plan to explain the summary of the news article and subsidiary plans to transmit supplementary information through question answering.…”

Section: Introduction (mentioning)

confidence: 99%

Personalized Extractive Summarization Using an Ising Machine Towards Real-time Generation of Efficient and Coherent Dialogue Scenarios

Takatsu, Kashikawa, Kimura et al. 2021

Proceedings of the 3rd Workshop on Natural Language Processing for Conversational AI

Self Cite

We propose a personalized dialogue scenario generation system that transmits information efficiently and coherently using a real-time extractive summarization method optimized by an Ising machine. The summarization problem is formulated as a quadratic unconstrained binary optimization (QUBO) problem that extracts the sentences maximizing the sum of the user's degree of interest in the sentences of the documents, with the discourse structure of each document and the total utterance time as constraints. To evaluate the proposed method, we constructed a news article corpus with annotations of the discourse structure, users' profiles, and interests in sentences and topics. The experimental results on this dataset confirmed that a Digital Annealer, which is a simulated annealing-based Ising machine, can solve our QUBO model in a practical time without violating the constraints.

“…When news is transmitted by a synthesized voice, it is beneficial for listeners that news with positive content is transmitted with voices that are synthesized with positive emotion, whereas news with negative content is transmitted with voices that are synthesized with negative emotion (Pitrelli et al, 2006). In our spoken dialogue system that delivers news (Takatsu et al, 2018), it is important to speak clearly with emotion according to the content of the news to improve the users' understanding. Table 1 shows an example of a news conversation.…”

Section: Introduction (mentioning)

confidence: 99%

Sentiment Analysis for Emotional Speech Synthesis in a News Dialogue System

Takatsu, Ando, Matsuyama et al. 2020

Proceedings of the 28th International Conference on Computational Linguistics

Self Cite

As smart speakers and conversational robots become ubiquitous, the demand for expressive speech synthesis has increased. In this paper, to control the emotional parameters of the speech synthesis according to certain dialogue contents, we construct a news dataset with emotion labels ("positive," "negative," or "neutral") annotated for each sentence. We then propose a method to identify emotion labels using a model combining BERT and BiLSTM-CRF, and evaluate its effectiveness using the constructed dataset. The results showed that the classification model performance can be efficiently improved by preferentially annotating news articles with low confidence in the human-in-the-loop machine learning framework.
