Speech-based natural language question-answering interfaces to enterprise systems are gaining considerable attention. General-purpose speech engines can be integrated with NLP systems to provide such interfaces. Usually, general-purpose speech engines are trained on a large, general corpus. However, when such engines are used for specific domains, they may not recognize domain-specific words well and may produce erroneous output. Further, the speaker's accent and the environmental conditions in which a sentence is spoken may cause the speech engine to recognize certain words inaccurately. The subsequent natural language question-answering then fails to produce the required results, because the question no longer accurately represents what the speaker intended. Thus, the speech engine's output may need to be adapted for a domain before further natural language processing is carried out. We present two mechanisms for such an adaptation, one based on evolutionary development and the other based on machine learning, and show how we can repair the speech engine's output so that the subsequent natural language question-answering performs better.
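As a minimal illustration of the kind of repair this adaptation performs (not the specific mechanisms presented in this paper), the sketch below replaces an ASR token with a domain term when the two are sufficiently similar as strings. The domain lexicon, the similarity measure, and the threshold are all illustrative assumptions.

```python
# Illustrative sketch of domain adaptation of ASR output: tokens that closely
# resemble a known domain term are replaced before the text reaches the
# question-answering pipeline. Lexicon, similarity measure and threshold are
# hypothetical, not the paper's actual method.
from difflib import SequenceMatcher

DOMAIN_LEXICON = {"incident", "severity", "outage", "ticket"}  # hypothetical domain terms


def repair_transcript(transcript: str, lexicon=DOMAIN_LEXICON, threshold: float = 0.8) -> str:
    """Replace ASR tokens that are close string matches to domain terms."""
    repaired = []
    for token in transcript.split():
        best_term, best_score = token, 0.0
        for term in lexicon:
            score = SequenceMatcher(None, token.lower(), term.lower()).ratio()
            if score > best_score:
                best_term, best_score = term, score
        # Keep the original token unless a domain term is a sufficiently close match.
        repaired.append(best_term if best_score >= threshold else token)
    return " ".join(repaired)


if __name__ == "__main__":
    # "incidence" and "severeity" stand in for misrecognized domain words.
    print(repair_transcript("log an incidence with high severeity"))
```

A real system would use richer evidence than surface string similarity, for example phonetic distance or the speech engine's alternative hypotheses; this sketch only conveys the overall repair step.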
Introduction
Speech-enabled natural-language question-answering interfaces to enterprise application systems, such as incident-logging systems, customer-support systems, marketing-opportunities systems, sales-data systems, etc., are designed to allow end-users to speak out the problems or questions they encounter and get automatic responses. The conversion of human speech into text is performed by an Automatic Speech Recognition (ASR) engine. While functional examples of ASR with enterprise systems can be seen in day-to-day use, most of