In this paper we present technology used in spoken dialog systems for a wide range of applications, from travel-domain tasks and automatic switchboards to large-scale directory assistance. The overall goal in developing spoken dialog systems is to allow for a natural and flexible dialog flow similar to human-human interaction. This poses the challenging task of recognizing and interpreting user input when the user may choose from an unrestricted vocabulary and an infinite set of possible formulations. We therefore put emphasis on strategies that make the system more robust while still maintaining a high level of naturalness and flexibility. In view of this paradigm, we found that two fundamental principles characterize many of the proposed methods: 1) to consider available sources of information as early as possible, and 2) to keep alternative hypotheses and delay the decision for a single option as long as possible. We describe how our system architecture caters to incorporating application-specific knowledge, for example database constraints, in the determination of the best sentence hypothesis for a user turn. On the next higher level, we use the dialog history to assess the plausibility of a sentence hypothesis by applying consistency checks with information items from previous user turns. In particular, we demonstrate how combined decisions over several turns can be exploited to boost the recognition performance of the system. The dialog manager can also use information on the dialog flow to dynamically modify and tune the system for specific dialog situations. An important means to increase the "intelligence" of a spoken dialog system is to use confidence measures. We propose methods to obtain confidence measures for semantic items, whole sentences, and even full N-best lists, and give examples of the benefits obtained from their application. Finally, we summarize experiences from field tests with our systems that have been found crucial for system acceptance.
Index Terms: Application-specific knowledge, combined decisions, confidence measures, dialog history, natural language understanding, spoken dialog systems.
The Philips automatic telephone switchboard and directory information system PADIS provides a natural-language user interface to a telephone directory database. Using speech recognition and language understanding technologies, the system offers phone numbers, fax numbers, email addresses, and room numbers as well as direct call completion to a desired party. In this paper, we present the underlying probabilistic framework, the system architecture, and the individual modules for speech recognition, language understanding, dialogue control, and speech output. In addition, we report results on performance and user behaviour obtained from a field test in our research lab with a 600-entry database. We derive a new maximum-a-posteriori decision rule which incorporates database knowledge and dialogue history as constraints in speech recognition and language understanding. It has improved speech understanding accuracy by 19% (in terms of concept error rate), and reduced attribute substitution errors (e.g. recognition of a wrong name) by 38%. The decision rule is implemented in a multi-stage approach as a combination of state-of-the-art speech recognition, partial parsing with an attributed stochastic context-free grammar, and an N-best algorithm which is also described in this paper. The system conducts a flexible mixed-initiative dialogue rather than using a rigid form-filling scheme, and incorporates database knowledge to optimize the dialogue flow.
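The idea of using database knowledge as a constraint when choosing among N-best hypotheses can be illustrated with a minimal sketch. This is not the PADIS decision rule itself; it assumes each hypothesis carries a combined log score and a set of parsed attributes, and it applies a fixed penalty when no database entry is consistent with those attributes. All names (`Hypothesis`, `rescore`, `inconsistency_penalty`) and the toy data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    """One entry of an N-best list (illustrative structure, not from the paper)."""
    text: str
    log_score: float                      # assumed combined recognizer log score
    attributes: dict = field(default_factory=dict)  # assumed parser output


def matches(entry: dict, attributes: dict) -> bool:
    """True if every attribute of the hypothesis agrees with the database entry."""
    return all(entry.get(key) == value for key, value in attributes.items())


def rescore(nbest, database, inconsistency_penalty=-10.0):
    """Return the hypothesis with the best score after a database consistency check.

    Hypotheses that no database entry supports receive a fixed log-domain
    penalty (an assumption made for this sketch) instead of being discarded,
    so all alternatives stay in play until the final decision.
    """
    def total(hyp):
        consistent = any(matches(entry, hyp.attributes) for entry in database)
        return hyp.log_score + (0.0 if consistent else inconsistency_penalty)

    return max(nbest, key=total)


# Toy directory and N-best list (invented for illustration).
database = [
    {"first_name": "anna", "last_name": "meyer"},
    {"first_name": "peter", "last_name": "maier"},
]
nbest = [
    Hypothesis("anna maier", -10.0, {"first_name": "anna", "last_name": "maier"}),
    Hypothesis("anna meyer", -11.5, {"first_name": "anna", "last_name": "meyer"}),
]

best = rescore(nbest, database)
print(best.text)
```

In this toy case the recognizer's top hypothesis names a person who is not in the directory, so the second hypothesis wins once the penalty is applied. A soft penalty rather than hard filtering is a design choice of the sketch; it keeps a strongly scored hypothesis available even when the database check fails.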
In this paper, we present the Philips automatic telephone switchboard and directory information system PADIS. PADIS understands natural-language requests in fluently spoken German. The system offers telephone, fax, and room numbers, email addresses, private phone numbers, and direct call completion. A setup with a 500-entry database is currently in a field test in our research laboratory and has shown a success rate of 90%. This paper describes the system architecture and its components, and presents experiences as well as results from the field test.
In the course of a (man-machine) dialogue, the system's belief concerning the user's intention is continuously being built up. Moreover, restricting the discourse to a narrow application domain further constrains the variety of possible user reactions. In this paper, we will show how these knowledge sources may be utilized in a stochastic framework to improve speech understanding. On field-test data collected with our automatic exchange board prototype PADIS, a relative reduction of attribute errors by 27% has been obtained.
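One generic way to write such a stochastic framework, offered here as an assumption rather than as the formulation used in the paper, is a maximum-a-posteriori rule in which the dialogue history and domain knowledge act as a prior over interpretations. The symbols $O$ (acoustic observation), $U$ (candidate interpretation of the user turn), and $H$ (dialogue history and domain knowledge) are introduced for illustration only.

```latex
\hat{U} = \arg\max_{U} P(U \mid O, H)
        = \arg\max_{U} \frac{P(O \mid U, H)\, P(U \mid H)}{P(O \mid H)}
        = \arg\max_{U} P(O \mid U)\, P(U \mid H)
```

The last step drops the denominator, which does not depend on $U$, and assumes that the observation is conditionally independent of the history given the interpretation, so that $P(O \mid U, H) = P(O \mid U)$.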