This paper presents the development of a speech interface for controlling a high-fidelity audio system through natural language sentences. A Bayesian Belief Network approach is proposed for dialog modeling and is applied to infer the user's goals from the processed utterances. From the inferred goals, missing or spurious concepts are then detected automatically. This detection drives the dialog: the system prompts the user for missing concepts and asks for clarification of spurious ones, allowing more flexible and natural dialogs. A dialog strategy that makes use of the dialog history and the system's state is also presented.
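The goal inference and concept checking described above can be sketched as follows. This is a minimal illustration, not the authors' network. It assumes the simplest belief-network structure: one goal node whose children are conditionally independent concept nodes (a naive Bayes model). The goals (`play_track`, `set_volume`), concepts, probabilities and thresholds (`high`, `low`) are all hypothetical.

```python
from math import prod

# Hypothetical priors P(goal) for a hi-fi control domain.
PRIORS = {"play_track": 0.5, "set_volume": 0.5}

# Hypothetical P(concept present | goal); each concept node is a child of the goal node.
LIKELIHOODS = {
    "play_track": {"action_play": 0.9, "track_number": 0.8, "volume_level": 0.05},
    "set_volume": {"action_play": 0.05, "track_number": 0.05, "volume_level": 0.9},
}


def infer_goals(observed):
    """Posterior P(goal | observed concepts), treating concepts as
    conditionally independent given the goal."""
    scores = {}
    for goal, prior in PRIORS.items():
        cpt = LIKELIHOODS[goal]
        scores[goal] = prior * prod(
            cpt[c] if c in observed else 1.0 - cpt[c] for c in cpt
        )
    total = sum(scores.values())
    return {goal: s / total for goal, s in scores.items()}


def check_concepts(goal, observed, high=0.5, low=0.1):
    """Missing: concepts the goal expects (P >= high) that were not observed.
    Spurious: observed concepts the goal does not expect (P <= low or unknown)."""
    cpt = LIKELIHOODS[goal]
    missing = sorted(c for c, p in cpt.items() if p >= high and c not in observed)
    spurious = sorted(c for c in observed if cpt.get(c, 0.0) <= low)
    return missing, spurious


posterior = infer_goals({"action_play"})
best = max(posterior, key=posterior.get)
print(best, check_concepts(best, {"action_play"}))  # prints: play_track (['track_number'], [])
```

For an utterance containing only `action_play`, `play_track` is the most probable goal and `track_number` is reported as missing, so the dialog would prompt for it. For `action_play`, `track_number` and `volume_level` together, `play_track` remains the most probable goal and `volume_level` is flagged as spurious, which would trigger a clarification question.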
At ICSLP'96 we presented a flexible, large-vocabulary, speaker-independent, isolated-word preselection system for the telephone environment, using a two-stage, bottom-up strategy [6]. We achieved reasonable performance on large and very large vocabulary tasks, ranging from 1200 to 10000 words. In this paper, we describe recent work on the system in two directions: handling non-speech sounds in the speech signal (we consider lip, breath and click noises), and making the preselection lists dynamic in length to reduce the average computational load. In the first case, we want to model non-speech sounds because these effects are critical in real-life situations, where they lead to incorrect endpointing and increased error rates. In the second, we are interested in integrating any available system parameter into the calculation of the preselection list length, and we have applied both parametric and non-parametric methods.
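One way to make the list length depend on a system parameter is a non-parametric, table-based rule, sketched below. It is an assumption for illustration, not the method reported here. On held-out data, utterances are binned by some scalar system parameter (unspecified here), and for each bin the list length is the smallest one that keeps the correct word in the list for a target fraction (`coverage`) of utterances. The names `train_length_table` and `list_length` and the parameters `bin_width`, `coverage` and `max_len` are hypothetical.

```python
import math
from collections import defaultdict


def train_length_table(samples, bin_width, coverage=0.95, max_len=None):
    """Build a bin -> list-length table from held-out data.

    samples: (parameter_value, rank_of_correct_word) pairs, ranks starting at 1.
    For each bin, the length is the empirical `coverage` quantile of the ranks
    (nearest-rank method), optionally capped at `max_len`.
    """
    ranks_per_bin = defaultdict(list)
    for value, rank in samples:
        ranks_per_bin[int(value // bin_width)].append(rank)
    table = {}
    for bin_id, ranks in ranks_per_bin.items():
        ranks.sort()
        k = max(1, math.ceil(coverage * len(ranks)))
        length = ranks[k - 1]
        table[bin_id] = min(length, max_len) if max_len is not None else length
    return table


def list_length(table, value, bin_width, default):
    """Preselection list length for a new utterance; `default` for unseen bins."""
    return table.get(int(value // bin_width), default)


# Hypothetical held-out data: low parameter values -> correct word ranked near the top.
held_out = [(0.1, 1), (0.2, 1), (0.3, 2), (0.4, 3), (1.5, 10), (1.7, 20)]
table = train_length_table(held_out, bin_width=1.0)
```

Bins where the first stage is reliable get short lists, and only difficult bins pay for long ones, which is how the average computational load can drop. A parametric variant would instead fit a function of the parameter to the same rank data.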
In the context of large vocabulary speech recognition systems, it is of major interest to classify every utterance as correctly or incorrectly recognised. In this paper we present a preliminary study of a word-level confidence estimation system based on the output of a neural network. We use a combination of multiple features extracted from the acoustic and lexical decoders of our reference system, namely those available in the hypothesis stage of a hypothesis-verification, very large vocabulary, telephone speech recognition system. We describe the system architecture and the experiments that led to the selection of the parameter set used by the neural network, and we report the final performance, which is promising compared with standard log-likelihood ratio techniques for confidence scoring.
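A minimal sketch of neural-network confidence scoring next to a plain log-likelihood ratio threshold follows. It assumes the smallest possible network, a single logistic unit trained by stochastic gradient descent on cross-entropy. The paper's actual architecture and feature set are not reproduced, and the feature vectors, learning rate and threshold are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_confidence_net(features, labels, epochs=500, lr=0.1, seed=0):
    """Train one logistic unit mapping a word's feature vector to
    P(word correctly recognised); labels are 1 (correct) or 0 (incorrect)."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.1, 0.1) for _ in features[0]]
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # d(cross-entropy)/d(pre-activation)
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def confidence(w, b, x):
    """Confidence score in (0, 1) for one word's feature vector."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def llr_baseline(llr, threshold=0.0):
    """Baseline: accept a word when its log-likelihood ratio exceeds a threshold."""
    return llr > threshold


# Toy data (hypothetical): [log-likelihood ratio, lexical score] per word.
X = [[2.0, 1.0], [1.5, 0.8], [1.0, 1.0], [-1.0, 0.0], [0.5, 0.0], [-0.5, 0.2]]
Y = [1, 1, 1, 0, 0, 0]
w, b = train_confidence_net(X, Y)
```

Each feature vector might hold, for example, a word's log-likelihood ratio together with lexical-decoder scores. The baseline looks at the ratio alone, while the network learns a weighting of all features from labelled data.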