One of the first steps in building a spoken language understanding (SLU) module for dialogue systems is the extraction of flat concepts from a given word sequence, usually provided by an automatic speech recognition (ASR) system. In this paper, six different modeling approaches are investigated to tackle the task of concept tagging. These methods include classical, well-known generative and discriminative methods such as Finite State Transducers (FSTs), Statistical Machine Translation (SMT), Maximum Entropy Markov Models (MEMMs), and Support Vector Machines (SVMs), as well as techniques recently applied to natural language processing such as Conditional Random Fields (CRFs) and Dynamic Bayesian Networks (DBNs). Following a detailed description of the models, experimental and comparative results are presented on three corpora in different languages and of different complexity. The French MEDIA corpus has already been exploited in an evaluation campaign, so a direct comparison with existing benchmarks is possible. Recently collected Italian and Polish corpora are used to test the robustness and portability of the modeling approaches. For all tasks, manual transcriptions as well as ASR inputs are considered. In addition to single systems, methods for system combination are investigated. The best-performing model on all tasks is based on conditional random fields. On the MEDIA evaluation corpus, a concept error rate (CER) of 12.6% could be achieved. Here, in addition to attribute names, attribute values were extracted using a combination of a rule-based and a statistical approach. Applying system combination with weighted ROVER over all six systems, the CER drops to 12.0%.
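The concept error rate used above is computed analogously to word error rate: the Levenshtein distance (substitutions, insertions, deletions) between the hypothesized and reference concept sequences, normalized by the reference length. A minimal sketch, assuming standard edit-distance scoring; the function name and the toy concept labels are illustrative, not taken from the paper:

```python
def concept_error_rate(ref, hyp):
    """CER = (substitutions + insertions + deletions) / len(ref),
    computed with a standard Levenshtein dynamic program."""
    n, m = len(ref), len(hyp)
    # dp[i][j]: edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution / match
    return dp[n][m] / n

# Toy example: one substituted concept among four reference concepts -> 25% CER
ref = ["command", "hotel-name", "date", "room-type"]
hyp = ["command", "hotel-name", "time", "room-type"]
print(concept_error_rate(ref, hyp))  # 0.25
```

The same alignment-based scoring underlies the ROVER combination mentioned in the abstract, which votes among aligned system outputs.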
The magnetotransport in a set of identical parallel Al x Ga 1−x N / GaN quantum wire structures is investigated. The width of the wires ranges between 340 and 1110 nm. For all sets of wires, clear Shubnikov-de Haas oscillations are observed. We find that the electron concentration and mobility are approximately the same for all wires, confirming that the electron gas in the Al x Ga 1−x N / GaN heterostructure is not deteriorated by the fabrication procedure of the wire structures. For the wider quantum wires, the weak antilocalization effect is clearly observed, indicating the presence of spin-orbit coupling. For narrow quantum wires with an effective electrical width below 250 nm, the weak antilocalization effect is suppressed. By comparing the experimental data to a theoretical model for quasi-one-dimensional structures, we conclude that the spin-orbit scattering length is enhanced in narrow wires.
In large-scale commercial dialog systems, users express the same request in a wide variety of alternative ways, with a long tail of less frequent alternatives. Handling the full range of this distribution is challenging, in particular when relying on manual annotations. However, the same users also provide useful implicit feedback, as they often paraphrase an utterance if the dialog system fails to understand it. We propose MARUPA, a method to leverage this type of feedback by creating annotated training examples from it. MARUPA creates new data fully automatically, without manual intervention or effort from annotators, and specifically for currently failing utterances. By re-training the dialog system on this new data, accuracy and coverage for long-tail utterances can be improved. In experiments, we study the effectiveness of this approach in a commercial dialog system across various domains and three languages.
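One plausible realization of this feedback loop (a sketch under assumptions; the pairing heuristic, data structures, and names below are illustrative and not the paper's actual method): detect a failed turn that is immediately followed by a successfully understood turn, treat the follow-up as a user paraphrase of the same request, and project its semantic parse onto the failed utterance as a new training label.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Turn:
    utterance: str
    parse: Optional[str]   # semantic parse, None if understanding failed
    succeeded: bool

def mine_training_pairs(session):
    """Pair each failed utterance with the parse of the immediately following
    successful turn, treating that turn as a paraphrase of the same request.
    (Heuristic sketch; MARUPA's actual pairing criteria are not given here.)"""
    examples = []
    for prev, nxt in zip(session, session[1:]):
        if not prev.succeeded and nxt.succeeded and nxt.parse is not None:
            examples.append((prev.utterance, nxt.parse))
    return examples

session = [
    Turn("put on my cooking tunes", None, False),
    Turn("play my cooking playlist", "PlayMusic(playlist='cooking')", True),
]
print(mine_training_pairs(session))
# [('put on my cooking tunes', "PlayMusic(playlist='cooking')")]
```

The mined pairs would then be added to the training set, so that the retrained model understands the original long-tail phrasing directly.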
Recently, there have been many papers studying discriminative acoustic modeling techniques such as conditional random fields or discriminative training of conventional Gaussian HMMs. This paper gives an overview of this recent work and progress. We strictly distinguish between the type of acoustic model on the one hand and the training criterion on the other. We address two issues in more detail: the relation between conventional Gaussian HMMs and conditional random fields, and the advantages of formulating the training criterion as a convex optimization problem. Experimental results for various speech tasks are presented to carefully evaluate the different concepts and approaches, including both digit-string and large-vocabulary continuous speech recognition tasks.

Index Terms: speech recognition, hidden Markov model, discriminative training, log-linear model, conditional random field

INTRODUCTION

State-of-the-art speech recognition systems are based on discriminative Gaussian HMMs (GHMMs). The major points of criticism of this conventional approach are the indirect parameterization of the posterior model, including many parameter constraints; the non-convexity of the conventional training criteria, such that the optimization can get stuck in local optima; and the insufficient flexibility of HMMs to incorporate additional dependencies and knowledge sources. The log-linear framework addresses these issues in a principled way. Examples of this framework include the log-linear model, the maximum entropy Markov model (MEMM) [1], the conditional random field (CRF) [2,3], the hidden CRF (HCRF) [4,5], and the conditional augmented (C-Aug) model [6].
In the log-linear approach, the posterior is directly modeled, the training criterion is convex (except for HCRFs and C-Aug models), and it is easy to incorporate additional knowledge (although possibly at the cost of increased complexity) [7]. Various approaches to direct acoustic modeling have been investigated. HCRFs [4] are closest to GHMMs. Linear-chain HCRFs [4,5] differ from conventional GHMMs mainly in the model parameterization. The training criterion for HCRFs is non-convex, as for GHMMs. If all hidden variables are eliminated (cf. mixtures) or suppressed (cf. alignments), the HCRF reduces to a CRF [2,3] and the optimization problem becomes convex. MEMMs [1] are similar to CRFs, but the posterior is based on a different decomposition and different dependence assumptions. Alternatively, a hybrid architecture [8] with log-linear models representing the HMM state posteriors can be used. All these convex approaches have in common that the decision boundary is linear; thus, the choice of features is essential for a
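The log-linear form referred to throughout this passage models the class posterior directly as a normalized exponential of weighted features, p(c|x) = exp(Σ_i λ_i f_i(x, c)) / Z(x). A minimal sketch of this posterior computation; the feature values and class names are made up for illustration:

```python
import math

def log_linear_posterior(lambdas, feats):
    """Direct posterior model: p(c|x) = exp(sum_i lambda_i * f_i(x, c)) / Z(x).
    `feats[c]` holds the feature values f_i(x, c) for class c."""
    scores = {c: sum(l * f for l, f in zip(lambdas, fs))
              for c, fs in feats.items()}
    z = sum(math.exp(s) for s in scores.values())  # partition function Z(x)
    return {c: math.exp(s) / z for c, s in scores.items()}

# Toy acoustic frame with two hypothesized HMM states and two made-up features.
lambdas = [1.0, -0.5]
feats = {"state_A": [2.0, 1.0], "state_B": [0.5, 3.0]}
post = log_linear_posterior(lambdas, feats)
print(post)  # posteriors sum to 1; state_A dominates for these weights
```

The convexity advantage mentioned above refers to training: the conditional log-likelihood of such a model is concave in the weights λ, so gradient-based optimization cannot get stuck in local optima, unlike the GHMM and HCRF criteria.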