We are interested in the problem of robust understanding from noisy spontaneous speech input. With the advances in automated speech recognition (ASR), there has been increasing interest in spoken language understanding (SLU). A challenge in large-vocabulary SLU is robustness to ASR errors. State-of-the-art SLU relies on the best ASR hypothesis (ASR 1-best). In this paper, we propose methods for a tighter integration of ASR and SLU using word confusion networks (WCNs). WCNs obtained from ASR word graphs (lattices) provide a compact representation of multiple aligned ASR hypotheses along with word confidence scores, without compromising recognition accuracy. We present our work on exploiting WCNs instead of simply using ASR 1-best hypotheses. We focus on two tasks in a spoken dialog system, named entity detection and extraction and call classification, although the idea is more general and applicable to other spoken language processing tasks. For named entity detection, using word lattices and WCNs improved the F-measure by 6-10% absolute. Processing WCNs was 25 times faster than processing lattices, which is very important for real-life applications. For call classification, we have shown a relative reduction in error rate of between 5% and 10% using WCNs compared to ASR 1-best output.
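To make the WCN representation concrete, the following is a minimal sketch, not the authors' implementation. It assumes a confusion network can be modeled as a sequence of bins, each mapping candidate words to posterior probabilities. The `<eps>` symbol, the function names, and the toy posteriors are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

EPS = "<eps>"  # hypothetical symbol for an empty (skip) arc

# A confusion network as a sequence of bins; each bin maps a candidate
# word to its posterior probability (illustrative values, not real ASR output).
WCN = List[Dict[str, float]]


def one_best(wcn: WCN) -> List[str]:
    """Pick the highest-posterior word in every bin, dropping empty arcs."""
    best = [max(bin_, key=bin_.get) for bin_ in wcn]
    return [w for w in best if w != EPS]


def word_posteriors(wcn: WCN, word: str) -> List[float]:
    """Posterior of `word` in each bin (0.0 where it is not a candidate)."""
    return [bin_.get(word, 0.0) for bin_ in wcn]


toy = [
    {"call": 0.9, "tall": 0.1},
    {"boston": 0.45, "austin": 0.40, EPS: 0.15},
    {"please": 0.7, EPS: 0.3},
]

hypothesis = one_best(toy)
austin_scores = word_posteriors(toy, "austin")
```

In this toy network the 1-best path contains "boston", yet "austin" remains available in the second bin with a posterior of 0.40. A downstream entity detector or classifier that reads the whole network can still use that competing candidate, which a system restricted to the 1-best string cannot.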
This paper describes the sentiment analysis systems we built for SemEval-2015 Task 10, Subtasks B and E. For Subtask B, a Logistic Regression classifier was trained on several groups of features: lexical, syntactic, lexicon-based, Z score and semantic features. A weighting scheme was adopted for the positive and negative labels to account for the unbalanced distribution of tweets between the positive and negative classes. This system ranked third out of 40 participants; it achieves an average F1 of 64.27 on the 2015 Twitter data set, just 0.57% less than the first system. We also present our participation in Subtask E, which consists of ranking Twitter terms according to their association with positive sentiment; there our system ranked second under the Kendall metric and first under the Spearman metric.
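The abstract does not say how the label weights were computed. As one possible illustration, the sketch below uses the common inverse-frequency heuristic, in which each class receives the weight n_samples / (n_classes * class_count). This is an assumed stand-in, not the scheme used in the described system, and the function name and label counts are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def balanced_class_weights(labels: List[str]) -> Dict[str, float]:
    """Inverse-frequency weights: rarer classes get larger weights.

    Assumed heuristic for illustration only; the source does not
    specify its weighting scheme.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}


# Made-up label distribution with fewer negative than positive tweets.
labels = ["positive"] * 6 + ["negative"] * 2 + ["neutral"] * 4
weights = balanced_class_weights(labels)
```

With these made-up counts the negative class, being the rarest, receives the largest weight, so misclassified negative tweets would contribute more to a weighted training loss than misclassified positive ones.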