Online deception is disrupting our daily lives, organizational processes, and even national security. Existing approaches to online deception detection follow a traditional paradigm that uses a set of cues as antecedents of deception, a paradigm that can be hindered by ineffective cue identification. Motivated by the strength of statistical language models (SLMs) in capturing word dependencies in text without explicit feature extraction, we developed SLMs to detect online deception. We also addressed the data-sparsity problem that arises in building SLMs in general, and in deception detection in particular, using smoothing and vocabulary-pruning techniques. The developed SLMs were evaluated empirically on diverse data sets. The results show that the proposed SLM approach to deception detection outperformed a state-of-the-art text categorization method as well as traditional feature-based methods.
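To illustrate the idea behind an SLM-based classifier, the sketch below trains one n-gram language model per class and labels a document with the class whose model assigns it higher likelihood. This is a minimal illustration only, not the authors' model: the bigram order, add-one (Laplace) smoothing, whitespace tokenization, and the "deceptive"/"truthful" labels are all assumptions made for the example. Smoothing is what addresses the data-sparsity problem the abstract mentions, since unseen bigrams would otherwise receive zero probability.

```python
import math
from collections import Counter

class BigramLM:
    """Bigram statistical language model with add-one (Laplace) smoothing."""

    def __init__(self, docs):
        self.unigrams = Counter()
        self.bigrams = Counter()
        for doc in docs:
            tokens = ["<s>"] + doc.lower().split() + ["</s>"]
            self.unigrams.update(tokens)
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(self.unigrams)

    def log_prob(self, doc):
        """Smoothed log-probability of a document under this model."""
        tokens = ["<s>"] + doc.lower().split() + ["</s>"]
        lp = 0.0
        for w1, w2 in zip(tokens, tokens[1:]):
            # Add-one smoothing gives unseen bigrams a small nonzero
            # probability, mitigating data sparsity.
            num = self.bigrams[(w1, w2)] + 1
            den = self.unigrams[w1] + self.vocab_size
            lp += math.log(num / den)
        return lp

def classify(doc, lm_deceptive, lm_truthful):
    """Label a document with the class whose model assigns higher likelihood."""
    if lm_deceptive.log_prob(doc) > lm_truthful.log_prob(doc):
        return "deceptive"
    return "truthful"
```

In practice each class model would be trained on a labeled corpus; vocabulary pruning (dropping rare word types before counting) would shrink `vocab_size` and further reduce sparsity.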
The great potential of speech recognition systems to free users' hands while interacting with computers has inspired a variety of promising applications. However, given the performance of today's state-of-the-art speech recognition technology, widespread acceptance would not be realistic without designing and developing new approaches to detecting and correcting recognition errors effectively. In seeking solutions to this problem, identifying cues to error detection (CERD) is central. Our survey of the extant literature on the detection and correction of speech recognition errors reveals that the system-initiated, data-driven approach is dominant and that heuristics from human users have been largely overlooked, which may have hindered the advance of speech technology. In this research, we propose a user-centered approach to discovering CERD and carry out user studies to implement it. Content analysis of the collected verbal protocols yields a taxonomy of CERD. The CERD discovered in this study improve our knowledge of CERD by not only validating known CERD from a user's perspective but also suggesting promising new CERD for detecting speech recognition errors. Moreover, the analysis of CERD in relation to error types and to other CERD provides new insight into the contexts in which specific CERD are effective. The findings of this study can be used not only to improve speech recognition output but also to provide context-aware support for error detection. This will help break the barrier to mainstream adoption of speech technology in a variety of information systems and applications.

KEY WORDS AND PHRASES: cues to error detection, speech recognition, taxonomy, verbal protocol analysis.

SPEECH RECOGNITION IS ONE OF THE MAIN information technologies that provide users with hands-free interaction with computers.
The power of speech control promises to help many individuals who might otherwise be unable to interact with a computer through the conventional mouse-and-keyboard interface. In particular, users who are physically challenged, visually impaired, or mobility impaired are offered new opportunities by speech recognition-enabled applications. Although many interesting applications have emerged or been enhanced with the advance of speech technology, the current use of the technology reflects "only a tip of the iceberg of the full power that the technology could potentially offer" (cf. [8, p. 69]). However, fundamental and practical limitations of the technology result in unsatisfactory performance of speech recognition systems. The convenience and efficiency promised by speech technology in interacting with computers are seriously compromised by the laborious effort and frustration experienced in detecting and correcting recognition errors [40]. Widespread acceptance of speech as a primary input modality for computers will not be possible unless the underlying recognition technology can produce sufficiently robust, low-error output [8]. To br...
Recognition errors hinder the proliferation of speech recognition (SR) systems. Based on the observation that recognition errors can produce ungrammatical sentences, especially in dictation applications where an acceptable level of accuracy in generated documents is indispensable, we propose incorporating two kinds of linguistic features into error detection: lexical features of words and syntactic features from a robust lexicalized parser. Transformation-based learning is chosen to predict recognition errors by integrating word confidence scores with linguistic features. Experimental results on a dictation corpus show that linguistic features alone are not as useful as word confidence scores in detecting errors. However, linguistic features provide complementary information when combined with word confidence scores, which collectively reduce the classification error rate by 12.30% and improve the F measure by 53.62%.
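The combination described above can be sketched with a toy transformation-based learner: confidence scores provide an initial labeling, and greedily selected rules over a linguistic feature then correct residual mistakes. This is a simplified illustration under stated assumptions, not the paper's system: the 0.5 confidence threshold, the single POS-tag feature, the rule template, and the token tuples are all invented for the example.

```python
# Each token is (confidence_score, pos_tag, gold_label),
# where labels are "correct" or "error".

def baseline(tokens, threshold=0.5):
    """Initial labeling from word confidence scores alone."""
    return ["error" if conf < threshold else "correct"
            for conf, _, _ in tokens]

def apply_rule(rule, tokens, labels):
    """Apply one (pos_tag, from_label, to_label) transformation."""
    pos_t, frm, to = rule
    return [to if lab == frm and tok[1] == pos_t else lab
            for tok, lab in zip(tokens, labels)]

def learn_rules(tokens, labels, max_rules=5):
    """Greedy transformation-based learning: repeatedly pick the single
    rule that most reduces training errors, stopping when no rule helps."""
    labels = list(labels)
    rules = []
    for _ in range(max_rules):
        best, best_gain = None, 0
        for pos_t in {t[1] for t in tokens}:
            for frm, to in (("correct", "error"), ("error", "correct")):
                # Net gain: tokens fixed minus tokens broken by this rule.
                gain = sum((1 if t[2] == to else -1)
                           for t, lab in zip(tokens, labels)
                           if lab == frm and t[1] == pos_t)
                if gain > best_gain:
                    best, best_gain = (pos_t, frm, to), gain
        if best is None:
            break
        labels = apply_rule(best, tokens, labels)
        rules.append(best)
    return rules, labels
```

The learned rules are interpretable, e.g. "relabel a confidently recognized word as an error when its linguistic context is anomalous", which mirrors how linguistic features complement confidence scores in the abstract's result.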