This article presents trainable methods for extracting principal content words from voicemail messages. The short text summaries generated are suitable for mobile messaging applications. The system uses a set of classifiers to identify the summary words, with each word described by a vector of lexical and prosodic features. We use an ROC-based algorithm, Parcel, to select input features (and classifiers). We have performed a series of objective and subjective evaluations using unseen data from two different speech recognition systems as well as human transcriptions of voicemail speech.
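As a rough illustration of ROC-based feature selection, the sketch below ranks candidate features by the area under their ROC curve on labelled words. It is not Parcel itself, which the abstract does not specify in detail; the feature names (`lexical_score`, `prosodic_score`), the toy scores, and the labels are all hypothetical.

```python
def roc_points(scores, labels):
    """Sweep a threshold over the scores and return (FPR, TPR) points."""
    pos = sum(labels)
    neg = len(labels) - pos
    # Sort by descending score so each step lowers the threshold.
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(pairs):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only after all items with the same score are counted.
        if i + 1 == len(pairs) or pairs[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Trapezoidal area under a list of ROC points."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical data: 1 marks a word that belongs in the summary.
labels = [1, 1, 0, 1, 0, 0]
features = {
    "lexical_score": [0.9, 0.8, 0.7, 0.6, 0.3, 0.2],   # assumed feature
    "prosodic_score": [0.5, 0.4, 0.6, 0.3, 0.7, 0.2],  # assumed feature
}

areas = {name: auc(roc_points(vals, labels)) for name, vals in features.items()}
best_feature = max(areas, key=areas.get)
```

Here `best_feature` is simply the single feature with the largest area; a full selection procedure would also consider feature subsets and combinations of classifiers.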
The amount of archived audio material in digital form is increasing rapidly, as advantage is taken of the growth in available storage and processing power. Computational resources are becoming less of a bottleneck in digitally recording and archiving vast amounts of spoken material, both television and radio broadcasts and individual conversations. However, listening to this ever-growing amount of spoken audio sequentially is too slow, and the bottleneck will become the development of effective ways to access content in these voluminous archives. The provision of accurate and efficient computer-mediated content access is a challenging task, because spoken audio combines information from multiple levels (phonetic, acoustic, syntactic, semantic and discourse). Most systems that assist humans in accessing spoken audio content have approached the problem by performing automatic speech recognition, followed by text-based information access. These systems have addressed diverse tasks including indexing and retrieving voicemail messages, searching for broadcast news, and extracting information from recordings of meetings and lectures. Spoken audio content is far richer than what a simple textual transcription can capture, as it has additional cues that disclose the intended meaning and the speaker's emotional state. However, the text transcription alone still provides a great deal of useful information in applications. This article describes approaches to content-based access to spoken audio with a qualitative and tutorial emphasis. We describe how the analysis, retrieval and delivery phases contribute to making spoken audio content more accessible, and we outline a number of outstanding research issues. We also discuss the main application domains and try to identify important issues for future developments. The structure of the article is based on a general system architecture for content-based
We have used data on patents and publications, and from an Internet-based survey, to analyse corporate technological activities in Automatic Speech Recognition (ASR) and Natural Language Processing (NLP) technologies. Two distinct clusters of firms exist: large firms mainly in telecommunications, desktop computing and consumer electronics; and small firms specialising in speech technologies. The small specialised firms depend heavily on nearby universities and public research institutes and, to some extent, on nearby large firms; their relations with the large firms are complementary as well as competitive. Similar patterns can be observed in other, recently emerging, "new science"-based technologies. Integration between ASR and NLP has so far been weak with the two research communities functioning more or less independently, with the former progressing more rapidly than the latter. Having built technological capabilities in ASR and NLP with a small proportion of their corporate technological resources, the large firms have two options depending on the rate of progress in these technologies (especially NLP) in the future. If it is high, substantial investments (including those in complementary technologies) could open up massive market opportunities. If it is low, modest investments will allow the exploitation of niche markets.
This paper describes the development of a system to transcribe and summarize voicemail messages. The results of the research presented in this paper are two-fold. First, a hybrid connectionist approach to the Voicemail transcription task shows that competitive performance can be achieved using a context-independent system with fewer parameters than those based on mixtures of Gaussian likelihoods. Second, an effective and robust combination of statistical with prior knowledge sources for term weighting is used to extract information from the decoder's output in order to deliver summaries to the message recipients via a GSM Short Message Service (SMS) gateway.
This paper examines the applicability of stochastic language models to the task of categorizing voicemail message transcripts. The target categories are related to priority and content and are thus suitable for mobile messaging applications based on profiles which can be determined by users' physical and social environment. Categorization is performed by comparing the posterior probabilities of test messages under the language models of each target category. Stochastic models were selected over other lexical features because of their ability to incorporate context dependencies while their parameters are determined automatically from data. Despite the relatively small amount of training data used and given the spontaneous nature of voicemail, the models performed fairly accurately. Our experiments examine the effects that factors such as the word error rate, the n-gram order, smoothing and textual representation have on overall categorization accuracy.
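The categorization step described here can be sketched as follows: one n-gram model is trained per category, and a test message is assigned to the category with the highest posterior score. This is a minimal sketch, not the paper's system; the bigram order, add-one smoothing, the category names, the toy messages, and the equal priors are all assumptions chosen for illustration.

```python
import math
from collections import Counter


class BigramModel:
    """Add-one smoothed bigram language model for a single category."""

    def __init__(self, messages, vocab):
        self.vocab = vocab
        # Possible next tokens: every vocabulary word plus the end marker.
        self.num_outcomes = len(vocab) + 1
        self.context_counts = Counter()
        self.bigram_counts = Counter()
        for words in messages:
            tokens = ["<s>"] + self._normalize(words) + ["</s>"]
            for prev, cur in zip(tokens, tokens[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1

    def _normalize(self, words):
        # Map out-of-vocabulary words to a shared <unk> token.
        return [w if w in self.vocab else "<unk>" for w in words]

    def log_prob(self, words):
        """Log-likelihood of a message under this category's model."""
        tokens = ["<s>"] + self._normalize(words) + ["</s>"]
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            num = self.bigram_counts[(prev, cur)] + 1
            den = self.context_counts[prev] + self.num_outcomes
            total += math.log(num / den)
        return total


def categorize(words, models, priors):
    """Return the category maximizing log P(message | c) + log P(c)."""
    scores = {c: m.log_prob(words) + math.log(priors[c])
              for c, m in models.items()}
    return max(scores, key=scores.get)


# Hypothetical training transcripts for two assumed categories.
training = {
    "urgent": [["please", "call", "me", "back", "today"],
               ["call", "me", "as", "soon", "as", "possible"]],
    "routine": [["just", "checking", "in", "talk", "later"],
                ["no", "rush", "talk", "to", "you", "later"]],
}
vocab = {w for msgs in training.values() for m in msgs for w in m} | {"<unk>"}
models = {c: BigramModel(msgs, vocab) for c, msgs in training.items()}
priors = {c: 1 / len(training) for c in training}  # assumed equal priors
```

For example, `categorize("call me back".split(), models, priors)` selects `"urgent"` on this toy data. Changing the n-gram order or the smoothing scheme, two of the factors the abstract mentions, would only require changing `BigramModel`.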