Speech Summarization

Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Liu, Chen, et al. 2014

Statistical language modeling (LM) that purports to quantify the acceptability of a given piece of text has long been an interesting yet challenging research area. In particular, language modeling for information retrieval (IR) has enjoyed remarkable empirical success; one emerging stream of the LM approach for IR is to employ the pseudo-relevance feedback process to enhance the representation of an input query so as to improve retrieval effectiveness. This paper presents a continuation of such a general line of research and the main contribution is threefold. First, we propose a principled framework which can unify the relationships among several widely-used query modeling formulations. Second, on top of the successfully developed framework, we propose an extended query modeling formulation by incorporating critical query-specific information cues to guide the model estimation. Third, we further adopt and formalize such a framework to the speech recognition and summarization tasks. A series of empirical experiments reveal the feasibility of such an LM framework and the performance merits of the deduced models on these two tasks.
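The pseudo-relevance feedback idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the unified framework the paper proposes. It assumes a plain unigram query model, a smoothed cross-entropy ranking score, and a simple linear interpolation between the original query model and a feedback model built from the top-ranked documents. The function names and the parameters `lam`, `alpha`, and `top_k` are hypothetical.

```python
import math
from collections import Counter


def unigram_model(tokens):
    """Maximum-likelihood unigram distribution over a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def cross_entropy_score(query_model, doc_tokens, background, lam=0.5):
    """Score a document by the negative cross entropy between the query
    model and a smoothed document model (higher is better)."""
    doc_model = unigram_model(doc_tokens)
    score = 0.0
    for w, p_q in query_model.items():
        # Interpolate with the background model so unseen words keep
        # a small non-zero probability (assumed smoothing scheme).
        p_d = lam * doc_model.get(w, 0.0) + (1 - lam) * background.get(w, 1e-9)
        score += p_q * math.log(p_d)
    return score


def feedback_query_model(query_tokens, docs, background, top_k=2, alpha=0.5):
    """Enhance the query model with words from the top-ranked documents."""
    original = unigram_model(query_tokens)
    # First pass: rank documents with the original query model.
    ranked = sorted(
        docs,
        key=lambda d: cross_entropy_score(original, d, background),
        reverse=True,
    )
    # Treat the top_k documents as pseudo-relevant and pool their words.
    feedback = unigram_model([w for d in ranked[:top_k] for w in d])
    vocab = set(original) | set(feedback)
    # Assumed combination rule: linear interpolation of the two models.
    return {
        w: alpha * original.get(w, 0.0) + (1 - alpha) * feedback.get(w, 0.0)
        for w in vocab
    }
```

With a query such as `["speech", "summarization"]`, words that occur in the top-ranked documents but not in the query receive non-zero probability in the returned model, which is the sense in which the query representation is enhanced.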

Leveraging Effective Query Modeling Techniques for Speech Recognition and Summarization

“…Obviously, speech is one of the most important sources of information about multimedia. Users can listen to and digest multimedia associated with spoken documents efficiently by virtue of extractive speech summarization, which selects a set of indicative sentences from an original spoken document according to a target summarization ratio and concatenates them together to form a summary accordingly [4][5][6][7]. The wide array of extractive speech summarization methods that have been developed so far may roughly fall into three main categories [4,7]: 1) methods simply based on the sentence position or structure information, 2) methods based on unsupervised sentence ranking, and 3) methods based on supervised sentence classification.…”

Section: Introduction

“…Even if the performance of unsupervised summarizers is not always comparable to that of supervised summarizers, their easy-to-implement and flexible property (i.e., they can be readily adapted and carried over to summarization tasks pertaining to different languages, genres or domains) still makes them attractive. Interested readers may also refer to [4][5][6][7] for thorough and entertaining discussions of major methods that have been successfully developed and applied to a wide variety of text and speech summarization tasks.…”

Section: Introduction

A recurrent neural network language modeling framework for extractive speech summarization

2014 IEEE International Conference on Multimedia and Expo (ICME)

Liu et al. 2014

Extractive speech summarization, with the purpose of automatically selecting a set of representative sentences from a spoken document so as to concisely express the most important theme of the document, has been an active area of research and development. A recent school of thought is to employ the language modeling (LM) approach for important sentence selection, which has proven to be effective for performing speech summarization in an unsupervised fashion. However, one of the major challenges facing the LM approach is how to formulate the sentence models and accurately estimate their parameters for each spoken document to be summarized. This paper presents a continuation of this general line of research and its contribution is two-fold. First, we propose a novel and effective recurrent neural network language modeling (RNNLM) framework for speech summarization, on top of which the deduced sentence models are able to render not only word usage cues but also long-span structural information of word co-occurrence relationships within spoken documents, getting around the need for the strict bag-of-words assumption. Second, the utilities of the method originated from our proposed framework and several widely-used unsupervised methods are analyzed and compared extensively. A series of experiments conducted on a broadcast news summarization task seem to demonstrate the performance merits of our summarization method when compared to several state-of-the-art existing unsupervised methods.
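The LM approach to important sentence selection can be sketched as follows. This is a minimal illustration and not the RNNLM framework the abstract proposes. It assumes a unigram (bag-of-words) sentence model, smoothed with the document's own word distribution, and ranks each sentence by the log-likelihood it assigns to the whole document. The function names and the parameters `lam` and `ratio` are hypothetical, and sentences are assumed to be non-empty token lists.

```python
import math
from collections import Counter


def rank_sentences(sentences, lam=0.5):
    """Return one score per sentence: the log-likelihood of the whole
    document under that sentence's smoothed unigram model."""
    doc_counts = Counter(w for s in sentences for w in s)
    doc_total = sum(doc_counts.values())
    scores = []
    for s in sentences:
        sent_counts = Counter(s)
        log_lik = 0.0
        for w, c in doc_counts.items():
            # Assumed smoothing: interpolate the sentence model with
            # the document-level word distribution.
            p_bg = c / doc_total
            p = lam * sent_counts[w] / len(s) + (1 - lam) * p_bg
            log_lik += c * math.log(p)
        scores.append(log_lik)
    return scores


def summarize(sentences, ratio=0.3, lam=0.5):
    """Keep the top-scoring sentences up to the target summarization
    ratio and return them in their original document order."""
    scores = rank_sentences(sentences, lam)
    n_keep = max(1, round(ratio * len(sentences)))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(top[:n_keep])]
```

Because each word is scored independently, this sketch relies on the strict bag-of-words assumption that the abstract says the RNNLM sentence models are meant to get around.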

Asso for Info Science & Tech

Generic speech summarization of transcribed lecture videos: Using tags and their semantic relations

Kim

2014

We propose a tag-based framework that simulates human abstractors' ability to select significant sentences based on key concepts in a sentence as well as the semantic relations between key concepts to create generic summaries of transcribed lecture videos. The proposed extractive summarization method uses tags (viewer- and author-assigned terms) as key concepts. Our method employs Flickr tag clusters and WordNet synonyms to expand tags and detect the semantic relations between tags. This method helps select sentences that have a greater number of semantically related key concepts. To investigate the effectiveness and uniqueness of the proposed method, we compare it with an existing technique, latent semantic analysis (LSA), using intrinsic and extrinsic evaluations. The results of intrinsic evaluation show that the tag-based method is as effective as or more effective than the LSA method. We also observe that in the extrinsic evaluation, the grand mean accuracy score of the tag-based method is higher than that of the LSA method, with a statistically significant difference. Elaborating on our results, we discuss the theoretical and practical implications of our findings for speech video summarization and retrieval.
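The tag-based selection step can be illustrated with a small sketch. It is not the authors' implementation. It assumes that tag expansion is available as a plain dictionary (`related_terms`, a hypothetical stand-in for the Flickr tag clusters and WordNet synonyms named in the abstract) and that a sentence is scored simply by how many distinct key concepts it contains. All function and parameter names are hypothetical.

```python
def expand_tags(tags, related_terms):
    """Expand the tag set with related terms from a lookup table."""
    expanded = set(tags)
    for tag in tags:
        expanded.update(related_terms.get(tag, ()))
    return expanded


def score_sentence(sentence_tokens, concepts):
    """Count the distinct key concepts that occur in the sentence."""
    return len(set(sentence_tokens) & concepts)


def select_sentences(sentences, tags, related_terms, n_keep=1):
    """Pick the n_keep sentences with the most key concepts and
    return them in their original transcript order."""
    concepts = expand_tags(tags, related_terms)
    order = sorted(
        range(len(sentences)),
        key=lambda i: score_sentence(sentences[i], concepts),
        reverse=True,
    )
    return [sentences[i] for i in sorted(order[:n_keep])]
```

A fuller version would also weight how the matched concepts relate to one another, which the abstract describes but this sketch does not attempt.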