A Term Weighting Method Based on Lexical Chain for Automatic Summarization

Song, Young In; Han, Kyoung Soo; Rim, Hae Chang

doi:10.1007/978-3-540-24630-5_78

Cited by 18 publications

(13 citation statements)

References 1 publication


“…The former one, directly extracting sentences from the texts, is widely used because it is not restricted to text domain and genre. Most extractive summarization tasks are regarded as sentence ranking problems, which can be roughly divided into three types: 1) statistical feature based methods [5], [6], which simply consider term frequency, sentence position and length, title and clue words; 2) lexical chain based methods [7], which construct chains of related words with the help of lexicon such as WordNet, and select strong chains to extract salient sentences according to some standards; 3) graph ranking based methods such as LexRank [8] and TextRank [4], which use PageRank [9] to rank the text graph. Nevertheless, traditional text summarization methods are unable to satisfy the need of microblog summarization due to the severe sparsity, heavy noise and bad format of posts.…”

Section: Related Work (mentioning)

confidence: 99%
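The graph-ranking family mentioned in this statement (LexRank, TextRank) can be illustrated with a small sketch. It assumes that sentences are graph nodes, that edge weights use a TextRank-style word-overlap similarity (with a +1 added inside each logarithm, an assumption that avoids a zero denominator for one-word sentences), and that scores follow the weighted PageRank update with damping factor `d = 0.85`. This is a generic illustration, not the method of the citing paper or of Song et al.

```python
import math
import re


def tokenize(sentence):
    # Lowercase alphabetic tokens; no stemming or stopword removal
    return re.findall(r"[a-z]+", sentence.lower())


def similarity(a, b):
    # TextRank-style overlap: shared word types normalized by sentence lengths.
    # The +1 inside each log is an assumption that avoids a zero denominator.
    overlap = len(set(a) & set(b))
    denom = math.log(len(a) + 1) + math.log(len(b) + 1)
    return overlap / denom if denom > 0 else 0.0


def rank_sentences(sentences, d=0.85, iterations=50):
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Symmetric weighted adjacency matrix of the sentence graph (no self-loops)
    w = [[similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iterations):
        # Weighted PageRank: WS(i) = (1 - d) + d * sum_j w_ji / out(j) * WS(j)
        # Isolated nodes (out_sum == 0) contribute nothing to their neighbors.
        scores = [
            (1 - d) + d * sum(w[j][i] / out_sum[j] * scores[j]
                              for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores
```

To build an extract, one would keep the top-scoring sentences, for example `sorted(range(len(sents)), key=lambda i: -scores[i])[:k]`. A sentence that shares no words with any other sentence keeps only the base score `1 - d`.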

TR-LDA: A Cascaded Key-Bigram Extractor for Microblog Summarization

Wu, Zhang, Xu et al., 2015, IJMLC


Abstract. Microblog summarization can save users a large amount of browsing time. However, microblogs are more challenging to summarize than traditional documents because of the heavy noise and severe sparsity of posts. In this paper, we propose an unsupervised method named TR-LDA for summarizing microblogs by cascading two key-bigram extractors based on TextRank and Latent Dirichlet Allocation (LDA). The cascading strategy yields a key-bigram set with better noise immunity. Two sentence ranking strategies are proposed based on the key-bigram set, and sentences are extracted by merging the two ranking results. In experiments on a Sina Weibo dataset, the proposed method outperformed several other text-content-based summarizers.

Index Terms: Key-bigram extraction, microblog summarization, sentence extraction, TR-LDA.

I. INTRODUCTION

Microblog platforms such as Twitter and Sina Weibo have become part of our daily life, providing timely information that keeps us in touch with the world. However, users can easily be overwhelmed by the sheer volume of posts. Summarizing microblogs automatically would save users considerable browsing time. Moreover, text analysis tasks such as classification, clustering and information retrieval can benefit from text summarization because it reduces dimensionality.

The purpose of this paper is to automatically extract several salient sentences from a set of topic-related microblog posts to form a summary of their core content. From the perspective of traditional document summarization, this can be treated either as a multi-document summarization problem, by treating each post as a document, or as a single-document summarization problem, by concatenating all posts into one document. However, the problem is more intractable than summarizing traditional documents, since microblog posts suffer from severe sparsity, heavy noise and poor normalization [1], whereas traditional documents are usually well structured and semantically clear. Most existing microblog summarization methods suffer from low precision.

To overcome these difficulties, we propose an unsupervised method named TR-LDA that summarizes microblogs by cascading key-bigram extractors. Unlike most existing methods [1]-[4], which rely on the Bag-of-Words (BoW) model to weight sentences or rank sentences directly on a text graph, TR-LDA generates a summary in two main steps: 1) extract a key-bigram set that discovers the subtopics of the hot-topic posts by cascading TextRank and LDA extractors; 2) rank sentences based on the key-bigram set using two strategies and extract sentences by merging the two ...

Manuscript received October 5, 2014; revised January 7, 2015. This work was supported in part by the National Natural Science Foundation of China (Grants No. 61203281 and No. 61303172). The authors are with the Institute of Automation, Chinese Academy of Sciences, 95 Zhongguancun East Road, Beijing, 100190, China (e-mail: yufang.wu@ia.ac.cn).
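The abstract does not spell out TR-LDA's two sentence-ranking strategies or its merging rule, so the following is only a hypothetical sketch of ranking sentences against a given key-bigram set. It assumes the key-bigram set has already been extracted (the cascaded TextRank and LDA step is out of scope), uses two invented scores (`coverage_score`: how many key bigrams a sentence contains; `density_score`: the share of the sentence's bigrams that are key bigrams), and merges the two rankings by summing rank positions, a Borda-style rule chosen here for illustration.

```python
def bigrams(tokens):
    # Adjacent word pairs inside one sentence
    return list(zip(tokens, tokens[1:]))


def coverage_score(tokens, key_bigrams):
    # Hypothetical strategy A: number of key bigrams in the sentence
    return sum(1 for bg in bigrams(tokens) if bg in key_bigrams)


def density_score(tokens, key_bigrams):
    # Hypothetical strategy B: fraction of the sentence's bigrams that are key bigrams
    bgs = bigrams(tokens)
    return coverage_score(tokens, key_bigrams) / len(bgs) if bgs else 0.0


def rank_positions(scores):
    # Position 0 = best; the stable sort breaks ties by original order
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    positions = [0] * len(scores)
    for rank, i in enumerate(order):
        positions[i] = rank
    return positions


def merged_ranking(sentences, key_bigrams):
    tokens = [s.lower().split() for s in sentences]
    a = rank_positions([coverage_score(t, key_bigrams) for t in tokens])
    b = rank_positions([density_score(t, key_bigrams) for t in tokens])
    # Merge by summed rank positions (assumed rule); ties go to the earlier sentence
    return sorted(range(len(sentences)), key=lambda i: (a[i] + b[i], i))
```

Summing positions rather than raw scores avoids having to put the two scores on a common scale; any other rank-fusion rule could be substituted.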


“…Xu et al [12] derives relevance of a term from an ontology constructed with formal concept analysis. Song et al [2] basically weight a word basing on the number of lexical connections, such as semantic associations expressed in a thesaurus, that the word has with its neighboring words; along with this, more frequent words are weighted higher. Mihalcea [13,14] presents a similar idea in the form of a neat, clear graph-based formalism: the words that have closer relationships with a greater number of "important" words become more important themselves, the importance being determined in a recursive way similar to the PageRank algorithm used by Google to weight web pages.…”

Section: Related Work (mentioning)

confidence: 99%
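To make the statement about Song et al.'s weighting concrete, here is a sketch under explicit assumptions. A toy `THESAURUS` dictionary stands in for a real thesaurus such as WordNet, "neighboring words" is taken to mean tokens within a fixed window, and the combination `freq * (1 + connections)` is an invented formula that only reflects the two stated tendencies: more lexical connections and higher frequency both raise a word's weight. It is not the formula from the original paper.

```python
from collections import Counter

# Hypothetical toy thesaurus standing in for WordNet-style semantic relations
THESAURUS = {
    "car": {"vehicle", "automobile"},
    "vehicle": {"car", "truck"},
    "truck": {"vehicle"},
    "automobile": {"car"},
}


def related(a, b, thesaurus):
    # Treat a relation listed in either direction as a lexical connection
    return b in thesaurus.get(a, set()) or a in thesaurus.get(b, set())


def lexical_weights(tokens, thesaurus, window=3):
    freq = Counter(tokens)
    connections = Counter()
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            # Count each related neighbor within the window (the word itself excluded)
            if j != i and related(word, tokens[j], thesaurus):
                connections[word] += 1
    # Assumed combination: frequency scaled by (1 + number of connections)
    return {w: freq[w] * (1 + connections[w]) for w in freq}
```

For `["car", "vehicle", "truck", "road", "car"]`, "vehicle" has three related neighbors and "car" has two connections across its two occurrences, so both outweigh the unconnected "road". In practice stopwords would be removed first, since frequency alone would otherwise inflate them.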

“…Although, some approaches claim being domain and language independent, they use some degree of language knowledge like lexical information [2], key-phrases [3] or golden samples for supervised learning approaches [4][5][6]. Furthermore, training on a specific domain tends to customize the extraction process to that domain, so the resulting classifier is not necessarily portable.…”

Section: Introduction (mentioning)

confidence: 99%

Text Summarization by Sentence Extraction Using Unsupervised Learning

García-Hernández, Montiel, Ledeneva et al., 2008, Lecture Notes in Computer Science


Abstract. The main problem in generating an extractive automatic text summary is detecting the most relevant information in the source document. Although some approaches claim to be domain- and language-independent, they depend heavily on knowledge such as key-phrases or golden samples for machine-learning approaches. In this work, we propose a language- and domain-independent automatic text summarization approach based on sentence extraction using an unsupervised learning algorithm. Our hypothesis is that an unsupervised algorithm can help cluster similar ideas (sentences). Then, to compose the summary, the most representative sentence is selected from each cluster. Several experiments on the standard DUC-2002 collection show that the proposed method obtains more favorable results than other approaches.
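The cluster-then-select idea in this abstract can be sketched as follows. The abstract does not name the clustering algorithm, so this sketch assumes plain k-means over bag-of-words count vectors with Euclidean distance and a naive initialization from the first `k` sentences; it then picks, from each cluster, the sentence nearest the centroid as the "most representative" one.

```python
import math
from collections import Counter


def vectorize(sentences):
    # Bag-of-words count vectors over a shared, sorted vocabulary
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    vectors = []
    for s in sentences:
        counts = Counter(s.lower().split())
        vectors.append([float(counts[w]) for w in vocab])
    return vectors


def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, iterations=20):
    # Naive initialization from the first k vectors (assumes k <= len(vectors))
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid, ties going to the lower cluster index
        labels = [min(range(k), key=lambda c: dist(v, centroids[c])) for v in vectors]
        # Update step: each non-empty cluster's centroid becomes its members' mean
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def summarize(sentences, k):
    vectors = vectorize(sentences)
    labels, centroids = kmeans(vectors, k)
    picks = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            # Representative = member closest to its cluster centroid
            picks.append(min(members, key=lambda i: dist(vectors[i], centroids[c])))
    return sorted(picks)  # return sentence indices in document order
```

Returning indices in document order keeps the extract readable; the number of clusters `k` directly controls summary length.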


“…Xu et al [6] derives relevance of a term from an ontology constructed with formal concept analysis. Song et al [3] basically weight a word basing on the number of lexical connections, such as semantic associations expressed in a thesaurus, that the word has with its neighboring words; along with this, more frequent words are weighted higher. Mihalcea [15] presents a similar idea in the form of a neat, clear graph-based formalism: the words that have closer relationships with a greater number of "important" words become more important themselves, the importance being determined in a recursive way similar to the PageRank algorithm used by Google to weight webpages.…”

Section: Related Work (mentioning)

confidence: 99%

Terms Derived from Frequent Sequences for Extractive Text Summarization

Ledeneva, Gelbukh, García-Hernández, Computational Linguistics and Intelligent Text Processing


Abstract. Automatic text summarization helps the user quickly understand large volumes of information. We present a language- and domain-independent statistical method for single-document extractive summarization, i.e., producing a text summary by extracting some sentences from the given text. We show experimentally that words that are parts of bigrams repeating more than once in the text are good terms for describing the text's contents, as are so-called maximal frequent sentences. We also show that using the frequency of a term as its weight gives good results (where only the occurrences of the term in repeating bigrams are counted).
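One reading of the term rule in this abstract is sketched below: a word counts as a term only at positions where it is part of a bigram that appears more than once in the text, and its weight is the number of such positions. The sketch assumes whitespace tokenization and bigrams that do not cross sentence boundaries; scoring a sentence as the sum of the weights of its distinct terms is an additional assumption not stated in the abstract.

```python
from collections import Counter


def term_weights(sentences):
    token_lists = [s.lower().split() for s in sentences]
    # Count every bigram occurrence within sentences
    bigram_counts = Counter(
        (t[i], t[i + 1]) for t in token_lists for i in range(len(t) - 1)
    )
    weights = Counter()
    for t in token_lists:
        # Mark token positions covered by at least one repeating bigram
        in_repeat = [False] * len(t)
        for i in range(len(t) - 1):
            if bigram_counts[(t[i], t[i + 1])] > 1:
                in_repeat[i] = in_repeat[i + 1] = True
        # Weight = number of covered occurrences of the word
        for word, covered in zip(t, in_repeat):
            if covered:
                weights[word] += 1
    return weights


def rank_by_frequent_terms(sentences):
    weights = term_weights(sentences)
    # Assumed sentence score: sum of weights of the sentence's distinct terms
    scores = [sum(weights[w] for w in set(s.lower().split())) for s in sentences]
    return sorted(range(len(sentences)), key=lambda i: -scores[i])
```

Counting covered positions rather than bigram occurrences means a word sitting inside two overlapping repeating bigrams is still counted once per occurrence.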

