Keyword Extraction from a Single Document Using Centrality Measures

Palshikar, Girish Keshav

doi:10.1007/978-3-540-77046-6_62

Cited by 63 publications

(52 citation statements)

References 4 publications

“…Other early summarization systems such as FRUMP, SUMMONS, CIRCUS and SUMMARIST [47] [48] were based on the use of pre-defined patterns that are labor intensive. Patterns would trigger certain templates to be filled as the text is read [49] [59]. These nodes in the graph that are connected were thus a representation of relatedness characterized by the value of the cosine similarity of their corresponding sentences.…”

Section: Related Work

mentioning

confidence: 99%

Extractive Summarization Using Structural Syntax, Term Expansion and Refinement

Elhadi

2017

IJIS

This paper investigates a procedure and reports on experiments performed to study the utility of combining a structural property of a text's sentences with term expansion using WordNet [1] and a local thesaurus [2] in selecting the most appropriate extractive summary for a particular document. Sentences were tagged and normalized, then subjected to the Longest Common Subsequence (LCS) algorithm [3] [4] to select the most similar subset of sentences. Similarity was calculated from the LCS of each pair of sentences in the document. A normalized score was calculated and used to rank sentences. A selected top subset of the most similar sentences was then tokenized to produce a set of important keywords or terms. The produced terms were further expanded into two subsets using 1) WordNet; and 2) a local electronic dictionary/thesaurus. The three sets obtained (the original and the two expanded sets) were then recycled to further refine and expand the list of selected sentences from the original document. The process was repeated a number of times in order to find the best representative set of sentences. A final set of the top (best) sentences was selected as candidate sentences for summarization. To verify the utility of the procedure, a number of experiments were conducted using an email corpus. The results were compared to those produced by human annotators as well as to results produced using a basic sentence similarity calculation method. The results were very encouraging and compared well to those of human annotators and Jaccard sentence similarity.
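The LCS-based ranking step described in this abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: whitespace tokenization, normalizing by the longer sentence, and scoring each sentence by the sum of its pairwise similarities are all assumptions, and the function names are hypothetical.

```python
from itertools import combinations


def lcs_length(a, b):
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, else carry the best neighbour.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def normalized_lcs(a, b):
    """LCS length divided by the longer sequence length (assumed normalization)."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 0.0


def rank_sentences(sentences):
    """Return sentence indices, most similar to the rest of the document first."""
    tokens = [s.lower().split() for s in sentences]  # assumed tokenization
    scores = [0.0] * len(sentences)
    for i, j in combinations(range(len(sentences)), 2):
        sim = normalized_lcs(tokens[i], tokens[j])
        scores[i] += sim
        scores[j] += sim
    return sorted(range(len(sentences)), key=lambda k: scores[k], reverse=True)
```

The top-ranked indices would then feed the tokenization and term-expansion steps; those steps depend on WordNet and a local thesaurus and are not sketched here.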

“…The segment-term matrix can be directly submitted to a set of keyword extraction methods [8,9] or be used to generate a graph-based representation, which are used by another set of methods [10,11]. A graph is defined as G = (V, E, W), in which V represents the set of vertices, E represents the set of edges among the vertices and W represents the weights of the edges.…”

Section: Preprocessing and Structuring Textual Document

mentioning

confidence: 99%
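As an illustration of the graph G = (V, E, W) mentioned in the statement above, the sketch below builds a term graph from a list of segments. The construction rule is an assumption made for this example (an edge links two terms that share a segment, weighted by the number of segments they share); the quoted paper may define edges and weights differently, and `build_term_graph` is a hypothetical name.

```python
from collections import defaultdict
from itertools import combinations


def build_term_graph(segments):
    """Build (V, E, W) from segments, each given as a list of terms.

    Assumed rule: W[(u, v)] counts the segments containing both u and v.
    Edge keys are ordered pairs with u < v, so each edge is stored once.
    """
    weights = defaultdict(int)
    vertices = set()
    for segment in segments:
        terms = sorted(set(segment))  # ignore repeats within a segment
        vertices.update(terms)
        for u, v in combinations(terms, 2):
            weights[(u, v)] += 1
    edges = set(weights)
    return vertices, edges, dict(weights)


V, E, W = build_term_graph([
    ["graph", "keyword"],
    ["graph", "keyword", "text"],
    ["text"],
])
```

Here `W[("graph", "keyword")]` is 2 because the two terms share two segments, while the other two edges have weight 1.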

“…Here we evaluated 5 statistical methods to compute the scores of the terms: (i) Most Frequent (MF), (ii) Term Frequency-Inverse Sentence Frequency (TF-ISF) [8], (iii) Co-occurrence Statistical Information (CSI) [9], (iv) Eccentricity-Based [11] and (v) TextRank [10]. The first three methods consider solely the segment-term matrix and the last two methods consider a graph representation as input.…”

Section: Preprocessing and Structuring Textual Document

mentioning

confidence: 99%
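To make the graph-based scoring concrete, here is a rough sketch of an eccentricity-style ranking on an unweighted term graph. It assumes that a vertex's eccentricity is its largest hop distance to any vertex it can reach, and that lower eccentricity marks a more central term and therefore a better keyword candidate. Neither the exact scoring used in [11] nor the handling of disconnected graphs is stated in the quoted text, so both are assumptions, and the function names are hypothetical.

```python
from collections import deque


def eccentricity(adj, source):
    """Largest BFS hop distance from source to any vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


def rank_by_eccentricity(adj):
    """Vertices ordered from most central (lowest eccentricity) to least.

    Ties are broken alphabetically so the order is deterministic.
    """
    ecc = {v: eccentricity(adj, v) for v in adj}
    return sorted(adj, key=lambda v: (ecc[v], v))


# Adjacency lists for the path a - b - c.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
ranking = rank_by_eccentricity(adj)
```

On this path, `b` has eccentricity 1 and the endpoints have eccentricity 2, so `b` ranks first.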

“…These methods does not require labeled documents and does not need to analyze the entire document collection, i.e., they are "domain independent". Examples of these methods are: TF-ISF (Term Frequency-Inverse Sentence Frequency) [8], CSI (Co-occurrence Statistical Information) [9], TextRank [10], and Eccentricity-Based [11]. The keyword extraction from single documents is very useful for i) large collections, in which the load of the entire collection in memory to extract the keywords is sometimes impossible, and ii) incremental collections, in which the analysis of the entire collection to extract keywords for each new document is unfeasible.…”

Section: Introduction

mentioning

confidence: 99%

Analysis of Domain Independent Statistical Keyword Extraction Methods for Incremental Clustering

Rossi, Marcacini, Rezende

2014

L&NLM

Incremental clustering is a very useful approach to organize dynamic text collections. Due to the time/space restrictions for incremental clustering, the textual documents must be preprocessed to maintain only their most important information. Domain independent statistical keyword extraction methods are useful in this scenario, since they analyze only the content of each document individually instead of the whole document collection, and are fast and language independent. However, different methods have different assumptions about the properties of keywords in a text, and different methods extract different sets of keywords. Different ways to structure a textual document for keyword extraction can also modify the set of extracted keywords. Furthermore, extracting a small number of keywords might degrade the incremental clustering quality and a large number of keywords might increase the clustering process speed. In this article we analyze different ways to structure a textual document for keyword extraction, different domain independent keyword extraction methods, and the impact of the number of keywords on the incremental clustering quality. We also define a framework for domain independent statistical keyword extraction which allows the user to set different configurations in each step of the framework. This allows the user to tune the automatic keyword extraction according to their needs or some evaluation measure. A thorough experimental evaluation with several textual collections showed that the domain independent statistical keyword extraction methods obtain results competitive with using all terms, or even with selecting terms by analyzing the whole text collection. This is promising evidence in favor of computationally efficient methods for preprocessing in text streams or large textual collections.

“…This study aims to show that the network analysis method can be effectively used to analyze such data about customers' perception of brands³. Network analysis method has been chosen for several reasons.…”

mentioning

confidence: 99%

Mining Texts to Understand Customers' Image of Brands

Ahn

2013

ijecs

Text mining is becoming increasingly important in understanding customers and markets these days. This paper presents a method of mining texts about customer sentiments using a network analysis technique. A data set collected about two global mobile device manufacturers was used for testing the method. The analysis results show that the method can be effectively used to extract key sentiments in the customers' texts.
