Lecture Notes in Computer Science
DOI: 10.1007/978-3-540-77046-6_62

Keyword Extraction from a Single Document Using Centrality Measures

Abstract: Keywords characterize the topics discussed in a document. Extracting a small set of keywords from a single document is an important problem in text mining. We propose a hybrid structural and statistical approach to extract keywords. We represent the given document as an undirected graph, whose vertices are words in the document and the edges are labeled with a dissimilarity measure between two words, derived from the frequency of their co-occurrence in the document. We propose that central vertices in this gra…
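
The graph construction described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact dissimilarity measure or say which centrality measures are used, so everything specific here is an assumption: co-occurrence is counted per sentence, dissimilarity is taken as 1 / co-occurrence frequency, and closeness centrality stands in for the centrality measure. The function names are hypothetical.

```python
import heapq
from collections import Counter
from itertools import combinations


def cooccurrence_graph(sentences):
    """Return {word: {neighbor: dissimilarity}} for an undirected word graph."""
    counts = Counter()
    for sentence in sentences:
        words = sorted(set(sentence.lower().split()))
        counts.update(combinations(words, 2))  # one count per co-occurring pair
    graph = {}
    for (u, v), freq in counts.items():
        weight = 1.0 / freq  # ASSUMED dissimilarity: frequent pairs are "closer"
        graph.setdefault(u, {})[v] = weight
        graph.setdefault(v, {})[u] = weight
    return graph


def shortest_distances(graph, source):
    """Dijkstra's algorithm over the dissimilarity-weighted graph."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def closeness(graph):
    """ASSUMED centrality: reachable vertices divided by total distance to them."""
    scores = {}
    for node in graph:
        dist = shortest_distances(graph, node)
        total = sum(dist.values())
        scores[node] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores


sentences = [
    "graph centrality keyword",
    "graph keyword extraction",
    "graph document",
]
word_graph = cooccurrence_graph(sentences)
scores = closeness(word_graph)
top_keyword = max(scores, key=scores.get)
```

In this toy input, "graph" co-occurs with every other word and therefore ends up as the most central vertex. A real pipeline would also need stop-word removal and tokenization, which are omitted here.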

Cited by 63 publications (52 citation statements)
References: 4 publications
“…Other early summarization systems such as FRUMP, SUMMONS, CIRCUS and SUMMARIST [47] [48] were based on the use of pre-defined patterns that are labor intensive. Patterns would trigger certain templates to be filled as the text is read [49] [59]. These nodes in the graph that are connected were thus a representation of relatedness characterized by the value of the cosine similarity of their corresponding sentences.…”
Section: Related Work
mentioning
confidence: 99%
“…The segment-term matrix can be directly submitted to a set of keyword extraction methods [8,9] or be used to generate a graph-based representation, which are used by another set of methods [10,11]. A graph is defined as G = ⟨V, E, W⟩, in which V represents the set of vertices, E represents the set of edges among the vertices and W represents the weights of the edges.…”
Section: Preprocessing and Structuring Textual Document
mentioning
confidence: 99%
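
As a rough illustration of the G = ⟨V, E, W⟩ definition quoted above, the following sketch derives a graph from a small segment-term matrix. The quoted statement does not say how edges or weights are obtained, so this assumes that two terms are connected when they appear in the same segment and that the weight is the number of segments they share. The function name and the toy matrix are hypothetical.

```python
from itertools import combinations


def graph_from_segment_term_matrix(matrix, terms):
    """Build (V, E, W) from a matrix whose rows are segments and columns are terms."""
    vertices = set(terms)
    weights = {}
    for row in matrix:
        # Terms that occur at least once in this segment.
        present = [terms[j] for j, count in enumerate(row) if count > 0]
        for u, v in combinations(sorted(present), 2):
            # ASSUMED weight: number of segments in which both terms occur.
            weights[(u, v)] = weights.get((u, v), 0) + 1
    edges = set(weights)
    return vertices, edges, weights


terms = ["graph", "keyword", "vertex"]
matrix = [
    [2, 1, 0],  # segment 1: "graph" twice, "keyword" once
    [1, 0, 1],  # segment 2
    [1, 1, 1],  # segment 3
]
V, E, W = graph_from_segment_term_matrix(matrix, terms)
```

Edges are stored as alphabetically ordered pairs, so each undirected edge appears once.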
“…Here we evaluated 5 statistical methods to compute the scores of the terms: (i) Most Frequent (MF), (ii) Term Frequency-Inverse Sentence Frequency (TF-ISF) [8], (iii) Co-occurrence Statistical Information (CSI) [9], (iv) Eccentricity-Based [11] and (v) TextRank [10]. The first three methods consider solely the segment-term matrix and the last two methods consider a graph representation as input.…”
Section: Preprocessing and Structuring Textual Document
mentioning
confidence: 99%
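
Two of the matrix-based scores named above can be sketched directly on a segment-term matrix. The quoted statement gives no formulas, so this uses a common formulation as an assumption: MF is the total term count, and TF-ISF is the total term count multiplied by log(N / sf), where N is the number of segments and sf is the number of segments containing the term. The variant used in [8] may differ. Function names and data are hypothetical.

```python
import math


def most_frequent(matrix, terms):
    """MF score: total occurrences of each term over all segments."""
    return {t: sum(row[j] for row in matrix) for j, t in enumerate(terms)}


def tf_isf(matrix, terms):
    """ASSUMED TF-ISF: tf * log(N / sf), with sf = segments containing the term."""
    n_segments = len(matrix)
    scores = {}
    for j, t in enumerate(terms):
        tf = sum(row[j] for row in matrix)
        sf = sum(1 for row in matrix if row[j] > 0)
        scores[t] = tf * math.log(n_segments / sf) if sf else 0.0
    return scores


terms = ["graph", "keyword", "vertex"]
matrix = [
    [2, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
]
mf_scores = most_frequent(matrix, terms)
isf_scores = tf_isf(matrix, terms)
```

Under this formulation a term that appears in every segment, such as "graph" here, receives a TF-ISF score of zero even though it has the highest MF score, which shows how the two rankings can disagree.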
“…This study aims to show that the network analysis method can be effectively used to analyze such data about customers' perception of brands 3. Network analysis method has been chosen for several reasons.…”
mentioning
confidence: 99%