2004
DOI: 10.1142/s0218213004001466

Keyword Extraction From a Single Document Using Word Co-Occurrence Statistical Information

Abstract: We present a new keyword extraction algorithm that applies to a single document without using a corpus. Frequent terms are extracted first; then a set of co-occurrences between each term and the frequent terms (i.e., occurrences in the same sentence) is generated. The co-occurrence distribution shows the importance of a term in the document as follows. If the probability distribution of co-occurrence between term a and the frequent terms is biased toward a particular subset of frequent terms, then term a is likely to be …
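The abstract is cut off before it names the bias measure. The sketch below therefore rests on an assumption: it uses a Pearson chi-square-style sum that compares how often term w actually shares a sentence with each frequent term g against how often it would if w and g were independent. A larger score means w's co-occurrences are more skewed. This follows the general idea described in the abstract and is not the authors' exact formulation. The function names, the stop-word list, the sentence splitter, and the `num_frequent` parameter are all hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; not the paper's preprocessing)
STOP_WORDS = {"a", "an", "and", "are", "as", "by", "for", "in", "is",
              "of", "on", "or", "that", "the", "to", "with"}


def sentences_as_term_sets(text):
    """Split on sentence-final punctuation; return one set of terms per sentence."""
    result = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        terms = {t for t in re.findall(r"[a-z]+", sentence.lower())
                 if t not in STOP_WORDS}
        if terms:
            result.append(terms)
    return result


def chi_square_scores(text, num_frequent=10):
    """Score each term by how far its sentence co-occurrence with the frequent
    terms deviates from independence (chi-square-style; hypothetical helper)."""
    sentences = sentences_as_term_sets(text)
    total = len(sentences)
    # Number of sentences containing each term
    sent_freq = Counter(t for s in sentences for t in s)
    frequent = [g for g, _ in sent_freq.most_common(num_frequent)]
    # Share of sentences that contain each frequent term g
    p = {g: sent_freq[g] / total for g in frequent}

    scores = {}
    for w, n_w in sent_freq.items():
        chi2 = 0.0
        for g in frequent:
            if g == w:
                continue  # a term trivially co-occurs with itself
            observed = sum(1 for s in sentences if w in s and g in s)
            expected = n_w * p[g]  # > 0 because n_w >= 1 and p[g] > 0
            chi2 += (observed - expected) ** 2 / expected
        scores[w] = chi2
    # Highest bias first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    doc = ("Keyword extraction finds important terms. Frequent terms are counted. "
           "Co-occurrence with frequent terms reveals keywords.")
    for term, score in chi_square_scores(doc, num_frequent=3)[:5]:
        print(term, round(score, 3))
```

Co-occurrence is counted per sentence, matching the abstract's "occurrences in the same sentence". No corpus is consulted: both the frequent terms and the expected counts come from the single input document.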

Cited by 597 publications (311 citation statements); references 8 publications.
“…TFIDF measure [16], [17] is a very popular method, which extracts keywords that frequently appear in a document, but don't appear frequently in the remainder of the corpus. Matsuo and Ishizuka [18] developed a word co-occurrence method to extract keywords from a single document, and achieved high performance comparable to TFIDF. Yang et al [16] combined TFIDF and word co-occurrence features together to solve this problem.…”
Section: Chinese Abstract Extraction (mentioning)
confidence: 99%
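For contrast with the single-document method, the TFIDF idea in the passage above can be sketched as follows. Unlike the co-occurrence approach, it needs a corpus to measure how common a term is elsewhere. The sketch assumes the common tf × log(N / df) weighting, which is one of several TFIDF variants. The function name `tfidf_keywords` and its parameters are hypothetical.

```python
import math
from collections import Counter


def tfidf_keywords(doc_tokens, corpus, top_k=5):
    """Rank the terms of one document by tf * log(N / df) over a corpus.

    corpus is a list of token lists and is assumed to include doc_tokens
    itself, so every term of the document has df >= 1.
    """
    n_docs = len(corpus)
    # Document frequency: number of corpus documents containing each term
    df = Counter(t for d in corpus for t in set(d))
    # Term frequency inside the target document
    tf = Counter(doc_tokens)
    scores = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


if __name__ == "__main__":
    corpus = [["keyword", "extraction", "keyword"],
              ["extraction", "corpus"],
              ["corpus", "statistics"]]
    print(tfidf_keywords(corpus[0], corpus))
```

A term that appears in every corpus document gets log(1) = 0, however often it occurs in the target document. This is the "don't appear frequently in the remainder of the corpus" condition from the quotation.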
“…Like Information Extraction approaches, BOA aims to detect entities in text. Three main categories of natural language processing (NLP) tools play a central role during the extraction of knowledge from text: Keyphrase Extraction (KE) algorithms aim to detect multi-word units that capture the essence of a document [14,13]. Named Entity Recognition (NER) approaches try to discover instances of predefined classes of entities [20,8].…”
Section: Related Work (mentioning)
confidence: 99%
“…On the other hand, such documents are provided by digital means, so, at least, they are suitable for automatic processing. In fact, there are plenty and very different Natural Language Processing (NLP) techniques to help us to sort through this information overload (e.g., language identification [1][2], document clustering [3][4], keyword extraction [5] [6] or text summarization [7]). …”
Section: Introduction (mentioning)
confidence: 99%
“…Several don't require training [6] but others do [1][2][3][4][5] [7]. Some rely only on statistical information [1][2][3][4] [6] and others employ complex linguistic data [5] [7].…”
Section: Introduction (mentioning)
confidence: 99%