DOI: 10.1007/978-3-540-73499-4_63

Statistical Identification of Key Phrases for Text Classification

Abstract: Algorithms for text classification generally involve two stages, the first of which aims to identify textual elements (words and/or phrases) that may be relevant to the classification process. This stage often involves an analysis of the text that is both language-specific and possibly domain-specific, and may also be computationally costly. In this paper we examine a number of alternative keyword-generation methods and phrase-construction strategies that identify key words and phrases by simple, lang…

Cited by 15 publications
(18 citation statements)
References 8 publications
“…Accuracy figures, describing the proportion of correctly classified "unseen" documents, were obtained using the Ten-fold Cross Validation (TCV). A support threshold value of 0.1% and a Lower Noise Threshold (LNT) value of 0.2% were used, as suggested in [6]. A confidence threshold value of 50% was used (as proposed in the published evaluations of a number of associative classification studies [5,15,28]).…”
Section: Results (mentioning)
confidence: 99%
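
The ten-fold cross validation mentioned in the statement above can be illustrated with a minimal sketch. It is not the cited authors' code: the function name `ten_fold_accuracy`, the interleaved fold assignment, and the `train`/`predict` callables are all assumptions made for illustration, and the support, noise, and confidence thresholds from the statement are not modelled here.

```python
from collections import Counter

def ten_fold_accuracy(docs, labels, train, predict, folds=10):
    """Estimate accuracy as the proportion of held-out documents classified correctly.

    `train(docs, labels)` returns a model; `predict(model, doc)` returns a label.
    Both are hypothetical callables supplied by the caller.
    """
    n = len(docs)
    correct = 0
    for f in range(folds):
        # Interleaved split (an assumption): document i is tested in fold i % folds.
        test_idx = [i for i in range(n) if i % folds == f]
        train_idx = [i for i in range(n) if i % folds != f]
        model = train([docs[i] for i in train_idx], [labels[i] for i in train_idx])
        correct += sum(predict(model, docs[i]) == labels[i] for i in test_idx)
    return correct / n

# Toy usage with a majority-class "classifier" standing in for a real one.
docs = [f"doc{i}" for i in range(20)]
labels = ["a"] * 15 + ["b"] * 5
majority_train = lambda d, l: Counter(l).most_common(1)[0][0]
majority_predict = lambda model, doc: model
accuracy = ten_fold_accuracy(docs, labels, majority_train, majority_predict)
```

Each document is used for testing exactly once, so the returned figure is computed over "unseen" documents only.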
“…the first k words for each predefined class) that are selected from the ordered list of potential significant words (in a descending manner based on their contribution value) are defined to be significant words. In [6] the authors (based on the above definitions) propose a statistical "bag of phrases" (DR) approach for TC, namely DelSNcontGO: phrases are Delimited by stop marks (S) and/or noise words (N), and (as phrase contents) made up of sequences of one or more significant words (G) and ordinary words (O); sequences of ordinary words delimited by stop marks and/or noise words that do not include at least one significant word (in the contents) are ignored. The experimental results presented in [6] show that DelSNcontGO performs well with respect to the accuracy of classification.…”
Section: Significant Words (G) (mentioning)
confidence: 99%
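
The phrase-construction rule quoted above (delimit by stop marks and noise words, keep only sequences containing at least one significant word) can be sketched as follows. This is an illustrative reading of the quoted description, not the implementation from [6]: the function name `delsn_cont_go`, the regular-expression tokenizer, the set of stop marks, and the example word lists are all assumptions.

```python
import re

# Assumed stop marks; the quoted statement does not list them.
STOP_MARKS = set(".,;:!?")

def delsn_cont_go(text, noise_words, significant_words):
    """Return phrases delimited by stop marks (S) or noise words (N).

    A phrase is a run of significant (G) and ordinary (O) words; runs that
    contain no significant word are ignored.
    """
    # Hypothetical tokenizer: lowercase words and single punctuation marks.
    tokens = re.findall(r"[A-Za-z0-9']+|[.,;:!?]", text.lower())
    phrases, current = [], []

    def flush():
        # Keep the run only if it holds at least one significant word.
        if any(word in significant_words for word in current):
            phrases.append(tuple(current))
        current.clear()

    for tok in tokens:
        if tok in STOP_MARKS or tok in noise_words:
            flush()          # a delimiter closes the current run
        else:
            current.append(tok)
    flush()                  # close any run left open at the end of the text
    return phrases

# Toy usage with made-up word lists.
text = ("The engine of the car failed. Mechanics repaired the old engine "
        "quickly, and the driver left.")
noise = {"the", "of", "and"}
significant = {"engine", "mechanics"}
phrases = delsn_cont_go(text, noise, significant)
```

In this toy run, "car failed" and "driver left" are dropped because neither contains a significant word, while "engine", "mechanics repaired", and "old engine quickly" are kept as phrases.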