2010
DOI: 10.1016/j.patrec.2010.03.012

Analytical evaluation of term weighting schemes for text categorization

Cited by 51 publications (33 citation statements)
References 19 publications
“…Mutual Information method is used as a method for evaluating the importance of terms (Mutual Information) [20]; 3) Classification of texts according to scientific areas. The method of k nearest neighbour (kNN) is used for text classification [20], [21]. Working with the text files in the corpus for the purpose of performing statistical calculations requires the following preliminary steps: 1) To convert the files of different formats (pdf, doc, docx) in the corpus to .txt format; 2) To delete all hyphenation beforehand; 3) To perform lemmatization of all the text files in the corpus, to delete all punctuation marks, and to change all uppercase letters to lowercase letters.…”
Section: The Procedures Of the University's Information Resources mentioning
confidence: 99%
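
The preliminary steps listed in the statement above can be sketched roughly as follows. This is only an illustration, not the cited authors' pipeline: the function name `preprocess` is hypothetical, conversion from pdf/doc/docx to .txt is assumed to have happened already, and lemmatization is left out because it needs an external library.

```python
import re
import string


def preprocess(text: str) -> list[str]:
    """Hypothetical helper: de-hyphenate, lowercase, and strip punctuation."""
    # Rejoin words split by a hyphen at a line break ("classi-\nfication").
    text = re.sub(r"-\s*\n\s*", "", text)
    # Change all uppercase letters to lowercase.
    text = text.lower()
    # Delete all ASCII punctuation marks (this also removes in-word hyphens).
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace to obtain tokens; lemmatization would follow here.
    return text.split()
```

Calling `preprocess("Text classi-\nfication, using kNN!")` yields the tokens `text`, `classification`, `using`, `knn`.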
“…Bag of words is mainly used for document representation in both text categorization and information retrieval. An important part of text categorization systems using the bag-of-words representation is therefore the term-weighting scheme, which is responsible for deciding how relevant a term is for describing the content of a document [28,29,30,31]. The simplest term-weighting scheme is term frequency, where the value of a word in a document is given by its frequency of occurrence in the document. Even though, as a result of capturing statistical data from the original document provides simplicity of vector space model.…”
Section: Introduction mentioning
confidence: 99%
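
As a minimal sketch of the term-frequency weighting described in the statement above, the following maps a tokenized document onto a fixed vocabulary. The function name `term_frequency_vector` and the explicit `vocabulary` argument are assumptions made for illustration, not something the cited paper specifies.

```python
from collections import Counter


def term_frequency_vector(tokens: list[str], vocabulary: list[str]) -> list[int]:
    """Hypothetical helper: raw term-frequency weights over a fixed vocabulary."""
    # Count how often each token occurs in the document.
    counts = Counter(tokens)
    # Counter returns 0 for terms that never occur, so absent terms get weight 0.
    return [counts[term] for term in vocabulary]
```

Each position of the returned list is the weight of one vocabulary term, which is the bag-of-words vector for that document.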
“…In order to clarify the major differences among weighting schemes that lead to inconsistent performances on different categories, a common analysis framework was recently developed by the authors of this study [15]. It is shown that the term weights can be expressed as a function of two term-specific parameters, namely the term occurrence probability in the category under concern (p+) and in its complement (p−) [15].…”
Section: Introduction mentioning
confidence: 99%
“…It is shown that the term weights can be expressed as a function of two term-specific parameters, namely the term occurrence probability in the category under concern (p+) and in its complement (p−) [15]. The weight expressions are also presented in the form of contour lines, that is, curves along which the term weighting schemes provide constant values, to illustrate the distribution of equally weighted terms.…”
Section: Introduction mentioning
confidence: 99%
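
To make the two parameters concrete, here is a small sketch that estimates p+ and p− from labeled documents and feeds them into one weight that depends only on those two values. The odds ratio is used purely as an illustrative choice; the statements above do not say which schemes the paper analyzes. The function names, the set-of-tokens document representation, and the `eps` clamp are all assumptions.

```python
def occurrence_probabilities(term, docs, labels):
    """Hypothetical helper: estimate (p+, p-) for a term.

    docs is a list of token sets; labels[i] is True when docs[i] belongs to
    the category under concern. Both classes are assumed to be non-empty.
    """
    pos = [d for d, y in zip(docs, labels) if y]
    neg = [d for d, y in zip(docs, labels) if not y]
    # Fraction of category documents that contain the term.
    p_plus = sum(term in d for d in pos) / len(pos)
    # Fraction of complement documents that contain the term.
    p_minus = sum(term in d for d in neg) / len(neg)
    return p_plus, p_minus


def odds_ratio_weight(p_plus, p_minus, eps=1e-6):
    """Illustrative weight written as a function of p+ and p- only."""
    # Clamp away from 0 and 1 so the ratio stays finite (an assumed safeguard).
    p_plus = min(max(p_plus, eps), 1 - eps)
    p_minus = min(max(p_minus, eps), 1 - eps)
    return (p_plus * (1 - p_minus)) / (p_minus * (1 - p_plus))
```

All (p+, p−) pairs for which such a function returns the same value lie on one contour line, which is how equally weighted terms can be visualized in the p+ / p− plane.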