2011
DOI: 10.1016/j.eswa.2010.08.066

A comparative study of TF*IDF, LSI and multi-words for text classification

citations
Cited by 552 publications
(243 citation statements)
references
References 22 publications
“…fields. Our analyses as well as findings are consistent with the theoretical accounts that words and key terms, both single and multiword key terms, collectively represent the content of the text and that they can be analyzed statistically while maintaining semantic quality (Zhang, Yoshida, & Tang, 2011). The fact that we used TF-IDF as a key term selection method (hence the emphasis placed on the importance of a term in a document) and the advanced option is checked for multiword (hence the contextual information of individual words is captured) (see Figure 2), and later the multiword list is used to transform the entire corpus increases the reliability of co-occurrence analysis to reveal the structure of the scientific literature and, in turn, the fields under investigation.…”
Section: Discussion
Citation type: supporting
confidence: 75%
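
The statement above describes using TF-IDF to select key terms. As an illustration only, here is a minimal sketch of that idea in plain Python. The weighting `tf * log(N / df)` is one common TF-IDF variant and is an assumption here, not necessarily the formula the citing authors or their software used; the function names `tfidf_scores` and `top_terms` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return one {term: score} dict per document.

    Assumed weighting: raw term frequency times log(N / document frequency).
    Tokenization is a plain lowercase whitespace split, for illustration.
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return scores


def top_terms(score_map, k=3):
    """Pick the k highest-scoring terms; ties are broken alphabetically."""
    return sorted(score_map, key=lambda t: (-score_map[t], t))[:k]


docs = [
    "latent semantic indexing for text",
    "multi-word terms for text classification",
    "text classification with neural networks",
]
scores = tfidf_scores(docs)
key_terms = top_terms(scores[0])
```

A term that occurs in every document, such as "text" here, gets a score of zero under this weighting, so it is never selected as a key term.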
“…This model can be extended by using n-grams [2,9,44]. We use the bag-of-words model with a combination of unigrams and bigrams.…”
Section: Puls Overview
Citation type: mentioning
confidence: 99%
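
The statement above mentions a bag-of-words model that combines unigrams and bigrams. A minimal sketch of such a representation follows, assuming lowercase whitespace tokenization; the function name `ngram_bag` and the sample sentence are hypothetical and do not come from the citing paper.

```python
from collections import Counter


def ngram_bag(text, max_n=2):
    """Count all n-grams of length 1..max_n in the text.

    With max_n=2 this yields a bag of unigrams plus bigrams.
    """
    tokens = text.lower().split()
    bag = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of n tokens across the sequence.
        for i in range(len(tokens) - n + 1):
            bag[" ".join(tokens[i:i + n])] += 1
    return bag


bag = ngram_bag("text classification with multi-word text features")
```

Here `bag["text"]` counts the unigram twice, while bigrams such as `"text classification"` keep some local word order that a unigram-only model discards.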
“…The authors used back-propagation neural networks for this purpose. Zhang et al (2011) presented some experimental evaluations of indexing methods on text classification and analyzed that presently we do not have a standard measure to assess the semantic and statistical qualities of text. Attia et al (2014) proposed a linguistic-based multi-view fuzzy ontology information retrieval model that allows the users to define all their linguistic terms according to their subjective view which helps in retrieving documents according to their linguistic term definitions not to our definitions.…”
Section: Background and Literature Review
Citation type: mentioning
confidence: 99%