Proceedings of 4th International Conference on Data Management Technologies and Applications 2015
DOI: 10.5220/0005511900260037
A Study on Term Weighting for Text Categorization: A Novel Supervised Variant of tf.idf

Cited by 35 publications (34 citation statements, published 2016–2024).
References 18 publications.
“…where c(t, d) is the number of occurrences of term t in document d, the denominator Σᵢ c(tᵢ, d) is the total number of terms in document d, D is the total number of documents in the dataset, and dₜ is the number of documents in which term t appears. Many researchers [26][27][28][36] have tried to improve the performance of TF-IDF by proposing modifications to its original equation. In the case of Bloom's taxonomy, the verb plays an important role in determining the level of the question.…”
Section: Modified TF-IDF (Term Weighting Methods TFPOS-IDF)
confidence: 99%
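For concreteness, the weighting defined in the quote can be computed in a few lines. The following is a minimal sketch of standard tf.idf under the quoted notation (c(t, d), D, dₜ), not code from the cited paper; the function name and toy corpus are illustrative only.

```python
import math
from collections import Counter

def tf_idf(term, doc, corpus):
    """Standard tf.idf as defined in the quoted passage:
    tf = c(t, d) / sum_i c(t_i, d), idf = log(D / d_t)."""
    counts = Counter(doc)
    tf = counts[term] / sum(counts.values())           # c(t, d) / sum_i c(t_i, d)
    d_t = sum(1 for d in corpus if term in d)          # number of documents containing t
    idf = math.log(len(corpus) / d_t) if d_t else 0.0  # log(D / d_t)
    return tf * idf

# Toy corpus: each document is a list of tokens.
corpus = [["what", "is", "recursion"],
          ["define", "recursion"],
          ["list", "three", "sorting", "algorithms"]]
print(tf_idf("recursion", corpus[0], corpus))  # (1/3) * log(3/2) ≈ 0.135
```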
“…It assigns scores to the importance of a word in a document based on the lexical and morphological properties of the text. It is extensively used in many studies [25][26][27][28][29] and in question classification [3,6,20,23,30,31,32]. However, some researchers used TF-IDF as is, while others proposed enhancements to the way TF-IDF is calculated in order to improve it. This is because classical TF-IDF does not capture some useful information, such as the impact of word distribution across classes [6].…”
Section: Related Work
confidence: 99%
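The class-blindness noted at the end of the quote is easy to see numerically: two terms with the same document frequency receive identical IDF weights even when one is spread evenly across classes and the other is confined to a single class. A tiny illustration with made-up document frequencies:

```python
import math

D = 100           # total documents, say 50 per class (hypothetical numbers)
df_spread = 10    # term occurring in 5 documents of each class
df_confined = 10  # term occurring in 10 documents of one class only

# Classical IDF ignores how occurrences are distributed over classes,
# so the discriminative and non-discriminative terms get the same weight.
print(math.log(D / df_spread) == math.log(D / df_confined))  # True
```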
“…According to this scheme, the higher the concentration of high-frequency terms in the positive category relative to the negative one, the greater the contribution to classification. Domeniconi et al. (2015) propose a supervised variant of IDF called inverse document frequency excluding category (IDFEC). Like IDF, IDFEC penalizes frequent terms, but unlike IDF it avoids penalizing terms that occur in several documents belonging to the same class.…”
Section: Background and Related Work
confidence: 99%
“…Another variant, also proposed in [7], results from combining IDFEC and RF, yielding the IDFEC-B scheme. Table 1 shows the definitions of the main scores presented above using the following notation [7,14]:…”
Section: Background and Related Work
confidence: 99%
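The quotes describe IDFEC and its combination with RF only informally, so the following is a hedged sketch rather than the exact formulas from [7]. It assumes IDFEC replaces the global document frequency in the IDF denominator with the count of documents outside the target category that contain the term, and pairs it with a Lan-style relevance frequency; the smoothing constants and the precise IDFEC-B combination may differ from the paper.

```python
import math

def idfec(term, category, docs, labels):
    """Sketch of IDFEC (inverse document frequency excluding category):
    only documents OUTSIDE the target category count toward the penalty,
    so terms frequent within a single class keep a high weight."""
    df_out = sum(1 for d, y in zip(docs, labels)
                 if y != category and term in d)  # out-of-class document frequency
    return math.log(len(docs) / (1 + df_out))

def rf(term, category, docs, labels):
    """Relevance frequency (Lan et al.): rewards terms concentrated in the
    positive category; shown as the RF factor the quote combines with IDFEC."""
    tp = sum(1 for d, y in zip(docs, labels) if y == category and term in d)
    fp = sum(1 for d, y in zip(docs, labels) if y != category and term in d)
    return math.log(2 + tp / max(1, fp))

docs = [["sort", "list"], ["sort", "array"], ["poem", "rhyme"]]
labels = ["prog", "prog", "lit"]
print(idfec("sort", "prog", docs, labels))  # log(3/1): "sort" is not penalized
print(rf("sort", "prog", docs, labels))     # log(2 + 2/1)
```

An IDFEC-B weight would then multiply the normalized term frequency by such supervised factors; the exact combination is defined in [7].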
“…It would be difficult to find similar links because the source units (e.g., opinions) and mapped units (e.g., facts of events) of links come from different domains (types of text). E.g., in the equation used for connecting two links (s, o), the similarity term is the cosine similarity between the tf-idf weighted [30] bag-of-words models of s₁ and s₂, 0 ≤ a, b ≤ 1 are weights, and θ is the threshold. It would be hard to compute the weights a and b for finding similar links because we do not know the correlation between domain S and domain O.…”
Section: Composition For Implicit References
confidence: 99%
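The equation itself did not survive extraction, so it is left out of the quote above; one plausible reading, sketched below with hypothetical names (cosine, links_match, a, b, theta), combines the source-side and object-side cosine similarities linearly and compares the result to the threshold θ.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse tf-idf vectors given as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def links_match(s1, o1, s2, o2, a=0.5, b=0.5, theta=0.6):
    """Hypothetical reading of the quoted rule: a weighted sum of the
    source-side and object-side similarities must clear the threshold."""
    return a * cosine(s1, s2) + b * cosine(o1, o2) >= theta
```

As the quote notes, the difficulty lies in choosing a and b when the two sides of the links come from different domains.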