2019
DOI: 10.1017/s1351324919000317

Term evaluation metrics in imbalanced text categorization

Abstract: This paper proposes four novel term evaluation metrics for representing documents in text categorization when the class distribution is imbalanced. The metrics are obtained by revising four common term evaluation metrics: chi-square, information gain, odds ratio, and relevance frequency. Whereas the common metrics require a balanced class distribution, the proposed metrics evaluate document terms under an imbalanced distribution. They calculate the degree of relatedness of terms with respect to m…
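The abstract names the four common metrics that the paper revises. The revised metrics themselves are not given in this excerpt. The sketch below therefore computes only the standard forms of chi-square, information gain, odds ratio, and relevance frequency. Each is computed from a 2×2 table of document counts for one term and one positive class.

The following are assumptions for illustration, not the authors' definitions:

- the cell names `a`, `b`, `c`, `d`;
- the function names;
- the 0.5 additive smoothing in the odds ratio.

```python
import math

# Hypothetical cell names for a 2x2 table of document counts for a term t
# and a positive class P (everything else is the negative class):
#   a = docs in P that contain t      b = docs not in P that contain t
#   c = docs in P without t           d = docs not in P without t


def _entropy(*counts):
    """Shannon entropy (bits) of a discrete distribution given as raw counts."""
    total = sum(counts)
    h = 0.0
    for n in counts:
        if n > 0:
            p = n / total
            h -= p * math.log2(p)
    return h


def chi_square(a, b, c, d):
    """Standard chi-square statistic for independence of term and class."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def information_gain(a, b, c, d):
    """Reduction in class entropy (bits) from knowing whether t occurs."""
    n = a + b + c + d
    h_class = _entropy(a + c, b + d)
    h_with_t = _entropy(a, b)      # class entropy among docs containing t
    h_without_t = _entropy(c, d)   # class entropy among docs lacking t
    return h_class - (a + b) / n * h_with_t - (c + d) / n * h_without_t


def log_odds_ratio(a, b, c, d, eps=0.5):
    """Log odds ratio; eps is an assumed additive smoothing to avoid zeros."""
    return math.log(((a + eps) * (d + eps)) / ((b + eps) * (c + eps)))


def relevance_frequency(a, b):
    """rf = log2(2 + a / max(1, b)); grows when t is concentrated in P."""
    return math.log2(2 + a / max(1, b))


if __name__ == "__main__":
    # Made-up imbalanced split: 50 positive docs versus 950 negative docs.
    a, b, c, d = 40, 10, 10, 940
    print("chi2   =", round(chi_square(a, b, c, d), 2))
    print("IG     =", round(information_gain(a, b, c, d), 4))
    print("log OR =", round(log_odds_ratio(a, b, c, d), 3))
    print("rf     =", round(relevance_frequency(a, b), 3))
```

Running the script prints the four scores for this illustrative split. Chi-square, information gain, and the odds ratio use all four cells of the table. Relevance frequency uses only the documents that contain the term (`a` and `b`).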

Cited by 8 publications (3 citation statements); references 39 publications.
“…The second step is to convert the unstructured text data into computers languages. Before that, the text needs to be segmented [26]. Table 1 presents the text parts to be preprocessed and the processing measures.…”
Section: ) Text Information Preprocessingmentioning
“…The principle of the DT algorithm is to make continuous decisions according to the characteristics of the data set, and finally classify the data lines to achieve the learning effect [29], as shown in Figure 7. The Equations (24)(25)(26)(27) are used to calculate and finally get the whole DT.…”
Section: DT Algorithm (mentioning)
“…Secondly, the requirements for objective function parameters are high. Finally, in the classification process, a large amount of sample data is needed to support its calculation and optimize performance indicators [17][18].…”
Section: Hybrid Leapfrog Algorithm (mentioning)