2012
DOI: 10.1007/s00521-012-1056-5

Improving the precision-recall trade-off in undersampling-based binary text categorization using unanimity rule

Cited by 21 publications (12 citation statements)
References 44 publications
“…Such imbalance poses a challenge for categorization, especially when the classes have a high degree of overlap [31]. One possible solution to this problem is balancing the training set by re-sampling [5,10,39]. In a previous paper, we demonstrated that classifiers trained on balanced data perform better, on average, than classifiers trained using the original distribution of labels in the corpus [8].…”
Section: Puls Overviewmentioning
confidence: 98%
“…The data-level approach is based on various re-sampling techniques [2]. Some re-sampling techniques applied to the text classification task are described in [6,4,18]. The two approaches to re-sampling are oversampling, i.e., adding more instances of the minor classes to the training set, and under-sampling, i.e., removing instances of the major classes from the training set [11].…”
Section: Related Workmentioning
confidence: 99%
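The under- and over-sampling described above can be sketched as minimal random re-sampling routines for a binary task (the function names and signatures are illustrative, not taken from the cited papers):

```python
import random

def undersample(examples, labels, majority_label, seed=0):
    """Randomly drop majority-class examples until both classes have equal size."""
    rng = random.Random(seed)
    minority = [(x, y) for x, y in zip(examples, labels) if y != majority_label]
    majority = [(x, y) for x, y in zip(examples, labels) if y == majority_label]
    kept = rng.sample(majority, k=len(minority))  # keep a random subset of the major class
    balanced = minority + kept
    rng.shuffle(balanced)
    return [x for x, _ in balanced], [y for _, y in balanced]

def oversample(examples, labels, minority_label, seed=0):
    """Randomly duplicate minority-class examples until both classes have equal size."""
    rng = random.Random(seed)
    minority = [(x, y) for x, y in zip(examples, labels) if y == minority_label]
    majority = [(x, y) for x, y in zip(examples, labels) if y != minority_label]
    extra = rng.choices(minority, k=len(majority) - len(minority))  # sample with replacement
    balanced = majority + minority + extra
    rng.shuffle(balanced)
    return [x for x, _ in balanced], [y for _, y in balanced]
```

Under-sampling discards majority-class information but keeps the training set small; oversampling keeps all data at the cost of duplicated minority examples.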
“…For each classifier, the best threshold is trained on one random, originally-distributed development set; → and ∪ denote, respectively, the two-stage and union combining methods described in Section 6.

Classifier        P          R          F1      |  P          R          F1
NB+IG             …3±0.9     21.9±0.6   19.7±0.6 | 31.5±0.5   22.4±0.6   26.2±0.5
NB+BNS            34.2±1.1   16.6±0.6   15.8±0.5 | 33.1±0.7   13.4±0.4   19.0±0.5
SVM+IG            31.9±1.3   59.2±1.1   37.1±1.2 | 30.5±0.4   72.7±0.6   42.9±0.4
SVM+BNS           32.7±0.9   55.2±1.0   36.2±0.7 | 30.1±0.5   70.8±0.6   42.2±0.5
Rote              35.0±0.8   67.6±1.0   43.8±0.8 | 42.4±0.6   64.2±0.4   51.1±0.5
Rote→NB+BNS       51.5±0.9   33.6±0.4   36.1±0.4 | 57.6±0.6   39.1±0.4   46.6±0.4
NB+BNS→Rote       49.7±1.0   24.0±0.2   26.9±0.3 | 53.3±0.4   23.7±0.3   32.8±0.3
Rote ∪ NB+BNS     59.2±0.9   25.4±0.3   30.7±0.3 | 64.3±0.5   26.2±0.3   37.2±0.3
Rote→NB+IG        51.8±0.9   39.8±0.6   41.5±0.6 | 59.1±0.5   47.3±0.4   52.5±0.4
NB+IG→Rote        48.7±1.0   31.5±0.5   33.4±0.4 | 53.0±0.5   36.3±0.3   43.1±0.3
Rote ∪ NB+IG      57.2±0.9   32.7±0.4   37.3±0.4 | 63.2±0.5   38.1±0.3   47.5±0.4
Rote→SVM+BNS      48.2±1.0   67.5±1.0   54.7±0.9 | 53.7±0.5   70.1±0.3   60.8±0.4
SVM+BNS→Rote      48.0±1.1   63.0±1.0   52.6±1.0 | 50.2±0.4   70.8±0.4   58.7±0.4
Rote ∪ SVM+BNS    54.0±0.9   62.0±0.8   56.1±0.8 | 58.5±0.4   68.2±0.3   63.0±0.3
Rote→SVM+IG       46.2±1.0   73.7±0.8   55.1±0.8 | 52.5±0.5   75.9±0.4   62.0±0.4
SVM+IG→Rote       47.0±1.2   67.7±0.9   53.7±1.1 | 49.9±0.3   73.9±0.3   59.6±0.3
Rote ∪ SVM+IG     52.2±1.1   66.3±0.8   56.9±0.9 | 57.7±0.4   71.1±0.3   63.7±0.4…”
mentioning
confidence: 97%
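The excerpt does not define the → (two-stage) and ∪ (union) combining operators, which are described in the paper's Section 6. The paper's title does name a unanimity rule, which for binary categorization amounts to labeling a document positive only when every base classifier agrees; a minimal sketch under that assumption (the function name is illustrative):

```python
def unanimity(predictions):
    """Combine aligned binary prediction lists from several classifiers.

    predictions: list of per-classifier lists of 0/1 labels, one entry per document.
    Returns 1 for a document only if every classifier predicted 1 (unanimity),
    trading recall for precision: fewer positives, but more confident ones.
    """
    return [int(all(votes)) for votes in zip(*predictions)]
```

This is why such combinations shift the precision-recall trade-off toward precision, as the combined rows in the table above illustrate relative to the single classifiers.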
“…To evaluate the performance of the DL classification, the F1 score was used as the evaluation metric [108], calculated by Equation (11). In Equation (11), Precision, also called user's accuracy, is the ratio of correctly classified pixels to all pixels that the classifier assigned to the category; Recall, also called producer's accuracy, is the ratio of correctly classified pixels to the actual number of pixels in the category [109].…”
Section: Accuracy Assessmentmentioning
confidence: 99%
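The precision, recall, and F1 definitions above follow directly from true-positive, false-positive, and false-negative counts; a minimal sketch (the function name is illustrative):

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision (user's accuracy), recall (producer's accuracy), and F1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0  # correct / all predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0     # correct / all actually positive
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)           # harmonic mean of P and R
    return precision, recall, f1
```

Because F1 is the harmonic mean of precision and recall, it penalizes a classifier that maximizes one at the expense of the other, which is why it is the standard single-number summary for imbalanced binary tasks.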