An Efficient Multi-Label SVM Classification Algorithm by Combining Approximate Extreme Points Method and Divide-and-Conquer Strategy

Sun, Zhongwei; Liu, Xiuyan; Hu, Keyong; Li, Zhuang; Liu, Jing

doi:10.1109/access.2020.3024745

“…However, the impact of the applied transformation on the SVM model is not highlighted and it could impact the accuracy of the model. [19] used the binary relevance transformation strategy to realize multi-label classification effectively. It applied the divide-and-conquer strategy to divide the representative set into subsets and this can ensure that each representative subset contains a certain number of positive and negative instances.…”

Section: Background and Related Work (mentioning)

confidence: 99%
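The statement above describes splitting a representative set into subsets that each keep some positive and some negative instances. A minimal sketch of one way to get that property is shown here, assuming a simple round-robin deal of positives and negatives; the function name `stratified_split` and the toy labels are hypothetical, and this is not claimed to be the procedure used in the cited paper.

```python
from typing import List, Sequence


def stratified_split(labels: Sequence[int], k: int) -> List[List[int]]:
    """Split instance indices into k subsets, dealing positives and
    negatives round-robin so every subset receives both kinds.

    Illustrative assumption only: the cited work may partition differently.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y != 1]
    if len(pos) < k or len(neg) < k:
        raise ValueError("need at least k positive and k negative instances")

    subsets: List[List[int]] = [[] for _ in range(k)]
    # Deal positives first, then negatives, one subset at a time.
    for rank, idx in enumerate(pos):
        subsets[rank % k].append(idx)
    for rank, idx in enumerate(neg):
        subsets[rank % k].append(idx)
    return subsets


# Toy binary labels for one topic (1 = positive, 0 = negative).
labels = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
parts = stratified_split(labels, k=2)
```

Each subset could then be used to train its own binary SVM for that label, which is the usual motivation for a divide-and-conquer split.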

SVM transformations for Multi-labeled Topics

El-Sayed

2022

Preprint



The rapid growth of research papers makes it difficult to manually curate and categorize articles into one or more related topics. The Support Vector Machine (SVM) model has addressed this multi-labeled topic classification problem over different datasets; however, transforming the multi-labeled dataset into single-labeled ones to fit the modeling that SVM requires plays a vital role in performance. Applying different transformations to the problem leads to different SVM behavior and accuracy across datasets. This paper employs the SVM model on a multi-labeled dataset of research articles. It examines different kinds of transformations to measure their behavior. It proposes the Least Class Classifier (LCC) technique, which tackles the problem of imbalanced datasets to give the minor classes an equal chance. The results showed that the label powerset transformation achieved the best average accuracy score across all topic classifications. Both the label powerset and the binary relevance reached $\approx 90\%$ on the Hamming loss measurement for the fraction of topics that are incorrectly assigned. However, binary relevance showed the best balance of recall and precision in the per-class classification measurements. Moreover, the proposed LCC technique showed promising results for increasing recall for the minor class in the imbalanced dataset.
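The two transformations and the Hamming loss named in the abstract can be illustrated with a small sketch. The label matrix, the predictions, and the helper names below are made up for illustration and are not taken from the paper; binary relevance yields one binary target vector per topic, while label powerset maps each distinct combination of topics to a single class.

```python
from typing import Dict, List, Tuple

# Toy multi-label targets: rows are articles, columns are three hypothetical topics.
Y = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
]


def binary_relevance(Y: List[List[int]]) -> List[List[int]]:
    """Return one binary target vector per label (one column each)."""
    n_labels = len(Y[0])
    return [[row[j] for row in Y] for j in range(n_labels)]


def label_powerset(Y: List[List[int]]) -> Tuple[List[int], Dict[Tuple[int, ...], int]]:
    """Map every distinct label combination to a single class id."""
    class_of: Dict[Tuple[int, ...], int] = {}
    targets: List[int] = []
    for row in Y:
        key = tuple(row)
        if key not in class_of:
            class_of[key] = len(class_of)
        targets.append(class_of[key])
    return targets, class_of


def hamming_loss(Y_true: List[List[int]], Y_pred: List[List[int]]) -> float:
    """Fraction of individual label assignments that are wrong."""
    total = sum(len(row) for row in Y_true)
    wrong = sum(
        t != p
        for row_t, row_p in zip(Y_true, Y_pred)
        for t, p in zip(row_t, row_p)
    )
    return wrong / total


br_targets = binary_relevance(Y)          # three binary problems
lp_targets, lp_classes = label_powerset(Y)  # one multi-class problem

# Hypothetical predictions with two wrong label slots out of twelve.
Y_pred = [[1, 0, 1], [0, 1, 1], [1, 0, 1], [1, 0, 0]]
loss = hamming_loss(Y, Y_pred)
```

In this sketch, each vector in `br_targets` would train a separate binary SVM, whereas `lp_targets` would train a single multi-class SVM with one class per observed topic combination.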

