Sequential targeting: A continual learning approach for data imbalance in text classification

Jang, Joel; Kim, Yoonjeon; Choi, Kyoung-Ho; Suh, Sungho

doi:10.1016/j.eswa.2021.115067

Cited by 24 publications

(11 citation statements)

References 29 publications


“…At preprocessing step, Support Vector Machine (SVM) and SMOTE were used to balance the training set, and the classification was done with a logistic regression model. Joel Jang [9] used the Internet Movie Database (IMDB) to propose a new training architecture. They partitioned the training data into mutually exclusive subsets and then performed continual learning on a deep learning-based classifier to handle the class imbalance problem.…”

Section: Literature Survey

mentioning

confidence: 99%

“…By giving minority categories more weight, class imbalanced learning approaches [7] hope to lessen the bias in model learning that favours majority categories. The various strategies for handling class imbalance in object classification can be categorized into different groups like data-level [8], algorithm-level [9], and hybrid approach [10]. Nevertheless, the majority of them use traditional imbalanced algorithms, which cannot handle the severely unbalanced dataset.…”

Section: Introduction

mentioning

confidence: 99%

“…If $BR_P$ contains $t$ objects of class $l_i$ and we need to create $T$ number of objects of class $l_i$ to make it a major class, then we apply $T/t$ times the Algorithm 3 for object $O_j$. This algorithm generates the new object $O_j$ by the linear combination of $X_i$ and $O_j$, as defined in Equation (9), where $w$ gives the weightage of $X_i$. The value of $w$ is selected randomly in between [0.8, 1.0) and considers $T/t$ times to generate $T/t$ new objects of class $l_i$.…”

mentioning

confidence: 99%
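The generation step quoted above can be illustrated with a minimal sketch. Equation (9) is not reproduced in the quote, so the convex-combination form `w * x_i + (1 - w) * o_j` is an assumption, and the function name `synthesize` and its parameters are hypothetical. Only the sampling range for the weight, [0.8, 1.0), comes from the quoted text.

```python
import random


def synthesize(x_i, o_j, n_new, rng=None):
    """Generate n_new synthetic objects from x_i and o_j.

    ASSUMPTION: the linear combination is taken to be
        new = w * x_i + (1 - w) * o_j
    because the quoted passage does not show Equation (9) itself.
    """
    rng = rng or random.Random()
    synthetic = []
    for _ in range(n_new):
        # rng.random() lies in [0.0, 1.0), so w lies in [0.8, 1.0)
        # (up to floating-point rounding at the upper end).
        w = 0.8 + 0.2 * rng.random()
        synthetic.append([w * a + (1.0 - w) * b for a, b in zip(x_i, o_j)])
    return synthetic


if __name__ == "__main__":
    # Each new point lies on the segment between the two inputs, close to x_i.
    for point in synthesize([1.0, 2.0], [3.0, 6.0], n_new=3, rng=random.Random(0)):
        print(point)
```

Because the weight on `x_i` is at least 0.8, every synthetic object stays close to `x_i` under this assumed form.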


Rough-Fuzzy Based Synthetic Data Generation Exploring Boundary Region of Rough Sets to Handle Class Imbalance Problem

Naushin, Das, Nayak et al. 2023. Axioms

Class imbalance is a prevalent problem that not only reduces the performance of the machine learning techniques but also causes the lacking of the inherent complex characteristics of data. Though the researchers have proposed various ways to deal with the problem, they have yet to consider how to select a proper treatment, especially when uncertainty levels are high. Applying rough-fuzzy theory to the imbalanced data learning problem could be a promising research direction that generates the synthetic data and removes the outliers. The proposed work identifies the positive, boundary, and negative regions of the target set using the rough set theory and removes the objects in the negative region as outliers. It also explores the positive and boundary regions of the rough set by applying the fuzzy theory to generate the samples of the minority class and remove the samples of the majority class. Thus the proposed rough-fuzzy approach performs both oversampling and undersampling to handle the imbalanced class problem. The experimental results demonstrate that the novel technique allows qualitative and quantitative data handling.


“…Skewness in class samples is also very pervasive in many data mining applications namely text classification [7], risk management, detection of oil spills in satellite radar images of ocean surfaces, medical diagnosis, the detection of fraudulent calls, and spam mail recognition. Class imbalance problems are addressed by many techniques out of which two ways are mostly reported in literature [8].…”

Section: Introduction

mentioning

confidence: 99%

An efficient convolutional neural network-based classifier for an imbalanced oral squamous carcinoma cell dataset

Mohapatra, Tripathy 2024. IJ-AI

Imbalanced datasets pose a major challenge for the researchers while addressing machine learning tasks. In these types of datasets, samples of different classes are not in equal proportion rather the gap between the numbers of individual class samples is significantly large. Classification models perform better for datasets having equal proportion of data tuples in both the classes. But, in reality, the medical image datasets are skewed and hence are not always suitable for a model to achieve improved classification performance. Therefore, various techniques have been suggested in the literature to overcome this challenge. This paper applies oversampling technique on an imbalanced dataset and focuses on a customized convolutional neural network model that classifies the images into two categories: diseased and non-diseased. Outcome of the proposed model can assist the health experts in the detection of oral cancer. The proposed model exhibits 99% accuracy after data augmentation. Performance metrics such as precision, recall and F1-score values are very close to 1. In addition, statistical test is performed to validate the statistical significance of the model. It has been found that the proposed model is an optimised classifier in terms of number of network layers and number of neurons.


“…One of the methods used to overcome the imbalanced class problem is sampling. The sampling method modifies the distribution of data between the majority and minority classes in the training dataset to balance the amount of data for each class [17]. One of the sampling methods that is often used is the Synthetic Minority Oversampling Technique (SMOTE).…”

Section: Introduction

mentioning

confidence: 99%
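The SMOTE idea mentioned in the statement above, creating new minority samples by interpolating between a minority sample and one of its nearest minority neighbours, can be sketched in a few lines. This is a simplified illustration using only the standard library, not the reference implementation. The function name `smote_like` and its parameters are hypothetical.

```python
import random


def smote_like(minority, n_new, k=2, rng=None):
    """Create n_new synthetic minority samples by neighbour interpolation.

    Simplified sketch: brute-force nearest neighbours over the minority
    samples, squared Euclidean distance, numeric features only.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = rng or random.Random()
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance to the chosen base sample.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment from base to the chosen neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in smote_like(minority, n_new=3, rng=random.Random(0)):
        print(point)
```

In practice the synthetic samples would be appended to the training set only, so that evaluation data keeps its original class distribution.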

Effects of Oversampling Smote and Spectral Transformations in the Classification of Mango Cultivars Using Near-Infrared Spectroscopy

Khumaidi, Raafi'udin 2022. International Journal on Advanced Science, Engineering and Information Technology

Near-Infrared spectroscopy (NIR) is a non-destructive analytical technique that can provide chemical and structural information on samples in a speedy and accurate time. NIR has a wavelength of 750-2500 nm. However, the absorbance bands of the NIR spectrum are often broad, non-specific, and overlapping. NIR spectrum analysis requires a multivariate method which is very subjective to noise arising from instrumentation. There is no standard protocol in modeling for classification and prediction using NIR spectra. Several models have been developed with and without pre-processing techniques. The SMOTE technique can improve the model to predict all class responses accurately. This research contributes to creating a multiclass classification model for grouping mango cultivars by finding the best pre-processing technique and using SMOTE oversampling. The results of the four test scenarios on the model's performance built using the Support Vector Machine (SVM) that the best model is obtained using spectral transformations with LSNV and CLIP operations with 100% accuracy, precision, and recall values. The Decision Tree (DT) has the performance results in 100% model was obtained by using spectral transformation with LSNV, CLIP and SAVGOL operations with parameters {'deriv_order': 0,1, 2, 'filter_win': 11, 13, 'poly_order': 3}. Using SMOTE has better accuracy than without pre-processing, with an accuracy of 92% on SVM and 94% on DT. In comparison, the combination of SMOTE and Spectral Transformation gives classification results for SVM and DT with the same accuracy of 96%, better than using SMOTE only.
