2022
DOI: 10.35377/saucis...1070822

Classification of Imbalanced Offensive Dataset – Sentence Generation for Minority Class with LSTM

Abstract: The classification of documents is one of the problems studied since ancient times and still continues to be studied. With social media becoming a part of daily life and its misuse, the importance of text classification has started to increase. This paper investigates the effect of data augmentation with sentence generation on classification performance in an imbalanced dataset. We propose an LSTM based sentence generation method, Term Frequency-Inverse Document Frequency (TF-IDF) and Word2vec and apply Logist…

Cited by 6 publications (4 citation statements)
References 59 publications
“…Lastly, Ekinci [39] performed a comparative study of imbalanced offensive data classification using an LSTM-based sentence generation method. Various classifiers were trained using TF-IDF and Word2vec for text representation, demonstrating the value of sentence generation methods in handling imbalanced sentiment analysis tasks.…”
Section: Imbalanced Sentiment Analysis (mentioning)
confidence: 99%
“…Skip-gram which is an n-gram based model is a learning model in Word2vec [38]. Skip-gram model realized a neural network (NN) architecture and has three layers namely input, projection, and output.…”
Section: Extraction Of Word Embeddings (mentioning)
confidence: 99%
“…In align previous research, this study proposes two additional classifiers, namely: k-Nearest Neighbors (KNN) and Long Short-Term Memory (LSTM), which will be tested on the Plant-Disease Relation (PDR) dataset. The main reason for using KNN and LSTM is that these algorithms are also proven to be used to solve unbalanced class problems like what was done by [38], [39], [40], [41]. Furthermore, reference [33] does not work on the KNN and LSTM algorithms, who used Linear SVC, RBF SVM, DTC, RF, LR, and MNB for multi-class text classification tasks.…”
Section: Introduction (mentioning)
confidence: 99%