2023
DOI: 10.3390/app13179766

Mitigating Class Imbalance in Sentiment Analysis through GPT-3-Generated Synthetic Sentences

Cici Suhaeni,
Hwan-Seung Yong

Abstract: In this paper, we explore the effectiveness of the GPT-3 model in tackling imbalanced sentiment analysis, focusing on the Coursera online course review dataset that exhibits high imbalance. Training on such skewed datasets often results in a bias towards the majority class, undermining the classification performance for minority sentiments, thereby accentuating the necessity for a balanced dataset. Two primary initiatives were undertaken: (1) synthetic review generation via fine-tuning of the Davinci base mode…

Cited by 9 publications (14 citation statements)
References 40 publications
“…This study is a continuation of our previous work [11], which addressed imbalanced sentiment analysis by fine-tuning the GPT-3 model to generate synthetic data. In addition to determining whether balancing with synthetic data through sentence-by-sentence generation can improve the classification performance, a comparison of the classification performance with that of fine-tuning-based text generation was also conducted.…”
Section: Introduction (mentioning)
confidence: 95%
“…In addition to determining whether balancing with synthetic data through sentence-by-sentence generation can improve the classification performance, a comparison of the classification performance with that of fine-tuning-based text generation was also conducted. From the aspect of classification models, the previous study utilized nine traditional machine-learning and deep-learning models [11]. In this study, we focus exclusively on five deep-learning models.…”
Section: Introduction (mentioning)
confidence: 99%
“…Quteineh et al. [64] present a method combining GPT-2 with Monte Carlo Tree Search for textual data augmentation, significantly boosting classifier performance in active learning with small datasets. Suhaeni et al. [65] explore using GPT-3 for generating synthetic reviews to address class imbalances in sentiment analysis, specifically for Coursera course reviews. It shows how synthetic data can enhance the balance and quality of training datasets, leading to improved sentiment classification model performance.…”
Section: Existing Research On GPT's Use In Research Data (mentioning)
confidence: 99%
“…Literature in [31,35,51,52,56,57] could be attributed to this sub-category. • Text Data Expansion and Enhancement: This involves leveraging GPT to create new textual content and enhance existing datasets, thereby improving machine learning models' performance and addressing data scarcity [12,26,37,39,41,48,53,54,65].…”
mentioning
confidence: 99%
“…These inherently data-driven learning approaches need an extensive curated dataset and long training, and results are not always accurate and can be affected by bias due to potentially unbalanced training data [14,31]. As no single tool has been found to be sufficiently reliable on its own, some SA solutions use an ensemble approach by combining predictions from multiple models into hybrid tools to improve performance and achieve better accuracy [16,32-35].…”
Section: Related Work (mentioning)
confidence: 99%