Mitigating Class Imbalance in Sentiment Analysis through GPT-3-Generated Synthetic Sentences

Suhaeni, Cici; Yong, Hwan-Seung

doi:10.3390/app13179766

Cited by 9 publications

(14 citation statements)

References 40 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…This study is a continuation of our previous work [11], which addressed imbalanced sentiment analysis by fine-tuning the GPT-3 model to generate synthetic data. In addition to determining whether balancing with synthetic data through sentence-by-sentence generation can improve the classification performance, a comparison of the classification performance with that of fine-tuning-based text generation was also conducted.…”

Section: Introductionmentioning

confidence: 95%

“…In addition to determining whether balancing with synthetic data through sentence-by-sentence generation can improve the classification performance, a comparison of the classification performance with that of fine-tuning-based text generation was also conducted. From the aspect of classification models, the previous study utilized nine traditional machine-learning and deep-learning models [11]. In this study, we focus exclusively on five deep-learning models.…”

Section: Introductionmentioning

confidence: 99%

See 1 more Smart Citation

Enhancing Imbalanced Sentiment Analysis: A GPT-3-Based Sentence-by-Sentence Generation Approach

Suhaeni,

Yong

2024

Applied Sciences

Self Cite

View full text Add to dashboard Cite

This study addresses the challenge of class imbalance in sentiment analysis by utilizing synthetic data to balance training datasets. We introduce an innovative approach using the GPT-3 model’s sentence-by-sentence generation technique to generate synthetic data, specifically targeting underrepresented negative and neutral sentiments. Our method aims to align these minority classes with the predominantly positive sentiment class in a Coursera course review dataset, with the goal of enhancing the performance of sentiment classification. This research demonstrates that our proposed method successfully enhances sentiment classification performance, as evidenced by improved accuracy and F1-score metrics across five deep-learning models. However, when compared to our previous research utilizing fine-tuning techniques, the current method shows a relative shortfall. The fine-tuning approach yields better results in all models tested, indicating the importance of data novelty and diversity in synthetic data generation. In terms of the deep-learning model used for classification, the notable finding is the significant performance improvement of the Recurrent Neural Network (RNN) model compared to other models like CNN, LSTM, BiLSTM, and GRU, highlighting the impact of the model choice and architecture depth. This study emphasizes the critical role of synthetic data quality and strategic deep-learning model implementation in sentiment analysis. The results suggest that the careful consideration of training data and model attributes is vital for optimal sentiment classification.

show abstract

Section: Introductionmentioning

confidence: 95%

Section: Introductionmentioning

confidence: 99%

Enhancing Imbalanced Sentiment Analysis: A GPT-3-Based Sentence-by-Sentence Generation Approach

Suhaeni,

Yong

2024

Applied Sciences

Self Cite

View full text Add to dashboard Cite

show abstract

“…Quteineh et al [64] present a method combining GPT-2 with Monte Carlo Tree Search for textual data augmentation, significantly boosting classifier performance in active learning with small datasets. Suhaeni et al [65] explore using GPT-3 for generating synthetic reviews to address class imbalances in sentiment analysis, specifically for Coursera course reviews. It shows how synthetic data can enhance the balance and quality of training datasets, leading to improved sentiment classification model performance.…”

Section: Existing Research On Gpt's Use In Research Datamentioning

confidence: 99%

“…Literature in [31,35,51,52,56,57] could be attributed to this sub-category. • Text Data Expansion and Enhancement: This involves leveraging GPT to create new textual content and enhance existing datasets, thereby improving machine learning models' performance and addressing data scarcity [12,26,37,39,41,48,53,54,65].…”

mentioning

confidence: 99%

Generative Pre-Trained Transformer (GPT) in Research: A Systematic Review on Data Augmentation

Sufi

2024

Information

View full text Add to dashboard Cite

GPT (Generative Pre-trained Transformer) represents advanced language models that have significantly reshaped the academic writing landscape. These sophisticated language models offer invaluable support throughout all phases of research work, facilitating idea generation, enhancing drafting processes, and overcoming challenges like writer’s block. Their capabilities extend beyond conventional applications, contributing to critical analysis, data augmentation, and research design, thereby elevating the efficiency and quality of scholarly endeavors. Strategically narrowing its focus, this review explores alternative dimensions of GPT and LLM applications, specifically data augmentation and the generation of synthetic data for research. Employing a meticulous examination of 412 scholarly works, it distills a selection of 77 contributions addressing three critical research questions: (1) GPT on Generating Research data, (2) GPT on Data Analysis, and (3) GPT on Research Design. The systematic literature review adeptly highlights the central focus on data augmentation, encapsulating 48 pertinent scholarly contributions, and extends to the proactive role of GPT in critical analysis of research data and shaping research design. Pioneering a comprehensive classification framework for “GPT’s use on Research Data”, the study classifies existing literature into six categories and 14 sub-categories, providing profound insights into the multifaceted applications of GPT in research data. This study meticulously compares 54 pieces of literature, evaluating research domains, methodologies, and advantages and disadvantages, providing scholars with profound insights crucial for the seamless integration of GPT across diverse phases of their scholarly pursuits.

show abstract

“…These inherently data-driven learning approaches need an extensive curated dataset and long training, and results are not always accurate and can be affected by bias due to potentially unbalanced training data [14,31]. As no single tool has been found to be sufficiently reliable on its own, some SA solutions use an ensemble approach by combining predictions from multiple models into hybrid tools to improve performance and achieve better accuracy [16,[32][33][34][35].…”

Section: Related Workmentioning

confidence: 99%

Revealing People’s Sentiment in Natural Italian Language Sentences

Calvagna,

Tramontana,

Verga

2023

Computers

View full text Add to dashboard Cite

Social network systems are constantly fed with text messages. While this enables rapid communication and global awareness, some messages could be aptly made to hurt or mislead. Automatically identifying meaningful parts of a sentence, such as, e.g., positive or negative sentiments in a phrase, would give valuable support for automatically flagging hateful messages, propaganda, etc. Many existing approaches concerned with the study of people’s opinions, attitudes and emotions and based on machine learning require an extensive labelled dataset and provide results that are not very decisive in many circumstances due to the complexity of the language structure and the fuzziness inherent in most of the techniques adopted. This paper proposes a deterministic approach that automatically identifies people’s sentiments at the sentence level. The approach is based on text analysis rules that are manually derived from the way Italian grammar works. Such rules are embedded in finite-state automata and then expressed in a way that facilitates checking unstructured Italian text. A few grammar rules suffice to analyse an ample amount of correctly formed text. We have developed a tool that has validated the proposed approach by analysing several hundreds of sentences gathered from social media: hence, they are actual comments given by users. Such a tool exploits parallel execution to make it ready to process many thousands of sentences in a fraction of a second. Our approach outperforms a well-known previous approach in terms of precision.

show abstract

Mitigating Class Imbalance in Sentiment Analysis through GPT-3-Generated Synthetic Sentences

Cited by 9 publications

References 40 publications

Enhancing Imbalanced Sentiment Analysis: A GPT-3-Based Sentence-by-Sentence Generation Approach

Enhancing Imbalanced Sentiment Analysis: A GPT-3-Based Sentence-by-Sentence Generation Approach

Generative Pre-Trained Transformer (GPT) in Research: A Systematic Review on Data Augmentation

Revealing People’s Sentiment in Natural Italian Language Sentences

Contact Info

Product

Resources

About