Lexical data augmentation for sentiment analysis

Xiang, Rong; Chersoni, Emmanuele; Lu, Qin; Huang, Chu‐Ren; Li, Wenjie; Long, Yunfei

doi:10.1002/asi.24493

Cited by 25 publications

(18 citation statements)

References 34 publications

“…It has undergone instruction fine-tuning on the basis of LoRA (Hu et al, 2021), and the constructed prompt's instruction is "Please determine whether the following content expresses a positive sentiment, and output 0 or 1 -->," with specific content included as input, corresponding labels as output. In contrast, the comparative methods encompass the commonly used techniques such as DICT (Zhang et al, 2015), EDA (Wei & Zou, 2019), and PLSDA (Xiang et al, 2021). The metrics is the accuracy (%).…”

Section: Experiments Setting (mentioning)

confidence: 99%

“…The manual aggregation of domain‐specific training datasets can be a labor‐intensive and costly endeavor. This challenge underscores the critical need for an exploration into data augmentation strategies, which has secured a broad footprint in the realm of image processing, typically implemented through transformations (Alqudah et al, 2023; Niu et al, 2023; Shorten & Khoshgoftaar, 2019; Xiang et al, 2021).…”

Section: Introduction (mentioning)

confidence: 99%

“…Instead, unique tasks dictate the introduction of bespoke enhancement methodologies. This could span the spectrum from commonsense reasoning and automatic translation to text comprehension and generation, extending to entity extraction and sentiment analysis (Abonizio et al, 2021; Hsu et al, 2021; Liesting et al, 2021; Liu et al, 2021; Shen et al, 2020; Xiang et al, 2021; Yang et al, 2020). Regarding sentiment analysis, popular augmentation strategies encompass synonym substitution (Zhang et al, 2015) and a technique known as easy data augmentation (EDA) (Wei & Zou, 2019).…”

Section: Introduction (mentioning)

confidence: 99%

Will sentiment analysis need subculture? A new data augmentation approach

Wang, He, et al. 2024

Asso for Info Science & Tech

Nowadays, the omnipresence of the Internet has fostered a subculture that congregates around the contemporary milieu. The subculture artfully articulates the intricacies of human feelings by ardently pursuing the allure of novelty, a fact that cannot be disregarded in the sentiment analysis. This paper aims to enrich data through the lens of subculture, to address the insufficient training data faced by sentiment analysis. To this end, a new approach of subculture‐based data augmentation (SCDA) is proposed, which engenders enhanced texts for each training text by leveraging the creation of specific subcultural expression generators. The extensive experiments attest to the effectiveness and potential of SCDA. The results also shed light on the phenomenon that disparate subcultural expressions elicit varying degrees of sentiment stimulation. Moreover, an intriguing conjecture arises, suggesting the linear reversibility of certain subcultural expressions.


“…The UDA approach offers a new way of effectively detecting and utilizing noise in unlabeled data (Xie et al, 2020). It unifies the learning process between labeled and unlabeled data through a specific form of contrastive learning and corresponding data augmentation (Xiang et al, 2021), and achieves superior results in image and regular text classification (Xie et al, 2020). The potential of UDA for short BSCs is worth investigating.…”

Section: Semi-supervised Learning (mentioning)

confidence: 99%

Aspect sentiment mining of short bullet screen comments from online TV series

Liu, Zhou, Gao, et al. 2023

Asso for Info Science & Tech

Bullet screen comments (BSCs) are user‐generated short comments that appear as real‐time overlays on many video platforms, expressing the audience opinions and emotions about different aspects of the ongoing video. Unlike traditional long comments after a show, BSCs are often incomplete, ambiguous in context, and correlated over time. Current studies in sentiment analysis of BSCs rarely address these challenges, motivating us to develop an aspect‐level sentiment analysis framework. Our framework, BSCNET, is a pre‐trained language encoder‐based deep neural classifier designed to enhance semantic understanding. A novel neighbor context construction method is proposed to uncover latent contextual correlation among BSCs over time, and we also incorporate semi‐supervised learning to reduce labeling costs. The framework increases F1 (Macro) and accuracy by up to 10% and 10.2%, respectively. Additionally, we have developed two novel downstream tasks. The first is noisy BSCs identification, which reached F1 (Macro) and accuracy of 90.1% and 98.3%, respectively, through fine‐tuning the BSCNET. The second is the prediction of future episode popularity, where the MAPE is reduced by 11%–19.0% when incorporating sentiment features. Overall, this study provides a methodology reference for aspect‐level sentiment analysis of BSCs and highlights its potential for viewing experience or forthcoming content optimization.

“…Thus many methods have been tried out in research so far. Among them are methods for swapping [59], deleting [16, 38], inducing spelling mistakes [6, 10], paraphrasing [28], and replacing of synonyms [25, 61, 66], close embeddings [2, 58] and words predicted by a language model [11, 18, 24] on word-level. On a broader level, methods which change the dependency tree [45, 62], perform round-trip-translation [27, 47], or interpolate the input instances [9, 65] are used.…”

Section: Related Work (mentioning)

confidence: 99%

Data augmentation in natural language processing: a novel text generation approach for long and short text classifiers

Bayer, Kaufhold, Buchhold, et al. 2022

Int. J. Mach. Learn. & Cyber.

In many cases of machine learning, research suggests that the development of training data might have a higher relevance than the choice and modelling of classifiers themselves. Thus, data augmentation methods have been developed to improve classifiers by artificially created training data. In NLP, there is the challenge of establishing universal rules for text transformations which provide new linguistic patterns. In this paper, we present and evaluate a text generation method suitable to increase the performance of classifiers for long and short texts. We achieved promising improvements when evaluating short as well as long text tasks with the enhancement by our text generation method. Especially with regard to small data analytics, additive accuracy gains of up to 15.53% and 3.56% are achieved within a constructed low data regime, compared to the no augmentation baseline and another data augmentation technique. As the current track of these constructed regimes is not universally applicable, we also show major improvements in several real world low data tasks (up to +4.84 F1-score). Since we are evaluating the method from many perspectives (in total 11 datasets), we also observe situations where the method might not be suitable. We discuss implications and patterns for the successful application of our approach on different types of datasets.
