TweetEval: Unified Benchmark and Comparative Evaluation for Tweet Classification

Barbieri, Francesco; Camacho-Collados, José; Neves, Leonardo; Espinosa-Anke, Luis

doi:10.48550/arxiv.2010.12421

Cited by 46 publications

(70 citation statements)

References 9 publications

“…We use Stanford Sentiment Treebank (SST-2) [28], Text Retrieval Conference (TREC-6) [29], TweetEval [30] and BBC News datasets for our study. These datasets cover both binary and multi-class classification.…”

Section: Dataset Description (mentioning)

confidence: 99%

On Sensitivity of Deep Learning Based Text Classification Algorithms to Practical Input Perturbations

Miyajiwala, Ladkat, Jagadale et al. 2022

Preprint

Text classification is a fundamental Natural Language Processing task that has a wide variety of applications, where deep learning approaches have produced state-of-the-art results. While these models have been heavily criticized for their black-box nature, their robustness to slight perturbations in input text has been a matter of concern. In this work, we carry out a data-focused study evaluating the impact of systematic practical perturbations on the performance of deep learning based text classification models such as CNN, LSTM, and BERT-based algorithms. The perturbations are induced by the addition and removal of unwanted tokens like punctuation and stop-words that are minimally associated with the final performance of the model. We show that these deep learning approaches, including BERT, are sensitive to such legitimate input perturbations on four standard benchmark datasets: SST-2, TREC-6, BBC News, and TweetEval. We observe that BERT is more susceptible to the removal of tokens than to the addition of tokens. Moreover, LSTM is slightly more sensitive to input perturbations than the CNN-based model. The work also serves as a practical guide to assessing the impact of discrepancies in train-test conditions on the final performance of models.
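As a rough illustration of the removal-type perturbations this abstract describes, the sketch below strips punctuation and stop-words from an input sentence. The function names and the tiny stop-word list are assumptions made for this example; they are not the authors' code or their actual word list.

```python
import string

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in"}


def remove_punctuation(text: str) -> str:
    """Drop every ASCII punctuation character from the text."""
    return text.translate(str.maketrans("", "", string.punctuation))


def remove_stop_words(text: str) -> str:
    """Drop whitespace-separated tokens found in STOP_WORDS (case-insensitive)."""
    kept = [tok for tok in text.split() if tok.lower() not in STOP_WORDS]
    return " ".join(kept)


if __name__ == "__main__":
    sample = "The movie is great, and the cast is superb!"
    print(remove_punctuation(sample))
    print(remove_stop_words(sample))
```

Under this setup, one plausible way to gauge sensitivity would be to score the same trained classifier on the original and the perturbed test sentences and compare the two results.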

“…The Python code for the RoBERTa framework that we applied is adapted from Barbieri et al (2020a) and is available at https://huggingface.co/cardiffnlp/twitter-roberta-basesentiment.…”

Section: Embedding Learning Via Roberta (mentioning)

confidence: 99%

Sentiment Analysis and Effect of COVID-19 Pandemic using College SubReddit Data

Yan, Liu 2021

Preprint

The COVID-19 pandemic has affected societies and human health and well-being in various ways. In this study, we collected Reddit data from 2019 (pre-pandemic) and 2020 (pandemic) from the subreddit communities associated with 8 universities, applied natural language processing (NLP) techniques, and trained graphical neural networks with social media data, to study how the pandemic has affected people's emotions and psychological states compared to the pre-pandemic era. Specifically, we first applied a pre-trained Robustly Optimized BERT pre-training approach (RoBERTa) to learn embeddings from the semantic information of Reddit messages and trained a graph attention network (GAT) for sentiment classification. The usage of GAT allows us to leverage the relational information among the messages during training. We then applied subgroup-adaptive model stacking to combine the prediction probabilities from RoBERTa and GAT to yield the final classification on sentiment. With the manually labeled and model-predicted sentiment labels on the collected data, we applied a generalized linear mixed-effects model to estimate the effects of pandemic and online teaching on people's sentiment in a statistically significant manner. The results suggest the odds of negative sentiments in 2020 are 14.6% higher than the odds in 2019 (p-value < 0.001), and the odds of negative sentiments are 41.6% higher with in-person teaching than with online teaching in 2020 (p-value = 0.037) in the studied population.

“…BERT (Devlin et al, 2018) based models have achieved state of the art performance in many downstream tasks due to their superior contextualized representations of language, providing true bidirectional context to word embeddings. We will use the sentiment analysis model from (Barbieri et al, 2020), trained on a large corpus of English Tweets (60 million Tweets) for initializing our algorithm. We will refer to the sentiment analysis model from (Barbieri et al, 2020) as the TweetEval model in the remainder of the paper.…”

Section: Related Work (mentioning)

confidence: 99%

“…We will use the sentiment analysis model from (Barbieri et al, 2020), trained on a large corpus of English Tweets (60 million Tweets) for initializing our algorithm. We will refer to the sentiment analysis model from (Barbieri et al, 2020) as the TweetEval model in the remainder of the paper. The TweetEval model is built on top of an English RoBERTa (Liu et al, 2019) model.…”

Section: Related Work (mentioning)

confidence: 99%

Unsupervised Self-Training for Sentiment Analysis of Code-Switched Data

Gupta, Menghani, Rallabandi et al. 2021

Preprint

Sentiment analysis is an important task in understanding social media content such as customer reviews and Twitter and Facebook feeds. In multilingual communities around the world, a large amount of social media text is characterized by the presence of Code-Switching. Thus, it has become important to build models that can handle code-switched data. However, annotated code-switched data is scarce and there is a need for unsupervised models and algorithms. We propose a general framework called Unsupervised Self-Training and show its applications for the specific use case of sentiment analysis of code-switched data. We use the power of pre-trained BERT models for initialization and fine-tune them in an unsupervised manner, only using pseudo labels produced by zero-shot transfer. We test our algorithm on multiple code-switched languages and provide a detailed analysis of the learning dynamics of the algorithm with the aim of answering the question: "Does our unsupervised model understand the Code-Switched languages or does it just learn its representations?" Our unsupervised models compete well with their supervised counterparts, with their performance reaching within 1-7% (weighted F1 scores) when compared to supervised models trained for a two class problem.
