2022
DOI: 10.7717/peerj-cs.1039

RuSentiTweet: a sentiment analysis dataset of general domain tweets in Russian

Abstract: The Russian language is still not as well-resourced as English, especially in the field of sentiment analysis of Twitter content. Though several sentiment analysis datasets of tweets in Russian exist, they are all either automatically annotated or manually annotated by a single annotator. Thus, there is no inter-annotator agreement, or the annotation may be focused on a specific domain. In this article, we present RuSentiTweet, a new sentiment analysis dataset of general domain tweets in Russian. RuSentiTweet is current…

Cited by 6 publications
(14 citation statements)
References 51 publications
“…Although several training datasets are available for sentiment analysis of texts from social networks for the Russian language ( Kotelnikov, 2021 ; Smetanin, 2020 ), only two consist of common-domain texts, are manually annotated, and report inter-annotator agreement scores. The first, RuSentiment ( Rogers et al, 2018 ), consists of general-domain texts from VKontakte, and the second, RuSentiTweet ( Smetanin, 2022b ), consists of general-domain texts from Twitter.…”
Section: Related Work (mentioning)
confidence: 99%
“…At present, pre-trained language models have achieved the highest classification results on most Russian-language sentiment analysis datasets available publicly. For example, fine-tuned RuBERT achieved state-of-the-art results on RuSentiTweet ( Smetanin, 2022b ), LINIS Crowd ( Koltsova, Alexeeva & Kolcov, 2016 ), RuTweetCorp ( Rubtsova, 2013 ), and RuReviews ( Smetanin, 2022b ) datasets; fine-tuned RuRoBERTa-Large achieved SOTA results on RuSentiment ( Rogers et al, 2018 ). Thus, we fine-tuned a pre-trained language model for sentiment classification in this study.…”
Section: Related Work (mentioning)
confidence: 99%