Anais do X Brazilian Workshop on Social Network Analysis and Mining (BraSNAM 2021)
DOI: 10.5753/brasnam.2021.16131

Measuring the Degree of Divergence when Labeling Tweets in the Electoral Scenario

Abstract: Analyzing electoral trends in political scenarios by applying data mining techniques to social media has become popular in recent years. A central problem in this field is reliably annotating data during the short period of an electoral campaign. In this paper, we present a methodology for measuring labeling divergence and an exploratory analysis of data related to the 2018 Brazilian Presidential Elections. As a result, we point out some of the main characteristics that lead to a high level of divergence during the annotation…

Cited by 7 publications (6 citation statements). References: 13 publications.
“…The reliability analysis of the gold-standard sets conducted in Section 4.3 confirmed that labeling short online texts is a challenging and often confusing task for human annotators (see [31] but also [58]). At the same time, the inter-annotator agreement achieved in this study for the 9 classes is considerably better than in many other reports which dealt with multiple-class annotation of short texts (e.g., [59,60]).…”
Section: Labeling
confidence: 78%
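The inter-annotator agreement referred to in this statement is commonly quantified with a chance-corrected coefficient such as Cohen's kappa. Below is a minimal sketch for a pair of annotators, using scikit-learn; the label arrays are hypothetical illustrations, not data from the cited studies.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical labels assigned by two annotators to the same ten tweets
# (illustrative values only; not taken from the cited studies).
annotator_a = ["pro", "anti", "neutral", "pro", "anti",
               "neutral", "pro", "pro", "anti", "neutral"]
annotator_b = ["pro", "anti", "pro", "pro", "anti",
               "neutral", "neutral", "pro", "anti", "neutral"]

# Cohen's kappa corrects raw percent agreement for the agreement
# expected by chance given each annotator's label distribution.
kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")
```

On the Landis and Koch (1977) scale cited in the statements below, values of 0.61–0.80 are read as substantial agreement and 0.81–1.00 as almost perfect agreement.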
“…As a rule, the developed LDA-based approaches require label-topics or seed keywords to be manually selected at the beginning of the annotation process. The latter may be non-trivial, if feasible at all, in the case of e-commerce data, as multiple-topic manual labeling of large volumes of short texts was shown to produce highly subjective and contradictory results (e.g., see [31]). Wang et al. [32] proposed an optimization algorithm called Adaptive Labeled LDA (AL-LDA) to deal with label disparity.…”
Section: Related Work
confidence: 99%
“…The judgement of whether the tweet fitted into one of those categories was based on subjectively analysing the individual tweet, without the aid of any other details, to remain objective. It is essential that the reliability of these scores be measured (Bobicev & Sokolova, 2017; Landis & Koch, 1977; Santos, Bernardini, & Paes, 2021). This can be achieved by having other researchers re-classify the same sample of data to determine the degree of agreement among raters.…”
Section: Manual Classification
confidence: 99%
“…This can be achieved by having other researchers re-classify the same sample of data to determine the degree of agreement among raters. This is essential for measuring the reliability of the classified data and also the reliability of the algorithms (Bobicev & Sokolova, 2017; Landis & Koch, 1977; Santos, Bernardini, & Paes, 2021). The manual classification of each dataset will be rated by three manual raters, known as MR1, MR2, and MR3 (MR = Manual Rater).…”
Section: Manual Classification
confidence: 99%
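For the three raters (MR1, MR2, MR3) described above, agreement across all raters at once is typically summarized with Fleiss' kappa rather than pairwise Cohen's kappa. Here is a minimal sketch using statsmodels; the rating matrix is a hypothetical illustration, not the authors' data.

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Hypothetical item x rater matrix: one row per tweet, one column per
# manual rater (MR1, MR2, MR3); values are category indices.
ratings = np.array([
    [0, 0, 0],
    [1, 1, 0],
    [2, 2, 2],
    [0, 1, 1],
    [1, 1, 1],
    [2, 0, 2],
])

# aggregate_raters turns the item x rater matrix into per-item category
# counts, which is the input format fleiss_kappa expects.
table, _categories = aggregate_raters(ratings)
print(f"Fleiss' kappa: {fleiss_kappa(table):.2f}")
```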