2023
DOI: 10.48550/arxiv.2301.07015
Preprint

Simplistic Collection and Labeling Practices Limit the Utility of Benchmark Datasets for Twitter Bot Detection

Abstract: Accurate bot detection is necessary for the safety and integrity of online platforms. It is also crucial for research on the influence of bots in elections, the spread of misinformation, and financial market manipulation. Platforms deploy infrastructure to flag or remove automated accounts, but their tools and data are not publicly available. Thus, the public must rely on third-party bot detection. These tools employ machine learning and often achieve near-perfect performance for classification on existing dat…

Cited by 2 publications (1 citation statement)
References 57 publications
“…However, a fundamental problem with these "human-annotation based tools" is the lack of ground-truth bot labels in training samples. Some researchers also mention these concerns [14,15]-the limiting factor in advancing bot detection research is the lack of availability of robust, high-quality data, caused by "simplistic data collection and labeling practices". To highlight the prevalence of this problem, we aggregated several bot labelling techniques and their popularity (see Figure 1) from the review of papers presented in [16].…”
Section: Related Work (mentioning)
confidence: 99%