Researchers today propose various approaches for classifying Twitter user accounts. However, these approaches face reliability challenges owing to the diversity of the types of data propagated on Twitter. In addition, Twitter messages are imprecise, very short, and written in many dialects and languages. Moreover, most related works rely on a user's overall activity, which makes them unsuitable for post-level classification. This paper presents an alternative approach for classifying user accounts as belonging to bots, humans or organizations. The proposed approach accurately classifies a user account from a single post by leveraging a minimal number of language-independent features. We performed several experiments on a Twitter dataset using supervised learning algorithms. Our results demonstrate that a minimal number of language-independent features extracted from a single post is sufficient to classify user accounts accurately and quickly. Using Random Forest, the proposed approach achieved a high F1-measure (>95%) and a high AUC (>99%).

Keywords. Social network analysis, Twitter user classification, human vs. bot vs. organization, statistical-based approach, content-based approach, hybrid-based approach.

Per-class results:
Class          Precision  Recall  F1-measure  AUC
Human          0.939      0.940   0.939       0.993
Bot            0.986      0.977   0.981       0.995
Organization   0.888      0.919   0.902       0.992
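The per-class figures reported for the Human, Bot and Organization classes can be sanity-checked with the standard F1 formula, the harmonic mean of precision and recall. The sketch below assumes that the first two reported columns are precision and recall (the source does not label them); the function name `f1_score` is illustrative and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Assumption: the first two reported columns are (precision, recall).
per_class = {
    "Human": (0.939, 0.940),
    "Bot": (0.986, 0.977),
    "Organization": (0.888, 0.919),
}

if __name__ == "__main__":
    for label, (p, r) in per_class.items():
        print(f"{label}: F1 = {f1_score(p, r):.3f}")
```

Under this assumption, the Human (0.939) and Bot (0.981) values match the reported F1-measures, while Organization comes out at 0.903 against the reported 0.902. A gap that small could arise if the reported F1 was averaged over folds or computed from unrounded values; the source does not say which.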
Bot detection on Twitter currently attracts the attention of researchers around the world, and these research efforts have produced a variety of bot detection approaches. Four main challenges arise in this context: the diversity of the types of content propagated on Twitter, the problems inherent to the text itself, the lack of sufficiently large labeled datasets, and the fact that existing bot detection approaches fail to detect bot activity accurately. We propose Twitterbot+, a bot detection system that leverages a minimal number of language-independent features extracted from a single tweet, combined with temporal enrichment of previously labeled datasets. We conducted experiments on three benchmark datasets under standard evaluation scenarios, and the results demonstrate the efficiency of Twitterbot+ compared with state-of-the-art approaches, with promising accuracy (>95%). Twitterbot+ is suitable for accurate, real-time use as an initial filtering step during Twitter data collection, improving the quality of research data.
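To make "language-independent features extracted from a single tweet" concrete, the sketch below computes a few counts that do not depend on the language the tweet is written in. The abstracts do not list the actual features, so the feature set, the `PostFeatures` class and the `extract_features` function are all hypothetical illustrations, not the Twitterbot+ implementation.

```python
import re
from dataclasses import dataclass

# Hypothetical feature set: the abstracts do not specify the real features.
# Python's \w is Unicode-aware, so hashtags and mentions in any script match.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")


@dataclass
class PostFeatures:
    length: int        # number of characters in the post
    n_urls: int        # links embedded in the post
    n_mentions: int    # @user mentions
    n_hashtags: int    # #hashtags
    n_digits: int      # digit characters (Unicode-aware)
    is_retweet: bool   # classic "RT @user" retweet prefix


def extract_features(text: str) -> PostFeatures:
    """Compute language-independent features from the text of one post."""
    return PostFeatures(
        length=len(text),
        n_urls=len(URL_RE.findall(text)),
        n_mentions=len(MENTION_RE.findall(text)),
        n_hashtags=len(HASHTAG_RE.findall(text)),
        n_digits=sum(ch.isdigit() for ch in text),
        is_retweet=text.startswith("RT @"),
    )


if __name__ == "__main__":
    tweet = "RT @news: 3 new results https://example.com/a #AI #ML"
    print(extract_features(tweet))
```

Because every feature is a count or flag, `dataclasses.astuple(extract_features(text))` yields a fixed-length numeric vector that a supervised classifier such as Random Forest could consume, one post at a time, which is what makes single-post, real-time filtering plausible.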