Abstract-Twitter is one of the most popular sources for disseminating news and propaganda in the Arab region. Spammers are now creating abusive accounts to distribute adult content in Arabic tweets, content that Arab norms and culture prohibit. Arab governments face a massive challenge in detecting these accounts. This paper evaluates different machine learning algorithms for detecting abusive accounts with Arabic tweets, using Naïve Bayes (NB), Support Vector Machine (SVM), and Decision Tree (J48) classifiers. We are not aware of any existing data set of abusive accounts with Arabic tweets, and to our knowledge this is the first study to investigate this issue. The data set for this analysis was collected based on the top five Arabic swear words. The results show that the Naïve Bayes (NB) classifier with 10 tweets and 100 features performs best, with a 90% accuracy rate.

Index Terms-Arabic text classification, machine learning, pornographic spam, social network abuse.
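As a rough illustration of the kind of setup the abstract summarizes (a Naïve Bayes classifier over a limited number of tweets per account and a limited number of features), the following minimal Python sketch may help. It is not the authors' implementation: the whitespace tokenization, the frequency-based feature selection, the add-one smoothing, all function names, and the toy English tokens are assumptions made purely for illustration.

```python
import math
from collections import Counter

def account_tokens(tweets, max_tweets=10):
    """Collect whitespace-separated tokens from at most `max_tweets` tweets."""
    tokens = []
    for tweet in tweets[:max_tweets]:
        tokens.extend(tweet.split())
    return tokens

def select_vocabulary(accounts, max_features=100):
    """Keep the `max_features` most frequent tokens (an assumed selection rule)."""
    counts = Counter()
    for tweets in accounts:
        counts.update(account_tokens(tweets))
    return {token for token, _ in counts.most_common(max_features)}

def train_nb(accounts, labels, vocab):
    """Fit a multinomial Naive Bayes model with add-one (Laplace) smoothing."""
    class_counts = Counter(labels)
    token_counts = {label: Counter() for label in class_counts}
    for tweets, label in zip(accounts, labels):
        token_counts[label].update(
            t for t in account_tokens(tweets) if t in vocab
        )
    model = {}
    for label, n_accounts in class_counts.items():
        total = sum(token_counts[label].values()) + len(vocab)
        log_prior = math.log(n_accounts / len(labels))
        log_likelihood = {
            t: math.log((token_counts[label][t] + 1) / total) for t in vocab
        }
        model[label] = (log_prior, log_likelihood)
    return model

def predict_nb(model, tweets):
    """Return the label with the highest posterior log-score."""
    tokens = account_tokens(tweets)
    scores = {}
    for label, (log_prior, log_likelihood) in model.items():
        # Tokens outside the selected vocabulary are ignored.
        scores[label] = log_prior + sum(
            log_likelihood[t] for t in tokens if t in log_likelihood
        )
    return max(scores, key=scores.get)

# Toy placeholder data, NOT taken from the paper's data set.
accounts = [
    ["spamword link link", "spamword follow"],
    ["link spamword"],
    ["news today weather"],
    ["weather news match"],
]
labels = ["abusive", "abusive", "legitimate", "legitimate"]

vocab = select_vocabulary(accounts, max_features=100)
model = train_nb(accounts, labels, vocab)
print(predict_nb(model, ["spamword link"]))
print(predict_nb(model, ["news weather"]))
```

In this sketch, `max_tweets` and `max_features` stand in for the two quantities the abstract varies (10 tweets and 100 features in the best-performing configuration). Real Arabic text would need proper tokenization and normalization, which this example does not attempt.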
Manuscript received March 12, 2015; revised June 9, 2015. The authors are with the Computer Science Department, George Mason University, Fairfax, VA 22030 USA (e-mail: eabozina@gmu.edu, ambaziir@gmu.edu, jjonesu@gmu.edu).

I. INTRODUCTION

Twitter is a microblogging service where users compose messages of no more than 140 characters. These messages are called tweets and may contain text, pictures, videos, or hyperlinks. Twitter usernames start with the prefix @. Users build their social networks through follower and following relationships. Tweets are posted on the timelines of the user and the user's followers and can be found through Twitter's search engine. A tweet can be forwarded to the user's followers by clicking "Retweet". A user can also reply to a tweet by including the username, prefixed by @, in the reply. Tweet topics can be indexed using hashtags. All hashtags are preceded by the hash (#) symbol and can also be searched through Twitter's search engine.

Since the 2011 Arab Spring, the number of Twitter users in Arab nations has been growing. Twitter has registered five million active users in Arab countries, who send on average 17 million tweets a day. Twitter, like other social media, is a popular medium for disseminating news and propaganda. Consequently, spammers are exploiting Twitter's popularity in the Middle East to disseminate malicious content. These malicious actors have opened Twitter accounts to launch spamming campaigns targeting Arabic speakers within the 22 nations in the Middle East. Some Arab nations have attempted, but failed, to censor Internet traffic in order to block malicious URLs and content from abusive social media accounts. These attempts have failed because spam detection tools trained on the English language are being applied to Arabic spam [4], [5].
Spammers are exploiting this loophole to launch successful spam campaigns.

In the meantime, the number of abusive accounts has been increasing over time by exploiting the simplicity of using emails as a verificati...