Abstract-Analysis of criminal social graph structures can yield valuable insights into how these communities are organized, for example how large and how centralized they currently are. While such analyses have been carried out in the past, we wanted to explore how to construct a large-scale social graph from a smaller set of leaked data that included only the criminals' email addresses. We begin by constructing a 43-thousand-node social graph from one thousand publicly leaked criminals' email addresses. This is done by locating Facebook profiles linked to these email addresses and scraping the public social graph from these profiles. We then perform a large-scale analysis of this social graph to identify profiles of high-ranking criminals, criminal organizations, and large-scale communities of criminals. Finally, we perform a manual analysis of these profiles, which results in the identification of many criminally focused public groups on Facebook. This analysis demonstrates the amount of information that can be gathered from limited data leaks.
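The abstract above does not say which ranking measure the authors used to identify high-ranking profiles. As a minimal sketch only, the following assumes degree centrality as a simple stand-in and runs it on hypothetical toy data; the names `build_graph`, `degree_centrality`, `top_ranked`, and the `seed_*`/`p*` identifiers are illustrative and do not come from the paper.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (profile, friend) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-links
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


def top_ranked(graph, k=3):
    """Return the k highest-scoring nodes, ties broken by name."""
    scores = degree_centrality(graph)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical toy data: two seed profiles and the friends visible from them.
edges = [
    ("seed_a", "p1"),
    ("seed_a", "p2"),
    ("seed_a", "p3"),
    ("seed_b", "p3"),
    ("seed_b", "p4"),
    ("p3", "p4"),
]
graph = build_graph(edges)
ranking = top_ranked(graph, k=2)
```

On real data a measure such as betweenness or PageRank might separate leaders from merely well-connected members more reliably; degree centrality is used here only because it is the simplest to state.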
Abstract-Twitter is one of the most popular sources for disseminating news and propaganda in the Arab region. Spammers are now creating abusive accounts to distribute adult content in Arabic tweets, which is prohibited by Arabic norms and cultures. Arab governments face a massive challenge in detecting these accounts. This paper evaluates different machine learning algorithms for detecting abusive accounts with Arabic tweets, using Naïve Bayes (NB), Support Vector Machine (SVM), and Decision Tree (J48) classifiers. We are not aware of any existing data set of abusive accounts with Arabic tweets, and this is the first study to investigate this issue. The data set for this analysis was collected based on the top five Arabic swear words. The results show that the Naïve Bayes (NB) classifier with 10 tweets and 100 features has the best performance, with a 90% accuracy rate.

Index Terms-Arabic text classification, machine learning, pornographic spam, social network abuse.

I. INTRODUCTION

Twitter is a microblogging service where users compose messages of no more than 140 characters. These messages are called tweets and may contain text, pictures, videos, or hyperlinks. Usernames on Twitter start with the prefix @. Twitter users create their social networks through follower and following relationships. Tweets are posted on the user's and the followers' timelines and can be found through Twitter's search engine. A tweet can be forwarded to the user's followers by clicking "Retweet". A tweet can also be replied to by including the username, prefixed by @, in the tweet. Tweet topics can be indexed using hashtags. All hashtags on Twitter are preceded by the hash (#) symbol and can also be searched through Twitter's search engine.

Since the 2011 Arab Spring, the number of Twitter users in Arab nations has been escalating. Twitter has registered five million active users in Arab countries, who send on average 17 million tweets a day.
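The tweet conventions described above (usernames prefixed with @, hashtags prefixed with #, a 140-character limit) can be turned into simple surface features. The sketch below is illustrative only: it is not the feature set used in the paper, and the function name `tweet_features` and the example tweet are hypothetical. It relies on Python's `\w`, which matches Unicode word characters, so Arabic usernames and hashtags are captured as well.

```python
import re

# Simplified patterns: "@" or "#" followed by one or more word characters.
# This is an approximation; for instance, it would also match the part of an
# email address that follows the "@".
MENTION_RE = re.compile(r"@(\w+)")
HASHTAG_RE = re.compile(r"#(\w+)")


def tweet_features(text):
    """Extract mentions, hashtags, and length from a single tweet."""
    return {
        "mentions": MENTION_RE.findall(text),
        "hashtags": HASHTAG_RE.findall(text),
        "length": len(text),
        "within_limit": len(text) <= 140,  # limit stated in the text above
    }


# Hypothetical example tweet.
example = tweet_features("Thanks @alice and @bob for the #news update")
```

Features like these would typically be combined with word-level features before being passed to a classifier.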
Twitter, like other social media, is a popular medium for disseminating news and propaganda. Consequently, spammers are exploiting Twitter's popularity in the Middle East to disseminate malicious content. These malicious actors have opened Twitter accounts to launch spamming campaigns targeting Arabic speakers within the 22 nations in the Middle East. Some of the Arab nations have attempted, but failed, to censor Internet traffic to block malicious URLs and content from abusive social media accounts. These attempts have failed because spam detection tools trained on the English language are being applied to Arabic spam [4], [5]. Spammers are exploiting this loophole to launch successful spam campaigns.

In the meantime, the number of abusive accounts has been increasing over time by exploiting the simplicity of using emails as a verificati...

Manuscript received March 12, 2015; revised June 9, 2015. The authors are with the Computer Science Department, George Mason University, Fairfax, VA 22030 USA (e-mail: eabozina@gmu.edu, ambaziir@gmu.edu, jjonesu@gmu.edu).