Abstract. Analyzing the behavior patterns of malicious users in social networks has important implications for detecting malicious pages, fraudsters, and financial fraud. Traditional anomaly detection techniques are generally based on classification algorithms that use content features and user behavior features. Such methods often suffer from low efficiency and difficult data acquisition, and they ignore network topology information. This paper proposes GBKD-Forest, an unsupervised anomaly detection algorithm based on network graph structure. We extract three types of structural features and, following the Bagging method, randomly sample features to build a forest of KD-Trees that isolates abnormal samples. Experimental evaluation shows that the proposed algorithm outperforms other graph-based anomaly detection algorithms and classical classification algorithms in terms of accuracy and AUC. Moreover, its time complexity is linear in the number of nodes and its space complexity is low, which makes it suitable for anomaly detection on large-scale network datasets.
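The abstract only outlines GBKD-Forest, so the following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes three hypothetical structural features per node (out-degree, in-degree, and the fraction of followed accounts that follow back), because the paper's three feature types are not specified here. It also swaps in the Isolation Forest scoring rule: each tree is grown on a random subsample of nodes and a random subset of features, split axes cycle through that subset in KD-tree fashion, but thresholds are drawn at random (not at the KD median), and a node's score is 2^(-E[h(x)]/c(ψ)), so nodes isolated after few splits score close to 1. The names `structural_features` and `gbkd_forest_scores` are hypothetical.

```python
import math
import random
from collections import defaultdict

EULER_GAMMA = 0.5772156649


def structural_features(edges):
    """Hypothetical structural features per node from a directed edge list:
    [out-degree, in-degree, fraction of out-neighbours that follow back]."""
    out_nb, in_nb = defaultdict(set), defaultdict(set)
    for u, v in edges:
        out_nb[u].add(v)
        in_nb[v].add(u)
    feats = {}
    for n in sorted(set(out_nb) | set(in_nb), key=str):
        out_d, in_d = len(out_nb[n]), len(in_nb[n])
        recip = len(out_nb[n] & in_nb[n]) / out_d if out_d else 0.0
        feats[n] = [float(out_d), float(in_d), recip]
    return feats


def c_factor(n):
    """Average path length of an unsuccessful BST search over n points (iForest normaliser)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, dims, depth, max_depth, rng):
    """Grow one isolation tree whose split axes cycle through `dims` as in a KD-tree."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    for k in range(len(dims)):
        d = dims[(depth + k) % len(dims)]  # KD-style axis choice, skipping constant axes
        lo, hi = min(p[d] for p in points), max(p[d] for p in points)
        if hi > lo:
            split = rng.uniform(lo, hi)  # random threshold (isolation), not the KD median
            left = [p for p in points if p[d] < split]
            right = [p for p in points if p[d] >= split]
            return ("node", d, split,
                    build_tree(left, dims, depth + 1, max_depth, rng),
                    build_tree(right, dims, depth + 1, max_depth, rng))
    return ("leaf", len(points))  # all sampled features constant: cannot split further


def path_length(tree, x, depth=0):
    """Depth at which x reaches a leaf, plus the expected depth of the unbuilt subtree."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, d, split, left, right = tree
    return path_length(left if x[d] < split else right, x, depth + 1)


def gbkd_forest_scores(feats, n_trees=100, sample_size=64, n_dims=2, seed=0):
    """Anomaly score in (0, 1] per node; higher means easier to isolate."""
    rng = random.Random(seed)
    points = list(feats.values())
    psi = min(sample_size, len(points))
    if psi < 2:
        raise ValueError("need at least two nodes")
    n_feat = len(points[0])
    max_depth = math.ceil(math.log2(psi))
    trees = []
    for _ in range(n_trees):
        sample = rng.sample(points, psi)  # bagging-style subsample of nodes
        dims = rng.sample(range(n_feat), min(n_dims, n_feat))  # random feature subset
        trees.append(build_tree(sample, dims, 0, max_depth, rng))
    norm = c_factor(psi)
    return {node: 2.0 ** (-sum(path_length(t, x) for t in trees) / n_trees / norm)
            for node, x in feats.items()}


if __name__ == "__main__":
    # Toy graph: 30 users who follow each other locally, plus one account
    # that follows all of them and is followed by nobody.
    edges = [(f"u{i}", f"u{(i + k) % 30}") for i in range(30) for k in (-2, -1, 1, 2)]
    edges += [("spammer", f"u{i}") for i in range(30)]
    scores = gbkd_forest_scores(structural_features(edges), n_trees=50, seed=1)
    print(max(scores, key=scores.get))  # spammer
```

In this sketch, scoring one node costs O(n_trees × ⌈log2 ψ⌉) regardless of graph size, so scoring all nodes grows linearly with the number of nodes, and memory is dominated by trees built on at most ψ sampled nodes each. That is the property the abstract's complexity claim relies on, although the paper's actual feature extraction and tree construction may differ.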
Introduction
With the rapid development of the Internet, people can contact each other through social networks and shop online. However, there are many risks behind the convenience of social media: fake reviews, fake followers, phishing websites, and telecommunications fraud. The complexity of user behavior makes it hard to predict and to detect anomalies. Existing anomalous behavior detection methods include content-based methods [1], behavior-feature-based methods [2], and graph-based methods [3].
Content-based suspicious-user detection is generally based on users' personal information and the content of the messages they publish: malicious users may publish spam ads, malicious links, or illegal content, whereas normal users generally do not. Behavior-feature-based detection focuses on user behavior; for social networks, these features include message sending time, number of tweets, comments and forwards, and online active time. Unlike content-based methods, behavior-based methods classify users according to how often content occurs rather than the content itself. The advantage of these two kinds of methods is their high prediction accuracy; their shortcomings are that the required data are hard to collect, analyze, and store, and that their computational complexity is very high.
Graph-based anomaly detection methods have attracted considerable attention internationally in recent years, and graph theory and machine learning methods perform well in this area. An outlier is an observation that differs so much from other observations as to arouse suspicion; in a (static or dynamic) graph, an outlier is a node, edge, or substructure that differs from the majority of other objects in the graph. The advantages of graphs include: (1) Strong representation of data: data such as who-follows-whom on Twitter, who-rates-what on Amazon, and who-likes-what on Facebook can be abstracted as directed/undirected, weighted/unweighted graphs. (2) Powerful representation of relationships and dependencies between objects: gra...