With the widespread use of social networks, spam messages targeting them have become a major issue. Spam detection methods can be broadly divided into expert-based and machine learning-based methods. When experts participate in spam detection, the detection accuracy is fairly high, but this approach is highly time-consuming and expensive. Conversely, methods using machine learning have the advantage of automation, but their accuracy is relatively low. This paper proposes a spam-detection framework that combines the advantages of both methods. To reduce the workload of the experts, all messages are first analyzed by a primary machine learning filter; those determined to be normal messages are allowed through, whereas suspicious messages are flagged. The flagged messages are subsequently analyzed by an expert to enhance the overall system accuracy. In the filtering process, cost-based machine learning is used to prevent the fatal error of misidentifying a spam message as a normal message. In addition, to keep pace with continuously evolving spam trends, the framework incorporates a module that periodically updates the training dataset with the expert-diagnosis results. Experiments conducted on an imbalanced dataset of spam tweets and normal tweets, in a ratio similar to that observed in real life, indicate that the proposed framework has a spam-detection rate of almost 92.8%, which is higher than that of the conventional machine learning technique. Furthermore, the proposed framework delivered stable, high performance even in an environment where social network messages changed continuously, unlike the conventional technique, which exhibited large performance deviations.

INDEX TERMS Expert decision making, machine learning, real-time spam detection, social network, Twitter spam.
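The two-stage flow summarized in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not name the classifier, the cost values, or the update schedule, so `score`, `expert`, `cost_fn`, `cost_fp`, and `new_labels` are hypothetical names, and the cost-based step is approximated here by a simple cost-sensitive decision threshold on a spam probability.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


def flag_threshold(cost_fn: float, cost_fp: float) -> float:
    """Spam probability above which flagging has lower expected cost.

    Flag when p * cost_fn > (1 - p) * cost_fp, i.e. p > cost_fp / (cost_fp + cost_fn).
    cost_fn: assumed cost of letting a spam message through (the "fatal" error).
    cost_fp: assumed cost of sending a normal message to an expert.
    """
    return cost_fp / (cost_fp + cost_fn)


@dataclass
class TwoStageFilter:
    # Hypothetical primary filter: maps a message to an estimated P(spam).
    score: Callable[[str], float]
    # Hypothetical expert oracle: maps a message to True if it is spam.
    expert: Callable[[str], bool]
    # Assumed costs; a large cost_fn lowers the threshold so more messages are flagged.
    cost_fn: float = 10.0
    cost_fp: float = 1.0
    # Expert verdicts collected for a later, periodic retraining step.
    new_labels: List[Tuple[str, bool]] = field(default_factory=list)

    def classify(self, message: str) -> bool:
        """Return True if the message is finally judged to be spam."""
        if self.score(message) <= flag_threshold(self.cost_fn, self.cost_fp):
            return False  # passed through as normal, no expert involved
        verdict = self.expert(message)  # flagged: defer to the expert
        self.new_labels.append((message, verdict))
        return verdict


# Toy stand-ins for the classifier and the expert, for illustration only.
def toy_score(message: str) -> float:
    return 0.5 if "free" in message.lower() else 0.01


def toy_expert(message: str) -> bool:
    return "money" in message.lower()


spam_filter = TwoStageFilter(score=toy_score, expert=toy_expert)
results = [
    spam_filter.classify("Lunch at noon?"),
    spam_filter.classify("Free money, click now"),
    spam_filter.classify("Free parking at the venue"),
]
```

In this sketch only the two messages containing "free" reach the expert, and both expert verdicts are stored in `new_labels`, which a periodic job could append to the training set before refitting the primary filter.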