A masquerader is someone who impersonates another user and operates a computer system with that user's privileged access. Masqueraders cause serious computer security problems. Anomaly detection is considered the best way to detect masqueraders, but because of its low detection rate and high error rate, the approach is still in the research phase. A number of methods have been investigated to improve detection accuracy, such as the Support Vector Machine (SVM), the Hidden Markov Model (HMM), and the Naïve Bayes (N. Bayes) classifier. In the present paper, we propose a method that integrates data mining and natural language processing, namely the N-Gram_Square root Term Frequency-Inverse Document Frequency (N-Gram_STF-IDF) method. In the proposed method, the sequences to be examined are segmented into N-grams, and abnormal users are then detected using an STF-IDF classifier. We perform experiments with the proposed method on the Schonlau and Greenberg data sets and compare the results with those obtained using various other methods.
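The pipeline described above, N-gram segmentation followed by an STF-IDF classifier, can be sketched in Python. The abstract does not give the exact weighting, so this sketch assumes that the "square root term frequency" is `sqrt(count)` of an n-gram in a user's training profile, multiplied by a standard IDF of `log(N / df)` computed across the N users' profiles. The function names (`ngrams`, `train_profiles`, `stf_idf_score`), the per-gram averaging, and the toy command histories are hypothetical and are not the authors' implementation.

```python
import math
from collections import Counter


def ngrams(commands, n):
    """Segment a command sequence into overlapping n-grams (tuples)."""
    return [tuple(commands[i:i + n]) for i in range(len(commands) - n + 1)]


def train_profiles(user_sessions, n):
    """Count n-grams per user, plus how many users' profiles contain each n-gram.

    Each user's training commands are treated as one "document" (an assumption).
    """
    counts = {user: Counter(ngrams(cmds, n)) for user, cmds in user_sessions.items()}
    doc_freq = Counter()
    for profile in counts.values():
        doc_freq.update(profile.keys())  # each distinct n-gram counted once per user
    return counts, doc_freq


def stf_idf_score(block, user, counts, doc_freq, n):
    """Average assumed STF-IDF weight of the block's n-grams in `user`'s profile.

    Assumed weight per n-gram: sqrt(count in profile) * log(N / df).
    N-grams never seen in the profile contribute 0; higher means more user-like.
    """
    n_users = len(counts)
    profile = counts[user]
    grams = ngrams(block, n)
    if not grams:
        return 0.0
    total = 0.0
    for g in grams:
        tf = profile.get(g, 0)
        if tf:
            total += math.sqrt(tf) * math.log(n_users / doc_freq[g])
    return total / len(grams)


# Toy usage with made-up command histories for two hypothetical users.
training = {
    "alice": ["ls", "cd", "ls", "vi", "make", "ls", "cd", "ls"],
    "bob": ["mail", "lpr", "mail", "ls", "mail", "lpr"],
}
counts, doc_freq = train_profiles(training, n=2)
own_block = ["ls", "cd", "ls", "vi"]
foreign_block = ["mail", "lpr", "mail"]
own_score = stf_idf_score(own_block, "alice", counts, doc_freq, n=2)
foreign_score = stf_idf_score(foreign_block, "alice", counts, doc_freq, n=2)
print(own_score, foreign_score)
```

In a detector built this way, a block whose score for the claimed user falls below a threshold would be flagged as a possible masquerade. The threshold would need to be tuned on data such as the Schonlau set and is not specified here.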
Abstract: A masquerader is someone who impersonates another user and operates a computer system with that user's privileged access. Masqueraders are difficult to detect with conventional techniques such as firewalls or misuse-based intrusion detection. Anomaly detection, which rests on the idea that significant departures from a user's normal behavior may indicate a masquerade, has been considered a promising approach to masquerade detection. However, because of its low detection accuracy and high false alarm rate, it is still at the research stage. Many methods have been proposed from different viewpoints, such as the Hidden Markov Model, Naive Bayes, and SVM. Compared with these theoretically well-founded methods, two intuitively designed statistical methods, the Customized Grammars method and the Self Signature approach combined with Uniqueness, have reported much better detection efficiency. Both methods rest on the intuitive notion that the more frequently the current user has employed a usage pattern in the past, the more that pattern indicates normal behavior. They differ in which usage patterns they count: the Customized Grammars method gathers statistics on sequential grammars, whereas the Self Signature approach combined with Uniqueness gathers them on individual commands and 2-grams. In this paper, these two methods are compared and evaluated on two benchmark data sets of Unix command sequences: the Schonlau data and the Greenberg data. From the results, the contributions of command frequency and command-sequence grammar to intrusion detection systems (IDS) are analyzed and clarified.
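The shared intuition described above, and the difference between counting commands and counting 2-grams, can be illustrated with a minimal Python sketch. It is neither the Customized Grammars method nor the Self Signature approach with Uniqueness. It simply scores a test block by the average relative frequency with which its commands (n = 1) or 2-grams (n = 2) appeared in the user's training data, with unseen patterns scoring zero. The function names and the command data are hypothetical.

```python
from collections import Counter


def pattern_frequencies(train, n):
    """Relative frequency of each n-gram in the training data (n=1: single commands)."""
    grams = [tuple(train[i:i + n]) for i in range(len(train) - n + 1)]
    total = len(grams)
    return {g: c / total for g, c in Counter(grams).items()}


def familiarity(block, freqs, n):
    """Mean training frequency of the block's n-grams; unseen patterns count as 0."""
    grams = [tuple(block[i:i + n]) for i in range(len(block) - n + 1)]
    if not grams:
        return 0.0
    return sum(freqs.get(g, 0.0) for g in grams) / len(grams)


# Made-up training history and test block for one hypothetical user.
train = ["ls", "cd", "ls", "vi", "make", "ls", "cd", "ls"]
test_block = ["cd", "ls", "cd", "vi"]

cmd_score = familiarity(test_block, pattern_frequencies(train, 1), 1)
bigram_score = familiarity(test_block, pattern_frequencies(train, 2), 2)
print(cmd_score, bigram_score)
```

Every command in the test block was seen in training, so each contributes to the command-level score. The transition `cd` then `vi`, however, never occurred in training and contributes nothing to the 2-gram score. Sequence statistics can therefore capture unusual orderings that command frequencies alone miss.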