Detection methods based on n-gram models have been widely studied for the identification of attacks and malicious software. These methods usually build on one of two learning schemes: anomaly detection, where a model of normality is constructed from n-grams, or classification, where a discrimination between benign and malicious n-grams is learned. Although successful in many security domains, previous work falls short of explaining why a particular scheme is used and more importantly what renders one favorable over the other for a given type of data. In this paper we provide a close look on n-gram models for intrusion detection. We specifically study anomaly detection and classification using n-grams and develop criteria for data being used in one or the other scheme. Furthermore, we apply these criteria in the scope of web intrusion detection and empirically validate their effectiveness with different learning-based detection methods for client-side and service-side attacks.
Hosts infected with malicious software, so called malware, are ubiquitous in today's computer networks. The means whereby malware can infiltrate a network are manifold and range from exploiting of software vulnerabilities to tricking a user into executing malicious code. Monitoring and detection of all possible infection vectors is intractable in practice. Hence, we approach the problem of detecting malicious software at a later point when it initiates contact with its maintainer; a process referred to as "phoning home". In particular, we introduce Botzilla, a method for detection of malware communication, which proceeds by repetitively recording network traffic of malware in a controlled environment and generating network signatures from invariant content patterns. Experiments conducted at a large university network demonstrate the ability of Botzilla to accurately identify malware communication in network traffic with very low false-positive rates.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.