Nowadays, information and communications technology systems are fundamental assets of our social and economic model, and thus they should be properly protected against the malicious activity of cybercriminals. Defence mechanisms are generally articulated around tools that trace and store information in several ways, the simplest one being the generation of plain text files known as security logs. Such log files are usually inspected, in a semi-automatic way, by security analysts to detect events that may affect system integrity, confidentiality and availability. On this basis, we propose a parameter-free method to detect security incidents from structured text regardless of its nature. We use the Normalized Compression Distance to obtain a set of features that can be used by a Support Vector Machine to classify events from a heterogeneous cybersecurity environment. In particular, we explore and validate the application of our method in four different cybersecurity domains: HTTP anomaly identification, spam detection, Domain Generation Algorithms tracking and sentiment analysis. The results obtained show the validity and flexibility of our approach in different security scenarios with a low configuration burden.
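For reference, the Normalized Compression Distance mentioned above is usually written as follows. This is the standard textbook definition, not a formula quoted from the abstract. Here C(·) denotes the length in bytes of the output of a chosen lossless compressor and xy is the concatenation of the two strings; the abstract does not say which compressor the authors use.

```latex
\mathrm{NCD}(x, y) = \frac{C(xy) - \min\{C(x),\, C(y)\}}{\max\{C(x),\, C(y)\}}
```

Values near 0 suggest that the two strings share most of their structure, and values near 1 suggest that they share little. With real compressors the value can slightly exceed 1.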
In cybersecurity, there is a call for adaptive, accurate and efficient procedures to identify performance shortcomings and security breaches. The increasing complexity of both Internet services and traffic creates a scenario that in many cases impedes the proper deployment of intrusion detection and prevention systems. Although it is common practice to monitor network and application activity, there is no general methodology to codify and interpret the recorded events. Moreover, this lack of methodology makes it harder to diagnose whether event detection and recording are adequately performed. As a result, there is an urgent need to construct general codification and classification procedures that can be applied to any type of security event in any activity log. This work is focused on defining such a method using the so-called normalized compression distance (NCD). NCD is parameter-free and can be applied to determine the distance between events expressed as strings. As a first step in the development of a methodology for the integral interpretation of security events, this work is devoted to the characterization of web logs. On the grounds of the NCD, we propose an anomaly-based procedure for identifying web attacks from web logs. Given a web query as stored in a security log, an NCD-based feature vector is created and classified using a support vector machine. The method is tested using the CSIC-2010 data set, and the results are analyzed with respect to similar proposals.
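A minimal sketch of how an NCD-based feature vector might be built for a logged web query. Several things here are assumptions rather than details from the abstract: the use of `zlib` as the compressor, the idea of measuring distance to a small set of reference strings, the helper names `ncd` and `ncd_features`, and the sample queries in `REFERENCES`, which are hypothetical.

```python
import zlib
from typing import List


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance between two strings."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)  # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def ncd_features(event: str, references: List[str]) -> List[float]:
    """One feature per reference string: the NCD from the event to it."""
    return [ncd(event, ref) for ref in references]


# Hypothetical reference queries, invented for illustration only.
REFERENCES = [
    "GET /index.jsp?id=12&lang=en HTTP/1.1",
    "GET /search.jsp?q=' OR '1'='1 HTTP/1.1",
]

features = ncd_features("GET /index.jsp?id=40&lang=es HTTP/1.1", REFERENCES)
print(features)
```

Vectors of this kind could then be passed to any support vector machine implementation for training and classification. How the reference strings are chosen is left open here, since the abstract does not describe it.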