“…The most common loss function in the reviewed publications is the Cross-Entropy (CE), in particular, the categorical cross-entropy for multi-class prediction [20], [57] or binary cross-entropy that only differentiates between the normal and anomalous class [61]. Other common loss functions include the Hyper-Sphere Objective Function (HS) where the distance to the center of a hyper-sphere represents the anomaly score [24], [39], [41], [62], the Mean Squared Error (MSE) that is used for regression [20], [27], [28], [47], [50], [53], [68], and the Kullback-Leibler Divergence (KL) and Marginal Likelihood (ML) that are useful to measure loss in probability distributions [49], [58].…”
Section: B Deep Learning Techniques
Citation type: mentioning, confidence: 99%
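As a minimal illustration of the loss families named in this snippet, the following PyTorch sketch (not taken from any of the surveyed papers) shows how a categorical cross-entropy over log keys, an MSE reconstruction loss, and a hyper-sphere distance objective might be expressed; the tensor shapes, the number of log keys, and the fixed hyper-sphere center are assumptions made only for the example.

```python
# Illustrative sketch of the three loss families mentioned above.
import torch
import torch.nn.functional as F

# Categorical cross-entropy: predict the index of the next log key (event type).
logits = torch.randn(8, 50)             # batch of 8 sequences, 50 known log keys
next_keys = torch.randint(0, 50, (8,))  # ground-truth next log key per sequence
ce_loss = F.cross_entropy(logits, next_keys)

# Mean squared error: reconstruction loss, e.g. of an autoencoder over count vectors.
original = torch.rand(8, 50)
reconstruction = torch.rand(8, 50)
mse_loss = F.mse_loss(reconstruction, original)

# Hyper-sphere objective: distance of an embedding to a center c; at detection
# time the distance itself serves as the anomaly score.
embeddings = torch.randn(8, 32)
center = torch.zeros(32)                # fixed (or learned) hyper-sphere center
hs_loss = ((embeddings - center) ** 2).sum(dim=1).mean()
```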
“…While such approaches draw less semantic information from the single tokens, they have the advantage of being more flexible as they rely on generally applicable heuristics rather than pre-defined parsers and are therefore widely applicable. Some approaches make use of a combination (COM) of parsing and token-based pre-processing strategies, in particular, by generating token vectors from parsed events rather than raw log lines [28], [38].…”
Section: Log Data Preparation
Citation type: mentioning, confidence: 99%
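The contrast between token-based and parser-based pre-processing, and the combined (COM) strategy, can be sketched as follows; the masking regex, the example template dictionary, and the HDFS-style sample line are illustrative assumptions, not the heuristics of any particular surveyed approach.

```python
import re

def tokenize_heuristically(log_line: str) -> list[str]:
    """Token-based pre-processing: generic heuristics only, no parser required.
    Numbers and hex identifiers are masked so tokens generalize across lines."""
    tokens = log_line.strip().split()
    return [re.sub(r"0x[0-9a-fA-F]+|\d+", "<*>", tok) for tok in tokens]

# Parser-based pre-processing would instead map the line to a known event
# template (log key), e.g. via a dictionary produced by a log parser.
templates = {"Received block <*> of size <*> from <*>": "E1"}  # hypothetical

line = "Received block blk_3587 of size 67108864 from 10.250.19.102"
print(tokenize_heuristically(line))

# A combined (COM) strategy generates token vectors from the parsed template
# rather than from the raw line, e.g. tokenize_heuristically(matched_template).
```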
“…3 shows the event count vector for time window T1 as [3, 1, 0, 0], indicating that three log lines corresponding to the first log key (E1) appeared in T1, in particular, the lines L1, L2, and L4. Besides frequencies, many other statistical properties (STAT) may be computed from event occurrences, such as the percentage of seasonal logs [77], the lengths of log messages [53], log activity rates [32], entropy-based scores for chunks of log lines [28], or the presence of sudden bursts in event occurrences [77].…”
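A minimal sketch of how such an event count vector could be derived, assuming log lines have already been mapped to (timestamp, log key) pairs; the window size, key set, and sample events below are chosen purely to reproduce the [3, 1, 0, 0] example.

```python
from collections import Counter

# Hypothetical input: (timestamp, log_key) pairs already extracted from the log.
events = [(1, "E1"), (2, "E1"), (3, "E2"), (5, "E1"), (12, "E3")]
log_keys = ["E1", "E2", "E3", "E4"]
window_size = 10  # seconds per time window

# Group events into fixed time windows and count occurrences of each log key.
windows: dict[int, Counter] = {}
for ts, key in events:
    windows.setdefault(ts // window_size, Counter())[key] += 1

# One event count vector per window, e.g. [3, 1, 0, 0] for the first window.
count_vectors = {w: [c[k] for k in log_keys] for w, c in windows.items()}
print(count_vectors)  # {0: [3, 1, 0, 0], 1: [0, 0, 1, 0]}
```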
Automatic log file analysis enables early detection of relevant incidents such as system failures. In particular, self-learning anomaly detection techniques capture patterns in log data and subsequently report unexpected log event occurrences to system operators without the need to provide or manually model anomalous scenarios in advance. Recently, an increasing number of approaches leveraging deep learning neural networks for this purpose have been presented. These approaches have demonstrated superior detection performance in comparison to conventional machine learning techniques and simultaneously resolve issues with unstable data formats. However, there exist many different architectures for deep learning, and it is non-trivial to encode raw and unstructured log data so that it can be analyzed by neural networks. We therefore carry out a systematic literature review that provides an overview of deployed models, data pre-processing mechanisms, anomaly detection techniques, and evaluations. The survey does not quantitatively compare existing approaches but instead aims to help readers understand relevant aspects of different model architectures and emphasizes open issues for future work.
“…VeLog, proposed by Qian et al. [42], achieves sequential modeling of execution paths and the number of execution times using variational autoencoders (VAE). Catillo et al. propose AutoLog [43], which models term-weightings with autoencoders.…”
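Both cited approaches build on (variational) autoencoders; the sketch below shows only the general reconstruction-error idea behind such models and is not the architecture of VeLog or AutoLog, whose inputs, layer sizes, and training procedures are described in the respective papers.

```python
import torch
import torch.nn as nn

class LogAutoencoder(nn.Module):
    """Minimal autoencoder over fixed-size log feature vectors
    (e.g. term-weighting or event count vectors)."""
    def __init__(self, dim: int, hidden: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, dim)

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = LogAutoencoder(dim=50)
x = torch.rand(4, 50)                      # hypothetical feature vectors
reconstruction = model(x)
# Reconstruction error per sample acts as the anomaly score: vectors that a
# model trained on normal data cannot reconstruct well are flagged.
anomaly_scores = ((x - reconstruction) ** 2).mean(dim=1)
```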
An enterprise today deploys multiple security middleboxes, such as firewalls, IDSs, and IPSs, in its network to collect different kinds of events related to threats and attacks. These events are streamed into a SIEM (Security Information and Event Management) system for analysts to investigate and respond to quickly with appropriate actions. However, the number of events collected for a single enterprise can easily run into hundreds of thousands per day, far more than analysts can investigate within a given time budget. In this work, we look into the problem of prioritizing suspicious events or anomalies for analysts to investigate further. We develop SIERRA, a system that processes event logs from multiple and diverse middleboxes to detect and rank anomalous activities. SIERRA takes an unsupervised approach and therefore has no dependence on ground-truth data. Unlike other works, SIERRA defines contexts that help it provide visual explanations of highly ranked anomalous points to analysts, despite employing unsupervised models. We evaluate SIERRA using months of logs from multiple security middleboxes of an enterprise network. The evaluations demonstrate the capability of SIERRA to detect top anomalies in a network while outperforming naive applications of existing anomaly detection algorithms as well as a state-of-the-art SIEM-based anomaly detection solution.
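The abstract does not spell out SIERRA's internal models, so the sketch below only illustrates the general pattern of unsupervised scoring and ranking of event windows for analysts; scikit-learn's IsolationForest and the synthetic feature matrix are stand-ins, not part of SIERRA.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical feature matrix: one row per aggregated event window, columns are
# statistics derived from middlebox logs (event counts, rates, etc.).
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 8))
features[:5] += 6.0  # inject a few obvious outliers for the example

model = IsolationForest(random_state=0).fit(features)
scores = -model.score_samples(features)   # higher score = more anomalous

# Present the top-k most anomalous windows to analysts, ordered by score.
top_k = np.argsort(scores)[::-1][:10]
print(top_k, scores[top_k])
```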