“…Transformers (TF) make use of so-called self-attention mechanisms to embed data instances into a vector space, where similar instances should be closer to each other than dissimilar ones [24], [33], [39]. The goal of Transformers is to assign weights to specific inputs according to the context in which they occur, such as words in sentences.…”
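To make the self-attention mechanism described in this excerpt concrete, here is a minimal sketch of scaled dot-product self-attention in Python. The toy dimensions and names (X, W_q, W_k, W_v) are illustrative assumptions, not taken from any of the cited papers.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """X: (seq_len, d_model) token embeddings; returns contextualized embeddings."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    # Attention weights: how strongly each position attends to every other position,
    # i.e., the context-dependent weighting of inputs described in the excerpt.
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V  # (seq_len, d_v) context-aware representations

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))                            # 5 "log events", embedding dim 8
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)          # (5, 8)
```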
“…For example, aggregation of logs in windows could require counting detected events as true positives as long as they are close enough to the actual anomaly in the event sequence [26], [57]. Since a majority of the reviewed papers rely on the HDFS data set, where labels are only available for whole event sessions rather than single events, the most common method to compute the aforementioned metrics relies on counting correctly and incorrectly identified anomalous and non-anomalous sessions [30], [33], [44], [45], [62].…”
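A minimal sketch of the session-level evaluation the excerpt describes, assuming each session is reduced to a (ground-truth label, detector verdict) pair; the counting rule and names are illustrative rather than taken from any specific paper.

```python
def session_metrics(sessions):
    """sessions: list of (is_anomalous_label, detector_flagged) boolean pairs,
    one per session; returns (precision, recall, f1) at session granularity."""
    tp = sum(1 for label, pred in sessions if label and pred)
    fp = sum(1 for label, pred in sessions if not label and pred)
    fn = sum(1 for label, pred in sessions if label and not pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Example: three sessions as (ground-truth label, detector verdict)
print(session_metrics([(True, True), (False, True), (True, False)]))
```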
Automatic log file analysis enables early detection of relevant incidents such as system failures. In particular, self-learning anomaly detection techniques capture patterns in log data and subsequently report unexpected log event occurrences to system operators, without the need to provide or manually model anomalous scenarios in advance. Recently, an increasing number of approaches leveraging deep learning neural networks for this purpose have been presented. These approaches have demonstrated superior detection performance in comparison to conventional machine learning techniques and simultaneously resolve issues with unstable data formats. However, many different deep learning architectures exist, and it is nontrivial to encode raw and unstructured log data so that it can be analyzed by neural networks. We therefore carry out a systematic literature review that provides an overview of deployed models, data pre-processing mechanisms, anomaly detection techniques, and evaluations. The survey does not quantitatively compare existing approaches but instead aims to help readers understand relevant aspects of different model architectures and emphasizes open issues for future work.
“…[77] also employed the transformer-encoder architecture to develop an unsupervised anomaly detection technique called A2Log. Other recent studies have utilized self-attention with different transformer variants for error and anomaly detection, such as LAnoBERT [49], LogAttention [25], and [48]. However, our model utilizes self-attention and a transformer neural network architecture to predict failures in HPC system components (nodes).…”
System failures, already frequent in current petascale systems, are expected to become even more frequent in the exascale era. The health of such systems is usually determined through challenging analysis of large amounts of unstructured and redundant log data. In this paper, we leverage log data and propose Clairvoyant, a novel self-supervised (i.e., no labels needed) model to predict node failures in HPC systems, based on the recent transformer-decoder deep learning architecture and the self-attention mechanism. Clairvoyant predicts node failures by (i) predicting a sequence of log events and then (ii) identifying whether a failure is part of that sequence. We carefully evaluate Clairvoyant and another state-of-the-art failure prediction approach, Desh, on two real-world system log datasets. Experiments show that Clairvoyant is significantly better: e.g., it predicts node failures with average Bleu, Rouge, and MCC scores of 0.90, 0.78, and 0.65, respectively, while Desh scores only 0.58, 0.58, and 0.25. More importantly, this improvement is achieved with faster training and prediction, Clairvoyant being about 25× and 15× faster than Desh, respectively.
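A rough sketch of the two-step idea summarized in this abstract: generate a window of future log events from a node's history, then check whether a failure-indicating event appears among them. The model interface (predict_next_event) and the set FAILURE_EVENT_IDS are assumptions for illustration, not the paper's actual API.

```python
from typing import Sequence, Set

# Assumed set of event-template IDs known to indicate node failure (hypothetical).
FAILURE_EVENT_IDS: Set[int] = {42, 97}

def predict_node_failure(model, history: Sequence[int], horizon: int = 10) -> bool:
    """(i) Autoregressively generate the next `horizon` log events from the
    node's event history, (ii) report a failure if any generated event is
    failure-indicating. `model.predict_next_event` is a hypothetical stand-in
    for the paper's transformer-decoder."""
    sequence = list(history)
    for _ in range(horizon):
        next_event = model.predict_next_event(sequence)  # hypothetical call
        if next_event in FAILURE_EVENT_IDS:
            return True
        sequence.append(next_event)
    return False
```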