System Log Detection Model Based on Conformal Prediction

Ren, Yitong; Gu, Zhaojun; Zhi, Wang; Tian, Zhihong; Liu, Chunbo; Lu, Hui; Du, Xiaojiang; Guizani, Mohsen

doi:10.3390/electronics9020232

Cited by 8 publications (2 citation statements); references 38 publications

“…When unstable logs occur, the experience with the new log can be updated to the previous experience without retraining the model, better mitigating the impact of unstable log problems. Conformal prediction [20] provides a statistical p value that could be used to calculate confidence and guide the algorithm to make decisions or evaluations. Adding the new learned experience into the algorithm decision will effectively mitigate the impact of unstable logs.…”

Section: Limitations of Existing Work (mentioning; confidence: 99%)

LogCAD: An Efficient and Robust Model for Log-Based Conformal Anomaly Detection

Liu, Liang, Hou, et al. (2022)

Security and Communication Networks

Log files are usually semistructured files that record the historical operation of systems or devices. Researchers analyze logs to find anomalies and thereby identify system faults and cyberattacks. Traditional classification-based methods, especially deep learning methods, handle static log anomaly detection effectively. However, on dynamic, unstable logs caused by concept drift and noise, their performance drops significantly and false positives become frequent. Retraining the model is one way to address log instability, but it greatly increases the computational cost of deep learning models. Log-based conformal anomaly detection (LogCAD) builds a confidence evaluation mechanism over multiple labels and achieves good detection results through collaborative decisions among multiple weak classifiers, without deep learning. Moreover, LogCAD extends easily to dynamic, unstable logs: it incrementally updates the trained model with the conformal detection results of new samples. Experimental results show that LogCAD achieves excellent detection results on both dynamic unstable logs and static stable logs. Compared with LogRobust and other deep learning models, it is more efficient and more widely applicable.
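
The citation statement above says conformal prediction supplies a p-value from which confidence can be computed. The LogCAD abstract adds that the model is updated incrementally with the conformal results of new samples instead of being retrained. The sketch below illustrates both ideas under several assumptions. Samples are numeric feature vectors (for example, log-event count vectors). The nonconformity score is the mean distance to the k nearest normal training vectors. Several such scorers with different k stand in for the "multiple weak classifiers", and their p-values are fused by averaging. The class `ConformalAnomalyDetector`, the kNN score and the averaging rule are hypothetical choices made for illustration. They are not LogCAD's published design.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def knn_score(x: Vector, reference: List[Vector], k: int) -> float:
    """Nonconformity score: mean Euclidean distance from x to its k nearest reference vectors."""
    dists = sorted(math.dist(x, r) for r in reference)
    return sum(dists[:k]) / k


class ConformalAnomalyDetector:
    """Hypothetical sketch of conformal anomaly detection with incremental updates.

    Assumption: each "weak scorer" is a kNN nonconformity measure with a different k,
    and their p-values are fused by simple averaging (not necessarily LogCAD's rule).
    """

    def __init__(self, normal_train: List[Vector], normal_calib: List[Vector],
                 ks: Sequence[int] = (1, 3, 5)) -> None:
        self.train = list(normal_train)
        if max(ks) > len(self.train):
            raise ValueError("every k must be at most the number of training vectors")
        self.ks = tuple(ks)
        # Calibration nonconformity scores, kept separately for each weak scorer.
        self.calib: Dict[int, List[float]] = {
            k: [knn_score(x, self.train, k) for x in normal_calib] for k in self.ks
        }

    def p_value(self, x: Vector, k: int) -> float:
        """Conformal p-value: share of calibration scores >= the score of x (x itself counted)."""
        a = knn_score(x, self.train, k)
        scores = self.calib[k]
        return (sum(s >= a for s in scores) + 1) / (len(scores) + 1)

    def fused_p_value(self, x: Vector) -> float:
        # Assumed fusion rule: arithmetic mean of the weak scorers' p-values.
        return sum(self.p_value(x, k) for k in self.ks) / len(self.ks)

    def predict(self, x: Vector, epsilon: float = 0.05) -> bool:
        """Flag x as anomalous when the fused p-value falls below the significance level."""
        return self.fused_p_value(x) < epsilon

    def update(self, x: Vector, epsilon: float = 0.05) -> bool:
        """Classify x, then extend the calibration scores if it looks normal (no retraining)."""
        is_anomaly = self.predict(x, epsilon)
        if not is_anomaly:
            # Assumption: only samples judged normal enter the calibration set.
            for k in self.ks:
                self.calib[k].append(knn_score(x, self.train, k))
        return is_anomaly
```

Averaging p-values is a heuristic. The mean of valid p-values is not guaranteed to be valid, although twice the mean is. With n calibration scores, the smallest attainable p-value is 1/(n + 1). A sample can therefore be flagged at significance ε only once the calibration set holds more than 1/ε - 1 scores, which means at least 20 for ε = 0.05.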

“…At present, the most popular probability prediction algorithms are conformal predictor and Venn-Abers predictor. The conformal predictor gives p value as an estimate of prediction reliability under confidence [19], but that is not a direct probability. The paper is aimed at introducing an algorithm that converts the results of the conformal predictor into probabilities and giving estimates of the probabilities of the predicted results, which makes the results more intuitive.…”

Section: Introduction (mentioning; confidence: 99%)

Valid Probabilistic Anomaly Detection Models for System Logs

Liu, Pan, et al. (2020)

Wireless Communications and Mobile Computing

Self Cite

System logs record the system status and important events during operation in detail, and detecting anomalies in them is a common practice in modern large-scale distributed systems. Yet the threshold-based classification models used for anomaly detection output only one of two values, normal or abnormal, without any probability estimate of whether the prediction is correct. In this paper, the Venn-Abers predictor, a statistical learning algorithm, is adopted to evaluate the confidence of prediction results in system log anomaly detection. It computes the probability distribution of labels for a set of samples and, to some extent, provides a quality assessment of the predicted labels. Two Venn-Abers predictors, LR-VA and SVM-VA, are implemented on top of Logistic Regression and Support Vector Machine, respectively. Then, taking the differences among these algorithms into account, a multimodel fusion algorithm is built by Stacking, and a Venn-Abers predictor based on it, called Stacking-VA, is implemented. The performance of four types of algorithms (unimodel, Venn-Abers predictor based on a unimodel, multimodel, and Venn-Abers predictor based on a multimodel) is compared in terms of validity and accuracy. Experiments are carried out on a log dataset from the Hadoop Distributed File System (HDFS). In the unimodel comparisons, LR-VA and SVM-VA are more valid than their respective underlying models. Compared with the underlying models, the SVM-VA predictor achieves better accuracy than the LR-VA predictor, and, more significantly, the recall rate increases from 81% to 94%. In the multimodel experiments, the Stacking-based multimodel fusion algorithm is significantly superior to the underlying classifiers. The average accuracy of Stacking-VA exceeds 0.95, and its predictions are more stable than those of LR-VA and SVM-VA. Experimental results show that the Venn-Abers predictor is a flexible tool for making accurate and valid probability predictions in system log anomaly detection.
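
The abstract describes LR-VA, SVM-VA and Stacking-VA as Venn-Abers predictors layered on an underlying classifier. The sketch below shows the inductive Venn-Abers step. It assumes the underlying model (logistic regression, an SVM, or a Stacking ensemble) has already produced real-valued scores for a held-out calibration set. The function names and the label encoding are hypothetical. The merging rule p = p1 / (1 - p0 + p1) is one commonly used way to collapse the interval [p0, p1] into a single probability, and the paper may merge the two values differently.

```python
from typing import Dict, List, Sequence, Tuple


def isotonic_values(xs: Sequence[float], ys: Sequence[float]) -> Dict[float, float]:
    """Fit a nondecreasing step function with pool-adjacent-violators (PAV).

    Returns a mapping from each distinct x to its fitted value.
    """
    # Aggregate tied x values: distinct x -> (sum of y, count).
    agg: Dict[float, Tuple[float, int]] = {}
    for x, y in zip(xs, ys):
        s, c = agg.get(x, (0.0, 0))
        agg[x] = (s + y, c + 1)
    keys = sorted(agg)
    # Each block holds [sum of y, total weight, number of distinct x values pooled].
    blocks: List[list] = []
    for x in keys:
        s, c = agg[x]
        blocks.append([s, c, 1])
        # Pool backwards while the previous block's mean exceeds the newest block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s2, c2, n2 = blocks.pop()
            blocks[-1][0] += s2
            blocks[-1][1] += c2
            blocks[-1][2] += n2
    fitted: Dict[float, float] = {}
    start = 0
    for s, c, n in blocks:
        for x in keys[start:start + n]:
            fitted[x] = s / c
        start += n
    return fitted


def venn_abers(cal_scores: Sequence[float], cal_labels: Sequence[int],
               test_score: float) -> Tuple[float, float, float]:
    """Inductive Venn-Abers prediction for one test score.

    cal_scores are the underlying classifier's scores on a held-out calibration set;
    cal_labels are their true labels (assumed encoding: 1 = anomaly, 0 = normal).
    Returns (p0, p1, p): [p0, p1] is the multiprobability interval for label 1 and
    p = p1 / (1 - p0 + p1) merges it into a single probability.
    """
    probs = []
    for hypothetical_label in (0, 1):
        # Append the test object with each possible label and refit isotonic regression.
        fit = isotonic_values(list(cal_scores) + [test_score],
                              list(cal_labels) + [hypothetical_label])
        probs.append(fit[test_score])
    p0, p1 = probs
    return p0, p1, p1 / (1 - p0 + p1)
```

Isotonic regression is monotone in its targets, so p0 never exceeds p1 and the merged p always lies between them. A wide interval p1 - p0 means the calibration data say little about scores in that region. The interval tends to narrow as the calibration set grows.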