Automatic Hyperparameter Tuning Method for Local Outlier Factor, with Applications to Anomaly Detection

Xu, Zhengyu; Kakde, Deovrat; Chaudhuri, Arin

doi:10.1109/bigdata47090.2019.9006151

Cited by 37 publications

(12 citation statements)

References 34 publications

“…Ensemble based methods: The first ensemble learning approach to outlier detection runs on LOF when they are learned with different sets of hyperparameters such that the resultant combination is the anomaly scores [36]. Isolation Forest (IF) is another ensemble-based algorithm that builds a forest of random binary trees such that anomalous instances have short average path lengths on the trees [21; 22; 14].…”

Section: Related Work (mentioning)

confidence: 99%

Time Series Anomaly Detection with label-free Model Selection

Jung, Ramanan, Amjadi, et al. 2021

Preprint

Anomaly detection for time-series data has become an essential task for many data-driven applications, fueled by an abundance of data and out-of-the-box machine-learning algorithms. In many real-world settings, developing a reliable anomaly model is highly challenging due to insufficient anomaly labels and the prohibitively expensive cost of obtaining anomaly examples. This imposes a significant bottleneck on reliably evaluating model quality for model selection and parameter tuning. As a result, many existing anomaly detection algorithms fail to show their promised performance after deployment. In this paper, we propose LaF-AD, a novel anomaly detection algorithm with label-free model selection for unlabeled time-series data. Our proposed algorithm performs fully unsupervised ensemble learning across a large number of candidate parametric models. We develop a model variance metric that quantifies the sensitivity of anomaly probability with a bootstrapping method. The algorithm then makes a collective decision on anomaly events across model learners using the model variance. Our algorithm is easily parallelizable, more robust for ill-conditioned and seasonal data, and highly scalable for a large number of anomaly models. We evaluate our algorithm against other state-of-the-art methods on a synthetic domain and a benchmark public data set. Preprint. Under review.

“…The performance of LOF is highly dependent on the values of contamination and n_neighbors [29]. We set the value of n_neighbors to 20, which is the default value of the utilized Machine Learning algorithm [27] and defines the number of neighbors that need to be taken into consideration to detect the outliers.…”

Section: Testing the Model (mentioning)

confidence: 99%

“…The performance of an LOF algorithm depends on its parameters' values, contamination and neighborhood size [29]. When experimenting with synthetic data with the known anomaly portion, contamination value and neighborhood size can be tuned based on this known anomaly portion data and report better results.…”

Section: Testing the Model (mentioning)

confidence: 99%

“…When experimenting with synthetic data with the known anomaly portion, contamination value and neighborhood size can be tuned based on this known anomaly portion data and report better results. In a real-world scenario, these parameters can be tuned based on historic evidence of malicious insiders or use the Xu et al [29] methodology for automatic tuning Local Outlier Factor's hyperparameters.…”

Section: Testing the Model (mentioning)

confidence: 99%
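
The statements above say that LOF results depend on the neighborhood size and the contamination value. The following is a minimal pure-Python sketch of that dependence, not the method of Xu et al. or of any particular library: `lof_scores`, `flag_outliers`, and the toy grid data are hypothetical names and inputs chosen for illustration, and `n_neighbors` and `contamination` simply mirror the parameter names used in the quoted text. It assumes the input contains no duplicate points.

```python
import math

def k_nearest(points, i, k):
    """Return (k-distance, neighbor indices) of point i; ties at the k-distance are kept."""
    dists = sorted((math.dist(points[i], q), j) for j, q in enumerate(points) if j != i)
    k_dist = dists[k - 1][0]
    return k_dist, [j for d, j in dists if d <= k_dist]

def lof_scores(points, n_neighbors):
    """Local Outlier Factor of every point; values well above 1 suggest an outlier."""
    n = len(points)
    info = [k_nearest(points, i, n_neighbors) for i in range(n)]

    def local_reach_density(i):
        neighbors = info[i][1]
        # Reachability distance of i from j: max(k-distance of j, d(i, j)).
        reach = [max(info[j][0], math.dist(points[i], points[j])) for j in neighbors]
        return len(neighbors) / sum(reach)

    lrd = [local_reach_density(i) for i in range(n)]
    # LOF is the mean density of the neighbors divided by the point's own density.
    return [sum(lrd[j] for j in info[i][1]) / (len(info[i][1]) * lrd[i]) for i in range(n)]

def flag_outliers(scores, contamination):
    """Flag the top `contamination` fraction of points by LOF score (at least one point)."""
    n_flag = max(1, round(contamination * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_flag])

# Toy data (an assumption for illustration): a 5x5 unit grid plus one isolated point.
points = [(float(x), float(y)) for x in range(5) for y in range(5)] + [(10.0, 10.0)]

for k in (3, 5):
    scores = lof_scores(points, n_neighbors=k)
    print(k, flag_outliers(scores, contamination=0.04), round(max(scores), 2))
```

In this sketch `contamination` only sets how many of the highest-scoring points are flagged, while `n_neighbors` changes the scores themselves, which is why the two are usually tuned together.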

Mitigating Insider Threats Using Bio-Inspired Models

2020

Insider threats have become a considerable information security issue that governments and organizations must face. The implementation of security policies and procedures may not be enough to protect organizational assets. Even with the evolution of information and network security technology, the threat from insiders is increasing. Many researchers are approaching this issue with various methods in order to develop a model that will help organizations to reduce their exposure to the threat and prevent damage to their assets. In this paper, we approach the insider threat problem and attempt to mitigate it by developing a machine learning model based on Bio-inspired computing. The model was developed by using an existing unsupervised learning algorithm for anomaly detection and we fitted the model to a synthetic dataset to detect outliers. We explore swarm intelligence algorithms and their performance on feature selection optimization for improving the performance of the machine learning model. The results show that swarm intelligence algorithms perform well on feature selection optimization and the generated, near-optimal, subset of features has a similar performance to the original one.

“…All of above LOF-based algorithms determine whether the data point is an outlier by examining the outlier factor of the points in K -distance neighborhood, and they usually suffer from the high time overhead problem. In order to overcome this problem, various solutions have been proposed, including POLOF algorithm (Optimized Pruning-based Outlier Detecting algorithm), NLOF algorithm, IncLOF algorithm (Incremental Local Outlier Factor), INFLOF algorithm (Influenced Local Outlier Factor) and so on [29]- [31]. These algorithms combined clustering algorithm with the outlier detection algorithms to achieve better performance.…”

Section: Introduction (mentioning)

confidence: 99%

An Outlier Detection Approach Based on Improved Self-Organizing Feature Map Clustering Algorithm

et al. 2019

The Local Outlier Factor (LOF) outlier detection algorithm has good accuracy in detecting global and local outliers. However, the algorithm needs to traverse the entire dataset when calculating the local outlier factor of each data point, which adds extra time overhead and makes execution inefficient. In addition, if the K-distance neighborhood of an outlier point P contains some outliers that the algorithm incorrectly judges to be normal points, then P may be misidentified as a normal point. To solve these problems, this paper proposes a Neighbor Entropy Local Outlier Factor (NELOF) outlier detection algorithm. First, we improve the Self-Organizing Feature Map (SOFM) algorithm and use the optimized SOFM clustering algorithm to cluster the dataset, so that each data point's local outlier factor only needs to be calculated inside its small cluster. Second, this paper replaces the K-distance neighborhood with the relative K-distance neighborhood and uses the entropy of the relative K neighborhood to redefine the local outlier factor, which improves the accuracy of outlier detection. Experimental results confirm that our optimized SOFM algorithm can avoid the random selection of neurons and improve the clustering effect of the traditional SOFM algorithm. In addition, the proposed NELOF algorithm outperforms the LOF algorithm in both accuracy and execution time of outlier detection.
