The first successful isolation-based anomaly detector, iForest, uses trees to perform isolation. Although it has been shown to have advantages over existing anomaly detectors, we have identified four weaknesses: its inability to detect local anomalies, anomalies with a high percentage of irrelevant attributes, anomalies that are masked by axis-parallel clusters, and anomalies in multimodal data sets. To overcome these weaknesses, this paper shows that an alternative isolation mechanism is required and presents iNNE (isolation using Nearest Neighbor Ensemble). Although it relies on nearest neighbors, iNNE runs significantly faster than existing nearest neighbor-based methods such as the local outlier factor, especially on data sets with thousands of dimensions or millions of instances. This is because the proposed method has linear time complexity and constant space complexity.

KEYWORDS: anomaly detection, ensemble learning, isolation-based, nearest neighbor, outlier detection

Bandaragoda et al., Computational Intelligence. 2018;34:968-998.

INTRODUCTION

Anomaly detection is an important data mining task with a diverse range of applications in various domains [1,2]. The explosive growth of databases in both size and dimensionality challenges anomaly detection methods in two important respects: the requirement of low computational cost and susceptibility to issues in high-dimensional data sets. Efficient methods are required in time-critical applications such as network intrusion detection and credit card fraud detection. However, the time complexity of most existing methods is on the order of $O(n^2)$ (where $n$ is the data set size), which is prohibitively expensive for large data sets. Therefore, efficient and scalable methods for large data sets are highly desirable.

iForest [3] is a unique anomaly detector because it uses an isolation mechanism to detect anomalies.
iForest isolates each instance from the rest of the instances through recursive axis-parallel subdivisions; instances that can be easily isolated are likely to be anomalies. The key advantage of iForest is its linear execution time, which makes it extremely efficient compared with other methods and thus a very attractive option for large data sets. iForest has been shown [3,4] to have better detection accuracy and faster runtime than many state-of-the-art methods, including the local outlier factor (LOF) [5] and ORCA, a distance-based outlier detector [6]. Despite these advantages, our investigation finds that the current isolation mechanism has weaknesses in detecting the following four types of anomalies.

1. Local anomalies: iForest uses a global anomaly score that is not sensitive to the local data distribution of a data set.
2. Anomalies with low relevant dimensions: In high-dimensional data, iForest can only utilize a subset of the dimensions to create isolation trees. Each subset does not usually contain sufficient relevant dimensions to detect anomalies when the number of relevant dimens...
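The isolation mechanism described above (recursive axis-parallel subdivision, where easily isolated points are likely anomalies) can be sketched in a few dozen lines. This is a minimal illustration of the general iForest recipe as we understand it, not the authors' code and not iNNE. The details it relies on come from the original iForest formulation rather than from this excerpt: a random attribute and a random cut between that attribute's min and max, a subsample of size psi per tree, a height limit of ceil(log2 psi), and the score 2^(-E[h(x)]/c(psi)). The function names and the defaults `n_trees=100`, `psi=64` are illustrative assumptions.

```python
import math
import random
from typing import List, Optional, Sequence

Point = Sequence[float]


def c_factor(n: int) -> float:
    """Average path length of an unsuccessful BST search over n points,
    c(n) = 2 H(n-1) - 2 (n-1) / n, used to normalise path lengths."""
    if n <= 1:
        return 0.0
    harmonic = sum(1.0 / i for i in range(1, n))  # exact H(n-1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


class Node:
    """Internal node (attr/split set) or external node (attr is None)."""

    def __init__(self, size: int, attr: Optional[int] = None, split: float = 0.0,
                 left: "Optional[Node]" = None, right: "Optional[Node]" = None):
        self.size, self.attr, self.split = size, attr, split
        self.left, self.right = left, right


def build_tree(data: List[Point], depth: int, max_depth: int, rng: random.Random) -> Node:
    # Stop once a point is isolated or the height limit is reached.
    if len(data) <= 1 or depth >= max_depth:
        return Node(len(data))
    # Only attributes that still vary can separate the points.
    ranges = [(min(p[q] for p in data), max(p[q] for p in data)) for q in range(len(data[0]))]
    candidates = [q for q, (lo, hi) in enumerate(ranges) if lo < hi]
    if not candidates:  # all remaining points are identical
        return Node(len(data))
    q = rng.choice(candidates)       # random attribute ...
    split = rng.uniform(*ranges[q])  # ... and a random axis-parallel cut
    left = [p for p in data if p[q] < split]
    right = [p for p in data if p[q] >= split]
    return Node(len(data), q, split,
                build_tree(left, depth + 1, max_depth, rng),
                build_tree(right, depth + 1, max_depth, rng))


def path_length(x: Point, node: Node, depth: int = 0) -> float:
    # Points left unresolved in an external node get the expected extra depth c(size).
    if node.attr is None:
        return depth + c_factor(node.size)
    child = node.left if x[node.attr] < node.split else node.right
    return path_length(x, child, depth + 1)


def anomaly_scores(data: List[Point], queries: List[Point],
                   n_trees: int = 100, psi: int = 64, seed: int = 0) -> List[float]:
    """Scores in (0, 1]; values close to 1 mean 'easy to isolate'."""
    rng = random.Random(seed)
    psi = min(psi, len(data))
    if psi < 2:
        raise ValueError("need at least two training points")
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    scores = []
    for x in queries:
        mean_h = sum(path_length(x, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_h / c_factor(psi)))
    return scores


if __name__ == "__main__":
    gen = random.Random(42)
    cluster = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(256)]
    data = cluster + [(8.0, 8.0)]
    # The far-away point should receive the higher score.
    print(anomaly_scores(data, [(0.0, 0.0), (8.0, 8.0)]))
```

Because every tree is built from a small fixed-size subsample, training cost does not grow with the data set size, which is where iForest's linear overall runtime comes from. The single global threshold on this score is also what makes the sketch insensitive to local density, the first weakness listed above.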
In this paper, we propose a new machine learning model for classifying nocturnal awakenings in acute insomnia and normal sleep. The model does not require sleep diaries or any other subjective information from the individuals who took part in the study. It is based on nocturnal actigraphy collected from pre-medicated individuals with acute insomnia and from normal-sleep controls. We derived dynamical and statistical features from the actigraphy time series data and combined them using two machine learning techniques, namely Random Forest (RF) and Support Vector Machine (SVM). RF performs better (84% accuracy) than SVM (73%) in distinguishing individuals with insomnia from healthy sleepers. The developed model provides a signature of acute insomnia obtained from actigraphy alone and is very promising as a tool to detect the condition non-invasively, without sleep diaries or any other subjective information.
INDEX TERMS: Acute insomnia, actigraphy, machine learning, insomnia detection, dynamical features.
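The excerpt says the model derives "dynamical and statistical features" from actigraphy but does not list them. The sketch below shows what such a per-night feature vector could look like, assuming the input is one activity count per epoch. The function name and every feature in it (mean, spread, lag-1 autocorrelation, active fraction, rest-to-movement onsets) are hypothetical choices for illustration, not the authors' feature set.

```python
import statistics
from typing import Dict, Sequence


def actigraphy_features(counts: Sequence[float]) -> Dict[str, float]:
    """Hypothetical per-night features from epoch-level activity counts
    (one count per epoch, e.g. per minute). Not the authors' feature set."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two epochs")
    mean = statistics.fmean(counts)
    sd = statistics.pstdev(counts)
    # Statistical features: level and spread of movement during the night.
    features = {"mean": mean, "sd": sd}
    # Dynamical feature: lag-1 autocorrelation, i.e. how strongly activity
    # in one epoch carries over into the next.
    if sd > 0:
        cov = sum((counts[i] - mean) * (counts[i + 1] - mean) for i in range(n - 1)) / n
        features["lag1_autocorr"] = cov / (sd * sd)
    else:
        features["lag1_autocorr"] = 0.0
    # Fraction of epochs with any movement, and the number of rest-to-movement
    # transitions (a crude proxy for awakenings).
    features["active_fraction"] = sum(1 for c in counts if c > 0) / n
    features["movement_onsets"] = float(
        sum(1 for i in range(1, n) if counts[i - 1] == 0 and counts[i] > 0)
    )
    return features


if __name__ == "__main__":
    print(actigraphy_features([0, 0, 5, 5, 0, 0, 5, 5]))
```

In a setup like the one described, each night's dictionary would become one row of the feature matrix passed to the RF and SVM classifiers (for example scikit-learn's `RandomForestClassifier` and `SVC`); that training step is not shown here.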
Most density-based clustering methods have difficulty detecting clusters of vastly different densities in a dataset. A recent density-based clustering algorithm, CFSFDP, appears to have mitigated the issue. However, by formalising the condition under which it fails, we reveal that CFSFDP still has the same issue. To address it, we propose a new measure called Local Contrast as an alternative to density for finding cluster centers and detecting clusters. We then apply Local Contrast to CFSFDP to create a new clustering method, LC-CFSFDP, which is robust in the presence of varying densities. Our empirical evaluation shows that LC-CFSFDP outperforms CFSFDP and three other state-of-the-art variants of CFSFDP.
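The excerpt names CFSFDP and Local Contrast but defines neither, so the sketch below rests on stated assumptions. From CFSFDP it takes the idea that a cluster centre scores high on a local measure and is far (distance delta) from any point scoring higher, with candidates ranked by the product gamma = score × delta. Local Contrast is assumed here to be the number of a point's k nearest neighbours that are less dense than it, and density is a Gaussian-kernel estimate with bandwidth `dc`. Both of those definitions, and all function names, are hypothetical choices for illustration; the paper's exact formulation may differ.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def pairwise_distances(points: List[Point]) -> List[List[float]]:
    return [[math.dist(p, q) for q in points] for p in points]


def gaussian_density(dist: List[List[float]], dc: float) -> List[float]:
    # rho_i = sum over j != i of exp(-(d_ij / dc)^2): a smooth density estimate.
    n = len(dist)
    return [sum(math.exp(-(dist[i][j] / dc) ** 2) for j in range(n) if j != i)
            for i in range(n)]


def local_contrast(dist: List[List[float]], rho: List[float], k: int) -> List[int]:
    # Assumed definition: how many of i's k nearest neighbours are less dense than i.
    # It depends only on density ranks within a neighbourhood, not absolute density.
    n = len(dist)
    lc = []
    for i in range(n):
        neighbours = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        lc.append(sum(1 for j in neighbours if rho[i] > rho[j]))
    return lc


def delta(dist: List[List[float]], score: Sequence[float]) -> List[float]:
    # CFSFDP-style delta: distance to the nearest point with a strictly higher score;
    # points with no higher-scoring point get their largest distance instead.
    n = len(dist)
    out = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if score[j] > score[i]]
        out.append(min(higher) if higher else max(dist[i]))
    return out


def rank_centres(points: List[Point], dc: float, k: int) -> List[int]:
    """Indices ordered by gamma = LC * delta, best centre candidates first."""
    dist = pairwise_distances(points)
    lc = local_contrast(dist, gaussian_density(dist, dc), k)
    d = delta(dist, lc)
    gamma = [c * di for c, di in zip(lc, d)]
    return sorted(range(len(points)), key=lambda i: gamma[i], reverse=True)


if __name__ == "__main__":
    dense = [(-2.0,), (-1.0,), (0.0,), (1.0,), (2.0,)]         # spacing 1
    sparse = [(96.0,), (98.0,), (100.0,), (102.0,), (104.0,)]  # spacing 2
    print(rank_centres(dense + sparse, dc=2.0, k=2)[:2])       # expected: [2, 7]
```

In this toy example the middle of the sparse cluster is less dense than every point of the dense cluster, yet both cluster middles reach the maximum Local Contrast of k = 2, because the measure compares each point only with its own neighbours. That neighbourhood-relative behaviour is the intuition behind using it in place of raw density when densities vary.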
Natural gas has been proposed as a solution to increase the security of energy supply and reduce environmental pollution around the world. The ability to forecast natural gas prices benefits various stakeholders and has become a very valuable tool for all participants in competitive natural gas markets. Machine learning algorithms have gradually become popular tools for natural gas price forecasting. In this paper, we investigate data-driven predictive models for natural gas price forecasting based on common machine learning tools, i.e., artificial neural networks (ANN), support vector machines (SVM), gradient boosting machines (GBM), and Gaussian process regression (GPR). We use cross-validation for model training and monthly Henry Hub natural gas spot price data from January 2001 to October 2018 for evaluation. Results show that the four machine learning methods differ in how well they predict natural gas prices; overall, however, ANN achieves better prediction performance than SVM, GBM, and GPR.

Energies 2019, 12, 1680

... usage of resources based on accurate predictions. Accurate natural gas price forecasting not only provides an important guide for effective implementation of energy policy and planning but is also extremely significant for economic planning, energy investment, and environmental conservation. Therefore, researchers continue to study natural gas price forecasting models with great interest, aiming to make predictions as accurate as possible.

There are many methods for analyzing and forecasting natural gas prices, and machine learning is increasingly used among them. Machine learning algorithms can learn from historical relationships and trends in the data and make data-driven predictions or decisions. Many researchers have already investigated natural gas price prediction with the aid of various machine learning methods.
For instance, Abrishami and Varahrami combined the data handling neural network technique with a rule-based expert system to forecast natural gas prices [3]. Busse et al. used a nonlinear autoregressive exogenous (NARX) neural network model [4]. Azadeh et al. studied a hybrid neuro-fuzzy method composed of ANN, fuzzy linear regression, and conventional regression [5]. Salehnia et al. developed several nonlinear models using the gamma test, including local linear regression, dynamic local linear regression, and ANN models [6]. Ceperic et al. proposed a strategic seasonality-adjusted model based on support vector regression machines [7]. Su et al. used a least squares regression boosting algorithm for natural gas price prediction [8].

As these existing studies indicate, ANN and SVM are widely used machine learning methods for forecasting natural gas prices. In addition to ANN and SVM, this study introduces two other common machine learning approaches, GBM and GPR (these two methods have been used to forecast hourly loads in the US [9]). ...
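The natural gas study reports using cross-validation for training on monthly Henry Hub prices (January 2001 to October 2018, i.e. 214 months), but the excerpt does not say which cross-validation scheme. The sketch below assumes an expanding-window (forward-chaining) scheme, a common choice for time series because random shuffling would let future prices leak into training. The function names, `n_folds=5`, `min_train=120`, and the naive last-value baseline are illustrative assumptions; an ANN, SVM, GBM, or GPR model would be plugged in as the `forecaster`.

```python
import math
from typing import Callable, Iterator, List, Sequence, Tuple

# A forecaster maps (price history, horizon) to `horizon` predicted prices.
Forecaster = Callable[[List[float], int], List[float]]


def expanding_window_splits(n: int, n_folds: int, min_train: int) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges; each test block starts right after
    its training block, so no future prices leak into training."""
    test_size = (n - min_train) // n_folds
    if test_size < 1:
        raise ValueError("series too short for the requested folds")
    for k in range(n_folds):
        train_end = min_train + k * test_size
        yield range(0, train_end), range(train_end, train_end + test_size)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def cross_validate(series: Sequence[float], forecaster: Forecaster,
                   n_folds: int = 5, min_train: int = 120) -> List[float]:
    """Return one RMSE per fold for the given forecaster."""
    scores = []
    for train, test in expanding_window_splits(len(series), n_folds, min_train):
        history = [series[i] for i in train]
        actual = [series[i] for i in test]
        scores.append(rmse(actual, forecaster(history, len(actual))))
    return scores


def naive_forecast(history: List[float], horizon: int) -> List[float]:
    # Baseline: repeat the last observed monthly price for the whole horizon.
    return [history[-1]] * horizon
```

With 214 monthly observations and these settings, the first fold trains on 120 months, each fold tests on the next 18 months, and the final 4 months are left unused. Comparing every model against a trivial baseline such as `naive_forecast` helps show whether the learned models add anything beyond persistence.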