Abstract: Problem statement: The matching problem of complex objects is one of the most difficult tasks in the pattern recognition field. These problems are made difficult by the seemingly infinite variety of shapes and classes involved. The difficulties are related to absolute shape measurement, given the impossibility of directly mapping shapes, as such, into a feature space. Approach: In this study, an object was modeled using boundary pixel distances. The invariant resulted from the distance of each boun…
“…Zhang et al further considered the impact of outdoor temperature and used a three-piecewise linear regression method to fit the relationship between energy consumption and outdoor temperature. However, linear regression-based methods require well-defined independent variables [31]. Brown et al predicted electricity demand using K-nearest neighborhood (KNN) in a kernel regression method [5].…”
− A prediction-based anomaly detection method
− Anomaly detection system based on real-time big data architecture
− Iterative detection model update and real-time anomaly detection
− Supporting real-time anomaly detection for scalable smart meter data
“…Therefore, penalty function strategies do not always guarantee practical results. The advantage of linear regression is that, with the dependent variables being well defined, the technique is able to extract time series features (Magld, 2012). Lee and Fung (1997) showed that linear and nonlinear regressions can also be used for outlier detection, but they used a 5% upper and lower threshold limit for choosing outliers after fitting, which yielded many false positives for very large data sets.…”
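The false-positive problem mentioned above can be illustrated with a small sketch. It assumes that a "5% upper and lower threshold limit" means flagging the highest and lowest 5% of residuals after fitting; the snippet does not spell out the exact rule, so this reading, the function name `tail_flags`, and the parameter `tail_percent` are all hypothetical. Under that reading, the rule always flags about 10% of the points, whether or not any of them is a real outlier.

```python
def tail_flags(residuals, tail_percent=5):
    """Flag the lowest and highest `tail_percent` percent of residuals.

    Assumed reading of a fixed upper/lower threshold rule, not the
    cited authors' exact procedure. Returns the flagged indices, sorted.
    """
    n = len(residuals)
    k = n * tail_percent // 100          # number of points in each tail
    if k == 0:
        return []
    # Indices ordered from most negative to most positive residual.
    order = sorted(range(n), key=lambda i: residuals[i])
    return sorted(order[:k] + order[n - k:])


# Well-behaved residuals with no real outliers at all.
clean = [(i % 10 - 4.5) / 10 for i in range(100)]
flagged = tail_flags(clean)
print(len(flagged), "of", len(clean), "points flagged")
```

Because the number of flagged points grows in proportion to the data set size, a very large data set produces a correspondingly large number of flags even when true outliers are rare.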
This paper introduces a probabilistic approach to anomaly detection, specifically in natural gas time series data. In the natural gas field, there are various types of anomalies, each of which is induced by a range of causes and sources. The causes of a set of anomalies are examined and categorized, and a Bayesian maximum likelihood classifier learns the temporal structures of known anomalies. Given previously unseen time series data, the system detects anomalies using a linear regression model with weather inputs, after which the anomalies are tested for false positives and classified using a Bayesian classifier. The method can also identify anomalies of an unknown origin. Thus, the likelihood of a data point being anomalous is given for anomalies of both known and unknown origins. This probabilistic anomaly detection method is tested on a reported natural gas consumption data set.
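The detection step described in this abstract, a linear regression on weather inputs followed by a check on how far each observation departs from the fit, can be sketched as follows. This is a minimal illustration, not the authors' method: it uses a single temperature predictor, flags points whose residual exceeds `k` standard deviations, and omits the Bayesian classification stage entirely. The names `fit_line`, `flag_anomalies`, and the cutoff `k` are assumptions made for the example.

```python
from statistics import mean, pstdev


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x for one predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def flag_anomalies(temperature, consumption, k=3.0):
    """Return indices whose residual is more than k standard deviations.

    The k-sigma cutoff is an assumed, simple stand-in for the
    probabilistic test described in the abstract.
    """
    a, b = fit_line(temperature, consumption)
    residuals = [c - (a + b * t) for t, c in zip(temperature, consumption)]
    sigma = pstdev(residuals)
    if sigma == 0:
        return []                        # perfect fit, nothing to flag
    return [i for i, r in enumerate(residuals) if abs(r) > k * sigma]


# Synthetic data: consumption falls as temperature rises, with one spike.
temps = list(range(20))
usage = [100 - 2 * t + (0.5 if i % 2 == 0 else -0.5)
         for i, t in enumerate(temps)]
usage[10] += 40                          # injected anomaly
print(flag_anomalies(temps, usage))
```

A single large outlier inflates the residual standard deviation, so on short series a k-sigma rule can miss anomalies; a robust scale estimate such as the median absolute deviation is a common alternative.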
“…Jakkula and Cook use statistics and clustering to identify outliers in power datasets collected from smart environments [14], but they have not considered the impact of the exogenous variables, e.g., weather temperature, on the electricity consumption. Linear regression can extract time series features when the dependent variables are well-defined [31]. The early experience of identifying outliers in linear regression is through setting a threshold limit, but this yields many false positives for large data sets [18].…”
With the wide use of smart meters in the energy sector, anomaly detection has become a crucial means of studying unusual customer consumption behavior and of promptly discovering unexpected energy-use events. Detecting consumption anomalies is, essentially, a real-time big data analytics problem that involves mining a large number of parallel data streams from smart meters. In this paper, we propose a supervised learning and statistical-based anomaly detection method, and implement a Lambda system using the in-memory distributed computing framework Spark and its extension Spark Streaming. The system supports not only iterative refreshment of the detection model from scalable data sets, but also real-time detection on scalable live data streams. This paper empirically evaluates the system and the detection algorithm, and the results show the effectiveness and scalability of the proposed Lambda detection system.