Cluster Summarization with Dense Region Detection

Bigdeli, Elnaz; Mohammadi, Mahdi; Raahemi, Bijan; Matwin, Stan

doi:10.1007/978-3-319-25840-9_5

Cited by 3 publications (10 citation statements)

References 17 publications

“…Labelling [E.Bigdeli et al, 2015b] • Summarizing Arbitrary shape clustering using Gaussian mixture model [E.Bigdeli et al, 2014b] • Cluster Summarization with Dense Region Detection [E.Bigdeli et al, 2014a] Chapter 2…”

Section: Published Papers (mentioning, confidence: 99%)

Incremental anomaly detection using two-layer cluster-based structure

Bigdeli, Mohammadi, Raahemi, et al. (2018). Information Sciences. Self-citation.

Anomaly detection algorithms face several challenges, including processing speed and dealing with noise in data. In this thesis, a two-layer cluster-based anomaly detection structure is presented which is fast, noise-resilient and incremental. In this structure, each normal pattern is considered as a cluster, and each cluster is represented using a Gaussian Mixture Model (GMM). New instances are then presented to the GMMs to be labeled as normal or abnormal.

The proposed structure comprises three main steps. In the first step, the data are clustered. The second step represents each cluster in a way that enables the model to classify new instances; the Summarization based on Gaussian Mixture Model (SGMM) proposed in this thesis represents each cluster as a GMM.

In the third step, a two-layer structure efficiently updates clusters using […]. In most real-time anomaly detection applications, incoming instances are often similar to previous ones. In these cases, there is no need to update clusters based on duplicates, since they have already been modeled in the cluster distribution. The two-layer structure is responsible for identifying such redundant instances: redundant instances are ignored, and the remaining new instances are used to update clusters. Because redundant instances are typically in the majority, ignoring them makes the detection phase fast.

Each part of the general structure is validated in this thesis. The experiments cover detection rate, clustering goodness, running time, memory usage and algorithmic complexity. The accuracy of the clustering and of the GMM-based cluster summaries is evaluated and compared to that of other methods. Using the Davies-Bouldin (DB) and Dunn indexes, the distance between the original clusters and the clusters regenerated from the GMMs is almost zero with the SGMM method, while for ABACUS it is around 0.01.
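The DB and Dunn indexes used in this evaluation can be computed as in the sketch below. It uses common textbook definitions (cluster scatter as the mean distance of members to their centroid; Dunn separation as the smallest point-to-point gap between two clusters and compactness as the largest within-cluster diameter). The abstract does not say which variants the thesis uses, so these definitions are assumptions. Computing an index on the original clusters and again on clusters regenerated from their GMMs is one way to check how well a summary preserves cluster structure.

```python
import math
from itertools import combinations


def centroid(points):
    """Per-dimension mean of a list of points."""
    return [sum(coord) / len(points) for coord in zip(*points)]


def davies_bouldin(clusters):
    """Davies-Bouldin index (lower is better); needs at least two clusters."""
    cents = [centroid(c) for c in clusters]
    # Scatter: average distance of a cluster's members to its centroid.
    scatter = [sum(math.dist(p, m) for p in c) / len(c) for c, m in zip(clusters, cents)]
    k = len(clusters)
    # For each cluster, keep its worst (largest) similarity ratio to any other cluster.
    worst = [
        max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k) if j != i)
        for i in range(k)
    ]
    return sum(worst) / k


def dunn(clusters):
    """Dunn index (higher is better): smallest inter-cluster gap / largest diameter."""
    min_gap = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    max_diam = max(
        (math.dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    # All-singleton clusters have zero diameter; treat that as perfectly compact.
    return math.inf if max_diam == 0 else min_gap / max_diam
```

On the two toy clusters `[[0, 0], [2, 0]]` and `[[10, 0], [12, 0]]`, these give DB = 0.2 and Dunn = 4.0.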
The results also show that the SGMM algorithm runs 3 times faster than ABACUS while using one-third of the memory used by ABACUS.

The CPL method, used to label new instances, is found to collectively remove the effect of noise while increasing the accuracy of labeling new instances. In a noisy environment, the detection rate of the CPL method is 5% higher than that of other algorithms such as one-class SVM, and the false alarm rate decreases by 10% on average. Memory use is 20 times lower than that of the one-class SVM.

The proposed method is found to lower the false alarm rate, which is one of the basic problems of the one-class SVM. With the two-layer structure, experiments show that the false alarm rate decreases by 5% to 15% and the detection rate increases by 5% to 10%, depending on the dataset. The memory usage of the two-layer structure is 20 to 50 times lower than that of the one-class SVM. The one-class SVM uses support vectors to label new instances, while labeling in the two-layer structure depends on the number of GMMs. The experiments show that the two-layer structure is 20 to 50 times faster than the one-class SVM in labelin...
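The two-layer labeling idea can be sketched in Python as below. This is a hedged illustration, not the thesis' implementation: it assumes each normal cluster has already been summarised as a GMM with diagonal covariances, treats an instance as redundant when it lies within a hypothetical radius `eps` of an instance already processed, and labels the remaining instances by comparing the best per-cluster log-density with a hypothetical `threshold`. The names `Component`, `TwoLayerDetector`, `eps` and `threshold` are illustrative, the cluster-update step is omitted, and the CPL method is not reproduced.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Component:
    """One diagonal-covariance Gaussian component of a cluster summary."""
    weight: float
    mean: list
    var: list  # per-dimension variances (assumed diagonal covariance)


def log_gaussian(x, mean, var):
    """Log-density of a diagonal Gaussian evaluated at point x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_density(x, components):
    """Log-density of a GMM, using log-sum-exp for numerical stability."""
    logs = [math.log(c.weight) + log_gaussian(x, c.mean, c.var) for c in components]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))


@dataclass
class TwoLayerDetector:
    clusters: dict    # cluster id -> list of Component (one GMM per normal cluster)
    threshold: float  # hypothetical log-density cut-off for "normal"
    eps: float        # hypothetical radius for treating an instance as redundant
    memory: list = field(default_factory=list)  # (instance, label) pairs already seen

    def process(self, x):
        # Layer 1: an instance within eps of one already seen is redundant,
        # so reuse the stored label instead of evaluating the GMMs again.
        for seen_x, seen_label in self.memory:
            if math.dist(x, seen_x) <= self.eps:
                return seen_label
        # Layer 2: score x under every cluster's GMM and keep the best fit.
        best = max(gmm_log_density(x, comps) for comps in self.clusters.values())
        label = "normal" if best >= self.threshold else "abnormal"
        # The thesis also uses non-redundant instances to update the clusters;
        # this sketch only remembers them for the layer-1 check.
        self.memory.append((list(x), label))
        return label
```

In layer 2, scoring cost grows with the number of GMM components rather than with the number of training instances, which is the contrast the abstract draws with the support vectors of a one-class SVM. Layer 1 here is a naive linear scan over every remembered instance; a practical version would bound or index that memory.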

“…Model (SGMM) [6]. Both of these techniques are advantageous in discovering and summarizing arbitrary-shape clusters [6,7].…”

Section: Contributions (mentioning, confidence: 99%)

“…Both of these techniques are advantageous in discovering and summarizing arbitrary-shape clusters [6,7]. DBSCAN is a density-based clustering algorithm, which finds clusters based on the concept of connecting dense regions, and discovers arbitrary-shape clusters [7].…”

Section: Contributions (mentioning, confidence: 99%)
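The density-connection idea described in this snippet can be sketched as a plain-Python DBSCAN. It is a minimal, brute-force O(n²) version for illustration, using DBSCAN's two usual parameters (the neighbourhood radius `eps` and the density threshold `min_pts`); it is not the implementation used in any of the works listed on this page.

```python
import math


def dbscan(points, eps, min_pts):
    """Return one label per point: cluster id 0, 1, ... or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Brute-force eps-neighbourhood query (includes point i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        # i is a core point: start a new cluster and grow it through dense regions.
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is also a core point: keep expanding
        cluster += 1
    return labels
```

For example, with `eps=1.5` and `min_pts=3`, the points (0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50) are labelled `[0, 0, 0, 1, 1, 1, -1]`: two dense groups and one noise point. Because clusters are grown by chaining neighbourhoods rather than by distance to a centre, the same procedure can follow curved or elongated shapes.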

“…Summarization also can help to reduce the complexity of arbitrary-shape clustering methods. However, the summarization of arbitrary-shape clusters can be a challenge [6].…”

Section: Cluster Summarization (mentioning, confidence: 99%)
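A small numerical example shows why arbitrary-shape clusters are hard to summarise. For a hypothetical ring-shaped cluster, a single centroid lies in the empty centre, about one radius away from every member, whereas a handful of local representatives stays close to the data. This only illustrates the difficulty mentioned in the snippet; the segment-based representatives below are an arbitrary choice, not the dense-region method of the paper.

```python
import math


def ring(n=36, radius=5.0):
    """Hypothetical non-convex cluster: n points evenly spaced on a circle, ordered by angle."""
    return [
        (radius * math.cos(2 * math.pi * k / n), radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


def centroid(points):
    """Per-dimension mean of a list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def mean_error(points, reps):
    """Average distance from each member to its nearest representative."""
    return sum(min(math.dist(p, r) for r in reps) for p in points) / len(points)


def segment_reps(points, k):
    """Split angle-ordered members into k equal chunks and keep each chunk's centroid."""
    size = len(points) // k
    return [centroid(points[i * size:(i + 1) * size]) for i in range(k)]
```

With the defaults, `mean_error(pts, [centroid(pts)])` equals the radius, 5.0, while `mean_error(pts, segment_reps(pts, 12))` is roughly 0.6: twelve local representatives describe the ring far better than one global centre.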

“…After clustering, data in the produced clusters can be summarized to reduce the required storage and processing time, as preserving all cluster members is typically not feasible for large volumes of data. Several cluster summarization techniques have been developed to meet requirements for different applications and types of data [4,5,6]. …”

(mentioning, confidence: 99%)
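The storage argument can be made concrete with a deliberately simple summary. The sketch below assumes each cluster is well described by a single Gaussian with independent dimensions (a simplification: SGMM uses a mixture of Gaussians per cluster) and keeps only the member count and the per-dimension mean and standard deviation, so the summary size depends on the dimensionality, not on the number of members. The function names are hypothetical; `regenerate` draws synthetic members from the summary, in the spirit of the original-versus-regenerated comparison described in the abstract above.

```python
import random
import statistics


def summarise(members):
    """Replace a cluster's members with a fixed-size summary: count, per-dimension mean and std."""
    dims = list(zip(*members))
    return {
        "n": len(members),
        "mean": [statistics.fmean(d) for d in dims],
        "std": [statistics.pstdev(d) for d in dims],
    }


def regenerate(summary, seed=0):
    """Draw synthetic members from the summary (single diagonal Gaussian assumption)."""
    rng = random.Random(seed)
    return [
        tuple(rng.gauss(m, s) for m, s in zip(summary["mean"], summary["std"]))
        for _ in range(summary["n"])
    ]
```

For 10,000 two-dimensional members, this summary stores 5 numbers instead of 20,000. The price is that any structure a single Gaussian cannot express, such as a curved or multi-modal cluster, is lost.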

Distributed Gaussian Mixture Model Summarization Using the MapReduce Framework

Esmaeilpour, Bigdeli, Cheraghchi, et al. (2016). Advances in Artificial Intelligence. Self-citation.

With an accelerating rate of data generation, sophisticated techniques are essential to meet scalability requirements. One promising avenue for handling large datasets is distributed storage and processing, and Hadoop is a well-known framework for both. Data summarization is another useful concept for managing large datasets: summarization techniques are intended to produce compact yet representative summaries of the entire dataset. Combining these tools allows a distributed implementation of data summarization. In this thesis, this goal is achieved by proposing and implementing a distributed Gaussian Mixture Model Summarization using the MapReduce framework (MR-SGMM). The proposed method clusters a dataset with DBSCAN, a density-based clustering algorithm, and then summarizes each discovered cluster using the SGMM approach in a distributed manner. Tests with synthetic and real datasets demonstrate its validity and efficiency.
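Why a cluster's Gaussian parameters can be computed in a distributed way is easiest to see through additive sufficient statistics (count, sum, sum of squares). The sketch simulates a map step per data partition and a reduce step that merges partial results, in plain Python rather than Hadoop. It assumes cluster labels have already been assigned (for instance by DBSCAN) and fits one diagonal Gaussian per cluster, so it illustrates the principle only; it is not the MR-SGMM algorithm, and the function names are hypothetical.

```python
from functools import reduce


def map_partition(records):
    """Map step: one partition of (cluster_id, point) records -> partial stats per cluster."""
    stats = {}
    for cid, x in records:
        n, s, ss = stats.get(cid, (0, [0.0] * len(x), [0.0] * len(x)))
        stats[cid] = (n + 1, [a + b for a, b in zip(s, x)], [a + b * b for a, b in zip(ss, x)])
    return stats


def merge(left, right):
    """Reduce step: partial statistics are additive, so merging is element-wise addition."""
    out = dict(left)
    for cid, (n, s, ss) in right.items():
        if cid in out:
            n0, s0, ss0 = out[cid]
            out[cid] = (n0 + n, [a + b for a, b in zip(s0, s)], [a + b for a, b in zip(ss0, ss)])
        else:
            out[cid] = (n, s, ss)
    return out


def finalize(stats):
    """Turn (n, sum, sum of squares) into per-dimension mean and population variance."""
    result = {}
    for cid, (n, s, ss) in stats.items():
        mean = [v / n for v in s]
        var = [q / n - m * m for q, m in zip(ss, mean)]
        result[cid] = (n, mean, var)
    return result
```

`finalize(reduce(merge, map(map_partition, partitions)))` yields each cluster's count, mean and variance. Because `merge` is associative, partial results can be combined in any grouping, which is what lets a framework such as MapReduce run the reduction in parallel. The sum-of-squares variance formula can lose precision when values are large relative to their spread; a pairwise update of mean and variance is numerically safer.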

Stream-data-clustering based adaptive alarm threshold setting approaches for industrial processes with multiple operating conditions

Yuehan, Yang, et al. (2022). ISA Transactions.
