A New Procedure of Clustering Based on Multivariate Outlier Detection

Jayakumar, G. S. David Sam; Thomas, Bejoy John

doi:10.6339/jds.201301_11(1).0005

Cited by 6 publications

(5 citation statements)

References 18 publications


“…On the other hand, as mentioned in many studies, K-means is sensitive to noise and outliers, and may not give accurate results (Hodge & Austin, 2004; Gan & Ng, 2017). Alternatively, K-medoids or partition around medoids (PAM) is less sensitive to local minima problem and, therefore, some studies targeted to use these hard clustering algorithms in outlier detection (Jayakumar & Thomas, 2013; Kumar, Kumar & Singh, 2013). However, the hard clustering algorithms such as K-means and PAM force each data point to belong to the nearest cluster.…”

Section: Related Work (mentioning)

confidence: 99%

Two novel outlier detection approaches based on unsupervised possibilistic and fuzzy clustering

Cebecí, Cebeci, Tahtali, et al. 2022

PeerJ Computer Science


Outliers are data points that significantly deviate from other data points in a data set because of different mechanisms or unusual processes. Outlier detection is one of the intensively studied research topics for identification of novelties, frauds, anomalies, deviations or exceptions in addition to its use for data cleansing in data science. In this study, we propose two novel outlier detection approaches using the typicality degrees which are the partitioning result of unsupervised possibilistic clustering algorithms. The proposed approaches are based on finding the atypical data points below a predefined threshold value, a possibilistic level for evaluating a point as an outlier. The experiments on the synthetic and real data sets showed that the proposed approaches can be successfully used to detect outliers without considering the structure and distribution of the features in multidimensional data sets.

“…Clustering is one of the informal ways to identifying outliers (Jayakumar & Thomas, 2013; Johnson & Wichern, 2002). The aim of clustering is to group a set of observations into clusters based on similarities or distances (dissimilarities) (Irani et al, 2016; Johnson & Wichern, 2002).…”

Section: Cluster Analysis (mentioning)

confidence: 99%

“…The most popular and easiest way to compute similarity is Euclidean distance but it does not take into account the covariance structure and is not appropriate for multivariate data (Almeida et al, 2007). Studies such as Hardin and Rocke (2004) and Jayakumar and Thomas (2013) used Mahalanobis distance as a similarity measure. In clustering, outliers are defined as observations that is far from any clusters or have large distance from the centre of each cluster (Hardin & Rocke, 2004; Zhang, 2013).…”

Section: Cluster Analysis (mentioning)

confidence: 99%

“…Additionally, the discarded observations (outliers) in the first step may optionally assign to these clusters (Almeida et al, 2007). Jayakumar and Thomas (2013) proposed a new method of outlier based clustering based on Mahalanobis distance and found that their method is easier to implement compared to other clustering algorithms. The Mahalanobis distance computed for each observation and upper control limit (UCL) is used as a cutoff value to determine outliers.…”

Section: Cluster Analysis (mentioning)

confidence: 99%
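The cutoff-based screening described in the statement above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure of Jayakumar and Thomas (2013): it handles bivariate data only, uses the ordinary sample mean and covariance, and substitutes a chi-square quantile for the paper's upper control limit, whose exact form is not given here. The function names `mahalanobis_sq_2d` and `flag_outliers` and the `level` parameter are hypothetical.

```python
import math


def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the sample covariance matrix (divisor n - 1).
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^T S^{-1} d, using the closed-form inverse of a 2x2 matrix.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


def flag_outliers(points, level=0.975):
    """Return (indices whose squared distance exceeds the cutoff, cutoff).

    Assumption: the cutoff is the chi-square quantile with 2 degrees of
    freedom, which has the closed form -2 * ln(1 - level). It stands in
    for the UCL mentioned in the text and is not taken from the paper.
    """
    cutoff = -2.0 * math.log(1.0 - level)
    d2 = mahalanobis_sq_2d(points)
    return [i for i, d in enumerate(d2) if d > cutoff], cutoff
```

Because the mean and covariance here are estimated from all observations, a cluster of outliers can inflate them and hide itself, which is one reason the reviewed literature also considers robust distances.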


A Review on Outliers-Detection Methods for Multivariate Data

Mutalib, Satari, Yusoff 2021

JOSMA


Data in practice are often of high dimension and multivariate in nature. Detection of outliers has been one of the problems in multivariate analysis. Detecting outliers in multivariate data is difficult and it is not sufficient by using only graphical inspection. In this paper, a nontechnical and brief outlier detection method for multivariate data which are projection pursuit method, methods based on robust distance and cluster analysis are reviewed. The strengths and weaknesses of each method are briefly discussed.


“…Anomaly detection is an unsupervised target detection technique where no prior knowledge about the target or the background is available, focusing on distinguishing unusual material from a typical background (Shaw and Manolakis, 2002). Mahalanobis Distance (MD) is based on correlations between variables and can be used to identify and analyze different patterns (Jayakumar and Thomas, 2013). MD has been used for many different purposes, including detection of outliers (De Maesschalck, Jouan-Rimbaud and Massart, 2000).…”

Section: Introduction (mentioning)

confidence: 99%

Using Mahalanobis distance and decision tree to analyze abnormal patterns of behavior in a maintenance outsourcing process-a case study

Chen, Kuo, Lin 2020

JQME


Purpose: The purpose of this paper is to analyze abnormal behavior patterns in a maintenance outsourcing process. Based on the results, the managers can focus on the abnormal behavior and the direction of the investigation can be narrowed. The abnormal behavior can be identified more easily.

Design/methodology/approach: Mahalanobis Distance (MD) and Decision Tree (DT) are integrated to analyze for abnormal behavior patterns. To prevent abnormal behaviors, a maintenance outsourcing case must be passed by several managers in different departments. In this research, some criteria for pairs of managers are calculated first. Based on the criteria, the MDs of these pairs can be calculated. Pairs are categorized by their MDs. Any pair whose MD is higher than a threshold is labeled “abnormal” while the remaining are labeled “normal”. After oversampling the minority class of abnormal, a DT is built by Classification and Regression Trees (CART) based on the labeled dataset. Finally, the combination of criteria for abnormal categories is extracted from the tree.

Findings: Through the results from the DT, the combinations of criteria provide obvious characteristics of cases that are categorized as abnormal, and then provide a direction for investigators. Thus, the range of investigation can be narrowed. The empirical results show that the result of the proposed integrated methodology is helpful for abnormal behavior pattern analysis.

Practical implications: This research is intended to help an organization to enhance their investigation in a large number of maintenance outsourcing cases. About 8,000 cases are collected for analysis.

Originality/value: The integration of MD and DT for analyzing abnormal behavior patterns in a maintenance outsourcing process is not found in the literature. Moreover, the empirical results show that the proposed integrated methodology is helpful in a real application.
