2015
DOI: 10.1007/s00357-015-9171-5

Outlier Identification in Model-Based Cluster Analysis

Abstract: In model-based clustering based on normal-mixture models, a few outlying observations can influence the cluster structure and number. This paper develops a method to identify such outliers; it does not, however, attempt to identify clusters amidst a large field of noisy observations. We identify outliers as those observations belonging to a cluster with minimal membership proportion, or for which the cluster-specific variance with and without the observation differs markedly. Results from a simulation study demonstrate the abilit…
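The abstract's variance-based criterion can be illustrated with a minimal leave-one-out sketch. This is a simplification of the paper's cluster-specific statistic, applied here to a single sample; the 2x ratio threshold is a hypothetical cutoff chosen for illustration, not a value from the paper:

```python
def variance(xs):
    """Population variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def flag_by_variance_ratio(xs, threshold=2.0):
    """Flag each point whose removal shrinks the variance by more than
    `threshold`-fold (hypothetical cutoff, for illustration only)."""
    full = variance(xs)
    flags = []
    for i in range(len(xs)):
        rest = xs[:i] + xs[i + 1:]
        v = variance(rest)
        # A zero remaining variance means this point carried all the spread.
        flags.append(v == 0 or full / v >= threshold)
    return flags

flags = flag_by_variance_ratio([1.0, 1.1, 0.9, 1.05, 0.95, 10.0])
# only the 10.0 is flagged: removing it collapses the variance
```

Removing any of the five clustered points barely changes the variance (the 10.0 still dominates it), so only the genuine outlier trips the ratio test.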


Cited by 35 publications (19 citation statements)
References 23 publications
“…Further, mixture-based methods overall have been shown to be especially well suited to clustering known datasets that are composed of both small and large groups (Holden and Kelley, 2010; Holden et al., 2011; Finch et al., 2014). As finite mixture analysis is itself considered a robust outlier-detection tool, these data were prescreened only for unambiguous measurement or data-entry error prior to clustering (Yamanishi et al., 2004; Tao and Pi, 2009; Evans et al., 2015). Additionally, the sample was screened for systematic error related to observer bias in data collection or reporting.…”
Section: Craniometric Data
confidence: 99%
“…As the multivariate normal finite mixture model is desirable for its ability both to handle data with multiple outliers and to serve as a tool for their detection (Aitkin and Wilson, 1980; Scott, 2004; Yamanishi et al., 2004; Tao and Pi, 2009; Evans et al., 2015), the optimal cluster solution is reviewed post hoc to determine the presence of, and assign appropriate significance to, potential outliers within the core FDB dataset. It is expected that outlying observations will coalesce to form small and sparsely, if not singularly, populated clusters (Aitkin and Wilson, 1980; Scott, 2004).…”
Section: Population Inference From Craniometrics
confidence: 99%
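The expectation that outliers surface as tiny or singleton clusters can be checked mechanically after any model-based fit. Given the cluster labels, flag members of clusters whose share of the sample falls below some fraction; the 10% cutoff here is a hypothetical illustration, not a value from the cited work:

```python
from collections import Counter

def flag_small_clusters(labels, min_frac=0.10):
    """Return True for observations in clusters holding less than
    `min_frac` of the sample (hypothetical cutoff, for illustration)."""
    counts = Counter(labels)
    n = len(labels)
    small = {c for c, k in counts.items() if k / n < min_frac}
    return [lab in small for lab in labels]

# two substantial clusters and one singleton: only the singleton is flagged
flags = flag_small_clusters([0] * 10 + [1] * 9 + [2])
```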
“…We selected the following suite of 12 standard ILDs: maximum cranial length (GOL), cranial base length (BNL), cranial vault height (BBH), maximum cranial breadth (XCB), biauricular breadth (AUB), nasal height (NLH), nasal breadth (NLB), mastoid height (MDH), orbital height (OBH), frontal chord (FRC), parietal chord (PAC), and occipital chord (OCC) (Moore-Jansen et al. 1994). We prescreened our final dataset for unrealistic measurements but not for atypical cases, because model-based clustering is a robust tool for outlier detection (Evans et al. 2015; Tao and Pi 2009; Yamanishi et al. 2004). Prior to cluster analysis, we converted the ILDs to Mosimann shape variables by geometric mean transformation to account for the issue of size differences (Darroch and Mosimann 1985).…”
Section: Craniometric Data Selection and Treatment
confidence: 99%
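The geometric-mean transformation mentioned above is straightforward: each measurement in an individual's vector is divided by the geometric mean of that vector, leaving scale-free shape variables. A minimal sketch (the input values are invented for illustration):

```python
import math

def mosimann_shape(measurements):
    """Divide each measurement by the geometric mean of the vector,
    yielding Mosimann shape variables (Darroch and Mosimann 1985)."""
    gm = math.exp(sum(math.log(m) for m in measurements) / len(measurements))
    return [m / gm for m in measurements]

shape = mosimann_shape([2.0, 8.0])  # geometric mean is 4.0 -> [0.5, 2.0]
```

By construction the geometric mean of the output vector is 1, so overall size is removed while proportions are preserved.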
“…Rather, if necessary, it is possible to deal with outliers using winsorization or trimming, or to employ clustering techniques, resampling (bootstrapping), or robust statistical analyses, which approximate a probability distribution based on the central data.[9–13] In winsorization, the anomalous datum is replaced with a value just beyond that of the next-nearest value, bringing the outlier closer to the remainder of the data.[1] In the case of Table 1, the age of 93 years could be winsorized to 70 years, one unit higher than the next highest age of 69 years (Figure 1).…”
confidence: 99%
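The winsorization step described above can be sketched directly. The ages other than 93 and 69 are invented for illustration, since Table 1 itself is not reproduced here:

```python
def winsorize_max(values):
    """Replace the single largest value with one unit above the
    next-largest value, as in the 93 -> 70 age example."""
    s = sorted(values)
    top, runner_up = s[-1], s[-2]
    return [runner_up + 1 if v == top else v for v in values]

ages = winsorize_max([34, 51, 69, 93, 45])  # 93 becomes 70
```

Unlike trimming, which discards the outlying datum entirely, this keeps the observation in the sample while capping its influence.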