2017
DOI: 10.1080/00401706.2017.1340909

Detecting Deviating Data Cells

Abstract: A multivariate dataset consists of n cases in d dimensions, and is often stored in an n by d data matrix. It is well-known that real data may contain outliers. Depending on the situation, outliers may be (a) undesirable errors which can adversely affect the data analysis, or (b) valuable nuggets of unexpected information. In statistics and data analysis the word outlier usually refers to a row of the data matrix, and the methods to detect such outliers only work when at least half the rows are clean. But often…

Cited by 91 publications (68 citation statements)
References: 25 publications
“…We have shown that exact affine equivariance must be lost, but it is a reasonable price to be paid in order to achieve an arbitrarily high breakdown for the resulting trimmed estimators. This conclusion parallels similar findings in other situations where contamination produces only a minority of “good” observations, as in the case of cellwise contamination (see, e.g., Farcomeni, , ; Agostinelli, Leung, Yohai, & Zamar, ; Rousseeuw & Van den Bossche, ). We also support the use of adaptive trimming schemes, in order to explore the effect of different levels of trimming and to find a sensible trade‐off between robustness and efficiency.…”
Section: Discussion; citation type: supporting
confidence: 88%
“…Detecting cellwise outliers is a hard problem, since the outlyingness of a cell depends on the relation of its column to the other columns of the data, and on the values of the other cells in its row (some of which may be outlying themselves). The DetectDeviatingCells algorithm addresses these issues, and apart from flagging cells it also provides a graphical output called a cellmap.…”
Section: Detecting Outlying Cells; citation type: mentioning
confidence: 99%
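
The statement above notes that a cell's outlyingness depends on its own column, on the other columns, and on the other cells in its row. As a rough illustration of only the first ingredient, the Python sketch below flags cells column by column with robust z-scores (median and MAD) and returns a boolean matrix that could be drawn as a simple cellmap. It is not the DetectDeviatingCells algorithm; the function name `flag_cells_univariate` and the cutoff 2.576 (the 0.995 quantile of the standard normal) are assumptions made for this example.

```python
import numpy as np

def flag_cells_univariate(X, cutoff=2.576):
    """Flag cells whose robust z-score within their own column exceeds cutoff.

    Hypothetical helper for illustration; it ignores relations between columns.
    """
    X = np.asarray(X, dtype=float)
    med = np.nanmedian(X, axis=0)                          # robust location per column
    mad = 1.4826 * np.nanmedian(np.abs(X - med), axis=0)   # robust scale per column
    mad = np.where(mad > 0, mad, np.nan)                   # constant columns are never flagged
    z = (X - med) / mad
    return np.abs(z) > cutoff                              # True marks a deviating cell

# Toy data: three identical columns 0..9, with one cell replaced by 100.
X = np.tile(np.arange(10.0)[:, None], (1, 3))
X[3, 1] = 100.0
flags = flag_cells_univariate(X)
print(np.argwhere(flags))  # row and column indices of flagged cells
```

Because each column is screened on its own, a check of this kind cannot see a cell that is unremarkable within its column but inconsistent with the rest of its row, which is the harder case the statement describes.
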
“…A not-so-large contaminated cell that passes the univariate filter could be flagged when viewed together with other correlated components, especially for highly correlated data. To overcome this deficiency, we introduce a consistent bivariate filter and use it in combination with UF and a new filter developed by Rousseeuw and Van den Bossche (2016) in the first step of the two-step procedure. Maronna (2015) made a remark that UF-GSE, which uses a fixed loss function ρ in the second step, cannot handle well high-dimensional casewise outliers.…”
Section: Introduction; citation type: mentioning
confidence: 99%
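
The statement above observes that a moderately contaminated cell can pass a univariate filter and still be detectable when viewed together with a correlated component. The Python sketch below illustrates that idea on toy data: one cell is unremarkable within its own column but lies far from a robust line fitted against a correlated column. The Theil-Sen slope (median of pairwise slopes) is used here as a simple stand-in and is not the consistent bivariate filter of the cited paper; the names `robust_z`, `flag_pairwise` and `CUTOFF` are assumptions made for this example.

```python
import numpy as np

CUTOFF = 2.576  # assumed cutoff: 0.995 quantile of the standard normal

def robust_z(v):
    """Robust z-scores based on the median and the MAD."""
    med = np.median(v)
    return (v - med) / (1.4826 * np.median(np.abs(v - med)))

def flag_pairwise(x, y, cutoff=CUTOFF):
    """Flag y-cells that deviate from a robust line fitted on (x, y).

    Hypothetical helper: Theil-Sen slope, median intercept, MAD-type scale.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    i, j = np.triu_indices(len(x), k=1)        # all index pairs i < j
    dx = x[j] - x[i]
    keep = dx != 0                             # skip pairs with equal x
    slope = np.median((y[j] - y[i])[keep] / dx[keep])
    intercept = np.median(y - slope * x)
    resid = y - (intercept + slope * x)
    scale = 1.4826 * np.median(np.abs(resid))  # robust residual scale
    if scale == 0:                             # degenerate fit: flag nothing
        return np.zeros(len(y), dtype=bool)
    return np.abs(resid) > cutoff * scale

# Toy data: y follows x closely, except for one moderately shifted cell.
x = np.arange(-5.0, 6.0)
y = x + 0.1 * (-1.0) ** np.arange(11)
y[8] = -3.0                                    # within the range of y, but off the line

univariate = np.abs(robust_z(y)) > CUTOFF      # column-wise check on y alone
pairwise = flag_pairwise(x, y)                 # check that uses the correlated column
print(univariate.any(), np.flatnonzero(pairwise))
```

The contaminated value sits well inside the range of its column, so the column-wise check has nothing to react to, whereas the residual from the fitted line is large relative to the residual scale.
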