Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2002
DOI: 10.1145/775047.775050

Scalable robust covariance and correlation estimates for data mining

Abstract: Covariance and correlation estimates have important applications in data mining. In the presence of outliers, classical estimates of covariance and correlation matrices are not reliable. A small fraction of outliers, in some cases even a single outlier, can distort the classical covariance and correlation estimates, making them virtually useless. That is, correlations for the vast majority of the data can be reported very erroneously; principal components transformations can be misleading; and multidimensional …
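The abstract's claim that a single outlier can ruin the classical estimate is easy to reproduce. The following is a minimal sketch on hypothetical data (the points, the outlier, and the helper name `pearson` are illustrative assumptions, not taken from the paper):

```python
import math

def pearson(x, y):
    """Classical (non-robust) sample correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: 20 points lying close to the line y = x.
x = [float(i) for i in range(20)]
y = [i + 0.5 * (-1) ** i for i in range(20)]
r_clean = pearson(x, y)

# Append one gross outlier far from the bulk of the data.
r_contaminated = pearson(x + [100.0], y + [-100.0])

print(f"clean: {r_clean:.3f}, with one outlier: {r_contaminated:.3f}")
```

On this data the clean correlation is strongly positive, while the one added point is enough to make the classical estimate negative, even though 20 of the 21 observations are unchanged.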

Cited by 51 publications (25 citation statements)
References: 19 publications
“…However, robust correlation measures can be used to construct multivariate covariance matrices, based on pairwise covariances (see Kettering 1972, and Maronna and Zamar 2002). For instance, Alqallaf et al (2002) use the Quadrant correlation to get a robust scatter matrix in very high dimensions. The resulting multivariate is highly robust and very fast to compute.…”
Section: Results
Citation type: mentioning
confidence: 99%
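To make the pairwise construction concrete, here is a minimal sketch of a quadrant-correlation matrix. It is an illustration under stated assumptions, not the procedure of Alqallaf et al: the function names and data are hypothetical, and the `sin(pi * q / 2)` step is the standard transform that makes the quadrant coefficient consistent for the correlation under bivariate normality.

```python
import math
from statistics import median

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def quadrant_correlation(x, y):
    """Average sign agreement around the medians, mapped to the correlation scale."""
    mx, my = median(x), median(y)
    q = sum(sign(a - mx) * sign(b - my) for a, b in zip(x, y)) / len(x)
    return math.sin(math.pi * q / 2)

def quadrant_matrix(columns):
    """Fill a p x p matrix one pair of variables at a time (O(n p^2) work)."""
    p = len(columns)
    R = [[1.0] * p for _ in range(p)]  # unit diagonal by construction
    for i in range(p):
        for j in range(i + 1, p):
            R[i][j] = R[j][i] = quadrant_correlation(columns[i], columns[j])
    return R

# Hypothetical data: y tracks x closely except for one gross outlier; z = -x.
x = [float(i) for i in range(20)] + [100.0]
y = [i + 0.5 * (-1) ** i for i in range(20)] + [-100.0]
z = [-v for v in x]

R = quadrant_matrix([x, y, z])
for row in R:
    print([round(v, 3) for v in row])
```

Because each entry depends only on signs relative to the medians, the single outlier changes one term of the sum and the x-y entry stays strongly positive. A matrix assembled this way is not guaranteed to be positive definite.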
“…In recognition of this opportunity, [10] and [2] recently proposed new pairwise methods based on a modification of approaches (iii) and (ii) respectively, that preserve positive definiteness and have computational complexity O(np^2). However, these pairwise methods are not affine equivariant and may be upset by the so-called two-dimensional structural outliers.…”
Section: Related Work
Citation type: mentioning
confidence: 99%
“…Our choice for c in the code was 0.00001. In actuality, by our ψ function, we are using a Huberized estimator, which in the limiting case is Quadrant Correlation [2]. The limiting case here would be to use the sign function in the place of ψ.…”
Section: Quadrant Correlation
Citation type: mentioning
confidence: 99%
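The limiting behavior described in this statement can be checked numerically. The sketch below assumes ψ is Huber's clipping function applied to data centered by the median and scaled by the MAD, and it uses a simple normalized cross-product as the correlation; both choices are assumptions for illustration, not the citing authors' code. Only the value c = 0.00001 comes from the quoted text.

```python
import math
from statistics import median

def huber_psi(v, c):
    """Huber's psi: identity on [-c, c], clipped to -c or c outside."""
    return max(-c, min(c, v))

def sign(v):
    return (v > 0) - (v < 0)

def robust_standardize(x):
    """Center by the median and scale by the MAD (assumed nonzero here)."""
    m = median(x)
    mad = median([abs(v - m) for v in x])
    return [(v - m) / mad for v in x]

def transformed_correlation(x, y, transform):
    """Normalized cross-product of the transformed, robustly standardized data."""
    u = [transform(v) for v in robust_standardize(x)]
    w = [transform(v) for v in robust_standardize(y)]
    num = sum(a * b for a, b in zip(u, w))
    return num / math.sqrt(sum(a * a for a in u) * sum(b * b for b in w))

# Hypothetical data with one gross outlier.
x = [float(i) for i in range(20)] + [100.0]
y = [i + 0.5 * (-1) ** i for i in range(20)] + [-100.0]

c = 0.00001  # the value quoted in the citation statement
r_huber = transformed_correlation(x, y, lambda v: huber_psi(v, c))
r_sign = transformed_correlation(x, y, sign)

print(r_huber, r_sign)
```

With c this small, every nonzero standardized value is clipped to -c or c, so ψ is just c times the sign function and the factor c cancels in the ratio. The two results therefore agree up to rounding, which is the sense in which the Huberized estimator reduces to a sign-based (quadrant) correlation.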
“…However, three- or higher-dimensional outliers may not be detected by univariate and bivariate analyses. Khan, Van Aelst, and Zamar [6] mentioned that the correlation matrix obtained from the pairwise correlation approach may not be positive definite, forcing the use of a correction for positive definiteness in some cases [7]. These problems have motivated us to improve this strategy by using a fast and robust multivariate location and dispersion that is robust to multivariate outliers.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
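The positive-definiteness problem mentioned here can be shown with a small example. The sketch below tests a hypothetical 3 x 3 matrix of pairwise correlations with a Cholesky factorization and then applies linear shrinkage toward the identity. Shrinkage is a deliberately simple stand-in chosen for illustration; it is not the correction used in the cited works, and the matrix, grid step, and function names are assumptions.

```python
import math

def is_positive_definite(a):
    """Attempt a Cholesky factorization; it succeeds only for positive definite input."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False  # non-positive pivot: not positive definite
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True

def shrink_to_identity(a, lam):
    """Return (1 - lam) * a + lam * I, which keeps a unit diagonal."""
    n = len(a)
    return [[(1 - lam) * a[i][j] + (lam if i == j else 0.0) for j in range(n)]
            for i in range(n)]

# Each entry is a valid correlation on its own, but the three are jointly
# inconsistent: no data set can have exactly these pairwise correlations.
R = [[1.0, 0.9, -0.9],
     [0.9, 1.0, 0.9],
     [-0.9, 0.9, 1.0]]

print("positive definite:", is_positive_definite(R))

# Smallest shrinkage weight on a grid of step 0.05 that restores positive definiteness.
lam = next(k / 20 for k in range(21)
           if is_positive_definite(shrink_to_identity(R, k / 20)))
R_fixed = shrink_to_identity(R, lam)
print("shrinkage weight:", lam)
```

Shrinkage preserves symmetry and the unit diagonal but pulls every off-diagonal entry toward zero, so it trades some fidelity to the pairwise estimates for a usable matrix.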