This paper presents a distribution-free multivariate Kolmogorov-Smirnov goodness-of-fit test. The test uses a statistic built from Rosenblatt's transformation, and an algorithm is developed to compute it in the bivariate case. An approximate test that can be easily computed in any dimension is also presented. The power of these multivariate tests is assessed in a simulation study.
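To make the role of Rosenblatt's transformation concrete, here is a minimal sketch, assuming the hypothesized distribution is a standard bivariate normal with known correlation `rho` (an illustrative choice, not taken from the paper). Under the null hypothesis the transformed pairs are independent and uniform on the unit square, so their empirical distribution function can be compared with the product `u1 * u2`. The helper `ks_at_sample_points` is hypothetical: it evaluates the discrepancy only at the transformed sample points, which gives a lower bound on the supremum and is not the paper's exact algorithm or its approximate test.

```python
import math


def std_normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def rosenblatt_bvn(x1, x2, rho):
    """Rosenblatt transform for a standard bivariate normal (assumed null).

    u1 is the marginal CDF of x1; u2 is the conditional CDF of x2 given x1,
    which is normal with mean rho * x1 and variance 1 - rho**2.
    """
    u1 = std_normal_cdf(x1)
    u2 = std_normal_cdf((x2 - rho * x1) / math.sqrt(1.0 - rho * rho))
    return u1, u2


def ks_at_sample_points(us):
    """Largest |F_n(u) - u1 * u2| over the transformed sample points only.

    Illustrative lower bound on the supremum over the unit square.
    """
    n = len(us)
    worst = 0.0
    for a, b in us:
        ecdf = sum(1 for p, q in us if p <= a and q <= b) / n
        worst = max(worst, abs(ecdf - a * b))
    return worst
```

A typical use would be to map every observation through `rosenblatt_bvn` and pass the resulting list of pairs to `ks_at_sample_points`.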
In this paper we consider the problem of building a linear prediction model when the number of candidate predictors is large and the data possibly contains anomalies that are difficult to visualize and clean. We aim at predicting the nonoutlying cases. Therefore, we need a method that is robust and scalable at the same time. We consider the stepwise algorithm LARS which is computationally very efficient but sensitive to outliers. We introduce two different approaches to robustify LARS. The plug-in approach replaces the classical correlations in LARS by robust correlation estimates. The cleaning approach first transforms the dataset by shrinking the outliers toward the bulk of the data (which we call multivariate Winsorization) and then applies LARS to the transformed data. We show that the plug-in approach is time-efficient and scalable and
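The cleaning approach described above can be illustrated with a minimal bivariate sketch. It assumes each variable is standardized by its median and MAD, that a robust correlation estimate `rho` is supplied by the caller, and that points are shrunk toward the center whenever their squared Mahalanobis distance exceeds a cutoff (here 5.99, roughly the 0.95 chi-square quantile with 2 degrees of freedom). The function names and these choices are assumptions for illustration, not the authors' exact procedure.

```python
import math
from statistics import median


def robust_standardize(xs):
    """Center by the median and scale by the MAD (normal-consistent)."""
    med = median(xs)
    mad = 1.4826 * median([abs(x - med) for x in xs])
    return [(x - med) / mad for x in xs]


def winsorize_bivariate(points, rho, cutoff=5.99):
    """Shrink outlying pairs toward the bulk of the data.

    points : list of (x, y) pairs
    rho    : robust correlation estimate supplied by the caller (assumed)
    cutoff : bound on the squared Mahalanobis distance after shrinking
    """
    z1 = robust_standardize([p[0] for p in points])
    z2 = robust_standardize([p[1] for p in points])
    cleaned = []
    for a, b in zip(z1, z2):
        # Squared Mahalanobis distance under correlation matrix [[1, rho], [rho, 1]].
        d2 = (a * a - 2.0 * rho * a * b + b * b) / (1.0 - rho * rho)
        shrink = min(1.0, math.sqrt(cutoff / d2)) if d2 > 0 else 1.0
        cleaned.append((a * shrink, b * shrink))
    return cleaned
```

Points inside the tolerance ellipse are left unchanged (in standardized units), while points outside are pulled radially onto its boundary, so a stepwise method applied afterwards sees no extreme pairs.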
We investigate the performance of robust estimates of multivariate location
under nonstandard data contamination models such as componentwise outliers
(i.e., contamination in each variable is independent from the other variables).
This model brings up a possible new source of statistical error that we call
"propagation of outliers." This source of error is unusual in the sense that it
is generated by the data processing itself and takes place after the data has
been collected. We define and derive the influence function of robust
multivariate location estimates under flexible contamination models and use it
to investigate the effect of propagation of outliers. Furthermore, we show that
standard high-breakdown affine equivariant estimators propagate outliers and
therefore show poor breakdown behavior under componentwise contamination when
the dimension $d$ is high. Comment: Published at http://dx.doi.org/10.1214/07-AOS588 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
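A short calculation illustrates why componentwise contamination is harmful in high dimension. The sketch below assumes each cell of a $d$-dimensional observation is contaminated independently with the same probability `eps` (the names `eps`, `row_contamination_prob`, and `smallest_dim_exceeding` are illustrative, not from the paper). A row is then affected with probability $1 - (1 - \epsilon)^d$, which can exceed one half even when `eps` is small, and no affine equivariant estimator can tolerate more than half of the rows being contaminated.

```python
def row_contamination_prob(eps, d):
    """Probability that at least one of d independent cells is contaminated."""
    return 1.0 - (1.0 - eps) ** d


def smallest_dim_exceeding(eps, level=0.5):
    """Smallest dimension d at which the row contamination probability exceeds `level`."""
    d = 1
    while row_contamination_prob(eps, d) <= level:
        d += 1
    return d


if __name__ == "__main__":
    for d in (1, 5, 10, 20, 50):
        print(d, round(row_contamination_prob(0.05, d), 4))
    print("first d above 0.5:", smallest_dim_exceeding(0.05))
```

With 5% contamination per cell, the majority of rows are already affected once the dimension reaches the teens.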