In robust regression we often have to decide how many are the unusual observations, which should be removed from the sample in order to obtain better fitting for the rest of the observations. Generally, we use the basic principle of LTS, which is to fit the majority of the data, identifying as outliers those points that cause the biggest damage to the robust fit. However, in the LTS regression method the choice of default values for high break down-point affects seriously the efficiency of the estimator. In the proposed approach we introduce penalty cost for discarding an outlier, consequently, the best fit for the majority of the data is obtained by discarding only catastrophic observations. This penalty cost is based on robust design weights and high break down-point residual scale taken from the LTS estimator. The robust estimation is obtained by solving a convex quadratic mixed integer programming problem, where in the objective function the sum of the squared residuals and penalties for discarding observations is minimized. The proposed mathematical programming formula is suitable for small-sample data. Moreover, we conduct a simulation study to compare other robust estimators with our approach in terms of their efficiency and robustness.
We consider the problem of deleting bad influential observations (outliers) in linear regression models. The problem is formulated as a Quadratic Mixed Integer Programming (QMIP) problem, where penalty costs for discarding outliers are used into the objective function. The optimum solution defines a robust regression estimator called penalized trimmed squares (PTS). Due to the high computational complexity of the resulting QMIP problem, the proposed robust procedure is computationally suitable for small sample data. The computational performance and the effectiveness of the new procedure are improved significantly by using the idea of -Insensitive loss function from support vectors machine regression. Small errors are ignored, and the mathematical formula gains the sparseness property. The good performance of the -Insensitive PTS (IPTS) estimator allows identification of multiple outliers avoiding masking or swamping effects. The computational effectiveness and successful outlier detection of the proposed method is demonstrated via simulated experiments.
Computation of typical statistical sample estimates such as the median or least squares fit usually require the solution of an unconstrained optimization problem with a convex objective function, that can be solved efficiently by various methods. The presence of outliers in the data dictates the computation of a robust estimate, which can be defined as the optimum statistical estimate for a subset that contains at least half of the observations. The resulting problem is now a combinatorial optimization problem which is often computationally intractable. Classical statistical methods for multivariate location 渭 and scatter matrix 危 estimation are based on the sample mean vector and covariance matrix, which are very sensitive in the presence of outlier observations. We propose a new method for robust location and scatter estimation which is composed of two stages. In the first stage an unbiased multivariate L 1 -median center for all the observations is attained by a novel procedure called the least trimmed Euclidean deviations estimator. This robust median defines a coverage set of observations which is used in the second stage to iteratively compute the set of outliers which violate the correlational structure of the data set. Extensive computational experiments indicate that the proposed method outperforms existing methods in accuracy, robustness and computational time.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations鈥揷itations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.