Support vector machines (SVMs) appeared in the early nineties as optimal margin classifiers in the context of Vapnik's statistical learning theory. Since then, SVMs have been successfully applied to real-world data analysis problems, often providing improved results compared with other techniques. SVMs operate within the framework of regularization theory, minimizing an empirical risk in a well-posed and consistent way. A clear advantage of the support vector approach is that it usually yields sparse solutions to classification and regression problems: only a few samples are involved in determining the classification or regression function. This fact facilitates the application of SVMs to problems involving large amounts of data, such as text processing and bioinformatics tasks. This paper is intended as an introduction to SVMs and their applications, emphasizing their key features. In addition, some algorithmic extensions and illustrative real-world applications of SVMs are shown.

Comment: This paper is commented in [math/0612820], [math/0612821], [math/0612822] and [math/0612824]; rejoinder in [math.ST/0612825]. Published at http://dx.doi.org/10.1214/088342306000000493 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).
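The sparsity property mentioned above can be illustrated with a short sketch. This is not code from the paper: it uses scikit-learn's `SVC` on a synthetic two-class data set of my own choosing, and simply counts how many training samples end up as support vectors.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two well-separated Gaussian clouds: a toy two-class problem.
X = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(2, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)

clf = SVC(kernel="rbf", C=1.0).fit(X, y)
n_sv = clf.support_vectors_.shape[0]
# Sparsity: only the samples near the margin become support vectors,
# so the decision function depends on a small subset of the data.
print(n_sv, "support vectors out of", X.shape[0], "samples")
```

Only the support vectors enter the decision function, which is what makes prediction cheap even when the training set is large.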
In this work we focus on the use of SVMs for monitoring techniques applied to nonlinear profiles within the Statistical Process Control (SPC) framework. We develop a new methodology, based on Functional Data Analysis, for constructing control limits for nonlinear profiles. In particular, we monitor the fitted curves themselves instead of monitoring the parameters of any model fitted to the curves. The simplicity and effectiveness of the method have been tested against other statistical approaches using a standard data set from the process control literature.
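The idea of monitoring curves rather than model parameters can be sketched roughly as follows. This is a simplified, hypothetical illustration in plain NumPy (synthetic in-control profiles and a distance-based empirical control limit), not the paper's actual methodology:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 50)  # common evaluation grid for all profiles
# Hypothetical in-control (Phase I) profiles: a nonlinear signal plus noise.
profiles = np.sin(2 * np.pi * t) + rng.normal(0, 0.1, (30, 50))

mean_curve = profiles.mean(axis=0)
# Monitor the curves themselves: RMS distance of each profile
# to the mean curve of the in-control sample.
dists = np.sqrt(((profiles - mean_curve) ** 2).mean(axis=1))
ucl = np.quantile(dists, 0.99)  # empirical upper control limit

# A new profile with a level shift should exceed the control limit.
new_profile = np.sin(2 * np.pi * t) + 0.5
d_new = np.sqrt(((new_profile - mean_curve) ** 2).mean())
print("signal" if d_new > ucl else "in control")
```

The design choice here mirrors the abstract's point: no parametric model of the profile is assumed, only a notion of distance between whole curves.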
In this paper, we investigate the problem of estimating high-density regions from univariate or multivariate data samples. We estimate minimum volume sets whose probability content is specified in advance, known in the literature as density contour clusters. This problem is strongly related to One-Class Support Vector Machines (OCSVM). We propose a new method to solve this problem, the One-Class Neighbor Machine (OCNM), and we show its properties. In particular, the OCNM solution converges asymptotically to the exact, prespecified minimum volume set. Finally, numerical results illustrating the advantages of the new method are shown.
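The OCNM itself is not shown here, but the related OCSVM approach to high-density region estimation can be sketched with scikit-learn's `OneClassSVM` on a synthetic Gaussian sample (the data and parameter choices below are my own illustration, not the paper's experiments):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(2)
X = rng.normal(0, 1, (500, 2))  # sample from a standard bivariate Gaussian

# nu upper-bounds the fraction of training points left outside the
# estimated high-density region, so nu=0.1 targets roughly a 90% contour.
ocsvm = OneClassSVM(kernel="rbf", nu=0.1, gamma=0.5).fit(X)
inside = (ocsvm.predict(X) == 1).mean()
print(f"fraction of the sample inside the estimated region: {inside:.2f}")
```

The probability content of the region is thus controlled in advance through `nu`, which is the sense in which the OCSVM addresses the minimum volume set problem described above.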
In this paper we address the problem of multivariate outlier detection using the (unsupervised) self-organizing map (SOM) algorithm introduced by Kohonen. We examine a number of techniques, based on summary statistics and graphics derived from the trained SOM, and conclude that they work well in cooperation with each other. Useful tools include the median interneuron distance matrix and the projection of the trained map (via Sammon's projection). SOM quantization errors provide an important complementary source of information for certain types of outlying behavior. Empirical results are reported on both artificial and real data. Key words: self-organization; atypical data; robustness; dimensionality reduction; nonlinear projections. (Self-Organizing Maps for Outlier Detection, by Alberto Muñoz and Jorge Muruzábal, Departamento de Estadística y Econometría, Universidad Carlos III de Madrid, 28903 Getafe, Spain; e-mail: albmun@est-econ.uc3m.es, jorge@eco.uc3m.es. The authors are grateful to M. Botta and A. Atkinson for sharing their data sets; support from CICYT and DGICYT (Spain) research grants is appreciated.)
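The role of quantization errors can be illustrated with a minimal SOM written from scratch in NumPy. This is a rough sketch of Kohonen's algorithm under assumptions of my own (grid size, learning schedule, a reference Gaussian sample), not the paper's tuned setup; the point is only that an atypical observation sits far from every codebook vector of a map trained on typical data:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(0, 1, (300, 2))  # reference (typical) sample

# Minimal 5x5 SOM trained online with a shrinking Gaussian neighborhood.
grid = np.array([(i, j) for i in range(5) for j in range(5)], float)
W = rng.normal(0, 1, (25, 2))  # codebook vectors (one per grid unit)
for epoch in range(20):
    sigma, lr = 2.0 * 0.9 ** epoch, 0.5 * 0.9 ** epoch
    for x in X[rng.permutation(len(X))]:
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))  # best-matching unit
        h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2 * sigma ** 2))
        W += lr * h[:, None] * (x - W)  # pull BMU and its neighbors toward x

def quantization_error(x):
    """Distance from x to its best-matching codebook vector."""
    return np.sqrt(((W - x) ** 2).sum(axis=1)).min()

qe_typical = np.median([quantization_error(x) for x in X])
qe_outlier = quantization_error(np.array([8.0, 8.0]))  # a clear outlier
print("outlier QE:", qe_outlier, "vs typical median QE:", qe_typical)
```

A large quantization error flags exactly the "complementary" kind of outlying behavior the abstract mentions: points that no region of the trained map represents well.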