In data analysis, the treatment of outliers in quantitative variables is critical because it affects the quality of the conclusions. However, although very good tools exist for detecting outliers, dealing with them is not always straightforward. Statisticians recommend modeling the process that generates the outliers in order to identify the best way to handle them. In the context of Data Science and Machine Learning, identifying these generating processes remains problematic because it requires a human to visually interpret certain statistical tools. The techniques proposed so far rely on systematic imputation with a measure of central tendency, usually the arithmetic mean or the median. Although well suited to Data Science and Machine Learning workflows, these approaches share a fundamental problem: they modify the distribution of the original data. The purpose of our paper is to propose an algorithm that allows software to process outliers automatically while preserving the distributional structure of the treated variable, whatever its probability distribution. The method is based on the box plot (box-and-whisker plot) developed by John Tukey. The procedure is tested on existing real-world data. All computations are performed in the R programming language.
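As a rough illustration of the two ideas the abstract relies on, the sketch below flags outliers with Tukey's box plot fences and then shows how median imputation shrinks the spread of the variable. It is not the algorithm proposed in the paper, and it is written in Python rather than the R used by the authors. The function names `tukey_fences` and `find_outliers`, the sample values, the conventional multiplier of 1.5, and the choice of the "inclusive" quartile method are all assumptions made for this example.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    # Quartiles by linear interpolation over the sample itself.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, k=1.5):
    """Return the values lying strictly outside the Tukey fences."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]


# Hypothetical sample with one obvious outlier.
sample = [2, 3, 4, 5, 6, 7, 8, 9, 10, 100]
lower, upper = tukey_fences(sample)
outliers = find_outliers(sample)

# Systematic median imputation, the kind of treatment the abstract criticizes.
median = statistics.median(sample)
imputed = [median if (v < lower or v > upper) else v for v in sample]

# Population standard deviation before and after imputation.
spread_before = statistics.pstdev(sample)
spread_after = statistics.pstdev(imputed)

print("fences:", lower, upper)
print("outliers:", outliers)
print("spread before/after:", spread_before, spread_after)
```

Replacing the flagged value with the median pulls it into the center of the data, so the spread after imputation is smaller than before. This is one concrete way in which imputation by a central value alters the original distribution.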