Outlier detection in multivariate data is an important topic across many disciplines, especially when large amounts of data are involved. This publication focuses on the practical impact of data preparation techniques on outlier detection in driveability data. Driveability (also referred to as drive quality) is a key factor in the marketability of a vehicle, as the decision to buy a vehicle is usually made after a test drive. During the vehicle development process, driveability targets are continuously monitored by tracking objective performance indicators derived from sensor signals and/or simulation models. Because the variables of interest for driveability evaluation differ widely in magnitude, data scaling methods (also referred to as data normalization methods) are applied, and their impact on outlier detection is discussed. Specifically, three data preparation techniques suitable for multivariate data are applied to three selected datasets. After scaling, outlier detection is performed with the well-established DBSCAN algorithm. The parameters of the investigated techniques are varied, and their effect on the detected outliers is discussed in detail. The discussion is further supported by a statistical exploration of the outliers that DBSCAN identifies under the different scaling techniques and by a comparison with outliers detected by humans. The results demonstrate the advantages and disadvantages of the three investigated approaches.
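The pipeline outlined in the abstract (scale each variable, run DBSCAN, and treat the points DBSCAN labels as noise as outliers) can be sketched in plain Python. This is a minimal illustration under stated assumptions: min-max and z-score scaling are two common normalization methods chosen here for illustration and are not necessarily the three techniques investigated in the publication; the function names, the two-variable sample data, and the `eps`/`min_pts` values are all hypothetical.

```python
import math
import statistics

def min_max_scale(rows):
    """Rescale every column to [0, 1]; constant columns become 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in rows]

def z_score_scale(rows):
    """Centre every column on its mean and divide by its population std. dev."""
    cols = list(zip(*rows))
    mu = [statistics.fmean(c) for c in cols]
    sd = [statistics.pstdev(c) for c in cols]
    return [[(v - m) / s if s > 0 else 0.0
             for v, m, s in zip(row, mu, sd)] for row in rows]

def dbscan_noise(points, eps, min_pts):
    """Plain O(n^2) DBSCAN; returns the indices labelled as noise (outliers)."""
    n = len(points)

    def region(i):
        # eps-neighbourhood of point i, including the point itself
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * n  # None = unvisited, -1 = noise, k >= 0 = cluster id
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already assigned: do not expand again
            labels[j] = cluster
            neighbours = region(j)
            if len(neighbours) >= min_pts:  # j is a core point: keep expanding
                queue.extend(neighbours)
    return [i for i, label in enumerate(labels) if label == -1]

# Hypothetical samples: [engine speed in rpm, longitudinal acceleration in m/s^2]
samples = [[2000, 0.50], [2010, 0.52], [1990, 0.48],
           [2005, 0.51], [1995, 0.49], [2000, 3.00]]

print(dbscan_noise(samples, eps=15, min_pts=3))                  # []
print(dbscan_noise(min_max_scale(samples), eps=0.3, min_pts=3))  # [5]
print(dbscan_noise(z_score_scale(samples), eps=1.0, min_pts=3))  # [5]
```

Without scaling, the engine-speed axis dominates the Euclidean distance, so the acceleration spike in the last sample sits inside the cluster and is not flagged; after either scaling it is isolated and reported as noise. Note that `eps` has to be chosen anew for each scaling method, because each one changes the distance scale the neighbourhood radius refers to.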