Classification by means of machine learning models constitutes one relevant technology in process automation and predictive maintenance. However, common techniques such as deep networks or random forests suffer from their black box characteristics and possible adversarial examples. In this contribution, we give an overview about a popular alternative technology from machine learning, namely modern variants of learning vector quantization, which, due to their combined discriminative and generative nature, incorporate interpretability and the possibility of explicit reject options for irregular samples. We give an explicit bound on minimum changes required for a change of the classification in case of LVQ networks with reject option, and we demonstrate the efficiency of reject options in two examples.
Prototype-based machine learning methods such as learning vector quantization (LVQ) offer flexible classification tools, which represent a classification in terms of typical prototypes. This representation leads to a particularly intuitive classification scheme, since prototypes can be inspected by a human partner in the same way as data points. Yet, it bears the risk of revealing private information included in the training data, since individual information of a single training data point can significantly influence the location of a prototype. In this contribution, we investigate the question how to algorithmically extend LVQ such that it provably obeys privacy constraints as offered by the notion of so-called differential privacy. More precisely, we demonstrate the sensitivity of LVQ to single data points and hence the need of its extension to private variants in case of possibly sensitive training data. We investigate three technologies which have been proposed in the context of differential privacy, and we extend these technologies to LVQ schemes. We investigate the effectiveness and efficiency of these schemes for various data sets, and we evaluate their scalability and robustness as regards the choice of meta-parameters and characteristics of training sets. Interestingly, one algorithm, which has been proposed in the literature due to its beneficial mathematical properties, does not scale well with data dimensionality, while two alternative techniques, which are based on simpler principles, display good results in practical settings. * Funding by the CITEC center of excellence (EXC 277) is gratefully acknowledged.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.