Abstract. Classification is one of the most frequent tasks in machine learning. However, the variety of classification tasks, as well as of classifier methods, is huge. This raises the question of which classifier is suitable for a given problem, or how a certain classifier model can be used for different tasks in classification learning. This paper focuses on learning vector quantization classifiers as one of the most intuitive prototype-based classification models. Extensions and modifications of the basic learning vector quantization algorithm proposed in recent years are highlighted and discussed in relation to particular classification scenarios, such as imbalanced and/or incomplete data, prior data knowledge, classification guarantees, and adaptive data metrics for optimal classification.
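The basic learning vector quantization scheme can be illustrated with a minimal sketch of the classical LVQ1 update. This is our choice for illustration; the extensions discussed in the paper are not shown. The sketch assumes squared Euclidean distance, a fixed learning rate, and the hypothetical helper names `lvq1_train` and `lvq_predict`: the winning prototype is pulled towards a sample of its own class and pushed away from a sample of another class.

```python
import numpy as np

def lvq1_train(X, y, prototypes, proto_labels, lr=0.05, epochs=20, seed=0):
    """Classical LVQ1 with squared Euclidean distance (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    W = np.array(prototypes, dtype=float)  # work on a copy of the initial prototypes
    for _ in range(epochs):
        for idx in rng.permutation(len(X)):
            d = np.sum((W - X[idx]) ** 2, axis=1)
            k = np.argmin(d)  # winning (closest) prototype
            # attract the winner if labels agree, repel it otherwise
            sign = 1.0 if proto_labels[k] == y[idx] else -1.0
            W[k] += sign * lr * (X[idx] - W[k])
    return W

def lvq_predict(X, W, proto_labels):
    """Nearest-prototype (winner-takes-all) classification."""
    d = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    return proto_labels[np.argmin(d, axis=1)]

# Usage: two Gaussian blobs, one prototype per class
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
labels = np.array([0, 1])
W = lvq1_train(X, y, np.array([[-0.5, 0.0], [0.5, 0.0]]), labels)
acc = np.mean(lvq_predict(X, W, labels) == y)
```

The intuitiveness of the model comes from this structure: every class is represented by a few prototypes in the data space, and a new point simply receives the label of its closest prototype.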
An increasing number of known RNA 3D structures contributes to the recognition of various RNA families and the identification of their features. These tasks are based on an analysis of RNA conformations conducted at different levels of detail. Moreover, knowledge of native nucleotide conformations is crucial for structure prediction and for understanding RNA folding. However, this knowledge is stored in structural databases in a rather distributed form. Therefore, only automated methods for sampling the space of RNA structures can reveal plausible conformational representatives useful for further analysis. Here, we present a machine learning-based approach to inspect a dataset of RNA three-dimensional structures and to create a library of nucleotide conformers. A median neural gas algorithm is applied to cluster nucleotide structures based on their trigonometric description. The clustering procedure has two stages: (i) backbone-driven and (ii) ribose-driven. We present the resulting library, which contains RNA nucleotide representatives covering the entire dataset, and we evaluate its quality by computing normal distribution measures and the average RMSD between the data points and the prototype within each cluster.
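A minimal sketch of batch median neural gas on a precomputed dissimilarity matrix follows. Several details are assumptions, not taken from the text: the trigonometric description is taken to map each torsion angle to its sine and cosine, dissimilarities are Euclidean distances in that space, the neighborhood range is annealed exponentially, and prototypes are initialized by farthest-first selection. The function names are hypothetical. As in median neural gas generally, prototypes are restricted to data points, so only pairwise dissimilarities are needed.

```python
import numpy as np

def trig_features(angles_deg):
    """Encode torsion angles (n_samples x n_angles, degrees) as sin/cos pairs."""
    a = np.deg2rad(angles_deg)
    return np.hstack([np.sin(a), np.cos(a)])

def median_neural_gas(D, n_protos, epochs=30, lam_end=0.01, seed=0):
    """Batch median neural gas on a dissimilarity matrix D (n x n).
    Prototypes are restricted to data points; returns their row indices."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    # farthest-first initialization (an assumption made for this sketch)
    protos = [int(rng.integers(n))]
    for _ in range(n_protos - 1):
        protos.append(int(np.argmax(D[:, protos].min(axis=1))))
    protos = np.array(protos)
    lam0 = n_protos / 2
    D2 = D ** 2
    for t in range(epochs):
        # exponentially annealed neighborhood range
        lam = lam0 * (lam_end / lam0) ** (t / max(epochs - 1, 1))
        # rank of each prototype for each data point (0 = closest)
        ranks = np.argsort(np.argsort(D[:, protos], axis=1), axis=1)
        h = np.exp(-ranks / lam)  # neighborhood weights, n x n_protos
        # cost[i, l] = sum_j h[j, i] * D2[j, l]: cost of moving prototype i to point l
        cost = h.T @ D2
        protos = np.argmin(cost, axis=1)
    return protos

# Usage: two synthetic conformer groups, each described by two torsion angles
rng = np.random.default_rng(0)
angles = np.vstack([rng.normal(60, 5, (40, 2)), rng.normal(-170, 5, (40, 2))])
F = trig_features(angles)
D = np.linalg.norm(F[:, None, :] - F[None, :, :], axis=2)
protos = median_neural_gas(D, n_protos=2)
assign = np.argmin(D[:, protos], axis=1)  # cluster index for every data point
```

The sine/cosine encoding removes the discontinuity at ±180°, so angles such as -179° and 179° are treated as neighbors. Because prototypes are actual data points, each cluster representative is a real observed conformer rather than an average of angles.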
Abstract. We present prototype-based classification schemes, e.g. learning vector quantization, with cost-function-based and geometrically motivated reject options. We evaluate the reject schemes in experiments on artificial and benchmark data sets. We demonstrate that reject options improve the accuracy of the models in most cases, and that the performance of the proposed schemes is comparable to the optimal reject option of the Bayes classifier in cases where the latter is available.

Motivation

Powerful machine learning methods, such as recent learning vector quantization (LVQ) models based on cost functions, or support vector machines (SVMs) and linear-time approximations thereof, provide state-of-the-art classification algorithms for automated data analysis [1,2,3,4]. Their linear time complexity, high accuracy, and excellent generalization ability make them suitable for large data sets as well. However, their generalization bounds and training algorithms rely on the assumption that the data are i.i.d. This limits their suitability for big data analysis, for streaming data that display a trend, for data containing outliers, and for data with regions of strong class overlap. These cases require enhancing the classifier with measures of the certainty with which the model has taken a classification decision for a given point or data region. Such reject options constitute a first step towards incremental adaptation of the model complexity, tailored to data regions with a high degree of uncertainty.

While there exist popular extensions of SVMs that provide a confidence value for the classification [5,6,7], and first models have been proposed for distance-based k-nearest neighbor approaches [8], only a few approaches address prototype-based classifiers [9,10], and these lack a comparison to theoretically motivated alternatives such as explicit stochastic models.
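One simple geometric reject option for a nearest-prototype classifier can be sketched as follows. The certainty measure used here, a relative similarity built from the distance to the winning prototype and the distance to the closest prototype of another class, is an illustrative assumption and not necessarily the exact measure proposed in this work; the function name and the threshold `theta` are hypothetical. It reuses only distances that classification computes anyway, which is why such measures are cheap in online settings.

```python
import numpy as np

def classify_with_reject(X, W, proto_labels, theta):
    """Nearest-prototype classification with a geometric reject option.
    certainty = (d_minus - d_plus) / (d_minus + d_plus), where d_plus is the
    distance to the winning prototype and d_minus the distance to the closest
    prototype of any other class (needs >= 2 classes). Points whose certainty
    is below theta are rejected and labeled -1."""
    d = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)
    win = np.argmin(d, axis=1)
    win_label = proto_labels[win]
    d_plus = d[np.arange(len(X)), win]
    other = proto_labels[None, :] != win_label[:, None]
    d_minus = np.where(other, d, np.inf).min(axis=1)
    # small constant guards against 0/0 when a point coincides with prototypes
    certainty = (d_minus - d_plus) / (d_minus + d_plus + 1e-12)
    pred = np.where(certainty >= theta, win_label, -1)
    return pred, certainty

# Usage: one prototype per class; the point near the decision border is rejected
W = np.array([[-1.0, 0.0], [1.0, 0.0]])
labels = np.array([0, 1])
X = np.array([[-1.0, 0.0], [0.05, 0.0], [2.0, 0.0]])
pred, cert = classify_with_reject(X, W, labels, theta=0.2)
# pred -> [0, -1, 1]
```

The measure lies in [0, 1]: it is close to 1 for points near their winning prototype and close to 0 on the border between classes, so a single threshold trades accuracy against the fraction of rejected points.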
In this contribution, we are interested in efficient, online-computable reject options for LVQ classifiers and in how they behave compared with mathematically well-founded statistical models. For this purpose, we address the cost-function-based models generalized LVQ (GLVQ) [11] and generalized matrix LVQ (GMLVQ) [1], as well as their probabilistic counterpart, robust soft LVQ (RSLVQ) [2]. We propose simple geometric reject options for these models that can be computed efficiently in online scenarios, and we compare them to more costly alternatives based on probabilistic modeling and to the optimal reject option of the Bayes classifier [12].
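For comparison, the optimal reject option of the Bayes classifier is commonly stated as Chow's rule: reject a point when its largest class posterior falls below a threshold. The sketch below assumes known one-dimensional Gaussian class densities, a setting in which the Bayes posterior is available; whether [12] uses exactly this formulation is an assumption, and the helper names are hypothetical.

```python
import numpy as np

def gauss_pdf(x, mu, sigma):
    """Density of a 1-D normal distribution."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def bayes_with_reject(x, mus, sigmas, priors, theta):
    """Bayes classifier for known 1-D Gaussian classes with a Chow-style reject:
    predict argmax_k P(k|x), but return -1 when max_k P(k|x) < theta."""
    x = np.asarray(x, dtype=float)
    joint = np.stack([p * gauss_pdf(x, m, s) for m, s, p in zip(mus, sigmas, priors)])
    post = joint / joint.sum(axis=0)  # class posteriors, shape (n_classes, n_points)
    pmax = post.max(axis=0)
    pred = np.where(pmax >= theta, np.argmax(post, axis=0), -1)
    return pred, pmax

# Usage: two overlapping classes; the point midway between the means is rejected
pred, pmax = bayes_with_reject([-3.0, 0.0, 3.0], mus=[-1.0, 1.0],
                               sigmas=[1.0, 1.0], priors=[0.5, 0.5], theta=0.9)
# pred -> [0, -1, 1]
```

Unlike the geometric measures, this rule requires the full class-conditional densities, which are rarely known for real data.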