We present SpotOn, a web server that identifies and classifies interfacial residues as Hot-Spots (HS) and Null-Spots (NS). SpotOn implements a robust algorithm with a demonstrated accuracy of 0.95 and a sensitivity of 0.98 on an independent test set. The predictor was developed using an ensemble machine learning approach with up-sampling of the minority class. It was trained on 53 complexes using a range of features derived from both protein 3D structure and sequence. The SpotOn web interface is freely available at: http://milou.science.uu.nl/services/SPOTON/.
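Because HS residues are much rarer than NS residues, up-sampling duplicates minority-class examples before training so that the classes are balanced. The sketch below shows plain random up-sampling with replacement; the function name `upsample_minority` and the toy feature vectors are hypothetical, and it is an illustration of the general technique, not SpotOn's actual pre-processing code.

```python
import random
from collections import Counter


def upsample_minority(samples, labels, seed=0):
    """Randomly duplicate samples of the smaller classes (with replacement)
    until every class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        # All original samples belonging to this class.
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


# Hypothetical residues: four NS and one HS, each with a single toy feature.
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = ["NS", "NS", "NS", "NS", "HS"]
X_bal, y_bal = upsample_minority(X, y)
print(Counter(y_bal))
```

Up-sampling should be applied only to the training split, so that duplicated residues never leak into the independent test set.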
Many real-life problems require the classification of items into naturally ordered classes. Such problems are traditionally handled by conventional methods designed for nominal classes, which ignore the order relation. This paper introduces a new machine learning paradigm for multi-class classification problems in which the classes are ordered. The theoretical development of this paradigm rests on the key idea that the random variable class associated with a given query should follow a unimodal distribution. In this context, two approaches are considered: a parametric approach, in which the random variable class is assumed to follow a specific discrete distribution, and a nonparametric approach, in which it is assumed to be distribution-free. In either case, the unimodal model can be implemented in practice by means of, for instance, feedforward neural networks or support vector machines; our main focus, however, is on feedforward neural networks. We also introduce a new coefficient, r(int), to measure the performance of ordinal data classifiers. An experimental study with artificial and real datasets illustrates the performance of both the parametric and nonparametric approaches and compares them with other methods. The results suggest that the parametric approach is superior, particularly when flexible discrete distributions, a new concept introduced here, are considered.
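To make the parametric idea concrete, suppose a model outputs a single value p in [0, 1] and the class distribution over K ordered classes is taken to be binomial with K - 1 trials. The abstract only says "a specific discrete distribution", so the binomial choice here is an assumption, as are the function names. Because the binomial pmf is unimodal, the resulting class probabilities rise to a single peak and then fall.

```python
from math import comb


def binomial_class_probs(p, num_classes):
    """Assumed parametric unimodal model:
    P(C = k) = comb(K-1, k) * p**k * (1-p)**(K-1-k), for k = 0..K-1."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    n = num_classes - 1
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(num_classes)]


def predict_class(p, num_classes):
    """Predict the mode of the class distribution (the most probable class)."""
    probs = binomial_class_probs(p, num_classes)
    return max(range(num_classes), key=probs.__getitem__)


print(binomial_class_probs(0.5, 5))  # [1, 4, 6, 4, 1] / 16
print(predict_class(0.9, 5))
```

In a feedforward network this would correspond to a single sigmoid output unit producing p; the ordering of the classes is then enforced by the distribution itself rather than learned.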
The cosmetic result is an important endpoint for breast cancer conservative treatment (BCCT), but there is still no standard method for assessing it. Objective assessment methods are preferred because they overcome the drawbacks of subjective evaluation. In this paper, a novel algorithm based on support vector machines is proposed for the classification of ordinal categorical data. This classifier is then applied as a new methodology for the objective assessment of the aesthetic result of BCCT. Based on the new classifier, a semi-objective score for quantifying the aesthetic results of BCCT was developed, which discriminates patients into four classes.
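The abstract does not describe how the SVM handles ordered classes. One well-known way to build an ordinal classifier from a single binary classifier such as an SVM is the data replication method, sketched below; we have not confirmed that it is the algorithm used in this paper, and the names `replicate`, `decode` and the scale `h` are hypothetical. Each sample with label y in {0, ..., K-1} is copied K - 1 times, each copy is extended with a small code identifying the threshold it tests, and the binary target says whether y lies above that threshold.

```python
def replicate(x, y, num_classes, h=1.0):
    """Turn one sample (x, y) into K-1 binary samples.

    Replica q (q = 0..K-2) is x extended with a (K-2)-dimensional code:
    all zeros for q = 0, and h at position q-1 otherwise.
    Its binary label is 1 if y > q, else 0.
    """
    rows, labels = [], []
    for q in range(num_classes - 1):
        code = [0.0] * (num_classes - 2)
        if q > 0:
            code[q - 1] = h
        rows.append(list(x) + code)
        labels.append(1 if y > q else 0)
    return rows, labels


def decode(x, num_classes, binary_decision, h=1.0):
    """Predicted ordinal class = number of replicas the binary classifier
    (e.g. a trained SVM decision function) scores as positive."""
    rows, _ = replicate(x, 0, num_classes, h)
    return sum(1 for r in rows if binary_decision(r) > 0)


# A hand-made linear decision function standing in for a trained SVM:
# the code dimensions shift the threshold to 0.5, 1.5 and 2.5.
def f(r):
    return r[0] - 0.5 - 1.0 * r[1] - 2.0 * r[2]


print(replicate([2.0], 2, 4))
print(decode([2.0], 4, f))
```

The appeal of this construction is that all thresholds share one weight vector on the original features, so a single binary training run yields consistent, ordered decisions.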
The revision of the 1995 land cover dataset for the Vale do Sousa region, in northwest Portugal, was carried out by supervised classification of a multispectral image from the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) sensor. The nine reflective bands of ASTER were used, covering the spectral range from 0.52 to 2.43 µm. The image was first ortho-rectified and then segmented into 51 186 objects, with an average object size of 135 pixels (about 3 ha). Of these objects, 582 were identified as training objects for nine land cover classes. The image was classified with four algorithms: a fuzzy classifier, Support Vector Machines (SVM), K-Nearest Neighbours (K-NN) and Logistic Discrimination (LD). The classification results were evaluated against a set of 277 independently gathered validation sites. The overall accuracy was 44.6% for the fuzzy classifier, 70.5% for SVM, 60.9% for K-NN and 72.2% for LD. The difficulty of discriminating between some of the forest land cover classes was examined through separability analysis and unsupervised classification with hierarchical clustering. These analyses showed that the forest classes overlap in the multispectral space defined by the nine ASTER bands.
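Of the four classifiers, K-NN is the simplest to state: an object is assigned the majority label among the k training objects closest to it in spectral space. The sketch below assumes each segmented object is summarized by a feature vector (for example its mean reflectance in each of the nine bands) and uses Euclidean distance; the vector representation, the value of k and the labels "A" and "B" are hypothetical, not the study's actual setup.

```python
import math
from collections import Counter


def knn_classify(query, train_vectors, train_labels, k=3):
    """Assign `query` the majority label among its k nearest training
    vectors, using Euclidean distance in feature space."""
    dists = sorted(
        (math.dist(query, v), label) for v, label in zip(train_vectors, train_labels)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Toy 2-band example; real objects would have nine band values each.
train = [[0.10, 0.10], [0.20, 0.10], [0.90, 0.80], [0.80, 0.90]]
labels = ["A", "A", "B", "B"]
print(knn_classify([0.15, 0.12], train, labels, k=3))
```

When classes overlap in spectral space, as reported for the forest classes, the nearest neighbours of an object come from several classes and the vote becomes unreliable, which is consistent with the lower K-NN accuracy.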
Understanding protein-protein interactions is a key challenge in biochemistry. In this work, we describe a methodology to predict Hot-Spots (HS) in protein-protein interfaces from their native complex structure that is more accurate than previously published Machine Learning (ML) techniques. Our model is trained on a large number of complexes and on a significantly larger number of structural and evolutionary sequence-based features. In particular, we added interface size, the type of interaction between residues at the interface of the complex, the number of different residue types at the interface and the Position-Specific Scoring Matrix (PSSM), for a total of 79 features. We tested twenty-seven algorithms, ranging from a simple linear function to support vector machine models with different cost functions. The best model was obtained with the conditional inference random forest (c-forest) algorithm, applied to a dataset pre-processed by feature normalization and up-sampling of the minority class. On the independent test set, the method achieves an overall accuracy of 0.80, an F1-score of 0.73, a sensitivity of 0.76 and a specificity of 0.82.
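The four reported metrics all derive from the confusion matrix of the test set. The sketch below computes them from true/false positive and negative counts, assuming HS is the positive class; the function name and the counts in the example are hypothetical and are not the paper's results.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics, with HS as the positive class."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # fraction of true HS residues recovered
    specificity = tn / (tn + fp)   # fraction of true NS residues recovered
    precision = tp / (tp + fp)     # fraction of predicted HS that are real HS
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
m = binary_metrics(tp=8, fp=2, tn=6, fn=4)
print(m)
```

Reporting sensitivity and specificity alongside accuracy matters here because HS residues are the minority class: a model that predicted NS everywhere could still reach a high accuracy while having zero sensitivity.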