Abstract. G protein-coupled receptors (GPCRs) play a key role in regulating cell function owing to their ability to transmit extracellular signals. Given that the 3D structure and the functionality of most GPCRs are unknown, there is a need to construct robust classification models, based on the analysis of their amino acid sequences, for protein homology detection. In this paper, we describe the supervised classification of the different subtypes of class C GPCRs using support vector machines (SVMs). These models are built on different transformations of the amino acid sequences based on their physicochemical properties. Previous research using semi-supervised methods on the same data has shown the usefulness of such transformations. The resulting classification models perform robustly, with a Matthews correlation coefficient close to 0.91 and a prediction accuracy close to 0.93.
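The abstract reports a Matthews correlation coefficient (MCC) and an accuracy for a multi-class problem, but it does not state which MCC variant was used. The following is a minimal sketch that assumes the common multi-class generalization of MCC computed from a K×K confusion matrix (the same form that scikit-learn's `matthews_corrcoef` uses for multi-class input). The function names and the toy subtype labels are hypothetical and chosen only for illustration. They do not reproduce the paper's data or code.

```python
import math


def confusion_matrix(y_true, y_pred, labels):
    """Build a K x K matrix: rows are true classes, columns are predicted classes."""
    index = {lab: i for i, lab in enumerate(labels)}
    k = len(labels)
    cm = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    return cm


def accuracy(cm):
    """Fraction of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def multiclass_mcc(cm):
    """Multi-class MCC (assumed variant): reduces to the usual MCC when K = 2."""
    k = len(cm)
    s = sum(sum(row) for row in cm)                           # total samples
    c = sum(cm[i][i] for i in range(k))                       # correct predictions
    t = [sum(cm[i]) for i in range(k)]                        # true count per class (row sums)
    p = [sum(cm[i][j] for i in range(k)) for j in range(k)]   # predicted count per class (column sums)
    num = c * s - sum(pk * tk for pk, tk in zip(p, t))
    den = math.sqrt((s * s - sum(pk * pk for pk in p)) * (s * s - sum(tk * tk for tk in t)))
    return num / den if den else 0.0


# Toy example with three hypothetical class C subtype labels.
labels = ["mGluR", "GABA-B", "CaSR"]
y_true = ["mGluR", "mGluR", "GABA-B", "GABA-B", "CaSR", "CaSR"]
y_pred = ["mGluR", "mGluR", "GABA-B", "CaSR", "CaSR", "CaSR"]
cm = confusion_matrix(y_true, y_pred, labels)
print(accuracy(cm), multiclass_mcc(cm))
```

On this toy data one of six samples is misclassified, so accuracy is 5/6 while MCC is lower (about 0.78), because MCC also penalizes the imbalance between true and predicted class counts. This is one reason to report both figures.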
G protein-coupled receptors (GPCRs) are integral cell membrane proteins of pharmacological relevance. The tertiary structure of the transmembrane domain, a gateway to the study of protein functionality, is unknown for almost all members of class C GPCRs, which are the target of the current study. As a result, their investigation must often rely on alignments of their amino acid sequences. Sequence alignment entails the risk of missing relevant information. Various approaches have attempted to circumvent this risk through alignment-free transformations of the sequences based on different amino acid physicochemical properties. In this paper, we use several of these alignment-free methods, as well as a basic amino acid composition representation, to transform the available sequences. Novel semi-supervised statistical machine learning methods are then used to discriminate the different class C GPCR types from the transformed data. This approach is relevant because of the existence of orphan proteins, to which type labels should be assigned in a process of deorphanization or reverse pharmacology. The reported experiments show that the proposed techniques provide accurate classification even in settings of extreme class-label scarcity, and that fair accuracy can be achieved even with very simple transformation strategies that ignore the sequence ordering.
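The amino acid composition representation mentioned above is the simplest of these alignment-free transformations: each sequence, whatever its length, becomes a fixed-length vector of the relative frequencies of the 20 standard amino acids. The following is a minimal sketch under stated assumptions. The function name is hypothetical, and skipping non-standard residue codes (such as X, B, Z) is an assumption, not a detail taken from the paper.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes) in a fixed alphabetical order,
# so every sequence maps to a vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the 20-dimensional relative-frequency vector of a protein sequence.

    Non-standard residue codes are ignored (an assumption of this sketch).
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    n = sum(counts.values())
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / n for aa in AMINO_ACIDS]


vec = amino_acid_composition("MKKA")
print(dict(zip(AMINO_ACIDS, vec)))
```

Because only residue counts are kept, any permutation of a sequence (for example "MKKA" and "AKKM") yields the same vector. This is exactly the "ignore the sequence ordering" property the abstract refers to, and it explains why such a simple representation can still feed a standard vector-based classifier.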
Medical diagnosis can often be understood as a classification problem. In oncology, this typically involves differentiating between tumour types and grades, or some type of discrete outcome prediction. From the viewpoint of computer-based medical decision support, this classification requires accurate diagnoses of past cases as training target examples. Such labeled databases are scarce in most areas of oncology, and especially so in neuro-oncology. In such a context, semi-supervised learning oriented towards classification can be a sensible data modeling choice. In this study, semi-supervised variants of Generative Topographic Mapping, a model of the manifold learning family, are applied to two neuro-oncology problems: the diagnostic discrimination between different brain tumour pathologies, and the prediction of outcomes for a specific type of aggressive brain tumour. Their performance compared favorably with that of the alternative Laplacian Eigenmaps and Semi-Supervised SVM for Manifold Learning models in most of the experiments.