Describing the connectivity of chemical and/or biological systems as networks offers a direct gateway for introducing mathematical tools into proteomics. Networks, even very large ones, are simple objects composed, at minimum, of nodes and edges. The nodes represent the parts of the system, and the edges represent geometric and/or functional relationships between those parts. In proteomics, amino acids, proteins, electrophoresis spots, polypeptide fragments, or more complex objects can play the role of nodes. All of these networks can be described numerically using the so-called Connectivity Indices (CIs). Transforming graphs (pictures) into CIs (numbers) makes the information easier to manipulate and supports the search for structure-function relationships in proteomics. In this work, we review and comment on the challenges and new trends in the definition and application of CIs in proteomics. Emphasis is placed on 1-D-CIs for DNA and protein sequences, 2-D-CIs for RNA secondary structures, 3-D topographic indices (TPGIs) for protein function annotation without alignment, 2-D-CIs and 3-D-TPGIs for the study of drug-protein or drug-RNA quantitative structure-binding relationships, and pseudo 3-D-CIs for protein surface molecular recognition. We also cover CIs that describe protein interaction networks or RNA co-expression networks, as well as 2-D-CIs for patient blood proteome 2-DE maps or mass spectra.
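As a concrete illustration of turning a graph into a number, the sketch below computes the Randić connectivity index, a classic index defined as the sum over all edges of 1/sqrt(deg(u)·deg(v)). It is offered only as one familiar example of a CI; the abstract does not say which indices the review uses, and the two toy networks and their node labels are hypothetical.

```python
from math import sqrt


def randic_index(edges):
    """Randić connectivity index: sum over edges of 1/sqrt(deg(u) * deg(v))."""
    # Count the degree of every node from the undirected edge list.
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Each edge contributes the inverse square root of its end-node degree product.
    return sum(1.0 / sqrt(degree[u] * degree[v]) for u, v in edges)


# Hypothetical toy networks with four nodes each (not from the source).
path_edges = [("A", "B"), ("B", "C"), ("C", "D")]  # linear chain A-B-C-D
star_edges = [("A", "B"), ("A", "C"), ("A", "D")]  # hub A with three leaves

chi_path = randic_index(path_edges)
chi_star = randic_index(star_edges)
print(f"path: {chi_path:.3f}, star: {chi_star:.3f}")
```

The two networks have the same number of nodes and edges but different index values, which is the sense in which a single number can capture differences in connectivity.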
The development of 2D graph-theoretic representations for DNA sequences has been very important for the qualitative and quantitative comparison of sequences. Numeric features calculated from these representations are useful for DNA-QSAR studies. Most graph-theoretic representations identify each of the four bases with a unit step along one axis direction in 2D space. In the case of proteins, twenty amino acids instead of four bases have to be considered. This has limited the introduction of useful 2D Cartesian representations, and of the corresponding sequence descriptors, for encoding protein sequence information. In this study, we overcome this problem by grouping the amino acids into four classes: acid, basic, polar and non-polar. Identifying each class with one of the four axis directions yields a novel 2D representation and numeric descriptors for protein sequences. A Markov model is then used to calculate new numeric descriptors of the protein sequence, called herein the sequence 2D coupling numbers (f_k). In this work, we calculated the f_k values for 108 sequences of different polygalacturonases (PGs) and for 100 sequences of other proteins. A Linear Discriminant Analysis model derived here (PG = 5.36·f₁ − 3.98·f₃ − 42.21) successfully discriminates between PGs and other proteins. The model correctly classified 100% of the subset of 81 PG and 75 non-PG protein sequences used to train it. The model also correctly classified 51 out of 52 (98.07%) of the protein sequences used as an external validation series. Different amino acid groupings and/or axis orientations give different results, so these choices should be explored for other databases. Finally, to illustrate the use of the model, we report the isolation and prediction of PG action for a novel sequence (AY908988) isolated by our group from Psidium guajava L. 
This prediction agrees very well with the sequence alignment results obtained with BLAST. These findings illustrate the potential of the sequence descriptors derived from this novel 2D sequence representation in protein sequence QSAR studies.
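The four-group walk and the reported discriminant function can be sketched as follows. This is a minimal illustration under stated assumptions: the assignment of residues to the acid, basic, polar and non-polar groups and the choice of axis direction for each group are guesses made here, not taken from the abstract, and the Markov-model calculation of the coupling numbers f_k is not reproduced because the abstract does not define it. Only the coefficients in `pg_score` come from the source.

```python
# ASSUMPTION: residue grouping chosen for illustration; the source does not list it.
GROUPS = {
    "acid": "DE",
    "basic": "KRH",
    "polar": "STNQCYG",
    "nonpolar": "AVLIMFWP",
}

# ASSUMPTION: axis direction per group; the source notes that other
# orientations give different results.
STEPS = {
    "acid": (-1, 0),
    "basic": (1, 0),
    "polar": (0, 1),
    "nonpolar": (0, -1),
}

# Map each one-letter residue code to its unit step.
RESIDUE_STEP = {aa: STEPS[group] for group, aas in GROUPS.items() for aa in aas}


def walk_2d(sequence):
    """Return the 2D lattice points visited by a protein sequence, starting at the origin."""
    x, y = 0, 0
    points = [(x, y)]
    for aa in sequence.upper():
        dx, dy = RESIDUE_STEP[aa]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


def pg_score(f1, f3):
    """Discriminant function reported in the abstract: PG = 5.36*f1 - 3.98*f3 - 42.21."""
    return 5.36 * f1 - 3.98 * f3 - 42.21


# Hypothetical five-residue fragment, used only to show the walk.
path = walk_2d("MKDES")
print(path)

# Hypothetical f1 and f3 values; real values would come from the Markov model.
score = pg_score(10.0, 2.0)
print(round(score, 2))
```

The abstract does not state the decision threshold applied to the PG score, so the sketch stops at computing the score rather than assigning a class.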
Three-dimensional (3D) protein structures now frequently lack functional annotations because structures are being solved faster than experimental knowledge of their biological activity accumulates. As a result, predicting structure-function relationships for proteins is an active research field in computational chemistry, with implications for medicinal chemistry, biochemistry and proteomics. In previous studies, stochastic spectral moments were used to predict protein stability or function (González-Díaz, H. et al. Bioorg Med Chem 2005, 13, 323; Biopolymers 2005, 77, 296). Nevertheless, these moments take into consideration only electrostatic interactions and ignore other important factors such as van der Waals interactions. The present study introduces a new class of 3D structure molecular descriptors for folded proteins, named the stochastic van der Waals spectral moments (°β_k). Among many possible applications, recognition of kinases was selected because no previous computational chemistry studies in this area have been reported, despite the widespread distribution of kinases. The best linear model found was Kact = −9.44·°β₀(c) + 10.94·°β₅(c) − 2.40·°β₀(i) + 2.45·°β₅(m) + 0.73, where core (c), inner (i) and middle (m) refer to specific spatial protein regions. The model, with a high Matthew's regression coefficient (0.79), correctly classified 206 out of 230 proteins (89.6%), including both training and predicting series. An area under the ROC curve of 0.94 differentiates our model from a random classifier. A subsequent principal components analysis of 152 heterogeneous proteins demonstrated that °β_k encodes information different from that of other descriptors used in protein computational chemistry studies. 
Finally, the model recognizes 110 out of 125 kinases (88.0%) in a virtual screening experiment and this can be considered as an additional validation study (these proteins were not used in training or predicting series).
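The reported linear model and the classification percentages can be checked with a few lines of arithmetic. In this sketch the function and parameter names are hypothetical, the descriptor values passed in are placeholders rather than real protein data, and the calculation of the spectral moments from a 3D structure is not attempted; only the coefficients and the counts come from the abstract.

```python
def kinase_score(beta0_core, beta5_core, beta0_inner, beta5_middle):
    """Linear model from the abstract:
    Kact = -9.44*b0(c) + 10.94*b5(c) - 2.40*b0(i) + 2.45*b5(m) + 0.73
    """
    return (
        -9.44 * beta0_core
        + 10.94 * beta5_core
        - 2.40 * beta0_inner
        + 2.45 * beta5_middle
        + 0.73
    )


def percent(correct, total):
    """Percentage of correctly classified cases, rounded to one decimal place."""
    return round(100.0 * correct / total, 1)


# Counts reported in the abstract.
overall = percent(206, 230)    # training plus predicting series
screening = percent(110, 125)  # virtual screening of kinases
print(overall, screening)

# Placeholder descriptor values (all set to 1.0) just to exercise the formula.
example = kinase_score(1.0, 1.0, 1.0, 1.0)
print(round(example, 2))
```

Both recomputed percentages match the figures quoted in the text (89.6% and 88.0%).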