The growing interest in bioactive peptides with therapeutic potential is reflected in the large variety of biological databases published in recent years. However, discovering knowledge from these heterogeneous data sources is a nontrivial task, and it is the focus of our research. We therefore devise a unified data model based on molecular similarity networks to represent a chemical reference space of bioactive peptides; this space carries implicit knowledge that existing biological databases do not currently expose explicitly. Our main contribution is a novel workflow for the automatic construction of such similarity networks, enabling visual graph mining techniques to uncover new insights from the “ocean” of known bioactive peptides. The workflow relies on the following sequential steps: (i) calculation of molecular descriptors by applying statistical and aggregation operators on amino acid property vectors; (ii) a two-stage unsupervised feature selection method that identifies an optimized subset of descriptors using the concepts of entropy and mutual information; (iii) generation of sparse networks where nodes represent bioactive peptides and edges between two nodes denote their pairwise similarity/distance relationships in the defined descriptor space; and (iv) exploratory analysis using visual inspection in combination with clustering and network science techniques. For practical purposes, the proposed workflow has been implemented in our visual analytics software tool (http://mobiosd-hub.com/starpep/) to assist researchers in extracting useful information from an integrated collection of 45 120 bioactive peptides, one of the largest and most diverse datasets in its field. Finally, we illustrate the applicability of the proposed workflow by discovering central nodes in molecular similarity networks that may represent the biologically relevant chemical space known to date.
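Steps (i) to (iii) of the workflow can be illustrated with a minimal sketch. It is not the authors' implementation: the single property scale (Kyte-Doolittle hydropathy), the four aggregation operators, the equal-width binning, the entropy cutoff, the similarity formula, and the toy sequences are all assumptions chosen for illustration. Only the entropy stage of the feature selection is shown; the mutual-information stage is omitted.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy scale, used here as one illustrative
# amino acid property (assumption: the real workflow uses many properties).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}

def descriptors(seq):
    """Step (i): aggregate the property vector of a peptide into descriptors."""
    values = [HYDROPATHY[aa] for aa in seq]
    return [mean(values), pstdev(values), max(values), min(values)]

def shannon_entropy(column, bins=4):
    """Shannon entropy (bits) of one descriptor after equal-width binning."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return 0.0  # a constant descriptor carries no information
    counts = [0] * bins
    for x in column:
        idx = min(int((x - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(column)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

def select_features(matrix, min_entropy=0.5):
    """Step (ii), entropy stage only: keep indices of informative descriptors."""
    columns = list(zip(*matrix))
    return [j for j, col in enumerate(columns)
            if shannon_entropy(col) >= min_entropy]

def similarity_network(matrix, threshold=0.7):
    """Step (iii): keep an edge only where pairwise similarity >= threshold."""
    columns = list(zip(*matrix))
    if not columns:
        return []  # no descriptors survived selection
    scaled = []
    for col in columns:  # min-max scale each descriptor to [0, 1]
        lo, span = min(col), max(col) - min(col)
        scaled.append([(x - lo) / span if span else 0.0 for x in col])
    rows = list(zip(*scaled))
    d_max = math.sqrt(len(columns))  # largest possible Euclidean distance
    edges = []
    for i, j in combinations(range(len(rows)), 2):
        sim = 1 - math.dist(rows[i], rows[j]) / d_max
        if sim >= threshold:
            edges.append((i, j, round(sim, 3)))
    return edges

# Toy sequences (hypothetical input, not taken from the database).
peptides = ["GIGKFLHSAKKF", "KWKLFKKIEKVG", "GLFDIIKKIAESF", "AAAAGGGG"]
matrix = [descriptors(p) for p in peptides]
kept = select_features(matrix)
reduced = [[row[j] for j in kept] for row in matrix]
edges = similarity_network(reduced, threshold=0.7)
print(kept, edges)
```

Raising `threshold` makes the network sparser, which is the knob that trades visual readability against connectivity in a sketch like this one.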
Motivation: Bioactive peptides have gained great attention in academia and the pharmaceutical industry because they play an important role in human health. However, the increasing number of bioactive peptide databases is causing data redundancy and duplicated effort. Worse still, the available data are non-standardized and often contain data entry errors. Therefore, there is a need for a unified view that enables a more comprehensive analysis of the information on this topic residing at different sites. Results: After collecting web pages from a large variety of bioactive peptide databases, we organized the web content into an integrated graph database (starPepDB) that holds a total of 71 310 nodes and 348 505 relationships. In this graph structure, 45 120 nodes represent peptides, and the remaining nodes are connected to peptides to describe metadata. Additionally, to facilitate a better understanding of the integrated data, a software tool (starPep toolbox) has been developed to support visual network analysis in a user-friendly way, providing functionalities such as peptide retrieval and filtering, network construction and visualization, interactive exploration, and data export options. Availability and implementation: Both starPepDB and starPep toolbox are freely available at http://mobiosd-hub.com/starpep/. Supplementary information: Supplementary data are available at Bioinformatics online.
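The graph structure described above, with peptide nodes linked to metadata nodes, can be sketched in a few lines. This is an illustration only and not the starPepDB schema: the class name `PeptideGraph`, the node labels, the relationship name `compiled_in`, and the sample entries are all hypothetical.

```python
from collections import defaultdict

class PeptideGraph:
    """Tiny in-memory graph: peptide nodes linked to metadata nodes."""

    def __init__(self):
        self.node_labels = {}            # node id -> label, e.g. "Peptide"
        self.links = defaultdict(set)    # peptide id -> {(relation, metadata id)}

    def add_node(self, node_id, label):
        self.node_labels[node_id] = label

    def add_relationship(self, peptide_id, relation, metadata_id):
        self.links[peptide_id].add((relation, metadata_id))

    def peptides_with(self, relation, metadata_id):
        """Retrieve peptides connected to a metadata node by a relation."""
        return sorted(p for p, links in self.links.items()
                      if (relation, metadata_id) in links)

    def count(self, label):
        return sum(1 for l in self.node_labels.values() if l == label)

# Hypothetical sample data: two peptides and two source-database nodes.
graph = PeptideGraph()
graph.add_node("GLFDIIKKIAESF", "Peptide")
graph.add_node("GIGKFLHSAKKF", "Peptide")
graph.add_node("db_A", "Database")
graph.add_node("db_B", "Database")
graph.add_relationship("GLFDIIKKIAESF", "compiled_in", "db_A")
graph.add_relationship("GLFDIIKKIAESF", "compiled_in", "db_B")
graph.add_relationship("GIGKFLHSAKKF", "compiled_in", "db_A")

print(graph.peptides_with("compiled_in", "db_A"))
print(graph.count("Peptide"), graph.count("Database"))
```

Storing each peptide once and attaching every source as a relationship is one way a graph model can absorb the redundancy that arises when the same sequence appears in several databases.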
Background: Antimicrobial peptides are a promising alternative for combating pathogens resistant to conventional antibiotics. Computer-assisted peptide discovery strategies are necessary to automatically assess large amounts of data by generating models that efficiently classify whether a peptide is antimicrobial before it is evaluated in the wet lab. Model performance depends on the selection of molecular descriptors, for which an efficient and effective approach has recently been proposed. Unfortunately, how to adapt this method to the selection of molecular descriptors for the classification of antimicrobial peptides, and the performance it can achieve, have only been explored preliminarily. Results: We propose an adaptation of this successful feature selection approach for the weighting of molecular descriptors and assess its performance. The evaluation is conducted on six high-quality benchmark datasets that have previously been used for the unbiased empirical evaluation of state-of-the-art antimicrobial prediction tools. The results indicate that our approach substantially reduces the number of required molecular descriptors while improving classification performance relative to using all molecular descriptors. Our models also outperform state-of-the-art prediction tools for the classification of antimicrobial and antibacterial peptides. Conclusions: The proposed methodology is an efficient approach for developing models to classify antimicrobial peptides, particularly models that discriminate a specific antimicrobial activity, such as antibacterial activity. One of our future directions is to use the obtained classifier to search for antimicrobial peptides in various transcriptomes. Electronic supplementary material: The online version of this article (10.1186/s12864-018-5030-1) contains supplementary material, which is available to authorized users.
Californiconus californicus, previously named Conus californicus, has always been considered a unique species within cone snails because of its molecular, toxicological and morphological singularities, including the wide range of its diet: it preys indiscriminately on fish, snails, octopus, shrimps, and worms. We report here a conotoxin with a new cysteine pattern, assigned to the O1-superfamily, that is capable of inhibiting the growth of Mycobacterium tuberculosis (Mtb). The conotoxin was tested on a pathogen reference strain (H37Rv) and on multidrug-resistant strains, and it inhibited their growth with a minimal inhibitory concentration (MIC) range of 3.52–0.22 μM, concentrations similar to those of drugs used in the clinic. The peptide was purified from the venom using reverse-phase high-performance liquid chromatography (RP-HPLC); a partial sequence was obtained by Edman degradation, completed by RACE, and confirmed with the venom gland transcriptome. The 32-mer peptide, which contains eight cysteine residues, was named O1_cal29b according to the current nomenclature for this type of molecule. Moreover, transcriptomic analysis of the O-superfamily toxins present in the venom gland of the snail allowed us to assign several signal peptides to the O2 and O3 superfamilies, not described before in C. californicus, with new conotoxin frameworks.
Protein structure and protein function should be related, yet the nature of this relationship remains unsolved. Mapping the residues critical for protein function onto protein structure features offers an opportunity to explore this relationship, yet two important limitations have precluded a proper analysis of the structure-function relationship of proteins: (i) the lack of a formal definition of what critical residues are and (ii) the lack of a systematic evaluation of methods and protein structure features. To address this problem, we introduce an index that quantifies the protein-function criticality of a residue based on experimental data, together with a strategy that optimizes both the descriptors of protein structure (physicochemical and centrality descriptors) and the machine learning algorithms to minimize the error in the classification of critical residues. We observed that both physicochemical and centrality descriptors of residues effectively relate protein structure and protein function, and that physicochemical descriptors better describe critical residues. We also show that critical residues are better classified when residue criticality is treated as a binary attribute (i.e., residues are considered critical or not critical). Using this binary annotation for critical residues, eight models rendered accurate and non-overlapping classifications of critical residues, confirming the multi-factorial character of the structure-function relationship of proteins.