Identification of secretory proteins in body fluids is one of the key challenges in the development of non-invasive diagnostics. It has been shown in the past that a significant number of proteins, called exosomal proteins, are secreted by cells via exosomes. In this study, an attempt has been made to build a model that can predict exosomal proteins with high precision. All models are trained, tested, and evaluated on a non-redundant dataset comprising 2831 exosomal and 2831 non-exosomal proteins, where no two proteins have more than 40% similarity. Initially, the standard similarity-based method BLAST was used to predict exosomal proteins, which failed due to the low level of similarity in the dataset. To overcome this challenge, machine learning (ML) based models were developed using compositional features of proteins, achieving a highest AUROC of 0.70. The performance of the ML-based models improved to an AUROC of 0.73 when evolutionary information in the form of PSSM profiles was used for building models. Our analysis indicates that exosomal proteins contain a wide range of sequence-based motifs, which can be used for predicting exosomal proteins. Finally, a hybrid method has been developed that combines a motif-based approach and an ML-based model for predicting exosomal proteins, achieving a maximum AUROC of 0.85 and MCC of 0.56 on an independent dataset. The hybrid model in this study performs better than the presently available methods when assessed on an independent dataset. A web server and a standalone software package, ExoProPred, have been created to provide the service, code, and data to the scientific community (https://webs.iiitd.edu.in/raghava/exopropred/).
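As an illustration of what a "compositional feature" of a protein can look like, the sketch below computes amino acid composition, i.e. the percentage of each of the 20 standard residues in a sequence. The abstract does not say which compositional features were used, so this choice, the function name amino_acid_composition, and the example sequence are assumptions made for illustration only.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so that every
# sequence maps to a feature vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the percentage of each standard residue (hypothetical helper)."""
    counts = Counter(sequence.upper())
    # Non-standard symbols (e.g. X) are ignored when computing the total.
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [100.0 * counts[aa] / total for aa in AMINO_ACIDS]


# Made-up example sequence, not taken from the dataset.
features = amino_acid_composition("MKTAYIAKQR")
```

A vector like this has a fixed length of 20 regardless of sequence length, which is what makes it usable as input to a standard classifier.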
Peptide hormones are genome-encoded signal transduction molecules released in multicellular organisms. The dysregulation of hormone release can cause multiple health problems, and it is crucial to study these hormones for therapeutic purposes. To help the research community working in this field, we developed a prediction server that classifies peptides as hormonal or non-hormonal. The dataset used in this study was collected for both plants and animals from the Hmrbase2 and PeptideAtlas databases. It comprises 1174 hormonal and 1174 non-hormonal non-redundant peptide sequences, which were combined and divided into 80% training and 20% validation sets. We extracted a wide variety of compositional features from these sequences to develop various Machine Learning (ML) and Deep Learning (DL) models. The best performing model was a logistic regression model trained on the top 50 features, which achieved an AUROC of 0.93. To enhance the performance of the ML model, we applied the Basic Local Alignment Search Tool (BLAST) to identify hormonal sequences based on their similarity, and motif search using Motif-Emerging and Classes-Identification (MERCI) to detect motifs present in hormonal and non-hormonal sequences. We combined our best performing classification model, i.e., the logistic regression model, with BLAST and MERCI to form a hybrid model that can predict hormonal peptide sequences accurately. The hybrid model achieves an AUROC of 0.96, an accuracy of 89.79%, and an MCC of 0.8 on the validation set. This hybrid model has been incorporated into the publicly available HOPPred web server at https://webs.iiitd.edu.in/raghava/hoppred/.
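One way such a hybrid of an ML model, BLAST, and motif search could be combined is to shift the ML probability up or down depending on the similarity and motif evidence. The abstract does not describe the combination rule, so the additive scheme, the 0.5 adjustment, and the names hybrid_score and mcc below are assumptions for illustration, not the published method. The sketch also shows how the Matthews correlation coefficient (MCC) reported above is computed from a confusion matrix.

```python
import math


def hybrid_score(ml_probability, blast_hit=None,
                 has_positive_motif=False, has_negative_motif=False,
                 adjustment=0.5):
    """Combine an ML probability with BLAST and motif evidence.

    Hypothetical scheme: blast_hit is "positive", "negative", or None,
    and each piece of evidence shifts the score by a fixed adjustment.
    """
    score = ml_probability
    if blast_hit == "positive":
        score += adjustment
    elif blast_hit == "negative":
        score -= adjustment
    if has_positive_motif:
        score += adjustment
    if has_negative_motif:
        score -= adjustment
    return score


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Made-up inputs: a borderline ML score rescued by a positive BLAST hit.
combined = hybrid_score(0.4, blast_hit="positive")
example_mcc = mcc(tp=45, tn=45, fp=5, fn=5)
```

In a scheme like this, a sequence is called positive when the combined score exceeds a chosen threshold, so strong similarity or motif evidence can override a weak ML prediction.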
Background and objective: Hormones are essential for cell communication and hence regulate various physiological processes. Discrepancies in hormones or their receptors can break this communication and cause major endocrinological disorders. Information on hormones and their receptors is therefore indispensable for the therapeutics and diagnostics of hormonal diseases. Methods: We collected widespread information on peptide and non-peptide hormones and hormone receptors. The information was collected from HMDB, UniProt, HORDB, ENDONET, PubChem and the literature. Results: Hmrbase2 is an updated version of Hmrbase. The current version contains a total of 12056 entries, which is more than twice the number of entries in the previous version. These include 7406, 753, and 3897 entries for peptide hormones, non-peptide hormones and hormone receptors, respectively, from 803 organisms compared to 562 organisms in the previous version. The database also hosts 5662 hormone receptor pairs. The source organism, function, and subcellular location are provided for peptide hormones and receptors, and properties such as melting point and water solubility are provided for non-peptide hormones. Besides browsing and keyword search, an advanced search option has also been provided. Additionally, a similarity search module has been incorporated, enabling users to run similarity searches against peptide hormone sequences using BLAST and Smith-Waterman. Conclusions: To make the database accessible to various users, we designed a user-friendly, responsive website that can be easily used on smartphones, tablets, and desktop computers. The updated database version, Hmrbase2, offers improved data content compared to the previous version. Hmrbase2 is freely available at https://webs.iiitd.edu.in/raghava/hmrbase2.
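For readers unfamiliar with the Smith-Waterman algorithm mentioned above, the sketch below computes the best local alignment score between two sequences by dynamic programming. It is a simplified version: the match, mismatch, and linear gap scores are illustrative assumptions, whereas protein search tools normally use a substitution matrix and affine gap penalties. It does not reflect how the Hmrbase2 server implements its search.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (simplified scoring)."""
    # Only the previous row of the score matrix is kept in memory.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A cell never drops below 0, which lets an alignment
            # restart anywhere and makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Made-up peptide fragments: the shared "ACDE" block drives the score.
score = smith_waterman_score("XXACDEYY", "ACDE")
```

Unlike BLAST, which uses heuristics to find candidate regions quickly, this dynamic programming approach examines every pair of positions, so it is slower but guaranteed to find the optimal local alignment under the chosen scoring.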
Saliva as a non-invasive diagnostic fluid has immense potential as a tool for early diagnosis and prognosis of patients. The information about salivary biomarkers is broadly scattered across various resources and research papers. It is important to bring together all the information on salivary biomarkers to a single platform. This will accelerate research and development in non-invasive diagnosis and prognosis of complex diseases. We collected widespread information on five types of salivary biomarkers—proteins, metabolites, microbes, micro-ribonucleic acid (miRNA) and genes found in humans. This information was collected from different resources that include PubMed, the Human Metabolome Database and SalivaTecDB. Our database SalivaDB contains a total of 15 821 entries for 201 different diseases and 48 disease categories. These entries can be classified into five categories based on the type of biomolecules; 6067, 3987, 2909, 2272 and 586 entries belong to proteins, metabolites, microbes, miRNAs and genes, respectively. The information maintained in this database includes analysis methods, associated diseases, biomarker type, regulation status, exosomal origin, fold change and sequence. The entries are linked to relevant biological databases to provide users with comprehensive information. We developed a web-based interface that provides a wide range of options like browse, keyword search and advanced search. In addition, a similarity search module has been integrated which allows users to perform a similarity search using Basic Local Alignment Search Tool and Smith–Waterman algorithm against biomarker sequences in SalivaDB. We created a web-based database—SalivaDB, which provides information about salivary biomarkers found in humans. A wide range of web-based facilities have been integrated to provide services to the scientific community. https://webs.iiitd.edu.in/raghava/salivadb/