Understanding protein secretion pathways is of paramount importance for studying diseases caused by bacteria and their treatments. Most secretion pathways rely on signal sequences that mark proteins for secretion. However, some proteins, known as non-classical secreted proteins, lack such signals. This study aims to classify these proteins using predictive machine-learning techniques. Guided by the literature, we collected a set of physicochemical properties of amino acids from the AAindex database, highlighting known protein features such as hydrophobicity. We developed a six-step method (alignment, preliminary classification, mean-outlier filtering, two clustering algorithms, and random selection) to filter data from raw genomes and compose a negative dataset, in contrast to a positive dataset of 141 proteins also gathered from the literature. Using a conventional Random Forest machine-learning algorithm, we obtained an accuracy of ~91% in classifying non-classical secreted proteins on a validation dataset of 14 positive and 92 negative proteins, with sensitivity and specificity of 91% and ~86%, respectively. This performance is comparable to the state of the art for non-classical secretion classification, while the less sophisticated algorithm allows bacterial proteins to be classified with respect to non-classical secretion more rapidly. Therefore, this research shows that selecting an appropriate set of descriptors and an expressive training dataset compensates for not using an advanced machine-learning algorithm when predicting secretion by non-classical pathways. The data and software from this work are available at https://github.com/santosardr/non-CSPs and can be downloaded for standalone use without third-party software.
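To make the descriptor and evaluation ideas concrete, here is a minimal sketch. It is not the authors' code, which is in the repository above. It assumes a single AAindex-style descriptor, the Kyte-Doolittle hydropathy scale (AAindex entry KYTJ820101), averaged over a sequence. It also shows how accuracy, sensitivity, and specificity follow from confusion-matrix counts. The function names `mean_hydropathy` and `classification_metrics`, the example peptide, and the counts are hypothetical and used for illustration only. The Random Forest step itself is not reproduced.

```python
from statistics import mean

# Kyte-Doolittle hydropathy scale (AAindex entry KYTJ820101), one example of an
# amino-acid index; the study's actual descriptor set is an assumption not shown here.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(sequence: str, scale: dict[str, float] = KYTE_DOOLITTLE) -> float:
    """Hypothetical helper: average a per-residue index over a protein sequence.

    Residues not covered by the scale (e.g. X, B, Z, U) are skipped.
    """
    values = [scale[aa] for aa in sequence.upper() if aa in scale]
    if not values:
        raise ValueError("sequence contains no residues covered by the scale")
    return mean(values)


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Hypothetical helper: accuracy, sensitivity (true-positive rate) and
    specificity (true-negative rate) from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


if __name__ == "__main__":
    # Hypothetical short peptide, not a protein from the study's datasets.
    print(round(mean_hydropathy("MKLLVA"), 3))
    # Hypothetical counts for illustration only, not the study's results.
    print(classification_metrics(tp=8, fn=2, tn=18, fp=2))
```

In a full pipeline, one such value per index (or per sequence window) would form the feature vector passed to a classifier such as Random Forest. Sensitivity and specificity are reported separately because, with an imbalanced validation set like 14 positives versus 92 negatives, accuracy alone is dominated by the negative class.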