Abstract. This paper presents a real-valued negative selection algorithm with a sound mathematical foundation that addresses some of the drawbacks of our previous approach [11]. Specifically, it can produce a good estimate of the optimal number of detectors needed to cover the non-self space, and it maximizes non-self coverage through an optimization algorithm with proven convergence properties. The proposed method is a randomized algorithm based on Monte Carlo methods. Experiments are performed to validate the assumptions made while designing the algorithm and to evaluate its performance.
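To illustrate how Monte Carlo sampling can estimate non-self coverage, the following is a minimal sketch and not the algorithm of the paper. It assumes a unit hypercube as the problem space, hyperspherical detectors of a single fixed radius, and Euclidean distance; the function names and parameters (`estimate_nonself_coverage`, `self_radius`, `detector_radius`) are hypothetical and chosen only for this example.

```python
import math
import random


def covers(center, radius, point):
    """Return True if `point` lies inside the hypersphere around `center`."""
    return math.dist(center, point) <= radius


def estimate_nonself_coverage(self_samples, self_radius, detectors,
                              detector_radius, n_points=20000, seed=0):
    """Monte Carlo estimate of the fraction of non-self space covered.

    Assumption (not from the paper): the space is the unit hypercube and
    a point is 'self' if it lies within `self_radius` of a self sample.
    Returns (coverage fraction, binomial standard error).
    """
    rng = random.Random(seed)
    dim = len(self_samples[0])
    nonself = 0
    covered = 0
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        # Discard points that fall in the self region.
        if any(covers(s, self_radius, point) for s in self_samples):
            continue
        nonself += 1
        # Count the point if at least one detector matches it.
        if any(covers(d, detector_radius, point) for d in detectors):
            covered += 1
    if nonself == 0:
        raise ValueError("no non-self points were sampled")
    fraction = covered / nonself
    std_error = math.sqrt(fraction * (1.0 - fraction) / nonself)
    return fraction, std_error
```

The standard error shrinks with the square root of the number of non-self samples, so the sample size can be chosen to reach a desired precision before deciding whether more detectors are needed.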
The mycobacterial cell envelope has been implicated in the pathogenicity of tuberculosis and has therefore been a prime target for the identification and characterization of surface proteins with potential application in drug and vaccine development. In this study, the genome of Mycobacterium tuberculosis H37Rv was screened using machine learning tools, including feature-based predictors, general localizers and transmembrane topology predictors, to identify proteins that are potentially secreted to the surface of M. tuberculosis or to the extracellular milieu through different secretory pathways. To determine the reliability of the proposed computational methodology, the subcellular localization of a set of 8 hypothetically secreted/surface candidate proteins was experimentally assessed by cellular fractionation and immunoelectron microscopy (IEM), using 4 experimentally confirmed secreted/surface proteins as positive controls and 2 cytoplasmic proteins as negative controls. Subcellular fractionation and IEM studies provided evidence that the candidate proteins Rv0403c, Rv3630, Rv1022, Rv0835, Rv0361 and Rv0178 are secreted either to the mycobacterial surface or to the extracellular milieu. Surface localization was also confirmed for the positive controls, whereas the negative controls were located in the cytoplasm. Based on statistical learning methods, we obtained computational subcellular localization predictions that were then assessed experimentally. The resulting computational protocol, backed by this experimental support, allowed us to identify a new set of secreted/surface proteins as potential vaccine candidates.
Background: Most predictive methods currently available for the identification of protein secretion mechanisms have focused on classically secreted proteins. In fact, only two methods have been reported for predicting non-classically secreted proteins of Gram-positive bacteria. This study describes the implementation of a sequence-based classifier, denoted as NClassG+, for identifying non-classically secreted Gram-positive bacterial proteins. Results: Several feature-based classifiers were trained using different sequence transformation vectors (frequencies, dipeptides, physicochemical factors and PSSM) and Support Vector Machines (SVMs) with linear, polynomial and Gaussian kernel functions. Nested k-fold cross-validation (CV) was applied to select the best models, using the inner CV loop to tune the model parameters and the outer CV loop to compute the error. The model parameters, the kernel functions and the combinations of feature vectors were optimized using grid search. Conclusions: The final model was tested against an independent set not previously seen by the model, obtaining better predictive performance than SecretomeP V2.0 and SecretP V2.0 for the identification of non-classically secreted proteins. NClassG+ is freely available on the web at http://www.biolisi.unal.edu.co/web-servers/nclassgpositive/
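The nested cross-validation scheme described above can be sketched as follows. This is a minimal, standard-library illustration and not the NClassG+ implementation: a toy k-nearest-neighbour classifier stands in for the SVMs, the grid contains only the number of neighbours, and every name (`nested_cv`, `make_toy_data`, `grid`) is hypothetical. The inner loop selects the grid value on the outer training folds only, and the outer loop scores that choice on data never used for tuning.

```python
import random


def k_folds(n_items, k, rng):
    """Shuffle the indices 0..n_items-1 and deal them into k folds."""
    idx = list(range(n_items))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def split(data, held_out):
    """Return (training rows, held-out rows) for one fold."""
    held = set(held_out)
    train = [row for i, row in enumerate(data) if i not in held]
    test = [data[i] for i in held_out]
    return train, test


def knn_predict(train, query, n_neighbors):
    """Majority vote of the nearest neighbours (binary labels 0/1)."""
    def sq_dist(row):
        return sum((a - b) ** 2 for a, b in zip(row[0], query))
    nearest = sorted(train, key=sq_dist)[:n_neighbors]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def accuracy(train, test, n_neighbors):
    hits = sum(knn_predict(train, x, n_neighbors) == y for x, y in test)
    return hits / len(test)


def inner_score(train_data, folds, n_neighbors):
    """Mean inner-CV accuracy of one grid value."""
    scores = []
    for held_out in folds:
        tr, te = split(train_data, held_out)
        scores.append(accuracy(tr, te, n_neighbors))
    return sum(scores) / len(scores)


def nested_cv(data, grid, outer_k=5, inner_k=3, seed=0):
    """Return (mean outer accuracy, grid value chosen in each outer fold)."""
    rng = random.Random(seed)
    outer_scores, chosen = [], []
    for held_out in k_folds(len(data), outer_k, rng):
        outer_train, outer_test = split(data, held_out)
        # Inner loop: tune on the outer training portion only.
        inner = k_folds(len(outer_train), inner_k, rng)
        best = max(grid, key=lambda g: inner_score(outer_train, inner, g))
        chosen.append(best)
        # Outer loop: estimate error on data unseen during tuning.
        outer_scores.append(accuracy(outer_train, outer_test, best))
    return sum(outer_scores) / len(outer_scores), chosen


def make_toy_data(n_per_class=40, seed=1):
    """Two Gaussian clusters in 2-D, used purely for demonstration."""
    rng = random.Random(seed)
    neg = [((rng.gauss(0, 1), rng.gauss(0, 1)), 0) for _ in range(n_per_class)]
    pos = [((rng.gauss(4, 1), rng.gauss(4, 1)), 1) for _ in range(n_per_class)]
    return neg + pos
```

Calling `nested_cv(make_toy_data(), grid=[1, 3, 5])` returns the mean outer-fold accuracy together with the grid value selected in each outer fold. Because tuning never sees the outer test fold, the reported accuracy is not biased by the hyperparameter search.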