Machine learning (ML) and quantitative structure-activity relationship (QSAR) methods have been explored for predicting compounds with antibacterial activity, with impressive performance. It is desirable to test additional ML methods, select the most representative sets of molecular descriptors, and subject the developed prediction models to rigorous evaluation. This work evaluated three ML methods, support vector classification (SVC), k-nearest neighbor (k-NN), and C4.5 decision tree, which were trained and tested on 230 antibacterial and 381 nonantibacterial compounds. A well-established feature selection method was used to select representative molecular descriptors from a larger pool than that used in reported studies. The performance of the developed prediction models was tested by 5-fold cross-validation and on an independent evaluation set. SVC produced the best prediction accuracies, 96.66% and 98.15% for antibacterial compounds and 99.50% and 98.02% for nonantibacterial compounds in the two evaluations, respectively. These accuracies are slightly better than those of the reported ML and QSAR models and exceed those of the k-NN and C4.5 decision tree models developed in this work. Our study suggests that ML methods, particularly SVC, are potentially useful for facilitating the discovery of antibacterial agents.
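The abstract above reports accuracy separately for each class under 5-fold cross-validation. A minimal sketch of how such figures could be computed follows. The function names `k_fold_indices` and `per_class_accuracy`, the label encoding (1 = antibacterial, 0 = nonantibacterial), and the shuffled, non-stratified split are all assumptions made for illustration; the source does not describe how its folds were built.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 5, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for k shuffled, non-overlapping folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducibility
    folds = [indices[i::k] for i in range(k)]  # every k-th index goes to one fold
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def per_class_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy on class 1, accuracy on class 0).

    Assumed encoding: 1 = antibacterial, 0 = nonantibacterial.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)
```

In use, a classifier would be trained on each `train` split and its predictions on the matching `test` split pooled before calling `per_class_accuracy`. For 230 + 381 = 611 compounds, `k_fold_indices(611, 5)` yields five test folds that together cover every compound exactly once.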
The human Ether-a-Go-Go-Related Gene (hERG) potassium ion channel plays a crucial role in cardiac repolarization. Inhibition of this channel by marketed drugs may prolong the QT interval (the time between the start of the Q wave and the end of the T wave on an electrocardiogram) and may lead to fatal cardiac arrhythmia. It is therefore vital to predict potential hERG inhibitors in the early stages of drug discovery. This work explored the identification of hERG inhibitors by different machine learning methods, including C4.5 Decision Tree (C4.5 DT), multilayer perceptron, Radial Basis Function Network (RBFNetwork), and Support Vector Machine (SVM). Recursive Feature Elimination (RFE), a feature selection method, was used to select molecular descriptors appropriate for distinguishing hERG inhibitors from noninhibitors. For the Data_91 classification system with 91 compounds, the prediction accuracies of these methods are 72.1 to 92.3% for hERG inhibitors and 81.7 to 98.1% for noninhibitors in five-fold cross-validation. For an independent validation set of 47 compounds of Data_91, these methods gave an overall accuracy of 72.3 to 80.9% using the selected descriptors. This suggests that combining machine learning methods with feature selection can provide an effective way to filter out hERG inhibitors in the drug discovery process.
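The abstract above names Recursive Feature Elimination but does not say which model ranked the descriptors. The sketch below illustrates the general loop only: train a model on the remaining features, drop the feature with the smallest absolute weight, and repeat. The perceptron used as the ranking model, the function names, and the one-feature-per-round schedule are assumptions chosen to keep the example self-contained, not the authors' setup.

```python
from typing import Dict, List, Sequence, Tuple


def train_perceptron(X: Sequence[Sequence[float]], y: Sequence[int],
                     features: Sequence[int], epochs: int = 20) -> Dict[int, float]:
    """Train a plain perceptron on the given feature columns; return weights by column."""
    weights = {f: 0.0 for f in features}
    bias = 0.0
    for _ in range(epochs):
        for row, label in zip(X, y):
            target = 1.0 if label == 1 else -1.0  # map {0, 1} labels to {-1, +1}
            score = bias + sum(weights[f] * row[f] for f in features)
            if target * score <= 0:  # misclassified (or on the boundary): update
                for f in features:
                    weights[f] += target * row[f]
                bias += target
    return weights


def recursive_feature_elimination(X: Sequence[Sequence[float]], y: Sequence[int],
                                  n_keep: int) -> Tuple[List[int], List[int]]:
    """Return (kept feature indices, eliminated indices in removal order)."""
    remaining = list(range(len(X[0])))
    eliminated: List[int] = []
    while len(remaining) > n_keep:
        weights = train_perceptron(X, y, remaining)  # retrain on the surviving features
        worst = min(remaining, key=lambda f: abs(weights[f]))  # least influential feature
        remaining.remove(worst)
        eliminated.append(worst)
    return remaining, eliminated
```

Retraining after each removal is what separates this from a one-shot ranking: a descriptor's weight can change once a correlated descriptor has been dropped. In practice the descriptors would be scaled first so that weight magnitudes are comparable.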
Gamma-secretase inhibitors have been explored for the prevention and treatment of Alzheimer's disease (AD). Methods for predicting and screening gamma-secretase inhibitors are highly desirable for facilitating the design of novel therapeutic agents against AD, especially given the incomplete knowledge of the mechanism and three-dimensional structure of gamma-secretase. We explored two machine learning methods, support vector machine (SVM) and random forest (RF), to develop models for predicting gamma-secretase inhibitors of diverse structures. Quantitative analysis of the receiver operating characteristic (ROC) curve was performed to further examine and optimize the models. In particular, the Youden index (YI) was introduced into the ROC curve analysis of RF to obtain an optimal probability threshold for prediction. The developed models were validated on an external testing set, with prediction accuracies of 96.48% and 98.83% for gamma-secretase inhibitors and 98.18% and 99.27% for noninhibitors for SVM and RF, respectively. Different feature selection methods were used to extract the physicochemical features most relevant to gamma-secretase inhibition. To the best of our knowledge, the RF model developed in this work is the first model with a broad applicability domain. Based on it, virtual screening of gamma-secretase inhibitors against the ZINC database was performed, resulting in 368 potential hit candidates.
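The Youden index mentioned in the abstract above is J = sensitivity + specificity - 1, and the threshold that maximizes J is a common choice of operating point on a ROC curve. The sketch below shows one way to pick that threshold from predicted probabilities. The function name `youden_threshold`, the label encoding (1 = inhibitor, 0 = noninhibitor), and the rule "predict inhibitor when score >= threshold" are assumptions for illustration; the source does not give these details.

```python
from typing import Sequence, Tuple


def youden_threshold(y_true: Sequence[int], scores: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A sample is predicted positive when its score is >= the threshold.
    Ties in J are resolved in favor of the lowest threshold.
    """
    n_pos = sum(1 for label in y_true if label == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present in y_true")

    best_threshold, best_j = 0.0, -1.0
    for threshold in sorted(set(scores)):  # every observed score is a candidate
        tp = sum(1 for label, s in zip(y_true, scores) if label == 1 and s >= threshold)
        tn = sum(1 for label, s in zip(y_true, scores) if label == 0 and s < threshold)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j
```

For a random forest, `scores` would typically be the fraction of trees voting for the inhibitor class. This loop is quadratic in the number of samples, which is acceptable for a sketch; a sorted single pass would be preferable for large screening sets.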