Support vector machines (SVMs) are a powerful classification and regression tool that is becoming increasingly popular in machine learning applications. We tested the ability of SVM, in comparison with well-known neural network techniques, to predict drug-likeness and agrochemical-likeness for large compound collections. For both kinds of data, SVM outperforms various neural networks using the same set of descriptors. We also used SVM to estimate the activity of Carbonic Anhydrase II (CA II) enzyme inhibitors and found that the prediction quality of our SVM model is better than that reported earlier for conventional QSAR. Model characteristics and data set features were studied in detail.
The chemical environment of the oxygen atoms in the DNP database was analyzed and compared with that in the CMC and SCD databases. Some structural clusters were identified that predominate among natural products (NPs) and can be considered distinctive features of NPs. Fifty-three oxygen-containing structural fragments that are distinctive for the DNP in comparison with the SCD were identified (the distinctive set of fragments, DSF). A new descriptor, Mc, was introduced to describe the ratio of the number of atoms involved in the DSF to the total number of heavy atoms. A significant difference in the Mc values among the reference databases allowed a specific cluster of the DSF to be used as a tool for performing similarity searches for oxygen-containing NP molecules, or for evaluating or comparing databases according to their NP-likeness. An example is provided illustrating that the suggested approach can serve not only to estimate NP-likeness but also as a tool for designing new NP-like compounds. The suggested approach to NP-likeness evaluation moves away from the traditional ideas of scaffolds, cycles, linkers and substituents.
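The Mc descriptor described above is a simple ratio, and a minimal sketch may make it concrete. The function name `mc_descriptor`, its inputs, and the example molecule are hypothetical and do not come from the paper. The sketch assumes that fragment matches are already available as tuples of heavy-atom indices, and that an atom covered by several overlapping fragments is counted once; the abstract does not say how overlaps are handled.

```python
def mc_descriptor(fragment_matches, heavy_atom_count):
    """Ratio of heavy atoms involved in DSF fragments to all heavy atoms.

    fragment_matches: iterable of tuples of atom indices, one tuple per
        matched DSF fragment (hypothetical input format).
    heavy_atom_count: total number of heavy (non-hydrogen) atoms.
    """
    if heavy_atom_count <= 0:
        raise ValueError("heavy_atom_count must be positive")

    # Assumption: an atom shared by overlapping fragments counts once.
    covered = set()
    for match in fragment_matches:
        covered.update(match)

    return len(covered) / heavy_atom_count


# Hypothetical molecule with 10 heavy atoms and two overlapping matches.
example_matches = [(0, 1, 2), (2, 3)]
example_mc = mc_descriptor(example_matches, 10)
print(example_mc)
```

In practice, the atom-index tuples would come from a substructure search of each DSF fragment against the molecule. Values closer to 1 would indicate that more of the molecule is made up of the distinctive fragments.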