Chemical acute oral toxicity is an important end point in drug design and environmental risk assessment. However, it is difficult to determine experimentally, and in silico methods have therefore been developed as an alternative. In this study, a comprehensive data set containing 12,204 diverse compounds with median lethal dose (LD₅₀) values was compiled. These chemicals were classified into four categories, namely categories I, II, III, and IV, based on the criteria of the U.S. Environmental Protection Agency (EPA). Several multiclassification models were then developed using five machine learning methods, namely support vector machine (SVM), C4.5 decision tree (C4.5), random forest (RF), k-nearest neighbor (kNN), and naïve Bayes (NB) algorithms, along with MACCS and FP4 fingerprints. One-against-one (OAO) and binary tree (BT) strategies were employed for SVM multiclassification. Performance was measured on two external validation sets containing 1678 and 375 chemicals, respectively. The overall accuracy of the MACCS-SVM(OAO) model was 83.0% and 89.9% for external validation sets I and II, respectively, and the model showed reliable predictive accuracy for each class. In addition, some representative substructures responsible for acute oral toxicity were identified using information gain and substructure frequency analysis, which may help guide further work on avoiding such toxicity.
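The category assignment step can be illustrated with a minimal sketch. The abstract does not list the cutoffs it used; the thresholds below are the commonly cited EPA oral LD₅₀ bounds (50, 500, and 5000 mg/kg) and are an assumption here, as is the function name `epa_oral_category`.

```python
def epa_oral_category(ld50_mg_per_kg: float) -> str:
    """Map an oral LD50 in mg/kg to an acute oral toxicity category.

    Assumed cutoffs (not stated in the abstract):
    I: <= 50, II: > 50 to 500, III: > 500 to 5000, IV: > 5000.
    """
    if ld50_mg_per_kg <= 0:
        raise ValueError("LD50 must be a positive dose in mg/kg")
    if ld50_mg_per_kg <= 50:
        return "I"
    if ld50_mg_per_kg <= 500:
        return "II"
    if ld50_mg_per_kg <= 5000:
        return "III"
    return "IV"


if __name__ == "__main__":
    # Illustrative doses only; these are not values from the data set.
    for ld50 in (12.0, 320.0, 2100.0, 8000.0):
        print(ld50, "->", epa_oral_category(ld50))
```

Labels produced this way would serve as the four-class target for the multiclassification models; the exact boundary handling should be checked against the paper itself.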
Mutagenicity is one of the most important end points of toxicity. Because experimental tests are costly and laborious, it is necessary to develop robust in silico methods to predict chemical mutagenicity. In this paper, a comprehensive database containing 7617 diverse compounds, including 4252 mutagens and 3365 nonmutagens, was constructed. On the basis of this data set, highly predictive models were then built using five machine learning methods, namely support vector machine (SVM), C4.5 decision tree (C4.5 DT), artificial neural network (ANN), k-nearest neighbors (kNN), and naïve Bayes (NB), along with five fingerprints, namely CDK fingerprint (FP), Estate fingerprint (Estate), MACCS keys (MACCS), PubChem fingerprint (PubChem), and Substructure fingerprint (SubFP). Performance was measured by cross validation and on an external test set containing 831 diverse chemicals. Information gain and substructure analysis were used to interpret the models. The fivefold cross-validation accuracies of the top five models ranged from 0.808 to 0.841. Accuracy on the external validation set ranged from 0.904 to 0.980, outperforming Toxtree. Three models (PubChem-kNN, MACCS-kNN, and PubChem-SVM) showed high and reliable predictive accuracy for both mutagens and nonmutagens and, hence, could be used to predict chemical Ames mutagenicity.
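As a rough illustration of how information gain can rank substructures, the sketch below computes the gain of a single binary fingerprint bit against a binary mutagen/nonmutagen label. It is a generic textbook formulation, not the authors' implementation; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(has_substructure, labels):
    """Entropy reduction from splitting compounds on one fingerprint bit.

    has_substructure: sequence of 0/1 flags, one per compound.
    labels: sequence of class labels (e.g. 1 = mutagen, 0 = nonmutagen).
    """
    if len(has_substructure) != len(labels):
        raise ValueError("inputs must have the same length")
    n = len(labels)
    if n == 0:
        return 0.0
    present = [y for x, y in zip(has_substructure, labels) if x]
    absent = [y for x, y in zip(has_substructure, labels) if not x]
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


if __name__ == "__main__":
    # Toy data: the first bit separates the classes, the second does not.
    labels = [1, 1, 0, 0]
    print(information_gain([1, 1, 0, 0], labels))
    print(information_gain([1, 0, 1, 0], labels))
```

Bits with high gain point to substructures whose presence or absence is strongly associated with one class, which is the sense in which the measure helps interpret a fingerprint-based model.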
Biodegradation is the principal environmental dissipation process. Because comprehensive experimental data are lacking and studies are costly and time-consuming, in silico approaches for assessing the biodegradability profiles of chemicals are encouraged and are an active research topic. Here we developed in silico methods to estimate chemical biodegradability in the environment. First, 1440 diverse compounds tested under the Japanese Ministry of International Trade and Industry (MITI) protocol were used. Four different methods, namely support vector machine, k-nearest neighbor, naïve Bayes, and C4.5 decision tree, were used to build the combinatorial classification probability models of ready versus not ready biodegradability using physicochemical descriptors and fingerprints separately. The overall predictive accuracies of the best models were more than 80% for the external test set of 164 diverse compounds. Some privileged substructures were further identified for ready or not ready biodegradable chemicals by combining information gain and substructure fragment analysis. Moreover, 27 new chemicals were selected for experimental assay under the Japanese MITI test protocol, and the assays confirmed that all 27 compounds were predicted correctly. The predictive accuracies of our models exceed those of the commonly used EPI Suite software. Our study provides critical tools for early assessment of the biodegradability of new organic chemicals in environmental hazard assessment.
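To make the fingerprint-based k-nearest neighbor idea concrete, here is a minimal sketch that votes over the most similar training compounds. The abstract does not say which similarity measure was used; Tanimoto similarity over sets of on-bits is an assumption, and the names `tanimoto` and `knn_predict` and the toy fingerprints are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def knn_predict(query, training, k=3):
    """Classify a query fingerprint by majority vote of its k nearest neighbors.

    training: list of (fingerprint_set, label) pairs,
              label 1 = ready biodegradable, 0 = not ready.
    Returns (predicted_label, share_of_neighbors_labeled_ready).
    """
    if k < 1 or k > len(training):
        raise ValueError("k must be between 1 and the training set size")
    # Most similar compounds first; ties keep their original order.
    ranked = sorted(training, key=lambda item: tanimoto(query, item[0]),
                    reverse=True)
    votes = [label for _, label in ranked[:k]]
    ready_share = sum(votes) / k
    return (1 if ready_share >= 0.5 else 0), ready_share


if __name__ == "__main__":
    # Toy fingerprints; bit indices carry no chemical meaning here.
    training = [
        (frozenset({1, 2, 3}), 1),
        (frozenset({1, 2, 4}), 1),
        (frozenset({7, 8, 9}), 0),
        (frozenset({7, 8, 10}), 0),
        (frozenset({1, 2}), 1),
    ]
    print(knn_predict(frozenset({1, 2, 3}), training, k=3))
    print(knn_predict(frozenset({7, 8}), training, k=3))
```

The returned vote share is only a crude stand-in for a class probability; how the paper combines its models into probability estimates is not described in the abstract.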