Genotoxicity tests can detect compounds that have an adverse effect on the process of heredity. The micronucleus assay, a genotoxicity test method, has been widely used to evaluate the presence and extent of chromosomal damage in human beings. Due to the high cost and laboriousness of experimental tests, computational approaches for predicting genotoxicity based on chemical structures and properties are recognized as an alternative. In this study, a dataset containing 641 diverse chemicals was collected and the molecules were represented by both fingerprints and molecular descriptors. Then classification models were constructed by six machine learning methods, including the support vector machine (SVM), naïve Bayes (NB), k-nearest neighbor (kNN), C4.5 decision tree (DT), random forest (RF) and artificial neural network (ANN). The performance of the models was estimated by five-fold cross-validation and an external validation set. The top ten models showed excellent performance for the external validation with accuracies ranging from 0.846 to 0.938, among which models Pubchem_SVM and MACCS_RF showed a more reliable predictive ability. The applicability domain was also defined to distinguish favorable predictions from unfavorable ones. Finally, ten structural fragments which can be used to assess the genotoxicity potential of a chemical were identified by using information gain and structural fragment frequency analysis. Our models might be helpful for the initial screening of potential genotoxic compounds.
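The fragment-ranking step named above (information gain plus fragment frequency) can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so this uses the standard Shannon-entropy definition of information gain and, as an assumption, a simple frequency: the fraction of fragment-containing compounds labeled positive. The function names and toy data are hypothetical and not taken from the study.

```python
import math
from typing import Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    h = 0.0
    for cls in set(labels):
        p = sum(1 for y in labels if y == cls) / n
        h -= p * math.log2(p)
    return h


def information_gain(present: Sequence[bool], labels: Sequence[int]) -> float:
    """Reduction in label entropy after splitting on fragment presence."""
    with_frag = [y for f, y in zip(present, labels) if f]
    without_frag = [y for f, y in zip(present, labels) if not f]
    n = len(labels)
    conditional = (len(with_frag) / n) * entropy(with_frag) \
        + (len(without_frag) / n) * entropy(without_frag)
    return entropy(labels) - conditional


def fragment_frequency(present: Sequence[bool], labels: Sequence[int]) -> float:
    """Assumed definition: share of fragment-containing compounds labeled 1."""
    hits = [y for f, y in zip(present, labels) if f]
    return sum(hits) / len(hits) if hits else 0.0


# Toy data (hypothetical): 1 = genotoxic, 0 = non-genotoxic.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# Fragment occurs in three compounds, all of them genotoxic.
fragment_present = [True, True, True, False, False, False, False, False]

ig = information_gain(fragment_present, labels)
freq = fragment_frequency(fragment_present, labels)
print(f"information gain = {ig:.3f}, positive frequency = {freq:.2f}")
```

A fragment with high information gain and a high positive frequency would be a candidate structural alert; one with high gain but low positive frequency would instead be associated with the non-toxic class.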
Reproductive toxicity is an important regulatory endpoint in health hazard assessment. Because in vivo tests are expensive, time consuming, and require a large number of animals, which must be killed, in silico approaches have been developed as alternative strategies to assess the potential reproductive toxicity of chemicals. Some prediction models for reproductive toxicity have been developed, but most were built on a single endpoint such as embryo teratogenicity; these models may therefore not provide reliable predictions for chemicals that are toxic through other endpoints, such as sperm reduction or gonadal dysgenesis. Here, a total of 1823 chemicals characterized for reproductive toxicity by multiple endpoints were used to develop structure-activity relationship models with six machine-learning approaches and nine molecular fingerprints. Among the models, the MACCSFP-SVM model had the best performance on the external validation set (area under the curve = 0.900, classification accuracy = 0.836). The applicability domain was analyzed, and a rational boundary was found to distinguish inaccurate predictions from accurate ones. Moreover, several structural alerts for characterizing reproductive toxicity were identified using information gain combined with substructure frequency analysis. Our results should be helpful for predicting the reproductive toxicity of chemicals.
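The abstract does not say how the applicability domain boundary was defined. As an illustration only, the sketch below assumes one common similarity-based approach: a query compound counts as inside the domain when its mean Tanimoto similarity to its k most similar training compounds reaches a threshold. Fingerprints are represented as sets of "on" bit positions; the values of k and the threshold, the function names, and the toy fingerprints are all hypothetical.

```python
from typing import FrozenSet, Sequence


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def in_domain(query: FrozenSet[int],
              training: Sequence[FrozenSet[int]],
              k: int = 3,
              threshold: float = 0.5) -> bool:
    """True if the mean similarity to the k nearest training compounds
    is at least `threshold` (assumed rule, not the paper's)."""
    sims = sorted((tanimoto(query, t) for t in training), reverse=True)
    nearest = sims[:k]
    if not nearest:
        return False
    return sum(nearest) / len(nearest) >= threshold


# Toy training fingerprints (hypothetical bit positions).
training_set = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 5}),
    frozenset({2, 3, 4, 6}),
]

close_query = frozenset({1, 2, 3})     # shares most bits with training data
far_query = frozenset({10, 11, 12})    # shares no bits with training data

print(in_domain(close_query, training_set))
print(in_domain(far_query, training_set))
```

Under this assumed rule, predictions for compounds outside the domain would be flagged as less reliable rather than discarded.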