Up to now, publicly available data sets to build and evaluate Ames mutagenicity prediction tools have been very limited in terms of size and chemical space covered. In this report we describe a new unique public Ames mutagenicity data set comprising about 6500 nonconfidential compounds (available as SMILES strings and SDF) together with their biological activity. Three commercial tools (DEREK, MultiCASE, and an off-the-shelf Bayesian machine learner in Pipeline Pilot) are compared with four noncommercial machine learning implementations (Support Vector Machines, Random Forests, k-Nearest Neighbors, and Gaussian Processes) on the new benchmark data set.
In silico prediction tools for Ames mutagenicity (Salmonella typhimurium reverse mutation assay) represent a cost-effective, high-throughput approach for the prioritization of compounds before submission to experimental testing. Various modeling approaches have been pursued in this field during the last few years. However, the publicly available data sets used for modeling are mostly very limited in terms of size and chemical coverage. Hence, a reasonable comparison of the different modeling methodologies is so far, as for most QSAR problems, impossible. In this work we describe a collection of about 6000 nonconfidential compounds together with their biological activity in the Ames mutagenicity test. This very large, unique and valuable data set built from public sources is made available in machine-readable form (SMILES strings) to be used as a benchmark by other researchers. Based on these data we built three statistical prediction models for Ames mutagenicity based on CORINA and DRAGON descriptors. The methods used are a support vector machine, a random forest and Gaussian processes. All three approaches are evaluated within the same cross-validation setting. To facilitate this valuable benchmark, the exact validation protocol including the exact random splits will be made publicly available. The results show that all three methods yield satisfactory results, reaching sensitivity and specificity values of greater than 70% or 80%, respectively. The application of Gaussian processes, previously not applied to Ames mutagenicity prediction, proves slightly superior to the other two methods.
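As a rough illustration of the evaluation idea described above (a shared cross-validation setting with fixed random splits, scored by sensitivity and specificity), the following Python sketch may help. It is not the authors' protocol: the function names `k_fold_indices` and `sensitivity_specificity`, the seed, the fold count, and the toy labels are all assumptions made for illustration only.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices once with a fixed seed and deal them into k folds.

    Fixing the seed makes the splits reproducible, so different models can be
    compared on identical folds. (Hypothetical helper, not the published
    splits.)
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into k near-equal folds.
    return [indices[i::k] for i in range(k)]


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Label 1 stands for mutagenic (Ames positive), 0 for non-mutagenic.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels, invented purely to exercise the functions.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
folds = k_fold_indices(n_samples=10, k=3, seed=0)
```

In an actual comparison, each fold would serve once as the held-out test set while a model is trained on the remaining folds, and the two metrics would be averaged over folds.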
We previously described a multiplexed in vitro genotoxicity assay based on flow cytometric analysis of detergent-liberated nuclei that are simultaneously stained with propidium iodide and labeled with fluorescent antibodies against p53, γH2AX, and phospho-histone H3. Inclusion of a known number of microspheres provides absolute nuclei counts. The work described herein was undertaken to evaluate the interlaboratory transferability of this assay, commercially known as MultiFlow™ DNA Damage Kit— p53, γH2AX, Phospho-histone H3. For these experiments seven laboratories studied reference chemicals from a group of 84 representing clastogens, aneugens, and non-genotoxicants. TK6 cells were exposed to chemicals in 96-well plates over a range of concentrations for 24 hrs. At 4 and 24 hrs cell aliquots were added to the MultiFlow reagent mix and following a brief incubation period flow cytometric analysis occurred, in most cases directly from a 96-well plate via a robotic walk-away data acquisition system. Multiplexed response data were evaluated using two analysis approaches, one based on global evaluation factors (i.e., cutoff values derived from all inter-laboratory data), and a second based on multinomial logistic regression that considers multiple biomarkers simultaneously. Both data analysis strategies were devised to categorize chemicals as predominately exhibiting a clastogenic, aneugenic, or non-genotoxic mode of action (MoA). Based on the aggregate 231 experiments that were performed, assay sensitivity, specificity, and concordance in relation to a priori MoA grouping were ≥ 92%. These results are encouraging as they suggest that two distinct data analysis strategies can rapidly and reliably predict new chemicals’ predominant genotoxic MoA based on data from an efficient and transferable multiplexed in vitro assay.