ABSTRACT: Drug-induced liver injury (DILI) is one of the most important reasons for drug development failure at both preapproval and postapproval stages. There has been increased interest in developing predictive in vivo, in vitro, and in silico models to identify compounds that cause idiosyncratic hepatotoxicity. In the current study, we applied a machine learning approach, Bayesian modeling with extended connectivity fingerprints and other interpretable descriptors. The model was developed and internally validated using a training set of 295 compounds and was then applied, for external validation, to a test set of 237 compounds, which is large relative to the training set. The resulting concordance of 60%, sensitivity of 56%, and specificity of 67% were comparable to the results of internal validation. The Bayesian model with extended connectivity functional class fingerprints of maximum diameter 6 (ECFC_6) and interpretable descriptors suggested several substructures that are chemically reactive and may also be important for DILI-causing compounds, e.g., ketones, diols, and α-methyl styrene type structures. Using SMILES Arbitrary Target Specification (SMARTS) filters published by several pharmaceutical companies, we evaluated whether such reactive substructures could be readily detected by any of the published filters. The most stringent filters used in this study, such as the Abbott alerts, which capture thiol traps and other compounds, may be of use in identifying DILI-causing compounds (sensitivity 67%). A significant outcome of the present study is that we provide predictions for many compounds that cause DILI by using the knowledge available from previous studies. These computational models may represent cost-effective selection criteria before in vitro or in vivo experimental studies.
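As an illustration of how the validation statistics quoted above are conventionally defined, the following minimal Python sketch computes concordance, sensitivity, and specificity from binary labels (1 = DILI-positive, 0 = DILI-negative). The function name `classification_metrics` and the toy labels are hypothetical and are not taken from the study; the definitions used are the standard ones and are assumed, not confirmed, to match those applied here.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return concordance, sensitivity, and specificity for binary labels.

    Labels are assumed to be 1 (DILI-positive) or 0 (DILI-negative).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one compound is required")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    nan = float("nan")
    return {
        # Fraction of all compounds classified correctly.
        "concordance": (tp + tn) / len(y_true),
        # Fraction of true positives that were predicted positive.
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        # Fraction of true negatives that were predicted negative.
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


# Toy labels for illustration only (hypothetical, not data from the study).
toy_true = [1, 1, 1, 1, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 1, 0]
toy_metrics = classification_metrics(toy_true, toy_pred)
print(toy_metrics)
```

Under these definitions, the same calculation applies to a substructure filter: a compound flagged by any alert counts as a positive prediction, and sensitivity is the fraction of DILI-positive compounds that are flagged.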