Drug-induced liver injury (DILI) is a critical issue in drug development
and postmarketing safety surveillance, leading to failures in clinical
trials as well as withdrawals of approved drugs from the market.
Therefore, it is important to identify DILI compounds in the early stages
through in silico and in vivo studies. This is difficult with conventional
safety testing methods, since the predictive power of most
existing frameworks is insufficient to address this pharmacological
issue. In our study, we employ a computational framework inspired by
natural language processing (NLP) that uses convolutional neural networks
and molecular fingerprint-embedded features. Our development set and
independent test set have 1597 and 322 compounds, respectively. These
samples were collected from previous studies and matched with established
chemical databases for structural validity. Our model achieves
an average accuracy of 0.89, a Matthews correlation coefficient
(MCC) of 0.80, and an AUC of 0.96. Our results show a significant
improvement in the AUC values compared to the recent best model with
a relative gain of 6.67%, from 0.90 to 0.96. Our findings also suggest
that the molecular fingerprint-embedded featurizer is an effective molecular
representation for future biological and biochemical studies, alongside
classic molecular fingerprints.
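
For readers who want to check the reported figures, the following is a minimal sketch of how the three metrics and the relative AUC gain can be computed. It is not the authors' evaluation code. It assumes that AUC means the area under the ROC curve, and the function names, confusion-matrix counts, and scores are hypothetical values chosen only for illustration.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of compounds classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(labels, scores):
    """ROC AUC as the probability that a positive outranks a negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def relative_improvement(old, new):
    """Relative gain of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    # Hypothetical confusion matrix, not data from the study.
    print(accuracy(tp=45, tn=45, fp=5, fn=5))
    print(mcc(tp=45, tn=45, fp=5, fn=5))
    # Hypothetical labels (1 = DILI-positive) and predicted scores.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
    # AUC values quoted in the abstract: 0.90 (previous best) to 0.96.
    print(round(relative_improvement(0.90, 0.96), 2))  # 6.67
```

The last call shows that the 6.67% figure is a relative gain, (0.96 - 0.90) / 0.90, rather than an absolute difference of 0.06 in AUC.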