Various toxicity and pharmacokinetic evaluations are needed as screening
experiments at the drug discovery stage. To reduce animal experiments
and development costs, high-performance predictive models based on
quantitative structure–activity relationship (QSAR) analysis are desired.
From these evaluation targets, we
selected 50% lethal dose (LD50), blood–brain barrier
penetration (BBBP), and the clearance (CL) pathway for this investigation
and constructed predictive models for each target using 636–11,886
compounds. First, we constructed predictive models using the DeepSnap-deep
learning (DL) method and images of compounds as features. The calculated
area under the curve (AUC) and balanced accuracy (BAC) were, respectively,
0.887 and 0.818 for LD50, 0.893 and 0.824 for BBBP, and
0.883 and 0.763 for the CL pathway. Next, molecular descriptors (MDs)
of compounds were calculated using Molecular Operating Environment,
alvaDesc, and ADMET Predictor, and MD-based predictive models were
constructed from these MDs using DataRobot. The calculated AUC and BAC
were, respectively, 0.931
and 0.805 for LD50, 0.919 and 0.849 for BBBP, and 0.900
and 0.807 for the CL pathway. We then constructed
predictive models combining the DeepSnap-DL and MD-based methods.
In ensemble models using the mean predictive probability of the DeepSnap-DL
and MD-based methods, the calculated AUC and BAC were, respectively,
0.942 and 0.842 for LD50, 0.936 and 0.853 for BBBP, and
0.908 and 0.832 for the CL pathway, with improved predictive performance
observed for all three targets compared with either method alone.
Moreover, in consensus models that adopted only compounds for which
the results of the two methods agreed, the calculated BAC values for
LD50, BBBP, and the CL pathway were 0.916, 0.918, and 0.847,
respectively, indicating higher predictive performance than the ensemble
models for all three targets. The predictive models combining the
DeepSnap-DL and MD-based methods displayed high predictive performance
for LD50, BBBP, and the CL pathway. Therefore, the application
of this approach to prediction targets in various drug discovery screenings
is expected to accelerate drug discovery.
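
The ensemble and consensus schemes described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 0.5 classification threshold, and the toy probabilities are assumptions made only for illustration. The ensemble averages the predicted probabilities of the two methods, whereas the consensus keeps only compounds on which both methods predict the same class, so its BAC is computed on a subset of the data.

```python
def balanced_accuracy(y_true, y_pred):
    """Balanced accuracy (BAC): mean of sensitivity and specificity."""
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("BAC needs at least one positive and one negative")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / pos + tn / neg)


def to_labels(probs, threshold=0.5):
    """Binarize probabilities; the 0.5 threshold is an assumption."""
    return [1 if p >= threshold else 0 for p in probs]


def ensemble_probability(p_a, p_b):
    """Ensemble: mean predicted probability of the two methods."""
    return [(a + b) / 2 for a, b in zip(p_a, p_b)]


def consensus_subset(y_true, p_a, p_b, threshold=0.5):
    """Consensus: keep only compounds where both methods agree on the class."""
    kept_true, kept_pred = [], []
    for t, a, b in zip(y_true, to_labels(p_a, threshold), to_labels(p_b, threshold)):
        if a == b:
            kept_true.append(t)
            kept_pred.append(a)
    return kept_true, kept_pred


# Toy data (hypothetical, not from the study): 1 = positive class.
y_true = [1, 1, 1, 0, 0, 0]
p_deepsnap = [0.9, 0.4, 0.7, 0.2, 0.6, 0.1]  # stand-in for DeepSnap-DL output
p_md = [0.8, 0.7, 0.2, 0.3, 0.2, 0.4]        # stand-in for MD-based output

ens_labels = to_labels(ensemble_probability(p_deepsnap, p_md))
ens_bac = balanced_accuracy(y_true, ens_labels)

cons_true, cons_pred = consensus_subset(y_true, p_deepsnap, p_md)
cons_bac = balanced_accuracy(cons_true, cons_pred)
coverage = len(cons_true) / len(y_true)  # fraction of compounds retained

print(f"ensemble BAC={ens_bac:.3f}, consensus BAC={cons_bac:.3f}, coverage={coverage:.2f}")
```

Because the consensus discards compounds on which the methods disagree, its BAC should be read together with the fraction of compounds retained (`coverage` in the sketch).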