Lipophilicity (logD) and aqueous solubility (logSw) play
a central role in drug development.
Accurate prediction of these properties remains an unsolved problem because
of data scarcity. Current methodologies neglect the intrinsic relationships
between physicochemical properties and usually ignore ionization
effects. Here, we propose an attention-driven mixture-of-experts (MoE)
model named ALipSol, which explicitly reproduces the hierarchy of
task relationships. We adopt the principle of divide-and-conquer by
breaking down the complex end point (logD or logSw) into simpler ones
(acidic pKa, basic pKa, and logP) and allocating a specific expert network to each subproblem.
Subsequently, we implement transfer learning to extract knowledge
from related tasks, thus alleviating the dilemma of limited data.
Additionally, we replace the gating network with an attention mechanism
to better capture the dynamic task relationships on a per-example
basis. We adopt local fine-tuning and consensus prediction to further
boost model performance. Extensive evaluation experiments verify the
success of the ALipSol model, which achieves RMSE improvements of 8.04%,
2.49%, 8.57%, 12.8%, and 8.60% on the Lipop, ESOL, AqSolDB, external
logD, and external logS data sets,
respectively, compared with Attentive FP and the state-of-the-art in silico tools. In particular, our model yields more significant
advantages (Welch’s t-test) for small training
data, implying its high robustness and generalizability. The interpretability
analysis shows that the atom contributions learned by ALipSol are
more reasonable than those of the vanilla Attentive FP, and that the substitution
effects in benzene derivatives agree well with empirical constants,
revealing the potential of our model to extract useful patterns from
data and provide guidance for lead optimization.
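
To make the divide-and-conquer idea concrete, the sketch below illustrates two ingredients in plain Python. The first is the textbook relation between logD, logP, and pKa for a monoprotic compound, which assumes that only the neutral species partitions into the organic phase. The second is a per-example attention weighting over expert outputs in place of a fixed gate. This is a minimal illustration under stated assumptions, not the ALipSol architecture: the function names, the two-dimensional vectors, and the use of scaled dot-product scores are all hypothetical choices made for the example.

```python
import math

def logd_monoprotic_acid(logp, pka, ph=7.4):
    # Textbook relation for a monoprotic acid, assuming only the
    # neutral form partitions: logD = logP - log10(1 + 10^(pH - pKa)).
    return logp - math.log10(1.0 + 10.0 ** (ph - pka))

def logd_monoprotic_base(logp, pka, ph=7.4):
    # Same assumption for a monoprotic base:
    # logD = logP - log10(1 + 10^(pKa - pH)).
    return logp - math.log10(1.0 + 10.0 ** (pka - ph))

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_mix(query, expert_outputs):
    # Hypothetical per-example mixing: score each expert vector against
    # the molecule-level query with a scaled dot product, then return
    # the attention weights and the weighted sum of expert vectors.
    d = len(query)
    names = list(expert_outputs)
    scores = [
        sum(q * k for q, k in zip(query, expert_outputs[name])) / math.sqrt(d)
        for name in names
    ]
    weights = softmax(scores)
    mixed = [
        sum(w * expert_outputs[name][i] for w, name in zip(weights, names))
        for i in range(d)
    ]
    return dict(zip(names, weights)), mixed

# Illustrative inputs only; these vectors are made up for the example.
experts = {
    "acidic_pKa": [1.0, 0.0],
    "basic_pKa": [0.0, 1.0],
    "logP": [1.0, 0.0],
}
weights, mixed = attention_mix([1.0, 0.0], experts)
```

Because the weights are recomputed from each query, two molecules can draw on the same experts in different proportions, which is the per-example behavior that a fixed gate cannot express.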