Interpretability is an important issue for end-to-end learning
models. Motivated by computer vision algorithms, an interpretable
noncovalent interaction (NCI) correction multimodal model (TFRegNCI) is
proposed for NCI prediction. TFRegNCI is based on RegNet feature extraction
and a transformer encoder fusion strategy. RegNet is a network design
paradigm that mainly focuses on local features. Meanwhile, the Vision
Transformer is also leveraged for feature extraction, because it can
capture global features better than RegNet while lowering the computational
cost. Using a transformer encoder as the fusion strategy rather than
a multilayer perceptron can enhance model performance, due to its emphasis
on important features with fewer parameters. As a result, the proposed
TFRegNCI achieves highly accurate predictions (mean absolute error of
∼0.1 kcal/mol) compared with the coupled-cluster singles, doubles, and
perturbative triples (CCSD(T)) benchmark. To further improve model efficiency,
TFRegNCI uses two-dimensional (2D) inputs transformed from three-dimensional
(3D) electron density cubes, which cuts the time cost by 30% while model
accuracy is preserved. To improve model interpretability, a visualization
module, Gradient-weighted Regression Activation Mapping (Grad-RAM),
has been embedded. Grad-RAM is adapted from the classification algorithm,
Gradient-weighted Class Activation Mapping, to perform feature visualization
for the regression task. With Grad-RAM, the visual location map for
features in deep learning models can be displayed. The feature map
visualizations suggest that the 2D model performs similarly to
the 3D model because of equally effective feature extraction
from electron density. Moreover, the valid feature region on the location
map produced by the 3D model is consistent with the NCIPLOT NCI isosurface.
This confirms that the model does extract significant features related
to NCIs. Interpretability analyses are carried out
through the molecular orbital contributions to the effective features. Thus,
the proposed model is likely to be a promising tool for revealing
essential information on NCIs at the level of electronic
theory.
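
To make the Grad-RAM idea concrete, the following is a minimal sketch of a Grad-CAM-style location map computed for a scalar regression output instead of a class score. It is not the TFRegNCI implementation: the function name `grad_ram_map`, the toy feature maps, the linear head weights `head_w`, and the choice to keep the ReLU from Grad-CAM are all assumptions made for illustration. In practice the gradients would come from automatic differentiation through the trained network.

```python
from typing import List

Grid = List[List[float]]  # one H x W feature map


def grad_ram_map(activations: List[Grid], gradients: List[Grid]) -> Grid:
    """Grad-CAM-style location map for a scalar regression output.

    activations[k][i][j]: feature map k of the chosen layer.
    gradients[k][i][j]:   d(output) / d(activations[k][i][j]).
    """
    height, width = len(activations[0]), len(activations[0][0])
    # Channel weight = spatial average of that channel's gradient.
    weights = [
        sum(sum(row) for row in grad) / (height * width) for grad in gradients
    ]
    # Weighted sum of the feature maps, followed by a ReLU.
    # Assumption: the ReLU is carried over unchanged from Grad-CAM.
    return [
        [
            max(0.0, sum(w * act[i][j] for w, act in zip(weights, activations)))
            for j in range(width)
        ]
        for i in range(height)
    ]


# Hypothetical toy head: output = sum_k head_w[k] * mean(activations[k]),
# so d(output)/d(activations[k][i][j]) = head_w[k] / (H * W) at every pixel.
activations = [
    [[1.0, 0.0], [0.0, 0.0]],  # channel 0 responds in the top-left corner
    [[0.0, 0.0], [0.0, 2.0]],  # channel 1 responds in the bottom-right corner
]
head_w = [4.0, -4.0]
gradients = [[[w / 4.0] * 2 for _ in range(2)] for w in head_w]

location_map = grad_ram_map(activations, gradients)
```

In this toy case only the top-left pixel is highlighted, because channel 1 lowers the predicted value and its contribution is removed by the ReLU. For a regression target, whether to discard negative contributions in this way is a design choice rather than a given.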