“…Besides, neural networks have attracted more attention in fusion, especially since the appearance of RNN and LSTM [36,47]. More recently, transformer-based [51] fusion has attracted growing attention [1,48,37,16,21], especially after its application in vision [7]. In addition, there are also some model-agnostic fusion methods, including simple concatenation [27,6,58] and element-wise operations [8,50].…”
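
The two model-agnostic fusion schemes named in the excerpt can be illustrated with a minimal sketch. This is not taken from any of the cited works: the function names `concat_fusion` and `elementwise_fusion`, the choice of sum and product as the element-wise operations, and the toy feature vectors are all assumptions made for illustration.

```python
from typing import List

Vector = List[float]


def concat_fusion(a: Vector, b: Vector) -> Vector:
    """Fuse by concatenation: the output dimension is len(a) + len(b)."""
    return list(a) + list(b)


def elementwise_fusion(a: Vector, b: Vector, op: str = "sum") -> Vector:
    """Fuse by an element-wise operation; both inputs must share one dimension."""
    if len(a) != len(b):
        raise ValueError("element-wise fusion needs equal-length features")
    if op == "sum":
        return [x + y for x, y in zip(a, b)]
    if op == "product":
        return [x * y for x, y in zip(a, b)]
    raise ValueError(f"unsupported op: {op}")


# Hypothetical per-modality feature vectors (illustrative values only).
image_feat = [0.5, 1.0, 2.0]
text_feat = [2.0, 4.0, 0.5]

fused_concat = concat_fusion(image_feat, text_feat)
fused_sum = elementwise_fusion(image_feat, text_feat, "sum")
fused_prod = elementwise_fusion(image_feat, text_feat, "product")
```

Under these assumptions, concatenation keeps every input dimension and so grows the fused representation, while element-wise operations keep the dimension fixed but require the modalities to be projected to a common size first.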