“…Usually, the final classification decision is a weighted mixture of the decoding results of both modalities (Sarasola-Sanz et al., 2017; Spüler et al., 2018; Tryon et al., 2019). Indeed, aggregating multiple decoding outputs is a common machine learning technique for fusing the information from different decoders into a single decision output (Fumanal-Idocin et al., 2021a, b, 2022).…”
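
As an illustration of the weighted mixture described in the excerpt, the following minimal sketch fuses the per-class probabilities of two decoders with a convex combination and then picks the highest-scoring class. It is not taken from any of the cited works: the function names `fuse_weighted` and `decide`, the parameter `weight_a`, and the example probabilities are all hypothetical, and the sketch assumes that each decoder outputs a probability vector over the same set of classes.

```python
from typing import List, Sequence


def fuse_weighted(
    probs_a: Sequence[float],
    probs_b: Sequence[float],
    weight_a: float = 0.5,
) -> List[float]:
    """Convex combination of two per-class probability vectors.

    weight_a is the (assumed) trust placed in decoder A; decoder B
    receives 1 - weight_a, so the fused scores still sum to 1 when
    both inputs do.
    """
    if len(probs_a) != len(probs_b):
        raise ValueError("both decoders must score the same set of classes")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    weight_b = 1.0 - weight_a
    return [weight_a * pa + weight_b * pb for pa, pb in zip(probs_a, probs_b)]


def decide(fused: Sequence[float]) -> int:
    """Return the index of the class with the highest fused score."""
    return max(range(len(fused)), key=lambda i: fused[i])


# Hypothetical outputs of two decoders over three classes (illustrative only).
probs_modality_a = [0.6, 0.3, 0.1]
probs_modality_b = [0.2, 0.7, 0.1]

# Trusting modality A more (weight 0.7) lets its preferred class win.
fused = fuse_weighted(probs_modality_a, probs_modality_b, weight_a=0.7)
label = decide(fused)
```

In this sketch the two decoders disagree, so the weight determines the outcome: with `weight_a=0.7` the fused decision follows decoder A, whereas a weight below 0.5 would hand the decision to decoder B. How such a weight is chosen in practice (fixed, tuned on validation data, or learned) is left open here.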