Improving Multimodal fusion via Mutual Dependency Maximisation

Colombo, Pierre; Chapuis, Emile; Labeau, Matthieu; Clavel, Chloé

doi:10.18653/v1/2021.emnlp-main.21

Cited by 13 publications

(5 citation statements)

References 46 publications

“…Among available contrast measures, the Fisher-Rao distance is parameter-free and thus, it is easy to use in practice while the AB-Divergence achieves better results but requires to select α and β. Future work includes extending our metrics to new tasks such as SLU (Chapuis et al. 2020, 2021; Dinkar et al. 2020; Colombo, Clavel, and Piantanida 2021), controlled sentence generation (Colombo et al. 2019, 2021b) and multi-modal learning (Colombo et al. 2021a; Garcia et al. 2019).…”

Section: Summary and Concluding Remarks (mentioning)

confidence: 99%

InfoLM: A New Metric to Evaluate Summarization & Data2Text Generation

Colombo

Clavel

Piantanida

2022

AAAI

Self Cite

Assessing the quality of natural language generation (NLG) systems through human annotation is very expensive. Additionally, human annotation campaigns are time-consuming and include non-reusable human labour. In practice, researchers rely on automatic metrics as a proxy of quality. In the last decade, many string-based metrics (e.g., BLEU or ROUGE) have been introduced. However, such metrics usually rely on exact matches and thus, do not robustly handle synonyms. In this paper, we introduce InfoLM, a family of untrained metrics that can be viewed as a string-based metric that addresses the aforementioned flaws thanks to a pre-trained masked language model. This family of metrics also makes use of information measures, allowing InfoLM to be adapted to different evaluation criteria. Using direct assessment, we demonstrate that InfoLM achieves statistically significant improvement and two figure correlation gains in many configurations compared to existing metrics on both summarization and data2text generation tasks.

“…• Recently Colombo et al. (2021) conducted experiments introducing an information regularizer on existing architectures. The main differences between our method and their method are a) our method focuses on synergy terms whereas their proposal is optimizing joint mutual information between different unimodal representations; and b) they experiment with variational measures of information.…”

Section: Models (mentioning)

confidence: 99%

Multimodal fusion via cortical network inspired losses

Shankar

2022

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Information integration from different modalities is an active area of research. Human beings and, in general, biological neural systems are quite adept at using a multitude of signals from different sensory perceptive fields to interact with the environment and each other. Recent work in deep fusion models via neural networks has led to substantial improvements over unimodal approaches in areas like speech recognition, emotion recognition and analysis, captioning and image description. However, such research has mostly focused on architectural changes allowing for fusion of different modalities while keeping the model complexity manageable. Inspired by neuroscientific ideas about multisensory integration and processing, we investigate the effect of introducing neural dependencies in the loss functions. Experiments on multimodal sentiment analysis tasks with different models show that our approach provides a consistent performance boost.

“…Multi-modal fusion, which integrates information from multiple modalities into a compact and informative representation, poses a significant challenge as it requires effectively correlating the semantics of diverse modalities. In recent years, several approaches have been developed to learn the joint embeddings of multiple modalities [1,2]. However, each modality exhibits distinct representations and statistical features, making it difficult to capture complex intermodal correlations.…”

Section: Introduction (mentioning)

confidence: 99%

Multi-Modal Representation via Contrastive Learning with Attention Bottleneck Fusion and Attentive Statistics Features

Guo, Liao, et al.

2023

Entropy

The integration of information from multiple modalities is a highly active area of research. Previous techniques have predominantly focused on fusing shallow features or high-level representations generated by deep unimodal networks, which only capture a subset of the hierarchical relationships across modalities. However, previous methods are often limited to exploiting the fine-grained statistical features inherent in multimodal data. This paper proposes an approach that densely integrates representations by computing image features’ means and standard deviations. The global statistics of features afford a holistic perspective, capturing the overarching distribution and trends inherent in the data, thereby facilitating enhanced comprehension and characterization of multimodal data. We also leverage a Transformer-based fusion encoder to effectively capture global variations in multimodal features. To further enhance the learning process, we incorporate a contrastive loss function that encourages the discovery of shared information across different modalities. To validate the effectiveness of our approach, we conduct experiments on three widely used multimodal sentiment analysis datasets. The results demonstrate the efficacy of our proposed method, achieving significant performance improvements compared to existing approaches.
