Odor source localization (OSL) technology allows autonomous agents such as mobile robots to locate a target odor source in an unknown environment. This is achieved by an OSL navigation algorithm that processes an agent’s sensor readings to compute action commands that guide the robot toward the odor source. Unlike traditional ‘olfaction-only’ OSL algorithms, our proposed algorithm integrates vision and olfaction sensor modalities to localize odor sources even when olfaction sensing is disrupted by non-unidirectional airflow or vision sensing is impaired by environmental complexity. The algorithm leverages the zero-shot multi-modal reasoning capabilities of large language models (LLMs), eliminating the need for manual knowledge encoding or custom-trained supervised learning models. A key feature of the proposed algorithm is the ‘High-level Reasoning’ module, which encodes the olfaction and vision sensor data into a multi-modal prompt and instructs the LLM to employ a hierarchical reasoning process to select an appropriate high-level navigation behavior. Subsequently, the ‘Low-level Action’ module translates the selected high-level navigation behavior into low-level action commands that can be executed by the mobile robot. To validate our algorithm, we implemented it on a mobile robot and evaluated it in a real-world environment with non-unidirectional airflow and obstacles, mimicking a complex, practical search environment. We compared the performance of our proposed algorithm against single-sensory-modality-based ‘olfaction-only’ and ‘vision-only’ navigation algorithms, as well as a supervised learning-based ‘vision and olfaction fusion’ (Fusion) navigation algorithm. The experimental results show that the proposed LLM-based algorithm outperformed the other algorithms in terms of success rates and average search times in both unidirectional and non-unidirectional airflow environments.
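To illustrate the two-module structure described above, the following is a minimal sketch of how a High-level Reasoning step and a Low-level Action step could be wired together. It is not the paper's implementation: the behavior set, the sensor fields, and the `llm_query` interface are all assumptions introduced here for illustration.

```python
# Hypothetical sketch of the High-level Reasoning / Low-level Action pipeline.
# Behavior names, sensor fields, and the llm_query callable are assumed, not
# taken from the paper's actual implementation.
from dataclasses import dataclass

BEHAVIORS = ["surge", "cast", "obstacle-avoid", "stop"]  # assumed behavior set


@dataclass
class Observation:
    chemical_ppm: float        # olfaction reading from the chemical sensor
    wind_direction_deg: float  # airflow direction from the anemometer
    camera_frame: bytes        # encoded image from the onboard camera


def build_prompt(obs: Observation) -> str:
    """High-level Reasoning: encode sensor data into a multi-modal prompt."""
    return (
        "You are guiding a mobile robot to an odor source.\n"
        f"Chemical concentration: {obs.chemical_ppm:.2f} ppm\n"
        f"Wind direction: {obs.wind_direction_deg:.1f} deg\n"
        "The attached image shows the robot's current view.\n"
        f"Reason step by step, then choose ONE behavior from {BEHAVIORS}."
    )


def select_behavior(obs: Observation, llm_query) -> str:
    """Query the LLM and parse its free-form reply into a known behavior."""
    reply = llm_query(prompt=build_prompt(obs), image=obs.camera_frame)
    for behavior in BEHAVIORS:
        if behavior in reply.lower():
            return behavior
    return "stop"  # conservative fallback if the reply is unparseable


def low_level_action(behavior: str) -> tuple[float, float]:
    """Low-level Action: map a behavior to (linear m/s, angular rad/s)."""
    table = {
        "surge": (0.3, 0.0),           # drive upwind toward the plume
        "cast": (0.1, 0.5),            # sweep crosswind to reacquire the plume
        "obstacle-avoid": (0.0, 0.8),  # turn away from the detected obstacle
        "stop": (0.0, 0.0),
    }
    return table[behavior]
```

Constraining the parsed output to a small, closed behavior set keeps the LLM's free-form reasoning safe to execute on hardware, since every reply ultimately maps to a predefined velocity command.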