Odor source localization (OSL) technology allows autonomous agents such as mobile robots to locate a target odor source in an unknown environment. This is achieved by an OSL navigation algorithm that processes an agent’s sensor readings to compute action commands that guide the robot toward the odor source. Unlike traditional ‘olfaction-only’ OSL algorithms, our proposed algorithm integrates vision and olfaction sensor modalities to localize odor sources even when olfaction sensing is disrupted by non-unidirectional airflow or vision sensing is impaired by environmental complexity. The algorithm leverages the zero-shot multi-modal reasoning capabilities of large language models (LLMs), eliminating the need for manual knowledge encoding or custom-trained supervised learning models. A key feature of the proposed algorithm is the ‘High-level Reasoning’ module, which encodes the olfaction and vision sensor data into a multi-modal prompt and instructs the LLM to employ a hierarchical reasoning process to select an appropriate high-level navigation behavior. Subsequently, the ‘Low-level Action’ module translates the selected high-level navigation behavior into low-level action commands that can be executed by the mobile robot. To validate our algorithm, we implemented it on a mobile robot and evaluated it in a real-world environment with non-unidirectional airflow and obstacles, mimicking a complex, practical search environment. We compared the performance of our proposed algorithm against single-sensory-modality-based ‘olfaction-only’ and ‘vision-only’ navigation algorithms, as well as a supervised learning-based ‘vision and olfaction fusion’ (Fusion) navigation algorithm. The experimental results show that the proposed LLM-based algorithm outperformed the other algorithms in terms of success rates and average search times in both unidirectional and non-unidirectional airflow environments.
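To illustrate the two-module structure described above, the following is a minimal sketch of how a High-level Reasoning step and a Low-level Action step could be wired together. It is not the paper's implementation: the behavior set, the sensor fields, and the `llm_query` interface are all assumptions introduced here for illustration.

```python
# Hypothetical sketch of the High-level Reasoning / Low-level Action pipeline.
# Behavior names, sensor fields, and the llm_query callable are assumed, not
# taken from the paper's actual implementation.
from dataclasses import dataclass

BEHAVIORS = ["surge", "cast", "obstacle-avoid", "stop"]  # assumed behavior set


@dataclass
class Observation:
    chemical_ppm: float        # olfaction reading from the chemical sensor
    wind_direction_deg: float  # airflow direction from the anemometer
    camera_frame: bytes        # encoded image from the onboard camera


def build_prompt(obs: Observation) -> str:
    """High-level Reasoning: encode sensor data into a multi-modal prompt."""
    return (
        "You are guiding a mobile robot to an odor source.\n"
        f"Chemical concentration: {obs.chemical_ppm:.2f} ppm\n"
        f"Wind direction: {obs.wind_direction_deg:.1f} deg\n"
        "The attached image shows the robot's current view.\n"
        f"Reason step by step, then choose ONE behavior from {BEHAVIORS}."
    )


def select_behavior(obs: Observation, llm_query) -> str:
    """Query the LLM and parse its free-form reply into a known behavior."""
    reply = llm_query(prompt=build_prompt(obs), image=obs.camera_frame)
    for behavior in BEHAVIORS:
        if behavior in reply.lower():
            return behavior
    return "stop"  # conservative fallback if the reply is unparseable


def low_level_action(behavior: str) -> tuple[float, float]:
    """Low-level Action: map a behavior to (linear m/s, angular rad/s)."""
    table = {
        "surge": (0.3, 0.0),           # drive upwind toward the plume
        "cast": (0.1, 0.5),            # sweep crosswind to reacquire the plume
        "obstacle-avoid": (0.0, 0.8),  # turn away from the detected obstacle
        "stop": (0.0, 0.0),
    }
    return table[behavior]
```

Constraining the parsed output to a small, closed behavior set keeps the LLM's free-form reasoning safe to execute on hardware, since every reply ultimately maps to a predefined velocity command.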