Composite robots often encounter difficulties due to changes in illumination, external disturbances, reflective surface effects, and cumulative errors. These challenges significantly hinder their capabilities in environmental perception and the accuracy and reliability of pose estimation. We propose a nonlinear optimization approach to overcome these issues to develop an integrated localization and navigation framework, IIVL-LM (IMU, Infrared, Vision, and LiDAR Fusion for Localization and Mapping). This framework achieves tightly coupled integration at the data level using inputs from an IMU (Inertial Measurement Unit), an infrared camera, an RGB (Red, Green and Blue) camera, and LiDAR. We propose a real-time luminance calculation model and verify its conversion accuracy. Additionally, we designed a fast approximation method for the nonlinear weighted fusion of features from infrared and RGB frames based on luminance values. Finally, we optimize the VIO (Visual-Inertial Odometry) module in the R3LIVE++ (Robust, Real-time, Radiance Reconstruction with LiDAR-Inertial-Visual state Estimation) framework based on the infrared camera’s capability to acquire depth information. In a controlled study, using a simulated indoor rescue scenario dataset, the IIVL-LM system demonstrated significant performance enhancements in challenging luminance conditions, particularly in low-light environments. Specifically, the average RMSE ATE (Root Mean Square Error of absolute trajectory Error) improved by 23% to 39%, with reductions from 0.006 to 0.013. At the same time, we conducted comparative experiments using the publicly available TUM-VI (Technical University of Munich Visual-Inertial Dataset) without the infrared image input. It was found that no leading results were achieved, which verifies the importance of infrared image fusion. By maintaining the active engagement of at least three sensors at all times, the IIVL-LM system significantly boosts its robustness in both unknown and expansive environments while ensuring high precision. This enhancement is particularly critical for applications in complex environments, such as indoor rescue operations.