Simultaneous localization and mapping (SLAM) is a key technology for the autonomous navigation of unmanned mobile robots, but it faces many challenges in practical applications. These include texture-poor scenes, degraded sensor performance, and interference from moving objects in dynamic outdoor environments, all of which degrade the mapping system. To address these issues, this paper proposes a framework that fuses lidar, camera, and inertial data and removes dynamic objects from the map. The system consists of three sub-modules: the Lidar-Inertial Module (LIM), the Visual-Inertial Module (VIM), and the Dynamic-Object-Removing Module (DORM). LIM and VIM assist each other: lidar point clouds provide three-dimensional geometry for the global voxel map, and the camera provides pixel-level color information. In parallel, DORM detects dynamic objects and removes them from the global map. The system constructs a multi-sensor factor graph from the state and observation models and obtains the optimal solution by least squares. Furthermore, triangle descriptors and bundle adjustment are used for loop closure detection to reduce accumulated error and maintain global consistency. Experimental results demonstrate that the system achieves reliable state estimation, dynamic object removal, and scene reconstruction in a variety of complex scenarios.
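
To make the factor-graph formulation concrete, the following is a minimal sketch of how a factor graph reduces to a least-squares problem. It is not the paper's implementation: it assumes a toy one-dimensional, linear pose graph with a prior, three odometry-like factors, and one loop-closure-like factor, whereas the proposed system optimizes nonlinear lidar, visual, and inertial factors. The function name `solve_pose_graph`, the factor tuple layout, and all measurement and noise values are hypothetical and chosen only for illustration.

```python
import numpy as np


def solve_pose_graph(num_poses, factors):
    """Solve a 1-D linear factor graph by weighted least squares.

    Each factor is a tuple (i, j, z, sigma) stating that x[j] - x[i]
    should equal the measurement z with standard deviation sigma.
    Setting i to None turns the factor into a prior on x[j].
    (Hypothetical layout, used only for this illustration.)
    """
    rows, rhs = [], []
    for i, j, z, sigma in factors:
        row = np.zeros(num_poses)
        row[j] = 1.0
        if i is not None:
            row[i] = -1.0
        # Whitening: divide by sigma so that every residual has unit variance.
        rows.append(row / sigma)
        rhs.append(z / sigma)
    A = np.vstack(rows)
    b = np.array(rhs)
    # Minimize ||A x - b||^2 over all poses jointly.
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x


# Illustrative values: odometry overestimates each step (1.1 instead of 1.0),
# and a more precise loop-closure-like factor says the total travel is 3.0.
factors = [
    (None, 0, 0.0, 0.01),  # prior anchoring the first pose at the origin
    (0, 1, 1.1, 0.1),      # odometry-like factors between consecutive poses
    (1, 2, 1.1, 0.1),
    (2, 3, 1.1, 0.1),
    (0, 3, 3.0, 0.05),     # loop-closure-like factor spanning the trajectory
]
estimate = solve_pose_graph(4, factors)
print(estimate)
```

In this toy problem the loop-closure-like factor pulls the final pose from the raw odometry sum of 3.3 toward 3.0 and spreads the correction evenly over the three steps, which is the same mechanism by which loop closure reduces accumulated drift in a full SLAM back end.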