Traditional visual simultaneous localization and mapping (SLAM) systems mostly assume small-scale static environments. In recent years, several studies have combined semantic information with visual SLAM, but most of them struggle to perform well in large-scale dynamic environments, and the accuracy and speed of such systems still need improvement. In this paper, we develop a more efficient semantic SLAM system for a two-wheeled mobile robot that uses semantic segmentation to recognize people, chairs, and other objects in every keyframe. With this preliminary understanding of the environment, the system fuses RGB-D camera and wheel-encoder information to localize the robot and build a dense colored octree map free of dynamic objects. In addition, to compensate for the incomplete identification of movable objects, we apply image processing algorithms to refine the semantic segmentation results. Restricting the enhanced semantic segmentation to keyframes dramatically increases the efficiency of the system, and fusing the different sensors substantially improves localization accuracy. We conducted experiments on several datasets and in real environments, comparing the proposed approach with DRE-SLAM and DS-SLAM. The results show that our system significantly improves processing efficiency, robustness, and map quality.
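The core idea of excluding dynamic objects before map construction can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class IDs, the zero-depth convention for invalid pixels, and the function name are all assumptions.

```python
import numpy as np

# Assumed class IDs for movable objects (e.g. person, chair) produced by a
# semantic segmentation network; the real system's label set may differ.
DYNAMIC_CLASSES = {1, 2}

def mask_dynamic_pixels(depth, seg, dynamic_classes=DYNAMIC_CLASSES):
    """Invalidate depth readings that fall on movable objects so those
    pixels are skipped when the keyframe is fused into the octree map.
    Here depth 0.0 is treated as 'no measurement'."""
    masked = depth.copy()
    masked[np.isin(seg, list(dynamic_classes))] = 0.0
    return masked

# Toy 2x3 keyframe: per-pixel segmentation labels and depth in meters.
seg = np.array([[0, 1, 0],
                [2, 0, 0]])
depth = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])
clean = mask_dynamic_pixels(depth, seg)
# Pixels labeled as dynamic objects now carry invalid (zero) depth,
# while static background pixels keep their measurements.
```

Only the remaining valid pixels would then be back-projected and inserted into the dense colored octree map, so moving people and displaced chairs leave no trace in the final map.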