Simultaneous localization and mapping (SLAM) has emerged as a critical technology enabling robots to navigate unknown environments, drawing extensive attention within the robotics research community. However, traditional visual SLAM ignores dynamic objects in indoor scenes: point features on moving objects cause incorrect data association, making it difficult for traditional visual SLAM to accurately estimate the camera pose when objects in the scene are moving. Moreover, point features alone cannot fully capture the geometric information of dynamic indoor scenes, which reduces the system's robustness. To address these problems, we develop an RGB-D SLAM system called DIG-SLAM. First, object contour regions are extracted with the YOLOv7 instance segmentation method, serving as a prerequisite for identifying dynamic objects and constructing a semantic map. Meanwhile, line features are extracted with the line segment detector (LSD) algorithm, and redundant line features are pruned via K-means clustering. Second, a moving consistency check combined with the instance segmentation determines the dynamic regions, and the point and line features within those regions are removed. Finally, the remaining static line and point features are jointly used to optimize the camera pose, and a static semantic octree map is built to provide robots and autonomous systems with richer, higher-level scene understanding and perception. Experimental results on the Technische Universität München (TUM) dataset show that the average absolute trajectory error of DIG-SLAM is 28.68% lower than that of dynamic semantic SLAM (DS-SLAM). Compared with other dynamic SLAM methods, the proposed system achieves more accurate camera pose estimation and greater robustness in dynamic indoor environments, as well as better map building in real indoor scenes.
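To make the line-feature step concrete, the following is a minimal sketch (not the authors' code) of LSD extraction followed by K-means pruning of redundant segments, as the abstract describes. The clustering feature vector (midpoint plus a doubled-angle encoding), the number of clusters `k`, and the rule of keeping the longest segment per cluster are all illustrative assumptions, not details taken from the paper.

```python
# Sketch of LSD line extraction + K-means redundancy pruning.
# Assumptions: OpenCV >= 4.5.1 (LSD restored), scikit-learn available.
import cv2
import numpy as np
from sklearn.cluster import KMeans

def extract_and_prune_lines(gray: np.ndarray, k: int = 30) -> np.ndarray:
    """Detect LSD segments in an 8-bit grayscale image, then keep one
    representative (the longest) segment per K-means cluster."""
    lsd = cv2.createLineSegmentDetector()
    lines = lsd.detect(gray)[0]            # (N, 1, 4) float32: x1, y1, x2, y2
    if lines is None or len(lines) == 0:
        return np.empty((0, 4))
    lines = lines.reshape(-1, 4)

    # Per-segment feature: midpoint plus orientation. The 2*theta encoding
    # removes the 180-degree direction ambiguity of undirected line segments;
    # the 50.0 weight balancing position vs. angle is an assumed choice.
    mid = (lines[:, :2] + lines[:, 2:]) / 2.0
    ang = np.arctan2(lines[:, 3] - lines[:, 1], lines[:, 2] - lines[:, 0])
    feats = np.hstack([mid, 50.0 * np.c_[np.cos(2 * ang), np.sin(2 * ang)]])

    k = min(k, len(lines))
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(feats)

    # Within each cluster of near-duplicate segments, keep only the longest.
    length = np.hypot(lines[:, 2] - lines[:, 0], lines[:, 3] - lines[:, 1])
    keep = [np.flatnonzero(labels == c)[np.argmax(length[labels == c])]
            for c in range(k)]
    return lines[keep]
```

Clustering on midpoint and doubled angle groups short, collinear fragments that LSD tends to produce along a single physical edge, so pruning to one segment per cluster reduces redundant line features without discarding scene structure.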