At present, low-cost Red Green Blue Depth (RGB-D) sensors are mainly used in indoor robot environment perception, but the depth information obtained by RGB-D cameras has problems such as poor accuracy and high noise, and the generated 3D color point cloud map has low accuracy. In order to solve these problems, this article proposes a vision sensor-based point cloud map generation algorithm for robot indoor navigation. The aim is to obtain a more accurate point cloud map through visual SLAM and Kalman filtering visual-inertial navigation attitude fusion algorithm. The results show that in the positioning speed test data of the fusion algorithm in this study, the average time-consuming of camera tracking is 23.4 ms, which can meet the processing speed requirement of 42 frames per second. The yaw angle error of the fusion algorithm is the smallest, and the ATE test values of the algorithm are smaller than those of the Inertial measurement unit and Simultaneous-Localization-and-Mapping algorithms. This research algorithm can make the mapping process more stable and robust. It can use visual sensors to make more accurate route planning, and this algorithm improves the indoor positioning accuracy of the robot. In addition, the research algorithm can also obtain a dense point cloud map in real time, which provides a more comprehensive idea for the research of robot indoor navigation point cloud map generation.