Visual simultaneous localization and mapping (vSLAM) is considered a fundamental technology for augmented reality and intelligent mobile robots. However, most vSLAM systems assume a rigid scene, which limits their use in populated real-world environments. Recently, with the widespread adoption of artificial neural networks, many solutions have tried to eliminate the influence of dynamic objects using semantic information provided by object detection or semantic segmentation. Mask R-CNN is popular in many such applications, but it is usually slow and limits the speed of vSLAM when tracking must wait for segmentation results before estimating camera ego-motion. We previously introduced a real-time vSLAM system, RDS-SLAM, which decouples tracking from semantic segmentation by adding a semantic thread and a moving-probability estimate. However, Mask R-CNN supplies only a small amount of semantic information in that design, because only a few keyframes can be segmented within a short time. Therefore, in this study, we propose a novel vSLAM system, RDMO-SLAM, which leverages more semantic information while preserving real-time performance by predicting semantic labels with dense optical flow. In addition, we estimate the velocity of each landmark and use these velocities as constraints to reduce the influence of dynamic objects during tracking. Experiments on dynamic sequences compare the proposed method with comparable state-of-the-art approaches. The proposed method improves real-time performance from 15 Hz (RDS-SLAM) to 30 Hz while maintaining robust tracking in dynamic scenes.
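To illustrate the label-prediction idea, the sketch below warps the semantic mask of a previously segmented keyframe into a new frame along dense optical flow, so the new frame inherits labels without waiting for the segmentation network. This is a minimal approximation rather than the authors' implementation: it uses OpenCV's Farneback flow as a stand-in for the dense optical flow employed in RDMO-SLAM, and the function name `predict_mask` is illustrative.

```python
import cv2
import numpy as np

def predict_mask(curr_gray, prev_gray, prev_mask):
    """Predict the semantic mask of the current frame by warping the
    mask of a previously segmented keyframe along dense optical flow."""
    # Backward flow: for each pixel in the current frame, estimate where
    # it was located in the previous (already segmented) frame.
    flow = cv2.calcOpticalFlowFarneback(
        curr_gray, prev_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    h, w = curr_gray.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    # Nearest-neighbor sampling keeps the discrete label values intact.
    return cv2.remap(prev_mask, map_x, map_y,
                     interpolation=cv2.INTER_NEAREST)
```

Because flow-based prediction is much cheaper than running Mask R-CNN, many more frames can carry semantic labels between the sparsely segmented keyframes.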