Simultaneous Localization and Mapping (SLAM) has been widely applied in computer vision and robotics. In dynamic environments, which are very common in the real world, traditional visual SLAM systems suffer a significant drop in localization and mapping accuracy because of the static-world assumption. Recently, semantic visual SLAM systems for dynamic scenes have attracted increasing attention; they use the semantic information of images to help remove dynamic feature points. Existing semantic visual SLAM systems commonly detect dynamic feature points using a semantic prior, a geometric constraint, or a combination of the two, and then remove the map points corresponding to those dynamic feature points. In the visual SLAM framework, pose estimation ultimately revolves around the 3D map points, so the key to improving the accuracy of a visual SLAM system is to build a more accurate and reliable map. Existing semantic visual SLAM systems thus acquire reliable map points only indirectly, which brings several drawbacks. In this paper, we present SDF-SLAM: Semantic Depth Filter SLAM, a semantic visual SLAM system for dynamic environments that uses a depth filter to directly judge whether a 3D map point is dynamic. First, semantic information is integrated into the original purely geometric SLAM system via a semantic optical flow method to perform reliable map initialization. Second, a semantic depth filter following a Gaussian-Uniform mixture distribution is designed to model the inverse depth of each map point. Third, the inverse depth of each 3D map point is updated within a Bayesian estimation framework, and each map point is classified as active or inactive. Finally, only the active map points are used to achieve robust camera pose tracking. Experiments on the TUM dataset demonstrate that our approach outperforms the original ORB-SLAM2 and other state-of-the-art semantic SLAM systems.
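
To make the depth-filter step concrete, the sketch below implements the standard Gaussian-Uniform mixture update over inverse depth (the Vogiatzis-Hernández parametrization popularized by SVO), which matches the mixture model named above. The abstract does not give the paper's exact equations or its semantic weighting, so the `DepthSeed` state, the `update_seed` formulas, the `is_active` rule, and all thresholds are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a Gaussian-Uniform depth filter over inverse depth,
# assuming the Vogiatzis-Hernandez parametrization used in SVO.
# All names and thresholds are illustrative, not the paper's code.
import math

class DepthSeed:
    """Per-map-point filter state: N(mu, sigma2) over inverse depth,
    Beta(a, b) over the probability that observations are inliers."""
    def __init__(self, mu, sigma2, z_range):
        self.mu = mu            # mean of the inverse depth
        self.sigma2 = sigma2    # variance of the inverse depth
        self.a = 10.0           # Beta inlier pseudo-count (prior)
        self.b = 10.0           # Beta outlier pseudo-count (prior)
        self.z_range = z_range  # support of the uniform outlier component

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def update_seed(seed, x, tau2):
    """Fuse one inverse-depth measurement x (variance tau2) into the seed.

    Measurement model: p(x | z, pi) = pi * N(x | z, tau2) + (1 - pi) * U(0, z_range).
    The posterior is approximated by moment matching with Beta(a, b) * N(mu, sigma2).
    """
    norm_scale = math.sqrt(seed.sigma2 + tau2)
    # Gaussian product for the inlier hypothesis
    s2 = 1.0 / (1.0 / seed.sigma2 + 1.0 / tau2)
    m = s2 * (seed.mu / seed.sigma2 + x / tau2)
    # Responsibilities of the inlier / outlier hypotheses
    c1 = seed.a / (seed.a + seed.b) * normal_pdf(x, seed.mu, norm_scale)
    c2 = seed.b / (seed.a + seed.b) / seed.z_range
    c1, c2 = c1 / (c1 + c2), c2 / (c1 + c2)
    # First and second moments of the inlier ratio under the new posterior
    f = (c1 * (seed.a + 1.0) + c2 * seed.a) / (seed.a + seed.b + 1.0)
    e = c1 * (seed.a + 1.0) * (seed.a + 2.0) / ((seed.a + seed.b + 1.0) * (seed.a + seed.b + 2.0)) \
      + c2 * seed.a * (seed.a + 1.0) / ((seed.a + seed.b + 1.0) * (seed.a + seed.b + 2.0))
    # Moment-matched Gaussian over inverse depth
    mu_new = c1 * m + c2 * seed.mu
    seed.sigma2 = c1 * (s2 + m * m) + c2 * (seed.sigma2 + seed.mu ** 2) - mu_new ** 2
    seed.mu = mu_new
    # Moment-matched Beta over the inlier ratio
    seed.a = (e - f) / (f - e / f)
    seed.b = seed.a * (1.0 - f) / f

def is_active(seed, inlier_ratio_thresh=0.6):
    """Illustrative active/inactive split: keep a point for tracking only if
    its expected inlier ratio is high, i.e. its depth is consistently
    re-observed (static); otherwise mark it inactive (likely dynamic)."""
    return seed.a / (seed.a + seed.b) > inlier_ratio_thresh

if __name__ == "__main__":
    seed = DepthSeed(mu=0.5, sigma2=1.0, z_range=1.0)
    for x in (0.52, 0.49, 0.51, 0.90):  # three consistent observations, one outlier
        update_seed(seed, x, tau2=0.01)
    print(f"inverse depth: {seed.mu:.3f}, active: {is_active(seed)}")
```

In this sketch, a dynamic object violates the static-world assumption, so its repeated inverse-depth observations are mutually inconsistent: the outlier evidence `b` grows, the expected inlier ratio falls, and the point is demoted to inactive, which realizes the direct, map-level rejection of dynamic points described in the abstract.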