Accurate smartphone-based outdoor localization in deep urban canyons is increasingly needed for various IoT applications. As smart cities have developed, building information modeling (BIM) has become widely available. This article, for the first time, presents a semantic Visual Positioning System (VPS) for accurate and robust position estimation in urban canyons, where the global navigation satellite system (GNSS) tends to fail. In the offline stage, a material-segmented BIM is used to generate segmented images. In the online stage, an image that provides textual information about the surrounding environment is taken with a smartphone camera, and computer vision algorithms segment it into the material classes it contains. A semantic VPS method then matches the segmented generated images against the segmented smartphone image. Each generated image carries pose information in terms of latitude, longitude, altitude, yaw, pitch, and roll. The pose of the candidate with the maximum likelihood is taken as the position estimate for the user. The method achieved a positioning accuracy of 2.0 m among high-rise buildings on a street, 5.5 m in a dense foliage environment, and 15.7 m in an alleyway. This represents a 45% improvement in positioning accuracy over the current state-of-the-art method. The yaw estimate achieved an accuracy of 2.3°, an eight-fold improvement over the smartphone IMU.
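
The matching step can be illustrated with a minimal sketch, which is not the implementation used in this article. It assumes that each segmented image is a small grid of integer material class labels and that the fraction of pixels with identical labels can stand in for the likelihood; the actual likelihood model is not specified here. The names `Candidate`, `pixel_agreement`, and `best_candidate`, as well as the poses and label grids, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# A segmented image: 2-D grid of integer material class ids (assumed encoding).
LabelMap = Sequence[Sequence[int]]


@dataclass
class Candidate:
    # Pose attached to a generated image:
    # (latitude, longitude, altitude, yaw, pitch, roll)
    pose: Tuple[float, float, float, float, float, float]
    labels: LabelMap


def pixel_agreement(a: LabelMap, b: LabelMap) -> float:
    """Fraction of pixels whose material class matches.

    Assumption: this simple score stands in for the likelihood; it is
    not the similarity measure defined in the article.
    """
    total = 0
    same = 0
    for row_a, row_b in zip(a, b):
        for class_a, class_b in zip(row_a, row_b):
            total += 1
            same += int(class_a == class_b)
    return same / total if total else 0.0


def best_candidate(query: LabelMap, candidates: List[Candidate]) -> Candidate:
    """Return the candidate whose segmented image best matches the query."""
    if not candidates:
        raise ValueError("at least one candidate image is required")
    return max(candidates, key=lambda c: pixel_agreement(query, c.labels))


# Toy data: hypothetical poses and 2x3 label grids, for illustration only.
query = [[0, 0, 1],
         [2, 2, 1]]
candidates = [
    Candidate((22.3000, 114.1790, 5.0, 90.0, 0.0, 0.0), [[0, 1, 1], [2, 1, 1]]),
    Candidate((22.3001, 114.1791, 5.0, 95.0, 0.0, 0.0), [[0, 0, 1], [2, 2, 1]]),
    Candidate((22.3002, 114.1792, 5.0, 80.0, 0.0, 0.0), [[1, 1, 1], [1, 1, 1]]),
]

best = best_candidate(query, candidates)
print(best.pose)  # pose of the highest-scoring candidate
```

In this sketch the estimate is simply the pose attached to the best-scoring generated image, so its resolution is bounded by how densely candidate poses are sampled in the offline stage.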