Abstract. Every new generation of smart devices introduces new sensors, such as depth cameras and ultra-wideband (UWB) sensors. Combined with the rapidly growing number of smart mobile devices, this has driven increasing interest in indoor positioning systems (IPS), which underpin numerous indoor location-based services (ILBS) and mobile applications at large. Wi-Fi Received Signal Strength (RSS) based fingerprinting positioning (WF) techniques are widely used in IPS owing to the widespread deployment of IEEE 802.11 WLAN (Wi-Fi) networks: the technique requires no line of sight to the access points (APs), and smart devices can readily measure Wi-Fi signals from 802.11 networks. However, WF techniques suffer from fingerprint variance, i.e., fluctuation of the sensed signal, and from the difficulty of updating the fingerprint map efficiently in a frequently changing environment. To address these problems, we propose a novel IPS framework that uses a particle filter to fuse WF with a state-of-the-art CNN-based visual localization method, so that the system adapts better to changing indoor environments. The proposed system was tested with real-world crowdsourced data collected by multiple devices in an office hallway. The experimental results demonstrate that the system achieves robust localization with a mean error (ME) of 0.3 to 1.5 m, and map updating with a 79% correction rate.