High-precision indoor localization in dense urban and indoor environments has attracted increasing attention. Previous studies have shown that single-modality indoor localization methods based on WiFi fingerprints, surveillance cameras, or Pedestrian Dead Reckoning (PDR) are limited by low accuracy, a restricted tracking region, and accumulated error, respectively. Some of these defects can be mitigated, but only with additional labor or in specially arranged scenes, and requiring extra information or imposing extra constraints on users is costly and rarely practical. In this paper, we present iWVP, a two-stage indoor localization system that integrates WiFi fingerprints, vision from surveillance cameras, and PDR. In the first stage, a coarse location is estimated from WiFi fingerprints; in the second stage, an accurate location is obtained by fusing data from the surveillance cameras and the IMU sensors. iWVP uses a matching algorithm based on motion sequences to confirm the identity of pedestrians, which improves output accuracy and avoids the drawbacks of each individual subsystem. Experimental results show that iWVP achieves an average position error of 4.61 cm and can effectively track pedestrians across multiple regions in complex and dynamic indoor environments.