Cloud-free remote sensing images are required for many applications, such as land cover classification, land surface temperature retrieval and agricultural drought monitoring. However, cloud cover in remote sensing images is pervasive, dynamic and often unavoidable. Current cloud removal techniques for the VNIR (visible and near-infrared) bands still suffer from a common problem: the pixel values estimated for the cloudy area are neither comparable nor consistent with those of the cloud-free region in the target image. In this paper, we propose an efficient approach that uses multi-temporal images to remove thick clouds and their shadows in the VNIR bands while maintaining the consistency of DN (digital number) values. The approach uses the spectral similarity between the target image and a reference image to estimate the DN values of the cloudy pixels. Each cloudy pixel is reconstructed from the 10 neighboring cloud-free pixel pairs with the highest similarity between the target and reference images, selected within a small window centered on the cloudy pixel. Four Landsat5 TM images around Nanjing city in Jiangsu Province, Eastern China, were used to validate the approach over four representative surface patterns (mountain, plain, water and city) and for cloud cover of various sizes. Comparison with conventional approaches indicates that the proposed approach removes clouds in the VNIR bands with high accuracy. The approach was also applied to the Landsat8 OLI (Operational Land Imager) image of the Nanjing area acquired on 29 April 2016, using two reference images. The resulting images show very good consistency, which confirms that the proposed approach can serve as an alternative for cloud removal in the VNIR bands using multi-temporal images.
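The reconstruction step described above can be illustrated with a minimal single-band sketch. It is not the paper's implementation, and it rests on several assumptions that the abstract does not specify: similarity is taken as the absolute DN difference in the reference image between a neighbor and the cloudy pixel's location, the reference image is cloud-free inside the window, and the estimate is the plain mean of the target DNs of the selected neighbors. The function name `fill_cloudy_pixel` and the parameters `half_window` and `k` are hypothetical.

```python
def fill_cloudy_pixel(target, reference, cloud_mask, row, col, half_window=2, k=10):
    """Estimate the target DN at a cloudy pixel from similar cloud-free neighbors.

    target, reference: 2D lists of DN values for one band (same shape).
    cloud_mask: 2D list of booleans, True where the target is cloudy.
    Assumption (not from the paper): similarity is the absolute DN difference
    in the reference image between a neighbor and the pixel at (row, col).
    """
    n_rows, n_cols = len(target), len(target[0])
    ref_center = reference[row][col]

    # Collect (dissimilarity, target DN) for every cloud-free pixel in the window.
    candidates = []
    for r in range(max(0, row - half_window), min(n_rows, row + half_window + 1)):
        for c in range(max(0, col - half_window), min(n_cols, col + half_window + 1)):
            if cloud_mask[r][c]:
                continue  # skip cloudy pixels, including the center itself
            diff = abs(reference[r][c] - ref_center)
            candidates.append((diff, target[r][c]))

    if not candidates:
        return None  # no cloud-free neighbor available in this window

    # Keep the k most similar neighbors (smallest reference difference).
    candidates.sort(key=lambda pair: pair[0])
    best = candidates[:k]

    # Assumption: unweighted mean of the selected target DNs.
    return sum(dn for _, dn in best) / len(best)
```

In practice the function would be called once per cloudy pixel and per band; a larger `half_window` is needed when a cloud covers more than the window, since the sketch returns `None` if no cloud-free neighbor is found.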