Depth-map merging approaches have become more and more popular in multi-view stereo (MVS) because of their flexibility and superior performance. The quality of depth map used for merging is vital for accurate 3D reconstruction. While traditional depth map estimation has been performed in a discrete manner, we suggest the use of a continuous counterpart. In this paper, we first integrate silhouette information and epipolar constraint into the variational method for continuous depth map estimation. Then, several depth candidates are generated based on a multiple starting scales (MSS) framework. From these candidates, refined depth maps for each view are synthesized according to path-based NCC (normalized cross correlation) metric. Finally, the multiview depth maps are merged to produce 3D models. Our algorithm excels at detail capture and produces one of the most accurate results among the current algorithms for sparse MVS datasets according to the Middlebury benchmark. Additionally, our approach shows its outstanding robustness and accuracy in free-viewpoint video scenario.