This paper proposes an unsupervised approach for building a deep-learning-based stereo matching method from single-view videos (SMV). From the videos, a set of corresponding points is computed between images, and image patches centered at the computed points are extracted. Negative and positive samples constitute a dataset used to train a similarity network, which then serves as the matching cost function. In addition, we propose a local-global matching cost network that uses the first feature maps (local features) together with the last feature maps (global features) as its output feature. The concatenated features are fed to fully connected layers, and the network outputs a similarity measure for an image patch pair as the matching cost. The computed matching costs are aggregated using semiglobal matching and cross-based cost aggregation, followed by sub-pixel interpolation, a left-right consistency check, and median and bilateral filtering. We evaluate the proposed stereo matching method on popular stereo matching datasets: KITTI 2012, KITTI 2015, and Middlebury. We submit the disparity maps to their benchmark servers to evaluate the performance of SMV. We also compare the generalization of SMV and the baseline methods using the training sets of the three datasets. The benchmark results show that SMV is the most accurate among the unsupervised approaches and that it even outperforms several supervised deep-learning-based stereo matching methods. The generalization results show that SMV is comparable to the baseline method, MC-CNN, which is trained with supervision.

INDEX TERMS Stereo matching, unsupervised learning, video extraction.
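
Two of the post-processing steps named in the abstract, sub-pixel interpolation and the left-right consistency check, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the common parabola fit through the costs at neighboring disparities and a simple per-scanline check with a hypothetical threshold `max_diff`; the function names are illustrative only.

```python
def subpixel_disparity(d, c_minus, c_0, c_plus):
    """Refine integer disparity d with a parabola fit (assumed variant).

    c_minus, c_0, c_plus are the matching costs at d-1, d, d+1.
    """
    denom = c_plus - 2.0 * c_0 + c_minus
    if denom <= 0:
        # Flat or non-convex cost curve: keep the integer disparity.
        return float(d)
    return d - (c_plus - c_minus) / (2.0 * denom)


def left_right_check(disp_left, disp_right, max_diff=1.0):
    """Flag pixels of one scanline whose left and right disparities agree.

    A left pixel x with disparity d is assumed to match right pixel x - d.
    max_diff is a hypothetical tolerance, not a value from the paper.
    """
    width = len(disp_left)
    valid = []
    for x, d in enumerate(disp_left):
        xr = int(round(x - d))
        ok = 0 <= xr < width and abs(d - disp_right[xr]) <= max_diff
        valid.append(ok)
    return valid
```

Pixels flagged as invalid would typically be treated as occlusions or mismatches and filled in from neighboring valid disparities before the median and bilateral filtering stages.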