For future 3D TV broadcasting systems and navigation applications, it is necessary to have accurate stereo matching which could precisely estimate depth map from two distanced cameras. In this paper, we first suggest a trinary cross color (TCC) census transform, which can help to achieve accurate disparity raw matching cost with low computational cost. The two-pass cost aggregation (TPCA) is formed to compute the aggregation cost, then the disparity map can be obtained by a range winner-take-all (RWTA) process and a white hole filling procedure. To further enhance the accuracy performance, a range left-right checking (RLRC) method is proposed to classify the results as correct, mismatched, or occluded pixels. Then, the image-based refinements for the mismatched and occluded pixels are proposed to refine the classified errors. Finally, the image-based cross voting and a median filter are employed to complete the fine depth estimation. Experimental results show that the proposed semi-global stereo matching system achieves considerably accurate disparity maps with reasonable computation cost.