CNN-based stereo matching methods achieve great performance but come with high computational requirements. Pruning a CNN can reduce the complexity but may in turn lead to memory conflicts, lowering throughput. Our proposed architecture and memory mapping technique aim at reducing conflicts to exploit extremely sparse stereo matching networks. To maintain a high utilization of processing elements, we decompose the de-convolution operation into several convolution operations. The proposed architecture provides a 2.1× speed up over SCNN. Compared to the software implementation, only 0.01% performance drop is observed, so that the proposed architecture obtains state-of-the-art accuracy compared to existing sparsity aware hardware implementations.