Quantization in lossy video compression can cause severe quality degradation, especially at low bit-rates. Developing post-processing methods that improve the visual quality of decoded frames is therefore of great importance, as such methods can be directly incorporated into any existing compression standard or paradigm. In this paper we propose a two-stage method for video compression artifact reduction: a texture-detail restoration stage followed by a deep convolutional neural network (CNN) fusion stage. The first stage operates in a patch-by-patch manner. For each patch in the current decoded frame, one prediction is formed under the sparsity prior, which assumes that natural image patches can be represented by a sparse combination of dictionary atoms. Exploiting temporal correlation, we then search for the best-matching patch in each reference frame and select the matches with richer texture details to tile motion-compensated predictions. The second stage stacks the predictions obtained in the first stage together with the decoded frame itself to form a tensor, and employs a deep CNN to learn the mapping from this tensor to the original uncompressed frame. Experimental results demonstrate that the proposed two-stage method remarkably improves, both subjectively and objectively, the quality of compressed video sequences.

INDEX TERMS Compression artifact reduction, convolutional neural networks, high efficiency video coding, sparse representation, temporal correlation.
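
To make the fusion stage concrete, the sketch below illustrates how the decoded frame and K stage-one predictions can be stacked along the channel axis and mapped to a restored frame by a small residual CNN. It is an illustration under stated assumptions, not the exact architecture used in the paper: grayscale frames, K = 4 predictions, and PyTorch are assumed, and the class name FusionCNN and the parameters num_predictions, width, and depth are hypothetical.

    import torch
    import torch.nn as nn

    class FusionCNN(nn.Module):
        """Illustrative fusion CNN (not the paper's exact architecture).

        Input:  (B, 1 + K, H, W) tensor -- the decoded frame stacked with
                K stage-one predictions along the channel axis.
        Output: (B, 1, H, W) tensor -- the restored frame.
        """
        def __init__(self, num_predictions=4, width=64, depth=8):
            super().__init__()
            in_ch = 1 + num_predictions
            layers = [nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU(inplace=True)]
            for _ in range(depth - 2):
                layers += [nn.Conv2d(width, width, 3, padding=1), nn.ReLU(inplace=True)]
            layers += [nn.Conv2d(width, 1, 3, padding=1)]
            self.body = nn.Sequential(*layers)

        def forward(self, stacked):
            decoded = stacked[:, :1]              # the decoded frame itself
            return decoded + self.body(stacked)   # learn a residual correction

    # Usage sketch: stack the decoded frame with K predictions, then restore.
    decoded = torch.rand(1, 1, 64, 64)                  # decoded (compressed) frame
    predictions = torch.rand(1, 4, 64, 64)              # K = 4 stage-one predictions
    stacked = torch.cat([decoded, predictions], dim=1)  # (B, 1 + K, H, W) tensor
    restored = FusionCNN(num_predictions=4)(stacked)    # (B, 1, H, W) restored frame

The residual formulation, in which the network predicts a correction added to the decoded frame rather than the frame itself, is one common design choice for restoration CNNs; the layer widths and depth here are placeholders.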