SUMMARY
The deblocking filter in High Efficiency Video Coding (HEVC) is computationally expensive because of its highly content-adaptive coding structure and the high-definition video it targets. Parallelizing it on massively parallel architectures such as graphics processing units (GPUs) has therefore become an urgent demand. However, its many conditional branches and data dependencies severely hinder efficient parallelization. In this paper, a novel GPU-based parallel optimization strategy is presented for concurrent deblocking in the HEVC/H.265 standard. First, a feature-vector-based normalization mechanism for the instruction stream is proposed that reduces conditional branches, which dramatically improves the efficiency of boundary strength computation. The same idea can also be applied to edge discrimination. Second, a parallel mechanism based on adaptive post-correction is presented to filter vertical and horizontal edges concurrently, which markedly improves processing speed with negligible quality loss. Experimental results show that the presented strategy outperforms the existing state-of-the-art method, with an acceleration factor of up to 32.
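
To make the branch-reduction idea concrete, the following is a minimal sketch of a branch-free boundary strength (Bs) computation. It is an illustration only, not the mechanism proposed in the paper, whose feature-vector layout is not given in this summary. The `BlockInfo` struct, its field names, and the `boundary_strength` function are hypothetical, and the sketch assumes the simplified case of one motion vector per block. It follows the standard HEVC rule: Bs is 2 if either block is intra-coded, 1 if a transform edge has non-zero coefficients on either side or the motion data differ, and 0 otherwise.

```cpp
#include <cstdlib>

// Hypothetical description of the block on one side of an edge.
// Assumption: a single motion vector per block (uni-prediction only).
struct BlockInfo {
    bool intra;          // block is intra-coded
    bool nonzero_coeff;  // its transform block has non-zero coefficients
    int  ref_idx;        // identifier of the reference picture
    int  mv_x, mv_y;     // motion vector in quarter-sample units
};

// Computes Bs for the edge between blocks p and q without if/else.
// Each condition is evaluated to a 0/1 flag (a small "feature vector"),
// and the flags are then combined arithmetically, so every thread
// executes the same instruction sequence.
int boundary_strength(const BlockInfo& p, const BlockInfo& q,
                      bool transform_edge) {
    int f_intra = p.intra | q.intra;
    int f_coeff = transform_edge & (p.nonzero_coeff | q.nonzero_coeff);
    int f_ref   = (p.ref_idx != q.ref_idx);
    // Motion vectors differ by at least one integer sample (4 quarter-samples).
    int f_mv    = (std::abs(p.mv_x - q.mv_x) >= 4) |
                  (std::abs(p.mv_y - q.mv_y) >= 4);
    int f_one   = f_coeff | f_ref | f_mv;

    // Bs = 2 when f_intra is set; otherwise Bs = f_one (0 or 1).
    return 2 * f_intra + (1 - f_intra) * f_one;
}
```

On a GPU, a formulation of this kind avoids divergence among threads that would otherwise take different branches for neighbouring edges; the same flag-and-combine pattern could plausibly be used for the edge filtering decisions as well.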