Abstract-In turbo decoding of product codes, we propose an algorithm implementation, based on the Chase-Pyndiah algorithm, which exhibits a modular, simple structure with fine grain parallelism. It is implemented into deep pipelined architectures, including an interleaving block decoding scheme, with good potential on FPGAs and MP-SoCs targets. We include an evaluation of the essential parameters of those architectures, which are situated in a different area of the block turbo decoder implementation design space 1 .