Backprojection is one of the most time-consuming steps in iterative reconstruction methods for computed tomography (CT). The memory access pattern of 3D backprojection is potentially regular enough to efficiently exploit the computing power of GPU- or FPGA-based acceleration boards. This paper proposes an OpenCL acceleration of the voxel-driven 3D backprojection algorithm on an Arria 10 FPGA. The design flow relies first on an offline analysis of memory accesses, then iterates on a performance analysis of each new implementation, plotted on a Berkeley Roofline model. By taking advantage of the FPGA's local memory architecture, we designed an efficient pipeline that reaches maximum bandwidth with stall-free accesses, highlighting the value of this platform for memory optimization. Our design flow significantly improved the computational intensity of the initial algorithm, resulting in better performance on the FPGA. The final design achieves performance comparable to an embedded GPU implementation and to other computed tomography algorithms implemented on FPGAs.