Automated compiler analyses and transformation techniques aim to improve design productivity when mapping applications expressed in high-level programming languages to FPGAs. These transformations allow a compiler tool to reduce the number of design cycles and to eliminate the tedious and error-prone low-level transformations that this mapping requires, while still leading to good designs. Scalar replacement, also known as register promotion, is an important data-oriented transformation that reduces the number of external memory accesses, and thus execution time, at the expense of storage resources. In this article we present a combination of loop transformation techniques, namely loop unrolling, loop splitting, and loop interchange, with scalar replacement to enable partial data reuse in computations expressed as tightly nested loops, which are pervasive in image processing algorithms. We describe a performance model for designs that exploit partial data reuse. Our experimental results reveal that the model accurately captures the non-trivial execution effects that pipelined implementations exhibit in the presence of partial data reuse, which stem from the need to fill data buffers. The model thus allows a compiler to explore a large design space with high accuracy, ultimately allowing it to quickly find better designs than those obtained through limited manual search or brute-force approaches.
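As a minimal illustration of scalar replacement, and not of the specific transformations or designs evaluated in this article, the following C sketch applies the idea to a three-point sum over a one-dimensional array. The names `load`, `mem_reads`, `stencil_baseline`, and `stencil_scalar_replaced` are hypothetical; the counter merely stands in for external memory reads. The two loads issued before the loop of the second version play the role of the buffer fill-up phase mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counter standing in for external-memory reads. */
static size_t mem_reads = 0;

/* Every array read goes through this helper so that it can be counted. */
static int load(const int *a, size_t i) {
    mem_reads++;
    return a[i];
}

/* Baseline: each output re-reads all three inputs from memory,
 * for a total of 3 * (n - 2) reads. */
void stencil_baseline(const int *in, int *out, size_t n) {
    assert(n >= 3);
    for (size_t i = 0; i + 2 < n; i++) {
        out[i] = load(in, i) + load(in, i + 1) + load(in, i + 2);
    }
}

/* Scalar replaced: the two values reused by the next iteration are kept
 * in scalars (registers), so each iteration issues one new read.
 * Total reads: 2 to fill the registers, plus (n - 2) in the loop. */
void stencil_scalar_replaced(const int *in, int *out, size_t n) {
    assert(n >= 3);
    int r0 = load(in, 0); /* fill-up phase */
    int r1 = load(in, 1);
    for (size_t i = 0; i + 2 < n; i++) {
        int r2 = load(in, i + 2); /* the only memory read per iteration */
        out[i] = r0 + r1 + r2;
        r0 = r1; /* rotate the register window */
        r1 = r2;
    }
}
```

For an input of `n` elements the baseline issues `3 * (n - 2)` reads and the scalar-replaced version issues `n`, at the cost of two extra registers. When fewer registers are available than the reuse window requires, only part of this reuse can be captured, which is the partial-reuse setting the article addresses.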