Abstract-Due to their flexibility and high performance, Coarse-Grained Reconfigurable Arrays (CGRAs) are a topic of increasing research interest. CGRAs also have the potential to achieve very high energy efficiency in comparison to other reconfigurable architectures when hardware optimizations are applied. Some of these optimizations are common in more traditional processors but can also lead to large efficiency gains for reconfigurable architectures. This paper investigates three hardware-based loop optimization techniques that can significantly improve the energy efficiency of CGRAs: a zero-overhead loop accelerator (ZOLA), single-cycle loop support, and loop buffers. The three techniques are evaluated on processing kernels from the image processing domain as well as on an industrial computer vision application. Energy consumption and area estimates are obtained using a CGRA synthesized with a commercial 40 nm library. The simulation results show overall energy gains of 6.8% for ZOLA alone, 13.2% for ZOLA combined with single-cycle loop support, and 18.3% for the combination of all three optimizations.