Clustering L0 buffers is effective for energy reduction in the instruction memory hierarchy of embedded VLIW processors. However, the efficiency of the clustering depends on the schedule of the target application. Especially in heterogeneous or data clustered VLIW processors, determining energy efficient scheduling is more constraining.This article proposes a realistic technique supported by a tool flow to explore operation shuffling for improving generation of L0 clusters. The tool flow explores assignment of operations for each cycle and generates various schedules. This approach makes it possible to reduce energy consumption for various processor architectures. However, the computational complexity is large because of the huge exploration space. Therefore, some heuristics are also developed, which reduce the size of the exploration space while the solution quality remains reasonable. Furthermore, we also propose a technique to support VLIW processors with multiple data clusters, which is essential to apply the methodology to real world processors.The experimental results indicate potential gains of up to 27.6% in energy in L0 buffers, through operation shuffling for heterogeneous processor architectures as well as a homogeneous This work is supported by JSPS Grant-in-Aid for JSPS Fellows #17.9887. Based on Kobayashi, Y., Jayapala, M., Raghavan, P., Catthoor, F., and Imai, M. Operation shuffling for low energy L0 cluster generation on heterogeneous VLIW processors.