For real-time tasks, cache behavior must either be constrained via cache locking or predicted by WCET analysis. Since the former trades energy efficiency for predictability, this paper proposes a novel code optimization that reduces the miss rate of unlocked instruction caches and provably does not increase the WCET. We optimized the 37 programs of the Mälardalen WCET benchmark suite for 36 cache configurations and two technologies. By exploiting software prefetching on top of on-demand fetching, we reduced the memory's contribution to energy consumption by 11.2%, to the average-case execution time by 10.2%, and to the WCET by 17.4%.
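To give an intuition for software prefetching on top of on-demand instruction fetching, the following minimal C sketch issues an instruction-cache prefetch for a call target well before the call executes, so the code fetch overlaps with useful computation. This is an illustrative assumption, not the paper's compiler transformation: the `prefetch_code()` helper, the `process_sample()` callee, the placement of the prefetch, and the use of an ARMv7-style PLI (preload instruction) hint are all hypothetical.

```c
/* Minimal sketch of instruction-cache software prefetching.
 * Assumes an ARMv7-style PLI instruction; on other targets the
 * hint degrades to a no-op. Names and placement are illustrative. */
#include <stdint.h>

static inline void prefetch_code(const void *addr)
{
#if defined(__ARM_ARCH) && (__ARM_ARCH >= 7)
    /* Hint the core to fetch the cache line containing code at addr. */
    __asm__ volatile("pli [%0]" : : "r"(addr));
#else
    (void)addr;   /* no instruction-prefetch hint available: do nothing */
#endif
}

extern void process_sample(int32_t s);   /* hypothetical cold callee */

void filter_block(const int32_t *buf, int n)
{
    /* Prefetch the callee's first code line while the loop below still
     * performs arithmetic, so the later calls do not miss in the I-cache. */
    prefetch_code((const void *)&process_sample);

    for (int i = 0; i < n; i++)
        process_sample(buf[i] >> 2);
}
```

Because the prefetch is only a hint, it cannot evict lines that a WCET-critical path relies on being fetched on demand anyway; the sketch merely shows where such a hint could hide memory latency in the average case.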