Architecting PCM, especially MLC PCM, as main memory for MCUs is a promising technique to replace conventional DRAM deployment. However, PCM/MLC PCM suffers from long write latency and large write energy. Recent work has proposed a compiler directed dual-write (CDDW) scheme to combat the drawbacks of PCM by adopting fast or slow mode for different write operations. For large-scale loops, we observe that write instances' lifetime is very long and can only be written by the expensive slow mode. This paper proposes a write mode aware loop tiling approach to effectively reduce the lifetime of write instances and maximize the number of efficient fast writes in loops. The experimental results show that the proposed approach improves performance by 50.8% and reduces dynamic energy by 32.0% across a set of benchmarks compared to the CDDW approach on average. research interests include memory and parallelism optimization for embedded systems, software/hardware co-design, real time systems and computer security.