Clustered architectures have been proposed as a solution to the scalability problem of wide ILP processors. VLIW architectures, being wide-issue by design, benefit significantly from clustering. Such architectures, being both statically scheduled and clustered, require specialized code generation techniques, as they require explicit Inter-Cluster Copy instructions (ICCs) be scheduled in the code stream. In this work we propose CAeSaR, a novel instruction scheduling algorithm that improves code generation for such architectures. It combines cluster assignment, instruction scheduling and inter-cluster communication reuse all in one single unified algorithm. The proposed algorithm improves performance by any phase-ordering issues among these three code generation and optimization steps. We evaluate CAeSaR on the MediabenchII and SPEC CINT2000 benchmarks and compare it against the state-ofthe-art instruction scheduling algorithm. Our results show an improvement in execution time of up to 20.3%, and 13.8% on average, over the current state-of-the-art across the benchmarks.