As processors migrate to multi- and many-core architectures, the role of the communication network becomes more important. An efficient communication architecture can drastically improve overall system performance, and taking application behavior into account enables system-level solutions that manage the communication cost. To this end, we propose a Clustered Globally Asynchronous Locally Synchronous Network-on-Chip (C-GALS NoC) communication architecture. C-GALS NoC is composed of local synchronous clusters and a global asynchronous network. Additionally, we propose a cluster-based communication-aware mapping algorithm (CAM) that maps application tasks to the C-GALS NoC while minimizing the communication cost. The synergy of the C-GALS NoC and the CAM algorithm results in a system-level mechanism that, according to our results, provides up to 2x and 3x improvement in performance and power, respectively, in comparison with a regular GALS NoC. Finally, we demonstrate that C-GALS NoC is standard-cell compatible by synthesizing it using Design Compiler.