Abstract-Continuous improvements in integration scale have led major microprocessor vendors to move to designs that integrate several processing cores on the same chip. Chip multiprocessors (CMPs) constitute a good alternative to traditional monolithic designs for several reasons, among them better performance, scalability, and performance/energy ratio. On the other hand, higher clock frequencies and increasing transistor density have revealed power dissipation and temperature as critical design issues in current and future architectures. Previous studies have shown that the interconnection network of a CMP has a significant impact on both overall performance and energy consumption. Moreover, the wires used in such an interconnect can be designed with varying latency, bandwidth, and power characteristics. In this work, we show how messages can be efficiently managed, from the point of view of both performance and energy, in tiled CMPs using a heterogeneous interconnect. Our proposal consists of two approaches. The first is Reply Partitioning, a technique that splits each reply with data into a short Partial Reply message, which carries the subblock of the cache line that includes the word requested by the processor, plus an Ordinary Reply with the full cache line. This technique allows all messages used to ensure coherence between the L1 caches of a CMP to be classified into two groups: critical and short, and noncritical and long. The second approach is the use of a heterogeneous interconnection network composed of low-latency wires for critical messages and low-energy wires for noncritical ones. Detailed simulations of 8- and 16-core CMPs show that our proposal obtains average savings of 7 percent in execution time and 70 percent in the Energy-Delay squared Product (ED²P) metric of the interconnect over previous works (from 24 to 30 percent average ED²P improvement for the full CMP).
Additionally, the sensitivity analysis shows that although the execution time is minimized for subblocks of 16 bytes, the best choice from the point of view of the ED²P metric is the 4-byte subblock configuration, with an additional improvement of 2 percent over the 16-byte one for the ED²P metric of the full CMP.

Index Terms-Tiled chip multiprocessor, energy-efficient architectures, cache coherence protocol, heterogeneous on-chip interconnection network, parallel scientific applications.
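
As an informal illustration of the Reply Partitioning idea summarized in the abstract, the following minimal Python sketch splits a data reply into a short, critical Partial Reply and a long, noncritical Ordinary Reply, and maps each to a wire class. It is not the implementation evaluated in this work: the 64-byte cache line, the `Message` type, and the function names `partition_reply`, `wire_class`, and `ed2p` are assumptions introduced only for this example, and `ed2p` simply computes energy times delay squared, as the name of the metric suggests.

```python
from dataclasses import dataclass

LINE_BYTES = 64      # assumed cache-line size (not stated in the abstract)
SUBBLOCK_BYTES = 4   # the abstract discusses 4-byte and 16-byte subblocks


@dataclass
class Message:
    """Hypothetical coherence message used only for this illustration."""
    kind: str        # "partial_reply" or "ordinary_reply"
    critical: bool   # critical messages are short, noncritical ones are long
    offset: int      # byte offset of the payload within the cache line
    payload: bytes


def partition_reply(line: bytes, word_offset: int,
                    subblock: int = SUBBLOCK_BYTES):
    """Split one data reply into a Partial Reply and an Ordinary Reply."""
    # Align down to the subblock that contains the requested word.
    start = (word_offset // subblock) * subblock
    partial = Message("partial_reply", True, start,
                      line[start:start + subblock])
    # The Ordinary Reply still carries the full cache line.
    ordinary = Message("ordinary_reply", False, 0, line)
    return partial, ordinary


def wire_class(msg: Message) -> str:
    """Critical messages use low-latency wires, the rest low-energy wires."""
    return "low_latency" if msg.critical else "low_energy"


def ed2p(energy: float, delay: float) -> float:
    """Energy-Delay squared Product: energy multiplied by delay squared."""
    return energy * delay ** 2


if __name__ == "__main__":
    line = bytes(range(LINE_BYTES))
    partial, ordinary = partition_reply(line, word_offset=21)
    print(wire_class(partial), partial.offset, len(partial.payload))
    print(wire_class(ordinary), ordinary.offset, len(ordinary.payload))
```

In this sketch the subblock size is a parameter, so the 4-byte and 16-byte configurations mentioned in the sensitivity analysis differ only in the `subblock` argument passed to `partition_reply`.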