In heterogeneous multicore systems, the memory subsystem plays a critical role because most core-to-core communication takes place through main memory, so memory efficiency has a substantial impact on system performance. Memory traffic from multimedia cores generally exhibits high row-buffer locality, which benefits memory efficiency, but this locality is often lost as memory streams are forwarded through the network-on-chip (NoC). Previous studies have proposed techniques that improve memory visibility in order to reveal scattered row-buffer hit opportunities to the memory scheduler. However, extending local memory visibility brings little benefit once the locality has been severely diluted. The alternative approach, preserving row-buffer locality in the NoC, has not been well explored. Moreover, it remains an open question how to schedule network traffic with awareness of both memory efficiency and quality-of-service (QoS). In this article, we propose a router design with embedded row-index caches that enables locality-aware packet forwarding. The proposed design requires only minor modifications to existing router microarchitectures and can be easily implemented with priority arbiters to integrate QoS support. Extensive evaluations show that the proposed design achieves higher memory efficiency than prior memory-aware routers while also providing QoS support. When built on top of existing QoS-aware routers, locality-aware forwarding increases row-buffer hits by 58.32% and reduces memory latency by 14.45% on average. It also yields a net reduction of 27.82% in DRAM and NoC energy cost.
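To make the idea of locality-aware forwarding concrete, the following is a minimal behavioral sketch and not the router microarchitecture proposed in this article. It assumes a single output port whose row-index cache remembers, per DRAM bank, the row of the most recently forwarded packet, and an arbiter that ranks candidates by QoS priority first and by a row match second. The names `Packet`, `LocalityAwareArbiter`, `row_cache`, and `drain`, as well as the cache organization and the tie-breaking order, are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Packet:
    """A memory request waiting at a router input (hypothetical fields)."""
    src: int       # id of the requesting core
    bank: int      # target DRAM bank
    row: int       # target DRAM row index
    priority: int  # QoS level; a larger value is more urgent


@dataclass
class LocalityAwareArbiter:
    """Toy arbiter for one router output port.

    row_cache maps a bank to the row index of the packet most recently
    forwarded to that bank. This organization is an assumption made for
    illustration only.
    """
    row_cache: Dict[int, int] = field(default_factory=dict)

    def is_row_match(self, pkt: Packet) -> bool:
        # True when the packet targets the row last forwarded to its bank.
        return self.row_cache.get(pkt.bank) == pkt.row

    def select(self, candidates: List[Packet]) -> Optional[Packet]:
        if not candidates:
            return None
        # Rank by QoS priority, then by row match. max() returns the
        # earliest candidate among equals, so arrival order breaks ties.
        winner = max(
            candidates,
            key=lambda p: (p.priority, self.is_row_match(p)),
        )
        self.row_cache[winner.bank] = winner.row
        return winner


def drain(arbiter: LocalityAwareArbiter, waiting: List[Packet]) -> List[Packet]:
    """Forward all waiting packets one at a time and return the order."""
    pending = list(waiting)
    order = []
    while pending:
        pkt = arbiter.select(pending)
        pending.remove(pkt)
        order.append(pkt)
    return order
```

In this sketch, two interleaved streams that target rows 5 and 9 of the same bank leave the port grouped by row, so same-row requests arrive back to back, while a higher-priority packet is still forwarded first regardless of its row. Ranking priority ahead of row match is one possible policy, chosen here so that locality never overrides QoS.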