The growing parallelism in most of today's applications has increased the demand for parallel computing in processors. General-Purpose Graphics Processing Units (GPGPUs) are used extensively to provide the computation that highly parallel applications require. GPGPUs generate large volumes of network traffic between memory controllers (MCs) and cores. As a result, the network-on-chip (NoC) fabric can become a performance bottleneck, especially for memory-intensive applications on GPGPUs. Traditional mesh-based NoC topologies are poorly suited to GPGPUs because their high network latency leads to congestion at the MCs and increases application execution time. In this paper, we propose a novel memory-aware circuit overlay NoC that exploits the characteristics of GPGPU traffic to eliminate router arbitration at each hop. Our experimental results show that our approach improves NoC latency by 40-75%, execution time by 20-70%, and overall energy consumption by 10-65% compared to the state of the art.