Many adaptive routing protocols have been developed for Networks-on-Chip (NoCs) to improve network performance by reducing traffic congestion. In this paper, we present an adaptive routing algorithm based on Q-routing that uses a learning method to distribute traffic across the entire network. The learning method uses both local and global traffic information to select the minimum-latency path to the destination. Because routing tables are one of the main sources of area consumption in Q-routing, we propose a clustering approach to reduce this area overhead. This approach also improves the observability of traffic conditions. Experimental results for different traffic patterns and network loads show that the proposed method achieves significant performance improvements over the Q-routing, C-routing, DBAR, and Dynamic XY algorithms.
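To show why clustering shrinks Q-routing tables, the sketch below may help. It is not the authors' method, and the abstract does not give their update rule. Instead, it assumes the classic Q-routing update of Boyan and Littman: after forwarding a packet for destination d to neighbor y, router x moves its estimate Q_x(d, y) toward q + s + min_z Q_y(d, z). Here q is the local queueing delay, s is the link delay, and the last term is the neighbor's best remaining-latency estimate. The clustering is also an assumption. We suppose a square 2D mesh split into square clusters, with one table entry per (destination cluster, output neighbor) instead of one per (destination node, output neighbor). The class and all of its names (`ClusteredQTable`, `cluster_size`, `learning_rate`, and so on) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ClusteredQTable:
    """Hypothetical per-router Q-table keyed by destination cluster.

    Assumes a square mesh of mesh_width x mesh_width nodes (row-major ids)
    partitioned into square clusters of cluster_size x cluster_size nodes.
    """
    mesh_width: int
    cluster_size: int
    neighbors: list                      # ids of this router's neighbor nodes
    learning_rate: float = 0.5           # assumed step size (eta)
    q: dict = field(default_factory=dict)  # (cluster, neighbor) -> latency estimate

    def clusters_per_row(self) -> int:
        # Ceiling division so a partial cluster at the mesh edge still counts.
        return -(-self.mesh_width // self.cluster_size)

    def cluster_of(self, node: int) -> int:
        # Map a destination node id to the id of the cluster containing it.
        x, y = node % self.mesh_width, node // self.mesh_width
        return (y // self.cluster_size) * self.clusters_per_row() + x // self.cluster_size

    def estimate(self, dest: int, neighbor: int) -> float:
        # All destinations in the same cluster share one entry per neighbor.
        return self.q.get((self.cluster_of(dest), neighbor), 0.0)

    def best_neighbor(self, dest: int) -> int:
        # Route toward the neighbor with the lowest estimated latency.
        return min(self.neighbors, key=lambda n: self.estimate(dest, n))

    def best_estimate(self, dest: int) -> float:
        # Value a neighbor would report back as min_z Q_y(d, z).
        return min(self.estimate(dest, n) for n in self.neighbors)

    def update(self, dest: int, neighbor: int, queue_delay: float,
               link_delay: float, neighbor_best: float) -> None:
        # Classic Q-routing: move the estimate toward q + s + min_z Q_y(d, z).
        key = (self.cluster_of(dest), neighbor)
        old = self.q.get(key, 0.0)
        target = queue_delay + link_delay + neighbor_best
        self.q[key] = old + self.learning_rate * (target - old)

    def max_entries(self) -> int:
        # Table size: one entry per (cluster, neighbor) pair.
        return self.clusters_per_row() ** 2 * len(self.neighbors)


# Usage: router 0 in an 8x8 mesh with east (1) and south (8) neighbors.
table = ClusteredQTable(mesh_width=8, cluster_size=2, neighbors=[1, 8])
table.update(dest=63, neighbor=1, queue_delay=2.0, link_delay=1.0, neighbor_best=10.0)
table.update(dest=63, neighbor=8, queue_delay=2.0, link_delay=1.0, neighbor_best=4.0)
print(table.best_neighbor(63), table.estimate(62, 8), table.max_entries())
```

In this 8x8 example with 2x2 clusters, the table needs 16 x 2 = 32 entries instead of the 64 x 2 = 128 a per-destination table would use. Destinations 62 and 63 fall in the same cluster, so a latency sample for one also updates the estimate for the other. Sharing entries this way is one plausible reading of the "observability" benefit the abstract mentions. The trade-off is coarser, per-cluster rather than per-node, latency estimates.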