5G cellular IoT has several advantages over other access technologies, enabling operators to serve a wider area and more IoT devices. In urban transportation systems, however, a massive number of vehicles can exhaust the resources available in a cell, resulting in excessive load on the 5G cellular network. This article proposes a novel reinforcement learning based V2V routing (RLbR) framework, which offloads non-realtime traffic to the V2V network and significantly relieves the load on the 5G cellular network. We also propose a V2V routing algorithm for this framework. Specifically, the Q-values of neighbouring vehicles are first calculated from the cache factor CF and the energy factor EF, and are used to evaluate the quality of those neighbours. Then the position factor PF is calculated, and the vehicle forwards the data packet based on it. In addition, an environment model is designed to accelerate the convergence of the Q-table. The results show that the RLbR framework achieves the highest offload rate among the four frameworks compared, and that the proposed algorithm improves the lifetime of the V2V network while performing well in terms of delivery ratio and average delay.
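
The two-stage forwarding decision summarised above can be illustrated with a minimal Python sketch. It is not the article's implementation: the abstract gives no formulas, so the definitions of CF, EF and PF as values in [0, 1], the weights `W_CF` and `W_EF`, the learning parameters `ALPHA` and `GAMMA`, the standard Q-learning update, and the rule of scoring each neighbour by Q-value times PF are all assumptions made for illustration. The names `Neighbour`, `reward`, `update_q` and `choose_next_hop` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Neighbour:
    vehicle_id: str
    cf: float  # cache factor: assumed to be the free share of the buffer, in [0, 1]
    ef: float  # energy factor: assumed to be the remaining share of energy, in [0, 1]
    pf: float  # position factor: assumed normalised progress toward the destination


# Hypothetical parameters; the abstract does not specify any of them.
ALPHA = 0.5   # learning rate
GAMMA = 0.8   # discount factor
W_CF = 0.5    # weight of the cache factor in the reward
W_EF = 0.5    # weight of the energy factor in the reward


def reward(n: Neighbour) -> float:
    """Assumed reward: a weighted sum of the cache and energy factors."""
    return W_CF * n.cf + W_EF * n.ef


def update_q(q_table: Dict[str, float], n: Neighbour,
             best_next_q: float = 0.0) -> float:
    """Standard Q-learning update of the entry for neighbour n."""
    old = q_table.get(n.vehicle_id, 0.0)
    new = old + ALPHA * (reward(n) + GAMMA * best_next_q - old)
    q_table[n.vehicle_id] = new
    return new


def choose_next_hop(q_table: Dict[str, float],
                    neighbours: List[Neighbour]) -> Optional[Neighbour]:
    """Pick the neighbour with the highest Q-value * PF (assumed combination)."""
    if not neighbours:
        return None  # no neighbour in range: the caller must hold the packet
    return max(neighbours,
               key=lambda n: q_table.get(n.vehicle_id, 0.0) * n.pf)


if __name__ == "__main__":
    q_table: Dict[str, float] = {}
    candidates = [
        Neighbour("a", cf=0.9, ef=0.9, pf=0.2),  # healthy, but little progress
        Neighbour("b", cf=0.6, ef=0.6, pf=0.8),  # balanced
        Neighbour("c", cf=0.1, ef=0.1, pf=0.9),  # close to target, nearly drained
    ]
    for cand in candidates:
        update_q(q_table, cand)
    chosen = choose_next_hop(q_table, candidates)
    print(chosen.vehicle_id if chosen else "no neighbour")
```

Under these assumptions, a neighbour with a nearly full cache or low energy is penalised even when it is well positioned, which is one plausible way a routing rule of this kind could extend network lifetime. The environment model used to speed up Q-table convergence is not sketched here, since the abstract does not describe it.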