A fundamental question in any peer-to-peer ride-sharing system is how to meet passengers' requests both effectively and efficiently so as to balance supply and demand in real time. On the passenger side, traditional approaches focus on pricing strategies that raise the probability that users request a ride, thereby adjusting the distribution of demand. However, previous methods do not account for how a change in strategy affects future supply and demand: passengers' requests reposition drivers to different destinations, which in turn affects drivers' income for some time afterward. Motivated by this observation, we optimize the distribution of demand by learning long-term spatio-temporal values as a guideline for the pricing strategy. In this study, we propose an offline deep reinforcement learning based method that focuses on the demand side to improve the utilization of transportation resources and customer satisfaction. We adopt a spatio-temporal learning method to learn the value of different times and locations, and then incentivize passengers' ride requests to adjust the distribution of demand and balance supply and demand in the system. In particular, we model the problem as a Markov Decision Process (MDP) and solve it in two steps: 1) based on historical trip data from the ride-hailing platform, we propose a deep reinforcement learning based method with constrained actions that summarizes demand and supply patterns into a Spatio-Temporal network; 2) to handle the budget constraint in the spatio-temporal incentive, we formulate an integer programming step in the real-time policy learning process, where each state-action pair is valued in terms of immediate reward, future gains, and discount budget. Through experiments, we demonstrate that our proposed approach improves marketplace efficiency.
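
To make the budget-constrained step more concrete, the following is a minimal sketch of one way such an integer program could be posed and solved. It is an illustration under stated assumptions, not the formulation used in this work. It assumes that each ride request has a small set of discount options, that each (request, option) pair already has a score combining immediate reward and estimated future gain, and that each option consumes a non-negative integer amount of budget. Under these assumptions the problem is a multiple-choice knapsack, solved here by dynamic programming. The function name `allocate_discounts` and the inputs `values`, `costs`, and `budget` are hypothetical.

```python
from typing import List, Tuple


def allocate_discounts(
    values: List[List[float]],
    costs: List[List[int]],
    budget: int,
) -> Tuple[float, List[int]]:
    """Choose exactly one discount option per request, maximizing the total
    score subject to total cost <= budget (a multiple-choice knapsack).

    values[i][j]: assumed score of offering option j to request i, e.g.
                  immediate reward plus discounted future value.
    costs[i][j]:  assumed non-negative integer budget used by that option
                  (0 would mean "no discount").
    """
    neg_inf = float("-inf")
    # best[b] = best total score over the requests seen so far
    #           when exactly b units of budget have been spent.
    best = [neg_inf] * (budget + 1)
    best[0] = 0.0
    # choices[i][b] = option picked for request i to reach spend level b.
    choices: List[List[int]] = []

    for v_row, c_row in zip(values, costs):
        new_best = [neg_inf] * (budget + 1)
        pick = [-1] * (budget + 1)
        for b in range(budget + 1):
            if best[b] == neg_inf:
                continue  # spend level b is unreachable
            for j, (v, c) in enumerate(zip(v_row, c_row)):
                nb = b + c
                if nb <= budget and best[b] + v > new_best[nb]:
                    new_best[nb] = best[b] + v
                    pick[nb] = j
        best = new_best
        choices.append(pick)

    # Select the spend level with the highest total score.
    end = max(range(budget + 1), key=lambda b: best[b])
    if best[end] == neg_inf:
        raise ValueError("no feasible assignment within budget")
    total = best[end]

    # Walk backwards to recover the option chosen for each request.
    plan: List[int] = []
    b = end
    for i in range(len(choices) - 1, -1, -1):
        j = choices[i][b]
        plan.append(j)
        b -= costs[i][j]
    plan.reverse()
    return total, plan


if __name__ == "__main__":
    # Three requests, each with "no discount" (cost 0) or one discount level.
    values = [[1.0, 3.0], [2.0, 2.5], [0.5, 4.0]]
    costs = [[0, 2], [0, 1], [0, 2]]
    print(allocate_discounts(values, costs, budget=3))
```

In this toy instance the budget is spent on the second and third requests, where the discount yields the largest gain in score per unit of budget. At realistic scale, a general integer programming solver would likely replace the dynamic program; the sketch is only meant to show the structure of the decision.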