With efficient load management, customers can reduce their consumption at peak times, shift shiftable loads to more suitable times, or even store electrical energy when prices are low. Dynamic pricing is one of the most effective ways to achieve these effects, because it encourages users to change their consumption patterns. However, because electricity consumption is uncertain, determining an optimal pricing policy is a complicated problem. In this study, a deep contextual bandit algorithm, which uses a deep neural network to learn the context and the associated reward, is considered to solve this problem. Existing works on dynamic pricing policy used limited datasets and assumptions, which may be contrary to the nature of real-world markets. Because consumer demands differ and change throughout the year, and because future markets will contain multiple pricing agents, it is important to have a good market structure, a good method, and especially datasets that are representative of all possible demands. Simulations were performed on different system models. The results show that the proposed algorithm can improve the system's reliability, reduce energy cost, and control the power system's ramp rate. In particular, in multi-agent markets with multi-objective costs, the pricing model can be adjusted according to the market strategy to either satisfy customers or reduce energy consumption.