The deregulation and liberalization of energy markets in the 1990s gave rise to short-term electricity trading, in which decentralized markets clear net output over horizons ranging from minutes to days ahead. The energy industry urgently needs substantially modernized systems to handle a variety of challenges, including climate change, renewable resources, and the evolving energy framework. In this dissertation, we investigate deep reinforcement learning (RL) frameworks for both wholesale and local energy trading, probing the challenge of applying RL to optimize real-world problems in energy exchange. First, we introduce the MB-A3C algorithm for day-ahead energy bidding to reduce the costs of wind power producers (WPPs). We show that our model generates a bidding strategy that achieves a more than 15% reduction in average cost per day in Denmark and Sweden (Nord Pool dataset). Second, the MB-A3C3 approach is applied to a large-scale, real-world, hourly 2012–2013 dataset of 300 households in Sydney, Australia. By increasing internal trade (trading among households) and decreasing external trade (trading with the grid), our multi-agent RL method (MB-A3C3) lowers energy bills by 17%. In closing the gap between theoretical and real-world problems, the algorithms herein help reduce wind power production costs and customers' electricity bills.