Peer-to-peer (P2P) energy trading has the potential to transform the current household energy system by allowing residents to share energy with one another. As the number of customers employing distributed energy resources (DERs) such as rooftop solar increases, innovation in the double auction (DA) market system is becoming more significant. In this paper, a novel model-based asynchronous advantage actor-centralized-critic with communication (MB-A3C3) approach is proposed. The approach is evaluated on a large-scale, real-world hourly dataset from 2012-2013 covering 300 households with rooftop solar systems in Sydney, New South Wales (NSW), Australia. Results reveal that the MB-A3C3 approach outperforms other reinforcement learning methods (MADDPG and A3C3), producing lower community energy bills for the 300 households. By narrowing the gap between theoretical and real-world problems, the proposed algorithms help reduce customers' electricity bills.

INDEX TERMS Peer-to-peer energy trading, model-based reinforcement learning, multi-agent reinforcement learning, deep learning approach