Device-to-device (D2D) communication, which offers a direct communication channel between devices, has been introduced as an alternative communication technique for next-generation wireless networks, with the aim of alleviating the workload traditionally shouldered by base stations (BSs) in cellular systems. However, D2D pairs that encounter connectivity issues or require extended communication ranges need a relay node (RN) to facilitate communication. When efficient communication with the target device is possible only through a relay device, finding the candidate relay that offers the best link availability and optimized end-to-end throughput across the source-relay-destination path is a key challenge in relay-assisted D2D communication. Despite the plethora of studies on relay selection in D2D communication, there remains a need for a method that systematically integrates the multiple disruptive factors inherent in wireless channels. While reinforcement learning (RL) has primarily been applied to resource management tasks such as power control, resource block (RB) availability, and spectrum allocation, its use for relay selection in D2D communication within dynamic wireless environments remains largely unexplored. In this paper, we propose multi-agent reinforcement learning based relay selection (MARS), in which the source device and/or pairing devices act as learning agents. The resource selection agent (RSA), the link agent (LA), and the transmission agent (TA) cooperate in the MARS method to find the optimum relay. Source nodes in D2D pairs iteratively update their strategies through interactions with the wireless environment and other devices in order to maximize their cost function and select the most suitable relay in the multi-hop D2D communication scenario. 
The MARS method takes into account the combined effect of signal-to-interference-plus-noise ratio (SINR), link reliability, and throughput values when estimating the cost function in order to select the optimum relay node. We performed extensive simulations for different device density scenarios in a wireless environment and compared the performance of our MARS method with an SINR-based relay selection approach. The findings show that our relay selection approach for D2D communication outperforms the SINR-based method in terms of end-to-end link reliability and throughput.
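
To illustrate why combining SINR, link reliability, and throughput can lead to a different relay choice than SINR alone, the following is a minimal sketch and not the MARS algorithm itself. It replaces the learned, multi-agent cost function with a plain weighted sum over min-max normalized metrics. The names `RelayCandidate`, `combined_scores`, `select_combined`, and `select_sinr_only`, the equal weights, and the candidate values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class RelayCandidate:
    # Hypothetical per-relay metrics; the values used below are illustrative only.
    name: str
    sinr_db: float          # SINR of the relayed path, in dB
    reliability: float      # link reliability in [0, 1]
    throughput_mbps: float  # end-to-end throughput, in Mbit/s


def normalize(values):
    """Min-max normalize a list of values to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All candidates are equal on this metric, so it cannot discriminate.
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def combined_scores(candidates, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of normalized SINR, reliability, and throughput.

    Assumption: equal weights and a linear combination; the paper's
    actual cost function may differ.
    """
    w_sinr, w_rel, w_thr = weights
    sinr = normalize([c.sinr_db for c in candidates])
    rel = normalize([c.reliability for c in candidates])
    thr = normalize([c.throughput_mbps for c in candidates])
    return {
        c.name: w_sinr * s + w_rel * r + w_thr * t
        for c, s, r, t in zip(candidates, sinr, rel, thr)
    }


def select_combined(candidates, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Pick the relay with the highest combined score."""
    scores = combined_scores(candidates, weights)
    return max(scores, key=scores.get)


def select_sinr_only(candidates):
    """Baseline: pick the relay with the highest SINR."""
    return max(candidates, key=lambda c: c.sinr_db).name


candidates = [
    RelayCandidate("R1", sinr_db=18.0, reliability=0.60, throughput_mbps=4.0),
    RelayCandidate("R2", sinr_db=15.0, reliability=0.95, throughput_mbps=9.0),
    RelayCandidate("R3", sinr_db=10.0, reliability=0.90, throughput_mbps=7.0),
]

print("SINR-only choice:", select_sinr_only(candidates))
print("Combined choice: ", select_combined(candidates))
```

With these illustrative values, the SINR-only baseline picks `R1`, while the combined score favors `R2`, whose slightly lower SINR is outweighed by its higher reliability and throughput. In an RL setting, a score of this kind would more plausibly serve as a reward signal that agents learn from over time than as a one-shot ranking.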