The design of beamforming for downlink multi-user massive multiple-input multiple-output (MIMO) systems relies on accurate downlink channel state information at the transmitter (CSIT). In practice, however, it is difficult for the base station (BS) to obtain perfect CSIT because of user mobility and the latency/feedback delay between CSI acquisition and downlink data transmission. Hence, robust beamforming under imperfect CSIT is needed. In this paper, considering multiple antennas at all nodes (base station and user terminals), we develop a multi-agent deep reinforcement learning (DRL) framework for massive MIMO under imperfect CSIT, in which the transmit and receive beamformers are jointly designed to maximize the average information rate of all users. Leveraging this DRL-based framework, we explore interference management and propose and analyze three DRL-based schemes: the distributed-learning-distributed-processing scheme, the partial-distributed-learning-distributed-processing scheme, and the central-learning-distributed-processing scheme. This paper 1) shows that the DRL-based strategies outperform both the random action-selection strategy and the delay-sensitive sample-and-hold (SAH) approach, and achieve, with lower complexity, over 90% of the information rate of two selected benchmarks, namely zero-forcing channel inversion (ZF-CI) with perfect CSIT and the Greedy Beam Selection strategy; 2) demonstrates the inherent robustness of the proposed designs in the presence of channel aging; and 3) presents a detailed convergence and scalability analysis of the proposed framework.