This paper investigates various multi-agent reinforcement learning (MARL) techniques for designing grant-free random access (RA) schemes suitable for low-complexity, low-power, battery-operated devices in massive machine-type communication (mMTC). Previous studies on RA with MARL have shown limitations in terms of scalability and suitability for mMTC. To improve the scalability and practicality of the proposed methods, we examine the impact of excluding agent identification from the observation vector of each agent on network performance. We employ the value decomposition networks (VDN) and QMIX algorithms with parameter sharing (PS) and compare their policies with the deep recurrent Q-network (DRQN). Our simulation results demonstrate that the MARL-based RA schemes can achieve a better throughput-fairness trade-off between agents without having to condition on the agent identifiers. We also present a correlated traffic model, which is more descriptive of mMTC scenarios, and show that the proposed algorithm can easily adapt to traffic non-stationarities. Moreover, the robustness of the proposed method in terms of scalability is demonstrated through simulations.

INDEX TERMS Massive machine-type communications, MARL, reinforcement learning, grant-free random access, scalability.

This article has been accepted for publication in IEEE Transactions on Machine Learning in Communications and Networking. This is the author's version, which has not been fully edited; content may change prior to final publication.