We consider non-orthogonal multiple access (NOMA) in a random-access ALOHA system, in which each user randomly accesses one of several time slots and sends uplink packets that are distinguished by power differences. Framing the problem as an asymmetric game, we propose a NOMA-ALOHA system based on multi-agent reinforcement learning tools that help each user find its best strategy for improving the rate of successful action choices. Taking into account not only collisions but also fading, we analyze the mean rewards of actions under general settings and focus on the case involving two different groups of users. To characterize the behavior of the access strategies, we apply multi-agent action-value methods that take either greedy or non-greedy actions, combined with an accelerated gradient descent. Our results show that, in the proposed system, users employing the greedy action-based methods can be randomly divided into two groups and increase their rates of successful action choices. Interestingly, when channels are relatively limited, such greedy methods leave many users in a barring-access state. In this case, the proposed accelerated non-greedy action methods are shown to reduce this unfairness at the cost of a lower successful action rate.
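To make the greedy versus non-greedy action-value idea concrete, the following is a minimal Python sketch under assumptions that are ours, not the paper's. Each user independently chooses between refraining (barring access) and transmitting in one of `n_slots` slots at one of two hypothetical power levels (`powers=(4.0, 1.0)`). Rayleigh fading is modeled as an exponential power gain, and the receiver applies successive interference cancellation (SIC) with an SINR threshold `gamma`. The reward is +1 for a decoded packet, `-fail_cost` for a failed one, and 0 for refraining. Values are updated by incremental sample averages; `epsilon = 0` gives purely greedy learners, and `epsilon > 0` stands in for a non-greedy (epsilon-greedy) rule. The paper's accelerated gradient step is not reproduced. The names `sic_decode` and `simulate` and every parameter value are illustrative only.

```python
import numpy as np


def sic_decode(rx_powers, noise, gamma):
    """Successive interference cancellation in one slot (hypothetical model).

    Decodes packets strongest-first; a packet is decoded if its SINR against
    the weaker, not-yet-cancelled packets reaches gamma. Decoding stops at the
    first failure. Returns a boolean mask aligned with rx_powers.
    """
    decoded = np.zeros(len(rx_powers), dtype=bool)
    remaining = rx_powers.sum()
    for i in np.argsort(rx_powers)[::-1]:
        remaining -= rx_powers[i]  # interference left after removing packet i
        if rx_powers[i] / (noise + remaining) >= gamma:
            decoded[i] = True
        else:
            break
    return decoded


def simulate(n_users=6, n_slots=2, powers=(4.0, 1.0), noise=0.1, gamma=1.0,
             fail_cost=1.0, epsilon=0.0, steps=5000, seed=0):
    """Independent action-value learners in a hypothetical NOMA-ALOHA model.

    Action 0 means refraining (barring access); action 1 + slot * L + level
    means transmitting in `slot` at power level `level` (L = len(powers)).
    epsilon = 0 gives purely greedy learners; epsilon > 0 is epsilon-greedy.
    Returns (successes per user per step, fraction of users whose final
    greedy action is barring).
    """
    rng = np.random.default_rng(seed)
    n_levels = len(powers)
    n_actions = 1 + n_slots * n_levels
    q = np.zeros((n_users, n_actions))       # action-value estimates
    counts = np.zeros((n_users, n_actions))  # visit counts for sample averages
    users_idx = np.arange(n_users)
    successes = 0
    for _ in range(steps):
        # Greedy choice with random tie-breaking, so the zero-initialized Q
        # does not pin every user to action 0 forever
        greedy = np.array([rng.choice(np.flatnonzero(row == row.max())) for row in q])
        explore = rng.random(n_users) < epsilon
        actions = np.where(explore, rng.integers(n_actions, size=n_users), greedy)

        rewards = np.zeros(n_users)  # refraining earns 0
        for slot in range(n_slots):
            users = [u for u in range(n_users)
                     if actions[u] != 0 and (actions[u] - 1) // n_levels == slot]
            if not users:
                continue
            levels = [(actions[u] - 1) % n_levels for u in users]
            gains = rng.exponential(1.0, size=len(users))  # Rayleigh fading power gains
            rx = np.array([powers[lv] for lv in levels]) * gains
            for u, ok in zip(users, sic_decode(rx, noise, gamma)):
                rewards[u] = 1.0 if ok else -fail_cost

        # Incremental sample-average update of each user's chosen action value
        counts[users_idx, actions] += 1
        q[users_idx, actions] += (rewards - q[users_idx, actions]) / counts[users_idx, actions]
        successes += int(np.count_nonzero(rewards > 0))

    barred = float(np.mean(q.argmax(axis=1) == 0))  # argmax ties resolve to action 0
    return successes / (steps * n_users), barred
```

Calling `simulate(epsilon=0.0)` and `simulate(epsilon=0.1)` with the same seed, and then lowering `n_slots` to mimic relatively limited channels, lets one compare the success rate and the barred fraction of greedy and epsilon-greedy learners in this toy model. Random tie-breaking matters for the greedy case: without it, every user would start at action 0 and never try transmitting.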