To tackle the heterogeneous requirements of beyond 5G (B5G) and future 6G wireless networks, conventional medium access control (MAC) procedures need to evolve so that base stations (BSs) and user equipments (UEs) can automatically learn innovative MAC protocols catering to extremely diverse services. This topic has received significant attention, and several reinforcement learning (RL) algorithms, in which BSs and UEs are cast as agents, have been proposed to learn a communication policy from the agents' local observations. However, current approaches are typically overfitted to the environment they are trained in and fail to generalize to unseen conditions. To overcome this problem, instead of learning a policy in the high-dimensional and redundant observation space, we leverage the concept of observation abstraction (OA), which consists in extracting only the useful information from the environment. This, in turn, allows learning communication protocols that are more robust and generalize far better than current baselines. To learn the abstracted information from observations, we propose an architecture based on an autoencoder (AE) and embed it into a multi-agent proximal policy optimization (MAPPO) framework. Simulation results corroborate the effectiveness of leveraging abstraction when learning protocols that generalize across environments in terms of the number of UEs, the number of data packets to transmit, and the channel conditions.
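To make the observation-abstraction idea concrete, the following is a minimal, illustrative sketch (not the paper's actual architecture): a linear autoencoder is trained to compress a redundant, high-dimensional observation vector into a low-dimensional latent code, which would then be fed to each agent's policy network instead of the raw observation. All dimensions, data, and variable names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: raw observations are high-dimensional and
# redundant; the abstraction compresses them to a small latent code.
OBS_DIM, LATENT_DIM, N = 32, 4, 256

# Synthetic redundant observations: a few underlying factors, linearly
# mixed into many correlated observation components plus small noise.
factors = rng.normal(size=(N, LATENT_DIM))
mix = rng.normal(size=(LATENT_DIM, OBS_DIM)) / np.sqrt(LATENT_DIM)
obs = factors @ mix + 0.01 * rng.normal(size=(N, OBS_DIM))

# Linear autoencoder: encoder W_e and decoder W_d, trained by gradient
# descent on the mean squared reconstruction error.
W_e = 0.1 * rng.normal(size=(OBS_DIM, LATENT_DIM))
W_d = 0.1 * rng.normal(size=(LATENT_DIM, OBS_DIM))

def recon_loss(W_e, W_d):
    z = obs @ W_e            # abstracted observation (latent code)
    recon = z @ W_d          # reconstruction of the raw observation
    return float(np.mean((obs - recon) ** 2))

lr = 0.01
losses = [recon_loss(W_e, W_d)]
for _ in range(500):
    z = obs @ W_e
    err = z @ W_d - obs                      # reconstruction error
    grad_W_d = z.T @ err / N                 # gradient w.r.t. decoder
    grad_W_e = obs.T @ (err @ W_d.T) / N     # gradient w.r.t. encoder
    W_d -= lr * grad_W_d
    W_e -= lr * grad_W_e
    losses.append(recon_loss(W_e, W_d))

# In an OA-based setup, the policy would now act on the compact code
# z = obs @ W_e rather than on the 32-dimensional raw observation.
latent = obs @ W_e
```

In the actual framework described above, the encoder would be a nonlinear network trained jointly with (or alongside) the MAPPO agents, but the principle is the same: the policy consumes the compact abstracted code rather than the raw, environment-specific observation.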