The high level of autonomy and intelligence envisioned in sixth generation (6G) networks necessitates the development of learning-aided solutions, especially in cases where conventional channel state information (CSI)-based network processes introduce high signaling overheads. Moreover, in wireless topologies characterized by fast-varying channels, timely and accurate CSI acquisition might not be possible, and only statistical CSI at the transmitter (CSIT) is available. This work focuses on the appropriate selection of the relaying mode in a cooperative network comprising a single information source, one buffer-aided (BA) relay with full-duplex (FD) capabilities, and a single destination. Here, prior to each transmission, the relay must select whether to operate in FD mode with power control or to resort to half-duplex (HD) relaying when excessive self-interference (SI) arises. To select the best relaying mode, we propose an FD/HD mode selection mechanism, namely multi-armed bandit-aided mode selection (MABAMS), which relies on reinforcement learning and on the processing of acknowledgement/negative-acknowledgement (ACK/NACK) packets to acquire useful information on the channel statistics. As a result, MABAMS does not require continuous CSI acquisition and exchange, and it nullifies the negative effect of outdated CSI. The proposed algorithm's average throughput performance is evaluated, highlighting a performance-complexity trade-off over alternative solutions based on pilot-based channel estimation, which incur spectral and energy costs to obtain instantaneous CSI.

INDEX TERMS 6G, full-duplex, buffer-aided relays, multi-armed bandits (MAB), relay mode selection, reinforcement learning.
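At a high level, the described mechanism can be viewed as a stochastic bandit whose arms are the candidate relaying modes and whose reward is derived from ACK/NACK feedback rather than instantaneous CSI. The sketch below illustrates this idea with a standard UCB1 policy over two arms; the arm set {FD, HD}, the Bernoulli reward model (1 for an ACK, 0 for a NACK), and the simulated success probabilities are illustrative assumptions and not the exact MABAMS formulation presented in the paper.

```python
# Minimal sketch: ACK/NACK-driven FD/HD selection with a two-armed UCB1 bandit.
# The arm set, reward model, and ack_prob values below are illustrative assumptions.
import math
import random

ARMS = ["FD", "HD"]                      # candidate relaying modes
counts = {a: 0 for a in ARMS}            # times each mode has been selected
values = {a: 0.0 for a in ARMS}          # empirical mean ACK rate per mode

def select_mode(t: int) -> str:
    """Pick the relaying mode for slot t using the UCB1 index."""
    for a in ARMS:                       # play each arm once before using the index
        if counts[a] == 0:
            return a
    return max(ARMS, key=lambda a: values[a] + math.sqrt(2 * math.log(t) / counts[a]))

def update(mode: str, ack: bool) -> None:
    """Update the empirical ACK rate of the chosen mode from the feedback packet."""
    counts[mode] += 1
    values[mode] += (float(ack) - values[mode]) / counts[mode]

# Hypothetical per-mode ACK probabilities standing in for the unknown channel statistics.
ack_prob = {"FD": 0.7, "HD": 0.55}

for t in range(1, 1001):
    mode = select_mode(t)
    ack = random.random() < ack_prob[mode]   # stand-in for receiving an ACK/NACK
    update(mode, ack)

print(counts, {a: round(values[a], 3) for a in ARMS})
```

Under these assumptions, the selection counts concentrate on the mode with the higher long-run ACK rate, which mirrors how learning from ACK/NACK feedback avoids continuous CSI acquisition.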