Full-duplex relaying is an enabling technique for sixth generation (6G) mobile networks, promising substantial rate and spectral efficiency gains. To improve the performance of full-duplex communications, power control is a viable way of avoiding excessive loop interference at the relay. Unfortunately, power control requires channel state information for the source-relay, relay-destination, and loop interference channels, resulting in increased overhead. Aiming to offer a low-complexity alternative for power control in such networks, we adopt reward-based learning in the sense of multi-armed bandits. More specifically, we present bandit-based power control, relying on acknowledgement/negative-acknowledgement (ACK/NACK) observations at the relay. Our distributed algorithms avoid channel state information acquisition and exchange, and can alleviate the impact of outdated channel state information. Two cases are examined regarding the channel statistics of the wireless network, namely strict-sense stationary and non-stationary channels. For the latter, a sliding-window approach is adopted to further improve performance. Performance evaluation highlights a performance-complexity trade-off relative to optimal power control with full channel knowledge, as well as significant gains over schemes that incur channel estimation and feedback overheads, rely on outdated channel knowledge, apply no power control, or select power levels at random. Finally, it is shown that the sliding-window bandit-based algorithm provides improved performance in non-stationary settings by efficiently adapting to abrupt changes in the wireless channels.
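The bandit-based power control described above can be illustrated as a sliding-window upper-confidence-bound (UCB) policy over a discrete set of transmit power levels, with the relay's binary ACK/NACK feedback serving as the reward. This is a minimal sketch under stated assumptions, not the paper's exact algorithm: the power levels, ACK probabilities, window length, and exploration constant below are all illustrative values.

```python
import math
import random
from collections import deque


class SlidingWindowUCB:
    """UCB over discrete power levels using only the last `window` ACK/NACK
    observations; window=None recovers plain UCB for stationary channels."""

    def __init__(self, n_arms, window=200, c=1.0):
        self.n_arms = n_arms
        self.c = c  # exploration constant (assumed value)
        # deque with maxlen discards the oldest observation automatically
        self.history = deque(maxlen=window)

    def select(self):
        counts = [0] * self.n_arms
        sums = [0.0] * self.n_arms
        for arm, reward in self.history:
            counts[arm] += 1
            sums[arm] += reward
        # Play every power level at least once before applying UCB
        for arm in range(self.n_arms):
            if counts[arm] == 0:
                return arm
        horizon = len(self.history)
        return max(
            range(self.n_arms),
            key=lambda a: sums[a] / counts[a]
            + self.c * math.sqrt(math.log(horizon) / counts[a]),
        )

    def update(self, arm, ack):
        # Reward is 1 for an ACK, 0 for a NACK; no channel state information used
        self.history.append((arm, 1.0 if ack else 0.0))


# Illustrative simulation (assumed values): the ACK probability of each power
# level changes abruptly halfway through, mimicking a non-stationary channel.
random.seed(1)
bandit = SlidingWindowUCB(n_arms=3, window=200)
ack_prob = [0.3, 0.8, 0.5]  # hypothetical ACK rates per power level
for t in range(2000):
    if t == 1000:  # abrupt channel change
        ack_prob = [0.9, 0.2, 0.4]
    level = bandit.select()
    bandit.update(level, random.random() < ack_prob[level])
```

The sliding window bounds how much stale feedback influences the empirical ACK-rate estimates, which is what lets the policy re-adapt after an abrupt channel change; with `window=None`, old observations never expire and re-adaptation is slower.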