In this paper, we study a wireless powered cognitive Internet of Things (IoT) network, in which cognitive radio (CR) and non-orthogonal multiple access (NOMA) technologies are exploited to improve spectral efficiency, and radio frequency based energy harvesting (RF-EH) technology is integrated to achieve a sustainable IoT network. To ensure the freshness of information delivery, we adopt the age of information (AoI) as the performance metric and formulate a long-term average AoI minimization problem under an energy sustainability constraint, in which the working mode and transmit power of the secondary devices (SDs) are jointly optimized. We then reformulate the problem as a decentralized Markov decision process (Dec-MDP) with a continuous action space. Accordingly, a deep reinforcement learning (DRL) framework is exploited, and a multi-agent twin delayed deep deterministic policy gradient algorithm with a dual action selection mechanism (MATD3-DAS) is proposed, which adopts the centralized training and decentralized execution (CTDE) framework and exploits both the actor and critic networks to select actions, thereby improving exploration ability. Simulation results show that the proposed algorithm can significantly reduce the long-term average AoI, with reductions approaching 9.58% and 52.34% relative to the MATD3 algorithm and the TD3-DAS algorithm with centralized training and centralized execution (CTCE), respectively.
INDEX TERMS Cognitive radio, non-orthogonal multiple access, radio frequency based energy harvesting, age of information, multi-agent deep reinforcement learning.