Computation offloading via device-to-device communications can improve the performance of mobile edge computing by exploiting the computing resources of user devices. However, most proposed optimization-based computation offloading schemes cannot adapt to dynamic environments because of the time-varying wireless environment, continuous-discrete mixed actions, and the need for coordination among devices. Conventional reinforcement-learning-based approaches are not effective at solving an optimal sequential decision problem with continuous-discrete mixed actions. In this paper, we propose a hierarchical deep reinforcement learning (HDRL) framework to solve the joint computation offloading and resource allocation problem. The proposed HDRL framework has a hierarchical actor-critic architecture with a meta critic, multiple basic critics, and multiple actors. Specifically, a combination of deep Q-network (DQN) and deep deterministic policy gradient (DDPG) is exploited to cope with the continuous-discrete mixed action spaces. Furthermore, to handle the coordination among devices, the meta critic acts as a DQN that outputs the joint discrete action of all devices, and each basic critic acts as the critic part of DDPG to evaluate the output of the corresponding actor. Simulation results show that the proposed HDRL algorithm can significantly reduce the task computation latency compared with baseline offloading schemes.
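The action-selection flow described above can be illustrated with a minimal sketch. It is not the paper's implementation: random linear maps stand in for trained networks, and the device count, the number of offloading choices, the state size, and every function name are hypothetical. It also assumes that the discrete action is an offloading choice per device and that the continuous action is a resource-allocation fraction, which the abstract does not state explicitly.

```python
import itertools
import math
import random

N_DEVICES = 2   # hypothetical number of devices
N_CHOICES = 3   # hypothetical discrete offloading options per device
STATE_DIM = 4   # hypothetical state size

# Every joint discrete action: one offloading choice per device.
JOINT_ACTIONS = list(itertools.product(range(N_CHOICES), repeat=N_DEVICES))


def make_linear(rng, n_in, n_out):
    """Random weight matrix standing in for a trained network (assumption)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def linear(weights, x):
    """Apply the weight matrix to the input vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def meta_critic_select(weights, state, rng, epsilon=0.1):
    """DQN-style step: one Q-value per joint discrete action, epsilon-greedy."""
    q_values = linear(weights, state)
    if rng.random() < epsilon:
        index = rng.randrange(len(JOINT_ACTIONS))
    else:
        index = max(range(len(q_values)), key=q_values.__getitem__)
    return JOINT_ACTIONS[index]


def actor_act(weights, state, choice):
    """DDPG-style actor: deterministic continuous action squashed into (0, 1)."""
    z = linear(weights, state + [float(choice)])[0]
    return 1.0 / (1.0 + math.exp(-z))


def basic_critic_score(weights, state, choice, allocation):
    """DDPG-style critic: scalar value for the actor's continuous action."""
    return linear(weights, state + [float(choice), allocation])[0]


def select_mixed_action(state, meta_w, actor_ws, critic_ws, rng, epsilon=0.1):
    """Meta critic picks the joint discrete action; each actor then picks a
    continuous action, which its own basic critic evaluates."""
    joint = meta_critic_select(meta_w, state, rng, epsilon)
    allocations = [actor_act(w, state, c) for w, c in zip(actor_ws, joint)]
    scores = [
        basic_critic_score(w, state, c, a)
        for w, c, a in zip(critic_ws, joint, allocations)
    ]
    return joint, allocations, scores


rng = random.Random(0)
meta_w = make_linear(rng, STATE_DIM, len(JOINT_ACTIONS))
actor_ws = [make_linear(rng, STATE_DIM + 1, 1) for _ in range(N_DEVICES)]
critic_ws = [make_linear(rng, STATE_DIM + 2, 1) for _ in range(N_DEVICES)]
state = [rng.random() for _ in range(STATE_DIM)]

joint, allocations, scores = select_mixed_action(
    state, meta_w, actor_ws, critic_ws, rng, epsilon=0.0
)
print(joint, allocations, scores)
```

The sketch covers only the forward pass that produces a mixed action. Training details such as replay buffers, target networks, and gradient updates are omitted because the abstract does not describe them. Note that the meta critic's output grows as `N_CHOICES ** N_DEVICES`, so enumerating joint actions this way is practical only for small illustrative settings.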