Combined with wireless power transfer (WPT) technology, mobile edge computing can provide continuous energy supply and computing resources for mobile devices, and improve their battery life and business application scenarios. This paper first designs the mobile edge computing (MEC) model of mobile devices with random mobility and hybrid access point (HAP) with data transmission and energy transmission. On this basis, the selection of target server and the amount of data offloading are taken as the learning objectives, and the task offloading strategy based on multi-agent deep reinforcement learning is constructed. Then combined with MADDPG algorithm and SAC algorithm, the problems of multi-agent environment instability and the difficulty of convergence are solved. The final experimental results show that the improved algorithm based on MADDPG and SAC has good stability and convergence. Compared with other algorithms, it has achieved good results in energy consumption, delay and task failure rate. INDEX TERMS Mobile edge computing, task offloading, wireless power transfer, multi-agent, deep reinforcement learning.
Conducting fraud transactions has become popular among e-commerce sellers to make their products favorable to the platform and buyers, which decreases the utilization efficiency of buyer impressions and jeopardizes the business environment. Fraud detection techniques are necessary but not enough for the platform since it is impossible to recognize all the fraud transactions. In this paper, we focus on improving the platform's impression allocation mechanism to maximize its profit and reduce the sellers' fraudulent behaviors simultaneously. First, we learn a seller behavior model to predict the sellers' fraudulent behaviors from the real-world data provided by one of the largest ecommerce company in the world. Then, we formulate the platform's impression allocation problem as a continuous Markov Decision Process (MDP) with unbounded action space. In order to make the action executable in practice and facilitate learning, we propose a novel deep reinforcement learning algorithm DDPG-ANP that introduces an action norm penalty to the reward function. Experimental results show that our algorithm significantly outperforms existing baselines in terms of scalability and solution quality.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.