Recently, multi-agent policy gradient (MAPG) methods have witnessed vigorous progress. However, a performance gap remains between MAPG methods and state-of-the-art multi-agent value-based approaches. In this paper, we investigate the causes that hinder the performance of MAPG algorithms and present a multi-agent decomposed policy gradient method (DOP). This method introduces the idea of value function decomposition into the multi-agent actor-critic framework. Based on this idea, DOP supports efficient off-policy learning and addresses the issues of centralized-decentralized mismatch and credit assignment in both discrete and continuous action spaces. We formally show that DOP critics have sufficient representational capability to guarantee convergence. In addition, empirical evaluations on the StarCraft II micromanagement benchmark and multi-agent particle environments demonstrate that our method significantly outperforms state-of-the-art value-based and policy-based multi-agent reinforcement learning algorithms. Demonstrative videos are available at https://sites.google.com/view/dop-mapg/.
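As a rough illustration of value function decomposition in a centralized critic, the sketch below combines per-agent Q-values into one joint value. The abstract does not state the form of the decomposition, so the linear form with non-negative weights and a bias is an assumption made here for illustration, and the name `linear_joint_q` is hypothetical.

```python
from typing import Sequence


def linear_joint_q(
    local_qs: Sequence[float],
    weights: Sequence[float],
    bias: float = 0.0,
) -> float:
    """Combine per-agent Q-values into a joint value.

    Assumed form (not taken from the abstract):
        Q_tot = sum_i w_i * Q_i + bias, with every w_i >= 0.
    """
    if len(local_qs) != len(weights):
        raise ValueError("need exactly one weight per agent")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    # Each agent contributes its own Q-value scaled by its weight.
    return sum(w * q for w, q in zip(weights, local_qs)) + bias


# Hypothetical values for two agents.
joint = linear_joint_q(local_qs=[1.0, 2.0], weights=[0.5, 0.25], bias=0.1)
```

Under this assumed form, the sensitivity of the joint value to agent i's local value is simply its weight w_i, which is one way a decomposed critic can attribute credit to individual agents.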
Background: Poor sleep quality has become a common health problem among college students. Methods: A questionnaire survey comprising the health belief scale (HBS), physical activity rating scale (PARS-3), mobile phone addiction tendency scale (MPATS) and Pittsburgh sleep quality index (PSQI) was completed by 1,019 college students (429 males and 590 females) from five comprehensive colleges and universities from March 2022 to April 2022. The data were analyzed using SPSS and its macro-program PROCESS. Results: (1) Health belief, physical activity, mobile phone addiction and sleep quality are significantly associated with each other (p < 0.01); (2) physical activity plays a mediating role between health belief and sleep quality, and the mediating effect accounts for 14.77%; (3) mobile phone addiction significantly moderates the effect of health belief (β = 0.062, p < 0.05) and of physical activity (β = 0.073, p < 0.05) on sleep quality, and significantly moderates the effect of health belief on physical activity (β = −0.112, p < 0.001). Conclusion: College students' health belief can significantly improve their sleep quality, both directly and indirectly through physical activity; mobile phone addiction significantly moderates the effect of health belief on sleep quality, the effect of health belief on physical activity, and the effect of physical activity on sleep quality.
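For readers unfamiliar with how a mediating share such as the one reported above is typically computed, the sketch below uses the common product-of-coefficients convention: the indirect effect is a × b and the share is the indirect effect divided by the total effect. This convention and the function name `mediation_share` are assumptions for illustration, and the coefficients in the example are hypothetical, not values from the study.

```python
def mediation_share(a: float, b: float, direct: float) -> float:
    """Return the proportion of the total effect that is mediated.

    a      : effect of the predictor on the mediator
    b      : effect of the mediator on the outcome (controlling for the predictor)
    direct : direct effect of the predictor on the outcome (often written c')

    Assumed convention: indirect = a * b, total = direct + indirect.
    """
    indirect = a * b
    total = direct + indirect
    if total == 0:
        raise ValueError("total effect is zero; the share is undefined")
    return indirect / total


# Hypothetical coefficients, NOT taken from the study.
share = mediation_share(a=0.5, b=0.2, direct=0.4)
print(f"mediated share: {share:.2%}")
```

The ratio is only easy to interpret when the direct and indirect effects have the same sign; otherwise the share can fall outside the range 0 to 1.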
We study deep reinforcement learning (RL) algorithms with delayed rewards. In many real-world tasks, instant rewards are often not readily accessible or even defined immediately after the agent performs actions. In this work, we first formally define the environment with delayed rewards and discuss the challenges arising from the non-Markovian nature of such environments. Then, we introduce a general off-policy RL framework with a new Q-function formulation that can handle delayed rewards with theoretical convergence guarantees. For practical tasks with high-dimensional state spaces, we further introduce the HC-decomposition rule of the Q-function in our framework, which naturally leads to an approximation scheme that helps boost training efficiency and stability. We finally conduct extensive experiments to demonstrate the superior performance of our algorithms over existing methods and their variants. Preprint. Under review.
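To make the delayed-reward setting concrete, the sketch below shows one simple way rewards might be withheld from the agent: per-step rewards are accumulated and released only at the end of each fixed-length window. The abstract does not give the formal definition, so the fixed window and the name `delay_rewards` are assumptions for illustration only.

```python
from typing import List, Sequence


def delay_rewards(step_rewards: Sequence[float], window: int) -> List[float]:
    """Return the rewards the agent observes when feedback is delayed.

    Rewards are summed over consecutive windows of `window` steps and
    emitted at the last step of each window (or at the end of the episode);
    every other step yields 0.0.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    observed: List[float] = []
    pending = 0.0
    for t, reward in enumerate(step_rewards, start=1):
        pending += reward
        if t % window == 0 or t == len(step_rewards):
            observed.append(pending)  # release the accumulated reward
            pending = 0.0
        else:
            observed.append(0.0)  # reward withheld at this step
    return observed


# Hypothetical five-step episode with a window of two steps.
observed = delay_rewards([1.0, 2.0, 3.0, 4.0, 5.0], window=2)
```

The total return is unchanged, but the reward observed at a given step now depends on earlier steps in the window rather than on the current state and action alone, which is the source of the non-Markovian behavior mentioned above.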