Deep reinforcement learning (DRL) methods have emerged as a feasible approach to the power resource allocation problem in ultra-dense small-cell networks (UDSCNs). In this paper, we propose a novel actor-critic-based low-coupling policy optimization (LCPO) framework. To make the framework practical, its training and execution modules are designed with low coupling. By adopting policy optimization methods, namely advantage actor-critic (A2C) and proximal policy optimization (PPO) with the state-dependent exploration (SDE) technique, LCPO demonstrates stable performance. We define the power resource allocation problem in UDSCNs and present the mathematical formulation of the algorithm employed in the LCPO framework. We compare LCPO with other algorithms, namely deep deterministic policy gradient (DDPG) and fractional programming (FP). Extensive simulations show that LCPO outperforms the DDPG and FP algorithms in both performance and execution time. Furthermore, to provide an up-to-date overview of the state of the art, we incorporate recent research papers in the field, which gives readers insight into the latest advances in power resource allocation in UDSCNs. These results highlight the effectiveness of the LCPO framework and make it a promising solution for optimizing power allocation in UDSCNs.

INDEX TERMS Actor-critic, power allocation, policy optimization, small-cell networks.