The dense deployment of sixth-generation (6G) base stations is expected to greatly enhance network service capabilities, offering significantly higher throughput and lower latency than previous generations. However, this advancement comes with a notable increase in the number of network elements and, consequently, in power consumption, which not only increases carbon emissions but also significantly raises operational costs for network operators. To address the challenges arising from this surge in network energy consumption, there is growing interest in energy-saving technologies designed for 6G networks. These technologies include strategies for dynamically adjusting the operational status of base stations, such as activating sleep modes during periods of low demand, to optimize energy use while maintaining network performance. Furthermore, integrating artificial intelligence into the network's operational framework is being explored as a path toward a more energy-efficient, sustainable, and cost-effective 6G network. In this paper, we propose a small base station sleeping control scheme for heterogeneous dense small cell networks based on federated reinforcement learning, which enables small base stations to dynamically enter appropriate sleep modes, reducing power consumption while satisfying users' quality-of-service (QoS) requirements. In our scheme, double deep Q-learning is used to solve the complex non-convex base station sleeping control problem. To cope with the dynamic changes in QoS requirements caused by user mobility, the small base stations share their local models with the macro base station, which acts as the central control unit, via the X2 interface. The macro base station aggregates the local models into a global model and then distributes the global model to each small base station for the next round of training. By alternately performing model training, aggregation, and updating, each base station in the network can dynamically adapt to the changes in QoS requirements brought about by user mobility. Simulations show that, compared with methods based on distributed deep Q-learning, the proposed scheme effectively reduces the performance fluctuations caused by user handover and achieves lower network energy consumption while guaranteeing users' QoS requirements.
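
To make the federated training loop concrete, the following is a minimal sketch, not the paper's implementation, of the aggregation and distribution steps performed at the macro base station. It assumes FedAvg-style parameter averaging of the small base stations' local double-DQN Q-networks; the network architecture, state/action dimensions, class and function names (QNetwork, aggregate_local_models, broadcast_global_model), and weighting scheme are all illustrative placeholders rather than details taken from the paper.

```python
# Minimal sketch of macro-BS-side federated aggregation (assumptions noted above).
import copy
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Hypothetical local Q-network of a small base station: maps its state
    observation to Q-values over the available sleep modes."""

    def __init__(self, state_dim: int = 8, num_sleep_modes: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, num_sleep_modes),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


def aggregate_local_models(local_models, weights=None):
    """Macro-BS side: average the small base stations' local model parameters
    into a global model (uniform weights unless per-BS weights are supplied)."""
    num = len(local_models)
    if weights is None:
        weights = [1.0 / num] * num
    global_state = copy.deepcopy(local_models[0].state_dict())
    for key in global_state:
        global_state[key] = sum(
            w * m.state_dict()[key].float() for w, m in zip(weights, local_models)
        )
    global_model = copy.deepcopy(local_models[0])
    global_model.load_state_dict(global_state)
    return global_model


def broadcast_global_model(global_model, local_models):
    """Macro-BS side: distribute the aggregated global model back to every
    small base station for the next round of local double-DQN training."""
    for m in local_models:
        m.load_state_dict(copy.deepcopy(global_model.state_dict()))


if __name__ == "__main__":
    # Three small base stations, each holding its own local Q-network.
    small_bs_models = [QNetwork() for _ in range(3)]
    # ... each small BS trains locally with double DQN on its own observations ...
    global_model = aggregate_local_models(small_bs_models)
    broadcast_global_model(global_model, small_bs_models)
```

In this sketch the model exchange over the X2 interface is abstracted away as direct parameter copies; in practice only the model parameters (or parameter deltas) would be transferred between the small base stations and the macro base station each round.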