The booming growth of the Internet of Things has led to the widespread deployment of devices and to massive amounts of sensing data that must be processed. Federated learning (FL)-empowered mobile edge computing, which pushes artificial intelligence to the network edge while preserving data privacy during cooperative learning, is a promising way to unlock the information latent in these data. However, FL’s multi-server collaborative architecture inevitably incurs communication energy consumption between edge servers, which poses a serious challenge for servers with constrained energy budgets, especially battery-powered wireless communication servers. The device-to-device (D2D) communication mode developed for FL replaces high-cost, long-distance server interactions with energy-efficient proximity delivery and multi-level aggregation, effectively alleviating the servers’ energy constraints. A few studies have addressed D2D-enabled FL management, but most of them neglect server computing power in FL operation, and all of them ignore the impact of dataset characteristics on model training; as a result, they fail to fully exploit the data processing capabilities of energy-constrained edge servers. To fill this gap, we propose a D2D-assisted FL mechanism for energy-constrained edge computing that jointly incorporates computing power allocation and dataset correlation into FL scheduling. Because the impact of computing power on model accuracy varies across training stages, we design a partite graph-based FL scheme with adaptive D2D pairing and aperiodic, variable local iterations for heterogeneous edge servers. Moreover, we leverage graph learning to exploit the performance gain offered by dataset correlation between edge servers in the model aggregation process, and we propose a graph-and-deep reinforcement learning-based D2D server pairing algorithm, which effectively reduces FL model error.
Numerical results demonstrate that the proposed FL schemes offer significant advantages in improving FL training accuracy under the edge servers’ energy constraints.