UAV-assisted communication facilitates efficient data collection from IoT nodes by exploiting UAVs’ flexible deployment and wide coverage capabilities. In this paper, we consider a scenario in which UAVs equipped with high-precision sensors collect sensing data from ground terminals (GTs) in real time over a wide geographic area and transmit the collected data to a ground base station (BS). We aim to jointly optimize the trajectory scheduling and the allocation of collection time slots for multiple UAVs so as to maximize the system’s data collection rate and fairness while minimizing energy consumption within the task deadline. Because UAVs have limited sensing distance and battery energy, ensuring timely data processing in the target areas is challenging. To address this issue, we propose a novel constraint optimization-based deep reinforcement learning–Lagrangian UAV real-time data collection management (CDRLL–RDCM) framework that uses centralized training and distributed execution. In this framework, a CNN–GRU network extracts the spatial and temporal features of the environmental information. We then introduce the PPO–Lagrangian algorithm, which iteratively updates the policy network and the Lagrange multipliers on different time scales, enabling the UAVs to learn more effective collaborative policies for real-time decision-making. Extensive simulations show that the proposed framework significantly improves the efficiency of multi-UAV collaboration and substantially reduces data staleness.
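As a rough illustration of the two-time-scale idea behind a Lagrangian policy update, the following minimal sketch alternates a fast primal step with a slow dual step on a toy scalar problem. It is not the framework's implementation: a plain gradient step on a hypothetical quadratic reward stands in for the PPO policy update, a linear cost with a fixed budget stands in for the energy constraint, and the function name, learning rates, and budget are all assumptions chosen for illustration.

```python
def lagrangian_two_timescale(steps=5000, lr_theta=0.05, lr_lambda=0.005, budget=1.0):
    """Toy primal-dual ascent with a fast primal and a slow dual learning rate.

    Hypothetical problem (not from the paper):
        maximize  reward(theta) = -(theta - 2)^2
        subject to cost(theta) = theta <= budget
    """
    theta, lam = 0.0, 0.0
    for _ in range(steps):
        # Fast time scale: ascend the Lagrangian
        # L(theta, lam) = reward(theta) - lam * (cost(theta) - budget).
        # In a PPO-Lagrangian setting this step would be a PPO policy update.
        grad_reward = -2.0 * (theta - 2.0)
        grad_cost = 1.0
        theta += lr_theta * (grad_reward - lam * grad_cost)

        # Slow time scale: raise the multiplier while the constraint is
        # violated, lower it otherwise, and project back onto lam >= 0.
        lam = max(0.0, lam + lr_lambda * (theta - budget))
    return theta, lam


if __name__ == "__main__":
    theta, lam = lagrangian_two_timescale()
    print(f"theta={theta:.4f}, lambda={lam:.4f}")
```

In this toy setting the unconstrained optimum (theta = 2) violates the budget, so the multiplier grows until the iterate settles on the constraint boundary. Keeping the dual learning rate much smaller than the primal one is what lets the primal variable track a nearly stationary multiplier.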