“…In summary, multi-agent learning, particularly through advanced algorithms like LAMAPPO, CL-MADDPG, and MADDPG, offers a pathway to effectively manage the resource allocation challenges in SAGIN. These methods allow for decentralized decision-making, where each agent learns to cooperate with others, leading to improved overall network

Propose RL-BA-VNA, a reinforcement learning-based algorithm for virtual network resource allocation that optimizes node embedding and prioritizes high-bandwidth virtual network requests
[111] | Maximize the energy efficiency of SAG IoT Networks | A two-tier SAG IoT HetNets consisting of macrocell base stations and aerial base stations | Energy efficiency, total system data rates, total system power consumption | Propose a cluster-based HetNets energy-efficient resource allocation (CHERA) mechanism that divides the network into independent base station (BS) clusters for distributed EE optimization
[112] | Maximum safety | SAGIN for HSRs, LEO, GEO, terrestrial base stations | Data rate, bit error rate, end-to-end latency | Establish a dual C-plane connection, introduce gain and handover factors for safety service prioritization, and implement a Q-learning algorithm considering train and satellite movements for resource allocation
[12] | Minimize the task completion time and satellite resource usage | Vehicles in remote areas, edge computing-enabled LEO, MEO, GEO, ground cloud servers | Task completion time, satellite resource usage, system reward performance, accuracy of offloading and caching actions | Present a preclassification scheme to reduce the action space and propose a deep imitation learning (DIL)-driven offloading and caching algorithm for real-time decision-making
[29] | Maximum end-to-end QoE | Utilize DRL to optimize the multi-domain VNE algorithm for SAGIN, incorporating network modeling, attribute setting, policy network implementation, and evaluating algorithm efficiency through simulations
[59] | Improve global data transmission capacity | LEO, UAVs, SDN architecture | Transmission capacity of SAGIN, network load balancing | Predict the transmission capacity of links in SAGIN, formulate the traffic scheduling problem as a modified maximum flow problem, and employ a DRL model to make global optimal traffic scheduling decisions
[60] | Maximize the sum log spectral efficiency | Multibeam GEO Satcom system | Sum log spectral efficiency | Implement an enhanced DRL algorithm based on TD3 with independent training, prioritized experience replay, scaling factor, and noise rebound to address the bound action problem, and demonstrate its superior performance through simulations compared to baseline schemes
[114] | Proactive SDN-based resource management considering user priorities, latency, SLA, and budget constraints in SAGIN
[61] | Minimize energy consumption | Aircraft networks, LEO, HAPs, SDN, NFV, and MEC technologies | Service request reception rate, end-to-end delay, and average energy consumption | Implemented a hierarchical SAGIN-MEC structure and developed a DRL-G algorithm combining heuristics with deep reinforcement learning for resource allocation
[116] | Minimize latency | Satellites, UAVs, terrestrial networks, eMBB and URLLC slices | Network availability, latency experienced by eMBB and URLLC traffic, service costs | Implement a deep reinforcement learning (DRL) approach using the deep deterministic policy gradient (DDPG) algorithm for efficient resource allocation and UAV traje...…”
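
One of the entries above formulates traffic scheduling as a modified maximum flow problem. As a rough illustration of the underlying idea only, the sketch below computes a plain, unmodified maximum flow with the Edmonds-Karp algorithm on a toy ground/UAV/LEO topology. The node names, link capacities, and the `max_flow` and `toy_links` identifiers are hypothetical and chosen for illustration; the modification to the max-flow problem, the link-capacity prediction, and the DRL scheduler of the cited work are not reproduced here.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Plain Edmonds-Karp maximum flow (not the cited paper's modified variant).

    capacity: dict mapping node -> {neighbor: link capacity}.
    Returns the maximum total flow from source to sink.
    """
    if source == sink:
        raise ValueError("source and sink must differ")

    # Build a residual graph, adding zero-capacity reverse edges.
    residual = {}
    for u, neighbors in capacity.items():
        for v, cap in neighbors.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + cap
            residual[v].setdefault(u, 0)

    total = 0
    while True:
        # Breadth-first search for the shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no augmenting path left

        # Find the bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            u = parent[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u

        # Push the bottleneck flow along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


# Hypothetical link capacities (arbitrary units) for a toy three-layer network.
toy_links = {
    "ground": {"uav1": 10, "uav2": 5},
    "uav1": {"leo1": 4, "uav2": 6},
    "uav2": {"leo2": 8},
    "leo1": {"dest": 10},
    "leo2": {"dest": 7},
}

print(max_flow(toy_links, "ground", "dest"))  # prints 11
```

In this toy case the result is limited by the `uav1`-to-`leo1` link (4) and the `leo2`-to-`dest` link (7), which together form the minimum cut. In a learning-based scheduler, predicted link capacities would presumably take the place of the fixed numbers used here.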