This paper investigates the problem of joint unmanned aerial vehicle (UAV) trajectory planning and low Earth orbit satellite (LEO-Sat) selection in space-air-ground integrated networks (SAGIN). This problem is critical when SAGIN is used for post-disaster relief services, where the ground base stations (GBSs) within the disaster area are completely damaged or malfunctioning. In this scenario, the UAV provides wireless connectivity to the victims, while LEO-Sats relay the UAV data to the nearest surviving GBS. The UAV trajectory should be optimized to maximize the data collected from the victims subject to the UAV's limited battery capacity, and the UAV should also select the best LEO-Sat at each visited location along that trajectory. The selected LEO-Sat should maximize the UAV's achievable data rate while having a long remaining visible time, so as to reduce the frequency of LEO-Sat handovers. An online learning approach based on multi-armed bandit (MAB) models is proposed to address this highly dynamic problem. Because LEO-Sat selection can be performed only after the UAV arrives at a given location on its trajectory, the problem is divided into two MAB stages. In the first stage, battery-constrained UAV trajectory optimization is modeled as a budget-constrained MAB (BC-MAB) game and solved using the BC-upper confidence bound (BC-UCB) algorithm. In the second stage, LEO-Sat selection at each visited location is modeled as a contextual MAB with variable arms (CMAB-VA) game and solved using the LinUCB-VA algorithm. Numerical analysis confirms the superior performance of the proposed approach over candidate benchmarks.

INDEX TERMS Unmanned aerial vehicles (UAV), low Earth orbit (LEO) satellite, contextual bandit, space-air-ground integrated network (SAGIN)
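To illustrate the second-stage idea of a contextual bandit whose arm set changes between rounds, the following is a minimal sketch and not the LinUCB-VA algorithm as specified in this paper. It assumes a generic LinUCB variant with one linear model shared across all satellites, so that satellites entering or leaving visibility need no per-arm state. The class name `SharedLinUCB`, the exploration weight `alpha`, and the two-element context (a normalized rate estimate and a normalized remaining visible time) are hypothetical choices made only for illustration.

```python
import numpy as np


class SharedLinUCB:
    """Generic LinUCB with a single shared linear model (illustrative assumption)."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha          # exploration weight (hypothetical default)
        self.A = np.eye(dim)        # regularized Gram matrix of observed contexts
        self.b = np.zeros(dim)      # reward-weighted sum of observed contexts

    def select(self, contexts):
        """Pick one arm from `contexts`, a dict of currently visible arm id -> features."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b      # ridge-regression estimate of the reward weights
        best_id, best_score = None, -np.inf
        for arm_id, x in contexts.items():
            x = np.asarray(x, dtype=float)
            # Estimated reward plus an upper-confidence exploration bonus.
            score = theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
            if score > best_score:
                best_id, best_score = arm_id, score
        return best_id

    def update(self, x, reward):
        """Fold the observed reward for the chosen arm's context into the model."""
        x = np.asarray(x, dtype=float)
        self.A += np.outer(x, x)
        self.b += reward * x


# Usage: the set of visible satellites differs from round to round.
agent = SharedLinUCB(dim=2, alpha=1.0)
visible = {"sat_1": [0.8, 0.2], "sat_2": [0.5, 0.9]}  # hypothetical [rate, visible time]
choice = agent.select(visible)
agent.update(visible[choice], reward=0.7)             # hypothetical observed reward
```

Sharing one parameter vector across arms is one simple way to handle a variable arm set; a per-arm model would need to create and discard state as satellites rise and set.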