Integrating space and air components, such as satellites and unmanned aerial vehicles (UAVs), with terrestrial networks to form a space-terrestrial integrated network (STIN) has been envisioned as a promising way to enhance terrestrial networks in terms of fairness, performance, and resilience. However, employing UAVs introduces several key challenges, among which backhaul connectivity, resource management, and efficient three-dimensional (3D) trajectory design are crucial. In this paper, low-Earth orbit (LEO) satellites are employed to alleviate the backhaul connectivity issues of UAV networks. We address the problem of jointly determining the backhaul-aware 3D trajectories of UAVs, resource management, and the associations between users, satellites, and base stations (BSs) in an STIN, while satisfying ground users' quality-of-experience requirements and ensuring fairness in users' data rates. The proposed approach maximizes a novel objective function that jointly accounts for BS load and fairness; the resulting optimization problem is non-deterministic polynomial-time hard (NP-hard). To tackle it, we leverage a reinforcement learning framework in which the problem is modeled as a multi-armed bandit problem. Accordingly, BSs learn the environment and its dynamics and then make decisions using an upper confidence bound (UCB) based method. Simulation results show that the proposed approach outperforms the benchmark methods in terms of fairness, throughput, and load.
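To make the bandit formulation concrete, the following is a minimal sketch of the standard UCB1 rule, in which a learner repeatedly picks the arm with the highest empirical mean reward plus an exploration bonus. It is only an illustration under stated assumptions: the abstract does not specify the arms, the reward definition, or the exact upper confidence bound variant used, so the class `UCB1Agent`, the function `run_demo`, and the Bernoulli reward probabilities are all hypothetical stand-ins rather than the method of the paper.

```python
import math
import random


class UCB1Agent:
    """Standard UCB1 learner; stands in for a single BS (an assumption)."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms   # number of times each arm was played
        self.means = [0.0] * n_arms  # running mean reward of each arm
        self.t = 0                   # total number of plays so far

    def select_arm(self):
        # Play every arm once before using the confidence bound.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        # UCB1 index: empirical mean plus an exploration bonus that
        # shrinks as an arm is played more often.
        scores = [
            m + math.sqrt(2.0 * math.log(self.t) / n)
            for m, n in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        self.t += 1
        self.counts[arm] += 1
        # Incremental update of the running mean.
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def run_demo(steps=5000, seed=0):
    # Hypothetical Bernoulli success probabilities, one per arm; in the
    # paper's setting a reward would instead reflect load and fairness.
    probs = [0.2, 0.5, 0.8]
    rng = random.Random(seed)
    agent = UCB1Agent(len(probs))
    for _ in range(steps):
        arm = agent.select_arm()
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        agent.update(arm, reward)
    return agent
```

Calling `run_demo()` and inspecting `agent.counts` should show the arm with the highest success probability being played most often, while the other arms are still sampled occasionally because of the exploration bonus.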