This letter investigates the resource allocation problem for Machine-to-Machine (M2M) communications served by multiple Unmanned Aerial Vehicles (UAVs). Our goal is to maximize the sum-rate of UAV-served M2M communications by jointly optimizing the transmission power, transmission mode, frequency spectrum, relay selection, and UAV trajectories. To model the uncertainty of stochastic environments, we formulate the resource allocation problem as a Markov game, a generalization of the Markov Decision Process (MDP) to multiple agents. However, because UAV mobility makes the environment difficult to perceive, we propose a framework that combines Long Short-Term Memory (LSTM) with Generative Adversarial Networks (GANs) to better track and forecast UAV mobility and to improve the network reward. Numerical results demonstrate that the proposed framework outperforms the conventional LSTM and Deep Q-Network (DQN) algorithms.