As urban air mobility (UAM) evolves rapidly, the growing demand for public airborne transit and deliveries will not only create a large market but also raise a series of technical, operational, and safety problems. This paper addresses strategic conflict management in low-altitude UAM operations using multi-agent reinforcement learning (MARL). To account for differences in flight characteristics, aircraft performance is fully integrated into the design of the strategic deconfliction components. Under this concept, a multi-resolution structure for low-altitude airspace organization, a Gaussian Mixture Model (GMM) for speed profile generation, and dynamic separation minima together enable efficient UAM operations. To resolve the demand and capacity balancing (DCB) issue and separation conflicts at the strategic stage, a multi-agent asynchronous advantage actor-critic (MAA3C) framework is built with masked recurrent neural networks (RNNs). The framework handles a variable number of agents, dynamic environments, heterogeneous aircraft performance, and action selection between speed adjustment and ground delay. Experiments conducted on a developed prototype across various scenarios indicate clear advantages of the constructed MAA3C in minimizing delay cost and refining speed profiles, and demonstrate the effectiveness, scalability, and stability of the MARL solution.