Artificial intelligence is becoming increasingly important in the air combat domain. Most current air combat research assumes that all aircraft information is fully known. In practical applications, however, some aircraft information, such as position, attitude, and velocity, can be inaccurate or impossible to obtain because of realistic limitations and sensor errors. In this paper, we propose a deep reinforcement learning-based framework for developing a model capable of performing within-visual-range (WVR) air-to-air combat under the conditions of a partially observable Markov decision process (POMDP) with insufficient information. To deal robustly with this situation, we use recurrent neural networks and apply the soft actor-critic (SAC) algorithm, enabling the agent to cope effectively with realistic limitations and sensor errors. Additionally, to improve the efficiency and effectiveness of learning, we apply curriculum learning to restrict the scope of exploration in the state space. Finally, simulations and experiments show that the proposed techniques handle the practical problems caused by sensor limitations and errors in a noisy environment while also reducing the training time required for learning.

INDEX TERMS Air-to-air combat, limitation and error of sensors, recurrent neural network, reinforcement learning, soft actor-critic.
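As a minimal sketch of the combination the abstract describes (a recurrent network that summarizes a history of noisy, incomplete observations before the SAC actor head), consider the following PyTorch example. The class name, dimensions, and hyperparameters are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class RecurrentGaussianActor(nn.Module):
    """GRU-based stochastic policy for SAC under partial observability.

    The GRU compresses the history of noisy or missing observations into a
    hidden state, which stands in for the unobservable true aircraft state.
    """

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 128):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden, batch_first=True)
        self.mu = nn.Linear(hidden, act_dim)       # mean of Gaussian policy
        self.log_std = nn.Linear(hidden, act_dim)  # log std of Gaussian policy

    def forward(self, obs_seq, h0=None):
        # obs_seq: (batch, time, obs_dim); h0: optional initial hidden state
        out, hn = self.gru(obs_seq, h0)
        mu = self.mu(out)
        log_std = self.log_std(out).clamp(-20, 2)  # common SAC bounds
        dist = torch.distributions.Normal(mu, log_std.exp())
        # Reparameterized sample squashed to [-1, 1] (tanh-Gaussian policy)
        pre_tanh = dist.rsample()
        action = torch.tanh(pre_tanh)
        # Log-probability with the tanh change-of-variables correction
        log_prob = dist.log_prob(pre_tanh) - torch.log1p(-action.pow(2) + 1e-6)
        return action, log_prob.sum(-1), hn

# Example: a batch of 4 episode segments, 32 timesteps, 12-dim noisy observations
actor = RecurrentGaussianActor(obs_dim=12, act_dim=4)
obs = torch.randn(4, 32, 12)
action, log_prob, h = actor(obs)
print(action.shape, log_prob.shape)  # torch.Size([4, 32, 4]) torch.Size([4, 32])
```

Carrying the hidden state `h` across timesteps is what lets the agent act on the observation history rather than on a single corrupted measurement, which is the standard way to apply SAC in a POMDP.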