Integrating electronic reconnaissance platforms and payloads across the electromagnetic and spatial domains of cognition is necessary for the successful execution of cognitive electronic reconnaissance missions. However, in partially observable environments, platforms and payloads interact in separate operational domains, which makes their joint behaviour difficult to model. The authors create a theoretical model of cognitive electronic reconnaissance, present a method for analysing the model elements, and derive adequate requirements for optimal radiation source sorting or recognition. They then develop a 3D cognitive reconnaissance simulator, ‘Scouter’, to simulate the dynamic interaction process of space electronic reconnaissance missions, and design a deep reinforcement learning framework to optimise the electronic reconnaissance strategies of an unmanned aerial vehicle (UAV). For the first time, a method for judging reconnaissance area coverage using a behavioural angle is proposed. This method gives a deep reinforcement learning‐based UAV better reconnaissance behaviours and keeps the receiving antenna beam stably pointed at the target during reconnaissance. Moreover, fast convergence of UAV agent training is ensured by normalising the partially observable states and by reward shaping. In numerical simulations, a trained UAV completes a mission of 100 reconnaissance cycles with a completion rate of 91.9%. Even when the mission is prolonged to 600 reconnaissance cycles, the completion rate remains at 78.5%. Most notably, the trained UAV appears to grasp the idea of reconnaissance and behaves in ways typical of it, such as flying back and forth along, or circling, the edge of the radar's detection range.
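
As a minimal illustration of the two training aids named above, state normalisation and reward shaping, the following Python sketch scales raw observations into [-1, 1] and applies a standard potential-based shaping term. It is not taken from Scouter or from the authors' framework: the observation bounds, the distance-based potential, the stand-off distance, and all function names are assumptions made only for this example.

```python
def normalise_observation(obs, low, high):
    """Scale each observation component to [-1, 1] using assumed fixed bounds."""
    return [2.0 * (x - lo) / (hi - lo) - 1.0 for x, lo, hi in zip(obs, low, high)]


def potential(distance, preferred_distance):
    """Hypothetical potential: largest (zero) at the preferred stand-off distance."""
    return -abs(distance - preferred_distance)


def shaped_reward(base_reward, dist_before, dist_after, preferred_distance, gamma=0.99):
    """Potential-based shaping: r + gamma * phi(s') - phi(s).

    The agent gets a bonus for moving towards the preferred distance and a
    penalty for moving away from it, on top of the environment reward.
    """
    phi_before = potential(dist_before, preferred_distance)
    phi_after = potential(dist_after, preferred_distance)
    return base_reward + gamma * phi_after - phi_before


if __name__ == "__main__":
    # Raw observation with assumed bounds of 0..100 per component.
    print(normalise_observation([0.0, 50.0, 100.0], [0.0] * 3, [100.0] * 3))
    # Moving from 30 to 20 units when 20 is the assumed preferred distance.
    print(shaped_reward(0.0, 30.0, 20.0, 20.0, gamma=1.0))
```

Potential-based shaping is used here because it is a well-known form that leaves the optimal policy unchanged; the reward shaping actually used in the paper may differ.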