In this paper, we consider a joint detection, mapping, and navigation problem for an unmanned aerial vehicle (UAV) with real-time learning capabilities. We formulate this problem as a Markov decision process (MDP), in which the UAV is equipped with a THz radar capable of electronically scanning the environment with high accuracy and inferring a probabilistic occupancy map of it. The navigation task consists of maximizing the mapping accuracy and coverage while deciding whether targets (e.g., people carrying radio devices) are present. Through numerical results, we analyze the robustness of the considered Q-learning algorithm and discuss practical applications.