Shallow underground explosive source localization is a key technology in the field of underground space positioning. Existing approaches focus mainly on improving localization accuracy, but they require deploying large numbers of sensors across the monitoring area and rely on many back-end workstations for the solution. These methods suffer from heavy computation and high time cost, and can hardly satisfy the accuracy and real-time requirements of on-site testing, ultimately resulting in slow localization and failures in accurate localization. Fortunately, emerging deep reinforcement learning (DRL) can effectively overcome the slow policy-search problem by modeling source localization as a Markov decision process (MDP). Therefore, a Curiosity-Driven Deep Dueling Double Q-learning Network (C-D3QN) is proposed to solve this MDP. The overestimation problem is mitigated by decoupling the selection and evaluation of the bootstrap action, and the differences between actions are effectively amplified by introducing a dueling network that separately represents state values and action advantages. Meanwhile, exploration is jointly reinforced by an intrinsic reward output by the curiosity module and an extrinsic reward supplied by the environment, facilitating convergence to the global optimum. Finally, extensive simulations based on field experiment data show that, compared with other algorithms, the proposed scheme significantly improves exploration ability and learning speed as well as generalization and robustness. In particular, compared with the baseline DQN algorithm, C-D3QN improves localization accuracy to as high as 99.62% and localization speed by 66.23%.
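The three mechanisms named above (dueling aggregation, double-Q target decoupling, and a curiosity-based intrinsic reward) can be illustrated with a minimal NumPy sketch. All names here (`dueling_q`, `curiosity_reward`, `double_q_target`, the scaling factor `beta`) are illustrative assumptions for exposition, not the authors' implementation:

```python
import numpy as np

def dueling_q(value, advantage):
    # Dueling aggregation: Q(s,a) = V(s) + A(s,a) - mean_a A(s,a),
    # representing state value and action advantage separately.
    return value + advantage - advantage.mean(axis=-1, keepdims=True)

def curiosity_reward(pred_next_state, next_state, beta=0.1):
    # Intrinsic reward: scaled forward-model prediction error,
    # large when the agent visits poorly predicted (novel) states.
    return beta * np.sum((pred_next_state - next_state) ** 2, axis=-1)

def double_q_target(r_ext, r_int, q_online_next, q_target_next,
                    gamma=0.99, done=False):
    # Double Q-learning: the online network SELECTS the bootstrap
    # action, the target network EVALUATES it -- the decoupling that
    # curbs the overestimation of vanilla DQN. Extrinsic and
    # intrinsic rewards are summed into one learning signal.
    a_star = np.argmax(q_online_next, axis=-1)
    q_eval = np.take_along_axis(
        q_target_next, a_star[..., None], axis=-1).squeeze(-1)
    return r_ext + r_int + (0.0 if done else gamma) * q_eval
```

For example, with `value = [[1.0]]` and `advantage = [[1.0, 3.0]]`, the dueling head yields `Q = [[0.0, 2.0]]`; the online network then picks action 1, but its return is evaluated by the separate target network.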