With the development of robotics and wearable devices, there is a need for information processing that assumes the agent itself is mobile. In particular, understanding the acoustic environment around an agent is an important problem. In this paper, we address the task in which a moving agent estimates the Direction of Arrival (DoA) of surrounding sound sources. To this end, we propose a novel training method, Trajectory-based Direction Selection (TDS). In TDS, a mixture of the binaural audio recorded by two agents, together with the agents' trajectories, is given as input to a network. The network is then trained to estimate, separately for each agent's trajectory, the DoA of the surrounding sounds that correspond to that trajectory. By associating each agent's trajectory with the binaural audio through TDS, we can estimate the DoAs of multiple sounds even when the only audio input is binaural, which has not been achieved by sound-only methods. In simulated environments covering both single and multiple sources, our method outperforms existing DoA estimation methods.

INDEX TERMS Audio processing, Direction of Arrival estimation, embodied agents, multi-modal learning