Accurately predicting drug–target interactions is a critical yet challenging task in drug discovery. Pocket detection and drug–target affinity prediction have traditionally been treated as separate problems, and few methods combine them within a unified deep learning system to accelerate drug development. In this study, we propose EMPDTA, an end-to-end framework that integrates protein pocket prediction and drug–target affinity prediction to provide a comprehensive understanding of drug–target interactions. The EMPDTA framework consists of three main modules: online pocket detection, multimodal representation learning for affinity prediction, and multi-task joint training. We validate the framework on diverse benchmark datasets, where it achieves robust results on both tasks. Furthermore, visualizations of the predicted pockets indicate accurate pocket detection, supporting the effectiveness of our framework.