To address the poor convergence and limited accuracy of classical data-driven algorithms for coherent polarization–direction-of-arrival (DOA) estimation, we propose a novel high-precision two-dimensional coherent polarization–DOA estimation method based on a sequence-embedding fusion (SEF) transformer. Inspired by natural language processing (NLP), the approach adapts transformer-based multi-task text inference to the joint estimation of polarization and DOA. The transformer’s multi-head self-attention mechanism captures the multi-dimensional features in the spatial-polarization domain of the covariance matrix data. An SEF module is introduced to fuse spatial-polarization domain features from different dimensions. It combines a convolutional neural network (CNN), which extracts local information, with a feature dimension transformation function, improving the model’s ability to fuse feature information in the spatial-polarization domain. To further enhance the model’s expressive capacity, we design a multi-task parallel output mode and a multi-task weighted loss function. Simulation results show that the method outperforms classical data-driven approaches in both accuracy and generalization, and that its estimation accuracy also exceeds that of the traditional model-driven algorithm.
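
As an illustration of the multi-task weighted loss idea mentioned above, the following minimal sketch forms a weighted sum of per-task mean squared errors. It is not the loss used in this work: the task names (`azimuth`, `elevation`, `polarization`), the weight values, the choice of mean squared error, and the helper names `mse` and `weighted_multitask_loss` are all assumptions made for the example.

```python
from typing import Dict, Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length, non-empty sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def weighted_multitask_loss(
    preds: Dict[str, Sequence[float]],
    targets: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> float:
    """Weighted sum of per-task MSE losses, one term per key in `weights`."""
    return sum(weights[task] * mse(preds[task], targets[task]) for task in weights)


# Hypothetical parallel outputs (in degrees) for three assumed tasks.
preds = {
    "azimuth": [10.0, 20.0],
    "elevation": [5.0, 15.0],
    "polarization": [30.0, 40.0],
}
targets = {
    "azimuth": [11.0, 20.0],
    "elevation": [5.0, 17.0],
    "polarization": [30.0, 44.0],
}
# Assumed weights: the polarization task is down-weighted relative to DOA.
weights = {"azimuth": 1.0, "elevation": 1.0, "polarization": 0.5}

loss = weighted_multitask_loss(preds, targets, weights)
print(loss)  # 0.5 * 1.0 + 2.0 * 1.0 + 8.0 * 0.5 = 6.5
```

The weights let one task's error contribute more or less to the total, which is the general purpose of a weighted multi-task loss. How the weights are actually chosen here is not stated in this abstract.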