To enhance the modeling of long-range semantic dependencies, we introduce a novel approach to linguistic steganography through English translation, termed NMT-stega (Neural Machine Translation steganography), which leverages attention mechanisms and probability distribution theory. Specifically, to optimize translation accuracy and make full use of the source-text information, we employ an attention-based NMT model as the translation backbone. To address the degradation of text quality caused by embedding secret information, we devise a dynamic word-pick policy based on probability variance. This policy adaptively constructs an alternative set and dynamically adjusts the embedding capacity at each time step, guided by variance thresholds. Additionally, we incorporate prior knowledge into the model through a hyper-parameter that balances the contributions of the source and target texts when predicting the embedded words. Extensive ablation experiments and comparative analyses, conducted on a large-scale Chinese-English corpus, validate the effectiveness of the proposed method across several critical aspects, including embedding rate, text quality, anti-steganography, and semantic distance. Notably, our numerical results demonstrate that NMT-stega outperforms alternative approaches in the anti-steganography task, achieving the highest scores against two steganalysis models, NFZ-WDA (with a score of 53) and LS-CNN (with a score of 56.4), which underscores its superiority under anti-steganography attacks. Furthermore, even when generating longer sentences, with average lengths reaching 47 words, our method maintains strong semantic relationships, as evidenced by a semantic distance of 87.916. Finally, we evaluate the proposed method with two metrics, Bilingual Evaluation Understudy (BLEU) and Perplexity, achieving scores of 42.103 and 23.592, respectively, which highlights its strong performance on the machine translation task.
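
To make the variance-guided word-pick policy concrete, the following is a minimal Python sketch of one decoding step. It is an illustration under stated assumptions, not the paper's implementation: the fusion rule in `fuse_distributions` (standing in for the source/target balancing hyper-parameter), the variance threshold, and the candidate-set sizes (powers of two, so a set of size 2^b embeds b bits) are all hypothetical choices, and the policy here assumes that a low-variance, near-flat head of the distribution marks words that are close to interchangeable.

```python
import numpy as np


def fuse_distributions(p_src, p_tgt, lam=0.5):
    """Blend source- and target-conditioned predictions.

    `lam` stands in for the balancing hyper-parameter described in the
    abstract; this linear fusion rule is an assumption for illustration.
    """
    return lam * p_src + (1.0 - lam) * p_tgt


def pick_stego_word(probs, secret_bits, var_threshold=1e-4, max_bits=3):
    """Variance-guided dynamic word pick (illustrative sketch).

    probs        -- next-word probability distribution (1-D numpy array)
    secret_bits  -- remaining secret bit stream (list of 0/1 ints)
    var_threshold, max_bits -- hypothetical settings, not paper values

    Returns (chosen_token_id, number_of_bits_consumed).
    """
    order = np.argsort(probs)[::-1]  # words sorted by descending probability

    # Shrink the candidate set until its probabilities are nearly flat:
    # low variance among the top-2^bits words suggests they are close to
    # interchangeable, so swapping one for another degrades the
    # translation least while still hiding `bits` bits.
    for bits in range(max_bits, 0, -1):
        if np.var(probs[order[: 2 ** bits]]) < var_threshold:
            break
    else:
        bits = 0  # distribution too peaked: embed nothing this step

    bits = min(bits, len(secret_bits))
    if bits == 0:
        return int(order[0]), 0  # fall back to the most likely word

    # Interpret the next `bits` secret bits as an index into the set.
    index = int("".join(str(b) for b in secret_bits[:bits]), 2)
    return int(order[index]), bits


# Toy usage: random distributions stand in for the NMT decoder's output.
rng = np.random.default_rng(0)
p = fuse_distributions(rng.dirichlet(np.ones(50)), rng.dirichlet(np.ones(50)))
token, used = pick_stego_word(p, [1, 0, 1, 1])
print(token, used)
```

In this sketch, a peaked distribution (high variance among the top candidates) yields a small or empty alternative set, so fewer or no bits are embedded at that step, while a flat distribution permits a larger set and higher capacity, which mirrors the adaptive embedding-capacity behavior the abstract describes.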