This paper presents an in-depth study of a model of English writing built on neural network algorithms. Using word vectors, we perform unsupervised disambiguation and clustering of multimedia contexts extracted from massive online videos. The disambiguation accuracy exceeds 0.7, and the resulting small-scale multimedia context set covers up to 90% of vocabulary learning tasks. User experiments show that a multimedia context learning system based on this method improves the effectiveness and experience of ESL vocabulary learning, as well as learners’ long-term word sense memory; the reported results are 30% better. Based on dependency grammatical relations and semantic metrics of collocations computed over a large-scale professional corpus, we establish a method for describing and retrieving collocation intentions that is in line with users’ linguistic cognition. After half a year on the deployed system, the usage rate of collocation retrieval doubled, making it a “sticky” ESL writing aid, and we further define style. Dictionaries provide only basic lexical definitions and, even when supported by example sentences, cannot meet the needs of ESL authors for expressive accuracy and richness. Current machine translation, meanwhile, is built on black-box deep neural networks, and its translation process is neither understandable nor interactive. Of the three algorithmic models constructed in this paper, the multitask learning model outperforms the conditional random field model and the LSTM-CRF model in grammatical error detection. Its auxiliary tasks mitigate the problem of sparse data to a certain extent, allowing the model to be trained more adequately when the label distribution is uneven.
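To make the idea of word-vector-based unsupervised disambiguation concrete, the following is a minimal sketch and not the method used in this paper. It assumes toy two-dimensional word vectors, a hypothetical similarity threshold of 0.8, and a simple greedy clustering rule; the names `WORD_VECTORS`, `context_vector`, and `cluster_contexts` are illustrative only. Each context of an ambiguous word (here, "bank") is represented by the average of its word vectors, and contexts whose vectors are close in cosine similarity are grouped as one sense.

```python
import math

# Toy 2-d word vectors (hypothetical values, for illustration only).
WORD_VECTORS = {
    "river": (1.0, 0.0),
    "water": (0.9, 0.1),
    "shore": (0.8, 0.2),
    "money": (0.0, 1.0),
    "loan": (0.1, 0.9),
    "account": (0.2, 0.8),
}


def context_vector(words):
    """Average the vectors of the known words in one context."""
    known = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if not known:
        return None
    dim = len(known[0])
    return tuple(sum(v[i] for v in known) / len(known) for i in range(dim))


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_contexts(contexts, threshold=0.8):
    """Greedy single-pass clustering (an assumed, simplified rule).

    A context joins the first cluster whose seed vector is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    """
    clusters = []  # list of (seed_vector, [context indices])
    for idx, words in enumerate(contexts):
        vec = context_vector(words)
        if vec is None:
            continue  # skip contexts with no known words
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append((vec, [idx]))
    return [members for _, members in clusters]


# Four contexts surrounding the ambiguous word "bank".
contexts = [
    ["river", "water"],    # riverside sense
    ["money", "loan"],     # financial sense
    ["shore", "water"],    # riverside sense
    ["account", "money"],  # financial sense
]
groups = cluster_contexts(contexts)
print(groups)
```

In this toy setting the riverside contexts and the financial contexts fall into separate groups. A real system would use pretrained embeddings and a more robust clustering algorithm, and the threshold would need tuning on held-out data.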