English for Academic Purposes (EAP) is pivotal for scholarly communication, yet it poses significant challenges for non-native English speakers. Recently, Large Language Models (LLMs) have been widely used in EAP to assist with writing tasks. EAP writing assistance typically encompasses several downstream natural language processing (NLP) tasks, such as Grammatical Error Correction (GEC). Nonetheless, some studies have found that the performance of LLMs on GEC tasks falls short of traditional GEC solutions. To explore the capabilities of LLMs more thoroughly with respect to deeper semantic and syntactic structures, this study rigorously assesses the performance of LLMs on the Sentence-level Revision (SentRev) task. We designed three sets of experiments to evaluate the efficacy of different LLMs. The first experiment assessed LLMs using prompts in ten different languages and found that their SentRev performance was heavily influenced by the prompt language and the quality of the input text. The second experiment investigated the performance of English LLMs on the SentRev task under minimal prompting, yet the results showed no significant changes, contradicting some prior studies. In the third experiment, we devised a simple yet effective method that significantly improves the performance of multiple LLMs by integrating academic phrases, i.e., formulaic language (FL), from the Academic Phrasebank\footnote{\url{https://www.phrasebank.manchester.ac.uk/}}, thereby overcoming the performance limitations that different prompt languages impose on LLMs. Additionally, our study highlights deficiencies in existing evaluation benchmarks and suggests that higher-level, discourse-based benchmarks for evaluating EAP text merit deeper exploration.