“…NMT approaches, which achieve state-of-the-art results, are encoder-decoder methods in which the encoder and decoder can take various architectures, such as RNNs, CNNs (Gehring et al., 2016), or Transformers (Vaswani et al., 2017), and have been applied successfully to the GEC task (Yuan and Briscoe, 2016; Yuan et al., 2019; Junczys-Dowmunt et al., 2018). Recent approaches achieve state-of-the-art results by fine-tuning pre-trained large language models (Rothe et al., 2021; Tarnavskyi et al., 2022), alleviating the large-data requirement of training such networks from scratch.…”